Re: [PATCH 0/4]
Stefan Beller writes:

> A couple days before the 2.19 release we had a bug report about
> broken submodules [1] and reverted [2] the commits leading up to them.
>
> The behavior of said bug fixed itself by taking a different approach [3],
> specifically by a weaker enforcement of having `core.worktree` set in a
> submodule [4].
>
> The revert [2] was overly broad as we neared the release, such that we
> wanted to rather keep the known buggy behavior of always having
> `core.worktree` set, rather than figuring out how to fix the new bug of
> having 'git submodule update' not working in old style repository
> setups.
>
> This series re-introduces those reverted patches, with no changes in
> code, but with drastically changed commit messages, as those focus on
> why it is safe to re-introduce them instead of explaining the desire
> for the change.

The above was a bit too cryptic for me to grok, so let me try rephrasing
to see if I got them all correctly.

 - the three-patch series leading to 984cd77ddb was meant to fix some
   bug, but the series itself was buggy and caused problems; we got rid
   of it.

 - the problem 984cd77ddb wanted to fix was fixed differently, without
   reintroducing the problem the three-patch series introduced. That
   fix has already been with us since 4d6d6ef1fc.

 - now these three changes that were problematic in the past are resent
   without any update (other than one preparatory patch to add tests).

Is that what is going on? Obviously I am not getting "the other"
benefit we wanted to gain out of these three patches (because the above
description fails to explain what that is), other than fixing the issue
that was already fixed by 4d6d6ef1fc. Sorry for being puzzled...
> [1] https://public-inbox.org/git/2659750.rG6xLiZASK@twilight
> [2] f178c13fda (Revert "Merge branch 'sb/submodule-core-worktree'",
>     2018-09-07)
> [3] 4d6d6ef1fc (Merge branch 'sb/submodule-update-in-c', 2018-09-17)
> [4] 74d4731da1 (submodule--helper: replace connect-gitdir-workingtree
>     by ensure-core-worktree, 2018-08-13)
>
> Stefan Beller (4):
>   submodule update: add regression test with old style setups
>   submodule: unset core.worktree if no working tree is present
>   submodule--helper: fix BUG message in ensure_core_worktree
>   submodule deinit: unset core.worktree
>
>  builtin/submodule--helper.c        |  4 +++-
>  submodule.c                        | 14 ++
>  submodule.h                        |  2 ++
>  t/lib-submodule-update.sh          |  5 +++--
>  t/t7400-submodule-basic.sh         |  5 +
>  t/t7412-submodule-absorbgitdirs.sh |  7 ++-
>  6 files changed, 33 insertions(+), 4 deletions(-)
Re: [PATCH 0/4] mingw: prevent external PERL5LIB from interfering
Hi Junio,

On Wed, 31 Oct 2018, Junio C Hamano wrote:

> "Johannes Schindelin via GitGitGadget" writes:
>
> > An alternative approach which was rejected at the time (because it
> > interfered with the then-ongoing work to compile Git for Windows
> > using MS Visual C++) would patch the make_environment_block()
> > function to skip the specified environment variables before handing
> > down the environment block to the spawned process. Currently it
> > would interfere with the mingw-utf-8-env patch series I sent earlier
> > today [https://public-inbox.org/git/pull.57.git.gitgitgad...@gmail.com].
>
> So the rejected approach that was not used in this series would
> interfere with the other one, but I do not have to worry about it
> because this series does not use that approach? I had to read the six
> lines above twice to figure out that it essentially is saying "I
> shouldn't care", but please let me know if I misread the paragraph and
> I need to care ;-)

Well, you might want to worry about it. Or not.

The approach taken by this patch series is to call `unsetenv()` for the
variable names listed in `core.unsetenvvars` (if any), just before
spawning off the first process (if any).

What I meant by this comment in the cover letter is that I thought about
doing it differently. We already have a perfectly fine function called
`make_environment_block()` that takes a "deltaenv", and then constructs
a new environment block from the current environment plus deltaenv. And
this would be an obvious alternative place to "unset" the variables, as
this function is only called just before spawning new processes.

I was weighing both options, and both back then and right now, there
are other patches in flight that conflict with the second approach, so
the first approach is what I went with.
From a purely aesthetic point of view, the second approach looks nicer,
as it really only affects the spawned processes (and not the current
one), and it allows for changing the list between spawning processes.
But to do it right, I would also have to use a hash set, and it would
complicate the code of `make_environment_block()` even further. And it
sounds a bit like overkill to me, too.

So I sided with the approach presented in the current revision of the
patch series, but I wanted to describe the other approach in case you
(or other reviewers) have a different opinion.

> > While at it, this patch series also cleans up house and moves the
> > Windows-specific core.* variable handling to compat/mingw.c rather
> > than cluttering environment.c and config.c with things that e.g.
> > developers on Linux do not want to care about.
>
> Or Macs.

Or macOS. Or FreeBSD. Or Irix. Or any other, that's right ;-)

Traditionally, we did not really care all that much about platforms
other than Linux, though, which is what made me write what I wrote.
Having said that, I get the impression that this is changing slowly.
The benefits are pretty clear, too. After all, it was a *Windows* build
failure recently that let Peff identify and fix a *non-Windows* bug...

> When I skimmed this series earlier, I found that patches 2 & 3 are
> sensibly implemented to achieve this goal.

Thanks!
Dscho

> > Johannes Schindelin (4):
> >   config: rename `dummy` parameter to `cb` in git_default_config()
> >   Allow for platform-specific core.* config settings
> >   Move Windows-specific config settings into compat/mingw.c
> >   mingw: unset PERL5LIB by default
> >
> >  Documentation/config.txt     |  6
> >  cache.h                      |  8 -
> >  compat/mingw.c               | 58 +++-
> >  compat/mingw.h               |  3 ++
> >  config.c                     | 18 ---
> >  environment.c                |  1 -
> >  git-compat-util.h            |  8 +
> >  t/t0029-core-unsetenvvars.sh | 30 +++
> >  8 files changed, 109 insertions(+), 23 deletions(-)
> >  create mode 100755 t/t0029-core-unsetenvvars.sh
> >
> > base-commit: 4ede3d42dfb57f9a41ac96a1f216c62eb7566cc2
> > Published-As: https://github.com/gitgitgadget/git/releases/tags/pr-62%2Fdscho%2Fperl5lib-v1
> > Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-62/dscho/perl5lib-v1
> > Pull-Request: https://github.com/gitgitgadget/git/pull/62
Re: [PATCH 0/4] mingw: prevent external PERL5LIB from interfering
"Johannes Schindelin via GitGitGadget" writes:

> An alternative approach which was rejected at the time (because it
> interfered with the then-ongoing work to compile Git for Windows using
> MS Visual C++) would patch the make_environment_block() function to
> skip the specified environment variables before handing down the
> environment block to the spawned process. Currently it would interfere
> with the mingw-utf-8-env patch series I sent earlier today
> [https://public-inbox.org/git/pull.57.git.gitgitgad...@gmail.com].

So the rejected approach that was not used in this series would
interfere with the other one, but I do not have to worry about it
because this series does not use that approach? I had to read the six
lines above twice to figure out that it essentially is saying "I
shouldn't care", but please let me know if I misread the paragraph and
I need to care ;-)

> While at it, this patch series also cleans up house and moves the
> Windows-specific core.* variable handling to compat/mingw.c rather
> than cluttering environment.c and config.c with things that e.g.
> developers on Linux do not want to care about.

Or Macs.

When I skimmed this series earlier, I found that patches 2 & 3 are
sensibly implemented to achieve this goal.
> Johannes Schindelin (4):
>   config: rename `dummy` parameter to `cb` in git_default_config()
>   Allow for platform-specific core.* config settings
>   Move Windows-specific config settings into compat/mingw.c
>   mingw: unset PERL5LIB by default
>
>  Documentation/config.txt     |  6
>  cache.h                      |  8 -
>  compat/mingw.c               | 58 +++-
>  compat/mingw.h               |  3 ++
>  config.c                     | 18 ---
>  environment.c                |  1 -
>  git-compat-util.h            |  8 +
>  t/t0029-core-unsetenvvars.sh | 30 +++
>  8 files changed, 109 insertions(+), 23 deletions(-)
>  create mode 100755 t/t0029-core-unsetenvvars.sh
>
> base-commit: 4ede3d42dfb57f9a41ac96a1f216c62eb7566cc2
> Published-As: https://github.com/gitgitgadget/git/releases/tags/pr-62%2Fdscho%2Fperl5lib-v1
> Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-62/dscho/perl5lib-v1
> Pull-Request: https://github.com/gitgitgadget/git/pull/62
Re: [PATCH 0/4] index-pack: optionally turn off SHA-1 collision checking
Geert Jansen writes:

> Maybe a better option would be to check for the non-existence of the
> [00-ff] directories under .git/objects.

Please do not do this; I expect many people do this before they leave
work, just like I do:

    $ git repack -a -d -f --window=$large --depth=$small
    $ git prune

which would typically leave only info/ and pack/ subdirectories under
the .git/objects/ directory.
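The failure mode is easy to see in code. A hypothetical sketch of the
proposed detection follows (helper names are illustrative, not Git's):
it scans `$GIT_DIR/objects` for the two-hex-digit "fan-out"
subdirectories that hold loose objects. As the repack-and-prune example
shows, a perfectly healthy repository can have none of them, leaving
only info/ and pack/ behind, so "no fan-out directories" does not mean
"freshly cloned".

```c
#include <assert.h>
#include <ctype.h>
#include <dirent.h>
#include <string.h>

/* Does "name" look like a loose-object fan-out directory ("00".."ff")?
 * "info" and "pack" deliberately do not match. */
static int is_fanout_dir(const char *name)
{
	return strlen(name) == 2 &&
	       isxdigit((unsigned char)name[0]) &&
	       isxdigit((unsigned char)name[1]);
}

/* Scan an objects directory for any fan-out subdirectory. */
static int has_loose_fanout(const char *objects_dir)
{
	DIR *dir = opendir(objects_dir);
	struct dirent *ent;
	int found = 0;

	if (!dir)
		return 0;
	while ((ent = readdir(dir)))
		if (is_fanout_dir(ent->d_name))
			found = 1;
	closedir(dir);
	return found;
}
```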
Re: [PATCH 0/4] index-pack: optionally turn off SHA-1 collision checking
On Sun, Oct 28, 2018 at 10:50:19PM +, Ævar Arnfjörð Bjarmason wrote:

> I left the door open for that in the new config option 4/4 implements,
> but I suspect for Geert's purposes this is something he'd prefer to
> turn off in git on clone entirely, i.e. because it may be running on
> some random Amazon customer's EFS instance, and they won't know about
> this new core.checkCollisions option.
>
> But maybe I'm wrong about that and Geert is happy to just turn on
> core.checkCollisions=false and use this series instead.

I think that the best user experience would probably be if git were
fast by default without having to give up on (future) security by
removing the sha1 collision check.

Maybe core.checkCollisions could default to "on" only when there are no
loose objects in the repository? That would give a fast experience for
many common cases (git clone, git init && git fetch) while still doing
the collision check when relevant.

My patch used the --cloning flag as an approximation of "no loose
objects". Maybe a better option would be to check for the non-existence
of the [00-ff] directories under .git/objects.
Re: [PATCH 0/4] Bloom filter experiment
Ævar Arnfjörð Bjarmason writes:

>> This is all to say: having a maximum size is good. 512 is big enough
>> to cover _most_ commits, but not so big that we may store _really_
>> big filters.
>
> Makes sense. 512 is good enough to hardcode initially, but I couldn't
> tell from briefly skimming the patches if it was possible to make this
> size dynamic per-repo when the graph/filter is written.
>
> I.e. we might later add some discovery step where we look at N number
> of commits at random, until we're satisfied that we've come up with
> some average/median number of total (recursive) tree entries & how
> many tend to be changed per-commit.
>
> I.e. I can imagine repositories (with some automated changes) where we
> have 10k files and tend to change 1k per commit, or ones with 10k
> files where we tend to change just 1-10 per commit, which would mean a
> larger/smaller filter would be needed / would do.

I was more interested in finding out what advice our docs should give
to the end users to tune the value once such a knob is invented, and
what you gave in the above paragraphs may lead us to a nice auto-tuning
heuristic.
Re: [PATCH 0/4] Bloom filter experiment
> | Implementation | Queries | Maybe | FP # | FP %  |
> |----------------|---------|-------|------|-------|
> | Szeder         | 66095   | 1142  | 256  | 0.38% |
> | Jonathan       | 66459   | 107   | 89   | 0.16% |
> | Stolee         | 53025   | 492   | 479  | 0.90% |
>
> (Note that we must have used different starting points, which is why
> my "Queries" is so much smaller.)

I suspect it's because your Bloom filter implementation covers only the
first parent (if I'm understanding get_bloom_filter() correctly). When
I only covered the first parent in my initial test (see patch 2 of
[1]), I got (following the columns in the table above):

  53096 107 89 0.001676

Also, I think that the rejecting power

  (Queries - Maybe) / (total tree comparisons if no Bloom filters were used)

needs to be in the evaluation criteria somewhere, as that indicates how
many tree comparisons we managed to avoid. Also, we probably should
also test on a file that changes more frequently :-)

[1] https://public-inbox.org/git/cover.1539219248.git.jonathanta...@google.com/

> The increase in false-positive percentage is expected in my
> implementation. I'm using the correct filter sizes to hit a <1% FP
> ratio. This could be lowered by changing the settings, and the size
> would dynamically grow. For my Git repo (which contains
> git-for-windows/git and microsoft/git) this implementation grows the
> commit-graph file from 5.8 MB to 7.3 MB (1.5 MB total, compared to
> Szeder's 8MB filter). For 105,260 commits, that rounds out to less
> than 20 bytes per commit (compared to Jonathan's 256 bytes per
> commit).

Mine has 256 bits per commit, which is 32 bytes per commit (still more
than yours). Having said all that, thanks for writing up your version -
in particular, variable-sized filters (like in yours) seem to be the
way to go.

> We'll see how much time I have to do this polish, but I think the
> benefit is proven.

Agreed.
Re: [PATCH 0/4] Bloom filter experiment
On 10/16/2018 8:57 AM, Ævar Arnfjörð Bjarmason wrote:
> On Tue, Oct 16 2018, Derrick Stolee wrote:
>
>> On 10/16/2018 12:45 AM, Junio C Hamano wrote:
>>> Derrick Stolee writes:
>>>
>>>> 2. The filters are sized according to the number of changes in each
>>>> commit, with a minimum of one 64-bit word.
>>>> ...
>>>> 6. When we compute the Bloom filters, we don't store a filter for
>>>> commits whose first-parent diff has more than 512 paths.
>>> Just being curious but was 512 taken out of thin air or is there
>>> some math behind it, e.g. to limit false positive rate down to
>>> certain threshold? With a wide-enough bitset, you could store
>>> arbitrary large number of paths with low enough false positive, I
>>> guess, but is there a point where there is too many paths in the
>>> change that gives us diminishing returns and not worth having a
>>> filter in the first place?
>> 512 is somewhat arbitrary, but having a maximum size is not.
>>> In a normal source-code-control context, the set of paths modified
>>> by any single commit ought to be a small subset of the entire paths,
>>> and whole-tree changes ought to be fairly rare. In a project for
>>> which that assumption does not hold, it might help to have a
>>> negative bloom filter (i.e. "git log -- A" asks "does the commit
>>> modify A?" and the filter would say "we know it does not, because we
>>> threw all the paths that are not touched to the bloom filter"), but
>>> I think that would optimize for a wrong case.
>>
>> A commit with many changed paths is very rare. The 512 I picked above
>> is enough to cover 99% of commits in each of the repos I sampled when
>> first investigating Bloom filters.
>>
>> When a Bloom filter response says "maybe yes" (in our case, "maybe
>> not TREESAME"), then we need to verify that it is correct. In the
>> extreme case that every path is changed, then the Bloom filter does
>> nothing but add extra work.
>>
>> These extreme cases are also not unprecedented: in our Azure Repos
>> codebase, we were using core.autocrlf to smudge CRLFs to LFs, but
>> when it was time to dogfood VFS for Git, we needed to turn off the
>> smudge filter. So, there is one commit that converts every LF to a
>> CRLF in every text file. Storing a Bloom filter for those ~250,000
>> entries would take ~256KB for essentially no value. By not storing a
>> filter for this commit, we go immediately to the regular TREESAME
>> check, which would happen for most pathspecs.
>>
>> This is all to say: having a maximum size is good. 512 is big enough
>> to cover _most_ commits, but not so big that we may store _really_
>> big filters.
>
> Makes sense. 512 is good enough to hardcode initially, but I couldn't
> tell from briefly skimming the patches if it was possible to make this
> size dynamic per-repo when the graph/filter is written.

My proof-of-concept has it as a constant, but part of my plan is to
make these all config options, as of this item in my message:

>>> 2. We need config values for writing and consuming bloom filters,
>>> but also to override the default settings.

> I.e. we might later add some discovery step where we look at N number
> of commits at random, until we're satisfied that we've come up with
> some average/median number of total (recursive) tree entries & how
> many tend to be changed per-commit.
>
> I.e. I can imagine repositories (with some automated changes) where we
> have 10k files and tend to change 1k per commit, or ones with 10k
> files where we tend to change just 1-10 per commit, which would mean a
> larger/smaller filter would be needed / would do.

I'm not sure a dynamic approach would be worth the effort, but I'm open
to hearing the results of an experiment.

Thanks,
-Stolee
Re: [PATCH 0/4] Bloom filter experiment
On Tue, Oct 16 2018, Derrick Stolee wrote:

> On 10/16/2018 12:45 AM, Junio C Hamano wrote:
>> Derrick Stolee writes:
>>
>>> 2. The filters are sized according to the number of changes in each
>>> commit, with a minimum of one 64-bit word.
>>> ...
>>> 6. When we compute the Bloom filters, we don't store a filter for
>>> commits whose first-parent diff has more than 512 paths.
>> Just being curious but was 512 taken out of thin air or is there
>> some math behind it, e.g. to limit false positive rate down to
>> certain threshold? With a wide-enough bitset, you could store
>> arbitrary large number of paths with low enough false positive, I
>> guess, but is there a point where there is too many paths in the
>> change that gives us diminishing returns and not worth having a
>> filter in the first place?
> 512 is somewhat arbitrary, but having a maximum size is not.
>> In a normal source-code-control context, the set of paths modified
>> by any single commit ought to be a small subset of the entire paths,
>> and whole-tree changes ought to be fairly rare. In a project for
>> which that assumption does not hold, it might help to have a
>> negative bloom filter (i.e. "git log -- A" asks "does the commit
>> modify A?" and the filter would say "we know it does not, because we
>> threw all the paths that are not touched to the bloom filter"), but
>> I think that would optimize for a wrong case.
>
> A commit with many changed paths is very rare. The 512 I picked above
> is enough to cover 99% of commits in each of the repos I sampled when
> first investigating Bloom filters.
>
> When a Bloom filter response says "maybe yes" (in our case, "maybe not
> TREESAME"), then we need to verify that it is correct. In the extreme
> case that every path is changed, then the Bloom filter does nothing
> but add extra work.
>
> These extreme cases are also not unprecedented: in our Azure Repos
> codebase, we were using core.autocrlf to smudge CRLFs to LFs, but
> when it was time to dogfood VFS for Git, we needed to turn off the
> smudge filter. So, there is one commit that converts every LF to a
> CRLF in every text file. Storing a Bloom filter for those ~250,000
> entries would take ~256KB for essentially no value. By not storing a
> filter for this commit, we go immediately to the regular TREESAME
> check, which would happen for most pathspecs.
>
> This is all to say: having a maximum size is good. 512 is big enough
> to cover _most_ commits, but not so big that we may store _really_ big
> filters.

Makes sense. 512 is good enough to hardcode initially, but I couldn't
tell from briefly skimming the patches if it was possible to make this
size dynamic per-repo when the graph/filter is written.

I.e. we might later add some discovery step where we look at N number
of commits at random, until we're satisfied that we've come up with
some average/median number of total (recursive) tree entries & how many
tend to be changed per-commit.

I.e. I can imagine repositories (with some automated changes) where we
have 10k files and tend to change 1k per commit, or ones with 10k files
where we tend to change just 1-10 per commit, which would mean a
larger/smaller filter would be needed / would do.
Re: [PATCH 0/4] Bloom filter experiment
On 10/16/2018 12:45 AM, Junio C Hamano wrote:
> Derrick Stolee writes:
>
>> 2. The filters are sized according to the number of changes in each
>> commit, with a minimum of one 64-bit word.
>> ...
>> 6. When we compute the Bloom filters, we don't store a filter for
>> commits whose first-parent diff has more than 512 paths.
> Just being curious but was 512 taken out of thin air or is there some
> math behind it, e.g. to limit false positive rate down to certain
> threshold? With a wide-enough bitset, you could store arbitrary large
> number of paths with low enough false positive, I guess, but is there
> a point where there is too many paths in the change that gives us
> diminishing returns and not worth having a filter in the first place?

512 is somewhat arbitrary, but having a maximum size is not.

> In a normal source-code-control context, the set of paths modified by
> any single commit ought to be a small subset of the entire paths, and
> whole-tree changes ought to be fairly rare. In a project for which
> that assumption does not hold, it might help to have a negative bloom
> filter (i.e. "git log -- A" asks "does the commit modify A?" and the
> filter would say "we know it does not, because we threw all the paths
> that are not touched to the bloom filter"), but I think that would
> optimize for a wrong case.

A commit with many changed paths is very rare. The 512 I picked above
is enough to cover 99% of commits in each of the repos I sampled when
first investigating Bloom filters.

When a Bloom filter response says "maybe yes" (in our case, "maybe not
TREESAME"), then we need to verify that it is correct. In the extreme
case that every path is changed, then the Bloom filter does nothing but
add extra work.

These extreme cases are also not unprecedented: in our Azure Repos
codebase, we were using core.autocrlf to smudge CRLFs to LFs, but when
it was time to dogfood VFS for Git, we needed to turn off the smudge
filter. So, there is one commit that converts every LF to a CRLF in
every text file. Storing a Bloom filter for those ~250,000 entries
would take ~256KB for essentially no value. By not storing a filter for
this commit, we go immediately to the regular TREESAME check, which
would happen for most pathspecs.

This is all to say: having a maximum size is good. 512 is big enough to
cover _most_ commits, but not so big that we may store _really_ big
filters.

Thanks,
-Stolee
Re: [PATCH 0/4] Bloom filter experiment
Derrick Stolee writes:

> 2. The filters are sized according to the number of changes in each
> commit, with a minimum of one 64-bit word.
> ...
> 6. When we compute the Bloom filters, we don't store a filter for
> commits whose first-parent diff has more than 512 paths.

Just being curious but was 512 taken out of thin air or is there some
math behind it, e.g. to limit false positive rate down to certain
threshold? With a wide-enough bitset, you could store arbitrary large
number of paths with low enough false positive, I guess, but is there a
point where there is too many paths in the change that gives us
diminishing returns and not worth having a filter in the first place?

In a normal source-code-control context, the set of paths modified by
any single commit ought to be a small subset of the entire paths, and
whole-tree changes ought to be fairly rare. In a project for which that
assumption does not hold, it might help to have a negative bloom filter
(i.e. "git log -- A" asks "does the commit modify A?" and the filter
would say "we know it does not, because we threw all the paths that are
not touched to the bloom filter"), but I think that would optimize for
a wrong case.
Re: [PATCH 0/4] Bloom filter experiment
On 10/9/2018 3:34 PM, SZEDER Gábor wrote:
> To keep the ball rolling, here is my proof of concept in a somewhat
> cleaned-up form, with still plenty of rough edges.

Peff, Szeder, and Jonathan,

Thanks for giving me the kick in the pants to finally write a proof of
concept for my personal take on how this should work. My implementation
borrows things from both Szeder's and Jonathan's series.

You can find my commits for all of the versions on GitHub (it's a bit
too messy to share as a patch series right now, I think):

  Repo: https://github.com/derrickstolee/git
  Branches: bloom/* (includes bloom/stolee, bloom/peff, bloom/szeder,
  and bloom/tan for the respective implementations, and bloom/base as
  the common ancestor)

My implementation uses the following scheme:

1. Bloom filters are computed and stored on a commit-by-commit basis.

2. The filters are sized according to the number of changes in each
   commit, with a minimum of one 64-bit word.

3. The filters are stored in the commit-graph using two new optional
   chunks: one stores a single 32-bit integer for each commit that
   provides the end of its Bloom filter in the second "data" chunk. The
   data chunk also stores the magic constants (hash version, num hash
   keys, and num bits per entry).

4. We fill the Bloom filters as (const char *data, int len) pairs as
   "struct bloom_filter"s in a commit slab.

5. In order to evaluate containment, we need the struct bloom_filter,
   but also struct bloom_settings (stores the magic constants in one
   place), and struct bloom_key (stores the _k_ hash values). This
   allows us to hash a path once and test the same path against many
   Bloom filters.

6. When we compute the Bloom filters, we don't store a filter for
   commits whose first-parent diff has more than 512 paths.

7. When we compute the commit-graph, we can re-use the pre-existing
   filters without needing to recompute diffs. (Caveat: the current
   implementation will re-compute diffs for the commits with diffs that
   were too large.)

You can build the Bloom filters in my implementation this way:

  GIT_TEST_BLOOM_FILTERS=1 ./git commit-graph write --reachable

> You can play around with it like this:
>
>   $ GIT_USE_POC_BLOOM_FILTER=$((8*1024*1024*8)) git commit-graph write
>   Computing commit graph generation numbers: 100% (52801/52801), done.
>   Computing bloom filter: 100% (52801/52801), done.
>   # Yeah, I even added progress indicator! :)
>
>   $ GIT_TRACE_BLOOM_FILTER=2 GIT_USE_POC_BLOOM_FILTER=y \
>       git rev-list --count --full-history HEAD -- t/valgrind/valgrind.sh
>   886
>   20:40:24.783699 revision.c:486 bloom filter total queries: 66095
>   definitely not: 64953 maybe: 1142 false positives: 256
>   fp ratio: 0.003873

Jonathan used this same test, so will I. Here is a summary table:

  | Implementation | Queries | Maybe | FP # | FP %  |
  |----------------|---------|-------|------|-------|
  | Szeder         | 66095   | 1142  | 256  | 0.38% |
  | Jonathan       | 66459   | 107   | 89   | 0.16% |
  | Stolee         | 53025   | 492   | 479  | 0.90% |

(Note that we must have used different starting points, which is why my
"Queries" is so much smaller.)

The increase in false-positive percentage is expected in my
implementation. I'm using the correct filter sizes to hit a <1% FP
ratio. This could be lowered by changing the settings, and the size
would dynamically grow. For my Git repo (which contains
git-for-windows/git and microsoft/git) this implementation grows the
commit-graph file from 5.8 MB to 7.3 MB (1.5 MB total, compared to
Szeder's 8 MB filter). For 105,260 commits, that rounds out to less
than 20 bytes per commit (compared to Jonathan's 256 bytes per commit).

Related stats for my Linux repo: 781,756 commits, commit-graph grows
from 43.8 to 55.6 MB (~12 MB additional, ~16 bytes per commit).

I haven't done a side-by-side performance test for these
implementations, but it would be interesting to do so.

Despite writing a lot of code in a short amount of time, there is a lot
of work to be done before this is submittable:

1. There are three different environment variables right now. It would
   be better to have one GIT_TEST_ variable and rely on existing
   tracing for logs (trace2 values would be good here).

2. We need config values for writing and consuming bloom filters, but
   also to override the default settings.

3. My bloom.c/bloom.h is too coupled to the commit-graph. I want to
   harden that interface to let Bloom filters live as their own thing,
   but that the commit-graph could load a bloom filter from the file
   instead of from the slab.

4. Tests, tests, and more tests.

We'll see how much time I have to do this polish, but I think the
benefit is proven.

Thanks,
-Stolee
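Point 5's scheme (hash a path once, then probe every commit's filter
with the same precomputed key) can be sketched as follows. The FNV-1a
hash, k = 7, and the double-hashing trick used to derive the k probe
values are illustrative assumptions here, not the constants or code
from any of the posted implementations:

```c
#include <assert.h>
#include <stdint.h>

#define BLOOM_NR_HASHES 7	/* illustrative k, not Stolee's constant */

/* The k probe values for one path, computed once and reused against
 * many per-commit filters (the point of struct bloom_key). */
struct bloom_key {
	uint32_t hashes[BLOOM_NR_HASHES];
};

static uint32_t fnv1a(const char *s, uint32_t seed)
{
	uint32_t h = 2166136261u ^ seed;
	while (*s) {
		h ^= (unsigned char)*s++;
		h *= 16777619u;
	}
	return h;
}

static void bloom_key_init(struct bloom_key *key, const char *path)
{
	uint32_t a = fnv1a(path, 0);
	uint32_t b = fnv1a(path, 0x9e3779b9u);
	int i;

	/* double hashing: derive k probes from two base hashes */
	for (i = 0; i < BLOOM_NR_HASHES; i++)
		key->hashes[i] = a + (uint32_t)i * b;
}

static void bloom_set(uint64_t *bits, uint32_t nbits,
		      const struct bloom_key *key)
{
	int i;

	for (i = 0; i < BLOOM_NR_HASHES; i++) {
		uint32_t pos = key->hashes[i] % nbits;
		bits[pos / 64] |= (uint64_t)1 << (pos % 64);
	}
}

/* Returns 0 for "definitely not changed" and 1 for "maybe changed",
 * which must then be verified by a real tree comparison (TREESAME). */
static int bloom_maybe_contains(const uint64_t *bits, uint32_t nbits,
				const struct bloom_key *key)
{
	int i;

	for (i = 0; i < BLOOM_NR_HASHES; i++) {
		uint32_t pos = key->hashes[i] % nbits;
		if (!(bits[pos / 64] & ((uint64_t)1 << (pos % 64))))
			return 0;
	}
	return 1;
}
```

The payoff is in `git log -- <path>`: the key is built once for the
pathspec, and each commit's filter is then probed with a handful of bit
tests instead of a tree diff.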
Re: [PATCH 0/4] Bloom filter experiment
On 10/9/2018 3:34 PM, SZEDER Gábor wrote:
> To keep the ball rolling, here is my proof of concept in a somewhat
> cleaned-up form, with still plenty of rough edges.
>
> You can play around with it like this:
>
>   $ GIT_USE_POC_BLOOM_FILTER=$((8*1024*1024*8)) git commit-graph write
>   Computing commit graph generation numbers: 100% (52801/52801), done.
>   Computing bloom filter: 100% (52801/52801), done.
>   # Yeah, I even added progress indicator! :)
>
>   $ GIT_TRACE_BLOOM_FILTER=2 GIT_USE_POC_BLOOM_FILTER=y \
>       git rev-list --count --full-history HEAD -- t/valgrind/valgrind.sh
>   886
>   20:40:24.783699 revision.c:486 bloom filter total queries: 66095
>   definitely not: 64953 maybe: 1142 false positives: 256
>   fp ratio: 0.003873
>
> The value of $GIT_USE_POC_BLOOM_FILTER only really matters when
> writing the Bloom filter, and it specifies the number of bits in the
> filter's bitmap, IOW the above command creates an 8MB Bloom filter. To
> make use of the filter the variable can be anything non-empty.
>
> Writing the Bloom filter is very slow as it is (yeah, that's why I
> bothered with the progress indicator ;). I wrote about it in patch 2's
> commit message: the cause for about half of the slowness is rather
> obvious, but I don't (yet) know what's responsible for the other half.
>
> Not a single test... but I've run loops over all files in git.git
> comparing 'git rev-list HEAD -- $file's output with and without the
> Bloom filter, and, surprisingly, they match. My quick'n'dirty
> experiments usually don't fare this well...
>
> It's also available at:
>
>   https://github.com/szeder/git bloom-filter-experiment

Thanks! I will take a close look at this tomorrow and start playing
with it.

-Stolee
Re: [PATCH 0/4] un-breaking pack-objects with bitmaps
Jeff King writes:

>> Either is fine. I am not moving 'next' beyond what is necessary for
>> this release cycle during the pre-release freeze period, and because
>> I thought that Peff was in favor of squashing in the changes to the
>> original when the next cycle starts, I haven't bothered to merge the
>> fix there yet.
>
> I think Ævar's point is just that the current tip of next is badly
> broken if you use bitmaps, and it's worth hot-fixing that in the
> meantime.

I know that ;-) My point is that anybody who *needs* handholding from
the upstream project (i.e. not Ævar, but those whom he is trying to
help with the suggestion) should not be looking at 'next' during the
prerelease freeze. They should be looking at 'master' instead, and that
is why I am not spending more than minimum cycles of my own worrying
about 'next' during the prerelease.

In any case, I think the final is ready to go, almost. I still need to
do the preformatted docs, though.
Re: [PATCH 0/4] un-breaking pack-objects with bitmaps
On Mon, Sep 10, 2018 at 09:53:56AM -0700, Junio C Hamano wrote:

> Ævar Arnfjörð Bjarmason writes:
>
> > I'm just reverting jk/pack-delta-reuse-with-bitmap out of next when
> > building my own package of git, but I think this really should be
> > fixed in that branch, either by merging the fix down or reverting
> > the original series out of next. I think just merging the fix down
> > makes sense, but I have no strong opinion on it.
>
> Either is fine. I am not moving 'next' beyond what is necessary for
> this release cycle during the pre-release freeze period, and because
> I thought that Peff was in favor of squashing in the changes to the
> original when the next cycle starts, I haven't bothered to merge the
> fix there yet.

I think Ævar's point is just that the current tip of next is badly
broken if you use bitmaps, and it's worth hot-fixing that in the
meantime.

After the release, I'm OK with either one of squashing that first patch
in, or just merging as-is.

-Peff
Re: [PATCH 0/4] un-breaking pack-objects with bitmaps
Ævar Arnfjörð Bjarmason writes: > I'm just reverting jk/pack-delta-reuse-with-bitmap out of next when > building my own package of git, but I think this really should be fixed > in that branch, either by merging the fix down or reverting the original > series out of next, I think just merging the fix down makes sense, but > have no strong opinion on it. Either is fine. I am not moving 'next' beyond what is necessary for this release cycle during the pre-release freeze period, and because I thought that Peff was in favor of squashing in the changes to the original when the next cycle starts, I haven't bothered to merge the fix there yet.
Re: [PATCH 0/4] un-breaking pack-objects with bitmaps
On Sat, Sep 01 2018, Jeff King wrote: > On Fri, Aug 31, 2018 at 06:55:58PM -0400, Jeff King wrote: > >> On Fri, Aug 31, 2018 at 05:23:17PM +0200, Ævar Arnfjörð Bjarmason wrote: >> >> > On Tue, Aug 21 2018, Jeff King wrote: >> > >> > > +int bitmap_has_sha1_in_uninteresting(struct bitmap_index *bitmap_git, >> > > + const unsigned char *sha1) >> > > +{ >> > > +int pos; >> > > + >> > > +if (!bitmap_git) >> > > +return 0; /* no bitmap loaded */ >> > > +if (!bitmap_git->result) >> > > +BUG("failed to perform bitmap walk before querying"); >> > >> > Some part of what calls this completely breaks pushing from the "next" >> > branch when you have local bitmaps (we *really* should have some tests >> > for this...). >> >> Yikes, thanks for reporting. I agree we need better tests here. > > OK, here is the fix. Since the problem is in 'next', this is done as a > patch on top of jk/pack-delta-reuse-with-bitmap. But since we're set to > rewind 'next' post-release anyway, we could squash it directly into > 30cdc33fba from the original series. That would keep later bisections > from running into it, which may be worth it as it's a pretty severe > breakage. Or maybe not: Junio: Just a reminder that next is still broken with this, and I see e.g. the Debian "experimental" has the bug but not the fix at this point. I'm just reverting jk/pack-delta-reuse-with-bitmap out of next when building my own package of git, but I think this really should be fixed in that branch, either by merging the fix down or reverting the original series out of next, I think just merging the fix down makes sense, but have no strong opinion on it.
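As an aside for anyone following along: the wish in the quoted exchange ("we *really* should have some tests for this...") could be exercised with a minimal script along these lines — a push from a repository carrying local pack bitmaps, the setup that hit the BUG(). This is only a sketch (paths, branch name, and identity are made up for the example), not the t5310 test that was actually added:

```shell
# Rough sketch: push from a repository that has local pack bitmaps.
git init --quiet --bare server.git
git init --quiet client
git -C client -c user.name=test -c user.email=test@example.com \
    commit --quiet --allow-empty -m base
git -C client repack -adbq     # -b writes a pack bitmap
git -C client push --quiet ../server.git HEAD:refs/heads/main
git -C server.git rev-parse main
```

On a git built from the broken tip of 'next', the push step is where the BUG() would trigger.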
Re: [PATCH 0/4] un-breaking pack-objects with bitmaps
On Tue, Sep 04, 2018 at 12:30:14PM -0700, Stefan Beller wrote: > > [1/4]: bitmap_has_sha1_in_uninteresting(): drop BUG check > > > > The actual fix. This should get merged to next ASAP (or the original > > topic just reverted). > > > > [2/4]: t5310: test delta reuse with bitmaps > > > > I did this separately to give us flexibility to squash or merge > > quickly. But it does find Ævar's bug on a git without patch 1. > > > > [3/4]: traverse_bitmap_commit_list(): don't free result > > > > The original assert should have simply been useless, but it was the > > surprising behavior of this function that turned it into a bug. > > > > [4/4]: pack-bitmap: drop "loaded" flag > > > > And this is just an annoyance I ran into, which is a fallout from > > our conversion to using an allocated bitmap_index struct. > > FWIW, the whole series is > Reviewed-by: Stefan Beller > I am sorry for suggesting the BUG() in the first place. Heh. You only asked a question about the interface. I was the one who was _supposed_ to be familiar with the code, and put in the actual assertion. So if your suggestion was dumb, my response was doubly so. :) -Peff
Re: [PATCH 0/4] un-breaking pack-objects with bitmaps
> [1/4]: bitmap_has_sha1_in_uninteresting(): drop BUG check > > The actual fix. This should get merged to next ASAP (or the original > topic just reverted). > > [2/4]: t5310: test delta reuse with bitmaps > > I did this separately to give us flexibility to squash or merge > quickly. But it does find Ævar's bug on a git without patch 1. > > [3/4]: traverse_bitmap_commit_list(): don't free result > > The original assert should have simply been useless, but it was the > surprising behavior of this function that turned it into a bug. > > [4/4]: pack-bitmap: drop "loaded" flag > > And this is just an annoyance I ran into, which is a fallout from > our conversion to using an allocated bitmap_index struct. FWIW, the whole series is Reviewed-by: Stefan Beller I am sorry for suggesting the BUG() in the first place. Stefan
Re: [PATCH 0/4] tests: make more use of 'test_must_be_empty'
Note that you can also trivially convert a lot more instances of empty checking to the "right" way, since the 'test_line_count = 0' pattern is still all over the tree:

    $ grep -Ir 'test_line_count = 0' t | wc -l
    76

I think it would be nice to clean these up as well.
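For what it's worth, the mechanical part of such a conversion can be sketched with a one-line sed script; this is only an illustration (each hit deserves eyeballing, since the file argument may not always directly follow the pattern):

```shell
# Toy illustration of the conversion.  The sed expression assumes the
# file argument directly follows "= 0", as in the common
#     test_line_count = 0 <file>
# form.
printf 'git ls-files >actual &&\ntest_line_count = 0 actual\n' >snippet
sed 's/test_line_count = 0 \([^ ]*\)/test_must_be_empty \1/' snippet
```

which prints the snippet with 'test_line_count = 0 actual' rewritten to 'test_must_be_empty actual'.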
Re: [PATCH 0/4] t5552: fix flakiness by introducing proper locking for GIT_TRACE
Hi Peff, On Fri, 10 Aug 2018, Jeff King wrote: > On Fri, Aug 10, 2018 at 06:43:07PM +0200, Johannes Schindelin wrote: > > > So unless you are willing to ignore, to willfully keep this breakage, > > I would suggest not to introduce the ugliness of an overridden > > upload-pack for the sole purpose of disabling the tracing on one side, > > but instead to get this here bug fixed, by helping me with this here > > patch series. > > I'm OK if you want to live with the broken test in the interim. I realize that I failed to tell you that I basically spent 2.5 days worth of worktime to figure this out and come up with three iterations of the patch series (you only saw the latest). I want this patch series in git.git, maybe not in the current form, but in one form or another. I don't want to spend any more minute on trying to figure out the same problem with any other regression test (which might not even be as easily worked around as with a semi-simple --upload-pack option). Thank you for wanting to help. Please accept my apologies for expressing in a poor way that I appreciate your eagerness to provide a patch, but that I think nevertheless that it would be better to work on the GIT_TRACE concurrency problem and its resolution via file locks. It would solve that class of problems, instead of that single regression test being flakey. Sorry, Dscho
Re: [PATCH 0/4] t5552: fix flakiness by introducing proper locking for GIT_TRACE
On Fri, Aug 10, 2018 at 11:34:59AM -0700, Junio C Hamano wrote: > Johannes Sixt writes: > > > As this buglet looks like a recurring theme, and a proper fix is > > preferable over repeated work-arounds. To me it looks like we need > > some sort of locking on Windows. Unless your friends at Microsoft have > > an ace in their sleeves that lets us have atomic O_APPEND the POSIX > > way... > > Just to put the severity of the issue in context, we use O_APPEND in > a few codepaths, and the trace thing for debugging is the only thing > that could have multiple writers. Other users of O_APPEND are: > > * refs/files-backend.c uses it so that a reflog entry can be >appended at the end, but because update to each ref is protected >from racing at a lot higher layer with a lock, no two people >would try to append to the same reflog file, so atomicity of >O_APPEND does not matter here. Just an interesting side note, but I think one of the reasons we use dot-locks and not flock() is that the former generally works on network filesystems. I think O_APPEND also is not reliable for non-local files, though, so short of actually dot-locking trace files (yuck) I don't think there's a great solution there. > It may make sense to allow GIT_TRACE to have a placeholder > (e.g. "/tmp/traceout.$$") to help debuggers arrange to give > different processes their own trace output file, which perhaps may > be a simpler and less impactful to the performance solution than > having to make locks at an application layer. Heh, I actually wrote that patch several years ago, as part of an abandoned effort to have config-triggered tracing (e.g., on the server side where you're getting hundreds of requests and you want to enable tracing on a repo for a slice of time). One annoying thing for a case like the test in question is that you don't actually know the pid of the process you're interested in. So we'd generate two trace files, and then you'd have to figure out which one is which. 
In my earlier work, I coupled it with some magic to allow per-command config, like:

    [trace "fetch"]
            packet = /tmp/trace.%p

but you could also imagine a "%n" which uses the command name. I don't remember why I abandoned it, but I think a lot had to do with violating the "don't look at config" rules for our repo startup code. A lot of that has been fixed since then, but I haven't really needed it. If anybody is really interested, the patches are at: https://github.com/peff/git jk/trace-stdin (my ultimate goal was to record stdin for pack-objects to replay slow fetches, but I ended up hacking together a patch to pack-objects instead). -Peff
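To illustrate the placeholder idea from the message above, a toy expansion might look like this (the expand_trace_path helper name is made up for this sketch; git's real config code would do the equivalent in C):

```shell
# Hypothetical helper: expand a "%p" placeholder in a configured trace
# path to a pid, so that each process gets its own output file.
expand_trace_path () {
    printf '%s\n' "$1" | sed "s/%p/$2/"
}
expand_trace_path '/tmp/trace.%p' 4242    # -> /tmp/trace.4242
```

The "which trace file belongs to which process" problem mentioned above remains, of course: you only learn the pid after the fact.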
Re: [PATCH 0/4] t5552: fix flakiness by introducing proper locking for GIT_TRACE
Jeff King writes: > I have tried to help with the actual problem, by identifying the root > cause (that the trace code's strategy is not fundamentally broken, but > that it relies on O_APPEND whose guarantees are obviously not portable > to Windows) and exploring possible solutions there (FILE_APPEND_DATA). > > Lacking any kind of Windows development environment, I'd just as soon > stop there and let more knowledgeable people take over. But please don't > characterize my presence in this thread as some kind of willful harm or > ignoring of the problem. Thanks. It seems like the other Johannes and the other Jeff are also interested, so we may hopefully see a properly fixed O_APPEND, which would be the real solution that would help appenders that are *not* the tracing subsystem, too.
Re: [PATCH 0/4] t5552: fix flakiness by introducing proper locking for GIT_TRACE
Johannes Sixt writes: > As this buglet looks like a recurring theme, and a proper fix is > preferable over repeated work-arounds. To me it looks like we need > some sort of locking on Windows. Unless your friends at Microsoft have > an ace in their sleeves that lets us have atomic O_APPEND the POSIX > way... Just to put the severity of the issue in context, we use O_APPEND in a few codepaths, and the trace thing for debugging is the only thing that could have multiple writers. Other users of O_APPEND are:

* refs/files-backend.c uses it so that a reflog entry can be appended at the end, but because update to each ref is protected from racing at a lot higher layer with a lock, no two people would try to append to the same reflog file, so atomicity of O_APPEND does not matter here.

* sequencer.c wants to use it when moving one insn from the todo list to the 'done' list when it finishes one operation. If you are running two sequences in a single repository, intermixed 'done' list would be the least of your problem, so presumably we are fine here.

It may make sense to allow GIT_TRACE to have a placeholder (e.g. "/tmp/traceout.$$") to help debuggers arrange to give different processes their own trace output file, which may perhaps be a simpler solution, with less performance impact, than adding locks at an application layer.
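The O_APPEND behavior being relied on here is easy to demonstrate from the shell on a POSIX system, since ">>" opens the file with O_APPEND and a short echo is typically a single write(); with two concurrent appenders, no data is ever lost, even though the interleaving order is undefined:

```shell
# Two concurrent writers appending to one trace file.  ">>" opens the
# file with O_APPEND, so every write lands at the then-current end of
# file and nothing is overwritten, whatever the interleaving.
rm -f trace.out
for writer in A B; do
    (
        i=0
        while [ "$i" -lt 100 ]; do
            echo "$writer line $i" >>trace.out
            i=$((i + 1))
        done
    ) &
done
wait
wc -l <trace.out    # all 200 newlines survive
```

This is exactly the property the Windows side of the thread reports losing, which is why a FILE_APPEND_DATA or locking fix is being discussed.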
Re: [PATCH 0/4] t5552: fix flakiness by introducing proper locking for GIT_TRACE
On Fri, Aug 10, 2018 at 06:43:07PM +0200, Johannes Schindelin wrote: > > > Correct. But I am more worried about the "mixed/overwriting" > > > breakage, if there is one; it means we may need to be prepared for > > > systems that lack O_APPEND that works correctly. I initially just > > > assumed that it was what Dscho was seeing, but after re-reading his > > > message, I am not sure anymore. > > > > > > I think the "do not trace the other side" approach you suggest for > > > these tests that only care about one side is more appropriate > > > solution for this particular case. We then do not have to worry > > > about overwriting or output from both sides mixed randomly. > > Just so everybody knows: I never received the above two mails. Something > is seriously rotten in GMX. This seems to have started a week or two ago. I think it may just have been that they went to the gitgitgadget address; I explicitly added you as a cc when I sent the patch with a commit message, which was when I first noticed you weren't cc'd on the earlier bits. > What you cannot see easily, unless you go the route that I offered > Jonathan (open a PR on gitgitgadget/git which will automatically run the > test suite on Windows, macOS and Linux, and of course you can do anything > in a PR, including narrowing down what tests are run) is that sometimes > those lines in the `trace` file were clearly *incomplete*. That is, even > if both processes tried to write atomically, one managed to overwrite the > other's buffer in flight. > > This is a pretty serious thing, even worse than the failing test suite, > and I don't think that your patch acknowledges how serious this is. I don't see anything in what you wrote above or in your original cover letter or series that contradicts the notion that O_APPEND is simply not atomic on Windows. Overwriting the same byte ranges in the file is what I would expect on Linux, as well, if the file were opened without O_APPEND. 
I have mixed reports from searching online whether an equivalent to O_APPEND exists. Some sources seem to say that FILE_APPEND_DATA is. I couldn't find anything definitive in Microsoft documentation about whether it actually provides atomicity. From my (admittedly limited) digging, that does not seem to be used by the msys2 runtime, so it might be worth exploring. > And please don't give me that "but it works on Linux" response. Seriously. > Sheesh. But it does. And it's designed to. There is literally nothing to fix on Linux. > Even if you have not managed to trigger it, the POSIX standard seems not > to define clearly what the behavior is of two competing write() calls, > unless you lock the files appropriately, as my patch series does. POSIX says: If the O_APPEND flag of the file status flags is set, the file offset shall be set to the end of the file prior to each write and no intervening file modification operation shall occur between changing the file offset and the write operation. and: This volume of POSIX.1-2017 does not specify the behavior of concurrent writes to a regular file from multiple threads, except that each write is atomic. I admit the writing is a little turgid, but I think between the two it is promising the behavior that we expect (and I think that interpretation is pretty common in the Unix world). > So unless you are willing to ignore, to willfully keep this breakage, I > would suggest not to introduce the ugliness of an overridden upload-pack > for the sole purpose of disabling the tracing on one side, but instead to > get this here bug fixed, by helping me with this here patch series. I'm OK if you want to live with the broken test in the interim. But after your complaining about the imminence of it "infecting" master, I thought you might be happy to see a solution that was more immediate. 
I have tried to help with the actual problem, by identifying the root cause (that the trace code's strategy is not fundamentally broken, but that it relies on O_APPEND whose guarantees are obviously not portable to Windows) and exploring possible solutions there (FILE_APPEND_DATA). Lacking any kind of Windows development environment, I'd just as soon stop there and let more knowledgeable people take over. But please don't characterize my presence in this thread as some kind of willful harm or ignoring of the problem. -Peff
Re: [PATCH 0/4] t5552: fix flakiness by introducing proper locking for GIT_TRACE
Am 10.08.2018 um 18:51 schrieb Jeff Hostetler: On 8/10/2018 12:15 PM, Johannes Sixt wrote: Am 09.08.2018 um 19:35 schrieb Johannes Schindelin via GitGitGadget: I reported a couple of times that t5552 is not passing reliably. It has now reached next, and will no doubt infect master soon. Turns out that it is not a Windows-specific issue, even if it occurs a lot more often on Windows than elsewhere. The culprit is that two processes try simultaneously to write to the same file specified via GIT_TRACE_PACKET, and it is not well defined how that should work, even on Linux. Thanks for digging down to the root cause. As has been said, the behavior of O_APPEND is well-defined under POSIX, but last time I looked for equivalent feature on Windows, I did not find any. Last time was when I worked around the same failure in t5503-tagfollow.sh in my private builds: https://github.com/j6t/git/commit/9a447a6844b50b43746d9765b3ac809e2793d742 It is basically the same as Peff suggests: log only one side of the fetch. As this buglet looks like a recurring theme, and a proper fix is preferable over repeated work-arounds. To me it looks like we need some sort of locking on Windows. Unless your friends at Microsoft have an ace in their sleeves that lets us have atomic O_APPEND the POSIX way... The official Windows CRT for open() does document an O_APPEND, but after looking at the distributed source, I'm not sure I trust it. I looked at the CRT code, too, and no, we can't trust it. I have not looked at the MSYS version of the CRT yet so I cannot comment on it. I did find that the following code does what we want. Notice the use of FILE_APPEND_DATA in place of the usual GENERIC_READ. (And yes, the docs are very vague here.) As far as I understand, FILE_APPEND_DATA is just a permission flag (I'm not sure why that would be necessary), but at least the documentation states that it is still necessary for the caller to set the file pointer. But I haven't tried your code below, yet. 
Should it append the line of text on each invocation? And if so, is it an atomic operation? Before we try a locking solution, perhaps we should tweak mingw_open() to use something like this to get a proper HANDLE directly and then use _open_osfhandle() to get a "fd" for it.

Jeff

--

#include <windows.h>
#include <string.h>

int main()
{
	HANDLE h = CreateFileW(L"foo.txt", FILE_APPEND_DATA,
			       FILE_SHARE_READ | FILE_SHARE_WRITE | FILE_SHARE_DELETE,
			       NULL, OPEN_ALWAYS, FILE_ATTRIBUTE_NORMAL, NULL);
	const char *buf = "this is a test\n";
	DWORD len = strlen(buf);
	DWORD nbw;

	WriteFile(h, buf, len, &nbw, NULL);
	CloseHandle(h);
	return 0;
}
Re: [PATCH 0/4] t5552: fix flakiness by introducing proper locking for GIT_TRACE
On 8/10/2018 12:51 PM, Jeff Hostetler wrote: On 8/10/2018 12:15 PM, Johannes Sixt wrote: Am 09.08.2018 um 19:35 schrieb Johannes Schindelin via GitGitGadget: I reported a couple of times that t5552 is not passing reliably. It has now reached next, and will no doubt infect master soon. Turns out that it is not a Windows-specific issue, even if it occurs a lot more often on Windows than elsewhere. The culprit is that two processes try simultaneously to write to the same file specified via GIT_TRACE_PACKET, and it is not well defined how that should work, even on Linux. Thanks for digging down to the root cause. As has been said, the behavior of O_APPEND is well-defined under POSIX, but last time I looked for equivalent feature on Windows, I did not find any. Last time was when I worked around the same failure in t5503-tagfollow.sh in my private builds: https://github.com/j6t/git/commit/9a447a6844b50b43746d9765b3ac809e2793d742 It is basically the same as Peff suggests: log only one side of the fetch. As this buglet looks like a recurring theme, and a proper fix is preferable over repeated work-arounds. To me it looks like we need some sort of locking on Windows. Unless your friends at Microsoft have an ace in their sleeves that lets us have atomic O_APPEND the POSIX way... The official Windows CRT for open() does document an O_APPEND, but after looking at the distributed source, I'm not sure I trust it. I have not looked at the MSYS version of the CRT yet so I cannot comment on it. I did find that the following code does what we want. Notice the use of FILE_APPEND_DATA in place of the usual GENERIC_READ. (And yes, the docs are very vague here.) d'oh s/GENERIC_READ/GENERIC_WRITE/ Before we try a locking solution, perhaps we should tweak mingw_open() to use something like this to get a proper HANDLE directly and then use _open_osfhandle() to get a "fd" for it. 
Jeff

--

#include <windows.h>
#include <string.h>

int main()
{
	HANDLE h = CreateFileW(L"foo.txt", FILE_APPEND_DATA,
			       FILE_SHARE_READ | FILE_SHARE_WRITE | FILE_SHARE_DELETE,
			       NULL, OPEN_ALWAYS, FILE_ATTRIBUTE_NORMAL, NULL);
	const char *buf = "this is a test\n";
	DWORD len = strlen(buf);
	DWORD nbw;

	WriteFile(h, buf, len, &nbw, NULL);
	CloseHandle(h);
	return 0;
}
Re: [PATCH 0/4] t5552: fix flakiness by introducing proper locking for GIT_TRACE
On 8/10/2018 12:15 PM, Johannes Sixt wrote: Am 09.08.2018 um 19:35 schrieb Johannes Schindelin via GitGitGadget: I reported a couple of times that t5552 is not passing reliably. It has now reached next, and will no doubt infect master soon. Turns out that it is not a Windows-specific issue, even if it occurs a lot more often on Windows than elsewhere. The culprit is that two processes try simultaneously to write to the same file specified via GIT_TRACE_PACKET, and it is not well defined how that should work, even on Linux. Thanks for digging down to the root cause. As has been said, the behavior of O_APPEND is well-defined under POSIX, but last time I looked for equivalent feature on Windows, I did not find any. Last time was when I worked around the same failure in t5503-tagfollow.sh in my private builds: https://github.com/j6t/git/commit/9a447a6844b50b43746d9765b3ac809e2793d742 It is basically the same as Peff suggests: log only one side of the fetch. As this buglet looks like a recurring theme, and a proper fix is preferable over repeated work-arounds. To me it looks like we need some sort of locking on Windows. Unless your friends at Microsoft have an ace in their sleeves that lets us have atomic O_APPEND the POSIX way... The official Windows CRT for open() does document an O_APPEND, but after looking at the distributed source, I'm not sure I trust it. I have not looked at the MSYS version of the CRT yet so I cannot comment on it. I did find that the following code does what we want. Notice the use of FILE_APPEND_DATA in place of the usual GENERIC_READ. (And yes, the docs are very vague here.) Before we try a locking solution, perhaps we should tweak mingw_open() to use something like this to get a proper HANDLE directly and then use _open_osfhandle() to get a "fd" for it. 
Jeff

--

#include <windows.h>
#include <string.h>

int main()
{
	HANDLE h = CreateFileW(L"foo.txt", FILE_APPEND_DATA,
			       FILE_SHARE_READ | FILE_SHARE_WRITE | FILE_SHARE_DELETE,
			       NULL, OPEN_ALWAYS, FILE_ATTRIBUTE_NORMAL, NULL);
	const char *buf = "this is a test\n";
	DWORD len = strlen(buf);
	DWORD nbw;

	WriteFile(h, buf, len, &nbw, NULL);
	CloseHandle(h);
	return 0;
}
Re: [PATCH 0/4] t5552: fix flakiness by introducing proper locking for GIT_TRACE
Hi, On Fri, 10 Aug 2018, Jeff King wrote: > On Thu, Aug 09, 2018 at 01:49:52PM -0700, Junio C Hamano wrote: > > > Jeff King writes: > > > > > Are you sure that it's not well-defined? We open the path with O_APPEND, > > > which means every write() will be atomically positioned at the end of > > > file. So we would never lose or overwrite data. > > > > > > We do our own buffering in a strbuf, writing the result out in a single > > > write() call (modulo the OS returning a short write, but that should not > > > generally happen when writing short strings to a file). So we should get > > > individual trace lines as atomic units. > > > > > > The order of lines from the two processes is undefined, of course. > > > > Correct. But I am more worried about the "mixed/overwriting" > > breakage, if there is one; it means we may need to be prepared for > > systems that lack O_APPEND that works correctly. I initially just > > assumed that it was what Dscho was seeing, but after re-reading his > > message, I am not sure anymore. > > > > I think the "do not trace the other side" approach you suggest for > > these tests that only care about one side is more appropriate > > solution for this particular case. We then do not have to worry > > about overwriting or output from both sides mixed randomly. Just so everybody knows: I never received the above two mails. Something is seriously rotten in GMX. This seems to have started a week or two ago. > Here it is as a patch on top of jt/fetch-negotiator-skipping, which lets > us pursue any fix for interleaved trace output on Windows without the > pressure of an impending flaky test. > > My gut says that looking into making O_APPEND work there is going to be > the nicest solution, but my gut is not very well versed in Windows > subtleties. ;) Your patch seems to be a patch in the original sense of the word: it is not a solution. 
And let me spell it out very clearly (as I have mentioned to Jonathan a couple times, without any response): t5552 fails regularly, way more than rarely, see for yourself: https://git-for-windows.visualstudio.com/git/_build/index?definitionId=1&_a=history And the reason it fails was analyzed by me and described in the commit message (I am sorry that the cover letter was still stale and talked about flock(), which I had decided was not the right approach after fighting with Linux over it). What you cannot see easily, unless you go the route that I offered Jonathan (open a PR on gitgitgadget/git which will automatically run the test suite on Windows, macOS and Linux, and of course you can do anything in a PR, including narrowing down what tests are run) is that sometimes those lines in the `trace` file were clearly *incomplete*. That is, even if both processes tried to write atomically, one managed to overwrite the other's buffer in flight. This is a pretty serious thing, even worse than the failing test suite, and I don't think that your patch acknowledges how serious this is. And please don't give me that "but it works on Linux" response. Seriously. Sheesh. Even if you have not managed to trigger it, the POSIX standard seems not to define clearly what the behavior is of two competing write() calls, unless you lock the files appropriately, as my patch series does. So unless you are willing to ignore, to willfully keep this breakage, I would suggest not to introduce the ugliness of an overridden upload-pack for the sole purpose of disabling the tracing on one side, but instead to get this here bug fixed, by helping me with this here patch series. We don't have to rush this, you know? We have to fix it, though. Thank you, Dscho > -- >8 -- > Subject: [PATCH] t5552: suppress upload-pack trace output > > The t5552 test script uses GIT_TRACE_PACKET to monitor what > git-fetch sends and receives. 
However, because we're > accessing a local repository, the child upload-pack also > sends trace output to the same file. > > On Linux, this works out OK. We open the trace file with > O_APPEND, so all writes are atomically positioned at the end > of the file. No data can be overwritten or omitted. And > since we prepare our small writes in a strbuf and write them > with a single write(), we should see each line as an atomic > unit. The order of lines between the two processes is > undefined, but the test script greps only for "fetch>" or > "fetch<" lines. So under Linux, the test results are > deterministic. > > The test fails intermittently on Windows, however, > reportedly even overwriting bits of the output file (i.e., > O_APPEND does not seem to give us an atomic position+write). > > Since the test only cares about the trace output from fetch, > we can just disable the output from upload-pack. That > doesn't solve the greater question of O_APPEND/trace issues > under Windows, but it easily fixes the flakiness from this > test. > > Reported-by: Johannes Schindelin > Signed-off-by: Jeff King > --- > I'm assuming that this really isn't
Re: [PATCH 0/4] t5552: fix flakiness by introducing proper locking for GIT_TRACE
Am 09.08.2018 um 19:35 schrieb Johannes Schindelin via GitGitGadget: I reported a couple of times that t5552 is not passing reliably. It has now reached next, and will no doubt infect master soon. Turns out that it is not a Windows-specific issue, even if it occurs a lot more often on Windows than elsewhere. The culprit is that two processes try simultaneously to write to the same file specified via GIT_TRACE_PACKET, and it is not well defined how that should work, even on Linux. Thanks for digging down to the root cause. As has been said, the behavior of O_APPEND is well-defined under POSIX, but last time I looked for an equivalent feature on Windows, I did not find any. Last time was when I worked around the same failure in t5503-tagfollow.sh in my private builds: https://github.com/j6t/git/commit/9a447a6844b50b43746d9765b3ac809e2793d742 It is basically the same as Peff suggests: log only one side of the fetch. As this buglet looks like a recurring theme, a proper fix is preferable over repeated work-arounds. To me it looks like we need some sort of locking on Windows. Unless your friends at Microsoft have an ace in their sleeves that lets us have atomic O_APPEND the POSIX way... -- Hannes
Re: [PATCH 0/4] t5552: fix flakiness by introducing proper locking for GIT_TRACE
Jeff King writes: > Here it is as a patch on top of jt/fetch-negotiator-skipping, which lets > us pursue any fix for interleaved trace output on Windows without the > pressure of an impending flaky test. > > My gut says that looking into making O_APPEND work there is going to be > the nicest solution, but my gut is not very well versed in Windows > subtleties. ;) As we know these tests are only interested in "fetch" lines and not record of communication in the other direction, I think this is a nice clean-up that is worth doing. For only two grep's it may not look worth doing, but it would also be a good clean-up to add a got_acked helper that sits next to the have_sent and have_not_sent helpers for readability. That is of course outside the scope of this change. Will queue. Thanks. > -- >8 -- > Subject: [PATCH] t5552: suppress upload-pack trace output > > The t5552 test script uses GIT_TRACE_PACKET to monitor what > git-fetch sends and receives. However, because we're > accessing a local repository, the child upload-pack also > sends trace output to the same file. > > On Linux, this works out OK. We open the trace file with > O_APPEND, so all writes are atomically positioned at the end > of the file. No data can be overwritten or omitted. And > since we prepare our small writes in a strbuf and write them > with a single write(), we should see each line as an atomic > unit. The order of lines between the two processes is > undefined, but the test script greps only for "fetch>" or > "fetch<" lines. So under Linux, the test results are > deterministic. > > The test fails intermittently on Windows, however, > reportedly even overwriting bits of the output file (i.e., > O_APPEND does not seem to give us an atomic position+write). > > Since the test only cares about the trace output from fetch, > we can just disable the output from upload-pack. 
That > doesn't solve the greater question of O_APPEND/trace issues > under Windows, but it easily fixes the flakiness from this > test. > > Reported-by: Johannes Schindelin > Signed-off-by: Jeff King > --- > I'm assuming that this really isn't triggerable on Linux. I tried and > couldn't manage to get it to fail, and the reasoning above explains why. > But I wasn't 100% clear that Dscho hadn't seen it fail on non-Windows. > > t/t5552-skipping-fetch-negotiator.sh | 23 ++- > 1 file changed, 18 insertions(+), 5 deletions(-) > > diff --git a/t/t5552-skipping-fetch-negotiator.sh > b/t/t5552-skipping-fetch-negotiator.sh > index 0a8e0e42ed..0ad50dd839 100755 > --- a/t/t5552-skipping-fetch-negotiator.sh > +++ b/t/t5552-skipping-fetch-negotiator.sh > @@ -28,6 +28,19 @@ have_not_sent () { > done > } > > +# trace_fetch [args] > +# > +# Trace the packet output of fetch, but make sure we disable the variable > +# in the child upload-pack, so we don't combine the results in the same file. > +trace_fetch () { > + client=$1; shift > + server=$1; shift > + GIT_TRACE_PACKET="$(pwd)/trace" \ > + git -C "$client" fetch \ > + --upload-pack 'unset GIT_TRACE_PACKET; git-upload-pack' \ > + "$server" "$@" > +} > + > test_expect_success 'commits with no parents are sent regardless of skip > distance' ' > git init server && > test_commit -C server to_fetch && > @@ -42,7 +55,7 @@ test_expect_success 'commits with no parents are sent > regardless of skip distanc > # "c1" has no parent, it is still sent as "have" even though it would > # normally be skipped. > test_config -C client fetch.negotiationalgorithm skipping && > - GIT_TRACE_PACKET="$(pwd)/trace" git -C client fetch "$(pwd)/server" && > + trace_fetch client "$(pwd)/server" && > have_sent c7 c5 c2 c1 && > have_not_sent c6 c4 c3 > ' > @@ -65,7 +78,7 @@ test_expect_success 'when two skips collide, favor the > larger one' ' > # the next "have" sent will be "c1" (from "c6" skip 4) and not "c4" > # (from "c5side" skip 1). 
> test_config -C client fetch.negotiationalgorithm skipping && > - GIT_TRACE_PACKET="$(pwd)/trace" git -C client fetch "$(pwd)/server" && > + trace_fetch client "$(pwd)/server" && > have_sent c5side c11 c9 c6 c1 && > have_not_sent c10 c8 c7 c5 c4 c3 c2 > ' > @@ -91,7 +104,7 @@ test_expect_success 'use ref advertisement to filter out > commits' ' > # not need to send any ancestors of "c3", but we still need to send "c3" > # itself. > test_config -C client fetch.negotiationalgorithm skipping && > - GIT_TRACE_PACKET="$(pwd)/trace" git -C client fetch origin to_fetch && > + trace_fetch client origin to_fetch && > have_sent c5 c4^ c2side && > have_not_sent c4 c4^^ c4^^^ > ' > @@ -121,7 +134,7 @@ test_expect_success 'handle clock skew' ' > # and sent, because (due to clock skew) its only parent has already been > # popped off the priority queue. > test_config -C client fetch.negotiationalgorithm skipping && > -
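The got_acked helper Junio suggests is not shown anywhere in the thread; a minimal sketch in the style of the existing have_sent/have_not_sent helpers might look like the following. The "fetch< ACK" pattern and the hard-coded trace filename are assumptions about the packet-trace format, not code from the series:

```shell
# Hypothetical got_acked helper: succeed only if every given object id was
# ACKed by the server, judging from the client-side packet trace. The trace
# filename and the "fetch< ACK" line format are assumed for illustration.
got_acked () {
	for oid
	do
		grep "fetch< ACK $oid" trace >/dev/null || return 1
	done
}
```

A test could then call it as, say, `got_acked $(git -C client rev-parse c5 c2side)`.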
Re: [PATCH 0/4] t5552: fix flakiness by introducing proper locking for GIT_TRACE
On Thu, Aug 09, 2018 at 01:49:52PM -0700, Junio C Hamano wrote: > Jeff King writes: > > > Are you sure that it's not well-defined? We open the path with O_APPEND, > > which means every write() will be atomically positioned at the end of > > file. So we would never lose or overwrite data. > > > > We do our own buffering in a strbuf, writing the result out in a single > > write() call (modulo the OS returning a short write, but that should not > > generally happen when writing short strings to a file). So we should get > > individual trace lines as atomic units. > > > > The order of lines from the two processes is undefined, of course. > > Correct. But I am more worried about the "mixed/overwriting" > breakage, if there is one; it means we may need to be prepared for > systems that lack O_APPEND that works correctly. I initially just > assumed that it was what Dscho was seeing, but after re-reading his > message, I am not sure anymore. > > I think the "do not trace the other side" approach you suggest for > these tests that only care about one side is more appropriate > solution for this particular case. We then do not have to worry > about overwriting or output from both sides mixed randomly. Here it is as a patch on top of jt/fetch-negotiator-skipping, which lets us pursue any fix for interleaved trace output on Windows without the pressure of an impending flaky test. My gut says that looking into making O_APPEND work there is going to be the nicest solution, but my gut is not very well versed in Windows subtleties. ;) -- >8 -- Subject: [PATCH] t5552: suppress upload-pack trace output The t5552 test script uses GIT_TRACE_PACKET to monitor what git-fetch sends and receives. However, because we're accessing a local repository, the child upload-pack also sends trace output to the same file. On Linux, this works out OK. We open the trace file with O_APPEND, so all writes are atomically positioned at the end of the file. No data can be overwritten or omitted. 
And since we prepare our small writes in a strbuf and write them with a single write(), we should see each line as an atomic unit. The order of lines between the two processes is undefined, but the test script greps only for "fetch>" or "fetch<" lines. So under Linux, the test results are deterministic. The test fails intermittently on Windows, however, reportedly even overwriting bits of the output file (i.e., O_APPEND does not seem to give us an atomic position+write). Since the test only cares about the trace output from fetch, we can just disable the output from upload-pack. That doesn't solve the greater question of O_APPEND/trace issues under Windows, but it easily fixes the flakiness from this test. Reported-by: Johannes Schindelin Signed-off-by: Jeff King --- I'm assuming that this really isn't triggerable on Linux. I tried and couldn't manage to get it to fail, and the reasoning above explains why. But I wasn't 100% clear that Dscho hadn't seen it fail on non-Windows. t/t5552-skipping-fetch-negotiator.sh | 23 ++- 1 file changed, 18 insertions(+), 5 deletions(-) diff --git a/t/t5552-skipping-fetch-negotiator.sh b/t/t5552-skipping-fetch-negotiator.sh index 0a8e0e42ed..0ad50dd839 100755 --- a/t/t5552-skipping-fetch-negotiator.sh +++ b/t/t5552-skipping-fetch-negotiator.sh @@ -28,6 +28,19 @@ have_not_sent () { done } +# trace_fetch [args] +# +# Trace the packet output of fetch, but make sure we disable the variable +# in the child upload-pack, so we don't combine the results in the same file. 
+trace_fetch () { + client=$1; shift + server=$1; shift + GIT_TRACE_PACKET="$(pwd)/trace" \ + git -C "$client" fetch \ + --upload-pack 'unset GIT_TRACE_PACKET; git-upload-pack' \ + "$server" "$@" +} + test_expect_success 'commits with no parents are sent regardless of skip distance' ' git init server && test_commit -C server to_fetch && @@ -42,7 +55,7 @@ test_expect_success 'commits with no parents are sent regardless of skip distanc # "c1" has no parent, it is still sent as "have" even though it would # normally be skipped. test_config -C client fetch.negotiationalgorithm skipping && - GIT_TRACE_PACKET="$(pwd)/trace" git -C client fetch "$(pwd)/server" && + trace_fetch client "$(pwd)/server" && have_sent c7 c5 c2 c1 && have_not_sent c6 c4 c3 ' @@ -65,7 +78,7 @@ test_expect_success 'when two skips collide, favor the larger one' ' # the next "have" sent will be "c1" (from "c6" skip 4) and not "c4" # (from "c5side" skip 1). test_config -C client fetch.negotiationalgorithm skipping && - GIT_TRACE_PACKET="$(pwd)/trace" git -C client fetch "$(pwd)/server" && + trace_fetch client "$(pwd)/server" && have_sent c5side c11 c9 c6 c1 && have_not_sent c10 c8 c7 c5 c4 c3 c2 ' @@ -91,7 +104,7 @@ test_expect_success 'use ref advertisement to filter out
Re: [PATCH 0/4] t5552: fix flakiness by introducing proper locking for GIT_TRACE
On Thu, Aug 09, 2018 at 02:08:56PM -0700, Junio C Hamano wrote: > > Correct. But I am more worried about the "mixed/overwriting" > > breakage, if there is one; it means we may need to be prepared for > > systems that lack O_APPEND that works correctly. I initially just > > assumed that it was what Dscho was seeing, but after re-reading his > > message, I am not sure anymore. > > > > I think the "do not trace the other side" approach you suggest for > > these tests that only care about one side is more appropriate > > solution for this particular case. We then do not have to worry > > about overwriting or output from both sides mixed randomly. > > A concluding sentence I forgot to add, after saying "this is simpler > and better to fix test breakage", was > > But if we really are seeing O_APPEND breakage, a mandatory > locking mechanism like this one might be necessary to work > around it (I seriously hope we do not have to, though). > > Sorry for an additional noise. Yeah, my earlier email said "if we just care about this test, then...". But if we do care about making this multiple-tracing case work all the time, then we'd need some solution. If there's something that can work like O_APPEND, even if we have to make some system-specific call, I'd prefer that to locking. I did some searching for atomic append on Windows and got mixed reports on whether their FILE_DATA_APPEND does what we want or not. Knowing nothing about msys, I'd _assume_ that O_APPEND would get translated to that flag, but I got lost trying to find it in git-for-windows/msys2-runtime. Wouldn't it be wonderful if the solution were as simple as making sure that flag was plumbed through? I seriously doubt it will be that easy, though. :) -Peff
Re: [PATCH 0/4] t5552: fix flakiness by introducing proper locking for GIT_TRACE
Junio C Hamano writes: > Jeff King writes: > >> Are you sure that it's not well-defined? We open the path with O_APPEND, >> which means every write() will be atomically positioned at the end of >> file. So we would never lose or overwrite data. >> >> We do our own buffering in a strbuf, writing the result out in a single >> write() call (modulo the OS returning a short write, but that should not >> generally happen when writing short strings to a file). So we should get >> individual trace lines as atomic units. >> >> The order of lines from the two processes is undefined, of course. > > Correct. But I am more worried about the "mixed/overwriting" > breakage, if there is one; it means we may need to be prepared for > systems that lack O_APPEND that works correctly. I initially just > assumed that it was what Dscho was seeing, but after re-reading his > message, I am not sure anymore. > > I think the "do not trace the other side" approach you suggest for > these tests that only care about one side is more appropriate > solution for this particular case. We then do not have to worry > about overwriting or output from both sides mixed randomly. A concluding sentence I forgot to add, after saying "this is simpler and better to fix test breakage", was But if we really are seeing O_APPEND breakage, a mandatory locking mechanism like this one might be necessary to work around it (I seriously hope we do not have to, though). Sorry for an additional noise.
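The mandatory-locking fallback Junio mentions (the posted series locks the trace file descriptor inside git, via flock()/LockFileEx()) can be illustrated at the shell level. This is only a sketch of the idea, assuming the util-linux flock(1) utility is available; it is not the series' implementation:

```shell
# Serialize appends from two processes by taking an exclusive lock on a
# sidecar lock file before each write. This mimics, at the shell level,
# the locking idea discussed for the trace fd inside git.
trace=trace.out
: >"$trace"
locked_append () {
	(
		flock 9 || exit 1          # block until we hold the exclusive lock
		printf '%s\n' "$1" >>"$trace"
	) 9>"$trace.lock"
}
locked_append 'fetch> have 1111' &
locked_append 'upload-pack> ACK 1111' &
wait
```

With the lock held around each write, neither process can interleave with (or clobber) the other's line, whatever the platform's O_APPEND behavior.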
Re: [PATCH 0/4] t5552: fix flakiness by introducing proper locking for GIT_TRACE
Jeff King writes: > Are you sure that it's not well-defined? We open the path with O_APPEND, > which means every write() will be atomically positioned at the end of > file. So we would never lose or overwrite data. > > We do our own buffering in a strbuf, writing the result out in a single > write() call (modulo the OS returning a short write, but that should not > generally happen when writing short strings to a file). So we should get > individual trace lines as atomic units. > > The order of lines from the two processes is undefined, of course. Correct. But I am more worried about the "mixed/overwriting" breakage, if there is one; it means we may need to be prepared for systems that lack O_APPEND that works correctly. I initially just assumed that it was what Dscho was seeing, but after re-reading his message, I am not sure anymore. I think the "do not trace the other side" approach you suggest for these tests that only care about one side is more appropriate solution for this particular case. We then do not have to worry about overwriting or output from both sides mixed randomly. > diff --git a/t/t5552-skipping-fetch-negotiator.sh > b/t/t5552-skipping-fetch-negotiator.sh > index 3b60bd44e0..5ad5bece55 100755 > --- a/t/t5552-skipping-fetch-negotiator.sh > +++ b/t/t5552-skipping-fetch-negotiator.sh > @@ -28,6 +28,19 @@ have_not_sent () { > done > } > > +# trace_fetch [args] > +# > +# Trace the packet output of fetch, but make sure we disable the variable > +# in the child upload-pack, so we don't combine the results in the same file. 
> +trace_fetch () { > + client=$1; shift > + server=$1; shift > + GIT_TRACE_PACKET="$(pwd)/trace" \ > + git -C "$client" fetch \ > + --upload-pack 'unset GIT_TRACE_PACKET; git-upload-pack' \ > + "$server" "$@" > +} > + > test_expect_success 'commits with no parents are sent regardless of skip > distance' ' > git init server && > test_commit -C server to_fetch && > @@ -42,7 +55,7 @@ test_expect_success 'commits with no parents are sent > regardless of skip distanc > # "c1" has no parent, it is still sent as "have" even though it would > # normally be skipped. > test_config -C client fetch.negotiationalgorithm skipping && > - GIT_TRACE_PACKET="$(pwd)/trace" git -C client fetch "$(pwd)/server" && > + trace_fetch client "$(pwd)/server" && > have_sent c7 c5 c2 c1 && > have_not_sent c6 c4 c3 > ' > @@ -88,7 +101,7 @@ test_expect_success 'when two skips collide, favor the > larger one' ' > # the next "have" sent will be "c1" (from "c6" skip 4) and not "c4" > # (from "c5side" skip 1). > test_config -C client fetch.negotiationalgorithm skipping && > - GIT_TRACE_PACKET="$(pwd)/trace" git -C client fetch "$(pwd)/server" && > + trace_fetch client "$(pwd)/server" && > have_sent c5side c11 c9 c6 c1 && > have_not_sent c10 c8 c7 c5 c4 c3 c2 > ' > @@ -114,7 +127,7 @@ test_expect_success 'use ref advertisement to filter out > commits' ' > # not need to send any ancestors of "c3", but we still need to send "c3" > # itself. > test_config -C client fetch.negotiationalgorithm skipping && > - GIT_TRACE_PACKET="$(pwd)/trace" git -C client fetch origin to_fetch && > + trace_fetch client origin to_fetch && > have_sent c5 c4^ c2side && > have_not_sent c4 c4^^ c4^^^ > ' > @@ -144,7 +157,7 @@ test_expect_success 'handle clock skew' ' > # and sent, because (due to clock skew) its only parent has already been > # popped off the priority queue. 
> test_config -C client fetch.negotiationalgorithm skipping && > - GIT_TRACE_PACKET="$(pwd)/trace" git -C client fetch "$(pwd)/server" && > + trace_fetch client "$(pwd)/server" && > have_sent c2 c1 old4 old2 old1 && > have_not_sent old3 > ' > @@ -176,7 +189,7 @@ test_expect_success 'do not send "have" with ancestors of > commits that server AC > test_commit -C server commit-on-b1 && > > test_config -C client fetch.negotiationalgorithm skipping && > - GIT_TRACE_PACKET="$(pwd)/trace" git -C client fetch "$(pwd)/server" > to_fetch && > + trace_fetch client "$(pwd)/server" to_fetch && > grep " fetch" trace && > > # fetch-pack sends 2 requests each containing 16 "have" lines before
Re: [PATCH 0/4] t5552: fix flakiness by introducing proper locking for GIT_TRACE
On Thu, Aug 09, 2018 at 10:35:24AM -0700, Johannes Schindelin via GitGitGadget wrote: > The culprit is that two processes try simultaneously to write to the same > file specified via GIT_TRACE_PACKET, and it is not well defined how that > should work, even on Linux. Are you sure that it's not well-defined? We open the path with O_APPEND, which means every write() will be atomically positioned at the end of file. So we would never lose or overwrite data. We do our own buffering in a strbuf, writing the result out in a single write() call (modulo the OS returning a short write, but that should not generally happen when writing short strings to a file). So we should get individual trace lines as atomic units. The order of lines from the two processes is undefined, of course. > This patch series addresses that by locking the trace fd. I chose to use > flock() instead of fcntl() because the Win32 API LockFileEx() > [https://docs.microsoft.com/en-us/windows/desktop/api/fileapi/nf-fileapi-lockfileex] > > (which does exactly what I want in this context) has much more similar > semantics to the former than the latter. I must admit I'm not excited to see us adding extra locking and complexity when it is not necessarily needed. Is this a feature we actually care about delivering for normal users of GIT_TRACE, or do we just care about solving the test flakiness here? Because we should be able to do the latter with the patch below. I also wonder if we should be clearing GIT_TRACE as part of clearing repo-env. It would do what we want automatically in this case, though there are definitely cases where I prefer the current behavior (because I really do want to see the upload-pack side of it). Of course I could always use "--upload-pack 'GIT_TRACE_PACKET=... upload-pack'", so this is really just a default. 
diff --git a/t/t5552-skipping-fetch-negotiator.sh b/t/t5552-skipping-fetch-negotiator.sh index 3b60bd44e0..5ad5bece55 100755 --- a/t/t5552-skipping-fetch-negotiator.sh +++ b/t/t5552-skipping-fetch-negotiator.sh @@ -28,6 +28,19 @@ have_not_sent () { done } +# trace_fetch [args] +# +# Trace the packet output of fetch, but make sure we disable the variable +# in the child upload-pack, so we don't combine the results in the same file. +trace_fetch () { + client=$1; shift + server=$1; shift + GIT_TRACE_PACKET="$(pwd)/trace" \ + git -C "$client" fetch \ + --upload-pack 'unset GIT_TRACE_PACKET; git-upload-pack' \ + "$server" "$@" +} + test_expect_success 'commits with no parents are sent regardless of skip distance' ' git init server && test_commit -C server to_fetch && @@ -42,7 +55,7 @@ test_expect_success 'commits with no parents are sent regardless of skip distanc # "c1" has no parent, it is still sent as "have" even though it would # normally be skipped. test_config -C client fetch.negotiationalgorithm skipping && - GIT_TRACE_PACKET="$(pwd)/trace" git -C client fetch "$(pwd)/server" && + trace_fetch client "$(pwd)/server" && have_sent c7 c5 c2 c1 && have_not_sent c6 c4 c3 ' @@ -88,7 +101,7 @@ test_expect_success 'when two skips collide, favor the larger one' ' # the next "have" sent will be "c1" (from "c6" skip 4) and not "c4" # (from "c5side" skip 1). test_config -C client fetch.negotiationalgorithm skipping && - GIT_TRACE_PACKET="$(pwd)/trace" git -C client fetch "$(pwd)/server" && + trace_fetch client "$(pwd)/server" && have_sent c5side c11 c9 c6 c1 && have_not_sent c10 c8 c7 c5 c4 c3 c2 ' @@ -114,7 +127,7 @@ test_expect_success 'use ref advertisement to filter out commits' ' # not need to send any ancestors of "c3", but we still need to send "c3" # itself. 
test_config -C client fetch.negotiationalgorithm skipping && - GIT_TRACE_PACKET="$(pwd)/trace" git -C client fetch origin to_fetch && + trace_fetch client origin to_fetch && have_sent c5 c4^ c2side && have_not_sent c4 c4^^ c4^^^ ' @@ -144,7 +157,7 @@ test_expect_success 'handle clock skew' ' # and sent, because (due to clock skew) its only parent has already been # popped off the priority queue. test_config -C client fetch.negotiationalgorithm skipping && - GIT_TRACE_PACKET="$(pwd)/trace" git -C client fetch "$(pwd)/server" && + trace_fetch client "$(pwd)/server" && have_sent c2 c1 old4 old2 old1 && have_not_sent old3 ' @@ -176,7 +189,7 @@ test_expect_success 'do not send "have" with ancestors of commits that server AC test_commit -C server commit-on-b1 && test_config -C client fetch.negotiationalgorithm skipping && - GIT_TRACE_PACKET="$(pwd)/trace" git -C client fetch "$(pwd)/server" to_fetch && + trace_fetch client "$(pwd)/server" to_fetch && grep " fetch" trace && # fetch-pack sends 2 requests each containing 16 "have"
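Peff's argument — O_APPEND positions every write() atomically at end-of-file, and each trace line goes out in a single write() — can be exercised with a small shell sketch; on a POSIX system, `>>` opens the file with O_APPEND:

```shell
# Two concurrent writers appending to one file, as git-fetch and the local
# upload-pack do with a shared GIT_TRACE_PACKET file. With a working
# O_APPEND no line is lost or overwritten; only the relative order of the
# two writers' lines is undefined.
: >trace.demo
for i in 1 2 3 4 5; do printf 'fetch> have %d\n' "$i" >>trace.demo; done &
for i in 1 2 3 4 5; do printf 'upload-pack< ACK %d\n' "$i" >>trace.demo; done &
wait
```

Afterwards all ten lines are present and intact, which is exactly the property the Windows report suggests is being violated there.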
Re: [PATCH 0/4] line-log: be more careful when adjusting multiple line ranges
On Sat, Aug 4, 2018 at 6:18 PM Johannes Schindelin via GitGitGadget wrote: > Now, I am fairly certain that the changes are correct, but given my track > record with off-by-one bugs (and once even an off-by-two bug), I would > really appreciate some thorough review of this code, in particular the > second one that is the actual bug fix. I am specifically interested in > reviews from people who know line-log.c pretty well and can tell me whether > the src[i].end > target[j].end is correct, or whether it should actually > have been a >= (I tried to wrap my head around this, but I would feel more > comfortable if a domain expert would analyze this, whistling, and looking > Eric's way). Although I fixed a number of bugs in basic range-set manipulation code (5 years ago), I have no experience with or knowledge of the code actually dealing with range-set manipulation in relation to diffs, so I'm not a domain expert by any means, and I'm _still_ trying to wrap my head around that code. That said, I left some comments on the individual patches suggesting a simpler and more correct fix for the crash. (Despite that suggestion for fixing the crash, I still can't say whether the actual computations performed by the existing code are correct, since I still don't understand that aspect of it.)
Re: [PATCH 0/4] Use oid_object_info() instead of read_object_file()
Johannes Schindelin writes: > Hi Olga, > > On Fri, 13 Jul 2018, Оля Тележная wrote: > >> 2018-07-09 11:27 GMT+03:00 Оля Тележная : >> > Hello everyone, >> > This is my new attempt to start using oid_object_info_extended() in >> > ref-filter. You could look at previous one [1] [2] but it is not >> > necessary. >> > >> > The goal (still) is to improve performance by avoiding calling expensive >> > functions when we don't need the information they provide >> > or when we could get it by using a cheaper function. >> > >> > This patch is a middle step. In the end, I want to add new atoms >> > ("objectsize:disk" and "deltabase") and reuse ref-filter logic in >> > cat-file command. >> > >> > I also know about problems with memory leaks in ref-filter: that would >> > be my next task that I will work on. Since I did not generate any new >> > leaks in this patch (just use existing ones), I decided to put this >> > part on a review and fix leaks as a separate task. >> >> UPDATES since v1: >> add init to eaten variable (thanks to Szeder Gabor, Johannes Schindelin) >> improve second commit message (thanks to Junio C Hamano) >> add static keyword (thanks to Ramsay Jones) >> >> > >> > Thank you! >> > >> > [1] https://github.com/git/git/pull/493 > > Could you please populate the description of that PR so that SubmitGit > picks it up as cover letter? Thanks for suggesting that. Yes, an updated version of a series, even if it is a small one with just 4 or 5 patches, becomes much easier to read with a well-written cover letter.
Re: [PATCH 0/4] Use oid_object_info() instead of read_object_file()
Hi Olga, On Fri, 13 Jul 2018, Оля Тележная wrote: > 2018-07-09 11:27 GMT+03:00 Оля Тележная : > > Hello everyone, > > This is my new attempt to start using oid_object_info_extended() in > > ref-filter. You could look at previous one [1] [2] but it is not > > necessary. > > > > The goal (still) is to improve performance by avoiding calling expensive > > functions when we don't need the information they provide > > or when we could get it by using a cheaper function. > > > > This patch is a middle step. In the end, I want to add new atoms > > ("objectsize:disk" and "deltabase") and reuse ref-filter logic in > > cat-file command. > > > > I also know about problems with memory leaks in ref-filter: that would > > be my next task that I will work on. Since I did not generate any new > > leaks in this patch (just use existing ones), I decided to put this > > part on a review and fix leaks as a separate task. > > UPDATES since v1: > add init to eaten variable (thanks to Szeder Gabor, Johannes Schindelin) > improve second commit message (thanks to Junio C Hamano) > add static keyword (thanks to Ramsay Jones) > > > > > Thank you! > > > > [1] https://github.com/git/git/pull/493 Could you please populate the description of that PR so that SubmitGit picks it up as cover letter? Thanks, Johannes > > [2] > > https://public-inbox.org/git/010201637254c969-a346030e-0b75-41ad-8ef3-2ac7e04ba4fb-000...@eu-west-1.amazonses.com/ >
Re: [PATCH 0/4] Use oid_object_info() instead of read_object_file()
2018-07-09 11:27 GMT+03:00 Оля Тележная : > Hello everyone, > This is my new attempt to start using oid_object_info_extended() in > ref-filter. You could look at previous one [1] [2] but it is not > necessary. > > The goal (still) is to improve performance by avoiding calling expensive > functions when we don't need the information they provide > or when we could get it by using a cheaper function. > > This patch is a middle step. In the end, I want to add new atoms > ("objectsize:disk" and "deltabase") and reuse ref-filter logic in > cat-file command. > > I also know about problems with memory leaks in ref-filter: that would > be my next task that I will work on. Since I did not generate any new > leaks in this patch (just use existing ones), I decided to put this > part on a review and fix leaks as a separate task. UPDATES since v1: add init to eaten variable (thanks to Szeder Gabor, Johannes Schindelin) improve second commit message (thanks to Junio C Hamano) add static keyword (thanks to Ramsay Jones) > > Thank you! > > [1] https://github.com/git/git/pull/493 > [2] > https://public-inbox.org/git/010201637254c969-a346030e-0b75-41ad-8ef3-2ac7e04ba4fb-000...@eu-west-1.amazonses.com/
Re: [PATCH 0/4] Use oid_object_info() instead of read_object_file()
Hi Olga, On Mon, 9 Jul 2018, Оля Тележная wrote: > [2] > https://public-inbox.org/git/010201637254c969-a346030e-0b75-41ad-8ef3-2ac7e04ba4fb-000...@eu-west-1.amazonses.com/ This type of Message-Id makes me think that you used SubmitGit to send this patch series. The main problem I see here is that the patches are not sent as replies to this cover letter, and therefore they are seemingly disconnected on the mailing list. It was also my impression that SubmitGit started supporting sending cover letters, in which case you would not have to jump through hoops to thread the mails properly. But for that to work, the PR has to have a description which is then used as cover letter. I do not see any description in https://github.com/git/git/pull/520, though. Maybe provide one? Ciao, Johannes P.S.: You might have noticed that I am working (slowly, but steadily) on a contender for SubmitGit that I call GitGitGadget. Originally, I really wanted to enhance SubmitGit instead because I am a big believer of *not* reinventing the wheel (so much energy gets wasted that way). However, in this case the limitations of the chosen language (I do not want to learn Scala, I have absolutely zero need to know Scala in any of my other endeavors, and my time to learn new things is limited, so I spend it wisely) and the limitations of the design (the UI is completely separate from GitHub, you have to allow Amazon to send mails in your name, and SubmitGit's design makes it impossible to work bi-directionally, it is only GitHub -> mailing list, while I also want the option to add replies on the mailing list as comments to the GitHub PR in the future) made me reconsider. If you want to kick the tires, so to say, I welcome you to give GitGitGadget a try. It would require only a couple of things from you: - You would have to settle for a branch name, and then not open new PRs for every iteration you want to send, but force-push the branch instead. 
- You would have to open a PR at https://github.com/gitgitgadget/git. - You would have to provide the cover letter via the PR's description (and update that description before sending newer iterations). - I would have to add you to the list of users allowed to send patches via GitGitGadget (GitGitGadget has some really light-weight access control to prevent spamming). - You would then send a new iteration by simply adding a comment to your PR that contains this command: /submit - To integrate well with previous patch series iterations (i.e. to connect the threads), I would have to come up with a little bit of tooling to add some metadata that I have to reconstruct manually from your previously-sent iterations.
Re: [PATCH 0/4] Use oid_object_info() instead of read_object_file()
Оля Тележная writes: > Hello everyone, > This is my new attempt to start using oid_object_info_extended() in > ref-filter. You could look at previous one [1] [2] but it is not > necessary. Yup, it sounds like a sensible thing to do to try asking the object-info helper instead of reading the whole object in-core and inspecting it ourselves when we can avoid it.
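The kind of cheap metadata lookup the series is after — including the planned `objectsize:disk` and `deltabase` atoms — is already visible at the command level in `git cat-file --batch-check`, which reports object metadata without inflating object contents. A minimal sketch (the repository name and commit are made up for illustration):

```shell
# Create a throwaway repository with one commit.
git init -q demo && cd demo
git -c user.name=demo -c user.email=demo@example.com \
	commit -q --allow-empty -m init
# Ask only for metadata; nothing here requires reading the object's body.
git rev-parse HEAD |
git cat-file --batch-check='%(objecttype) %(objectsize) %(objectsize:disk) %(deltabase)'
```

The goal of the series is to let ref-filter's `%(...)` atoms take the same shortcut internally via oid_object_info_extended().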
Re: [PATCH 0/4] fsck: doc fixes & fetch.fsck.* implementation
Ævar Arnfjörð Bjarmason writes: > On Thu, May 24, 2018 at 9:02 PM, Jeff King wrote: >> On Thu, May 24, 2018 at 07:04:04PM +0200, Ævar Arnfjörð Bjarmason wrote: >> >>> That doesn't work, because that's for the server-side, but I need the >>> fetch.fsck.* that doesn't exist. This works (I'll send a better patch >>> with tests / docs etc. soon): >> >> Yeah, I think this is the right direction. It seems that this 4-patch series is sufficiently commented and the earlier parts can be further polished soonish with help from others. I need to re-read the last patch (implementation) but I think this generally is good. Thanks.
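The `fetch.fsck.*` knobs this series implements mirror the existing `receive.fsck.*` ones: a per-message-id severity applied when fetch checks received objects (the variables later shipped in Git). A minimal configuration sketch — the message id is chosen purely for illustration:

```shell
# Check objects on fetch, but only warn about zero-padded file modes
# instead of rejecting the transfer outright.
git init -q fsck-demo && cd fsck-demo
git config transfer.fsckObjects true
git config fetch.fsck.zeroPaddedFilemode warn
git config fetch.fsck.zeroPaddedFilemode   # prints "warn"
```

The severity values follow the same scheme as receive.fsck.<msg-id>: error, warn, or ignore.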
Re: [PATCH 0/4] a few mark_parents_uninteresting cleanups
On 5/11/2018 2:00 PM, Jeff King wrote: This is a follow-up to the discussion from February: https://public-inbox.org/git/1519522496-73090-1-git-send-email-dsto...@microsoft.com/ There I theorized that some of these extra has_object_file() checks were unnecessary. After poking around a bit, I've convinced myself that this is the case, so here are some patches. After Stolee's fix in ebbed3ba04 (revision.c: reduce object database queries, 2018-02-24), I doubt this will provide any noticeable speedup. IMHO the value is just in simplifying the logic. The first two patches are the actual has_object_file simplifications. The second two are an attempt to fix some head-scratching I had while reading mark_parents_uninteresting(). I hope the result is easier to follow, but I may have just shuffled one confusion for another. :) [1/4]: mark_tree_contents_uninteresting(): drop missing object check [2/4]: mark_parents_uninteresting(): drop missing object check [3/4]: mark_parents_uninteresting(): replace list with stack [4/4]: mark_parents_uninteresting(): avoid most allocation revision.c | 90 ++ 1 file changed, 50 insertions(+), 40 deletions(-) -Peff This series looks good to me. I found Patch 3 hard to read in the diff, so I just looked at the final result and the new arrangement is very clear about how it should behave. Reviewed-by: Derrick Stolee
Re: [PATCH 0/4] doc: cleaning up instances of \--
On Wed, Apr 18, 2018 at 01:24:55PM +0900, Junio C Hamano wrote:

> Martin Ågren writes:
>
>> This is a patch series to convert \-- to -- in our documentation. The
>> first patch is a reiteration of 1c262bb7b2 (doc: convert \--option to
>> --option, 2015-05-13) to fix some instances that have appeared since.
>> The other three patches deal with standalone "\--" which we can't
>> always turn into "--" since it can be rendered as an em dash.
>
> All looked sensible. As you mentioned in [2/4], "\--::" that is
> part of an enumeration appears in the documentation for about a dozen
> commands after the series, but I do not think we can avoid it.
>
> One thing that makes me wonder related to these patches is if a
> newer AsciiDoc we assume lets us do without the {litdd} macro. This
> series and our earlier effort like 1c262bb7 ("doc: convert \--option
> to --option", 2015-05-13) mention that "\--word" is less pleasant
> on the eyes than "--word", but the ugliness "two{litdd}words" has
> over "two--words" is far worse than that, so...

I think many cases that use {litdd} would be better off using literal backticks anyway (e.g., git-add.txt mentions the filename `git-add--interactive.perl`). There are certainly a few that can't, though (e.g., config.txt uses linkgit:git-web{litdd}browse[1]). I agree that "\--" is less ugly there (and seems to work with my modern asciidoc).

There's some history on the litdd versus "\--" choice in 565e135a1e (Documentation: quote double-dash for AsciiDoc, 2011-06-29). That in turn references 2839478774 (Work around em-dash handling in newer AsciiDoc, 2010-08-23), but I wouldn't be surprised if all of that is now obsolete with our AsciiDoc 8+ requirement.

-Peff

PS Late review, I know, but the patches look good to me. :)
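[Editor's illustration, not taken from the patches: a tiny AsciiDoc fragment, made up for this discussion, showing the three spellings being compared.]

```asciidoc
// Escaped form: "\--" keeps AsciiDoc from rendering a literal -- as an em dash.
Use "\--" to separate paths from revisions.

// {litdd} attribute workaround (a project-defined attribute expanding to a double dash):
See linkgit:git-web{litdd}browse[1] for details.

// Inside a backtick literal span, no escaping is needed at all:
The helper script is `git-add--interactive.perl`.
```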
Re: [PATCH 0/4] subtree: move out of contrib
On Tue, May 01 2018, Johannes Schindelin wrote:

> Hi Ævar,
>
> On Mon, 30 Apr 2018, Ævar Arnfjörð Bjarmason wrote:
>
>> I think at this point git-subtree is widely used enough to move out of
>> contrib/, maybe others disagree, but patches are always better for
>> discussion than patch-less ML posts.
>
> Sure, it is used widely enough.
>
> However, it flies in the face of so many GSoC efforts to introduce yet
> another one of those poorly portable Unix shell scripts as a central
> part of Git's code base.
>
> The script itself does look quite straight-forward to port to a builtin,
> so why not give it a try?

That's a valid point. I think it makes sense to leave that aside for now; maybe the consensus is that subtree is fine in every way, except we'd like to have a policy not to introduce new shellscript built-ins. Let's first just assume it's in C already and look at it in terms of its functionality, to figure out if it's worth even getting to that point.

> If you are completely opposed to porting it to C, I will be completely
> opposed to moving it out of contrib/.

This series shows that we should split the concern about whether something lives in contrib/ from whether it's built/installed by default. No matter whether we decide that subtree should be a blessed default command, it makes sense to move it out of contrib, purely because, as can be seen from this series, it'll replace >100 lines of hacks with 1 line in our main Makefile. We can then just add a flag to guard for it, e.g. CONTRIB_SUBTREE=YesPlease. But that's just an internal implementation detail of how we manage code sitting in git.git.
Re: [PATCH 0/4] subtree: move out of contrib
Hi Ævar,

On Mon, 30 Apr 2018, Ævar Arnfjörð Bjarmason wrote:

> I think at this point git-subtree is widely used enough to move out of
> contrib/, maybe others disagree, but patches are always better for
> discussion than patch-less ML posts.

Sure, it is used widely enough.

However, it flies in the face of so many GSoC efforts to introduce yet another one of those poorly portable Unix shell scripts as a central part of Git's code base.

The script itself does look quite straight-forward to port to a builtin, so why not give it a try?

If you are completely opposed to porting it to C, I will be completely opposed to moving it out of contrib/.

If you need help with porting it, please come up with a task plan and I can jump in to help (but please do collaborate with me on this one, don't leave all of the hard work to me).

Ciao,
Dscho
Re: [PATCH 0/4] subtree: move out of contrib
On Mon, Apr 30, 2018 at 11:50 AM, Ævar Arnfjörð Bjarmason wrote:

> I think at this point git-subtree is widely used enough to move out of
> contrib/, maybe others disagree, but patches are always better for
> discussion than patch-less ML posts.

After narrow/partial clone becomes real, it should be "easy" to implement some sort of narrow checkout that would achieve the same thing. But it took me forever with all the other stuff to get back to this.

If we remove it from contrib and there are people willing to update/maintain it, should it be a separate repository then? The willing people will have much more freedom to update it. And I don't have to answer the questions about who will maintain this thing in git.git. I don't like the idea of adding new (official) shell-based commands either, to be honest.
-- 
Duy
Re: [PATCH 0/4] subtree: move out of contrib
On Mon, Apr 30, 2018 at 6:21 PM, Ævar Arnfjörð Bjarmason wrote:

> Pretty clear it's garbage data, unless we're to believe that the
> relative interest of submodules in the US, Germany and Sweden is 51, 64
> & 84, but 75, 100 and 0 for subtree.

Oh yeah, Swedish people hate git-subtree. Nobody knows why.

Avery
Re: [PATCH 0/4] subtree: move out of contrib
On Mon, Apr 30 2018, Avery Pennarun wrote:

> On Mon, Apr 30, 2018 at 5:38 PM, Stefan Beller wrote:
>
> There's one exception, which is doing a one-time permanent merge of
> two projects into one. That's a nice feature, but is probably used
> extremely rarely.

FWIW this is the only thing I've used it for. I do this occasionally, and used to do it manually with format-patch + "perl -pe" or similar when I needed to merge some repositories together; other times I was less stupid and manually did something similar to what subtree is doing, with a "move everything" commit just before the merge of the two histories.

>> https://trends.google.com/trends/explore?date=all=git%20subtree,git%20submodule
>>
>> Not sure what to make of this data.
>
> Clearly people need a lot more help when using submodules than when
> using subtree :)

Pretty clear it's garbage data, unless we're to believe that the relative interest of submodules in the US, Germany and Sweden is 51, 64 & 84, but 75, 100 and 0 for subtree.
Re: [PATCH 0/4] subtree: move out of contrib
On Mon, Apr 30, 2018 at 2:53 PM, Avery Pennarun wrote:

> For the best of both worlds, I've often thought that a good balance
> would be to use the same data structure that submodule uses, but to
> store all the code in a single git repo under different refs, which we
> might or might not download (or might or might not have different
> ACLs) under different circumstances.

There has been some experimentation with having a simpler ref surface on the submodule side, https://public-inbox.org/git/cover.1512168087.git.jonathanta...@google.com/

The way you describe the future of submodules, all we'd have to do is teach git-clone how to select the "interesting" refs for your use case. Any other command would assume all submodule data to be in the main repository. The difference from Jonathan's proposal linked above would be that the object store is in the main repo and the refs are prefixed per submodule instead of "shadowed".

> However, when some projects get
> really huge (lots of very big submodule dependencies), then repacking
> one-big-repo starts becoming unwieldy; in that situation git-subtree
> also fails completely.

Yes, but that is a general scaling problem of Git that could be tackled, e.g. repack into multiple packs serially instead of putting everything into one pack.

>> Submodules do not need to produce a synthetic project history
>> when splitting off again, as the history is genuine. This allows
>> for easier work with upstream.
>
> Splitting for easier work upstream is great, and there really ought to
> be an official version of 'git subtree split', which is good for all
> sorts of purposes.
>
> However, I suspect almost all uses of the split feature are a)
> splitting a subtree that you previously merged in, or b) splitting a
> subtree into a separate project that you want to maintain separately
> from now on.
> Repeated splits in case (a) are only necessary because
> you're not using submodules, or in case (b) are only necessary because
> you didn't *switch* to submodules when it finally came time to split
> the projects. (In both cases you probably didn't switch to submodules
> because you didn't like one of its tradeoffs, especially the need to
> track multiple repos when you fork.)

That makes sense.

> There's one exception, which is doing a one-time permanent merge of
> two projects into one. That's a nice feature, but is probably used
> extremely rarely. More often people get into a
> merge-split-merge-split cycle that would be better served by a
> slightly improved git-submodule.

This rare use case is how git-subtree came into existence in git's contrib directory AFAICT, https://kernel.googlesource.com/pub/scm/git/git/+/634392b26275fe5436c0ea131bc89b46476aa4ae which is interesting to view in git-show, but I think defaults could be tweaked there, as it currently shows me mostly a license file.

>> Conceptually Gerrit is doing
>>
>>     while true:
>>         git submodule update --remote
>>         if worktree is dirty:
>>             git commit "update the submodules"
>>
>> just that Gerrit doesn't poll but does it event based.
>
> ...and it's super handy :) The problem is it's fundamentally
> centralized: because gerrit can serialize merges into the submodule,
> it also knows exactly how to update the link in the supermodule. If
> there was wild branching and merging (as there often is in git) and
> you had to resolve conflicts between two submodules, I don't think it
> would be obvious at all how to do it automatically when pushing a
> submodule. (This also works quite badly with git subtree --squash.)

With the poll based solution I don't think you'd run into many more problems than you would with Gerrit's solution.
In a nearby thread, we were just discussing the submodule merging strategies, https://public-inbox.org/git/1524739599.20251.17.ca...@klsmartin.com/ which might seem confusing, but the implementation is actually easy, as we just fast-forward only in submodules.

>> https://trends.google.com/trends/explore?date=all=git%20subtree,git%20submodule
>>
>> Not sure what to make of this data.
>
> Clearly people need a lot more help when using submodules than when
> using subtree :)

That could be true. :)

Thanks,
Stefan
Re: [PATCH 0/4] subtree: move out of contrib
On Mon, Apr 30, 2018 at 5:38 PM, Stefan Beller wrote:

> On Mon, Apr 30, 2018 at 1:45 PM, Avery Pennarun wrote:
>
> No objections from me either.
>
> Submodules seem to serve a slightly different purpose, though?

I think the purpose is actually the same - it's just the tradeoffs that are different. Both sets of tradeoffs kind of suck.

> With Subtrees the superproject always contains all the code,
> even when you squash the subtree history when merging it in.
> In the submodule world, you may not have access to one of the
> submodules.

Right. Personally I think it's a disadvantage of subtree that it always contains all the code (what if some people don't want the code for a particular build variant?). However, it's a huge pain that submodules *don't* contain all the code (what if I'm not online right now, or the site supposedly containing the code goes offline, or I want to make my own fork?).

For the best of both worlds, I've often thought that a good balance would be to use the same data structure that submodule uses, but to store all the code in a single git repo under different refs, which we might or might not download (or might or might not have different ACLs) under different circumstances. However, when some projects get really huge (lots of very big submodule dependencies), then repacking one-big-repo starts becoming unwieldy; in that situation git-subtree also fails completely.

> Submodules do not need to produce a synthetic project history
> when splitting off again, as the history is genuine. This allows
> for easier work with upstream.

Splitting for easier work upstream is great, and there really ought to be an official version of 'git subtree split', which is good for all sorts of purposes.

However, I suspect almost all uses of the split feature are a) splitting a subtree that you previously merged in, or b) splitting a subtree into a separate project that you want to maintain separately from now on.
Repeated splits in case (a) are only necessary because you're not using submodules, or in case (b) are only necessary because you didn't *switch* to submodules when it finally came time to split the projects. (In both cases you probably didn't switch to submodules because you didn't like one of its tradeoffs, especially the need to track multiple repos when you fork.) > Subtrees present you the whole history by default and the user > needs to be explicit about not wanting to see history from the > subtree, which is the opposite of submodules (though this > may be planned in the future to switch). It turns out that AFAIK, almost everyone prefers 'git subtree --squash', which squashes into a single commit each time you merge, much like git submodule does. I doubt people would cry too much if the full-history feature went away. There's one exception, which is doing a one-time permanent merge of two projects into one. That's a nice feature, but is probably used extremely rarely. More often people get into a merge-split-merge-split cycle that would be better served by a slightly improved git-submodule. >> The gerrit team (eg. Stefan Beller) has been doing some really great >> stuff to make submodules more usable by helping with relative >> submodule links and by auto-updating links in supermodules at the >> right times. Unfortunately doing that requires help from the server >> side, which kind of messes up decentralization and so doesn't solve >> the problem in the general case. > > Conceptually Gerrit is doing > > while true: > git submodule update --remote > if worktree is dirty: > git commit "update the submodules" > > just that Gerrit doesn't poll but does it event based. ...and it's super handy :) The problem is it's fundamentally centralized: because gerrit can serialize merges into the submodule, it also knows exactly how to update the link in the supermodule. 
If there was wild branching and merging (as there often is in git) and you had to resolve conflicts between two submodules, I don't think it would be obvious at all how to do it automatically when pushing a submodule. (This also works quite badly with git subtree --squash.) >> I really wish there were a good answer, but I don't know what it is. >> I do know that lots of people seem to at least be happy using >> git-subtree, and would be even happier if it were installed >> automatically with git. > > https://trends.google.com/trends/explore?date=all=git%20subtree,git%20submodule > > Not sure what to make of this data. Clearly people need a lot more help when using submodules than when using subtree :) Have fun, Avery
Re: [PATCH 0/4] subtree: move out of contrib
On Mon, Apr 30, 2018 at 1:45 PM, Avery Pennarun wrote:

> On Mon, Apr 30, 2018 at 5:50 AM, Ævar Arnfjörð Bjarmason
> wrote:
>> I think at this point git-subtree is widely used enough to move out of
>> contrib/, maybe others disagree, but patches are always better for
>> discussion than patch-less ML posts.
>
> I really was hoping git-subtree would be completely obsoleted by
> git-submodule by now, but... it hasn't been, so... no objections on
> this end.

No objections from me either.

Submodules seem to serve a slightly different purpose, though? With Subtrees the superproject always contains all the code, even when you squash the subtree history when merging it in. In the submodule world, you may not have access to one of the submodules.

Submodules do not need to produce a synthetic project history when splitting off again, as the history is genuine. This allows for easier work with upstream.

Subtrees present you the whole history by default and the user needs to be explicit about not wanting to see history from the subtree, which is the opposite of submodules (though this may be planned in the future to switch).

> The gerrit team (eg. Stefan Beller) has been doing some really great
> stuff to make submodules more usable by helping with relative
> submodule links and by auto-updating links in supermodules at the
> right times. Unfortunately doing that requires help from the server
> side, which kind of messes up decentralization and so doesn't solve
> the problem in the general case.

Conceptually Gerrit is doing

    while true:
        git submodule update --remote
        if worktree is dirty:
            git commit "update the submodules"

just that Gerrit doesn't poll but does it event based.

> I really wish there were a good answer, but I don't know what it is.
> I do know that lots of people seem to at least be happy using
> git-subtree, and would be even happier if it were installed
> automatically with git.
https://trends.google.com/trends/explore?date=all=git%20subtree,git%20submodule Not sure what to make of this data.
Re: [PATCH 0/4] subtree: move out of contrib
On Mon, Apr 30, 2018 at 5:50 AM, Ævar Arnfjörð Bjarmason wrote:

> I think at this point git-subtree is widely used enough to move out of
> contrib/, maybe others disagree, but patches are always better for
> discussion than patch-less ML posts.

I really was hoping git-subtree would be completely obsoleted by git-submodule by now, but... it hasn't been, so... no objections on this end.

The gerrit team (eg. Stefan Beller) has been doing some really great stuff to make submodules more usable by helping with relative submodule links and by auto-updating links in supermodules at the right times. Unfortunately doing that requires help from the server side, which kind of messes up decentralization and so doesn't solve the problem in the general case.

I really wish there were a good answer, but I don't know what it is. I do know that lots of people seem to at least be happy using git-subtree, and would be even happier if it were installed automatically with git.

Have fun,

Avery
Re: [PATCH 0/4] subtree: move out of contrib
From: "Ævar Arnfjörð Bjarmason"

> I think at this point git-subtree is widely used enough to move out of
> contrib/, maybe others disagree, but patches are always better for
> discussion than patch-less ML posts.

Assuming this lands in Git, then there will also need to be a simple follow-on into Duy's series that is updating the command-list.txt (Message-Id: <20180429181844.21325-10-pclo...@gmail.com>). Duy's series also does the completions thing IIUC ;-)
-- 
Philip

Ævar Arnfjörð Bjarmason (4):
  git-subtree: move from contrib/subtree/
  subtree: remove support for git version <1.7
  subtree: fix a test failure under GETTEXT_POISON
  i18n: translate the git-subtree command

 .gitignore | 1 +
 Documentation/git-submodule.txt | 2 +-
 .../subtree => Documentation}/git-subtree.txt | 3 +
 Makefile | 1 +
 contrib/subtree/.gitignore | 7 -
 contrib/subtree/COPYING | 339 --
 contrib/subtree/INSTALL | 28 --
 contrib/subtree/Makefile | 97 -
 contrib/subtree/README | 8 -
 contrib/subtree/t/Makefile | 86 -
 contrib/subtree/todo | 48 ---
 .../subtree/git-subtree.sh => git-subtree.sh | 109 +++---
 {contrib/subtree/t => t}/t7900-subtree.sh | 21 +-
 13 files changed, 78 insertions(+), 672 deletions(-)
 rename {contrib/subtree => Documentation}/git-subtree.txt (99%)
 delete mode 100644 contrib/subtree/.gitignore
 delete mode 100644 contrib/subtree/COPYING
 delete mode 100644 contrib/subtree/INSTALL
 delete mode 100644 contrib/subtree/Makefile
 delete mode 100644 contrib/subtree/README
 delete mode 100644 contrib/subtree/t/Makefile
 delete mode 100644 contrib/subtree/todo
 rename contrib/subtree/git-subtree.sh => git-subtree.sh (84%)
 rename {contrib/subtree/t => t}/t7900-subtree.sh (99%)
-- 
2.17.0.290.gded63e768a
Re: [PATCH 0/4] doc: cleaning up instances of \--
On Tue, Apr 17, 2018 at 09:15:25PM +0200, Martin Ågren wrote:

> This is a patch series to convert \-- to -- in our documentation. The
> first patch is a reiteration of 1c262bb7b2 (doc: convert \--option to
> --option, 2015-05-13) to fix some instances that have appeared since.
> The other three patches deal with standalone "\--" which we can't
> always turn into "--" since it can be rendered as an em dash.

This series looked sane to me as well. I appreciate the work to keep our documentation working in both AsciiDoc and Asciidoctor.
-- 
brian m. carlson: Houston, Texas, US
OpenPGP: https://keybase.io/bk2204
Re: [PATCH 0/4] doc: cleaning up instances of \--
Martin Ågren writes:

> This is a patch series to convert \-- to -- in our documentation. The
> first patch is a reiteration of 1c262bb7b2 (doc: convert \--option to
> --option, 2015-05-13) to fix some instances that have appeared since.
> The other three patches deal with standalone "\--" which we can't
> always turn into "--" since it can be rendered as an em dash.

All looked sensible. As you mentioned in [2/4], "\--::" that is part of an enumeration appears in the documentation for about a dozen commands after the series, but I do not think we can avoid it.

One thing that makes me wonder related to these patches is if a newer AsciiDoc we assume lets us do without the {litdd} macro. This series and our earlier effort like 1c262bb7 ("doc: convert \--option to --option", 2015-05-13) mention that "\--word" is less pleasant on the eyes than "--word", but the ugliness "two{litdd}words" has over "two--words" is far worse than that, so...

Thanks, will queue.
Re: [PATCH 0/4] Convert some stash functionality to a builtin
On 03/24, Joel Teichroeb wrote:

> I've been working on converting all of git stash to be a
> builtin, however it's hard to get it all working at once with
> limited time, so I've moved around half of it to a new
> stash--helper builtin and called these functions from the shell
> script. Once this is stabilized, it should be easier to convert
> the rest of the commands one at a time without breaking
> anything.
>
> I've sent most of this code before, but that was targeting a
> full replacement of stash. The code is overall the same, but
> with some code review changes and updates for internal api
> changes.

Thanks for splitting this up into multiple patches, I found that much more pleasant to review, and thanks for your continued work on this :)

> Since there seems to be interest from GSOC students who want to
> work on converting builtins, I figured I should finish what I
> have that works now so they could build on top of it.
>
> Joel Teichroeb (4):
>   stash: convert apply to builtin
>   stash: convert branch to builtin
>   stash: convert drop and clear to builtin
>   stash: convert pop to builtin
>
>  .gitignore | 1 +
>  Makefile | 1 +
>  builtin.h | 1 +
>  builtin/stash--helper.c | 514
>  git-stash.sh | 13 +-
>  git.c | 1 +
>  6 files changed, 526 insertions(+), 5 deletions(-)
>
> --
> 2.16.2
Re: [PATCH 0/4] Speed up git tag --contains
On Mon, Mar 12, 2018 at 09:45:27AM -0400, Derrick Stolee wrote:

> > diff --git a/builtin/branch.c b/builtin/branch.c
> > index 8dcc2ed058..4d674e86d5 100644
> > --- a/builtin/branch.c
> > +++ b/builtin/branch.c
> > @@ -404,6 +404,7 @@ static void print_ref_list(struct ref_filter *filter, struct ref_sorting *sortin
> >  	memset(&array, 0, sizeof(array));
> > +	filter->with_commit_tag_algo = 1;
> >  	filter_refs(&array, filter, filter->kind | FILTER_REFS_INCLUDE_BROKEN);
> >  	if (filter->verbose)
> >
> > drops my run of "git branch -a --contains HEAD~100" from 8.6s to
> > 0.4s on a repo with ~1800 branches. That sounds good, but on a repo with
> > a smaller number of branches, we may actually end up slower (because we
> > dig further down in history, and don't benefit from the multiple-branch
> > speedup).
>
> It's good to know that we already have an algorithm for the multi-head
> approach. Things like `git branch -vv` are harder to tease out because the
> graph walk is called by the line-format code.

Yeah, the ahead/behind stuff will need some work. Part of it is just code structuring. We know ahead of time which branches (and their upstreams) are going to need this ahead/behind computation, so we should be able to collect them all for a single call.

But I'm not sure if a general multi-pair ahead/behind is going to be easy. I don't have even experimental code for that. :)

We have a multi-pair ahead/behind command which we use at GitHub, but it does each pair separately. It leans heavily on reachability bitmaps, so the main advantage is that it's able to amortize the cost of loading the bitmaps (both off disk, but also we sometimes have to walk to complete the bitmaps).

-Peff
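[Editor's aside, not part of the thread: the single-pair ahead/behind computation being discussed can be reproduced from the command line with a symmetric-difference walk; a multi-pair version would amortize this across all branch/upstream pairs. `main` and `topic` below are placeholder branch names.]

```shell
# One revision walk yields both counts at once:
# left = commits only in main, right = commits only in topic.
git rev-list --left-right --count main...topic
```

This is roughly what `git branch -vv` has to compute per branch against its upstream, which is why doing it for hundreds of branches one pair at a time gets expensive.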
Re: [PATCH 0/4] Speed up git tag --contains
On 3/3/2018 12:15 AM, Jeff King wrote: On Fri, Jan 12, 2018 at 10:56:00AM -0800, csilvers wrote: This is a resubmission of Jeff King's patch series to speed up git tag --contains with some changes. It's been cooking for a while as: Replying to this 6-year-old thread: Is there any chance this could be resurrected? We are using phabricator, which uses `git branch --contains` as part of its workflow. Our repo has ~1000 branches on it, and the contains operation is eating up all our CPU (and time). It would be very helpful to us to make this faster! (The original thread is at https://public-inbox.org/git/e1ou82h-0001xy...@closure.thunk.org/ Sorry, this got thrown on my "to respond" pile and languished. Thanks for adding me to the thread. It's good to know the pain point people are having around commit graph walks. There are actually three things that make "git branch --contains" slow. First, if you're filtering 1000 branches, we'll run 1000 merge-base traversals, which may walk over the same commits multiple times. These days "tag --contains" uses a different algorithm that can look at all heads in a single traversal. But the downside is that it's depth-first, so it tends to walk down to the roots. That's generally OK for tags, since you often have ancient tags that mean getting close to the roots anyway. But for branches, they're more likely to be recent, and you can get away without going very deep into the history. So it's a tradeoff. 
There's no run-time switch to flip between them, but a patch like this:

diff --git a/builtin/branch.c b/builtin/branch.c
index 8dcc2ed058..4d674e86d5 100644
--- a/builtin/branch.c
+++ b/builtin/branch.c
@@ -404,6 +404,7 @@ static void print_ref_list(struct ref_filter *filter, struct ref_sorting *sortin
 	memset(&array, 0, sizeof(array));
+	filter->with_commit_tag_algo = 1;
 	filter_refs(&array, filter, filter->kind | FILTER_REFS_INCLUDE_BROKEN);
 	if (filter->verbose)

drops my run of "git branch -a --contains HEAD~100" from 8.6s to 0.4s on a repo with ~1800 branches. That sounds good, but on a repo with a smaller number of branches, we may actually end up slower (because we dig further down in history, and don't benefit from the multiple-branch speedup).

It's good to know that we already have an algorithm for the multi-head approach. Things like `git branch -vv` are harder to tease out because the graph walk is called by the line-format code.

I tried to do a "best of both" algorithm in: https://public-inbox.org/git/20140625233429.ga20...@sigill.intra.peff.net/ which finds arbitrary numbers of merge bases in a single traversal. It did seem to work, but I felt uneasy about some of the corner cases. I've been meaning to revisit it, but obviously have never gotten around to it.

The second slow thing is that during the traversal we load each commit object from disk. The solution there is to keep the parent information in a faster cache. I had a few proposals over the years, but I won't even bother to dig them up, because there's quite recent and promising work in this area from Derrick Stolee: https://public-inbox.org/git/1519698787-190494-1-git-send-email-dsto...@microsoft.com/

And finally, the thing that the patches you linked are referencing is about using commit timestamps as a proxy for generation numbers. And Stolee's patches actually leave room for real, trustable generation numbers.
Once we have the serialized commit graph and generation numbers, I think the final step would just be to teach the "tag --contains" algorithm to stop walking down unproductive lines of history. And in fact, I think we can forget about the best-of-both multi-tip merge-base idea entirely. Because if you can use the generation numbers to avoid going too deep, then a depth-first approach is fine. And we'd just want to flip git-branch over to using that algorithm by default.

I'll keep this in mind as a target for performance measurements in the serialized commit graph patch and the following generation number patch.

Thanks,
-Stolee
Re: [PATCH 0/4] Speed up git tag --contains
} I had a few proposals over the years, but I won't even bother to dig
} them up, because there's quite recent and promising work in this
} area from Derrick Stolee:

It sounds like the best thing to do is to wait for this, then. We managed to convert a bunch of our branches to tags, so our immediate problem has been resolved. But I'm sure it will come up again as more branches are created...

craig
Re: [PATCH 0/4] Teach `--default` to `git-config(1)`
On Mon, Mar 05, 2018 at 06:17:25PM -0800, Taylor Blau wrote: > Attached is a patch series to introduce `--default` and `--color` to the > `git-config(1)` builtin with the aim of introducing a consistent interface to > replace `--get-color`. This series draws significant inspiration from Soukaina Nait Hmid's work in: https://public-inbox.org/git/0102015fb0bf2f74-cb456171-fe65-4d83-8784-b553c7c9e584-000...@eu-west-1.amazonses.com/ -- - Taylor
Re: [PATCH 0/4] Speed up git tag --contains
On Fri, Jan 12, 2018 at 10:56:00AM -0800, csilvers wrote:

> > This is a resubmission of Jeff King's patch series to speed up git tag
> > --contains with some changes. It's been cooking for a while as:
>
> Replying to this 6-year-old thread:
>
> Is there any chance this could be resurrected? We are using
> phabricator, which uses `git branch --contains` as part of its
> workflow. Our repo has ~1000 branches on it, and the contains
> operation is eating up all our CPU (and time). It would be very
> helpful to us to make this faster!
>
> (The original thread is at
> https://public-inbox.org/git/e1ou82h-0001xy...@closure.thunk.org/

Sorry, this got thrown on my "to respond" pile and languished.

There are actually three things that make "git branch --contains" slow.

First, if you're filtering 1000 branches, we'll run 1000 merge-base traversals, which may walk over the same commits multiple times. These days "tag --contains" uses a different algorithm that can look at all heads in a single traversal. But the downside is that it's depth-first, so it tends to walk down to the roots. That's generally OK for tags, since you often have ancient tags that mean getting close to the roots anyway. But for branches, they're more likely to be recent, and you can get away without going very deep into the history. So it's a tradeoff.

There's no run-time switch to flip between them, but a patch like this:

diff --git a/builtin/branch.c b/builtin/branch.c
index 8dcc2ed058..4d674e86d5 100644
--- a/builtin/branch.c
+++ b/builtin/branch.c
@@ -404,6 +404,7 @@ static void print_ref_list(struct ref_filter *filter, struct ref_sorting *sortin
 	memset(&array, 0, sizeof(array));
+	filter->with_commit_tag_algo = 1;
 	filter_refs(&array, filter, filter->kind | FILTER_REFS_INCLUDE_BROKEN);
 	if (filter->verbose)

drops my run of "git branch -a --contains HEAD~100" from 8.6s to 0.4s on a repo with ~1800 branches.
That sounds good, but on a repo with a smaller number of branches, we may actually end up slower (because we dig further down in history, and don't benefit from the multiple-branch speedup). I tried to do a "best of both" algorithm in: https://public-inbox.org/git/20140625233429.ga20...@sigill.intra.peff.net/ which finds arbitrary numbers of merge bases in a single traversal. It did seem to work, but I felt uneasy about some of the corner cases. I've been meaning to revisit it, but obviously have never gotten around to it. The second slow thing is that during the traversal we load each commit object from disk. The solution there is to keep the parent information in a faster cache. I had a few proposals over the years, but I won't even bother to dig them up, because there's quite recent and promising work in this area from Derrick Stolee: https://public-inbox.org/git/1519698787-190494-1-git-send-email-dsto...@microsoft.com/ And finally, the thing that the patches you linked are referencing is about using commit timestamps as a proxy for generation numbers. And Stolee's patches actually leave room for real, trustable generation numbers. Once we have the serialized commit graph and generation numbers, I think the final step would just be to teach the "tag --contains" algorithm to stop walking down unproductive lines of history. And in fact, I think we can forget about the best-of-both multi-tip merge-base idea entirely. Because if you can use the generation numbers to avoid going too deep, then a depth-first approach is fine. And we'd just want to flip git-branch over to using that algorithm by default. -Peff
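[Editor's illustration] The single-traversal "contains" idea described above can be sketched roughly like this: do one depth-first walk that caches, per commit, whether the target commit is reachable, so that checking many tips against the same commit reuses earlier work instead of repeating N merge-base walks. This is a hypothetical Python sketch, not git's actual C implementation; the names `contains`, `parents`, and `cache` are invented for the example.

```python
def contains(start, target, parents, cache):
    """Return True if `target` is an ancestor of (or equal to) `start`.

    `parents` maps a commit id to its list of parent ids.  `cache`
    memoizes "does this commit reach the target?" across calls, which
    is what makes checking many tips against one target cheap.
    """
    stack = [start]
    while stack:
        commit = stack.pop()
        if commit in cache:
            continue
        if commit == target:
            cache[commit] = True
            continue
        unseen = [p for p in parents.get(commit, []) if p not in cache]
        if unseen:
            stack.append(commit)       # revisit once parents are resolved
            stack.extend(unseen)
        else:
            cache[commit] = any(cache[p] for p in parents.get(commit, []))
    return cache[start]

# Two tags sharing history: the second query is answered almost
# entirely from the cache filled by the first one.
parents = {
    "tag1": ["merge"], "tag2": ["merge"],
    "merge": ["old"], "old": [],
}
cache = {}
print(contains("tag1", "old", parents, cache))  # True
print(contains("tag2", "old", parents, cache))  # True, mostly cached
```

The depth-first shape is also why the tradeoff Peff mentions exists: without something like generation numbers to cut the walk short, this sketch happily descends all the way to the roots.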
Re: [PATCH 0/4] Delete ignore_env member in struct repository
On Tue, Feb 27, 2018 at 3:46 AM, Stefan Beller wrote: > What is the plan from here on? > Should I build further series on top of yours? The next series will > focus on the pack side of things (pack.h, packfile.{c,h}) > > So maybe we'll have Junio merge down my series (and yours as it > needs one reroll?) and then build on top of that? I think that's the less painful way for everybody. -- Duy
Re: [PATCH 0/4] Delete ignore_env member in struct repository
On Mon, Feb 26, 2018 at 2:30 AM, Nguyễn Thái Ngọc Duy wrote: > It turns out I don't need my other series [1] in order to delete this > field. This series moves getenv() calls from > repo_set_gitdir()/repo_setup_env() and prepare_alt_odb() back in > environment.c where they belong in my opinion. > > The repo_set_gitdir() now takes $GIT_DIR and optionally all other > configurable paths. If those paths are NULL, default repo layout will > be used. With getenv() no longer called inside repo_set_gitdir(), > ignore_env has no reason to stay. This is in 1/4. > > The getenv() in prepare_alt_odb() is also moved back to > setup_git_env() in 3/4. It demonstrates how we could move other > getenv() back to if we want. > > This series is built on top of Stefan's object-store-part1, v4. I > could rebase it on 'master' too, but then Junio may need to resolve > some conflicts. > Thanks for working on this, I found this series a pleasant read; the only issue I saw was Eric's point about multiple getenv() calls being used without strdup()ing the content. What is the plan from here on? Should I build further series on top of yours? The next series will focus on the pack side of things (pack.h, packfile.{c,h}) So maybe we'll have Junio merge down my series (and yours as it needs one reroll?) and then build on top of that? Thanks, Stefan
Re: [PATCH 0/4] Delete ignore_env member in struct repository
Nguyễn Thái Ngọc Duy writes: > This series is built on top of Stefan's object-store-part1, v4. I > could rebase it on 'master' too, but then Junio may need to resolve > some conflicts. As a follow-up fix for combined "move things to the_repo as much as possible" efforts I think what you did makes most sense. I would say that it makes even more sense to make these part of the object-store series if/when the series needs rerolling--after all the topic is already a multi-author effort. I'll have to read them carefully before commenting on the actual patches ;-) Thanks. > > [1] https://public-inbox.org/git/20180225111840.16421-1-pclo...@gmail.com/ > > Nguyễn Thái Ngọc Duy (4): > repository.c: move env-related setup code back to environment.c > repository.c: delete dead functions > sha1_file.c: move delayed getenv(altdb) back to setup_git_env() > repository: delete ignore_env member > > cache.h| 2 +- > environment.c | 13 +++-- > object-store.h | 5 +++- > object.c | 1 + > repository.c | 79 ++ > repository.h | 21 +++--- > setup.c| 3 +- > sha1_file.c| 6 +--- > 8 files changed, 64 insertions(+), 66 deletions(-)
Re: [PATCH 0/4] Correct offsets of hunks when one is skipped
On 13/02/18 23:56, brian m. carlson wrote: > On Tue, Feb 13, 2018 at 10:44:04AM +, Phillip Wood wrote: >> From: Phillip Wood >> >> While working on a patch series to stage selected lines from a hunk >> without having to edit it I got worried that subsequent patches would >> be applied in the wrong place which led to this series to correct the >> offsets of hunks following those that are skipped or edited. >> >> Phillip Wood (4): >> add -i: add function to format hunk header >> t3701: add failing test for pathological context lines >> add -p: Adjust offsets of subsequent hunks when one is skipped >> add -p: calculate offset delta for edited patches >> >> git-add--interactive.perl | 93 >> +++--- >> t/t3701-add-interactive.sh | 30 +++ >> 2 files changed, 102 insertions(+), 21 deletions(-) > > This looks reasonably sane to me. I really like that you managed to > produce failing tests for this situation. I know pathological cases > like this have bit GCC in the past, so it's good that you fixed this. > Thanks Brian, it's interesting to hear that GCC has been bitten in the past. Best Wishes Phillip
Re: [PATCH 0/4] Correct offsets of hunks when one is skipped
On Tue, Feb 13, 2018 at 10:44:04AM +, Phillip Wood wrote: > From: Phillip Wood > > While working on a patch series to stage selected lines from a hunk > without having to edit it I got worried that subsequent patches would > be applied in the wrong place which led to this series to correct the > offsets of hunks following those that are skipped or edited. > > Phillip Wood (4): > add -i: add function to format hunk header > t3701: add failing test for pathological context lines > add -p: Adjust offsets of subsequent hunks when one is skipped > add -p: calculate offset delta for edited patches > > git-add--interactive.perl | 93 > +++--- > t/t3701-add-interactive.sh | 30 +++ > 2 files changed, 102 insertions(+), 21 deletions(-) This looks reasonably sane to me. I really like that you managed to produce failing tests for this situation. I know pathological cases like this have bit GCC in the past, so it's good that you fixed this. -- brian m. carlson / brian with sandals: Houston, Texas, US https://www.crustytoothpaste.net/~bmc | My opinion only OpenPGP: https://keybase.io/bk2204
Re: [PATCH 0/4] Speed up git tag --contains
> This is a resubmission of Jeff King's patch series to speed up git tag > --contains with some changes. It's been cooking for a while as: Replying to this 6-year-old thread: Is there any chance this could be resurrected? We are using phabricator, which uses `git branch --contains` as part of its workflow. Our repo has ~1000 branches on it, and the contains operation is eating up all our CPU (and time). It would be very helpful to us to make this faster! (The original thread is at https://public-inbox.org/git/e1ou82h-0001xy...@closure.thunk.org/ ) craig
Re: [PATCH 0/4] "Fast-"track Git GUI changes
Hi, On Tue, 9 Jan 2018, Junio C Hamano wrote: > Junio C Hamano writes: > > > Johannes Schindelin writes: > > > >> As it seems to be impossible to get the attention of the Git GUI > >> maintainer these "days" (I have opened Pull Requests on October 16th > >> 2016 that have not even been looked at), let's just side-step that > >> contribution path which seems to be dormant. > > > > Good to see that finally somebody else steps up after I did the same > > for a few times recently. > > > >> These fixes have been in Git for Windows for various amounts of time. > >> > >> Note: there are more Git GUI changes in Git for Windows, I only > >> accumulated the ones I deem worth considering for inclusion into v2.16.0, > >> still. > > > > Thanks. I am not sure if it is too late for 2.16, as these are not > > fixes for regression during this cycle, though. > > Heh, I changed my mind. > > Just like I pretended to be interim maintainer of Git-GUI for the > past two times, you are doing the same this time and I even agreed > that it was a good thing that you volunteered to pretend as one. > > So let's follow through the pretence to its conclusion and merge > these directly to 'master'. Thank you, fellow pretender. Ciao, Dscho
Re: [PATCH 0/4] "Fast-"track Git GUI changes
Junio C Hamano writes: > Johannes Schindelin writes: > >> As it seems to be impossible to get the attention of the Git GUI >> maintainer these "days" (I have opened Pull Requests on October 16th >> 2016 that have not even been looked at), let's just side-step that >> contribution path which seems to be dormant. > > Good to see that finally somebody else steps up after I did the same > for a few times recently. > >> These fixes have been in Git for Windows for various amounts of time. >> >> Note: there are more Git GUI changes in Git for Windows, I only >> accumulated the ones I deem worth considering for inclusion into v2.16.0, >> still. > > Thanks. I am not sure if it is too late for 2.16, as these are not > fixes for regression during this cycle, though. Heh, I changed my mind. Just like I pretended to be interim maintainer of Git-GUI for the past two times, you are doing the same this time and I even agreed that it was a good thing that you volunteered to pretend as one. So let's follow through the pretence to its conclusion and merge these directly to 'master'. Thanks.
Re: [PATCH 0/4] "Fast-"track Git GUI changes
Johannes Schindelin writes: > As it seems to be impossible to get the attention of the Git GUI > maintainer these "days" (I have opened Pull Requests on October 16th > 2016 that have not even been looked at), let's just side-step that > contribution path which seems to be dormant. Good to see that finally somebody else steps up after I did the same for a few times recently. > These fixes have been in Git for Windows for various amounts of time. > > Note: there are more Git GUI changes in Git for Windows, I only > accumulated the ones I deem worth considering for inclusion into v2.16.0, > still. Thanks. I am not sure if it is too late for 2.16, as these are not fixes for regression during this cycle, though.
Re: [PATCH 0/4] Add --no-ahead-behind to status
On Wed, Dec 20, 2017 at 02:42:41PM +, Jeff Hostetler wrote: > From: Jeff Hostetler > > This patch series adds a "--no-ahead-behind" option to status > to request that it avoid a possibly expensive ahead/behind > computation for the current branch. Instead, it just prints a > "not up to date" message in place of the detailed counts. > > This idea was previously discussed in [1]. Working with the > enormous Windows repository, we found that 20+ seconds was being > spent in the ahead/behind computation when the current branch was > 150K commits behind the upstream branch. (Yes, this happens and > only took 3 weeks on the reporter's system.) Overall this looks cleanly done. I raised a few minor points in the individual patches, but certainly nothing that would be a show-stopper for the general idea. > I've only modified "git status" in this patch series. A similar > change could be added to "git branch -vv" and "git checkout" to > avoid delays there too. I avoided doing it here to keep this > patch series focused. I have a feeling that the same user who got annoyed by "git status" is going to get annoyed by "git checkout" sooner or later. I'm OK with handling the other commands separately, but I suspect we may want at least "git checkout" support in the near future. There is one thing that it's worth thinking about (because it will be hard to take back later): config option naming. If your repository is so large that ahead/behind checks are annoying, would you want to be able to set a single config to tell Git that, rather than one for each command? If so, then do we want to ditch "status.aheadbehind" in favor of a more unified name? -Peff
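[Editor's illustration] The ahead/behind counts being discussed are what `git rev-list --left-right --count branch...upstream` reports: the number of commits reachable from only one side of the symmetric difference. A toy sketch of that computation (hypothetical Python with an invented `parents` map, not git's optimized C code) shows why it gets expensive: both sides' ancestor sets have to be walked and compared, which is where the 20+ seconds go when one side is 150K commits behind.

```python
def ahead_behind(local, upstream, parents):
    """Count commits only on `local` and only on `upstream`,
    like `git rev-list --left-right --count local...upstream`.
    `parents` maps a commit id to its list of parent ids."""
    def ancestors(tip):
        # Plain reachability walk: every commit behind `tip`.
        seen, stack = set(), [tip]
        while stack:
            commit = stack.pop()
            if commit in seen:
                continue
            seen.add(commit)
            stack.extend(parents.get(commit, []))
        return seen

    ours, theirs = ancestors(local), ancestors(upstream)
    return len(ours - theirs), len(theirs - ours)

# local is 2 commits ahead of and 1 commit behind its upstream.
parents = {
    "local2": ["local1"], "local1": ["base"],
    "up1": ["base"], "base": [],
}
print(ahead_behind("local2", "up1", parents))  # (2, 1)
```

Real git prunes the walk at common ancestors instead of collecting full ancestor sets, but when the branches have diverged by 150K commits there is still a lot of graph to visit, which is exactly what --no-ahead-behind sidesteps.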
Re: [PATCH 0/4] Fix issues with rename detection limits
On Fri, Nov 10, 2017 at 10:13 AM, Elijah Newren wrote: > In a repo with around 60k files with deep directory hierarchies (due to > Elijah Newren (4): > sequencer: Warn when internal merge may be suboptimal due to > renameLimit > Remove silent clamp of renameLimit > progress: Fix progress meters when dealing with lots of work > sequencer: Show rename progress during cherry picks Sorry for the duplicate send. I noticed the cover letter didn't appear on the mailing list, double-checked https://public-inbox.org/git/, waited over half an hour and double checked both email and public-inbox again and still didn't see it, so I re-sent. Just as soon as I did, it seems that the original and the new cover letter emails suddenly showed up right then. *shrug*.
Re: [PATCH 0/4] WIP: git-remote-media wiki namespace support
On 2017-10-30 11:40:06, Matthieu Moy wrote: > Antoine Beaupré writes: > >> Obviously, doing unit tests against a full MediaWiki instance isn't >> exactly trivial. > > Not trivial, but doable: there is all the infrastructure to do so in t/: > install-wiki.sh to automatically install Mediawiki, and then a testsuite > that interacts with it. > > This has been written under the assumption that the developer had a > lighttpd instance running on localhost, but this can probably be adapted > to run on Travis-CI (install lighttpd & Mediawiki in the install: part, > and run the tests afterwards), so that anyone can run the tests by just > submitting a pull-request to Git-Mediawiki. > > If you are to work more on Git-Mediawiki, don't underestimate the > usefulness of the testsuite (for example, Git-Mediawiki was developed > against a prehistoric version of Mediawiki, the testsuite can help > ensuring it still works on the latest version), nor the fun of playing > with install scripts and CI systems ;-). Hello! Glad to hear from you. :) So I actually tried install-wiki.sh, and it "failed to start lighttpd" and told me to see logs. I couldn't find them and stopped there... It would be great to hook this up into CI somewhere, but I suspect it isn't, considering how it doesn't actually work out of the box. I'm hoping we can still do things and fix some things without going through that trouble, but I recognize it would be better to have unit tests operational. Honestly, I would prefer just having this thing work and not have to work on it. :) I have lots of things on my plate and I'm just scratching an itch on this one - some backup script broke and I am trying to fix it. Once it works, my work is done, so unfortunately I cannot lead that project (but I'd be happy to help when I can of course). A. -- The greatest tragedy in mankind's entire history may be the hijacking of morality by religion. - Arthur C. Clarke
Re: [PATCH 0/4] WIP: git-remote-media wiki namespace support
Antoine Beaupré writes: > Obviously, doing unit tests against a full MediaWiki instance isn't > exactly trivial. Not trivial, but doable: there is all the infrastructure to do so in t/: install-wiki.sh to automatically install Mediawiki, and then a testsuite that interacts with it. This has been written under the assumption that the developer had a lighttpd instance running on localhost, but this can probably be adapted to run on Travis-CI (install lighttpd & Mediawiki in the install: part, and run the tests afterwards), so that anyone can run the tests by just submitting a pull-request to Git-Mediawiki. If you are to work more on Git-Mediawiki, don't underestimate the usefulness of the testsuite (for example, Git-Mediawiki was developed against a prehistoric version of Mediawiki, the testsuite can help ensuring it still works on the latest version), nor the fun of playing with install scripts and CI systems ;-). Cheers, -- Matthieu Moy https://matthieu-moy.fr/
Re: [PATCH 0/4] (x)diff cleanup: remove duplicate code
On Tue, Oct 24, 2017 at 11:59 AM, Stefan Beller wrote: > Junio wrote: > >> * I was hoping that the next_byte() and string_hash() thing, once >>they are cleaned up, will eventually be shared with the xdiff/ >>code at the lower layer, which needs to do pretty much the same >>in order to implement various whitespace ignoring options. I am >>not sure how well the approach taken by the WIP patch meshes with >>the needs of the lower layer. > > This series does exactly this; although I chose to reuse the code in > xdiff/xutils.c instead of the new fancy next_byte/string_hash, as that > code has seen more exercise already (hence I assume it has fewer bugs > if any as well as its performance implications are well understood). > > However to do so, we need to pollute xdiff/xutils.c and include > hashmap.h there (which also requires cache.h as that header has > an inline function using BUG()), which I find a bit unfortunate but > worth the trade off of using better tested code. > This, of course, applies on top of jk/diff-color-moved-fix.
Re: [PATCH 0/4] fsmonitor fixes
Hi Alex, On Thu, 19 Oct 2017, Alex Vandiver wrote: > A few fixes found from playing around with the fsmonitor branch in > next. Thank you for having a look, and even better: for sending a couple of improvements on top of Ben's & my work. I sent a couple of suggestions and hope you will find them useful. Ciao, Johannes
Re: [PATCH 0/4] pre-merge hook
Michael J Gruber writes: > This series revives an old suggestion of mine to make merge honor the > pre-commit hook or a separate pre-merge hook This seems to have become an abandoned loose end, so I'll drop the topic from my tree for now; revival of the discussion is _not_ unwelcome (aka "dropping without prejudice").
Re: [PATCH 0/4] pre-merge hook
On Sun, Oct 1, 2017 at 10:54 PM, Junio C Hamano wrote: > Michael J Gruber writes: > >> Now that f8b863598c ("builtin/merge: honor commit-msg hook for merges", >> 2017-09-07) has landed, merge is getting closer to behaving like commit, >> which is important because both are used to complete merges (automatic >> vs. non-automatic). > > Just a gentle ping to the thread to see if it is still alive. > > I think people agree that this is a good thing to do, but it seems > that the execution leaves a few loose ends, judging from the review > comments that have yet to be answered. > > Thanks. I agree. IIRC the series was ok in design and goal, just minor comments for coding. Thanks, Stefan
Re: [PATCH 0/4] pre-merge hook
Michael J Gruber writes: > Now that f8b863598c ("builtin/merge: honor commit-msg hook for merges", > 2017-09-07) has landed, merge is getting closer to behaving like commit, > which is important because both are used to complete merges (automatic > vs. non-automatic). Just a gentle ping to the thread to see if it is still alive. I think people agree that this is a good thing to do, but it seems that the execution leaves a few loose ends, judging from the review comments that have yet to be answered. Thanks.
Re: [PATCH 0/4] Fixes from the per-repository object store series
Jonathan Nieder writes: > This is a continuation of the series at [1]. That was part 1 --- > you can think of this as part 0, since it contains the simplest and > least controversial part of the series --- each patch in this series > is a bugfix in its own right. > > Patch 1 should be familiar if you reviewed the series [1]. It is > unchanged from the patch there, except to note Peff's ack. > > Patches 2-4 have not yet visited the list but are fixes (or, in the > case of patch 4, cleanups) noticed as part of the same process of > teaching object store code to handle multiple repositories. > > We hope that splitting these out should make them easier to review > and for people to benefit from these fixes without having to wait > for the rest of the series to settle. One thing that is unclear is if you are tentatively withdrawing the longer series with this update, but I'd assume that it is the case because these two obviously will conflict with each other, and most of what these four patches do is a subset of what the larger one you sent earlier does. Thanks.
Re: [PATCH 0/4] Test name-rev with small stack
On Fri, Sep 08, 2017 at 02:33:35PM +0200, Michael J Gruber wrote: > Looking at it more closely, the solution in cbc60b6720 ("git tag > --contains: avoid stack overflow", 2014-04-24) seems to be a bit "ad > hoc" to me: > > First of all, there is more than "tag --contains" that may exceed the > stack by recursion. So I would expect the solution to be more general, > and not localised and specialised to builtin/tag.c At the time, tag was the only one using this depth-first contains algorithm. It's since been adapted to ref-filter.c, but of course the stack handling went with it. Most traversals have a date-sorted queue, so are effectively doing a breadth-first iteration with no recursion. > Second, this is a walk, so I'm wondering whether our revision walk > machinery should be the place to add missing functionality (if any). > That way, everything would benefit from possible or existing > improvements there. For example, I think some of our "extra walkers" > don't heed object replacements. (I need to test more.) It's possible that name-rev could make better use of the existing traversal machinery. It's often tough to do so, though, because that machinery gives you a linearized ordering of the commits. Whereas something like name-rev really cares about the order that it visits the commits, because it's building up the names. It's the same for this "tag --contains" traversal. It _used_ to be a series of merge-base computations. But by doing a custom traversal, we can cache incremental results through the graph and avoid walking over the same bits multiple times. There actually is a way to do it with the regular breadth-first traversal, but you have to store one bit per ref you're checking for each commit. I played around with that a bit a while ago, and it did seem to work. I can dig up the patches if you're interested. But one downside is that one bit per ref per commit adds up if you have a lot of refs. 
A large number of those bitfields will be the same, so you could probably get by with a copy-on-write scheme, but I never implemented that. Of course somebody may have a more clever algorithm, too. I don't claim the above is a proof. ;) -Peff
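[Editor's illustration] The one-bit-per-ref scheme Peff describes can be sketched as follows: assign each ref a bit, walk commits in an order that visits children before their parents (standing in for git's date-ordered breadth-first queue), OR each commit's bits into its parents, and finally read off the target commit's bitfield to learn which refs contain it. This is a hypothetical Python sketch, not the patches Peff refers to; `refs_containing` and its arguments are invented names, and a real implementation would compute the traversal order itself and use compact per-commit bitfields.

```python
def refs_containing(target, tips, parents, order):
    """Return the names from `tips` (list of (name, commit) pairs)
    whose history contains `target`.

    `parents` maps commit id -> parent ids; `order` lists commits so
    that every child appears before its parents.  One sweep propagates
    each ref's bit down the graph, answering the question for all refs
    at once instead of running one walk per ref.
    """
    bits = {}
    for i, (_name, tip) in enumerate(tips):
        bits[tip] = bits.get(tip, 0) | (1 << i)
    for commit in order:                      # children before parents
        for parent in parents.get(commit, []):
            bits[parent] = bits.get(parent, 0) | bits.get(commit, 0)
    mask = bits.get(target, 0)
    return [name for i, (name, _tip) in enumerate(tips) if mask & (1 << i)]

# Three branches; 'm' is reachable from branch1 and branch2 only.
parents = {"b1": ["m"], "b2": ["m"], "m": ["root"],
           "old": ["root"], "root": []}
order = ["b1", "b2", "old", "m", "root"]
tips = [("branch1", "b1"), ("branch2", "b2"), ("oldbranch", "old")]
print(refs_containing("m", tips, parents, order))  # ['branch1', 'branch2']
```

The memory concern in the mail is visible here: `bits` holds one bitfield per visited commit, and with thousands of refs each bitfield grows proportionally, which is what the copy-on-write idea would mitigate.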
Re: [PATCH 0/4] Test name-rev with small stack
Jeff King venit, vidit, dixit 07.09.2017 16:54: > On Thu, Sep 07, 2017 at 04:02:19PM +0200, Michael J Gruber wrote: > >> name-rev segfaults for me in emacs.git with the typical 8192 stack size. >> The reason is the recursive walk that name-rev uses. >> >> This series adds a test to mark this as known failure, after some >> clean-ups. > > These all look reasonable to me. The size of the test case in the final > one is presumably arbitrary and just copied from t7004. I don't know if > it's worth trying to shrink it. It could shorten a rather expensive > test. OTOH, if we shorten it too much then we might get a false pass > (e.g., if the algorithm remains recursive but has a smaller stack > footprint). > >> Michael J Gruber (4): >> t7004: move limited stack prereq to test-lib >> t6120: test name-rev --all and --stdin >> t6120: clean up state after breaking repo >> t6120: test describe and name-rev with deep repos > > Now comes the hard part: rewriting the C code. :) Looking at it more closely, the solution in cbc60b6720 ("git tag --contains: avoid stack overflow", 2014-04-24) seems to be a bit "ad hoc" to me: First of all, there is more than "tag --contains" that may exceed the stack by recursion. So I would expect the solution to be more general, and not localised and specialised to builtin/tag.c Second, this is a walk, so I'm wondering whether our revision walk machinery should be the place to add missing functionality (if any). That way, everything would benefit from possible or existing improvements there. For example, I think some of our "extra walkers" don't heed object replacements. (I need to test more.) Michael
Re: [PATCH 0/4] Test name-rev with small stack
On Thu, Sep 07, 2017 at 04:02:19PM +0200, Michael J Gruber wrote: > name-rev segfaults for me in emacs.git with the typical 8192 stack size. > The reason is the recursive walk that name-rev uses. > > This series adds a test to mark this as known failure, after some > clean-ups. These all look reasonable to me. The size of the test case in the final one is presumably arbitrary and just copied from t7004. I don't know if it's worth trying to shrink it. It could shorten a rather expensive test. OTOH, if we shorten it too much then we might get a false pass (e.g., if the algorithm remains recursive but has a smaller stack footprint). > Michael J Gruber (4): > t7004: move limited stack prereq to test-lib > t6120: test name-rev --all and --stdin > t6120: clean up state after breaking repo > t6120: test describe and name-rev with deep repos Now comes the hard part: rewriting the C code. :) -Peff
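[Editor's illustration] The recursion being discussed is essentially a depth-first walk that names each commit relative to a tip, with one call frame per commit, so the call depth equals the history depth and deep repos like emacs.git blow the stack. Replacing the call stack with an explicit worklist removes that limit. This is a simplified hypothetical Python sketch with invented names; unlike real name-rev it makes no effort to pick the shortest or best name.

```python
def name_commits(tip, parents):
    """Name every commit reachable from `tip` as tip~<depth>.

    An explicit worklist replaces recursion, so the call depth stays
    constant no matter how deep the history is -- which is exactly the
    rewrite needed to avoid the stack overflow.
    """
    names = {}
    todo = [(tip, 0)]
    while todo:
        commit, depth = todo.pop()
        if commit in names:          # first name assigned wins
            continue
        names[commit] = "tip" if depth == 0 else "tip~%d" % depth
        for parent in parents.get(commit, []):
            todo.append((parent, depth + 1))
    return names

# A linear history 100,000 commits deep: far past the default Python
# recursion limit (typically 1000), but harmless for the worklist
# version, just as an iterative C rewrite would not grow the C stack.
depth = 100000
parents = {i: [i - 1] for i in range(1, depth)}
names = name_commits(depth - 1, parents)
print(names[0])  # tip~99999
```

The cbc60b6720 fix mentioned in the thread took a similar direction for "tag --contains": keep the depth-first logic but manage the pending work on the heap instead of the C call stack.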
Re: [PATCH 0/4] vcs-svn: remove repo_tree library
On Tue, Aug 22, 2017 at 4:37 PM, Jonathan Nieder wrote: > Hi, > > Stefan noticed that repo_init from vcs-svn/repo_tree.h conflicts with > repository.h[1]. Earlier brian m. carlson noticed the same thing[2]. > > Originally repo_tree.h was used to manage an in-memory representation > of the state of the svn tree being imported. When that in-memory > representation was retired, we were lazy and left some utility > functions there. Here is a patch series to finish cleaning up and > remove vcs-svn/repo_tree.h completely. > > This is an alternative to bc/vcs-svn-cleanup from 'next'. Those > patches weren't cc-ed to me and I missed them --- sorry about that. I > can rebase on top of them if that is more convenient. A rebased version would be easier, IIUC Junio's reply to the one-off that I sent. The patches look good. Thanks, Stefan > > Thoughts of all kinds welcome, as always. > > Thanks, > Jonathan > > Jonathan Nieder (4): > vcs-svn: remove prototypes for missing functions > vcs-svn: remove custom mode constants > vcs-svn: remove repo_delete wrapper function > vcs-svn: move remaining repo_tree functions to fast_export.h > > Makefile | 1 - > vcs-svn/fast_export.c | 41 + > vcs-svn/fast_export.h | 3 +++ > vcs-svn/repo_tree.c | 48 > vcs-svn/repo_tree.h | 23 --- > vcs-svn/svndump.c | 33 - > 6 files changed, 56 insertions(+), 93 deletions(-) > delete mode 100644 vcs-svn/repo_tree.c > delete mode 100644 vcs-svn/repo_tree.h > > [1] https://public-inbox.org/git/20170822213501.5928-1-sbel...@google.com > [2] > https://public-inbox.org/git/2017082122.26729-3-sand...@crustytoothpaste.net
Re: [PATCH 0/4] dropping support for older curl
Hi Paolo, On Thu, 10 Aug 2017, Paolo Ciarrocchi wrote: > On 10 Aug 2017, 11:39 AM, "Johannes Schindelin" wrote: > > > > Footnote *1*: It is no secret that I find our patch submission less than > inviting. Granted, *I* use it. *I* did not have problems entering the > mailing list. But then, my mails were not swallowed silently, because my > mail program does not send HTML by default. And prepared by the German > school system (I learned the term "sugar coating" only when exposed to > some US culture), I had little emotional problems with being criticized > and not thanked for my contribution, I persisted nevertheless. The opinion > that the Git contribution process is a lot less inviting than it could be > is not only my view, by the way. I hear this a lot. I give you that we are > not quite as textbook "keep out from here unless you look like us, smell > like us, talk like us, have the same genital setup like us" as the Linux > kernel mailing list, but we are in a different universe compared to, say, > the Drupal community. And their universe is a lot nicer to live in. > > > Isn't SubmitGit a possible answer to your doubts (I strongly agree with > you) about the current development process? No. I hate to say it, but SubmitGit does not integrate well with GitHub Pull Requests (code comments on GitHub are approximately 1,523x easier to write, read, associate with the actual code, and see the current state of, compared to the mailing list, and SubmitGit does not even hint at integrating with that user experience). Also, the barrier to start using SubmitGit is rather high. If you open a Pull Request on github.com/git/git, you get *no* indication that SubmitGit is an option to *actually* get the code into Git. There are also concerns about required permissions that Junio Hamano himself would not accept. Now, let's assume that you submitted the code via SubmitGit. The challenges of the patch submission process do not end there, yet SubmitGit goes home and has a beer. 
But the hard part, the discussions on the mailing list, the status updates in the completely separate What's cooking mails, the missing links back to the original source code (let alone the information in which worktree on your computer under which branch name that topic was developed again?), the diverging mail threads, the "rerolls" that should not forget to Cc: all reviewers of previous rounds, all that jazz is still all very, very manual. And even if there was an easier path from having a local branch that works to finally getting it onto the list in the required form, your mail is an eloquent example of one of the most preposterous hurdles along the way: we pride ourselves with the openness we demonstrate by communicating via a mailing list, everybody has a mail address, amirite? But of course, HTML mails, like, about 130% of all mails on the internet (or so it feels), including yours, are dropped. Silently. Not so open anymore, are we. It is all so frustrating, really. I work in a team (Visual Studio Team Services, you can think of it as kind of a Microsoft take on Git hosting plus a lot of other tooling) where we get really, really positive feedback regarding our user experience, in particular the frequent enhancements to PRs. It is really powerful to open, review and merge PRs, interact with other developers, see the state of PRs and their respective discussions, open and resolve issues, automate workflows and build tasks [*1*]. And then I try to convince people here on this mailing list that it really makes a difference if you start using tools to lighten the load of everybody, and... little changes. At least thanks to Lars Schneider's incredible efforts we have Continuous Testing (I know how much time he spent on this, it is another thing deserving the label "preposterous", and I know how much time I spent on top to add the Windows part which mostly works). If only we could push it further to true Continuous Integration. 
Or at least to accepting PRs from professionals who simply have no time to fight with our patch contribution process and whose expertise we lose as a consequence. Ciao, Johannes Footnote *1*: The funniest part about this is that I do get mails about all of this all the time. When I am pulled in as a reviewer. When a build failed. When a previously failing task was fixed by a new build. When somebody responded to my comments. The difference to the mailing list-centric approach is of course that those mails are only notifications, and link back to the tool appropriately supporting what I, the user, want to get done.
Re: [PATCH 0/4] dropping support for older curl
On Thu, Aug 10, 2017 at 07:09:02PM -0400, Jeff King wrote: > The first is "should we eventually drop support for antiquated versions > of dependencies?". And the argument in favor is the one I was making > here: besides lowering maintenance cost, it is more honest to our users > about what to expect[1]. As usual, I forgot all my footnotes. [1] When I've talked about keeping expectations reasonable, I'm a lot less interested in "oops, I built Git and this particular feature didn't work". It's easy to write that off as "well, you have an old version of curl, patches welcome". I'm much more concerned about security issues. Curl is network-facing. Leaving aside security vulnerabilities in curl itself (which hopefully distros with 10-year support periods would fix), I wouldn't be surprised if there are bad interactions possible due to our tangle of ifdefs. One way to address that would be more careful auditing. But then that goes back to the cost/benefit thing. > One is to do it by date and what dependencies are in long-term OS > releases, and then compare that to the benefit. Requiring curl 7.11.1 > still keeps us working back to rhel4, which was already end-of-lifed > completely after a 12 year run. Bumping to 7.16.0 drops rhel4 and rhel5, > the latter of which is in its final "barely supported" phase after 10 > years. But it gives us a bit more bang for our buck by making CURL_MULTI > unconditional[2]. Requiring 7.19.4 actually doesn't drop any more rhel > releases. So by that metric, we might as well go there. [2] The line-count change from dropping CURL_MULTI isn't _too_ exciting. But a lot of the tangled design of our http code revolves around the abstractions we've introduced. I have a feeling that it will enable further cleanups as we move forward (OTOH, a lot of the worst parts of our design are because of _using_ curl_multi for dumb http, which of course hardly anyone does these days. 
But I have a feeling if I suggested removing that, people would really scream). -Peff
Re: [PATCH 0/4] dropping support for older curl
On 11/08/17 01:23, Jeff King wrote:
> On Fri, Aug 11, 2017 at 01:17:51AM +0200, Tom G. Christensen wrote:
> > > OK, thanks for double-checking. I'm still puzzled why your build
> > > succeeds and mine does not.
> >
> > I know what's going on now and it's so simple.
> > Red Hat's version of curl 7.15.5 includes a number of patches
> > including one that backports support for CURLPROTO_* (as part of a
> > fix for CVE-2009-0037).
> > I haven't checked el6 but I would not be surprised if there were
> > similar things going on there.
>
> el6 should have it already as part of 7.19.7, right?

Yes of course.

> > So in conclusion version based #ifdefs are misleading when used with
> > curl as shipped with RHEL.
>
> Yeah, that's certainly an interesting finding. In this case your builds
> are missing out on redirect protection that we _could_ be providing.

Yes and I'm looking into that right now.

> If we do keep the compat ifdefs around this feature, it may be worth
> converting them to "#ifdef CURLPROTO_HTTP" to more directly check the
> feature.

Yes, a feature test would be better.

-tgc
Re: [PATCH 0/4] dropping support for older curl
On Fri, Aug 11, 2017 at 01:17:51AM +0200, Tom G. Christensen wrote:
> > OK, thanks for double-checking. I'm still puzzled why your build
> > succeeds and mine does not.
>
> I know what's going on now and it's so simple.
> Red Hat's version of curl 7.15.5 includes a number of patches including
> one that backports support for CURLPROTO_* (as part of a fix for
> CVE-2009-0037).
> I haven't checked el6 but I would not be surprised if there were
> similar things going on there.

el6 should have it already as part of 7.19.7, right?

> So in conclusion version based #ifdefs are misleading when used with
> curl as shipped with RHEL.

Yeah, that's certainly an interesting finding. In this case your builds
are missing out on redirect protection that we _could_ be providing.

If we do keep the compat ifdefs around this feature, it may be worth
converting them to "#ifdef CURLPROTO_HTTP" to more directly check the
feature.

-Peff
Re: [PATCH 0/4] dropping support for older curl
On 11/08/17 00:54, Jeff King wrote:
> On Fri, Aug 11, 2017 at 12:23:42AM +0200, Tom G. Christensen wrote:
>
> Er, sorry if I'm being dense, but how? Are you suggesting that by
> removing the callsite of get_curl_allowed_protocols(), the compiler
> might elide the now-dead code completely?
>
> I could certainly see it being dropped after the compilation, but I'm
> surprised that it wouldn't complain about the undeclared identifiers
> in the first place.

You're right, that should not be able to handle it.

> > > Can you please double-check that you're building against the
> > > correct version of curl, and that you are building the HTTP parts
> > > of Git (which _are_ optional, and the test suite will pass without
> > > them).
> >
> > I use a mock buildroot and there is no other curl than the vendor
> > supplied 7.15.5 installed:
> > [...]
>
> OK, thanks for double-checking. I'm still puzzled why your build
> succeeds and mine does not.

I know what's going on now and it's so simple.
Red Hat's version of curl 7.15.5 includes a number of patches including
one that backports support for CURLPROTO_* (as part of a fix for
CVE-2009-0037).
I haven't checked el6 but I would not be surprised if there were similar
things going on there.

So in conclusion version based #ifdefs are misleading when used with
curl as shipped with RHEL.

-tgc
Re: [PATCH 0/4] dropping support for older curl
On Thu, Aug 10, 2017 at 03:17:06PM -0700, Junio C Hamano wrote:
> Jeff King writes:
>
> > On Thu, Aug 10, 2017 at 11:36:41AM +0200, Johannes Schindelin wrote:
> >
> >> Hopefully I had better luck expressing my concerns this time?
> >
> > I understand your argument much better now. I'm still not sure I agree.
> >
> > -Peff
>
> I do not think "there are a dozen #ifdefs and I don't know whether
> they still work. I don't know whether anybody (who most likely has
> better things to do than read the Git mailing list) is still using
> those. So let's just remove them." was why you were suggesting to
> clean up the (apparent) support of older curl in the code, though.
>
> Isn't the reason why your series simplifies these #ifdefs away
> because we by accident started using some features that require a
> version that is even newer than any of these #ifdef's try to cater
> to and yet nobody complained? That is a lot more similar to the
> removal of rsync transport that happened in a not so distant past,
> where the reason for removal was "We have been shipping code that
> couldn't have possibly worked for some time and nobody complained
> ---we know nobody is depending on it."

I think there are two questions to be asked, and their answers come from
different lines of reasoning.

The first is "should we eventually drop support for antiquated versions
of dependencies?". And the argument in favor is the one I was making
here: besides lowering maintenance cost, it is more honest to our users
about what to expect[1].

The second is "how far back should we keep support?". And there are two
lines of thought there.

One is to do it by date and what dependencies are in long-term OS
releases, and then compare that to the benefit. Requiring curl 7.11.1
still keeps us working back to rhel4, which was already end-of-lifed
completely after a 12 year run. Bumping to 7.16.0 drops rhel4 and rhel5,
the latter of which is in its final "barely supported" phase after 10
years. But it gives us a bit more bang for our buck by making CURL_MULTI
unconditional[2]. Requiring 7.19.4 actually doesn't drop any more rhel
releases. So by that metric, we might as well go there.

And the second line of thought is: it was already broken and nobody
reported it or offered up a fix. And that's where the "similar to rsync"
thing comes in. Though in this case we do have some evidence that people
(at least Tom) were patching and distributing behind the scenes. And our
breakage period was much shorter (since v2.12.0, but that's only months
or so).

-Peff
Re: [PATCH 0/4] dropping support for older curl
On Fri, Aug 11, 2017 at 12:23:42AM +0200, Tom G. Christensen wrote:
> > > I just built a pristine 2.14.0 on CentOS 5 with curl 7.15.5. No
> > > problems at all neither with building nor with running the testsuite.
> >
> > As you can see, this does not compile for me. What's going on?
>
> The call site for get_curl_allowed_protocols() in http.c is still
> protected by an #if:
> #if LIBCURL_VERSION_NUM >= 0x071304
> 	curl_easy_setopt(result, CURLOPT_REDIR_PROTOCOLS,
> 			 get_curl_allowed_protocols(0));
> 	curl_easy_setopt(result, CURLOPT_PROTOCOLS,
> 			 get_curl_allowed_protocols(-1));
> #else
> 	warning("protocol restrictions not applied to curl redirects because\n"
> 		"your curl version is too old (>= 7.19.4)");
> #endif
>
> > I don't see how it could work, as CURLPROTO_HTTP is not defined at all
> > in that version of curl.
>
> Indeed but the #if will handle that.

Er, sorry if I'm being dense, but how? Are you suggesting that by
removing the callsite of get_curl_allowed_protocols(), the compiler
might elide the now-dead code completely?

I could certainly see it being dropped after the compilation, but I'm
surprised that it wouldn't complain about the undeclared identifiers in
the first place. And if that _is_ what is happening... that seems like a
very fragile and unportable thing to be depending on.

> > Can you please double-check that you're building against the correct
> > version of curl, and that you are building the HTTP parts of Git
> > (which _are_ optional, and the test suite will pass without them).
>
> I use a mock buildroot and there is no other curl than the vendor
> supplied 7.15.5 installed:
> [...]

OK, thanks for double-checking. I'm still puzzled why your build
succeeds and mine does not.

> > I saw that, too. But as I understand it, they provide no code updates:
> > no bugfixes and no security updates. They just promise to answer the
> > phone and help you with troubleshooting. It's possible my perception
> > is wrong, though; I'm certainly not one of their customers.
>
> I am referring to the Extended Life-cycle Support product (ELS), which
> promises:
> "the ELS Add-On delivers certain critical-impact security fixes and
> selected urgent priority bug fixes and troubleshooting for the last
> minor release"
>
> The full description is here:
> https://access.redhat.com/support/policy/updates/errata#Extended_Life_Cycle_Phase

That was the same page I was looking at. The bit I read was:

  For versions of products in the Extended Life Phase, Red Hat will
  provide limited ongoing technical support. No bug fixes, security
  fixes, hardware enablement or root-cause analysis will be available
  during this phase, and support will be provided on existing
  installations only.

But I missed the bit about the "ELS add-on" below there, which I guess
is an extra thing. I do suspect that "install arbitrary new versions of
Git" is outside of their scope of "urgent priority bug fixes".

But in a sense it doesn't really matter. What is much more interesting
is whether there's a significant population that is running RHEL5 and
has a strong need for newer versions of Git. That I'm not sure about.

-Peff
Re: [PATCH 0/4] dropping support for older curl
On 10/08/17 23:32, Jeff King wrote:
> On Thu, Aug 10, 2017 at 10:33:18PM +0200, Tom G. Christensen wrote:
> > > You've totally ignored the argument I made back then[1], and which I
> > > reiterated in this thread. So I'll say it one more time: the more
> > > compelling reason is not the #ifdefs, but the fact that the older
> > > versions are totally untested.
> >
> > Perhaps you forgot but I stated in the original thread that I build
> > RPMS for RHEL/CentOS 3, 4, 5, 6 and 7. I still do and I run the
> > testsuite every single time.
>
> I didn't forget. I actually double-checked the patches you sent at the
> time, but I didn't see one for the CURLPROTO issue. And indeed, it is
> still broken for me:
>
>   $ cd /path/to/curl/repo
>   $ git checkout curl-7_15_5
>   $ ./buildconf && ./configure --prefix=/tmp/foo && make install
>   $ cd /path/to/git
>   $ git checkout v2.14.0
>   $ make CURLDIR=/tmp/foo V=1 http.o
>   gcc -o http.o -c -MF ./.depend/http.o.d -MQ http.o -MMD -MP -g -O0 -Wall -Werror -Wdeclaration-after-statement -Wpointer-arith -Wstrict-prototypes -Wvla -Wold-style-declaration -Wold-style-definition -Wno-error -Wno-cpp -Wno-unused-value -Wno-strict-prototypes -I. -DUSE_LIBPCRE1 -DHAVE_ALLOCA_H -I/tmp/foo/include -DUSE_CURL_FOR_IMAP_SEND -DNO_GETTEXT -DSHA1_DC -DSHA1DC_NO_STANDARD_INCLUDES -DSHA1DC_INIT_SAFE_HASH_DEFAULT=0 -DSHA1DC_CUSTOM_INCLUDE_SHA1_C="\"cache.h\"" -DSHA1DC_CUSTOM_TRAILING_INCLUDE_SHA1_C="\"sha1dc_git.c\"" -DSHA1DC_CUSTOM_TRAILING_INCLUDE_SHA1_H="\"sha1dc_git.h\"" -DSHA1DC_CUSTOM_INCLUDE_UBC_CHECK_C="\"git-compat-util.h\"" -DHAVE_PATHS_H -DHAVE_DEV_TTY -DHAVE_CLOCK_GETTIME -DHAVE_CLOCK_MONOTONIC -DHAVE_GETDELIM -DFREAD_READS_DIRECTORIES -DNO_STRLCPY -DSHELL_PATH='"/bin/sh"' -DPAGER_ENV='"LESS=FRX LV=-c"' http.c
>   http.c: In function ‘get_curl_allowed_protocols’:
>   http.c:685:24: error: ‘CURLPROTO_HTTP’ undeclared (first use in this function); did you mean ‘CURLPROXY_HTTP’?
>     allowed_protocols |= CURLPROTO_HTTP;
>                          ^~~~~~~~~~~~~~
>                          CURLPROXY_HTTP
>   [and so on]
>
> > I just built a pristine 2.14.0 on CentOS 5 with curl 7.15.5. No
> > problems at all neither with building nor with running the testsuite.
>
> As you can see, this does not compile for me. What's going on?

The call site for get_curl_allowed_protocols() in http.c is still
protected by an #if:

  #if LIBCURL_VERSION_NUM >= 0x071304
  	curl_easy_setopt(result, CURLOPT_REDIR_PROTOCOLS,
  			 get_curl_allowed_protocols(0));
  	curl_easy_setopt(result, CURLOPT_PROTOCOLS,
  			 get_curl_allowed_protocols(-1));
  #else
  	warning("protocol restrictions not applied to curl redirects because\n"
  		"your curl version is too old (>= 7.19.4)");
  #endif

> I don't see how it could work, as CURLPROTO_HTTP is not defined at all
> in that version of curl.

Indeed but the #if will handle that.

> Can you please double-check that you're building against the correct
> version of curl, and that you are building the HTTP parts of Git (which
> _are_ optional, and the test suite will pass without them).

I use a mock buildroot and there is no other curl than the vendor
supplied 7.15.5 installed:

  # pwd
  /var/lib/mock/jrpms-el5-x86_64/root
  # find . -name 'curlver.h'
  ./usr/include/curl/curlver.h
  # grep LIBCURL_VERSION_NUM ./usr/include/curl/curlver.h
     parsing and comparions by programs. The LIBCURL_VERSION_NUM define will
  #define LIBCURL_VERSION_NUM 0x070f05
  #

  [root@c5-32bit-01 ~]# rpm -q git
  git-2.14.1-1.el5.jr
  [root@c5-32bit-01 ~]# ldd /usr/libexec/git-core/git-http-fetch |grep libcurl
          libcurl.so.3 => /usr/lib/libcurl.so.3 (0x001e7000)
  [root@c5-32bit-01 ~]# rpm -qf /usr/lib/libcurl.so.3
  curl-7.15.5-17.el5_9
  [root@c5-32bit-01 ~]# git --version
  git version 2.14.1
  [root@c5-32bit-01 ~]# git clone https://github.com/tgc/tgcware-for-solaris.git
  Cloning into 'tgcware-for-solaris'...
  warning: protocol restrictions not applied to curl redirects because
  your curl version is too old (>= 7.19.4)
  remote: Counting objects: 2793, done.
  remote: Total 2793 (delta 0), reused 0 (delta 0), pack-reused 2793
  Receiving objects: 100% (2793/2793), 780.88 KiB | 639.00 KiB/s, done.
  Resolving deltas: 100% (1233/1233), done.
  [root@c5-32bit-01 ~]#

> I also won't claim any absolutes. I think we all agree this is a
> cost/benefit tradeoff. But there are a lot of options for building on a
> very old system. For instance, building without http if you don't need
> it. Or building a more recent libcurl (and even linking statically for
> simplicity).

Of course that is always an option but it does complicate things.

> I'd find arguments against the latter more compelling if recent curl
> were hard to compile on old systems. I don't know whether that's the
> case (certainly on a modern system, it's much easier to get newer
> versions of curl to compile than older ones).

I have no experience with building curl on older Linux systems. I know
that I can build it on old Solaris releases but that is not
Re: [PATCH 0/4] dropping support for older curl
Jeff King writes:
> On Thu, Aug 10, 2017 at 11:36:41AM +0200, Johannes Schindelin wrote:
>
>> Hopefully I had better luck expressing my concerns this time?
>
> I understand your argument much better now. I'm still not sure I agree.
>
> -Peff

I do not think "there are a dozen #ifdefs and I don't know whether
they still work. I don't know whether anybody (who most likely has
better things to do than read the Git mailing list) is still using
those. So let's just remove them." was why you were suggesting to
clean up the (apparent) support of older curl in the code, though.

Isn't the reason why your series simplifies these #ifdefs away
because we by accident started using some features that require a
version that is even newer than any of these #ifdef's try to cater
to and yet nobody complained? That is a lot more similar to the
removal of rsync transport that happened in a not so distant past,
where the reason for removal was "We have been shipping code that
couldn't have possibly worked for some time and nobody complained
---we know nobody is depending on it."

Or "We accidentally started shipping code with comma after the last
element of enum decl and nobody complained---everybody's compiler must
be ready" ;-)
Re: [PATCH 0/4] dropping support for older curl
On Thu, Aug 10, 2017 at 11:36:41AM +0200, Johannes Schindelin wrote:
> Hopefully I had better luck expressing my concerns this time?

I understand your argument much better now. I'm still not sure I agree.

-Peff
Re: [PATCH 0/4] dropping support for older curl
On Thu, Aug 10, 2017 at 10:33:18PM +0200, Tom G. Christensen wrote:
> > You've totally ignored the argument I made back then[1], and which I
> > reiterated in this thread. So I'll say it one more time: the more
> > compelling reason is not the #ifdefs, but the fact that the older
> > versions are totally untested.
>
> Perhaps you forgot but I stated in the original thread that I build
> RPMS for RHEL/CentOS 3, 4, 5, 6 and 7. I still do and I run the
> testsuite every single time.

I didn't forget. I actually double-checked the patches you sent at the
time, but I didn't see one for the CURLPROTO issue. And indeed, it is
still broken for me:

  $ cd /path/to/curl/repo
  $ git checkout curl-7_15_5
  $ ./buildconf && ./configure --prefix=/tmp/foo && make install
  $ cd /path/to/git
  $ git checkout v2.14.0
  $ make CURLDIR=/tmp/foo V=1 http.o
  gcc -o http.o -c -MF ./.depend/http.o.d -MQ http.o -MMD -MP -g -O0 -Wall -Werror -Wdeclaration-after-statement -Wpointer-arith -Wstrict-prototypes -Wvla -Wold-style-declaration -Wold-style-definition -Wno-error -Wno-cpp -Wno-unused-value -Wno-strict-prototypes -I. -DUSE_LIBPCRE1 -DHAVE_ALLOCA_H -I/tmp/foo/include -DUSE_CURL_FOR_IMAP_SEND -DNO_GETTEXT -DSHA1_DC -DSHA1DC_NO_STANDARD_INCLUDES -DSHA1DC_INIT_SAFE_HASH_DEFAULT=0 -DSHA1DC_CUSTOM_INCLUDE_SHA1_C="\"cache.h\"" -DSHA1DC_CUSTOM_TRAILING_INCLUDE_SHA1_C="\"sha1dc_git.c\"" -DSHA1DC_CUSTOM_TRAILING_INCLUDE_SHA1_H="\"sha1dc_git.h\"" -DSHA1DC_CUSTOM_INCLUDE_UBC_CHECK_C="\"git-compat-util.h\"" -DHAVE_PATHS_H -DHAVE_DEV_TTY -DHAVE_CLOCK_GETTIME -DHAVE_CLOCK_MONOTONIC -DHAVE_GETDELIM -DFREAD_READS_DIRECTORIES -DNO_STRLCPY -DSHELL_PATH='"/bin/sh"' -DPAGER_ENV='"LESS=FRX LV=-c"' http.c
  http.c: In function ‘get_curl_allowed_protocols’:
  http.c:685:24: error: ‘CURLPROTO_HTTP’ undeclared (first use in this function); did you mean ‘CURLPROXY_HTTP’?
    allowed_protocols |= CURLPROTO_HTTP;
                         ^~~~~~~~~~~~~~
                         CURLPROXY_HTTP
  [and so on]

> I just built a pristine 2.14.0 on CentOS 5 with curl 7.15.5. No
> problems at all neither with building nor with running the testsuite.

As you can see, this does not compile for me. What's going on?

I don't see how it could work, as CURLPROTO_HTTP is not defined at all
in that version of curl.

Can you please double-check that you're building against the correct
version of curl, and that you are building the HTTP parts of Git (which
_are_ optional, and the test suite will pass without them).

> > So IMHO this is about being honest with users about which versions we
> > _actually_ support.
>
> I have no problem with you wanting to drop support for older curl
> releases (such as 7.15.5) but don't use the argument that it doesn't
> currently build and nobody cares.

My argument isn't quite that nobody cares. It's that we do users a
disservice by shipping a version of the code that very well may have
hidden problems like security holes (for instance, we do not handle
redirects safely in old versions of curl). So if you can get it to build
it may _seem_ fine, but it's a bit of a booby-trap waiting to spring.

I also won't claim any absolutes. I think we all agree this is a
cost/benefit tradeoff. But there are a lot of options for building on a
very old system. For instance, building without http if you don't need
it. Or building a more recent libcurl (and even linking statically for
simplicity).

I'd find arguments against the latter more compelling if recent curl
were hard to compile on old systems. I don't know whether that's the
case (certainly on a modern system, it's much easier to get newer
versions of curl to compile than older ones).

> Also FWIW Red Hat continues to support RHEL 5 with the Extended
> Life-cycle Support program until 2020-11-30.

I saw that, too. But as I understand it, they provide no code updates:
no bugfixes and no security updates. They just promise to answer the
phone and help you with troubleshooting. It's possible my perception is
wrong, though; I'm certainly not one of their customers.

-Peff
Re: [PATCH 0/4] dropping support for older curl
On 09/08/17 23:47, Jeff King wrote:
> On Wed, Aug 09, 2017 at 11:42:12PM +0200, Johannes Schindelin wrote:
> > I mean, if we even go out of our way to support the completely
> > outdated and obsolete .git/branches/ for what is likely a single
> > user, it may not be the worst to keep those couple of #ifdef guards
> > to keep at least nominal support for older cURLs?
>
> You've totally ignored the argument I made back then[1], and which I
> reiterated in this thread. So I'll say it one more time: the more
> compelling reason is not the #ifdefs, but the fact that the older
> versions are totally untested.

Perhaps you forgot but I stated in the original thread that I build RPMS
for RHEL/CentOS 3, 4, 5, 6 and 7. I still do and I run the testsuite
every single time.

I currently have 2.13.3 up for el4, el5, el6 and el7. Only el4 requires
any patches, the rest will build out of the box with the vendor supplied
version of curl. The plan was to drop the el4 builds for 2.14.0 to get
rid of the patches.

> In fact, they do not even compile, and yet I have not seen any patches
> to fix that.

I just built a pristine 2.14.0 on CentOS 5 with curl 7.15.5. No problems
at all neither with building nor with running the testsuite.

> So IMHO this is about being honest with users about which versions we
> _actually_ support.

I have no problem with you wanting to drop support for older curl
releases (such as 7.15.5) but don't use the argument that it doesn't
currently build and nobody cares.

Also FWIW Red Hat continues to support RHEL 5 with the Extended
Life-cycle Support program until 2020-11-30.

-tgc