Re: [PATCH 0/4]

2018-12-07 Thread Junio C Hamano
Stefan Beller  writes:

> A couple days before the 2.19 release we had a bug report about
> broken submodules[1] and reverted[2] the commits leading up to them.
>
> The behavior of said bug fixed itself by taking a different approach[3],
> specifically by a weaker enforcement of having `core.worktree` set in a
> submodule [4].
>
> The revert [2] was overly broad as we neared the release, such that we
> wanted to rather keep the known buggy behavior of always having
> `core.worktree` set, rather than figuring out how to fix the new bug of
> having 'git submodule update' not working in old style repository setups.
>
> This series re-introduces those reverted patches, with no changes in code,
> but with drastically changed commit messages, as those focus on why it is safe
> to re-introduce them instead of explaining the desire for the change.

The above was a bit too cryptic for me to grok, so let me try
rephrasing to see if I got them all correctly.

 - the three-patch series leading to 984cd77ddb was meant to fix some
   bug, but the series itself was buggy and caused problems; we got
   rid of it

 - the problem 984cd77ddb wanted to fix was fixed differently,
   without reintroducing the problem the three-patch series
   introduced.  That fix has been with us since 4d6d6ef1fc.

 - now these three changes that were problematic in the past are
   resent without any update (other than that the series gains one
   preparatory patch to add tests).

Is that what is going on?  Obviously I am not getting "the other"
benefit we wanted to gain out of these three patches (because the
above description fails to explain what that is), other than to fix
the issue that was fixed by 4d6d6ef1fc.

Sorry for being puzzled...

> [1] https://public-inbox.org/git/2659750.rG6xLiZASK@twilight
> [2] f178c13fda (Revert "Merge branch 'sb/submodule-core-worktree'", 2018-09-07)
> [3] 4d6d6ef1fc (Merge branch 'sb/submodule-update-in-c', 2018-09-17)
> [4] 74d4731da1 (submodule--helper: replace connect-gitdir-workingtree by ensure-core-worktree, 2018-08-13)
>
> Stefan Beller (4):
>   submodule update: add regression test with old style setups
>   submodule: unset core.worktree if no working tree is present
>   submodule--helper: fix BUG message in ensure_core_worktree
>   submodule deinit: unset core.worktree
>
>  builtin/submodule--helper.c        |  4 +++-
>  submodule.c                        | 14 ++
>  submodule.h                        |  2 ++
>  t/lib-submodule-update.sh          |  5 +++--
>  t/t7400-submodule-basic.sh         |  5 +
>  t/t7412-submodule-absorbgitdirs.sh |  7 ++-
>  6 files changed, 33 insertions(+), 4 deletions(-)


Re: [PATCH 0/4] mingw: prevent external PERL5LIB from interfering

2018-10-31 Thread Johannes Schindelin
Hi Junio,

On Wed, 31 Oct 2018, Junio C Hamano wrote:

> "Johannes Schindelin via GitGitGadget" 
> writes:
> 
> > An alternative approach which was rejected at the time (because it
> > interfered with the then-ongoing work to compile Git for Windows using
> > MS Visual C++) would patch the make_environment_block() function to
> > skip the specified environment variables before handing down the
> > environment block to the spawned process. Currently it would interfere
> > with the mingw-utf-8-env patch series I sent earlier today
> > [https://public-inbox.org/git/pull.57.git.gitgitgad...@gmail.com].
> 
> So the rejected approach that was not used in this series would
> interfere with the other one, but I do not have to worry about it
> because this series does not use that approach?  I had to read the six
> lines above twice to figure out that it essentially is saying "I
> shouldn't care", but please let me know if I misread the paragraph and I
> need to care ;-)

Well, you might want to worry about it. Or not.

The approach taken by this patch series is to call `unsetenv()` for the
variable names listed in `core.unsetenvvars` (if any), just before
spawning off the first process (if any).
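
As a concrete (and simplified) illustration of that approach, assuming a
comma-separated `core.unsetenvvars` value and ignoring the
Windows-specific plumbing, the core of it could look like this sketch
(illustrative C, not the actual patch):

#include <stdlib.h>
#include <string.h>

/* Split a comma-separated list of variable names (e.g. the value of
 * core.unsetenvvars) and drop each one from the current process's
 * environment before the first child process is spawned. */
static void unset_environment_variables(const char *value)
{
	char *copy = strdup(value);
	char *name = strtok(copy, ",");

	while (name) {
		unsetenv(name);
		name = strtok(NULL, ",");
	}
	free(copy);
}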

What I meant by this comment in the cover letter is that I thought about
doing it differently. We already have a perfectly fine function called
`make_environment_block()` that takes a "deltaenv", and then constructs a
new environment block from the current environment plus deltaenv. And this
would be an obvious alternative place to "unset" the variables, as this
function is only called just before spawning new processes.
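
For comparison, here is a rough sketch of that alternative's shape
(illustrative only, with a naive linear scan; the hash set mentioned
further down would replace the inner loop):

#include <stdlib.h>
#include <string.h>

/* Build a new environment array that omits entries whose name appears
 * in skip[]; the caller frees the returned array (entries are shared
 * with envp, not copied). */
static char **filtered_env(char *const *envp, const char *const *skip,
			   size_t nskip)
{
	size_t n = 0, out = 0;
	char **res;

	while (envp[n])
		n++;
	res = malloc((n + 1) * sizeof(*res));
	if (!res)
		return NULL;

	for (size_t i = 0; i < n; i++) {
		const char *eq = strchr(envp[i], '=');
		size_t namelen = eq ? (size_t)(eq - envp[i]) : strlen(envp[i]);
		int drop = 0;

		for (size_t j = 0; j < nskip; j++) {
			if (strlen(skip[j]) == namelen &&
			    !strncmp(envp[i], skip[j], namelen)) {
				drop = 1;
				break;
			}
		}
		if (!drop)
			res[out++] = envp[i];
	}
	res[out] = NULL;
	return res;
}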

I was weighing both options, and both back then and right now, there are
other patches in flight that conflict with the second approach, so the
first approach is what I went with.

From a purely aesthetic point of view, the second approach looks nicer, as
it really only affects the spawned processes (and not the current one),
and it allows for changing the list between spawning processes.

But to do it right, I would also have to use a hash set, and it would
complicate the code of `make_environment_block()` even further. And it
sounds a bit like overkill to me, too.

So I sided with the approach presented in the current revision of the
patch series, but I wanted to describe the other approach in case you (or
other reviewers) have a different opinion.

> > While at it, this patch series also cleans up house and moves the
> > Windows-specific core.* variable handling to compat/mingw.c rather
> > than cluttering environment.c and config.c with things that e.g.
> > developers on Linux do not want to care about.
> 
> Or Macs.

Or macOS. Or FreeBSD. Or Irix. Or any other, that's right ;-)

Traditionally, we did not really care all that much about platforms other
than Linux, though, which is what made me write what I wrote. Having said
that, I get the impression that this is changing slowly. The benefits are
pretty clear, too. After all, it was a *Windows* build failure recently
that let Peff identify and fix a *non-Windows* bug...

> When I skimmed this series earlier, I found that patches 2 & 3 are
> sensibly implemented to achieve this goal.

Thanks!
Dscho

> 
> >
> > Johannes Schindelin (4):
> >   config: rename `dummy` parameter to `cb` in git_default_config()
> >   Allow for platform-specific core.* config settings
> >   Move Windows-specific config settings into compat/mingw.c
> >   mingw: unset PERL5LIB by default
> >
> >  Documentation/config.txt     |  6 
> >  cache.h                      |  8 -
> >  compat/mingw.c               | 58 +++-
> >  compat/mingw.h               |  3 ++
> >  config.c                     | 18 ---
> >  environment.c                |  1 -
> >  git-compat-util.h            |  8 +
> >  t/t0029-core-unsetenvvars.sh | 30 +++
> >  8 files changed, 109 insertions(+), 23 deletions(-)
> >  create mode 100755 t/t0029-core-unsetenvvars.sh
> >
> >
> > base-commit: 4ede3d42dfb57f9a41ac96a1f216c62eb7566cc2
> > Published-As: 
> > https://github.com/gitgitgadget/git/releases/tags/pr-62%2Fdscho%2Fperl5lib-v1
> > Fetch-It-Via: git fetch https://github.com/gitgitgadget/git 
> > pr-62/dscho/perl5lib-v1
> > Pull-Request: https://github.com/gitgitgadget/git/pull/62
> 


Re: [PATCH 0/4] mingw: prevent external PERL5LIB from interfering

2018-10-30 Thread Junio C Hamano
"Johannes Schindelin via GitGitGadget" 
writes:

> An alternative approach which was rejected at the time (because it
> interfered with the then-ongoing work to compile Git for Windows using MS
> Visual C++) would patch the make_environment_block() function to skip the
> specified environment variables before handing down the environment block to
> the spawned process. Currently it would interfere with the mingw-utf-8-env
> patch series I sent earlier today
> [https://public-inbox.org/git/pull.57.git.gitgitgad...@gmail.com].

So the rejected approach that was not used in this series would
interfere with the other one, but I do not have to worry about it
because this series does not use that approach?  I had to read the
six lines above twice to figure out that it essentially is saying "I
shouldn't care", but please let me know if I misread the paragraph
and I need to care ;-)

> While at it, this patch series also cleans up house and moves the
> Windows-specific core.* variable handling to compat/mingw.c rather than
> cluttering environment.c and config.c with things that e.g. developers on
> Linux do not want to care about.

Or Macs.  When I skimmed this series earlier, I found that patches 2
& 3 are sensibly implemented to achieve this goal.

>
> Johannes Schindelin (4):
>   config: rename `dummy` parameter to `cb` in git_default_config()
>   Allow for platform-specific core.* config settings
>   Move Windows-specific config settings into compat/mingw.c
>   mingw: unset PERL5LIB by default
>
>  Documentation/config.txt     |  6 
>  cache.h                      |  8 -
>  compat/mingw.c               | 58 +++-
>  compat/mingw.h               |  3 ++
>  config.c                     | 18 ---
>  environment.c                |  1 -
>  git-compat-util.h            |  8 +
>  t/t0029-core-unsetenvvars.sh | 30 +++
>  8 files changed, 109 insertions(+), 23 deletions(-)
>  create mode 100755 t/t0029-core-unsetenvvars.sh
>
>
> base-commit: 4ede3d42dfb57f9a41ac96a1f216c62eb7566cc2
> Published-As: 
> https://github.com/gitgitgadget/git/releases/tags/pr-62%2Fdscho%2Fperl5lib-v1
> Fetch-It-Via: git fetch https://github.com/gitgitgadget/git 
> pr-62/dscho/perl5lib-v1
> Pull-Request: https://github.com/gitgitgadget/git/pull/62


Re: [PATCH 0/4] index-pack: optionally turn off SHA-1 collision checking

2018-10-30 Thread Junio C Hamano
Geert Jansen  writes:

> Maybe a better option would be to check for the non-existence of the [00-ff]
> directories under .git/objects.

Please do not do this; I expect many people do this before they
leave work, just like I do:

$ git repack -a -d -f --window=$large --depth=$small
$ git prune

which would typically leave only info/ and pack/ subdirectories
under .git/objects/ directory.


Re: [PATCH 0/4] index-pack: optionally turn off SHA-1 collision checking

2018-10-29 Thread Geert Jansen
On Sun, Oct 28, 2018 at 10:50:19PM +0000, Ævar Arnfjörð Bjarmason wrote:

> I left the door open for that in the new config option 4/4 implements,
> but I suspect for Geert's purposes this is something he'd prefer to
> turn off in git on clone entirely, i.e. because it may be running on
> some random Amazon's customer's EFS instance, and they won't know
> about this new core.checkCollisions option.
> 
> But maybe I'm wrong about that and Geert is happy to just turn on
> core.checkCollisions=false and use this series instead.

I think that the best user experience would probably be if git were fast by
default without having to give up on (future) security by removing the sha1
collision check.  Maybe core.checkCollisions could default to "on" only when
there are no loose objects in the repository? That would give a fast experience
for many common cases (git clone, git init && git fetch) while still doing the
collision check when relevant.

My patch used the --cloning flag as an approximation of "no loose objects".
Maybe a better option would be to check for the non-existence of the [00-ff]
directories under .git/objects.
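
As a rough illustration, the suggested check could look like this sketch
(a hypothetical helper, not a patch; as Junio notes in his reply above,
everyday repositories that repack and prune would also have no [00-ff]
directories, so this heuristic would wrongly skip the check for them):

#include <ctype.h>
#include <dirent.h>
#include <string.h>

/* Return 1 if objects_dir contains any two-hex-digit fan-out
 * directory, i.e. if there may be loose objects. */
static int has_loose_object_dirs(const char *objects_dir)
{
	DIR *dir = opendir(objects_dir);
	struct dirent *e;
	int found = 0;

	if (!dir)
		return 0;
	while ((e = readdir(dir))) {
		if (strlen(e->d_name) == 2 &&
		    isxdigit((unsigned char)e->d_name[0]) &&
		    isxdigit((unsigned char)e->d_name[1])) {
			found = 1;
			break;
		}
	}
	closedir(dir);
	return found;
}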


Re: [PATCH 0/4] Bloom filter experiment

2018-10-17 Thread Junio C Hamano
Ævar Arnfjörð Bjarmason  writes:

>> This is all to say: having a maximum size is good. 512 is big enough
>> to cover _most_ commits, but not so big that we may store _really_ big
>> filters.
>
> Makes sense. 512 is good enough to hardcode initially, but I couldn't
> tell from briefly skimming the patches if it was possible to make this
> size dynamic per-repo when the graph/filter is written.
>
> I.e. we might later add some discovery step where we look at N number of
> commits at random, until we're satisfied that we've come up with some
> average/median number of total (recursive) tree entries & how many tend
> to be changed per-commit.
>
> I.e. I can imagine repositories (with some automated changes) where we
> have 10k files and tend to change 1k per commit, or ones with 10k files
> where we tend to change just 1-10 per commit, which would mean a
> larger/smaller filter would be needed / would do.

I was more interested to find out what the advice our docs should
give to the end users to tune the value once such a knob is
invented, and what you gave in the above paragraphs may lead us to a
nice auto-tuning heuristic.



Re: [PATCH 0/4] Bloom filter experiment

2018-10-16 Thread Jonathan Tan
> | Implementation | Queries | Maybe | FP # | FP %  |
> |----------------|---------|-------|------|-------|
> | Szeder | 66095   | 1142  | 256  | 0.38% |
> | Jonathan   | 66459   | 107   | 89   | 0.16% |
> | Stolee | 53025   | 492   | 479  | 0.90% |
> 
> (Note that we must have used different starting points, which is why my 
> "Queries" is so much smaller.)

I suspect it's because your bloom filter implementation covers only the
first parent (if I'm understanding get_bloom_filter() correctly). When I
only covered the first parent in my initial test (see patch 2 of [1]), I
got (following the columns in the table above):

  53096 107 89 0.001676

Also, I think that the rejecting power (Queries - Maybe)/(Total tree
comparisons if no bloom filters were used) needs to be in the evaluation
criteria somewhere, as that indicates how many tree comparisons we
managed to avoid.

Also, we should probably test on a file that changes more
frequently :-)

[1] https://public-inbox.org/git/cover.1539219248.git.jonathanta...@google.com/

> The increase in false-positive percentage is expected in my 
> implementation. I'm using the correct filter sizes to hit a <1% FP 
> ratio. This could be lowered by changing the settings, and the size 
> would dynamically grow. For my Git repo (which contains 
> git-for-windows/git and microsoft/git) this implementation grows the 
> commit-graph file from 5.8 MB to 7.3 MB (1.5 MB total, compared to 
> Szeder's 8MB filter). For 105,260 commits, that rounds out to less than 
> 20 bytes per commit (compared to Jonathan's 256 bytes per commit).

Mine has 256 bits per commit, which is 32 bytes per commit (still more
than yours).

Having said all that, thanks for writing up your version - in
particular, variable sized filters (like in yours) seem to be the way to
go.

> We'll see how much time I have to do this polish, but I think the 
> benefit is proven.

Agreed.


Re: [PATCH 0/4] Bloom filter experiment

2018-10-16 Thread Derrick Stolee

On 10/16/2018 8:57 AM, Ævar Arnfjörð Bjarmason wrote:
> On Tue, Oct 16 2018, Derrick Stolee wrote:
>
>> On 10/16/2018 12:45 AM, Junio C Hamano wrote:
>>> Derrick Stolee  writes:
>>>
>>>> 2. The filters are sized according to the number of changes in each
>>>> commit, with a minimum of one 64-bit word.
>>>> ...
>>>> 6. When we compute the Bloom filters, we don't store a filter for
>>>> commits whose first-parent diff has more than 512 paths.
>>>
>>> Just being curious but was 512 taken out of thin air or is there
>>> some math behind it, e.g. to limit false positive rate down to
>>> certain threshold?  With a wide-enough bitset, you could store
>>> arbitrary large number of paths with low enough false positive, I
>>> guess, but is there a point where there is too many paths in the
>>> change that gives us diminishing returns and not worth having a
>>> filter in the first place?
>>
>> 512 is somewhat arbitrary, but having a maximum size is not.
>>
>>> In a normal source-code-control context, the set of paths modified
>>> by any single commit ought to be a small subset of the entire paths,
>>> and whole-tree changes ought to be fairly rare.  In a project for
>>> which that assumption does not hold, it might help to have a
>>> negative bloom filter (i.e. "git log -- A" asks "does the commit
>>> modify A?" and the filter would say "we know it does not, because we
>>> threw all the paths that are not touched to the bloom filter"), but
>>> I think that would optimize for a wrong case.
>>
>> A commit with many changed paths is very rare. The 512 I picked above
>> is enough to cover 99% of commits in each of the repos I sampled when
>> first investigating Bloom filters.
>>
>> When a Bloom filter response says "maybe yes" (in our case, "maybe not
>> TREESAME"), then we need to verify that it is correct. In the extreme
>> case that every path is changed, then the Bloom filter does nothing
>> but add extra work.
>>
>> These extreme cases are also not unprecedented: in our Azure Repos
>> codebase, we were using core.autocrlf to smudge CRLFs to LFs, but
>> when it was time to dogfood VFS for Git, we needed to turn off the
>> smudge filter. So, there is one commit that converts every LF to a
>> CRLF in every text file. Storing a Bloom filter for those ~250,000
>> entries would take ~256KB for essentially no value. By not storing a
>> filter for this commit, we go immediately to the regular TREESAME
>> check, which would happen for most pathspecs.
>>
>> This is all to say: having a maximum size is good. 512 is big enough
>> to cover _most_ commits, but not so big that we may store _really_ big
>> filters.
>
> Makes sense. 512 is good enough to hardcode initially, but I couldn't
> tell from briefly skimming the patches if it was possible to make this
> size dynamic per-repo when the graph/filter is written.

My proof-of-concept has it as a constant, but part of my plan is to make
these all config options, as of this item in my message:

>>> 2. We need config values for writing and consuming bloom filters,
>>> but also to override the default settings.

> I.e. we might later add some discovery step where we look at N number of
> commits at random, until we're satisfied that we've come up with some
> average/median number of total (recursive) tree entries & how many tend
> to be changed per-commit.
>
> I.e. I can imagine repositories (with some automated changes) where we
> have 10k files and tend to change 1k per commit, or ones with 10k files
> where we tend to change just 1-10 per commit, which would mean a
> larger/smaller filter would be needed / would do.

I'm not sure a dynamic approach would be worth the effort, but I'm open
to hearing the results of an experiment.


Thanks,
-Stolee


Re: [PATCH 0/4] Bloom filter experiment

2018-10-16 Thread Ævar Arnfjörð Bjarmason


On Tue, Oct 16 2018, Derrick Stolee wrote:

> On 10/16/2018 12:45 AM, Junio C Hamano wrote:
>> Derrick Stolee  writes:
>>
>>> 2. The filters are sized according to the number of changes in each
>>> commit, with a minimum of one 64-bit word.
>>> ...
>>> 6. When we compute the Bloom filters, we don't store a filter for
>>> commits whose first-parent diff has more than 512 paths.
>> Just being curious but was 512 taken out of thin air or is there
>> some math behind it, e.g. to limit false positive rate down to
>> certain threshold?  With a wide-enough bitset, you could store
>> arbitrary large number of paths with low enough false positive, I
>> guess, but is there a point where there is too many paths in the
>> change that gives us diminishing returns and not worth having a
>> filter in the first place?
> 512 is somewhat arbitrary, but having a maximum size is not.
>> In a normal source-code-control context, the set of paths modified
>> by any single commit ought to be a small subset of the entire paths,
>> and whole-tree changes ought to be fairly rare.  In a project for
>> which that assumption does not hold, it might help to have a
>> negative bloom filter (i.e. "git log -- A" asks "does the commit
>> modify A?" and the filter would say "we know it does not, because we
>> threw all the paths that are not touched to the bloom filter"), but
>> I think that would optimize for a wrong case.
>
> A commit with many changed paths is very rare. The 512 I picked above
> is enough to cover 99% of commits in each of the repos I sampled when
> first investigating Bloom filters.
>
> When a Bloom filter response says "maybe yes" (in our case, "maybe not
> TREESAME"), then we need to verify that it is correct. In the extreme
> case that every path is changed, then the Bloom filter does nothing
> but add extra work.
>
> These extreme cases are also not unprecedented: in our Azure Repos
> codebase, we were using core.autocrlf to smudge CRLFs to LFs, but
> when it was time to dogfood VFS for Git, we needed to turn off the
> smudge filter. So, there is one commit that converts every LF to a
> CRLF in every text file. Storing a Bloom filter for those ~250,000
> entries would take ~256KB for essentially no value. By not storing a
> filter for this commit, we go immediately to the regular TREESAME
> check, which would happen for most pathspecs.
>
> This is all to say: having a maximum size is good. 512 is big enough
> to cover _most_ commits, but not so big that we may store _really_ big
> filters.

Makes sense. 512 is good enough to hardcode initially, but I couldn't
tell from briefly skimming the patches if it was possible to make this
size dynamic per-repo when the graph/filter is written.

I.e. we might later add some discovery step where we look at N number of
commits at random, until we're satisfied that we've come up with some
average/median number of total (recursive) tree entries & how many tend
to be changed per-commit.

I.e. I can imagine repositories (with some automated changes) where we
have 10k files and tend to change 1k per commit, or ones with 10k files
where we tend to change just 1-10 per commit, which would mean a
larger/smaller filter would be needed / would do.


Re: [PATCH 0/4] Bloom filter experiment

2018-10-16 Thread Derrick Stolee

On 10/16/2018 12:45 AM, Junio C Hamano wrote:
> Derrick Stolee  writes:
>
>> 2. The filters are sized according to the number of changes in each
>> commit, with a minimum of one 64-bit word.
>> ...
>> 6. When we compute the Bloom filters, we don't store a filter for
>> commits whose first-parent diff has more than 512 paths.
>
> Just being curious but was 512 taken out of thin air or is there
> some math behind it, e.g. to limit false positive rate down to
> certain threshold?  With a wide-enough bitset, you could store
> arbitrary large number of paths with low enough false positive, I
> guess, but is there a point where there is too many paths in the
> change that gives us diminishing returns and not worth having a
> filter in the first place?

512 is somewhat arbitrary, but having a maximum size is not.

> In a normal source-code-control context, the set of paths modified
> by any single commit ought to be a small subset of the entire paths,
> and whole-tree changes ought to be fairly rare.  In a project for
> which that assumption does not hold, it might help to have a
> negative bloom filter (i.e. "git log -- A" asks "does the commit
> modify A?" and the filter would say "we know it does not, because we
> threw all the paths that are not touched to the bloom filter"), but
> I think that would optimize for a wrong case.

A commit with many changed paths is very rare. The 512 I picked above is
enough to cover 99% of commits in each of the repos I sampled when first
investigating Bloom filters.

When a Bloom filter response says "maybe yes" (in our case, "maybe not
TREESAME"), then we need to verify that it is correct. In the extreme
case that every path is changed, then the Bloom filter does nothing but
add extra work.

These extreme cases are also not unprecedented: in our Azure Repos
codebase, we were using core.autocrlf to smudge CRLFs to LFs, but when
it was time to dogfood VFS for Git, we needed to turn off the smudge
filter. So, there is one commit that converts every LF to a CRLF in
every text file. Storing a Bloom filter for those ~250,000 entries would
take ~256KB for essentially no value. By not storing a filter for this
commit, we go immediately to the regular TREESAME check, which would
happen for most pathspecs.

This is all to say: having a maximum size is good. 512 is big enough to
cover _most_ commits, but not so big that we may store _really_ big
filters.


Thanks,
-Stolee



Re: [PATCH 0/4] Bloom filter experiment

2018-10-15 Thread Junio C Hamano
Derrick Stolee  writes:

> 2. The filters are sized according to the number of changes in each
> commit, with a minimum of one 64-bit word.
> ...
> 6. When we compute the Bloom filters, we don't store a filter for
> commits whose first-parent diff has more than 512 paths.

Just being curious but was 512 taken out of thin air or is there
some math behind it, e.g. to limit false positive rate down to
certain threshold?  With a wide-enough bitset, you could store
arbitrary large number of paths with low enough false positive, I
guess, but is there a point where there is too many paths in the
change that gives us diminishing returns and not worth having a
filter in the first place?

In a normal source-code-control context, the set of paths modified
by any single commit ought to be a small subset of the entire paths,
and whole-tree changes ought to be fairly rare.  In a project for
which that assumption does not hold, it might help to have a
negative bloom filter (i.e. "git log -- A" asks "does the commit
modify A?" and the filter would say "we know it does not, because we
threw all the paths that are not touched to the bloom filter"), but
I think that would optimize for a wrong case.



Re: [PATCH 0/4] Bloom filter experiment

2018-10-15 Thread Derrick Stolee

On 10/9/2018 3:34 PM, SZEDER Gábor wrote:
> To keep the ball rolling, here is my proof of concept in a somewhat
> cleaned-up form, with still plenty of rough edges.


Peff, Szeder, and Jonathan,

Thanks for giving me the kick in the pants to finally write a proof of 
concept for my personal take on how this should work. My implementation 
borrows things from both Szeder and Jonathan's series. You can find my 
commits for all of the versions on GitHub (it's a bit too messy to share 
as a patch series right now, I think):


Repo: https://github.com/derrickstolee/git
Branches: bloom/* (includes bloom/stolee, bloom/peff, bloom/szeder, and 
bloom/tan for the respective implementations, and bloom/base as the 
common ancestor)


My implementation uses the following scheme:

1. Bloom filters are computed and stored on a commit-by-commit basis.

2. The filters are sized according to the number of changes in each 
commit, with a minimum of one 64-bit word.


3. The filters are stored in the commit-graph using two new optional 
chunks: one stores a single 32-bit integer for each commit that provides 
the end of its Bloom filter in the second "data" chunk. The data chunk 
also stores the magic constants (hash version, num hash keys, and num 
bits per entry).


4. We fill the Bloom filters as (const char *data, int len) pairs as 
"struct bloom_filter"s in a commit slab.


5. In order to evaluate containment, we need the struct bloom_filter, 
but also struct bloom_settings (stores the magic constants in one 
place), and struct bloom_key (stores the _k_ hash values). This allows 
us to hash a path once and test the same path against many Bloom filters
(a sketch follows this list).


6. When we compute the Bloom filters, we don't store a filter for 
commits whose first-parent diff has more than 512 paths.


7. When we compute the commit-graph, we can re-use the pre-existing 
filters without needing to recompute diffs. (Caveat: the current 
implementation will re-compute diffs for the commits with diffs that 
were too large.)
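

To make item 5 concrete, here is a minimal sketch of the
bloom_key/bloom_filter split (illustrative only: the seeded FNV-1a
hashing and the struct layouts are assumptions, not the actual code in
any of the implementations discussed here):

#include <stdint.h>
#include <string.h>

#define BLOOM_NR_HASHES 7

struct bloom_key { uint32_t hashes[BLOOM_NR_HASHES]; };
struct bloom_filter { const uint64_t *bits; uint32_t nr_words; };

static uint32_t seeded_hash(const char *data, size_t len, uint32_t seed)
{
	uint32_t h = 2166136261u ^ seed;	/* FNV-1a with seed mixing */
	for (size_t i = 0; i < len; i++) {
		h ^= (unsigned char)data[i];
		h *= 16777619u;
	}
	return h;
}

/* Hash the path once into k probe positions. */
static void bloom_key_fill(struct bloom_key *key, const char *path)
{
	for (uint32_t i = 0; i < BLOOM_NR_HASHES; i++)
		key->hashes[i] = seeded_hash(path, strlen(path), i);
}

/* "maybe present" iff every probed bit is set; one clear bit means
 * "definitely not TREESAME-breaking for this path". */
static int bloom_filter_maybe_contains(const struct bloom_filter *f,
				       const struct bloom_key *key)
{
	uint64_t nr_bits = (uint64_t)f->nr_words * 64;

	for (int i = 0; i < BLOOM_NR_HASHES; i++) {
		uint64_t bit = key->hashes[i] % nr_bits;
		if (!(f->bits[bit / 64] & ((uint64_t)1 << (bit % 64))))
			return 0;
	}
	return 1;
}

A path is hashed once into a bloom_key, and the same key is then tested
against the filter of every commit the revision walk visits.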


You can build the Bloom filters in my implementation this way:

GIT_TEST_BLOOM_FILTERS=1 ./git commit-graph write --reachable


You can play around with it like this:

   $ GIT_USE_POC_BLOOM_FILTER=$((8*1024*1024*8)) git commit-graph write
   Computing commit graph generation numbers: 100% (52801/52801), done.
   Computing bloom filter: 100% (52801/52801), done.
   # Yeah, I even added progress indicator! :)
   $ GIT_TRACE_BLOOM_FILTER=2 GIT_USE_POC_BLOOM_FILTER=y git rev-list --count --full-history HEAD -- t/valgrind/valgrind.sh
   886
   20:40:24.783699 revision.c:486  bloom filter total queries: 66095 definitely not: 64953 maybe: 1142 false positives: 256 fp ratio: 0.003873


Jonathan used this same test, so will I. Here is a summary table:

| Implementation | Queries | Maybe | FP # | FP %  |
|----------------|---------|-------|------|-------|
| Szeder | 66095   | 1142  | 256  | 0.38% |
| Jonathan   | 66459   | 107   | 89   | 0.16% |
| Stolee | 53025   | 492   | 479  | 0.90% |

(Note that we must have used different starting points, which is why my 
"Queries" is so much smaller.)


The increase in false-positive percentage is expected in my 
implementation. I'm using the correct filter sizes to hit a <1% FP 
ratio. This could be lowered by changing the settings, and the size 
would dynamically grow. For my Git repo (which contains 
git-for-windows/git and microsoft/git) this implementation grows the 
commit-graph file from 5.8 MB to 7.3 MB (1.5 MB total, compared to 
Szeder's 8MB filter). For 105,260 commits, that rounds out to less than 
20 bytes per commit (compared to Jonathan's 256 bytes per commit).


Related stats for my Linux repo: 781,756 commits, commit-graph grows 
from 43.8 to 55.6 MB (~12 MB additional, ~16 bytes per commit).


I haven't done a side-by-side performance test for these 
implementations, but it would be interesting to do so.


Despite writing a lot of code in a short amount of time, there is a lot 
of work to be done before this is submittable:


1. There are three different environment variables right now. It would 
be better to have one GIT_TEST_ variable and rely on existing tracing 
for logs (trace2 values would be good here).


2. We need config values for writing and consuming bloom filters, but 
also to override the default settings.


3. My bloom.c/bloom.h is too coupled to the commit-graph. I want to 
harden that interface to let Bloom filters live as their own thing, but 
that the commit-graph could load a bloom filter from the file instead of 
from the slab.


4. Tests, tests, and more tests.

We'll see how much time I have to do this polish, but I think the 
benefit is proven.


Thanks,
-Stolee


Re: [PATCH 0/4] Bloom filter experiment

2018-10-09 Thread Derrick Stolee

On 10/9/2018 3:34 PM, SZEDER Gábor wrote:

> To keep the ball rolling, here is my proof of concept in a somewhat
> cleaned-up form, with still plenty of rough edges.
>
> You can play around with it like this:
>
>    $ GIT_USE_POC_BLOOM_FILTER=$((8*1024*1024*8)) git commit-graph write
>    Computing commit graph generation numbers: 100% (52801/52801), done.
>    Computing bloom filter: 100% (52801/52801), done.
>    # Yeah, I even added progress indicator! :)
>    $ GIT_TRACE_BLOOM_FILTER=2 GIT_USE_POC_BLOOM_FILTER=y git rev-list --count --full-history HEAD -- t/valgrind/valgrind.sh
>    886
>    20:40:24.783699 revision.c:486  bloom filter total queries: 66095 definitely not: 64953 maybe: 1142 false positives: 256 fp ratio: 0.003873
>
> The value of $GIT_USE_POC_BLOOM_FILTER only really matters when writing
> the Bloom filter, and it specifies the number of bits in the filter's
> bitmap, IOW the above command creates an 8MB Bloom filter.  To make use
> of the filter the variable can be anything non-empty.
>
> Writing the Bloom filter is very slow as it is (yeah, that's why I
> bothered with the progress indicator ;).  I wrote about it in patch 2's
> commit message: the cause for about half of the slowness is rather
> obvious, but I don't (yet) know what's responsible for the other half.
>
> Not a single test...  but I've run loops over all files in git.git
> comparing 'git rev-list HEAD -- $file's output with and without the
> Bloom filter, and, surprisingly, they match.  My quick'n'dirty
> experiments usually don't fare this well...
>
> It's also available at:
>
>    https://github.com/szeder/git bloom-filter-experiment


Thanks! I will take a close look at this tomorrow and start playing with it.

-Stolee



Re: [PATCH 0/4] un-breaking pack-objects with bitmaps

2018-09-10 Thread Junio C Hamano
Jeff King  writes:

>> Either is fine.  I am not moving 'next' beyond what is necessary for
>> this release cycle during the pre-release freeze period, and because
>> I thought that Peff was in favor of squashing in the changes to the
>> original when the next cycle starts, I haven't bothered to merge the
>> fix there yet.
>
> I think Ævar's point is just that the current tip of next is badly
> broken if you use bitmaps, and it's worth hot-fixing that in the
> meantime.

I know that ;-)

My point is that anybody who *needs* handholding from the upstream
project (i.e. not Ævar, but those whom he is trying to help with the
suggestion) should not be looking at 'next' during the prerelease
freeze.  They should be looking at 'master' instead, and that is why
I am not spending more than minimum cycles of my own worrying about
'next' during the prerelease.

In any case, I think the final is ready to go, almost.  I still need
to do the preformatted docs, though.



Re: [PATCH 0/4] un-breaking pack-objects with bitmaps

2018-09-10 Thread Jeff King
On Mon, Sep 10, 2018 at 09:53:56AM -0700, Junio C Hamano wrote:

> Ævar Arnfjörð Bjarmason  writes:
> 
> > I'm just reverting jk/pack-delta-reuse-with-bitmap out of next when
> > building my own package of git, but I think this really should be fixed
> > in that branch, either by merging the fix down or reverting the original
> > series out of next, I think just merging the fix down makes sense, but
> > have no strong opinion on it.
> 
> Either is fine.  I am not moving 'next' beyond what is necessary for
> this release cycle during the pre-release freeze period, and because
> I thought that Peff was in favor of squashing in the changes to the
> original when the next cycle starts, I haven't bothered to merge the
> fix there yet.

I think Ævar's point is just that the current tip of next is badly
broken if you use bitmaps, and it's worth hot-fixing that in the
meantime.

After the release, I'm OK with either one of squashing that first patch
in, or just merging as-is.

-Peff


Re: [PATCH 0/4] un-breaking pack-objects with bitmaps

2018-09-10 Thread Junio C Hamano
Ævar Arnfjörð Bjarmason  writes:

> I'm just reverting jk/pack-delta-reuse-with-bitmap out of next when
> building my own package of git, but I think this really should be fixed
> in that branch, either by merging the fix down or reverting the original
> series out of next, I think just merging the fix down makes sense, but
> have no strong opinion on it.

Either is fine.  I am not moving 'next' beyond what is necessary for
this release cycle during the pre-release freeze period, and because
I thought that Peff was in favor of squashing in the changes to the
original when the next cycle starts, I haven't bothered to merge the
fix there yet.


Re: [PATCH 0/4] un-breaking pack-objects with bitmaps

2018-09-08 Thread Ævar Arnfjörð Bjarmason


On Sat, Sep 01 2018, Jeff King wrote:

> On Fri, Aug 31, 2018 at 06:55:58PM -0400, Jeff King wrote:
>
>> On Fri, Aug 31, 2018 at 05:23:17PM +0200, Ævar Arnfjörð Bjarmason wrote:
>>
>> > On Tue, Aug 21 2018, Jeff King wrote:
>> >
>> > > +int bitmap_has_sha1_in_uninteresting(struct bitmap_index *bitmap_git,
>> > > +				     const unsigned char *sha1)
>> > > +{
>> > > +	int pos;
>> > > +
>> > > +	if (!bitmap_git)
>> > > +		return 0; /* no bitmap loaded */
>> > > +	if (!bitmap_git->result)
>> > > +		BUG("failed to perform bitmap walk before querying");
>> >
>> > Some part of what calls this completely breaks pushing from the "next"
>> > branch when you have local bitmaps (we *really* should have some tests
>> > for this...).
>>
>> Yikes, thanks for reporting. I agree we need better tests here.
>
> OK, here is the fix. Since the problem is in 'next', this is done as a
> patch on top of jk/pack-delta-reuse-with-bitmap. But since we're set to
> rewind 'next' post-release anyway, we could squash it directly into
> 30cdc33fba from the original series. That would help later bisections
> from running into it, which may be worth it as it's a pretty severe
> breakage.  Or maybe not:

Junio: Just a reminder that next is still broken with this, and I see
e.g. the Debian "experimental" has the bug but not the fix at this
point.

I'm just reverting jk/pack-delta-reuse-with-bitmap out of next when
building my own package of git, but I think this really should be fixed
in that branch, either by merging the fix down or reverting the original
series out of next, I think just merging the fix down makes sense, but
have no strong opinion on it.


Re: [PATCH 0/4] un-breaking pack-objects with bitmaps

2018-09-04 Thread Jeff King
On Tue, Sep 04, 2018 at 12:30:14PM -0700, Stefan Beller wrote:

> >   [1/4]: bitmap_has_sha1_in_uninteresting(): drop BUG check
> >
> > The actual fix. This should get merged to next ASAP (or the original
> > topic just reverted).
> >
> >   [2/4]: t5310: test delta reuse with bitmaps
> >
> > I did this separately to give us flexibility to squash or merge
> > quickly. But it does find Ævar's bug on a git without patch 1.
> >
> >   [3/4]: traverse_bitmap_commit_list(): don't free result
> >
> > The original assert should have simply been useless, but it was the
> > surprising behavior of this function that turned it into a bug.
> >
> >   [4/4]: pack-bitmap: drop "loaded" flag
> >
> > And this is just an annoyance I ran into, which is a fallout from
> > our conversion to using an allocated bitmap_index struct.
> 
> FWIW, the whole series is
> Reviewed-by: Stefan Beller 
> I am sorry for suggesting the BUG() in the first place.

Heh. You only asked a question about the interface. I was the one who
was _supposed_ to be familiar with the code, and put in the actual
assertion. So if your suggestion was dumb, my response was doubly so. :)

-Peff


Re: [PATCH 0/4] un-breaking pack-objects with bitmaps

2018-09-04 Thread Stefan Beller
>   [1/4]: bitmap_has_sha1_in_uninteresting(): drop BUG check
>
> The actual fix. This should get merged to next ASAP (or the original
> topic just reverted).
>
>   [2/4]: t5310: test delta reuse with bitmaps
>
> I did this separately to give us flexibility to squash or merge
> quickly. But it does find Ævar's bug on a git without patch 1.
>
>   [3/4]: traverse_bitmap_commit_list(): don't free result
>
> The original assert should have simply been useless, but it was the
> surprising behavior of this function that turned it into a bug.
>
>   [4/4]: pack-bitmap: drop "loaded" flag
>
> And this is just an annoyance I ran into, which is a fallout from
> our conversion to using an allocated bitmap_index struct.

FWIW, the whole series is
Reviewed-by: Stefan Beller 
I am sorry for suggesting the BUG() in the first place.

Stefan


Re: [PATCH 0/4] tests: make more use of 'test_must_be_empty'

2018-08-22 Thread Matthew DeVore
Note that you can also trivially convert a lot more instances of empty
checking to the "right" way; the 'test_line_count = 0' calls are easy targets:

$ grep -Ir 'test_line_count = 0' t | wc -l
76

I think it would be nice to clean these up as well.



Re: [PATCH 0/4] t5552: fix flakiness by introducing proper locking for GIT_TRACE

2018-08-10 Thread Johannes Schindelin
Hi Peff,

On Fri, 10 Aug 2018, Jeff King wrote:

> On Fri, Aug 10, 2018 at 06:43:07PM +0200, Johannes Schindelin wrote:
> 
> > So unless you are willing to ignore, to willfully keep this breakage,
> > I would suggest not to introduce the ugliness of an overridden
> > upload-pack for the sole purpose of disabling the tracing on one side,
> > but instead to get this here bug fixed, by helping me with this here
> > patch series.
> 
> I'm OK if you want to live with the broken test in the interim.

I realize that I failed to tell you that I basically spent 2.5 days' worth
of work time to figure this out and come up with three iterations of the
patch series (you only saw the latest).

I want this patch series in git.git, maybe not in the current form, but in
one form or another. I don't want to spend any more minute on trying to
figure out the same problem with any other regression test (which might
not even be as easily worked around as with a semi-simple --upload-pack
option).

Thank you for wanting to help. Please accept my apologies for expressing
in a poor way that I appreciate your eagerness to provide a patch, but
that I think nevertheless that it would be better to work on the GIT_TRACE
concurrency problem and its resolution via file locks. It would solve that
class of problems, instead of that single regression test being flakey.

Sorry,
Dscho


Re: [PATCH 0/4] t5552: fix flakiness by introducing proper locking for GIT_TRACE

2018-08-10 Thread Jeff King
On Fri, Aug 10, 2018 at 11:34:59AM -0700, Junio C Hamano wrote:

> Johannes Sixt  writes:
> 
> > As this buglet looks like a recurring theme, and a proper fix is
> > preferable over repeated work-arounds. To me it looks like we need
> > some sort of locking on Windows. Unless your friends at Microsoft have
> > an ace in their sleeves that lets us have atomic O_APPEND the POSIX
> > way...
> 
> Just to put the severity of the issue in context, we use O_APPEND in
> a few codepaths, and the trace thing for debugging is the only thing
> that could have multiple writers.  Other users of O_APPEND are:
> 
>  * refs/files-backend.c uses it so that a reflog entry can be
>appended at the end, but because update to each ref is protected
>from racing at a lot higher layer with a lock, no two people
>would try to append to the same reflog file, so atomicity of
>O_APPEND does not matter here.

Just an interesting side note, but I think one of the reasons we use
dot-locks and not flock() is that the former generally works on network
filesystems. I think O_APPEND also is not reliable for non-local files,
though, so short of actually dot-locking trace files (yuck) I don't
think there's a great solution there.

> It may make sense to allow GIT_TRACE to have a placeholder
> (e.g. "/tmp/traceout.$$") to help debuggers arrange to give
> different processes their own trace output file, which perhaps may
> be a simpler and less performance-impactful solution than having to
> take locks at an application layer.

Heh, I actually wrote that patch several years ago, as part of an
abandoned effort to have config-triggered tracing (e.g., on the server
side where you're getting hundreds of requests and you want to enable
tracing on a repo for a slice of time).

One annoying thing for a case like the test in question is that you
don't actually know the pid of the process you're interested in. So we'd
generate two trace files, and then you'd have to figure out which one is
which. In my earlier work, I coupled it with some magic to allow
per-command config, like:

  [trace "fetch"]
  packet = /tmp/trace.%p

but you could also imagine a "%n" which uses the command name.
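
For illustration, such a placeholder expansion might look like this
sketch (the "%p"/"%n" syntax is the hypothetical one imagined above, not
an existing Git feature):

#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* Expand "%p" to the current pid and "%n" to the command name. */
static void expand_trace_path(char *out, size_t outlen,
			      const char *tmpl, const char *cmd)
{
	size_t n = 0;

	while (*tmpl && n + 1 < outlen) {
		if (tmpl[0] == '%' && tmpl[1] == 'p') {
			n += snprintf(out + n, outlen - n, "%d", (int)getpid());
			tmpl += 2;
		} else if (tmpl[0] == '%' && tmpl[1] == 'n') {
			n += snprintf(out + n, outlen - n, "%s", cmd);
			tmpl += 2;
		} else {
			out[n++] = *tmpl++;
		}
		if (n >= outlen)
			n = outlen - 1;	/* clamp if truncated */
	}
	out[n] = '\0';
	/* e.g. ("/tmp/trace.%p", "fetch") -> "/tmp/trace.12345" */
}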

I don't remember why I abandoned it, but I think a lot had to do with
violating the "don't look at config" rules for our repo startup code. A
lot of that has been fixed since then, but I haven't really needed it.
If anybody is really interested, the patches are at:

  https://github.com/peff/git jk/trace-stdin

(my ultimate goal was to record stdin for pack-objects to replay slow
fetches, but I ended up hacking together a patch to pack-objects
instead).

-Peff


Re: [PATCH 0/4] t5552: fix flakiness by introducing proper locking for GIT_TRACE

2018-08-10 Thread Junio C Hamano
Jeff King  writes:

> I have tried to help with the actual problem, by identifying the root
> cause (that the trace code's strategy is not fundamentally broken, but
> that it relies on O_APPEND whose guarantees are obviously not portable
> to Windows) and exploring possible solutions there (FILE_APPEND_DATA).
>
> Lacking any kind of Windows development environment, I'd just as soon
> stop there and let more knowledgeable people take over. But please don't
> characterize my presence in this thread as some kind of willful harm or
> ignoring of the problem.

Thanks.

It seems like the other Johannes and the other Jeff are also
interested, so we may hopefully see a properly fixed O_APPEND,
which would be the real solution that would help appenders
that are *not* the tracing subsystem, too.





Re: [PATCH 0/4] t5552: fix flakiness by introducing proper locking for GIT_TRACE

2018-08-10 Thread Junio C Hamano
Johannes Sixt  writes:

> As this buglet looks like a recurring theme, a proper fix is
> preferable over repeated work-arounds. To me it looks like we need
> some sort of locking on Windows. Unless your friends at Microsoft have
> an ace in their sleeves that lets us have atomic O_APPEND the POSIX
> way...

Just to put the severity of the issue in context, we use O_APPEND in
a few codepaths, and the trace thing for debugging is the only thing
that could have multiple writers.  Other users of O_APPEND are:

 * refs/files-backend.c uses it so that a reflog entry can be
   appended at the end, but because update to each ref is protected
   from racing at a lot higher layer with a lock, no two people
   would try to append to the same reflog file, so atomicity of
   O_APPEND does not matter here.

 * sequencer.c wants to use it when moving one insn from the todo
   list to the 'done' list when it finishes one operation.  If you
   are running two sequences in a single repository, intermixed
   'done' list would be the least of your problem, so presumably we
   are fine here.

It may make sense to allow GIT_TRACE to have a placeholder
(e.g. "/tmp/traceout.$$") to help debuggers arrange to give
different processes their own trace output file, which perhaps may
be a simpler and less performance-impactful solution than having to
take locks at an application layer.




Re: [PATCH 0/4] t5552: fix flakiness by introducing proper locking for GIT_TRACE

2018-08-10 Thread Jeff King
On Fri, Aug 10, 2018 at 06:43:07PM +0200, Johannes Schindelin wrote:

> > > Correct.  But I am more worried about the "mixed/overwriting"
> > > breakage, if there is one; it means we may need to be prepared for
> > > systems that lack O_APPEND that works correctly.  I initially just
> > > assumed that it was what Dscho was seeing, but after re-reading his
> > > message, I am not sure anymore.
> > > 
> > > I think the "do not trace the other side" approach you suggest for
> > > these tests that only care about one side is more appropriate
> > > solution for this particular case.  We then do not have to worry
> > > about overwriting or output from both sides mixed randomly.
> 
> Just so everybody knows: I never received the above two mails. Something
> is seriously rotten in GMX. This seems to have started a week or two ago.

I think it may just have been that they went to the gitgitgadget
address; I explicitly added you as a cc when I sent the patch with a
commit message, which was when I first noticed you weren't cc'd on the
earlier bits.

> What you cannot see easily, unless you go the route that I offered
> Jonathan (open a PR on gitgitgadget/git which will automatically run the
> test suite on Windows, macOS and Linux, and of course you can do anything
> in a PR, including narrowing down what tests are run) is that sometimes
> those lines in the `trace` file were clearly *incomplete*. That is, even
> if both processes tried to write atomically, one managed to overwrite the
> other's buffer in flight.
> 
> This is a pretty serious thing, even worse than the failing test suite,
> and I don't think that your patch acknowledges how serious this is.

I don't see anything in what you wrote above or in your original cover
letter or series that contradicts the notion that O_APPEND is simply not
atomic on Windows. Overwriting the same byte ranges in the file is what
I would expect on Linux, as well, if the file were opened without
O_APPEND.

I have mixed reports from searching online whether an equivalent to
O_APPEND exists. Some sources seem to say that FILE_APPEND_DATA is.  I
couldn't find anything definitive in Microsoft documentation about
whether it actually provides atomicity.  From my (admittedly limited)
digging, that does not seem to be used by the msys2 runtime, so it might
be worth exploring.

> And please don't give me that "but it works on Linux" response. Seriously.
> Sheesh.

But it does. And it's designed to. There is literally nothing to fix on
Linux.

> Even if you have not managed to trigger it, the POSIX standard seems not
> to define clearly what the behavior is of two competing write() calls,
> unless you lock the files appropriately, as my patch series does.

POSIX says:

  If the O_APPEND flag of the file status flags is set, the file offset
  shall be set to the end of the file prior to each write and no
  intervening file modification operation shall occur between changing
  the file offset and the write operation.

and:

  This volume of POSIX.1-2017 does not specify the behavior of
  concurrent writes to a regular file from multiple threads, except that
  each write is atomic.

I admit the writing is a little turgid, but I think between the two it
is promising the behavior that we expect (and I think that
interpretation is pretty common in the Unix world).
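
To make the strategy under discussion concrete, here is a minimal sketch
of the pattern (not Git's actual trace code, which buffers into a strbuf
and keeps the descriptor open):

#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Format a complete line first, then emit it with a single write() to
 * an O_APPEND descriptor; POSIX then promises the bytes land
 * contiguously at the end of file even with concurrent writers. */
static void trace_line(const char *path, const char *line)
{
	int fd = open(path, O_WRONLY | O_CREAT | O_APPEND, 0666);

	if (fd < 0)
		return;
	/* Tracing is best-effort: errors and short writes are ignored
	 * here, though real code may want to check for them. */
	(void)write(fd, line, strlen(line));
	close(fd);
}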

> So unless you are willing to ignore, to willfully keep this breakage, I
> would suggest not to introduce the ugliness of an overridden upload-pack
> for the sole purpose of disabling the tracing on one side, but instead to
> get this here bug fixed, by helping me with this here patch series.

I'm OK if you want to live with the broken test in the interim. But
after your complaining about the imminence of it "infecting" master, I
thought you might be happy to see a solution that was more immediate.

I have tried to help with the actual problem, by identifying the root
cause (that the trace code's strategy is not fundamentally broken, but
that it relies on O_APPEND whose guarantees are obviously not portable
to Windows) and exploring possible solutions there (FILE_APPEND_DATA).

Lacking any kind of Windows development environment, I'd just as soon
stop there and let more knowledgeable people take over. But please don't
characterize my presence in this thread as some kind of willful harm or
ignoring of the problem.

-Peff


Re: [PATCH 0/4] t5552: fix flakiness by introducing proper locking for GIT_TRACE

2018-08-10 Thread Johannes Sixt

Am 10.08.2018 um 18:51 schrieb Jeff Hostetler:
> On 8/10/2018 12:15 PM, Johannes Sixt wrote:
>> Am 09.08.2018 um 19:35 schrieb Johannes Schindelin via GitGitGadget:
>>> I reported a couple of times that t5552 is not passing reliably. It
>>> has now reached next, and will no doubt infect master soon.
>>>
>>> Turns out that it is not a Windows-specific issue, even if it occurs
>>> a lot more often on Windows than elsewhere.
>>>
>>> The culprit is that two processes try simultaneously to write to the
>>> same file specified via GIT_TRACE_PACKET, and it is not well defined
>>> how that should work, even on Linux.
>>
>> Thanks for digging down to the root cause. As has been said, the
>> behavior of O_APPEND is well-defined under POSIX, but last time I
>> looked for an equivalent feature on Windows, I did not find any.
>>
>> Last time was when I worked around the same failure in
>> t5503-tagfollow.sh in my private builds:
>>
>> https://github.com/j6t/git/commit/9a447a6844b50b43746d9765b3ac809e2793d742
>>
>> It is basically the same as Peff suggests: log only one side of the
>> fetch.
>>
>> As this buglet looks like a recurring theme, a proper fix is
>> preferable over repeated work-arounds. To me it looks like we need
>> some sort of locking on Windows. Unless your friends at Microsoft have
>> an ace in their sleeves that lets us have atomic O_APPEND the POSIX
>> way...
>
> The official Windows CRT for open() does document an O_APPEND, but after
> looking at the distributed source, I'm not sure I trust it.

I looked at the CRT code, too, and no, we can't trust it.

> I have not looked at the MSYS version of the CRT yet so I cannot comment
> on it.
>
> I did find that the following code does what we want.  Notice the use of
> FILE_APPEND_DATA in place of the usual GENERIC_READ.  (And yes, the docs
> are very vague here.)

As far as I understand, FILE_APPEND_DATA is just a permission flag (I'm
not sure why that would be necessary), but at least the documentation
states that it is still necessary for the caller to set the file pointer.

But I haven't tried your code below, yet. Should it append the line of
text on each invocation? And if so, is it an atomic operation?

> Before we try a locking solution, perhaps we should tweak mingw_open()
> to use something like this to get a proper HANDLE directly and then use
> _open_osfhandle() to get a "fd" for it.
>
> Jeff
>
> -
>
> #include <windows.h>
>
> int main()
> {
> 	HANDLE h = CreateFileW(L"foo.txt",
> 			       FILE_APPEND_DATA,
> 			       FILE_SHARE_READ | FILE_SHARE_WRITE | FILE_SHARE_DELETE,
> 			       NULL, OPEN_ALWAYS, FILE_ATTRIBUTE_NORMAL, NULL);
>
> 	const char *buf = "this is a test\n";
> 	DWORD len = strlen(buf);
> 	DWORD nbw;
>
> 	/* nbw receives the number of bytes written. */
> 	WriteFile(h, buf, len, &nbw, NULL);
> 	CloseHandle(h);
>
> 	return 0;
> }





Re: [PATCH 0/4] t5552: fix flakiness by introducing proper locking for GIT_TRACE

2018-08-10 Thread Jeff Hostetler




On 8/10/2018 12:51 PM, Jeff Hostetler wrote:
> On 8/10/2018 12:15 PM, Johannes Sixt wrote:
>> Am 09.08.2018 um 19:35 schrieb Johannes Schindelin via GitGitGadget:
>>> I reported a couple of times that t5552 is not passing reliably. It
>>> has now reached next, and will no doubt infect master soon.
>>>
>>> Turns out that it is not a Windows-specific issue, even if it occurs
>>> a lot more often on Windows than elsewhere.
>>>
>>> The culprit is that two processes try simultaneously to write to the
>>> same file specified via GIT_TRACE_PACKET, and it is not well defined
>>> how that should work, even on Linux.
>>
>> Thanks for digging down to the root cause. As has been said, the
>> behavior of O_APPEND is well-defined under POSIX, but last time I
>> looked for an equivalent feature on Windows, I did not find any.
>>
>> Last time was when I worked around the same failure in
>> t5503-tagfollow.sh in my private builds:
>>
>> https://github.com/j6t/git/commit/9a447a6844b50b43746d9765b3ac809e2793d742
>>
>> It is basically the same as Peff suggests: log only one side of the
>> fetch.
>>
>> As this buglet looks like a recurring theme, a proper fix is
>> preferable over repeated work-arounds. To me it looks like we need
>> some sort of locking on Windows. Unless your friends at Microsoft have
>> an ace in their sleeves that lets us have atomic O_APPEND the POSIX
>> way...
>
> The official Windows CRT for open() does document an O_APPEND, but after
> looking at the distributed source, I'm not sure I trust it.
>
> I have not looked at the MSYS version of the CRT yet so I cannot comment
> on it.
>
> I did find that the following code does what we want.  Notice the use of
> FILE_APPEND_DATA in place of the usual GENERIC_READ.  (And yes, the docs
> are very vague here.)

d'oh

s/GENERIC_READ/GENERIC_WRITE/

> Before we try a locking solution, perhaps we should tweak mingw_open()
> to use something like this to get a proper HANDLE directly and then use
> _open_osfhandle() to get a "fd" for it.
>
> Jeff
>
> -
>
> #include <windows.h>
>
> int main()
> {
> 	HANDLE h = CreateFileW(L"foo.txt",
> 			       FILE_APPEND_DATA,
> 			       FILE_SHARE_READ | FILE_SHARE_WRITE | FILE_SHARE_DELETE,
> 			       NULL, OPEN_ALWAYS, FILE_ATTRIBUTE_NORMAL, NULL);
>
> 	const char *buf = "this is a test\n";
> 	DWORD len = strlen(buf);
> 	DWORD nbw;
>
> 	/* nbw receives the number of bytes written. */
> 	WriteFile(h, buf, len, &nbw, NULL);
> 	CloseHandle(h);
>
> 	return 0;
> }


Re: [PATCH 0/4] t5552: fix flakiness by introducing proper locking for GIT_TRACE

2018-08-10 Thread Jeff Hostetler




On 8/10/2018 12:15 PM, Johannes Sixt wrote:
> Am 09.08.2018 um 19:35 schrieb Johannes Schindelin via GitGitGadget:
>> I reported a couple of times that t5552 is not passing reliably. It
>> has now reached next, and will no doubt infect master soon.
>>
>> Turns out that it is not a Windows-specific issue, even if it occurs
>> a lot more often on Windows than elsewhere.
>>
>> The culprit is that two processes try simultaneously to write to the
>> same file specified via GIT_TRACE_PACKET, and it is not well defined
>> how that should work, even on Linux.
>
> Thanks for digging down to the root cause. As has been said, the
> behavior of O_APPEND is well-defined under POSIX, but last time I
> looked for an equivalent feature on Windows, I did not find any.
>
> Last time was when I worked around the same failure in
> t5503-tagfollow.sh in my private builds:
>
> https://github.com/j6t/git/commit/9a447a6844b50b43746d9765b3ac809e2793d742
>
> It is basically the same as Peff suggests: log only one side of the fetch.
>
> As this buglet looks like a recurring theme, a proper fix is
> preferable over repeated work-arounds. To me it looks like we need some
> sort of locking on Windows. Unless your friends at Microsoft have an ace
> in their sleeves that lets us have atomic O_APPEND the POSIX way...

The official Windows CRT for open() does document an O_APPEND, but after
looking at the distributed source, I'm not sure I trust it.

I have not looked at the MSYS version of the CRT yet so I cannot comment
on it.

I did find that the following code does what we want.  Notice the use of
FILE_APPEND_DATA in place of the usual GENERIC_READ.  (And yes, the docs
are very vague here.)

Before we try a locking solution, perhaps we should tweak mingw_open()
to use something like this to get a proper HANDLE directly and then use
_open_osfhandle() to get a "fd" for it.

Jeff

-

#include <windows.h>

int main()
{
	HANDLE h = CreateFileW(L"foo.txt",
			       FILE_APPEND_DATA,
			       FILE_SHARE_READ | FILE_SHARE_WRITE | FILE_SHARE_DELETE,
			       NULL, OPEN_ALWAYS, FILE_ATTRIBUTE_NORMAL, NULL);

	const char *buf = "this is a test\n";
	DWORD len = strlen(buf);
	DWORD nbw;

	/* nbw receives the number of bytes written. */
	WriteFile(h, buf, len, &nbw, NULL);
	CloseHandle(h);

	return 0;
}


Re: [PATCH 0/4] t5552: fix flakiness by introducing proper locking for GIT_TRACE

2018-08-10 Thread Johannes Schindelin
Hi,

On Fri, 10 Aug 2018, Jeff King wrote:

> On Thu, Aug 09, 2018 at 01:49:52PM -0700, Junio C Hamano wrote:
> 
> > Jeff King  writes:
> > 
> > > Are you sure that it's not well-defined? We open the path with O_APPEND,
> > > which means every write() will be atomically positioned at the end of
> > > file. So we would never lose or overwrite data.
> > >
> > > We do our own buffering in a strbuf, writing the result out in a single
> > > write() call (modulo the OS returning a short write, but that should not
> > > generally happen when writing short strings to a file). So we should get
> > > individual trace lines as atomic units.
> > >
> > > The order of lines from the two processes is undefined, of course.
> > 
> > Correct.  But I am more worried about the "mixed/overwriting"
> > breakage, if there is one; it means we may need to be prepared for
> > systems that lack a correctly working O_APPEND.  I initially just
> > assumed that it was what Dscho was seeing, but after re-reading his
> > message, I am not sure anymore.
> > 
> > I think the "do not trace the other side" approach you suggest for
> > these tests that only care about one side is a more appropriate
> > solution for this particular case.  We then do not have to worry
> > about overwriting or output from both sides mixed randomly.

Just so everybody knows: I never received the above two mails. Something
is seriously rotten in GMX. This seems to have started a week or two ago.

> Here it is as a patch on top of jt/fetch-negotiator-skipping, which lets
> us pursue any fix for interleaved trace output on Windows without the
> pressure of an impending flaky test.
> 
> My gut says that looking into making O_APPEND work there is going to be
> the nicest solution, but my gut is not very well versed in Windows
> subtleties. ;)

Your patch seems to be a patch in the original sense of the word: it is
not a solution.

And let me spell it out very clearly (as I have mentioned to Jonathan a
couple times, without any response): t5552 fails regularly, way more than
rarely, see for yourself:

https://git-for-windows.visualstudio.com/git/_build/index?definitionId=1&_a=history

And the reason it fails was analyzed by me and described in the commit
message (I am sorry that the cover letter was still stale and talked
about flock(), which I had decided was not the right approach after
fighting with Linux over it).

What you cannot see easily, unless you go the route that I offered
Jonathan (open a PR on gitgitgadget/git, which will automatically run the
test suite on Windows, macOS and Linux; and of course you can do anything
in a PR, including narrowing down which tests are run), is that sometimes
those lines in the `trace` file were clearly *incomplete*. That is, even
if both processes tried to write atomically, one managed to overwrite the
other's buffer in flight.

This is a pretty serious thing, even worse than the failing test suite,
and I don't think that your patch acknowledges how serious this is.

And please don't give me that "but it works on Linux" response. Seriously.
Sheesh.

Even if you have not managed to trigger it, the POSIX standard seems not
to clearly define the behavior of two competing write() calls, unless you
lock the files appropriately, as my patch series does.

So unless you are willing to ignore, and willfully keep, this breakage, I
would suggest not introducing the ugliness of an overridden upload-pack
for the sole purpose of disabling the tracing on one side, but instead
getting this here bug fixed, by helping me with this here patch series.

We don't have to rush this, you know? We have to fix it, though.

Thank you,
Dscho

> -- >8 --
> Subject: [PATCH] t5552: suppress upload-pack trace output
> 
> The t5552 test script uses GIT_TRACE_PACKET to monitor what
> git-fetch sends and receives. However, because we're
> accessing a local repository, the child upload-pack also
> sends trace output to the same file.
> 
> On Linux, this works out OK. We open the trace file with
> O_APPEND, so all writes are atomically positioned at the end
> of the file. No data can be overwritten or omitted. And
> since we prepare our small writes in a strbuf and write them
> with a single write(), we should see each line as an atomic
> unit. The order of lines between the two processes is
> undefined, but the test script greps only for "fetch>" or
> "fetch<" lines. So under Linux, the test results are
> deterministic.
> 
> The test fails intermittently on Windows, however,
> reportedly even overwriting bits of the output file (i.e.,
> O_APPEND does not seem to give us an atomic position+write).
> 
> Since the test only cares about the trace output from fetch,
> we can just disable the output from upload-pack. That
> doesn't solve the greater question of O_APPEND/trace issues
> under Windows, but it easily fixes the flakiness from this
> test.
> 
> Reported-by: Johannes Schindelin 
> Signed-off-by: Jeff King 
> ---
> I'm assuming that this really isn't 

Re: [PATCH 0/4] t5552: fix flakiness by introducing proper locking for GIT_TRACE

2018-08-10 Thread Johannes Sixt

On 09.08.2018 at 19:35, Johannes Schindelin via GitGitGadget wrote:

I reported a couple of times that t5552 is not passing reliably. It has now
reached next, and will no doubt infect master soon.

Turns out that it is not a Windows-specific issue, even if it occurs a lot
more often on Windows than elsewhere.

The culprit is that two processes try simultaneously to write to the same
file specified via GIT_TRACE_PACKET, and it is not well defined how that
should work, even on Linux.


Thanks for digging down to the root cause. As has been said, the 
behavior of O_APPEND is well-defined under POSIX, but last time I looked 
for an equivalent feature on Windows, I did not find any.


Last time was when I worked around the same failure in 
t5503-tagfollow.sh in my private builds:


https://github.com/j6t/git/commit/9a447a6844b50b43746d9765b3ac809e2793d742

It is basically the same as Peff suggests: log only one side of the fetch.

As this buglet looks like a recurring theme, a proper fix is 
preferable over repeated work-arounds. To me it looks like we need some 
sort of locking on Windows. Unless your friends at Microsoft have an ace 
up their sleeves that lets us have atomic O_APPEND the POSIX way...


-- Hannes


Re: [PATCH 0/4] t5552: fix flakiness by introducing proper locking for GIT_TRACE

2018-08-10 Thread Junio C Hamano
Jeff King  writes:

> Here it is as a patch on top of jt/fetch-negotiator-skipping, which lets
> us pursue any fix for interleaved trace output on Windows without the
> pressure of an impending flaky test.
>
> My gut says that looking into making O_APPEND work there is going to be
> the nicest solution, but my gut is not very well versed in Windows
> subtleties. ;)

As we know these tests are only interested in "fetch" lines and not the
record of communication in the other direction, I think this is a
nice clean-up that is worth doing.

For only two greps it may not look worth doing, but it would also be
a good clean-up to add a got_acked helper that sits next to the
have_sent and have_not_sent helpers, for readability.  That is of
course outside the scope of this change.

Will queue.  Thanks.

> -- >8 --
> Subject: [PATCH] t5552: suppress upload-pack trace output
>
> The t5552 test script uses GIT_TRACE_PACKET to monitor what
> git-fetch sends and receives. However, because we're
> accessing a local repository, the child upload-pack also
> sends trace output to the same file.
>
> On Linux, this works out OK. We open the trace file with
> O_APPEND, so all writes are atomically positioned at the end
> of the file. No data can be overwritten or omitted. And
> since we prepare our small writes in a strbuf and write them
> with a single write(), we should see each line as an atomic
> unit. The order of lines between the two processes is
> undefined, but the test script greps only for "fetch>" or
> "fetch<" lines. So under Linux, the test results are
> deterministic.
>
> The test fails intermittently on Windows, however,
> reportedly even overwriting bits of the output file (i.e.,
> O_APPEND does not seem to give us an atomic position+write).
>
> Since the test only cares about the trace output from fetch,
> we can just disable the output from upload-pack. That
> doesn't solve the greater question of O_APPEND/trace issues
> under Windows, but it easily fixes the flakiness from this
> test.
>
> Reported-by: Johannes Schindelin 
> Signed-off-by: Jeff King 
> ---
> I'm assuming that this really isn't triggerable on Linux. I tried and
> couldn't manage to get it to fail, and the reasoning above explains why.
> But I wasn't 100% clear that Dscho hadn't seen it fail on non-Windows.
>
>  t/t5552-skipping-fetch-negotiator.sh | 23 ++-
>  1 file changed, 18 insertions(+), 5 deletions(-)
>
> diff --git a/t/t5552-skipping-fetch-negotiator.sh b/t/t5552-skipping-fetch-negotiator.sh
> index 0a8e0e42ed..0ad50dd839 100755
> --- a/t/t5552-skipping-fetch-negotiator.sh
> +++ b/t/t5552-skipping-fetch-negotiator.sh
> @@ -28,6 +28,19 @@ have_not_sent () {
>   done
>  }
>  
> +# trace_fetch <client> <server> [args]
> +#
> +# Trace the packet output of fetch, but make sure we disable the variable
> +# in the child upload-pack, so we don't combine the results in the same file.
> +trace_fetch () {
> + client=$1; shift
> + server=$1; shift
> + GIT_TRACE_PACKET="$(pwd)/trace" \
> + git -C "$client" fetch \
> +   --upload-pack 'unset GIT_TRACE_PACKET; git-upload-pack' \
> +   "$server" "$@"
> +}
> +
>  test_expect_success 'commits with no parents are sent regardless of skip 
> distance' '
>   git init server &&
>   test_commit -C server to_fetch &&
> @@ -42,7 +55,7 @@ test_expect_success 'commits with no parents are sent 
> regardless of skip distanc
>   # "c1" has no parent, it is still sent as "have" even though it would
>   # normally be skipped.
>   test_config -C client fetch.negotiationalgorithm skipping &&
> - GIT_TRACE_PACKET="$(pwd)/trace" git -C client fetch "$(pwd)/server" &&
> + trace_fetch client "$(pwd)/server" &&
>   have_sent c7 c5 c2 c1 &&
>   have_not_sent c6 c4 c3
>  '
> @@ -65,7 +78,7 @@ test_expect_success 'when two skips collide, favor the 
> larger one' '
>   # the next "have" sent will be "c1" (from "c6" skip 4) and not "c4"
>   # (from "c5side" skip 1).
>   test_config -C client fetch.negotiationalgorithm skipping &&
> - GIT_TRACE_PACKET="$(pwd)/trace" git -C client fetch "$(pwd)/server" &&
> + trace_fetch client "$(pwd)/server" &&
>   have_sent c5side c11 c9 c6 c1 &&
>   have_not_sent c10 c8 c7 c5 c4 c3 c2
>  '
> @@ -91,7 +104,7 @@ test_expect_success 'use ref advertisement to filter out 
> commits' '
>   # not need to send any ancestors of "c3", but we still need to send "c3"
>   # itself.
>   test_config -C client fetch.negotiationalgorithm skipping &&
> - GIT_TRACE_PACKET="$(pwd)/trace" git -C client fetch origin to_fetch &&
> + trace_fetch client origin to_fetch &&
>   have_sent c5 c4^ c2side &&
>   have_not_sent c4 c4^^ c4^^^
>  '
> @@ -121,7 +134,7 @@ test_expect_success 'handle clock skew' '
>   # and sent, because (due to clock skew) its only parent has already been
>   # popped off the priority queue.
>   test_config -C client fetch.negotiationalgorithm skipping &&
> - 


Re: [PATCH 0/4] t5552: fix flakiness by introducing proper locking for GIT_TRACE

2018-08-10 Thread Jeff King
On Thu, Aug 09, 2018 at 01:49:52PM -0700, Junio C Hamano wrote:

> Jeff King  writes:
> 
> > Are you sure that it's not well-defined? We open the path with O_APPEND,
> > which means every write() will be atomically positioned at the end of
> > file. So we would never lose or overwrite data.
> >
> > We do our own buffering in a strbuf, writing the result out in a single
> > write() call (modulo the OS returning a short write, but that should not
> > generally happen when writing short strings to a file). So we should get
> > individual trace lines as atomic units.
> >
> > The order of lines from the two processes is undefined, of course.
> 
> Correct.  But I am more worried about the "mixed/overwriting"
> breakage, if there is one; it means we may need to be prepared for
> systems that lack O_APPEND that works correctly.  I initially just
> assumed that it was what Dscho was seeing, but after re-reading his
> message, I am not sure anymore.
> 
> I think the "do not trace the other side" approach you suggest for
> these tests that only care about one side is more appropriate
> solution for this particular case.  We then do not have to worry
> about overwriting or output from both sides mixed randomly.

Here it is as a patch on top of jt/fetch-negotiator-skipping, which lets
us pursue any fix for interleaved trace output on Windows without the
pressure of an impending flaky test.

My gut says that looking into making O_APPEND work there is going to be
the nicest solution, but my gut is not very well versed in Windows
subtleties. ;)

-- >8 --
Subject: [PATCH] t5552: suppress upload-pack trace output

The t5552 test script uses GIT_TRACE_PACKET to monitor what
git-fetch sends and receives. However, because we're
accessing a local repository, the child upload-pack also
sends trace output to the same file.

On Linux, this works out OK. We open the trace file with
O_APPEND, so all writes are atomically positioned at the end
of the file. No data can be overwritten or omitted. And
since we prepare our small writes in a strbuf and write them
with a single write(), we should see each line as an atomic
unit. The order of lines between the two processes is
undefined, but the test script greps only for "fetch>" or
"fetch<" lines. So under Linux, the test results are
deterministic.

The test fails intermittently on Windows, however,
reportedly even overwriting bits of the output file (i.e.,
O_APPEND does not seem to give us an atomic position+write).

Since the test only cares about the trace output from fetch,
we can just disable the output from upload-pack. That
doesn't solve the greater question of O_APPEND/trace issues
under Windows, but it easily fixes the flakiness from this
test.

Reported-by: Johannes Schindelin 
Signed-off-by: Jeff King 
---
I'm assuming that this really isn't triggerable on Linux. I tried and
couldn't manage to get it to fail, and the reasoning above explains why.
But I wasn't 100% clear that Dscho hadn't seen it fail on non-Windows.

 t/t5552-skipping-fetch-negotiator.sh | 23 ++-
 1 file changed, 18 insertions(+), 5 deletions(-)

diff --git a/t/t5552-skipping-fetch-negotiator.sh b/t/t5552-skipping-fetch-negotiator.sh
index 0a8e0e42ed..0ad50dd839 100755
--- a/t/t5552-skipping-fetch-negotiator.sh
+++ b/t/t5552-skipping-fetch-negotiator.sh
@@ -28,6 +28,19 @@ have_not_sent () {
done
 }
 
+# trace_fetch <client> <server> [args]
+#
+# Trace the packet output of fetch, but make sure we disable the variable
+# in the child upload-pack, so we don't combine the results in the same file.
+trace_fetch () {
+   client=$1; shift
+   server=$1; shift
+   GIT_TRACE_PACKET="$(pwd)/trace" \
+   git -C "$client" fetch \
+ --upload-pack 'unset GIT_TRACE_PACKET; git-upload-pack' \
+ "$server" "$@"
+}
+
 test_expect_success 'commits with no parents are sent regardless of skip 
distance' '
git init server &&
test_commit -C server to_fetch &&
@@ -42,7 +55,7 @@ test_expect_success 'commits with no parents are sent 
regardless of skip distanc
# "c1" has no parent, it is still sent as "have" even though it would
# normally be skipped.
test_config -C client fetch.negotiationalgorithm skipping &&
-   GIT_TRACE_PACKET="$(pwd)/trace" git -C client fetch "$(pwd)/server" &&
+   trace_fetch client "$(pwd)/server" &&
have_sent c7 c5 c2 c1 &&
have_not_sent c6 c4 c3
 '
@@ -65,7 +78,7 @@ test_expect_success 'when two skips collide, favor the larger 
one' '
# the next "have" sent will be "c1" (from "c6" skip 4) and not "c4"
# (from "c5side" skip 1).
test_config -C client fetch.negotiationalgorithm skipping &&
-   GIT_TRACE_PACKET="$(pwd)/trace" git -C client fetch "$(pwd)/server" &&
+   trace_fetch client "$(pwd)/server" &&
have_sent c5side c11 c9 c6 c1 &&
have_not_sent c10 c8 c7 c5 c4 c3 c2
 '
@@ -91,7 +104,7 @@ test_expect_success 'use ref advertisement to filter out 

Re: [PATCH 0/4] t5552: fix flakiness by introducing proper locking for GIT_TRACE

2018-08-09 Thread Jeff King
On Thu, Aug 09, 2018 at 02:08:56PM -0700, Junio C Hamano wrote:

> > Correct.  But I am more worried about the "mixed/overwriting"
> > breakage, if there is one; it means we may need to be prepared for
> > systems that lack a correctly working O_APPEND.  I initially just
> > assumed that it was what Dscho was seeing, but after re-reading his
> > message, I am not sure anymore.
> >
> > I think the "do not trace the other side" approach you suggest for
> > these tests that only care about one side is a more appropriate
> > solution for this particular case.  We then do not have to worry
> > about overwriting or output from both sides mixed randomly.
> 
> A concluding sentence I forgot to add, after saying "this is simpler
> and better to fix test breakage", was
> 
>   But if we really are seeing O_APPEND breakage, a mandatory
>   locking mechanism like this one might be necessary to work
>   around it (I seriously hope we do not have to, though).
> 
> Sorry for an additional noise.

Yeah, my earlier email said "if we just care about this test, then...".

But if we do care about making this multiple-tracing case work all the
time, then we'd need some solution. If there's something that can work
like O_APPEND, even if we have to make some system-specific call, I'd
prefer that to locking.

I did some searching for atomic append on Windows and got mixed reports
on whether their FILE_APPEND_DATA does what we want or not. Knowing
nothing about msys, I'd _assume_ that O_APPEND would get translated to
that flag, but I got lost trying to find it in
git-for-windows/msys2-runtime.

Wouldn't it be wonderful if the solution were as simple as making sure
that flag was plumbed through? I seriously doubt it will be that easy,
though. :)

-Peff


Re: [PATCH 0/4] t5552: fix flakiness by introducing proper locking for GIT_TRACE

2018-08-09 Thread Junio C Hamano
Junio C Hamano  writes:

> Jeff King  writes:
>
>> Are you sure that it's not well-defined? We open the path with O_APPEND,
>> which means every write() will be atomically positioned at the end of
>> file. So we would never lose or overwrite data.
>>
>> We do our own buffering in a strbuf, writing the result out in a single
>> write() call (modulo the OS returning a short write, but that should not
>> generally happen when writing short strings to a file). So we should get
>> individual trace lines as atomic units.
>>
>> The order of lines from the two processes is undefined, of course.
>
> Correct.  But I am more worried about the "mixed/overwriting"
> breakage, if there is one; it means we may need to be prepared for
> systems that lack a correctly working O_APPEND.  I initially just
> assumed that it was what Dscho was seeing, but after re-reading his
> message, I am not sure anymore.
>
> I think the "do not trace the other side" approach you suggest for
> these tests that only care about one side is a more appropriate
> solution for this particular case.  We then do not have to worry
> about overwriting or output from both sides mixed randomly.

A concluding sentence I forgot to add, after saying "this is simpler
and better to fix test breakage", was

But if we really are seeing O_APPEND breakage, a mandatory
locking mechanism like this one might be necessary to work
around it (I seriously hope we do not have to, though).

Sorry for an additional noise.


Re: [PATCH 0/4] t5552: fix flakiness by introducing proper locking for GIT_TRACE

2018-08-09 Thread Junio C Hamano
Jeff King  writes:

> Are you sure that it's not well-defined? We open the path with O_APPEND,
> which means every write() will be atomically positioned at the end of
> file. So we would never lose or overwrite data.
>
> We do our own buffering in a strbuf, writing the result out in a single
> write() call (modulo the OS returning a short write, but that should not
> generally happen when writing short strings to a file). So we should get
> individual trace lines as atomic units.
>
> The order of lines from the two processes is undefined, of course.

Correct.  But I am more worried about the "mixed/overwriting"
breakage, if there is one; it means we may need to be prepared for
systems that lack a correctly working O_APPEND.  I initially just
assumed that it was what Dscho was seeing, but after re-reading his
message, I am not sure anymore.

I think the "do not trace the other side" approach you suggest for
these tests that only care about one side is a more appropriate
solution for this particular case.  We then do not have to worry
about overwriting or output from both sides mixed randomly.

> diff --git a/t/t5552-skipping-fetch-negotiator.sh b/t/t5552-skipping-fetch-negotiator.sh
> index 3b60bd44e0..5ad5bece55 100755
> --- a/t/t5552-skipping-fetch-negotiator.sh
> +++ b/t/t5552-skipping-fetch-negotiator.sh
> @@ -28,6 +28,19 @@ have_not_sent () {
>   done
>  }
>  
> +# trace_fetch <client> <server> [args]
> +#
> +# Trace the packet output of fetch, but make sure we disable the variable
> +# in the child upload-pack, so we don't combine the results in the same file.
> +trace_fetch () {
> + client=$1; shift
> + server=$1; shift
> + GIT_TRACE_PACKET="$(pwd)/trace" \
> + git -C "$client" fetch \
> +   --upload-pack 'unset GIT_TRACE_PACKET; git-upload-pack' \
> +   "$server" "$@"
> +}
> +
>  test_expect_success 'commits with no parents are sent regardless of skip 
> distance' '
>   git init server &&
>   test_commit -C server to_fetch &&
> @@ -42,7 +55,7 @@ test_expect_success 'commits with no parents are sent 
> regardless of skip distanc
>   # "c1" has no parent, it is still sent as "have" even though it would
>   # normally be skipped.
>   test_config -C client fetch.negotiationalgorithm skipping &&
> - GIT_TRACE_PACKET="$(pwd)/trace" git -C client fetch "$(pwd)/server" &&
> + trace_fetch client "$(pwd)/server" &&
>   have_sent c7 c5 c2 c1 &&
>   have_not_sent c6 c4 c3
>  '
> @@ -88,7 +101,7 @@ test_expect_success 'when two skips collide, favor the 
> larger one' '
>   # the next "have" sent will be "c1" (from "c6" skip 4) and not "c4"
>   # (from "c5side" skip 1).
>   test_config -C client fetch.negotiationalgorithm skipping &&
> - GIT_TRACE_PACKET="$(pwd)/trace" git -C client fetch "$(pwd)/server" &&
> + trace_fetch client "$(pwd)/server" &&
>   have_sent c5side c11 c9 c6 c1 &&
>   have_not_sent c10 c8 c7 c5 c4 c3 c2
>  '
> @@ -114,7 +127,7 @@ test_expect_success 'use ref advertisement to filter out 
> commits' '
>   # not need to send any ancestors of "c3", but we still need to send "c3"
>   # itself.
>   test_config -C client fetch.negotiationalgorithm skipping &&
> - GIT_TRACE_PACKET="$(pwd)/trace" git -C client fetch origin to_fetch &&
> + trace_fetch client origin to_fetch &&
>   have_sent c5 c4^ c2side &&
>   have_not_sent c4 c4^^ c4^^^
>  '
> @@ -144,7 +157,7 @@ test_expect_success 'handle clock skew' '
>   # and sent, because (due to clock skew) its only parent has already been
>   # popped off the priority queue.
>   test_config -C client fetch.negotiationalgorithm skipping &&
> - GIT_TRACE_PACKET="$(pwd)/trace" git -C client fetch "$(pwd)/server" &&
> + trace_fetch client "$(pwd)/server" &&
>   have_sent c2 c1 old4 old2 old1 &&
>   have_not_sent old3
>  '
> @@ -176,7 +189,7 @@ test_expect_success 'do not send "have" with ancestors of 
> commits that server AC
>   test_commit -C server commit-on-b1 &&
>  
>   test_config -C client fetch.negotiationalgorithm skipping &&
> - GIT_TRACE_PACKET="$(pwd)/trace" git -C client fetch "$(pwd)/server" 
> to_fetch &&
> + trace_fetch client "$(pwd)/server" to_fetch &&
>   grep "  fetch" trace &&
>  
>   # fetch-pack sends 2 requests each containing 16 "have" lines before


Re: [PATCH 0/4] t5552: fix flakiness by introducing proper locking for GIT_TRACE

2018-08-09 Thread Jeff King
On Thu, Aug 09, 2018 at 10:35:24AM -0700, Johannes Schindelin via GitGitGadget 
wrote:

> The culprit is that two processes try simultaneously to write to the same
> file specified via GIT_TRACE_PACKET, and it is not well defined how that
> should work, even on Linux.

Are you sure that it's not well-defined? We open the path with O_APPEND,
which means every write() will be atomically positioned at the end of
file. So we would never lose or overwrite data.

We do our own buffering in a strbuf, writing the result out in a single
write() call (modulo the OS returning a short write, but that should not
generally happen when writing short strings to a file). So we should get
individual trace lines as atomic units.

The order of lines from the two processes is undefined, of course.
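
To make that property concrete, here is a toy POSIX program (hypothetical,
not part of the patches under discussion): parent and child append whole
lines to a shared O_APPEND descriptor, one write() per line. The resulting
file contains 1000 intact lines from each process; only their interleaving
is unspecified.

#include <fcntl.h>
#include <string.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void)
{
	int i, fd = open("trace", O_WRONLY | O_CREAT | O_APPEND, 0666);
	pid_t pid;

	if (fd < 0)
		return 1;
	pid = fork();
	if (pid < 0)
		return 1;
	for (i = 0; i < 1000; i++) {
		const char *line = pid ? "parent: one whole line\n"
				       : "child: one whole line\n";
		/* one write() per line, as git's trace code does */
		if (write(fd, line, strlen(line)) < 0)
			return 1;
	}
	if (pid)
		wait(NULL);
	return 0;
}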

> This patch series addresses that by locking the trace fd. I chose to use 
> flock() instead of fcntl() because the Win32 API LockFileEx()
> [https://docs.microsoft.com/en-us/windows/desktop/api/fileapi/nf-fileapi-lockfileex]
>  
> (which does exactly what I want in this context) has much more similar
> semantics to the former than the latter.
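
For reference, a locked append via LockFileEx() would look roughly like the
sketch below: take an exclusive lock on an agreed-upon byte range, seek to
the end, write, unlock. This is a minimal illustration of the approach the
quoted cover letter describes, not code from the series; locked_append() is
a hypothetical helper with minimal error handling.

#include <windows.h>

static BOOL locked_append(HANDLE h, const void *buf, DWORD len)
{
	OVERLAPPED ov = { 0 };
	DWORD written;
	BOOL ok;

	/* all writers must lock the same byte range to be serialized */
	if (!LockFileEx(h, LOCKFILE_EXCLUSIVE_LOCK, 0, 1, 0, &ov))
		return FALSE;
	SetFilePointer(h, 0, NULL, FILE_END); /* seek to end under the lock */
	ok = WriteFile(h, buf, len, &written, NULL);
	UnlockFileEx(h, 0, 1, 0, &ov);
	return ok;
}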

I must admit I'm not excited to see us adding extra locking and
complexity when it is not necessarily needed. Is this a feature we
actually care about delivering for normal users of GIT_TRACE, or do we
just care about solving the test flakiness here?

Because we should be able to do the latter with the patch below.

I also wonder if we should be clearing GIT_TRACE as part of clearing
repo-env. It would do what we want automatically in this case, though
there are definitely cases where I prefer the current behavior (because
I really do want to see the upload-pack side of it). Of course I could
always use "--upload-pack 'GIT_TRACE_PACKET=... upload-pack'", so this
is really just a default.

diff --git a/t/t5552-skipping-fetch-negotiator.sh b/t/t5552-skipping-fetch-negotiator.sh
index 3b60bd44e0..5ad5bece55 100755
--- a/t/t5552-skipping-fetch-negotiator.sh
+++ b/t/t5552-skipping-fetch-negotiator.sh
@@ -28,6 +28,19 @@ have_not_sent () {
done
 }
 
+# trace_fetch <client> <server> [args]
+#
+# Trace the packet output of fetch, but make sure we disable the variable
+# in the child upload-pack, so we don't combine the results in the same file.
+trace_fetch () {
+   client=$1; shift
+   server=$1; shift
+   GIT_TRACE_PACKET="$(pwd)/trace" \
+   git -C "$client" fetch \
+ --upload-pack 'unset GIT_TRACE_PACKET; git-upload-pack' \
+ "$server" "$@"
+}
+
 test_expect_success 'commits with no parents are sent regardless of skip 
distance' '
git init server &&
test_commit -C server to_fetch &&
@@ -42,7 +55,7 @@ test_expect_success 'commits with no parents are sent 
regardless of skip distanc
# "c1" has no parent, it is still sent as "have" even though it would
# normally be skipped.
test_config -C client fetch.negotiationalgorithm skipping &&
-   GIT_TRACE_PACKET="$(pwd)/trace" git -C client fetch "$(pwd)/server" &&
+   trace_fetch client "$(pwd)/server" &&
have_sent c7 c5 c2 c1 &&
have_not_sent c6 c4 c3
 '
@@ -88,7 +101,7 @@ test_expect_success 'when two skips collide, favor the 
larger one' '
# the next "have" sent will be "c1" (from "c6" skip 4) and not "c4"
# (from "c5side" skip 1).
test_config -C client fetch.negotiationalgorithm skipping &&
-   GIT_TRACE_PACKET="$(pwd)/trace" git -C client fetch "$(pwd)/server" &&
+   trace_fetch client "$(pwd)/server" &&
have_sent c5side c11 c9 c6 c1 &&
have_not_sent c10 c8 c7 c5 c4 c3 c2
 '
@@ -114,7 +127,7 @@ test_expect_success 'use ref advertisement to filter out 
commits' '
# not need to send any ancestors of "c3", but we still need to send "c3"
# itself.
test_config -C client fetch.negotiationalgorithm skipping &&
-   GIT_TRACE_PACKET="$(pwd)/trace" git -C client fetch origin to_fetch &&
+   trace_fetch client origin to_fetch &&
have_sent c5 c4^ c2side &&
have_not_sent c4 c4^^ c4^^^
 '
@@ -144,7 +157,7 @@ test_expect_success 'handle clock skew' '
# and sent, because (due to clock skew) its only parent has already been
# popped off the priority queue.
test_config -C client fetch.negotiationalgorithm skipping &&
-   GIT_TRACE_PACKET="$(pwd)/trace" git -C client fetch "$(pwd)/server" &&
+   trace_fetch client "$(pwd)/server" &&
have_sent c2 c1 old4 old2 old1 &&
have_not_sent old3
 '
@@ -176,7 +189,7 @@ test_expect_success 'do not send "have" with ancestors of 
commits that server AC
test_commit -C server commit-on-b1 &&
 
test_config -C client fetch.negotiationalgorithm skipping &&
-   GIT_TRACE_PACKET="$(pwd)/trace" git -C client fetch "$(pwd)/server" 
to_fetch &&
+   trace_fetch client "$(pwd)/server" to_fetch &&
grep "  fetch" trace &&
 
# fetch-pack sends 2 requests each containing 16 "have" 

Re: [PATCH 0/4] line-log: be more careful when adjusting multiple line ranges

2018-08-05 Thread Eric Sunshine
On Sat, Aug 4, 2018 at 6:18 PM Johannes Schindelin via GitGitGadget
 wrote:
> Now, I am fairly certain that the changes are correct, but given my track
> record with off-by-one bugs (and once even an off-by-two bug), I would
> really appreciate some thorough review of this code, in particular the
> second one that is the actual bug fix. I am specifically interested in
> reviews from people who know line-log.c pretty well and can tell me whether
the src[i].end > target[j].end comparison is correct, or whether it should
actually have been a >= (I tried to wrap my head around this, but I would
feel more comfortable if a domain expert would analyze this, whistling, and
looking Eric's way).

Although I fixed a number of bugs in basic range-set manipulation code
(5 years ago), I have no experience with or knowledge of the code
actually dealing with range-set manipulation in relation to diffs, so
I'm not a domain expert by any means, and I'm _still_ trying to wrap
my head around that code.

That said, I left some comments on the individual patches suggesting a
simpler and more correct fix for the crash. (Despite that suggestion
for fixing the crash, I still can't say whether the actual
computations performed by the existing code are correct, since I still
don't understand that aspect of it.)
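
As a generic aside on the >/>= question: boundary semantics are easiest to
state with half-open ranges, which is exactly where such comparisons tend
to go wrong. A toy illustration (not line-log.c's actual range-set types):

#include <stdio.h>

struct range { long start, end; };	/* half-open: [start, end) */

/* Two half-open ranges overlap iff each starts before the other ends.
 * Swapping > for >= (or vice versa) on either bound turns ranges that
 * merely touch into ranges that "overlap", which is precisely the
 * boundary where off-by-one bugs hide. */
static int overlaps(struct range a, struct range b)
{
	return a.start < b.end && b.start < a.end;
}

int main(void)
{
	struct range a = { 1, 5 }, b = { 5, 9 };	/* touch at 5 */

	printf("%d\n", overlaps(a, b));	/* prints 0: touching, not overlapping */
	return 0;
}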


Re: [PATCH 0/4] Use oid_object_info() instead of read_object_file()

2018-07-18 Thread Junio C Hamano
Johannes Schindelin  writes:

> Hi Olga,
>
> On Fri, 13 Jul 2018, Оля Тележная wrote:
>
>> 2018-07-09 11:27 GMT+03:00 Оля Тележная :
>> > Hello everyone,
>> > This is my new attempt to start using oid_object_info_extended() in
>> > ref-filter. You could look at the previous one [1] [2], but it is not
>> > necessary.
>> >
>> > The goal (still) is to improve performance by avoiding calling expensive
>> > functions when we don't need the information they provide
>> > or when we could get it by using a cheaper function.
>> >
>> > This patch is a middle step. In the end, I want to add new atoms
>> > ("objectsize:disk" and "deltabase") and reuse ref-filter logic in
>> > cat-file command.
>> >
>> > I also know about problems with memory leaks in ref-filter: that would
>> > be my next task that I will work on. Since I did not generate any new
>> > leaks in this patch (just use existing ones), I decided to put this
>> > part on a review and fix leaks as a separate task.
>> 
>> UPDATES since v1:
>> add init to eaten variable (thanks to Szeder Gabor, Johannes Schindelin)
>> improve second commit message (thanks to Junio C Hamano)
>> add static keyword (thanks to Ramsay Jones)
>> 
>> >
>> > Thank you!
>> >
>> > [1] https://github.com/git/git/pull/493
>
> Could you please populate the description of that PR so that SubmitGit
> picks it up as a cover letter?

Thanks for suggesting that.  Yes, an updated version of a series,
even if it is a small one with just 4 or 5 patches, becomes much
easier to read with a well-written cover letter.


Re: [PATCH 0/4] Use oid_object_info() instead of read_object_file()

2018-07-18 Thread Johannes Schindelin
Hi Olga,

On Fri, 13 Jul 2018, Оля Тележная wrote:

> 2018-07-09 11:27 GMT+03:00 Оля Тележная :
> > Hello everyone,
> > This is my new attempt to start using oid_object_info_extended() in
> > ref-filter. You could look at the previous one [1] [2], but it is not
> > necessary.
> >
> > The goal (still) is to improve performance by avoiding calling expensive
> > functions when we don't need the information they provide
> > or when we could get it by using a cheaper function.
> >
> > This patch is a middle step. In the end, I want to add new atoms
> > ("objectsize:disk" and "deltabase") and reuse ref-filter logic in
> > cat-file command.
> >
> > I also know about problems with memory leaks in ref-filter: that would
> > be my next task that I will work on. Since I did not generate any new
> > leaks in this patch (just use existing ones), I decided to put this
> > part on a review and fix leaks as a separate task.
> 
> UPDATES since v1:
> add init to eaten variable (thanks to Szeder Gabor, Johannes Schindelin)
> improve second commit message (thanks to Junio C Hamano)
> add static keyword (thanks to Ramsay Jones)
> 
> >
> > Thank you!
> >
> > [1] https://github.com/git/git/pull/493

Could you please populate the description of that PR so that SubmitGit
picks it up as a cover letter?

Thanks,
Johannes

> > [2] 
> > https://public-inbox.org/git/010201637254c969-a346030e-0b75-41ad-8ef3-2ac7e04ba4fb-000...@eu-west-1.amazonses.com/
> 

Re: [PATCH 0/4] Use oid_object_info() instead of read_object_file()

2018-07-13 Thread Оля Тележная
2018-07-09 11:27 GMT+03:00 Оля Тележная :
> Hello everyone,
> This is my new attempt to start using oid_object_info_extended() in
> ref-filter. You could look at the previous one [1] [2], but it is not
> necessary.
>
> The goal (still) is to improve performance by avoiding calling expensive
> functions when we don't need the information they provide
> or when we could get it by using a cheaper function.
>
> This patch is a middle step. In the end, I want to add new atoms
> ("objectsize:disk" and "deltabase") and reuse ref-filter logic in
> cat-file command.
>
> I also know about problems with memory leaks in ref-filter: that would
> be my next task that I will work on. Since I did not generate any new
> leaks in this patch (just use existing ones), I decided to put this
> part on a review and fix leaks as a separate task.

UPDATES since v1:
add init to eaten variable (thanks to Szeder Gabor, Johannes Schindelin)
improve second commit message (thanks to Junio C Hamano)
add static keyword (thanks to Ramsay Jones)

>
> Thank you!
>
> [1] https://github.com/git/git/pull/493
> [2] 
> https://public-inbox.org/git/010201637254c969-a346030e-0b75-41ad-8ef3-2ac7e04ba4fb-000...@eu-west-1.amazonses.com/
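
The principle behind the series, asking a cheap metadata query instead of
loading the whole object, is the same one behind preferring stat() over
reading a file's contents when only the size is wanted. A toy illustration
in plain C (not git code):

#include <stdio.h>
#include <sys/stat.h>

int main(int argc, char **argv)
{
	struct stat st;

	if (argc < 2 || stat(argv[1], &st) < 0) {
		fprintf(stderr, "usage: filesize <path>\n");
		return 1;
	}
	/* the size is learned from metadata alone; no content is read */
	printf("%lld\n", (long long)st.st_size);
	return 0;
}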


Re: [PATCH 0/4] Use oid_object_info() instead of read_object_file()

2018-07-10 Thread Johannes Schindelin
Hi Olga,

On Mon, 9 Jul 2018, Оля Тележная wrote:

> [2] 
> https://public-inbox.org/git/010201637254c969-a346030e-0b75-41ad-8ef3-2ac7e04ba4fb-000...@eu-west-1.amazonses.com/

This type of Message-Id makes me think that you used SubmitGit to send
this patch series.

The main problem I see here is that the patches are not sent as replies to
this cover letter, and therefore they are seemingly disconnected on the
mailing list.

It was also my impression that SubmitGit started supporting sending cover
letters, in which case you would not have to jump through hoops to thread
the mails properly. But for that to work, the PR has to have a description
which is then used as the cover letter. I do not see any description in
https://github.com/git/git/pull/520, though. Maybe provide one?

Ciao,
Johannes

P.S.: You might have noticed that I am working (slowly, but steadily) on a
contender for SubmitGit that I call GitGitGadget. Originally, I really
wanted to enhance SubmitGit instead because I am a big believer in *not*
reinventing the wheel (so much energy gets wasted that way).

However, in this case the limitations of the chosen language (I do not
want to learn Scala, I have absolutely zero need to know Scala in any of
my other endeavors, and my time to learn new things is limited, so I spend
it wisely) and the limitations of the design (the UI is completely
separate from GitHub, you have to allow Amazon to send mails in your name,
and SubmitGit's design makes it impossible to work bi-directionally, it is
only GitHub -> mailing list, while I also want the option to add replies
on the mailing list as comments to the GitHub PR in the future) made me
reconsider.

If you want to kick the tires, so to say, I welcome you to give
GitGitGadget a try. It would require only a couple of things from you:

- You would have to settle for a branch name, and then not open new PRs
  for every iteration you want to send, but force-push the branch instead.

- You would have to open a PR at https://github.com/gitgitgadget/git.

- You would have to provide the cover letter via the PR's description (and
  update that description before sending newer iterations).

- I would have to add you to the list of users allowed to send patches via
  GitGitGadget (GitGitGadget has some really light-weight access control
  to prevent spamming).

- You would then send a new iteration by simply adding a comment to your
  PR that contains this command: /submit

- To integrate well with previous patch series iterations (i.e. to connect
  the threads), I would have to come up with a little bit of tooling to
  add some metadata that I have to reconstruct manually from your
  previously-sent iterations.

Re: [PATCH 0/4] Use oid_object_info() instead of read_object_file()

2018-07-09 Thread Junio C Hamano
Оля Тележная   writes:

> Hello everyone,
> This is my new attempt to start using oid_object_info_extended() in
> ref-filter. You could look at previous one [1] [2] but it is not
> necessary.

Yup, it sounds like a sensible thing to do to try asking the object-info
helper instead of reading the whole object in-core and inspecting it
ourselves when we can avoid it.



Re: [PATCH 0/4] fsck: doc fixes & fetch.fsck.* implementation

2018-05-28 Thread Junio C Hamano
Ævar Arnfjörð Bjarmason   writes:

> On Thu, May 24, 2018 at 9:02 PM, Jeff King  wrote:
>> On Thu, May 24, 2018 at 07:04:04PM +0200, Ævar Arnfjörð Bjarmason wrote:
>>
>>> That doesn't work, because that's for the server-side, but I need the
>>> fetch.fsck.* that doesn't exist. This works (I'll send a better patch
>>> with tests / docs etc. soon):
>>
>> Yeah, I think this is the right direction.

It seems that this 4-patch series is sufficiently commented and the
earlier parts can be further polished soonish with help from
others.  I need to re-read the last patch (implementation) but I
think this generally is good.

Thanks.


Re: [PATCH 0/4] a few mark_parents_uninteresting cleanups

2018-05-14 Thread Derrick Stolee

On 5/11/2018 2:00 PM, Jeff King wrote:

This is a follow-up to the discussion from February:

   
https://public-inbox.org/git/1519522496-73090-1-git-send-email-dsto...@microsoft.com/

There I theorized that some of these extra has_object_file() checks were
unnecessary. After poking around a bit, I've convinced myself that this
is the case, so here are some patches.

After Stolee's fix in ebbed3ba04 (revision.c: reduce object database
queries, 2018-02-24), I doubt this will provide any noticeable speedup.
IMHO the value is just in simplifying the logic.

The first two patches are the actual has_object_file simplifications.
The second two are an attempt to fix some head-scratching I had while
reading mark_parents_uninteresting(). I hope the result is easier to
follow, but I may have just shuffled one confusion for another. :)

   [1/4]: mark_tree_contents_uninteresting(): drop missing object check
   [2/4]: mark_parents_uninteresting(): drop missing object check
   [3/4]: mark_parents_uninteresting(): replace list with stack
   [4/4]: mark_parents_uninteresting(): avoid most allocation

  revision.c | 90 ++
  1 file changed, 50 insertions(+), 40 deletions(-)

-Peff


This series looks good to me. I found Patch 3 hard to read in the diff, 
so I just looked at the final result and the new arrangement is very 
clear about how it should behave.


Reviewed-by: Derrick Stolee 
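
For readers skimming the series, the core of patch 3/4 is the classic
"replace recursion with an explicit stack" transformation. A generic sketch
of the technique (hypothetical node type, not git's struct commit;
allocation error handling elided):

#include <stdlib.h>

struct node {
	int marked;
	struct node **parents;	/* NULL-terminated array, may be NULL */
};

static void push(struct node ***stack, size_t *nr, size_t *alloc,
		 struct node *n)
{
	if (*nr == *alloc) {
		*alloc = *alloc ? *alloc * 2 : 16;
		*stack = realloc(*stack, *alloc * sizeof(**stack));
	}
	(*stack)[(*nr)++] = n;
}

static void mark_ancestors(struct node *start)
{
	struct node **stack = NULL;
	size_t nr = 0, alloc = 0;

	push(&stack, &nr, &alloc, start);
	while (nr) {
		struct node *n = stack[--nr];
		struct node **p;

		if (n->marked)
			continue;	/* do not re-walk shared history */
		n->marked = 1;
		for (p = n->parents; p && *p; p++)
			push(&stack, &nr, &alloc, *p);
	}
	free(stack);
}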


Re: [PATCH 0/4] doc: cleaning up instances of \--

2018-05-10 Thread Jeff King
On Wed, Apr 18, 2018 at 01:24:55PM +0900, Junio C Hamano wrote:

> Martin Ågren  writes:
> 
> > This is a patch series to convert \-- to -- in our documentation. The
> > first patch is a reiteration of 1c262bb7b2 (doc: convert \--option to
> > --option, 2015-05-13) to fix some instances that have appeared since.
> > The other three patches deal with standalone "\--" which we can't
> > always turn into "--" since it can be rendered as an em dash.
> 
> All looked sensible.  As you mentioned in [2/4], "\--::" that is
> part of an enumeration appears in the documentation for about a dozen
> commands after the series, but I do not think we can avoid it.
> 
> One thing that makes me wonder, related to these patches, is whether the
> newer AsciiDoc we assume lets us do without the {litdd} macro.  This
> series and our earlier effort like 1c262bb7 ("doc: convert \--option
> to --option", 2015-05-13) mentions that "\--word" is less pleasant
> on the eyes than "--word", but the ugliness "two{litdd}words" has
> over "two--words" is far worse than that, so...

I think many cases that use {litdd} would be better off using literal
backticks anyway (e.g., git-add.txt mentions the filename
`git-add--interactive.perl`).

There are certainly a few that can't, though (e.g., config.txt uses
linkgit:git-web{litdd}browse[1]).  I agree that "\--" is less ugly there
(and seems to work on my modern asciidoc). There's some history on the
litdd versus "\--" choice in 565e135a1e (Documentation: quote
double-dash for AsciiDoc, 2011-06-29). That in turn references
2839478774 (Work around em-dash handling in newer AsciiDoc, 2010-08-23),
but I wouldn't be surprised if all of that is now obsolete with our
AsciiDoc 8+ requirement.

-Peff

PS Late review, I know, but the patches look good to me. :)


Re: [PATCH 0/4] subtree: move out of contrib

2018-05-01 Thread Ævar Arnfjörð Bjarmason

On Tue, May 01 2018, Johannes Schindelin wrote:

> Hi Ævar,
>
> On Mon, 30 Apr 2018, Ævar Arnfjörð Bjarmason wrote:
>
>> I think at this point git-subtree is widely used enough to move out of
>> contrib/; maybe others disagree, but patches are always better for
>> discussion than patch-less ML posts.
>
> Sure, it is used widely enough.
>
> However, it flies in the face of so many GSoC efforts to introduce yet
> another one of those poorly portable Unix shell scripts, as central part
> of Git's code base.
>
> The script itself does look quite straight-forward to port to a builtin,
> so why not give it a try?

That's a valid point. I think it makes sense to leave that aside for
now; maybe the consensus is that subtree is fine in every way, except
that we'd like to have a policy not to introduce new shell-script built-ins.

Let's first just assume it's in C already and look at it in terms of its
functionality, to figure out if it's worth even getting to that point.

> If you are completely opposed to porting it to C, I will be completely
> opposed to moving it out of contrib/.

This series shows that we should split the concern about whether
something lives in contrib/ from whether it's built/installed by
default.

Whether or not we decide that subtree should be a blessed default command,
it makes sense to move it out of contrib, purely because, as can be seen
from this series, it'll replace >100 lines of hacks with 1 line in our
main Makefile.

We can then just add a flag to guard it,
e.g. CONTRIB_SUBTREE=YesPlease.

But that's just an internal implementation detail of how we manage code
sitting in git.git.


Re: [PATCH 0/4] subtree: move out of contrib

2018-05-01 Thread Johannes Schindelin
Hi Ævar,

On Mon, 30 Apr 2018, Ævar Arnfjörð Bjarmason wrote:

> I think at this point git-subtree is widely used enough to move out of
> contrib/; maybe others disagree, but patches are always better for
> discussion than patch-less ML posts.

Sure, it is used widely enough.

However, it flies in the face of so many GSoC efforts to introduce yet
another one of those poorly portable Unix shell scripts as a central part
of Git's code base.

The script itself does look quite straight-forward to port to a builtin,
so why not give it a try?

If you are completely opposed to porting it to C, I will be completely
opposed to moving it out of contrib/.

If you need help with porting it, please come up with a task plan and I
can jump in, to help (but please do collaborate with me on this one, don't
leave all of the hard work to me).

Ciao,
Dscho

Re: [PATCH 0/4] subtree: move out of contrib

2018-05-01 Thread Duy Nguyen
On Mon, Apr 30, 2018 at 11:50 AM, Ævar Arnfjörð Bjarmason
 wrote:
> I think at this point git-subtree is widely used enough to move out of
> contrib/; maybe others disagree, but patches are always better for
> discussion than patch-less ML posts.

After narrow/partial clone becomes real, it should be "easy" to
implement some sort of narrow checkout that would achieve the same
thing. But it took me forever with all other stuff to get back to
this.

If we remove it from contrib and there are people willing to
update/maintain it, should it be a separate repository then? The
willing people will have much more freedom to update it. And I don't
have to answer the questions about who will maintain this thing in
git.git. I don't like the idea of adding new (official) shell-based
commands either to be honest.
-- 
Duy


Re: [PATCH 0/4] subtree: move out of contrib

2018-04-30 Thread Avery Pennarun
On Mon, Apr 30, 2018 at 6:21 PM, Ævar Arnfjörð Bjarmason
 wrote:
> Pretty clear it's garbage data, unless we're to believe that the
> relative interest of submodules in the US, Germany and Sweden is 51, 64
> & 84, but 75, 100 and 0 for subtree.

Oh yeah, Swedish people hate git-subtree.  Nobody knows why.

Avery


Re: [PATCH 0/4] subtree: move out of contrib

2018-04-30 Thread Ævar Arnfjörð Bjarmason

On Mon, Apr 30 2018, Avery Pennarun wrote:

> On Mon, Apr 30, 2018 at 5:38 PM, Stefan Beller  wrote:
> There's one exception, which is doing a one-time permanent merge of
> two projects into one.  That's a nice feature, but is probably used
> extremely rarely.

FWIW this is the only thing I've used it for. I do this occasionally; I
used to do it manually with format-patch + "perl -pe" or similar when I
needed to merge some repositories together, and at other times I was less
stupid and manually did something similar to what subtree is doing, with a
"move everything" commit just before the merge of the two histories.

>> https://trends.google.com/trends/explore?date=all=git%20subtree,git%20submodule
>>
>> Not sure what to make of this data.
>
> Clearly people need a lot more help when using submodules than when
> using subtree :)

Pretty clear it's garbage data, unless we're to believe that the
relative interest of submodules in the US, Germany and Sweden is 51, 64
& 84, but 75, 100 and 0 for subtree.


Re: [PATCH 0/4] subtree: move out of contrib

2018-04-30 Thread Stefan Beller
On Mon, Apr 30, 2018 at 2:53 PM, Avery Pennarun  wrote:

> For the best of both worlds, I've often thought that a good balance
> would be to use the same data structure that submodule uses, but to
> store all the code in a single git repo under different refs, which we
> might or might not download (or might or might not have different
> ACLs) under different circumstances.

There has been some experimentation with having a simpler ref
surface on the submodule side,
https://public-inbox.org/git/cover.1512168087.git.jonathanta...@google.com/

The way you describe the future of submodules, all we'd have to do
is to teach git-clone how to select the "interesting" refs for your use
case. Any other command would assume all submodule data to be
in the main repository.

The difference from Jonathan's proposal linked above would be that the
object store lives in the main repo and the refs are prefixed
per submodule instead of "shadowed".

>  However, when some projects get
> really huge (lots of very big submodule dependencies), then repacking
> one-big-repo starts becoming unwieldy; in that situation git-subtree
> also fails completely.

Yes, but that is a general scaling problem of Git that could be tackled,
e.g. repack into multiple packs serially instead of putting everything
into one pack.

>> Submodules do not need to produce a synthetic project history
>> when splitting off again, as the history is genuine. This allows
>> for easier work with upstream.
>
> Splitting for easier work upstream is great, and there really ought to
> be an official version of 'git subtree split', which is good for all
> sorts of purposes.
>
> However, I suspect almost all uses of the split feature are a)
> splitting a subtree that you previously merged in, or b) splitting a
> subtree into a separate project that you want to maintain separately
> from now on.  Repeated splits in case (a) are only necessary because
> you're not using submodules, or in case (b) are only necessary because
> you didn't *switch* to submodules when it finally came time to split
> the projects.  (In both cases you probably didn't switch to submodules
> because you didn't like one of its tradeoffs, especially the need to
> track multiple repos when you fork.)

That makes sense.

>
> There's one exception, which is doing a one-time permanent merge of
> two projects into one.  That's a nice feature, but is probably used
> extremely rarely.  More often people get into a
> merge-split-merge-split cycle that would be better served by a
> slightly improved git-submodule.

This rare use case is how git-subtree came into existence in Git's
contrib directory AFAICT,
https://kernel.googlesource.com/pub/scm/git/git/+/634392b26275fe5436c0ea131bc89b46476aa4ae
which is interesting to view in git-show, but I think defaults could
be tweaked there, as it currently shows me mostly a license file.

>> Conceptually Gerrit is doing
>>
>>   while true:
>> git submodule update --remote
>> if worktree is dirty:
>> git commit "update the submodules"
>>
>> just that Gerrit doesn't poll but does it event based.
>
> ...and it's super handy :)  The problem is it's fundamentally
> centralized: because gerrit can serialize merges into the submodule,
> it also knows exactly how to update the link in the supermodule.  If
> there was wild branching and merging (as there often is in git) and
> you had to resolve conflicts between two submodules, I don't think it
> would be obvious at all how to do it automatically when pushing a
> submodule.  (This also works quite badly with git subtree --squash.)

With the poll-based solution I don't think you'd run into many more
problems than you would with Gerrit's solution.

In a nearby thread, we were just discussing the submodule merging
strategies,
https://public-inbox.org/git/1524739599.20251.17.ca...@klsmartin.com/
which might seem confusing, but the implementation is actually easy,
as we only ever fast-forward in submodules.

>>
>> https://trends.google.com/trends/explore?date=all=git%20subtree,git%20submodule
>>
>> Not sure what to make of this data.
>
> Clearly people need a lot more help when using submodules than when
> using subtree :)

That could be true. :)

Thanks,
Stefan


Re: [PATCH 0/4] subtree: move out of contrib

2018-04-30 Thread Avery Pennarun
On Mon, Apr 30, 2018 at 5:38 PM, Stefan Beller  wrote:
> On Mon, Apr 30, 2018 at 1:45 PM, Avery Pennarun  wrote:
> No objections from me either.
>
> Submodules seem to serve a slightly different purpose, though?

I think the purpose is actually the same - it's just the tradeoffs
that are different.  Both sets of tradeoffs kind of suck.

> With Subtrees the superproject always contains all the code,
> even when you squash the subtree history when merging it in.
> In the submodule world, you may not have access to one of the
> submodules.

Right.  Personally I think it's a disadvantage of subtree that it
always contains all the code (what if some people don't want the code
for a particular build variant?).  However, it's a huge pain that
submodules *don't* contain all the code (what if I'm not online right
now, or the site supposedly containing the code goes offline, or I
want to make my own fork?).

For the best of both worlds, I've often thought that a good balance
would be to use the same data structure that submodule uses, but to
store all the code in a single git repo under different refs, which we
might or might not download (or might or might not have different
ACLs) under different circumstances.  However, when some projects get
really huge (lots of very big submodule dependencies), then repacking
one-big-repo starts becoming unwieldy; in that situation git-subtree
also fails completely.
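
For illustration, a rough sketch of that idea with plain git today
(the ref namespace and URL are made up for the example):

  # Keep the dependency's history in the same repository, but under
  # its own ref namespace instead of refs/heads/:
  git fetch https://example.com/lib.git '+refs/heads/*:refs/deps/lib/*'
  git log refs/deps/lib/master   # inspect it without a second repo

Deciding per-namespace what to download or who may read it is the
hypothetical part.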

> Submodules do not need to produce a synthetic project history
> when splitting off again, as the history is genuine. This allows
> for easier work with upstream.

Splitting for easier work upstream is great, and there really ought to
be an official version of 'git subtree split', which is good for all
sorts of purposes.
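
For reference, the contrib command spelled out (prefix and branch
name are just examples):

  # Extract the history touching vendor/lib/ into its own branch,
  # ready to be pushed to a separate upstream repository:
  git subtree split --prefix=vendor/lib -b lib-upstream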

However, I suspect almost all uses of the split feature are a)
splitting a subtree that you previously merged in, or b) splitting a
subtree into a separate project that you want to maintain separately
from now on.  Repeated splits in case (a) are only necessary because
you're not using submodules, or in case (b) are only necessary because
you didn't *switch* to submodules when it finally came time to split
the projects.  (In both cases you probably didn't switch to submodules
because you didn't like one of its tradeoffs, especially the need to
track multiple repos when you fork.)

> Subtrees present you the whole history by default and the user
> needs to be explicit about not wanting to see history from the
> subtree, which is the opposite of submodules (though this
> may be planned in the future to switch).

It turns out that AFAIK, almost everyone prefers 'git subtree
--squash', which squashes into a single commit each time you merge,
much like git submodule does.  I doubt people would cry too much if
the full-history feature went away.
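
A typical --squash workflow, for reference (prefix, URL and branch
are examples):

  # Vendor a project once, then pull updates as single squashed commits:
  git subtree add  --prefix=vendor/lib https://example.com/lib.git master --squash
  git subtree pull --prefix=vendor/lib https://example.com/lib.git master --squash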

There's one exception, which is doing a one-time permanent merge of
two projects into one.  That's a nice feature, but is probably used
extremely rarely.  More often people get into a
merge-split-merge-split cycle that would be better served by a
slightly improved git-submodule.

>> The gerrit team (eg. Stefan Beller) has been doing some really great
>> stuff to make submodules more usable by helping with relative
>> submodule links and by auto-updating links in supermodules at the
>> right times.  Unfortunately doing that requires help from the server
>> side, which kind of messes up decentralization and so doesn't solve
>> the problem in the general case.
>
> Conceptually Gerrit is doing
>
>   while true:
>     git submodule update --remote
>     if worktree is dirty:
>       git commit "update the submodules"
>
> just that Gerrit doesn't poll but does it event based.

...and it's super handy :)  The problem is it's fundamentally
centralized: because gerrit can serialize merges into the submodule,
it also knows exactly how to update the link in the supermodule.  If
there was wild branching and merging (as there often is in git) and
you had to resolve conflicts between two submodules, I don't think it
would be obvious at all how to do it automatically when pushing a
submodule.  (This also works quite badly with git subtree --squash.)

>> I really wish there were a good answer, but I don't know what it is.
>> I do know that lots of people seem to at least be happy using
>> git-subtree, and would be even happier if it were installed
>> automatically with git.
>
> https://trends.google.com/trends/explore?date=all&q=git%20subtree,git%20submodule
>
> Not sure what to make of this data.

Clearly people need a lot more help when using submodules than when
using subtree :)

Have fun,

Avery


Re: [PATCH 0/4] subtree: move out of contrib

2018-04-30 Thread Stefan Beller
On Mon, Apr 30, 2018 at 1:45 PM, Avery Pennarun  wrote:
> On Mon, Apr 30, 2018 at 5:50 AM, Ævar Arnfjörð Bjarmason
>  wrote:
>> I think at this point git-subtree is widely used enough to move out of
>> contrib/, maybe others disagree, but patches are always better for
> discussion than patch-less ML posts.
>
> I really was hoping git-subtree would be completely obsoleted by
> git-submodule by now, but... it hasn't been, so... no objections on
> this end.

No objections from me either.

Submodules seem to serve a slightly different purpose, though?

With Subtrees the superproject always contains all the code,
even when you squash the subtree history when merging it in.
In the submodule world, you may not have access to one of the
submodules.

Submodules do not need to produce a synthetic project history
when splitting off again, as the history is genuine. This allows
for easier work with upstream.

Subtrees present you the whole history by default and the user
needs to be explicit about not wanting to see history from the
subtree, which is the opposite of submodules (though this
may be planned in the future to switch).

> The gerrit team (eg. Stefan Beller) has been doing some really great
> stuff to make submodules more usable by helping with relative
> submodule links and by auto-updating links in supermodules at the
> right times.  Unfortunately doing that requires help from the server
> side, which kind of messes up decentralization and so doesn't solve
> the problem in the general case.

Conceptually Gerrit is doing

  while true:
    git submodule update --remote
    if worktree is dirty:
      git commit "update the submodules"

just that Gerrit doesn't poll but does it event based.
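
Spelled out as a literal, hypothetical polling script (the sleep is
just a stand-in for Gerrit's event trigger):

  while true
  do
          git submodule update --remote
          if ! git diff --quiet
          then
                  git commit -am "update the submodules"
          fi
          sleep 60
  done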

> I really wish there were a good answer, but I don't know what it is.
> I do know that lots of people seem to at least be happy using
> git-subtree, and would be even happier if it were installed
> automatically with git.

https://trends.google.com/trends/explore?date=all&q=git%20subtree,git%20submodule

Not sure what to make of this data.


Re: [PATCH 0/4] subtree: move out of contrib

2018-04-30 Thread Avery Pennarun
On Mon, Apr 30, 2018 at 5:50 AM, Ævar Arnfjörð Bjarmason
 wrote:
> I think at this point git-subtree is widely used enough to move out of
> contrib/, maybe others disagree, but patches are always better for
> discussion than patch-less ML posts.

I really was hoping git-subtree would be completely obsoleted by
git-submodule by now, but... it hasn't been, so... no objections on
this end.

The gerrit team (eg. Stefan Beller) has been doing some really great
stuff to make submodules more usable by helping with relative
submodule links and by auto-updating links in supermodules at the
right times.  Unfortunately doing that requires help from the server
side, which kind of messes up decentralization and so doesn't solve
the problem in the general case.

I really wish there were a good answer, but I don't know what it is.
I do know that lots of people seem to at least be happy using
git-subtree, and would be even happier if it were installed
automatically with git.

Have fun,

Avery


Re: [PATCH 0/4] subtree: move out of contrib

2018-04-30 Thread Philip Oakley

From: "Ævar Arnfjörð Bjarmason" 

I think at this point git-subtree is widely used enough to move out of
contrib/, maybe others disagree, but patches are always better for
discussion than patch-less ML posts.



Assuming this lands in Git, there will also need to be a simple follow-on 
to Duy's series that is updating command-list.txt (Message-Id: 
<20180429181844.21325-10-pclo...@gmail.com>). Duy's series also does the 
completions thing, IIUC ;-).

--
Philip


Ævar Arnfjörð Bjarmason (4):
 git-subtree: move from contrib/subtree/
 subtree: remove support for git version <1.7
 subtree: fix a test failure under GETTEXT_POISON
 i18n: translate the git-subtree command

.gitignore|   1 +
Documentation/git-submodule.txt   |   2 +-
.../subtree => Documentation}/git-subtree.txt |   3 +
Makefile  |   1 +
contrib/subtree/.gitignore|   7 -
contrib/subtree/COPYING   | 339 --
contrib/subtree/INSTALL   |  28 --
contrib/subtree/Makefile  |  97 -
contrib/subtree/README|   8 -
contrib/subtree/t/Makefile|  86 -
contrib/subtree/todo  |  48 ---
.../subtree/git-subtree.sh => git-subtree.sh  | 109 +++---
{contrib/subtree/t => t}/t7900-subtree.sh |  21 +-
13 files changed, 78 insertions(+), 672 deletions(-)
rename {contrib/subtree => Documentation}/git-subtree.txt (99%)
delete mode 100644 contrib/subtree/.gitignore
delete mode 100644 contrib/subtree/COPYING
delete mode 100644 contrib/subtree/INSTALL
delete mode 100644 contrib/subtree/Makefile
delete mode 100644 contrib/subtree/README
delete mode 100644 contrib/subtree/t/Makefile
delete mode 100644 contrib/subtree/todo
rename contrib/subtree/git-subtree.sh => git-subtree.sh (84%)
rename {contrib/subtree/t => t}/t7900-subtree.sh (99%)

--
2.17.0.290.gded63e768a


Re: [PATCH 0/4] doc: cleaning up instances of \--

2018-04-18 Thread brian m. carlson
On Tue, Apr 17, 2018 at 09:15:25PM +0200, Martin Ågren wrote:
> This is a patch series to convert \-- to -- in our documentation. The
> first patch is a reiteration of 1c262bb7b2 (doc: convert \--option to
> --option, 2015-05-13) to fix some instances that have appeared since.
> The other three patches deal with standalone "\--" which we can't
> always turn into "--" since it can be rendered as an em dash.

This series looked sane to me as well.  I appreciate the work to keep
our documentation working in both AsciiDoc and Asciidoctor.
-- 
brian m. carlson: Houston, Texas, US
OpenPGP: https://keybase.io/bk2204


signature.asc
Description: PGP signature


Re: [PATCH 0/4] doc: cleaning up instances of \--

2018-04-17 Thread Junio C Hamano
Martin Ågren  writes:

> This is a patch series to convert \-- to -- in our documentation. The
> first patch is a reiteration of 1c262bb7b2 (doc: convert \--option to
> --option, 2015-05-13) to fix some instances that have appeared since.
> The other three patches deal with standalone "\--" which we can't
> always turn into "--" since it can be rendered as an em dash.

All looked sensible.  As you mentioned in [2/4], "\--::" that is
part of an enumeration appears in the documentation for about a dozen
commands after the series, but I do not think we can avoid it.

One thing these patches make me wonder about is whether the newer
AsciiDoc we assume lets us do without the {litdd} macro.  This
series and our earlier efforts like 1c262bb7 ("doc: convert \--option
to --option", 2015-05-13) mention that "\--word" is less pleasant
on the eyes than "--word", but the ugliness "two{litdd}words" has
over "two--words" is far worse than that, so...

Thanks, will queue.


Re: [PATCH 0/4] Convert some stash functionality to a builtin

2018-03-25 Thread Thomas Gummerer
On 03/24, Joel Teichroeb wrote:
> I've been working on converting all of git stash to be a
> builtin, however it's hard to get it all working at once with
> limited time, so I've moved around half of it to a new
> stash--helper builtin and called these functions from the shell
> script. Once this is stabalized, it should be easier to convert
> the rest of the commands one at a time without breaking
> anything.
> 
> I've sent most of this code before, but that was targeting a
> full replacement of stash. The code is overall the same, but
> with some code review changes and updates for internal api
> changes.

Thanks for splitting this up into multiple patches; I found that much
more pleasant to review, and thanks for your continued work on this :)

> Since there seems to be interest from GSOC students who want to
> work on converting builtins, I figured I should finish what I
> have that works now so they could build on top of it.
> 
> Joel Teichroeb (4):
>   stash: convert apply to builtin
>   stash: convert branch to builtin
>   stash: convert drop and clear to builtin
>   stash: convert pop to builtin
> 
>  .gitignore  |   1 +
>  Makefile|   1 +
>  builtin.h   |   1 +
>  builtin/stash--helper.c | 514 
> 
>  git-stash.sh|  13 +-
>  git.c   |   1 +
>  6 files changed, 526 insertions(+), 5 deletions(-)
>  create mode 100644 builtin/stash--helper.c
> 
> -- 
> 2.16.2
> 


Re: [PATCH 0/4] Speed up git tag --contains

2018-03-12 Thread Jeff King
On Mon, Mar 12, 2018 at 09:45:27AM -0400, Derrick Stolee wrote:

> > diff --git a/builtin/branch.c b/builtin/branch.c
> > index 8dcc2ed058..4d674e86d5 100644
> > --- a/builtin/branch.c
> > +++ b/builtin/branch.c
> > @@ -404,6 +404,7 @@ static void print_ref_list(struct ref_filter *filter, 
> > struct ref_sorting *sorting)
> > memset(&array, 0, sizeof(array));
> > +   filter->with_commit_tag_algo = 1;
> > filter_refs(&array, filter, filter->kind | FILTER_REFS_INCLUDE_BROKEN);
> > if (filter->verbose)
> > 
> > drops my run of "git branch -a --contains HEAD~100" from 8.6s to
> > 0.4s on a repo with ~1800 branches. That sounds good, but on a repo with
> > a smaller number of branches, we may actually end up slower (because we
> > dig further down in history, and don't benefit from the multiple-branch
> > speedup).
> 
> It's good to know that we already have an algorithm for the multi-head
> approach. Things like `git branch -vv` are harder to tease out because the
> graph walk is called by the line-format code.

Yeah, the ahead/behind stuff will need some work. Part of it is just
code structuring. We know ahead of time which branches (and their
upstreams) are going to need this ahead/behind computation, so we should
be able to collect them all for a single call.

But I'm not sure if a general multi-pair ahead/behind is going to be
easy. I don't have even experimental code for that. :)

We have a multi-pair ahead/behind command which we use at GitHub, but it
does each pair separately. It leans heavily on reachability bitmaps, so
the main advantage is that it's able to amortize the cost of loading the
bitmaps (both off disk, but also we sometimes have to walk to complete
the bitmaps).
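
For reference, the stock way to get one pair's counts today is a
single symmetric-difference walk (branch names are examples):

  # Prints the "ahead" and "behind" counts for one branch/upstream pair:
  git rev-list --left-right --count master...origin/master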

-Peff


Re: [PATCH 0/4] Speed up git tag --contains

2018-03-12 Thread Derrick Stolee

On 3/3/2018 12:15 AM, Jeff King wrote:

On Fri, Jan 12, 2018 at 10:56:00AM -0800, csilvers wrote:


This is a resubmission of Jeff King's patch series to speed up git tag
--contains with some changes. It's been cooking for a while as:

Replying to this 6-year-old thread:

Is there any chance this could be resurrected?  We are using
phabricator, which uses `git branch --contains` as part of its
workflow.  Our repo has ~1000 branches on it, and the contains
operation is eating up all our CPU (and time).  It would be very
helpful to us to make this faster!

(The original thread is at
https://public-inbox.org/git/e1ou82h-0001xy...@closure.thunk.org/

Sorry, this got thrown on my "to respond" pile and languished.


Thanks for adding me to the thread. It's good to know the pain point 
people are having around commit graph walks.



There are actually three things that make "git branch --contains" slow.

First, if you're filtering 1000 branches, we'll run 1000 merge-base
traversals, which may walk over the same commits multiple times.

These days "tag --contains" uses a different algorithm that can look at
all heads in a single traversal. But the downside is that it's
depth-first, so it tends to walk down to the roots. That's generally OK
for tags, since you often have ancient tags that mean getting close to
the roots anyway.

But for branches, they're more likely to be recent, and you can get away
without going very deep into the history.

So it's a tradeoff. There's no run-time switch to flip between them, but
a patch like this:

diff --git a/builtin/branch.c b/builtin/branch.c
index 8dcc2ed058..4d674e86d5 100644
--- a/builtin/branch.c
+++ b/builtin/branch.c
@@ -404,6 +404,7 @@ static void print_ref_list(struct ref_filter *filter, 
struct ref_sorting *sorting)
  
  	memset(&array, 0, sizeof(array));
  
+	filter->with_commit_tag_algo = 1;
  	filter_refs(&array, filter, filter->kind | FILTER_REFS_INCLUDE_BROKEN);
  
  	if (filter->verbose)


drops my run of "git branch -a --contains HEAD~100" from 8.6s to
0.4s on a repo with ~1800 branches. That sounds good, but on a repo with
a smaller number of branches, we may actually end up slower (because we
dig further down in history, and don't benefit from the multiple-branch
speedup).


It's good to know that we already have an algorithm for the multi-head 
approach. Things like `git branch -vv` are harder to tease out because 
the graph walk is called by the line-format code.



I tried to do a "best of both" algorithm in:

  https://public-inbox.org/git/20140625233429.ga20...@sigill.intra.peff.net/

which finds arbitrary numbers of merge bases in a single traversal.  It
did seem to work, but I felt uneasy about some of the corner cases.
I've been meaning to revisit it, but obviously have never gotten around
to it.

The second slow thing is that during the traversal we load each commit
object from disk. The solution there is to keep the parent information
in a faster cache. I had a few proposals over the years, but I won't
even bother to dig them up, because there's quite recent and promising
work in this area from Derrick Stolee:

   
https://public-inbox.org/git/1519698787-190494-1-git-send-email-dsto...@microsoft.com/

And finally, the thing that the patches you linked are referencing is
about using commit timestamps as a proxy for generation numbers. And
Stolee's patches actually leave room for real, trustable generation
numbers.

Once we have the serialized commit graph and generation numbers, I think
the final step would just be to teach the "tag --contains" algorithm to
stop walking down unproductive lines of history. And in fact, I think we
can forget about the best-of-both multi-tip merge-base idea entirely.
Because if you can use the generation numbers to avoid going too deep,
then a depth-first approach is fine. And we'd just want to flip
git-branch over to using that algorithm by default.


I'll keep this in mind as a target for performance measurements in the 
serialized commit graph patch and the following generation number patch.


Thanks,
-Stolee


Re: [PATCH 0/4] Speed up git tag --contains

2018-03-08 Thread csilvers
} I had a few proposals over the years, but I won't even bother to dig
} them up, because there's quite recent and promising work in this
} area from Derrick Stolee:

It sounds like the best thing to do is to wait for this, then.

We managed to convert a bunch of our branches to tags, so our
immediate problem has been resolved.  But I'm sure it will come up
again as more branches are created...
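
For anyone in the same spot, such a conversion can be as small as
this per branch (names are illustrative):

  # Keep the tip reachable as a tag, then drop the branch so it no
  # longer participates in "git branch --contains":
  git tag archive/topic topic
  git branch -D topic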

craig


Re: [PATCH 0/4] Teach `--default` to `git-config(1)`

2018-03-05 Thread Taylor Blau
On Mon, Mar 05, 2018 at 06:17:25PM -0800, Taylor Blau wrote:
> Attached is a patch series to introduce `--default` and `--color` to the
> `git-config(1)` builtin with the aim of introducing a consistent interface to
> replace `--get-color`.

This series draws significant inspiration from Soukaina Nait Hmid's work
in:

  
https://public-inbox.org/git/0102015fb0bf2f74-cb456171-fe65-4d83-8784-b553c7c9e584-000...@eu-west-1.amazonses.com/

--
- Taylor


Re: [PATCH 0/4] Speed up git tag --contains

2018-03-02 Thread Jeff King
On Fri, Jan 12, 2018 at 10:56:00AM -0800, csilvers wrote:

> > This is a resubmission of Jeff King's patch series to speed up git tag
> > --contains with some changes. It's been cooking for a while as:
> 
> Replying to this 6-year-old thread:
> 
> Is there any chance this could be resurrected?  We are using
> phabricator, which uses `git branch --contains` as part of its
> workflow.  Our repo has ~1000 branches on it, and the contains
> operation is eating up all our CPU (and time).  It would be very
> helpful to us to make this faster!
> 
> (The original thread is at
> https://public-inbox.org/git/e1ou82h-0001xy...@closure.thunk.org/

Sorry, this got thrown on my "to respond" pile and languished.

There are actually three things that make "git branch --contains" slow.

First, if you're filtering 1000 branches, we'll run 1000 merge-base
traversals, which may walk over the same commits multiple times.

These days "tag --contains" uses a different algorithm that can look at
all heads in a single traversal. But the downside is that it's
depth-first, so it tends to walk down to the roots. That's generally OK
for tags, since you often have ancient tags that mean getting close to
the roots anyway.

But for branches, they're more likely to be recent, and you can get away
without going very deep into the history.

So it's a tradeoff. There's no run-time switch to flip between them, but
a patch like this:

diff --git a/builtin/branch.c b/builtin/branch.c
index 8dcc2ed058..4d674e86d5 100644
--- a/builtin/branch.c
+++ b/builtin/branch.c
@@ -404,6 +404,7 @@ static void print_ref_list(struct ref_filter *filter, 
struct ref_sorting *sorting)
 
	memset(&array, 0, sizeof(array));
 
+   filter->with_commit_tag_algo = 1;
	filter_refs(&array, filter, filter->kind | FILTER_REFS_INCLUDE_BROKEN);
 
if (filter->verbose)

drops my run of "git branch -a --contains HEAD~100" from 8.6s to
0.4s on a repo with ~1800 branches. That sounds good, but on a repo with
a smaller number of branches, we may actually end up slower (because we
dig further down in history, and don't benefit from the multiple-branch
speedup).

I tried to do a "best of both" algorithm in:

 https://public-inbox.org/git/20140625233429.ga20...@sigill.intra.peff.net/

which finds arbitrary numbers of merge bases in a single traversal.  It
did seem to work, but I felt uneasy about some of the corner cases.
I've been meaning to revisit it, but obviously have never gotten around
to it.

The second slow thing is that during the traversal we load each commit
object from disk. The solution there is to keep the parent information
in a faster cache. I had a few proposals over the years, but I won't
even bother to dig them up, because there's quite recent and promising
work in this area from Derrick Stolee:

  
https://public-inbox.org/git/1519698787-190494-1-git-send-email-dsto...@microsoft.com/

And finally, the thing that the patches you linked are referencing is
about using commit timestamps as a proxy for generation numbers. And
Stolee's patches actually leave room for real, trustable generation
numbers.

Once we have the serialized commit graph and generation numbers, I think
the final step would just be to teach the "tag --contains" algorithm to
stop walking down unproductive lines of history. And in fact, I think we
can forget about the best-of-both multi-tip merge-base idea entirely.
Because if you can use the generation numbers to avoid going too deep,
then a depth-first approach is fine. And we'd just want to flip
git-branch over to using that algorithm by default.

-Peff


Re: [PATCH 0/4] Delete ignore_env member in struct repository

2018-02-26 Thread Duy Nguyen
On Tue, Feb 27, 2018 at 3:46 AM, Stefan Beller  wrote:
> What is the plan from here on?
> Should I build further series on top of yours? The next series will
> focus on the pack side of things (pack.h, packfile.{c,h})
>
> So maybe we'll have Junio merge down my series (and yours as it
> needs one reroll?) and then build on top of that?

I think that's the less painful way for everybody.
-- 
Duy


Re: [PATCH 0/4] Delete ignore_env member in struct repository

2018-02-26 Thread Stefan Beller
On Mon, Feb 26, 2018 at 2:30 AM, Nguyễn Thái Ngọc Duy  wrote:
> It turns out I don't need my other series [1] in order to delete this
> field. This series moves getenv() calls from
> repo_set_gitdir()/repo_setup_env() and prepare_alt_odb() back in
> environment.c where they belong in my opinion.
>
> The repo_set_gitdir() now takes $GIT_DIR and optionally all other
> configurable paths. If those paths are NULL, default repo layout will
> be used. With getenv() no longer called inside repo_set_gitdir(),
> ignore_env has no reason to stay. This is in 1/4.
>
> The getenv() in prepare_alt_odb() is also moved back to
> setup_git_env() in 3/4. It demonstrates how we could move other
> getenv() back to if we want.
>
> This series is built on top of Stefan's object-store-part1, v4. I
> could rebase it on 'master' too, but then Junio may need to resolve
> some conflicts.
>

Thanks for working on this,
I found this series a pleasant read; the only issue I saw was Eric's
point about multiple getenv() calls without strdup()ing the content.

What is the plan from here on?
Should I build further series on top of yours? The next series will
focus on the pack side of things (pack.h, packfile.{c,h})

So maybe we'll have Junio merge down my series (and yours as it
needs one reroll?) and then build on top of that?

Thanks,
Stefan


Re: [PATCH 0/4] Delete ignore_env member in struct repository

2018-02-26 Thread Junio C Hamano
Nguyễn Thái Ngọc Duy   writes:

> This series is built on top of Stefan's object-store-part1, v4. I
> could rebase it on 'master' too, but then Junio may need to resolve
> some conflicts.

As a follow-up fix for combined "move things to the_repo as much as
possible" efforts I think what you did makes most sense.  I would
say that it makes even more sense to make these part of the
object-store series if/when the series needs rerolling--after all
the topic is already a multi-author effort.

I'll have to read them carefully before commenting on the actual
patches ;-)  Thanks.

>
> [1] https://public-inbox.org/git/20180225111840.16421-1-pclo...@gmail.com/
>
> Nguyễn Thái Ngọc Duy (4):
>   repository.c: move env-related setup code back to environment.c
>   repository.c: delete dead functions
>   sha1_file.c: move delayed getenv(altdb) back to setup_git_env()
>   repository: delete ignore_env member
>
>  cache.h|  2 +-
>  environment.c  | 13 +++--
>  object-store.h |  5 +++-
>  object.c   |  1 +
>  repository.c   | 79 ++
>  repository.h   | 21 +++---
>  setup.c|  3 +-
>  sha1_file.c|  6 +---
>  8 files changed, 64 insertions(+), 66 deletions(-)


Re: [PATCH 0/4] Correct offsets of hunks when one is skipped

2018-02-19 Thread Phillip Wood
On 13/02/18 23:56, brian m. carlson wrote:
> On Tue, Feb 13, 2018 at 10:44:04AM +, Phillip Wood wrote:
>> From: Phillip Wood 
>>
>> While working on a patch series to stage selected lines from a hunk
>> without having to edit it I got worried that subsequent patches would
>> be applied in the wrong place which lead to this series to correct the
>> offsets of hunks following those that are skipped or edited.
>>
>> Phillip Wood (4):
>>   add -i: add function to format hunk header
>>   t3701: add failing test for pathological context lines
>>   add -p: Adjust offsets of subsequent hunks when one is skipped
>>   add -p: calculate offset delta for edited patches
>>
>>  git-add--interactive.perl  | 93 
>> +++---
>>  t/t3701-add-interactive.sh | 30 +++
>>  2 files changed, 102 insertions(+), 21 deletions(-)
> 
> This looks reasonably sane to me.  I really like that you managed to
> produce failing tests for this situation.  I know pathological cases
> like this have bit GCC in the past, so it's good that you fixed this.
> 

Thanks Brian, it's interesting to hear that GCC has been bitten in the past.

Best Wishes

Phillip


Re: [PATCH 0/4] Correct offsets of hunks when one is skipped

2018-02-13 Thread brian m. carlson
On Tue, Feb 13, 2018 at 10:44:04AM +, Phillip Wood wrote:
> From: Phillip Wood 
> 
> While working on a patch series to stage selected lines from a hunk
> without having to edit it I got worried that subsequent patches would
> be applied in the wrong place, which led to this series to correct the
> offsets of hunks following those that are skipped or edited.
> 
> Phillip Wood (4):
>   add -i: add function to format hunk header
>   t3701: add failing test for pathological context lines
>   add -p: Adjust offsets of subsequent hunks when one is skipped
>   add -p: calculate offset delta for edited patches
> 
>  git-add--interactive.perl  | 93 
> +++---
>  t/t3701-add-interactive.sh | 30 +++
>  2 files changed, 102 insertions(+), 21 deletions(-)

This looks reasonably sane to me.  I really like that you managed to
produce failing tests for this situation.  I know pathological cases
like this have bit GCC in the past, so it's good that you fixed this.
-- 
brian m. carlson / brian with sandals: Houston, Texas, US
https://www.crustytoothpaste.net/~bmc | My opinion only
OpenPGP: https://keybase.io/bk2204


signature.asc
Description: PGP signature


Re: [PATCH 0/4] Speed up git tag --contains

2018-01-12 Thread csilvers
> This is a resubmission of Jeff King's patch series to speed up git tag
> --contains with some changes. It's been cooking for a while as:

Replying to this 6-year-old thread:

Is there any chance this could be resurrected?  We are using
phabricator, which uses `git branch --contains` as part of its
workflow.  Our repo has ~1000 branches on it, and the contains
operation is eating up all our CPU (and time).  It would be very
helpful to us to make this faster!

(The original thread is at
https://public-inbox.org/git/e1ou82h-0001xy...@closure.thunk.org/
)

craig


Re: [PATCH 0/4] "Fast-"track Git GUI changes

2018-01-10 Thread Johannes Schindelin
Hi,

On Tue, 9 Jan 2018, Junio C Hamano wrote:

> Junio C Hamano  writes:
> 
> > Johannes Schindelin  writes:
> >
> >> As it seems to be impossible to get the attention of the Git GUI
> >> maintainer these "days" (I have opened Pull Requests on October 16th
> >> 2016 that have not even been looked at), let's just side-step that
> >> contribution path which seems to be dormant.
> >
> > Good to see that finally somebody else steps up after I did the same
> > a few times recently.
> >
> >> These fixes have been in Git for Windows for various amounts of time.
> >>
> >> Note: there are more Git GUI changes in Git for Windows, I only
>> accumulated the ones I deem worth considering for inclusion into v2.16.0,
> >> still.
> >
> > Thanks.  I am not sure if it is too late for 2.16, as these are not
> > fixes for regression during this cycle, though.
> 
> Heh, I changed my mind.  
> 
> Just like I pretended to be interim maintainer of Git-GUI for the
> past two times, you are doing the same this time and I even agreed
> that it was a good thing that you volunteered to pretend as one.
> 
> So let's follow through the pretence to its conclusion and merge
> these directly to 'master'.

Thank you, fellow pretender.

Ciao,
Dscho


Re: [PATCH 0/4] "Fast-"track Git GUI changes

2018-01-09 Thread Junio C Hamano
Junio C Hamano  writes:

> Johannes Schindelin  writes:
>
>> As it seems to be impossible to get the attention of the Git GUI
>> maintainer these "days" (I have opened Pull Requests on October 16th
>> 2016 that have not even been looked at), let's just side-step that
>> contribution path which seems to be dormant.
>
> Good to see that finally somebody else steps up after I did the same
> a few times recently.
>
>> These fixes have been in Git for Windows for various amounts of time.
>>
>> Note: there are more Git GUI changes in Git for Windows, I only
>> accumulated the ones I deem worth considering for inclusion into v2.16.0,
>> still.
>
> Thanks.  I am not sure if it is too late for 2.16, as these are not
> fixes for regression during this cycle, though.

Heh, I changed my mind.  

Just like I pretended to be interim maintainer of Git-GUI for the
past two times, you are doing the same this time and I even agreed
that it was a good thing that you volunteered to pretend as one.

So let's follow through the pretence to its conclusion and merge
these directly to 'master'.

Thanks.


Re: [PATCH 0/4] "Fast-"track Git GUI changes

2018-01-09 Thread Junio C Hamano
Johannes Schindelin  writes:

> As it seems to be impossible to get the attention of the Git GUI
> maintainer these "days" (I have opened Pull Requests on October 16th
> 2016 that have not even been looked at), let's just side-step that
> contribution path which seems to be dormant.

Good to see that finally somebody else steps up after I did the same
a few times recently.

> These fixes have been in Git for Windows for various amounts of time.
>
> Note: there are more Git GUI changes in Git for Windows, I only
> accumulated the ones I deem worth considering for inclusion into v2.16.0,
> still.

Thanks.  I am not sure if it is too late for 2.16, as these are not
fixes for regression during this cycle, though.



Re: [PATCH 0/4] Add --no-ahead-behind to status

2017-12-20 Thread Jeff King
On Wed, Dec 20, 2017 at 02:42:41PM +, Jeff Hostetler wrote:

> From: Jeff Hostetler 
> 
> This patch series adds a "--no-ahead-behind" option to status
> to request that it avoid a possibly expensive ahead/behind
> computation for the current branch.  Instead, it just prints a
> "not up to date" message in place of the detailed counts.
> 
> This idea was previously discussed in [1].  Working with the
> enormous Windows repository, we found that 20+ seconds was being
> spent in the ahead/behind computation when the current branch was
> 150K commits behind the upstream branch.  (Yes, this happens and
> only took 3 weeks on the reporter's system.)

Overall this looks cleanly done. I raised a few minor points in the
individual patches, but certainly nothing that would be a show-stopper
for the general idea.

> I've only modified "git status" in this patch series.  A similar
> change could be added to "git branch -vv" and "git checkout" to
> avoid delays there too.  I avoided doing it here to keep this
> patch series focused.

I have a feeling that the same user who got annoyed by "git status" is
going to get annoyed by "git checkout" sooner or later. I'm OK with
handling the other commands separately, but I suspect we may want at
least "git checkout" support in the near future.

There is one thing that is worth thinking about (because it will be
hard to take back later): config option naming. If your repository is so
large that ahead/behind checks are annoying, would you want to be able
to set a single config to tell Git that, rather than one for each
command? If so, then do we want to ditch "status.aheadbehind" in favor
of a more unified name?
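
Assuming the series lands with the names proposed here, usage would
look something like:

  # One-off: skip the ahead/behind walk for this invocation
  git status --no-ahead-behind
  # or persistently, via the config option under discussion:
  git config status.aheadBehind false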

-Peff


Re: [PATCH 0/4] Fix issues with rename detection limits

2017-11-10 Thread Elijah Newren
On Fri, Nov 10, 2017 at 10:13 AM, Elijah Newren  wrote:
> In a repo with around 60k files with deep directory hierarchies (due to

> Elijah Newren (4):
>   sequencer: Warn when internal merge may be suboptimal due to
> renameLimit
>   Remove silent clamp of renameLimit
>   progress: Fix progress meters when dealing with lots of work
>   sequencer: Show rename progress during cherry picks

Sorry for the duplicate send.

I noticed the cover letter didn't appear on the email list,
double-checked https://public-inbox.org/git/, waited over half an hour
and double checked both email and public-inbox again and still didn't
see it, so I re-sent.  Just as soon as I did, it seems that the
original and the new cover letter emails suddenly showed up right
then.  *shrug*.


Re: [PATCH 0/4] WIP: git-remote-media wiki namespace support

2017-10-30 Thread Antoine Beaupré
On 2017-10-30 11:40:06, Matthieu Moy wrote:
> Antoine Beaupré  writes:
>
>> Obviously, doing unit tests against a full MediaWiki instance isn't
>> exactly trivial.
>
> Not trivial, but doable: there is all the infrastructure to do so in t/:
> install-wiki.sh to automatically install Mediawiki, and then a testsuite
> that interacts with it.
>
> This has been written under the assumption that the developer had a
> lighttpd instance running on localhost, but this can probably be adapted
> to run on Travis-CI (install lighttpd & Mediawiki in the install: part,
> and run the tests afterwards), so that anyone can run the tests by just
> submitting a pull-request to Git-Mediawiki.
>
> If you are to work more on Git-Mediawiki, don't underestimate the
> usefulness of the testsuite (for example, Git-Mediawiki was developed
> against a prehistoric version of Mediawiki; the testsuite can help
> ensure it still works on the latest version), nor the fun of playing
> with install scripts and CI systems ;-).

Hello!

Glad to hear from you. :)

So I actually tried install-wiki.sh, and it "failed to start lighttpd"
and told me to see logs. I couldn't find them and stopped there...

It would be great to hook this up into CI somewhere, but I suspect it
isn't, considering that it doesn't actually work out of the box.

I'm hoping we can still do things and fix some things without going
through that trouble, but I recognize it would be better to have unit
tests operational.

Honestly, I would prefer just having this thing work and not have to
work on it. :) I have lots of things on my plate and I'm just scratching
an itch on this one - some backup script broke and I am trying to fix
it. Once it works, my work is done, so unfortunately I cannot lead that
project (but I'd be happy to help when I can of course).

A.

-- 
The greatest tragedy in mankind's entire history may be the hijacking of
morality by religion.
- Arthur C. Clarke


Re: [PATCH 0/4] WIP: git-remote-media wiki namespace support

2017-10-30 Thread Matthieu Moy
Antoine Beaupré  writes:

> Obviously, doing unit tests against a full MediaWiki instance isn't
> exactly trivial.

Not trivial, but doable: there is all the infrastructure to do so in t/:
install-wiki.sh to automatically install Mediawiki, and then a testsuite
that interacts with it.

This has been written under the assumption that the developer had a
lighttpd instance running on localhost, but this can probably be adapted
to run on Travis-CI (install lighttpd & Mediawiki in the install: part,
and run the tests afterwards), so that anyone can run the tests by just
submitting a pull-request to Git-Mediawiki.

If you are to work more on Git-Mediawiki, don't underestimate the
usefulness of the testsuite (for example, Git-Mediawiki was developed
against a prehistoric version of Mediawiki; the testsuite can help
ensure it still works on the latest version), nor the fun of playing
with install scripts and CI systems ;-).

Cheers,

-- 
Matthieu Moy
https://matthieu-moy.fr/


Re: [PATCH 0/4] (x)diff cleanup: remove duplicate code

2017-10-24 Thread Stefan Beller
On Tue, Oct 24, 2017 at 11:59 AM, Stefan Beller  wrote:
> Junio wrote:
>
>>  * I was hoping that the next_byte() and string_hash() thing, once
>>they are cleaned up, will eventually be shared with the xdiff/
>>code at the lower layer, which needs to do pretty much the same
>>in order to implement various whitespace ignoring options.  I am
>>not sure how well the approach taken by the WIP patch meshes with
>>the needs of the lower layer.
>
> This series does exactly this; although I chose to reuse the code in
> xdiff/xutils.c instead of the new fancy next_byte/string_hash, as that
> code has seen more exercise already (hence I assume it has fewer bugs
> if any as well as its performance implications are well understood).
>
> However, to do so, we need to pollute xdiff/xutils.c and include
> hashmap.h there (which also requires cache.h, as that header has
> an inline function using BUG()), which I find a bit unfortunate but
> worth the trade-off of using better-tested code.
>

This, of course, applies on top of jk/diff-color-moved-fix.


Re: [PATCH 0/4] fsmonitor fixes

2017-10-20 Thread Johannes Schindelin
Hi Alex,

On Thu, 19 Oct 2017, Alex Vandiver wrote:

> A few fixes found from playing around with the fsmonitor branch in
> next.

Thank you for having a look, and even better: for sending a couple of
improvements on top of Ben's & my work.

I sent a couple of suggestions and hope you will find them useful.

Ciao,
Johannes


Re: [PATCH 0/4] pre-merge hook

2017-10-16 Thread Junio C Hamano
Michael J Gruber  writes:

> This series revives an old suggestion of mine to make merge honor
> the pre-commit hook or a separate pre-merge hook.

This seems to have become an abandoned loose end, so I'll drop the
topic from my tree for now; revival of the discussion is _not_
unwelcome (aka "dropping without prejudice").




Re: [PATCH 0/4] pre-merge hook

2017-10-02 Thread Stefan Beller
On Sun, Oct 1, 2017 at 10:54 PM, Junio C Hamano  wrote:
> Michael J Gruber  writes:
>
>> Now that f8b863598c ("builtin/merge: honor commit-msg hook for merges",
>> 2017-09-07) has landed, merge is getting closer to behaving like commit,
>> which is important because both are used to complete merges (automatic
>> vs. non-automatic).
>
> Just a gentle ping to the thread to see if it is still alive.
>
> I think people agree that this is a good thing to do, but it seems
> that the execution leaves a few loose ends, judging from the review
> comments that have yet to be answered.
>
> Thanks.

I agree. IIRC the series was ok in design and goal, just minor comments
for coding.

Thanks,
Stefan


Re: [PATCH 0/4] pre-merge hook

2017-10-01 Thread Junio C Hamano
Michael J Gruber  writes:

> Now that f8b863598c ("builtin/merge: honor commit-msg hook for merges",
> 2017-09-07) has landed, merge is getting closer to behaving like commit,
> which is important because both are used to complete merges (automatic
> vs. non-automatic).

Just a gentle ping to the thread to see if it is still alive.

I think people agree that this is a good thing to do, but it seems
that the execution leaves a few loose ends, judging from the review
comments that have yet to be answered.

Thanks.


Re: [PATCH 0/4] Fixes from the per-repository object store series

2017-09-14 Thread Junio C Hamano
Jonathan Nieder  writes:

> This is a continuation of the series at [1].  That was part 1 ---
> you can think of this as part 0, since it contains the simplest and
> least controversial part of the series --- each patch in this series
> is a bugfix in its own right.
>
> Patch 1 should be familiar if you reviewed the series [1].  It is
> unchanged from the patch there, except to note Peff's ack.
>
> Patches 2-4 have not yet visited the list but are fixes (or, in the
> case of patch 4, cleanups) noticed as part of the same process of
> teaching object store code to handle multiple repositories.
>
> We hope that splitting these out should make them easier to review
> and for people to benefit from these fixes without having to wait
> for the rest of the series to settle.

One thing that is unclear is if you are tentatively withdrawing the
longer series with this update, but I'd assume that it is the case
because these two obviously will conflict with each other, and most
of what these four patches do is a subset of what the larger series
you sent earlier does.

Thanks.


Re: [PATCH 0/4] Test name-rev with small stack

2017-09-11 Thread Jeff King
On Fri, Sep 08, 2017 at 02:33:35PM +0200, Michael J Gruber wrote:

> Looking at it more closely, the solution in cbc60b6720 ("git tag
> --contains: avoid stack overflow", 2014-04-24) seems to be a bit "ad
> hoc" to me:
> 
> First of all, there is more than "tag --contains" that may exceed the
> stack by recursion. So I would expect the solution to be more general,
> and not localised and specialised to builtin/tag.c

At the time, tag was the only one using this depth-first contains
algorithm. It's since been adapted to ref-filter.c, but of course the
stack handling went with it.

Most traversals have a date-sorted queue, so are effectively doing a
breadth-first iteration with no recursion.

> Second, this is a walk, so I'm wondering whether our revision walk
> machinery should be the place to add missing functionality (if any).
> That way, everything would benefit from possible or existing
> improvements there. For example, I think some of our "extra walkers"
> don't heed object replacements. (I need to test more.)

It's possible that name-rev could make better use of the existing
traversal machinery. It's often tough to do so, though, because that
machinery gives you a linearized ordering of the commits. Whereas
something like name-rev really cares about the order that it visits the
commits, because it's building up the names.

It's the same for this "tag --contains" traversal. It _used_ to be a
series of merge-base computations. But by doing a custom traversal, we
can cache incremental results through the graph and avoid walking over
the same bits multiple times. There actually is a way to do it with the
regular breadth-first traversal, but you have to store one bit per ref
you're checking for each commit.

I played around with that a bit a while ago, and it did seem to work. I
can dig up the patches if you're interested. But one downside is that
one bit per ref per commit adds up if you have a lot of refs. A large
number of those bitfields will be the same, so you could probably get by
with a copy-on-write scheme, but I never implemented that.

Of course somebody may have a more clever algorithm, too. I don't claim
the above is a proof. ;)

-Peff


Re: [PATCH 0/4] Test name-rev with small stack

2017-09-08 Thread Michael J Gruber
Jeff King venit, vidit, dixit 07.09.2017 16:54:
> On Thu, Sep 07, 2017 at 04:02:19PM +0200, Michael J Gruber wrote:
> 
>> name-rev segfaults for me in emacs.git with the typical 8192 stack size.
>> The reason is the recursive walk that name-rev uses.
>>
>> This series adds a test to mark this as known failure, after some
>> clean-ups.
> 
> These all look reasonable to me. The size of the test case in the final
> one is presumably arbitrary and just copied from t7004. I don't know if
> it's worth trying to shrink it. It could shorten a rather expensive
> test. OTOH, if we shorten it too much then we might get a false pass
> (e.g., if the algorithm remains recursive but has a smaller stack
> footprint).
> 
>> Michael J Gruber (4):
>>   t7004: move limited stack prereq to test-lib
>>   t6120: test name-rev --all and --stdin
>>   t6120: clean up state after breaking repo
>>   t6120: test describe and name-rev with deep repos
> 
> Now comes the hard part: rewriting the C code. :)

Looking at it more closely, the solution in cbc60b6720 ("git tag
--contains: avoid stack overflow", 2014-04-24) seems to be a bit "ad
hoc" to me:

First of all, there is more than "tag --contains" that may exceed the
stack by recursion. So I would expect the solution to be more general,
and not localised and specialised to builtin/tag.c

Second, this is a walk, so I'm wondering whether our revision walk
machinery should be the place to add missing functionality (if any).
That way, everything would benefit from possible or existing
improvements there. For example, I think some of our "extra walkers"
don't heed object replacements. (I need to test more.)

Michael


Re: [PATCH 0/4] Test name-rev with small stack

2017-09-07 Thread Jeff King
On Thu, Sep 07, 2017 at 04:02:19PM +0200, Michael J Gruber wrote:

> name-rev segfaults for me in emacs.git with the typical 8192 stack size.
> The reason is the recursive walk that name-rev uses.
> 
> This series adds a test to mark this as known failure, after some
> clean-ups.

These all look reasonable to me. The size of the test case in the final
one is presumably arbitrary and just copied from t7004. I don't know if
it's worth trying to shrink it. It could shorten a rather expensive
test. OTOH, if we shorten it too much then we might get a false pass
(e.g., if the algorithm remains recursive but has a smaller stack
footprint).
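
For reference, the prerequisite boils down to running the command
under a deliberately small stack in a subshell, roughly (the 128 KiB
limit is what t7004 uses, if I remember right):

  run_with_limited_stack () {
          # subshell, so the lowered limit does not leak out
          (ulimit -s 128 && "$@")
  }
  run_with_limited_stack git name-rev --all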

> Michael J Gruber (4):
>   t7004: move limited stack prereq to test-lib
>   t6120: test name-rev --all and --stdin
>   t6120: clean up state after breaking repo
>   t6120: test describe and name-rev with deep repos

Now comes the hard part: rewriting the C code. :)

-Peff


Re: [PATCH 0/4] vcs-svn: remove repo_tree library

2017-08-22 Thread Stefan Beller
On Tue, Aug 22, 2017 at 4:37 PM, Jonathan Nieder  wrote:
> Hi,
>
> Stefan noticed that repo_init from vcs-svn/repo_tree.h conflicts with
> repository.h[1].  Earlier brian m. carlson noticed the same thing[2].
>
> Originally repo_tree.h was used to manage an in-memory representation
> of the state of the svn tree being imported.  When that in-memory
> representation was retired, we were lazy and left some utility
> functions there.  Here is a patch series to finish cleaning up and
> remove vcs-svn/repo_tree.h completely.
>
> This is an alternative to bc/vcs-svn-cleanup from 'next'.  Those
> patches weren't cc-ed to me and I missed them --- sorry about that.  I
> can rebase on top of them if that is more convenient.

A rebased version would be easier, IIUC Junio's reply to
the one-off that I sent.

The patches look good.

Thanks,
Stefan

>
> Thoughts of all kinds welcome, as always.
>
> Thanks,
> Jonathan
>
> Jonathan Nieder (4):
>   vcs-svn: remove prototypes for missing functions
>   vcs-svn: remove custom mode constants
>   vcs-svn: remove repo_delete wrapper function
>   vcs-svn: move remaining repo_tree functions to fast_export.h
>
>  Makefile  |  1 -
>  vcs-svn/fast_export.c | 41 +
>  vcs-svn/fast_export.h |  3 +++
>  vcs-svn/repo_tree.c   | 48 
>  vcs-svn/repo_tree.h   | 23 ---
>  vcs-svn/svndump.c | 33 -
>  6 files changed, 56 insertions(+), 93 deletions(-)
>  delete mode 100644 vcs-svn/repo_tree.c
>  delete mode 100644 vcs-svn/repo_tree.h
>
> [1] https://public-inbox.org/git/20170822213501.5928-1-sbel...@google.com
> [2] 
> https://public-inbox.org/git/2017082122.26729-3-sand...@crustytoothpaste.net


Re: [PATCH 0/4] dropping support for older curl

2017-08-14 Thread Johannes Schindelin
Hi Paolo,

On Thu, 10 Aug 2017, Paolo Ciarrocchi wrote:

On 10 Aug 2017 11:39 AM, "Johannes Schindelin" wrote:
> 
> 
> 
Footnote *1*: It is no secret that I find our patch submission process
less than inviting. Granted, *I* use it. *I* did not have problems entering the
> mailing list. But then, my mails were not swallowed silently, because my
> mail program does not send HTML by default. And prepared by the German
> school system (I learned the term "sugar coating" only when exposed to
some US culture), I had few emotional problems with being criticized
and not thanked for my contribution; I persisted nevertheless. The opinion
> that the Git contribution process is a lot less inviting than it could be
> is not only my view, by the way. I hear this a lot. I give you that we are
> not quite as textbook "keep out from here unless you look like us, smell
> like us, talk like us, have the same genital setup like us" as the Linux
> kernel mailing list, but we are in a different universe compared to, say,
> the Drupal community. And their universe is a lot nicer to live in.
> 
> 
> Isn't SubmitGit a possible answer to your doubts (I strongly agree with
> you) about the current development process?

No. I hate to say that SubmitGit does not integrate well with GitHub Pull
Requests (code comments on GitHub are approximately 1,523x easier to
write, read, associate with the actual code, and see the current state of,
compared to the mailing list, and SubmitGit does not even hint at
integrating with that user experience).

Also, the barrier to start using SubmitGit is rather high. If you open a
Pull Request on github.com/git/git, you get *no* indication that SubmitGit
is an option to *actually* get the code into Git. There are also concerns
about required permissions that Junio Hamano himself would not accept.

Now, let's assume that you submitted the code via SubmitGit. The
challenges of the patch submission process do not end there, yet SubmitGit
goes home and has a beer. But the hard part, the discussions on the
mailing list, the status updates in the completely separate What's cooking
mails, the missing links back to the original source code (let alone the
information in which worktree on your computer under which branch name
that topic was developed again?), the diverging mail threads, the
"rerolls" that should not forget to Cc: all reviewers of previous rounds,
all that jazz is still all very, very manual.

And even if there was an easier path from having a local branch that works
to finally getting it onto the list in the required form, your mail is an
eloquent example of one of the most preposterous hurdles along the way: we
pride ourselves on the openness we demonstrate by communicating via a
mailing list, everybody has a mail address, amirite? But of course, HTML
mails, like, about 130% of all mails on the internet (or so it feels),
including yours, are dropped. Silently. Not so open anymore, are we.

It is all so frustrating, really. I work in a team (Visual Studio Team
Services, you can think of it as kind of a Microsoft take on Git hosting
plus a lot of other tooling) where we get really, really positive feedback
regarding our user experience, in particular the frequent enhancements to
PRs. It is really powerful to open, review and merge PRs, interact with
other developers, see the state of PRs and their respective discussions,
open and resolve issues, automate workflows and build tasks [*1*]. And
then I try to convince people here on this mailing list that it really
makes a difference if you start using tools to lighten the load of
everybody, and... little changes.

At least thanks to Lars Schneider's incredible efforts we have Continuous
Testing (I know how much time he spent on this, it is another thing
deserving the label "preposterous", and I know how much time I spent on
top to add the Windows part which mostly works). If only we could push it
further to true Continuous Integration. Or at least to accepting PRs from
professionals who simply have no time to fight with our patch contribution
process and whose expertise we lose as a consequence.

Ciao,
Johannes

Footnote *1*: The funniest part about this is that I do get mails about
all of this all the time. When I am pulled in as a reviewer. When a build
failed. When a previously failing task was fixed by a new build. When
somebody responded to my comments. The difference to the mailing
list-centric approach is of course that those mails are only
notifications, and link back to the tool appropriately supporting what
I, the user, want to get done.


Re: [PATCH 0/4] dropping support for older curl

2017-08-10 Thread Jeff King
On Thu, Aug 10, 2017 at 07:09:02PM -0400, Jeff King wrote:

> The first is "should we eventually drop support for antiquated versions
> of dependencies?". And the argument in favor is the one I was making
> here: besides lowering maintenance cost, it is more honest to our users
> about what to expect[1].

As usual, I forgot all my footnotes.

[1] When I've talked about keeping expectations reasonable, I'm a lot
less interested in "oops, I built Git and this particular feature
didn't work". It's easy to write that off as "well, you have an old
version of curl, patches welcome". I'm much more concerned about
security issues. Curl is network-facing. Leaving aside security
vulnerabilities in curl itself (which hopefully distros with 10-year
support periods would fix), I wouldn't be surprised if there are
bad interactions possible due to our tangle of ifdefs.

One way to address that would be more careful auditing. But then
that goes back to the cost/benefit thing.

> One is to do it by date and what dependencies are in long-term OS
> releases, and then compare that to the benefit. Requiring curl 7.11.1
> still keeps us working back to rhel4, which was already end-of-lifed
> completely after a 12 year run. Bumping to 7.16.0 drops rhel4 and rhel5,
> the latter of which is in its final "barely supported" phase after 10
> years. But it gives us a bit more bang for our buck by making CURL_MULTI
unconditional[2].  Requiring 7.19.4 actually doesn't drop any more rhel
> releases. So by that metric, we might as well go there.

[2] The line-count change from dropping CURL_MULTI isn't _too_ exciting.
But a lot of the tangled design of our http code revolves around
the abstractions we've introduced. I have a feeling that it will
enable further cleanups as we move forward (OTOH, a lot of the worst
parts of our design are because of _using_ curl_multi for dumb http,
which of course hardly anyone does these days. But I have a feeling
if I suggested removing that, people would really scream).

-Peff


Re: [PATCH 0/4] dropping support for older curl

2017-08-10 Thread Tom G. Christensen

On 11/08/17 01:23, Jeff King wrote:
> On Fri, Aug 11, 2017 at 01:17:51AM +0200, Tom G. Christensen wrote:
> 
> > > OK, thanks for double-checking. I'm still puzzled why your build
> > > succeeds and mine does not.
> > 
> > I know what's going on now and it's so simple.
> > Red Hat's version of curl 7.15.5 includes a number of patches including one
> > that backports support for CURLPROTO_* (as part of a fix for CVE-2009-0037).
> > I haven't checked el6 but I would not be surprised if there were similar
> > things going on there.
> 
> el6 should have it already as part of 7.19.7, right?

Yes of course.

> > So in conclusion version-based #ifdefs are misleading when used with curl as
> > shipped with RHEL.
> 
> Yeah, that's certainly an interesting finding. In this case your builds
> are missing out on redirect protection that we _could_ be providing.

Yes and I'm looking into that right now.

> If we do keep the compat ifdefs around this feature, it may be worth
> converting them to "#ifdef CURLPROTO_HTTP" to more directly check the
> feature.

Yes, a feature test would be better.

-tgc


Re: [PATCH 0/4] dropping support for older curl

2017-08-10 Thread Jeff King
On Fri, Aug 11, 2017 at 01:17:51AM +0200, Tom G. Christensen wrote:

> > OK, thanks for double-checking. I'm still puzzled why your build
> > succeeds and mine does not.
> 
> I know what's going on now and it's so simple.
> Red Hat's version of curl 7.15.5 includes a number of patches including one
> that backports support for CURLPROTO_* (as part of a fix for CVE-2009-0037).
> I haven't checked el6 but I would not be surprised if there were similar
> things going on there.

el6 should have it already as part of 7.19.7, right?

> So in conclusion version-based #ifdefs are misleading when used with curl as
> shipped with RHEL.

Yeah, that's certainly an interesting finding. In this case your builds
are missing out on redirect protection that we _could_ be providing.

If we do keep the compat ifdefs around this feature, it may be worth
converting them to "#ifdef CURLPROTO_HTTP" to more directly check the
feature.
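
Something like this, say (a sketch only; redir_protocols() is a
stand-in for our get_curl_allowed_protocols(), though the CURLPROTO_*
names are the real curl macros):

  #include <curl/curl.h>

  static long redir_protocols(void)
  {
          long allowed = 0;

  /* Keying off the macro itself means a backported CURLPROTO_HTTP
   * (as in RHEL's patched 7.15.5) is picked up automatically, where
   * a LIBCURL_VERSION_NUM cutoff would wrongly compile it out. */
  #ifdef CURLPROTO_HTTP
          allowed |= CURLPROTO_HTTP;
  #endif
  #ifdef CURLPROTO_HTTPS
          allowed |= CURLPROTO_HTTPS;
  #endif
          return allowed;
  }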

-Peff


Re: [PATCH 0/4] dropping support for older curl

2017-08-10 Thread Tom G. Christensen

On 11/08/17 00:54, Jeff King wrote:
> On Fri, Aug 11, 2017 at 12:23:42AM +0200, Tom G. Christensen wrote:
> 
> Er, sorry if I'm being dense, but how? Are you suggesting that by
> removing the callsite of get_curl_allowed_protocols(), the compiler
> might elide the now-dead code completely? I could certainly see it being
> dropped after the compilation, but I'm surprised that it wouldn't
> complain about the undeclared identifiers in the first place.

You're right, that should not be able to handle it.

> > > Can you please double-check that you're
> > > building against the correct version of curl, and that you are building
> > > the HTTP parts of Git (which _are_ optional, and the test suite will
> > > pass without them).
> > 
> > I use a mock buildroot and there is no other curl than the vendor supplied
> > 7.15.5 installed:
> > [...]
> 
> OK, thanks for double-checking. I'm still puzzled why your build
> succeeds and mine does not.

I know what's going on now and it's so simple.
Red Hat's version of curl 7.15.5 includes a number of patches including
one that backports support for CURLPROTO_* (as part of a fix for
CVE-2009-0037).
I haven't checked el6 but I would not be surprised if there were
similar things going on there.

So in conclusion version-based #ifdefs are misleading when used with
curl as shipped with RHEL.

-tgc


Re: [PATCH 0/4] dropping support for older curl

2017-08-10 Thread Jeff King
On Thu, Aug 10, 2017 at 03:17:06PM -0700, Junio C Hamano wrote:

> Jeff King  writes:
> 
> > On Thu, Aug 10, 2017 at 11:36:41AM +0200, Johannes Schindelin wrote:
> >
> >> Hopefully I had better luck expressing my concerns this time?
> >
> > I understand your argument much better now. I'm still not sure I agree.
> >
> > -Peff
> 
> I do not think "there are a dozen #ifdefs and I don't know whether
> they still work. I don't know whether anybody (who most likely has
> better things to do than read the Git mailing list) is still using
> those.  So let's just remove them." was why you were suggesting to
> clean up the (apparent) support of older curl in the code, though.
> 
> Isn't the reason why your series simplifies these #ifdefs away
> because we by accident started using some features that require a
> version that is even newer than any of these #ifdef's try to cater
> to and yet nobody complained?  That is a lot more similar to the
> removal of rsync transport that happened in a not so distant past,
> where the reason for removal was "We have been shipping code that
> couldn't have possibly worked for some time and nobody complained
> ---we know nobody is depending on it."

I think there are two questions to be asked, and their answers come from
different lines of reasoning.

The first is "should we eventually drop support for antiquated versions
of dependencies?". And the argument in favor is the one I was making
here: besides lowering maintenance cost, it is more honest to our users
about what to expect[1].

The second is "how far back should we keep support?".

And there are two lines of thought there.

One is to do it by date and what dependencies are in long-term OS
releases, and then compare that to the benefit. Requiring curl 7.11.1
still keeps us working back to rhel4, which was already end-of-lifed
completely after a 12 year run. Bumping to 7.16.0 drops rhel4 and rhel5,
the latter of which is in its final "barely supported" phase after 10
years. But it gives us a bit more bang for our buck by making CURL_MULTI
unconditional[2].  Requiring 7.19.4 actually doesn't drop any more rhel
releases. So by that metric, we might as well go there.

And the second line of thought is: it was already broken and nobody
reported it or offered up a fix. And that's where the "similar to rsync"
thing comes in. Though in this case we do have some evidence that people
(at least Tom) were patching and distributing behind the scenes. And our
breakage period was much shorter (since v2.12.0, but that's only months
or so).

-Peff


Re: [PATCH 0/4] dropping support for older curl

2017-08-10 Thread Jeff King
On Fri, Aug 11, 2017 at 12:23:42AM +0200, Tom G. Christensen wrote:

> > > I just built a pristine 2.14.0 on CentOS 5 with curl 7.15.5. No problems 
> > > at
> > > all neither with building nor with running the testsuite.
> > 
> > As you can see, this does not compile for me. What's going on?
> > 
> The call site for get_curl_allowed_protocols() in http.c is still protected
> by an #if:
> #if LIBCURL_VERSION_NUM >= 0x071304
> curl_easy_setopt(result, CURLOPT_REDIR_PROTOCOLS,
>  get_curl_allowed_protocols(0));
> curl_easy_setopt(result, CURLOPT_PROTOCOLS,
>  get_curl_allowed_protocols(-1));
> #else
> warning("protocol restrictions not applied to curl redirects
> because\n"
> "your curl version is too old (>= 7.19.4)");
> #endif
> 
> > I don't see how it could work, as CURLPROTO_HTTP is not defined at all
> > in that version of curl.
> 
> Indeed but the #if will handle that.

Er, sorry if I'm being dense, but how? Are you suggesting that by
removing the callsite of get_curl_allowed_protocols(), the compiler
might elide the now-dead code completely? I could certainly see it being
dropped after the compilation, but I'm surprised that it wouldn't
complain about the undeclared identifiers in the first place.

And if that _is_ what is happening...that seems like a very fragile and
unportable thing to be depending on.

> > Can you please double-check that you're
> > building against the correct version of curl, and that you are building
> > the HTTP parts of Git (which _are_ optional, and the test suite will
> > pass without them).
> 
> I use a mock buildroot and there is no other curl than the vendor supplied
> 7.15.5 installed:
> [...]

OK, thanks for double-checking. I'm still puzzled why your build
succeeds and mine does not.

> > I saw that, too. But as I understand it, they provide no code updates:
> > no bugfixes and no security updates. They just promise to answer the
> > phone and help you with troubleshooting. It's possible my perception is
> > wrong, though; I'm certainly not one of their customers.
> 
> I am referring to the Extended Life-cycle Support product (ELS), which
> promises:
> "the ELS Add-On delivers certain critical-impact security fixes and selected
> urgent priority bug fixes and troubleshooting for the last minor release"
> 
> The full description is here:
> https://access.redhat.com/support/policy/updates/errata#Extended_Life_Cycle_Phase

That was the same page I was looking at. The bit I read was:

  For versions of products in the Extended Life Phase, Red Hat will
  provide limited ongoing technical support. No bug fixes, security
  fixes, hardware enablement or root-cause analysis will be available
  during this phase, and support will be provided on existing
  installations only.

But I missed the bit about the "ELS add-on" below there, which I guess
is an extra thing. I do suspect that "install arbitrary new versions of
Git" is outside of their scope of "urgent priority bug fixes". But in a
sense it doesn't really matter. What is much more interesting is whether
there's a significant population that is running RHEL5 and has a strong
need for newer versions of Git. That I'm not sure about.

-Peff


Re: [PATCH 0/4] dropping support for older curl

2017-08-10 Thread Tom G. Christensen

On 10/08/17 23:32, Jeff King wrote:
> On Thu, Aug 10, 2017 at 10:33:18PM +0200, Tom G. Christensen wrote:
> 
> > > You've totally ignored the argument I made back then[1], and which I
> > > reiterated in this thread. So I'll say it one more time: the more
> > > compelling reason is not the #ifdefs, but the fact that the older
> > > versions are totally untested.
> > 
> > Perhaps you forgot but I stated in the original thread that I build RPMS for
> > RHEL/CentOS 3, 4, 5, 6 and 7. I still do and I run the testsuite every
> > single time.
> 
> I didn't forget. I actually double-checked the patches you sent at the
> time, but I didn't see one for the CURLPROTO issue. And indeed, it is
> still broken for me:
> 
>    $ cd /path/to/curl/repo
>    $ git checkout curl-7_15_5
>    $ ./buildconf && ./configure --prefix=/tmp/foo && make install
>    $ cd /path/to/git
>    $ git checkout v2.14.0
>    $ make CURLDIR=/tmp/foo V=1 http.o
>    gcc -o http.o -c -MF ./.depend/http.o.d -MQ http.o -MMD -MP   -g -O0 -Wall -Werror -Wdeclaration-after-statement -Wpointer-arith -Wstrict-prototypes -Wvla
> -Wold-style-declaration -Wold-style-definition -Wno-error -Wno-cpp -Wno-unused-value -Wno-strict-prototypes  -I. -DUSE_LIBPCRE1 -DHAVE_ALLOCA_H
> -I/tmp/foo/include -DUSE_CURL_FOR_IMAP_SEND -DNO_GETTEXT -DSHA1_DC -DSHA1DC_NO_STANDARD_INCLUDES -DSHA1DC_INIT_SAFE_HASH_DEFAULT=0
> -DSHA1DC_CUSTOM_INCLUDE_SHA1_C="\"cache.h\"" -DSHA1DC_CUSTOM_TRAILING_INCLUDE_SHA1_C="\"sha1dc_git.c\""
> -DSHA1DC_CUSTOM_TRAILING_INCLUDE_SHA1_H="\"sha1dc_git.h\"" -DSHA1DC_CUSTOM_INCLUDE_UBC_CHECK_C="\"git-compat-util.h\""
> -DHAVE_PATHS_H -DHAVE_DEV_TTY -DHAVE_CLOCK_GETTIME -DHAVE_CLOCK_MONOTONIC -DHAVE_GETDELIM  -DFREAD_READS_DIRECTORIES -DNO_STRLCPY -DSHELL_PATH='"/bin/sh"' -DPAGER_ENV='"LESS=FRX LV=-c"'  http.c
>    http.c: In function ‘get_curl_allowed_protocols’:
>    http.c:685:24: error: ‘CURLPROTO_HTTP’ undeclared (first use in this function); did you mean ‘CURLPROXY_HTTP’?
>   allowed_protocols |= CURLPROTO_HTTP;
>    ^~
>    CURLPROXY_HTTP
>    [and so on]
> 
> > I just built a pristine 2.14.0 on CentOS 5 with curl 7.15.5. No problems at
> > all neither with building nor with running the testsuite.
> 
> As you can see, this does not compile for me. What's going on?

The call site for get_curl_allowed_protocols() in http.c is still
protected by an #if:

#if LIBCURL_VERSION_NUM >= 0x071304
curl_easy_setopt(result, CURLOPT_REDIR_PROTOCOLS,
 get_curl_allowed_protocols(0));
curl_easy_setopt(result, CURLOPT_PROTOCOLS,
 get_curl_allowed_protocols(-1));
#else
warning("protocol restrictions not applied to curl redirects because\n"
"your curl version is too old (>= 7.19.4)");
#endif

> I don't see how it could work, as CURLPROTO_HTTP is not defined at all
> in that version of curl.

Indeed but the #if will handle that.

> Can you please double-check that you're
> building against the correct version of curl, and that you are building
> the HTTP parts of Git (which _are_ optional, and the test suite will
> pass without them).

I use a mock buildroot and there is no other curl than the vendor
supplied 7.15.5 installed:

# pwd
/var/lib/mock/jrpms-el5-x86_64/root
# find . -name 'curlver.h'
./usr/include/curl/curlver.h
# grep LIBCURL_VERSION_NUM ./usr/include/curl/curlver.h
   parsing and comparions by programs. The LIBCURL_VERSION_NUM define will
#define LIBCURL_VERSION_NUM 0x070f05
#

[root@c5-32bit-01 ~]# rpm -q git
git-2.14.1-1.el5.jr
[root@c5-32bit-01 ~]# ldd /usr/libexec/git-core/git-http-fetch |grep libcurl
libcurl.so.3 => /usr/lib/libcurl.so.3 (0x001e7000)
[root@c5-32bit-01 ~]# rpm -qf /usr/lib/libcurl.so.3
curl-7.15.5-17.el5_9
[root@c5-32bit-01 ~]# git --version
git version 2.14.1
[root@c5-32bit-01 ~]# git clone https://github.com/tgc/tgcware-for-solaris.git
Cloning into 'tgcware-for-solaris'...
warning: protocol restrictions not applied to curl redirects because
your curl version is too old (>= 7.19.4)
remote: Counting objects: 2793, done.
remote: Total 2793 (delta 0), reused 0 (delta 0), pack-reused 2793
Receiving objects: 100% (2793/2793), 780.88 KiB | 639.00 KiB/s, done.
Resolving deltas: 100% (1233/1233), done.
[root@c5-32bit-01 ~]#

> I also won't claim any absolutes. I think we all agree this is a
> cost/benefit tradeoff. But there are a lot of options for building on a
> very old system. For instance, building without http if you don't need
> it. Or building a more recent libcurl (and even linking statically for
> simplicity).

Of course that is always an option but it does complicate things.

> I'd find arguments against the latter more compelling if recent curl
> were hard to compile on old systems. I don't know whether that's the
> case (certainly on a modern system, it's much easier to get newer
> versions of curl to compile than older ones).

I have no experience with building curl on older Linux systems. I know
that I can build it on old Solaris releases but that is not 

Re: [PATCH 0/4] dropping support for older curl

2017-08-10 Thread Junio C Hamano
Jeff King  writes:

> On Thu, Aug 10, 2017 at 11:36:41AM +0200, Johannes Schindelin wrote:
>
>> Hopefully I had better luck expressing my concerns this time?
>
> I understand your argument much better now. I'm still not sure I agree.
>
> -Peff

I do not think "there are a dozen #ifdefs and I don't know whether
they still work. I don't know whether anybody (who most likely has
better things to do than read the Git mailing list) is still using
those.  So let's just remove them." was why you were suggesting to
clean up the (apparent) support of older curl in the code, though.

Isn't the reason why your series simplifies these #ifdefs away
because we by accident started using some features that require a
version that is even newer than any of these #ifdef's try to cater
to and yet nobody complained?  That is a lot more similar to the
removal of rsync transport that happened in a not so distant past,
where the reason for removal was "We have been shipping code that
couldn't have possibly worked for some time and nobody complained
---we know nobody is depending on it."

Or "We accidentally started shipping code with comma after the last
element of enum decl and nobody complained---everybody's compiler
must be ready" ;-)


Re: [PATCH 0/4] dropping support for older curl

2017-08-10 Thread Jeff King
On Thu, Aug 10, 2017 at 11:36:41AM +0200, Johannes Schindelin wrote:

> Hopefully I had better luck expressing my concerns this time?

I understand your argument much better now. I'm still not sure I agree.

-Peff


Re: [PATCH 0/4] dropping support for older curl

2017-08-10 Thread Jeff King
On Thu, Aug 10, 2017 at 10:33:18PM +0200, Tom G. Christensen wrote:

> > You've totally ignored the argument I made back then[1], and which I
> > reiterated in this thread. So I'll say it one more time: the more
> > compelling reason is not the #ifdefs, but the fact that the older
> > versions are totally untested.
> 
> Perhaps you forgot but I stated in the original thread that I build RPMS for
> RHEL/CentOS 3, 4, 5, 6 and 7. I still do and I run the testsuite every
> single time.

I didn't forget. I actually double-checked the patches you sent at the
time, but I didn't see one for the CURLPROTO issue. And indeed, it is
still broken for me:

  $ cd /path/to/curl/repo
  $ git checkout curl-7_15_5
  $ ./buildconf && ./configure --prefix=/tmp/foo && make install
  $ cd /path/to/git
  $ git checkout v2.14.0
  $ make CURLDIR=/tmp/foo V=1 http.o
  gcc -o http.o -c -MF ./.depend/http.o.d -MQ http.o -MMD -MP   -g -O0 -Wall 
-Werror -Wdeclaration-after-statement -Wpointer-arith -Wstrict-prototypes -Wvla 
-Wold-style-declaration -Wold-style-definition -Wno-error -Wno-cpp 
-Wno-unused-value -Wno-strict-prototypes  -I. -DUSE_LIBPCRE1 -DHAVE_ALLOCA_H 
-I/tmp/foo/include -DUSE_CURL_FOR_IMAP_SEND -DNO_GETTEXT -DSHA1_DC 
-DSHA1DC_NO_STANDARD_INCLUDES -DSHA1DC_INIT_SAFE_HASH_DEFAULT=0 
-DSHA1DC_CUSTOM_INCLUDE_SHA1_C="\"cache.h\"" 
-DSHA1DC_CUSTOM_TRAILING_INCLUDE_SHA1_C="\"sha1dc_git.c\"" 
-DSHA1DC_CUSTOM_TRAILING_INCLUDE_SHA1_H="\"sha1dc_git.h\"" 
-DSHA1DC_CUSTOM_INCLUDE_UBC_CHECK_C="\"git-compat-util.h\""  -DHAVE_PATHS_H 
-DHAVE_DEV_TTY -DHAVE_CLOCK_GETTIME -DHAVE_CLOCK_MONOTONIC -DHAVE_GETDELIM  
-DFREAD_READS_DIRECTORIES -DNO_STRLCPY -DSHELL_PATH='"/bin/sh"' 
-DPAGER_ENV='"LESS=FRX LV=-c"'  http.c
  http.c: In function ‘get_curl_allowed_protocols’:
  http.c:685:24: error: ‘CURLPROTO_HTTP’ undeclared (first use in this 
function); did you mean ‘CURLPROXY_HTTP’?
 allowed_protocols |= CURLPROTO_HTTP;
  ^~
  CURLPROXY_HTTP
  [and so on]

> I just built a pristine 2.14.0 on CentOS 5 with curl 7.15.5. No problems at
> all neither with building nor with running the testsuite.

As you can see, this does not compile for me. What's going on?

I don't see how it could work, as CURLPROTO_HTTP is not defined at all
in that version of curl.  Can you please double-check that you're
building against the correct version of curl, and that you are building
the HTTP parts of Git (which _are_ optional, and the test suite will
pass without them).

> > So IMHO this is about being honest with users about which versions we
> > _actually_ support.
> 
> I have no problem with you wanting to drop support for older curl releases
> (such as 7.15.5) but don't use the argument that it doesn't currently build
> and nobody cares.

My argument isn't quite that nobody cares. It's that we do users a
disservice by shipping a version of the code that very well may have
hidden problems like security holes (for instance, we do not handle
redirects safely in old versions of curl). So if you can get it to build
it may _seem_ fine, but it's a bit of a booby-trap waiting to spring.

I also won't claim any absolutes. I think we all agree this is a
cost/benefit tradeoff. But there are a lot of options for building on a
very old system. For instance, building without http if you don't need
it. Or building a more recent libcurl (and even linking statically for
simplicity).

I'd find arguments against the latter more compelling if recent curl
were hard to compile on old systems. I don't know whether that's the
case (certainly on a modern system, it's much easier to get newer
versions of curl to compile than older ones).

> Also FWIW Red Hat continues to support RHEL 5 with the Extended Life-cycle
> Support program until 2020-11-30.

I saw that, too. But as I understand it, they provide no code updates:
no bugfixes and no security updates. They just promise to answer the
phone and help you with troubleshooting. It's possible my perception is
wrong, though; I'm certainly not one of their customers.

-Peff


Re: [PATCH 0/4] dropping support for older curl

2017-08-10 Thread Tom G. Christensen

On 09/08/17 23:47, Jeff King wrote:
> On Wed, Aug 09, 2017 at 11:42:12PM +0200, Johannes Schindelin wrote:
> 
> > I mean, if we even go out of our way to support the completely outdated
> > and obsolete .git/branches/ for what is likely a single user, it may not
> > be the worst to keep those couple of #ifdef guards to keep at least
> > nominal support for older cURLs?
> 
> You've totally ignored the argument I made back then[1], and which I
> reiterated in this thread. So I'll say it one more time: the more
> compelling reason is not the #ifdefs, but the fact that the older
> versions are totally untested.

Perhaps you forgot but I stated in the original thread that I build RPMS
for RHEL/CentOS 3, 4, 5, 6 and 7. I still do and I run the testsuite
every single time.

I currently have 2.13.3 up for el4, el5, el6 and el7.
Only el4 requires any patches, the rest will build out of the box with
the vendor supplied version of curl.

The plan was to drop the el4 builds for 2.14.0 to get rid of the patches.

> In fact, they do not even compile, and
> yet I have not seen any patches to fix that.

I just built a pristine 2.14.0 on CentOS 5 with curl 7.15.5. No problems
at all neither with building nor with running the testsuite.

> So IMHO this is about being honest with users about which versions we
> _actually_ support.

I have no problem with you wanting to drop support for older curl
releases (such as 7.15.5) but don't use the argument that it doesn't
currently build and nobody cares.

Also FWIW Red Hat continues to support RHEL 5 with the Extended
Life-cycle Support program until 2020-11-30.

-tgc



