Finanzielle Hilfe (Darlehen)

2018-08-27 Thread Citi Bank




--
Schönen Tag Dame / Herr,
  
   Brauchen Sie finanzielle Hilfe (Darlehen)?
Ob es sich um Privat- oder Firmenkredite handelt, sprechen Sie mit uns 
bei CITI BANK (TÜRKEI), wir werden Ihre finanziellen Probleme lösen.


Unser Zinssatz beträgt 1,3%. Bitte bewerben Sie sich jetzt und füllen 
Sie die untenstehenden Bewerbungsdetails aus:


Vollständiger Name:_
Darlehensbetrag:___
Darlehensdauer:___
Der Grund für den Kredit:
Telefon:__

Wir warten auf Ihre Bewerbung, damit Ihre Kreditanfrage bearbeitet 
werden kann.


Freundliche Grüße


Re: GIT_TRACE doesn't show content filter files it's operating on

2018-08-27 Thread Stas Bekman
On 2018-08-27 05:58 PM, Jeff King wrote:
[...]
>> 2. Is there no way to get git to do the filename reporting as a normal
>> GIT_TRACE behavior? I don't know anything about its internal workings,
>> but it surely must knows which file it operates on when it opens it and
>> sends its data as stdin to the content filter. It makes the debugging so
>> much easier when one can see what files are being worked on. So perhaps
>> this utility can be made available to all not just as a hack/workaround.
> 
> No, because GIT_TRACE itself only reports on the execution of commands
> and sub-processes. There are other GIT_TRACE_* variables for various
> subsystems, but AFAIK nobody has instrumented the smudge/clean filter
> code. IMHO it would be reasonable to have a GIT_TRACE_CONVERT
> that covered convert.c (so these filters, but also newline conversion,
> etc).

Thank you, Jeff.

Would you please add this to the TODO list (if that's how new feature
requests are made)?

I now know how get the filenames for "clean/smudge" filters. Can you
please help with the same for "textconv". %f doesn't work - it gets
stuck there waiting for stdin, the following seems to pass, but I'm not
sure it's correct:

[diff "ipynb-code"]
textconv = "f() { echo >&2 \"textconv: nbstripout -t $*\"; nbstripout
-t; }; cat $1 > f"

In a trace (git diff w/ one modified file) it gets invoked twice:

18:05:27.640955 run-command.c:640   trace: run_command: 'f() { echo
>&2 "textconv: nbstripout $*"; nbstripout -t; }; cat $1 > f'
/tmp/4nCYaT_004_callbacks.ipynb
18:05:27.642205 run-command.c:640   trace: run_command: 'f() { echo
>&2 "textconv: nbstripout $*"; nbstripout -t; }; cat $1 > f'
dev_nb/004_callbacks.ipynb

The first trace gives a tmp filename, whereas I need the actual filename.

Thank you.

-- 

Stas Bekman   <'><   <'><
https://stasosphere.com  https://chestofbooks.com
https://experientialsexlab.com https://stason.org
https://stasosphere.com/experience-life/my-books


Re: Git clean removing tracked files semi-regularly

2018-08-27 Thread brian m. carlson
On Mon, Aug 27, 2018 at 07:06:59PM +, Brennan Conroy wrote:
> Hello, I work on a project that uses git. "git clean -xdf" is a common
> command to run to clean up the environment, however sometimes this
> command deletes the entire contents of two of the folders and all the
> files it deletes are being tracked which according to the
> documentation should not be deleted.
> 
> The project is located at https://github.com/aspnet/SignalR and the two 
> folders it likes to delete are 
> https://github.com/aspnet/SignalR/tree/release/2.2/clients/ts/signalr-protocol-msgpack
>  and https://github.com/aspnet/SignalR/tree/release/2.2/clients/ts/signalr
> 
> If you need me to collect git logs etc. please don't hesitate to ask!

It would be helpful to know more about your environment.  Is this only
reproducible on certain operating systems (e.g. case-insensitive ones)?
What version of Git are you using?  What does "git status" say before
you run git clean?
-- 
brian m. carlson: Houston, Texas, US
OpenPGP: https://keybase.io/bk2204


signature.asc
Description: PGP signature


Re: GIT_TRACE doesn't show content filter files it's operating on

2018-08-27 Thread Jeff King
On Mon, Aug 27, 2018 at 05:22:13PM -0700, Stas Bekman wrote:

> Your suggestions do the trick, Jeff. Thank you.
> 
> 1. To benefit others who might be looking for something similar may I
> post your suggestions as an answer to:
> ?

Great, thanks.

> 2. Is there no way to get git to do the filename reporting as a normal
> GIT_TRACE behavior? I don't know anything about its internal workings,
> but it surely must knows which file it operates on when it opens it and
> sends its data as stdin to the content filter. It makes the debugging so
> much easier when one can see what files are being worked on. So perhaps
> this utility can be made available to all not just as a hack/workaround.

No, because GIT_TRACE itself only reports on the execution of commands
and sub-processes. There are other GIT_TRACE_* variables for various
subsystems, but AFAIK nobody has instrumented the smudge/clean filter
code. IMHO it would be reasonable to have a GIT_TRACE_CONVERT
that covered convert.c (so these filters, but also newline conversion,
etc).

-Peff


Re: GIT_TRACE doesn't show content filter files it's operating on

2018-08-27 Thread Stas Bekman
On 2018-08-27 04:53 PM, Jeff King wrote:
> On Mon, Aug 27, 2018 at 04:23:34PM -0700, Stas Bekman wrote:
[...]
>> How can I get GIT_TRACE's run_command to show the arguments passed to
>> the filter? I looked at various other debug environment variables in
>> git's manual, but I don't see anything that would enable that level of
>> debug.
[...]
> You can work around it with some shell hackery:
> 
>   git config filter.foo.clean 'f() { echo >&2 "cleaning $1"; myfilter ...; }; 
> f %f'
> 
> and then even without GIT_TRACE, you get:
> 
>   $ git add .
>   cleaning .gitattributes
> 
> Or if you really just want to trigger for GIT_TRACE, try just this:
> 
>   $ git config filter.foo.clean 'f() { myfilter; }; f %f'
>   19:52:52.874064 [pid=14719] git.c:415 trace: built-in: git add .
>   19:52:52.875115 [pid=14719] run-command.c:637 trace: run_command: 'f() 
> { myfilter; }; f '\''.gitattributes'\'''

Your suggestions do the trick, Jeff. Thank you.

1. To benefit others who might be looking for something similar may I
post your suggestions as an answer to:
?

2. Is there no way to get git to do the filename reporting as a normal
GIT_TRACE behavior? I don't know anything about its internal workings,
but it surely must knows which file it operates on when it opens it and
sends its data as stdin to the content filter. It makes the debugging so
much easier when one can see what files are being worked on. So perhaps
this utility can be made available to all not just as a hack/workaround.


-- 

Stas Bekman   <'><   <'><
https://stasosphere.com  https://chestofbooks.com
https://experientialsexlab.com https://stason.org
https://stasosphere.com/experience-life/my-books


Re: Automatic core.autocrlf?

2018-08-27 Thread brian m. carlson
On Mon, Aug 27, 2018 at 07:32:08PM +0200, Andrei Rybak wrote:
> On 2018-08-27 17:52, Duy Nguyen wrote:
> > On Mon, Aug 27, 2018 at 5:37 PM Torsten Bögershausen  wrote:
> >>> In those cases, when it falls back to
> >>> configuration for line ending management, I want it to be
> >>> automatically configured based on the host platform.
> >>
> > 
> > An alternative is supporting conditional config includes based on
> > platform or host name, but I don't know if there are more use cases
> > like this to justify it.
> > 
> 
> How about just using unconditional includes?
> 
> global.gitconfig (synced across machines):
> 
>   [include]
>   path = platform-specific.gitconfig
> 
> And two version of file named "platform-specific.gitconfig", which
> are not synced, and include only code.autocrlf setting.

There's actually a way to do this that works with older Git versions as
well: put global config in .gitconfig and per-system config in
.config/git/config.  Or you can use some sort of script (say, a
Makefile) to install the proper values based on your system.
-- 
brian m. carlson: Houston, Texas, US
OpenPGP: https://keybase.io/bk2204


signature.asc
Description: PGP signature


Re: GIT_TRACE doesn't show content filter files it's operating on

2018-08-27 Thread Jeff King
On Mon, Aug 27, 2018 at 04:23:34PM -0700, Stas Bekman wrote:

> I'm debugging the workings of a configured git content filter
> (nbstripout) and I'm trying to get GIT_TRACE to show me the files it's
> operating on, but it doesn't. Consider:
> 
> $ GIT_TRACE=1 git pull origin master
> [...] removed irrelevant sections of the output
> 16:49:28.846707 run-command.c:640   trace: run_command: git merge
> FETCH_HEAD
> 16:49:28.849309 git.c:344   trace: built-in: git merge
> FETCH_HEAD
> Updating 1ea49ad..ae0ba93
> 16:49:28.863291 run-command.c:640   trace: run_command: nbstripout
> 16:49:28.864700 run-command.c:640   trace: run_command: nbstripout
> 16:49:28.866060 run-command.c:640   trace: run_command: nbstripout
> [...] many more of the same
> 
> How can I get GIT_TRACE's run_command to show the arguments passed to
> the filter? I looked at various other debug environment variables in
> git's manual, but I don't see anything that would enable that level of
> debug.

GIT_TRACE should always show the arguments. But unless you specify
arguments in the clean/smudge filter config, then Git won't pass any.
The stdin/stdout stream is all that matters.

So e.g.:

  $ echo '* filter=foo' >.gitattributes
  $ git config filter.foo.clean 'myfilter'
  $ GIT_TRACE=1 git add .
  19:42:16.516401 [pid=14112] git.c:415 trace: built-in: git add .
  19:42:16.517454 [pid=14112] run-command.c:637 trace: run_command: myfilter

  $ git config filter.foo.clean 'myfilter --foo'
  $ touch .gitattributes ;# make sure we actually read it again ;)
  $ GIT_TRACE=1 git add .
  19:42:58.122942 [pid=14156] git.c:415 trace: built-in: git add .
  19:42:58.124023 [pid=14156] run-command.c:637 trace: run_command: 
'myfilter --foo'

You can use "%f" to pass the name of the file, like:

  $ git config filter.foo.clean 'myfilter %f'
  $ touch .gitattributes
  $ GIT_TRACE=1 git add .
  19:44:51.187177 [pid=14318] git.c:415 trace: built-in: git add .
  19:44:51.188256 [pid=14318] run-command.c:637 trace: run_command: 
'myfilter '\''.gitattributes'\'''

Of course that won't be helpful if your filter actually respects the
argument. For a "clean" filter that might be OK (e.g., if it just tells
your filter to read from the filesystem instead of stdin), but it's
almost certainly not what you want for a "smudge" filter.

You can work around it with some shell hackery:

  git config filter.foo.clean 'f() { echo >&2 "cleaning $1"; myfilter ...; }; f 
%f'

and then even without GIT_TRACE, you get:

  $ git add .
  cleaning .gitattributes

Or if you really just want to trigger for GIT_TRACE, try just this:

  $ git config filter.foo.clean 'f() { myfilter; }; f %f'
  19:52:52.874064 [pid=14719] git.c:415 trace: built-in: git add .
  19:52:52.875115 [pid=14719] run-command.c:637 trace: run_command: 'f() { 
myfilter; }; f '\''.gitattributes'\'''

There you get the name in the trace output, but the invoked command
doesn't actually do anything with it.

-Peff


Re: $GIT_DIR is no longer set when pre-commit hooks are called

2018-08-27 Thread Jeff King
On Mon, Aug 27, 2018 at 06:25:26PM +0200, Johannes Schindelin wrote:

> On Sat, 25 Aug 2018, Jeff King wrote:
> 
> > On Wed, Aug 22, 2018 at 04:16:00PM -0700, Gregory Oschwald wrote:
> > 
> > diff --git a/builtin/commit.c b/builtin/commit.c
> > index 3bfeabc463..3670024a25 100644
> > --- a/builtin/commit.c
> > +++ b/builtin/commit.c
> > @@ -1440,6 +1440,7 @@ int run_commit_hook(int editor_is_used, const char 
> > *index_file, const char *name
> > int ret;
> >  
> > argv_array_pushf(_env, "GIT_INDEX_FILE=%s", index_file);
> > +   argv_array_pushf(_env, "GIT_DIR=%s", get_git_dir());
> 
> We did something similar in sequencer.c, and it required setting
> `GIT_WORK_TREE`, too, to avoid problems in worktrees. Do you need the same
> here, too?

Hmm, good point. Maybe. If we're just trying to restore the original
behavior (i.e., fix a regression), then this would behave as the
original.

I'm not at all opposed to going beyond that and actually fixing more
cases. But I am a little nervous of introducing a new regression, given
the subtlety around some of our startup code.

My current understanding is:

  - If we're setting GIT_DIR, it's probably always save to set
GIT_WORK_TREE (or GIT_IMPLICIT_WORK_TREE=0 if there isn't a
worktree). I.e., there is no case I know that would be broken by
this.

  - Passing GIT_DIR to any sub-process operating in the context of our
repo (i.e., excluding cases where we're clearing local_repo_env) can
break a script that expects to do:

  cd /another/repo
  git ...

and operate in /another/repo. But such a script is already broken at
least in some cases, because running the initial command as:

  git --git-dir=/some/repo

will mean that GIT_DIR is set. So a hook depending on that is
already broken, though it may be small consolation to somebody whose
hook happened to work most of the time, because they do not pass in
GIT_DIR.

  - Any command that uses setup_work_tree() (which includes things with
NEED_WORK_TREE in git.c) would have always been setting GIT_DIR up
through v2.17.x. So any hooks they run would have seen it
consistently, and therefore are exempt from being regressed in the
case above.

So I _think_ it would be safe to:

  1. Set GIT_DIR again anytime we call setup_work_tree(), because that's
 what has always happened.

  2. Start setting GIT_WORK_TREE at the same time. This didn't happen
 before, but it can't break anybody, and might help.

That makes the rules for when GIT_DIR is set confusing, but at least it
shouldn't regress. A more consistent rule of "hooks always see GIT_DIR"
(or even "any sub-process always sees...") is easier to explain, but
might break people in practice.

I notice also that sequencer.c sets an absolute path, since 09d7b6c6fa
(sequencer: pass absolute GIT_DIR to exec commands, 2017-10-31). That
does seem friendlier, though I wonder if could break any scripts. I
cannot think of a case, unless the intermediate paths were no accessible
to the uid running the process (but then, how would Git have generated
the absolute path in the first place?).

-Peff


GIT_TRACE doesn't show content filter files it's operating on

2018-08-27 Thread Stas Bekman
Hello,

I'm debugging the workings of a configured git content filter
(nbstripout) and I'm trying to get GIT_TRACE to show me the files it's
operating on, but it doesn't. Consider:

$ GIT_TRACE=1 git pull origin master
[...] removed irrelevant sections of the output
16:49:28.846707 run-command.c:640   trace: run_command: git merge
FETCH_HEAD
16:49:28.849309 git.c:344   trace: built-in: git merge
FETCH_HEAD
Updating 1ea49ad..ae0ba93
16:49:28.863291 run-command.c:640   trace: run_command: nbstripout
16:49:28.864700 run-command.c:640   trace: run_command: nbstripout
16:49:28.866060 run-command.c:640   trace: run_command: nbstripout
[...] many more of the same

How can I get GIT_TRACE's run_command to show the arguments passed to
the filter? I looked at various other debug environment variables in
git's manual, but I don't see anything that would enable that level of
debug.

I'm aware of git check-attr, but I'd like to see the run time trace on
which files it's running the filter on and with which arguments.

git version 2.17.1

I originally posted this as a question here:

and it appears that sq_quote_argv_pretty() doesn't get argv when a
content filter is invoked, so it doesn't show this information.

Would it be possible to make the files and any other passed arguments
show in the trace?

Thank you.

-- 

Stas Bekman   <'><   <'><
https://stasosphere.com  https://chestofbooks.com
https://experientialsexlab.com https://stason.org
https://stasosphere.com/experience-life/my-books


Re: [PATCH] t5310-pack-bitmaps: fix bogus 'pack-objects to file can use bitmap' test

2018-08-27 Thread Jeff King
On Mon, Aug 27, 2018 at 10:22:46AM +, Kirill Smelkov wrote:

> A minor comment from outside observer: running tests under something
> like
> 
>   -e and -o pipefail
> 
> would automatically catch the mistake in the first place. Maybe `-o
> pipefail` is bashism (I had not checked), but `git grep " | " t/` shows
> there are a lot of pipelines being used, and thus similar errors might be
> silently resting there. Something like -o pipefail would catch all such
> problems automatically.

Yes, "pipefail" is a bash-ism that we can't rely on.

I will say that I have been bitten before by "set -e -o pipefail" and
its subtle handling of SIGPIPE. Try this:

  set -e -o pipefail
  yes | head

-Peff


Re: [PATCH 2/2] fsck: use oidset for skiplist

2018-08-27 Thread Jeff King
On Sun, Aug 26, 2018 at 01:37:41PM +0200, René Scharfe wrote:

> Am 14.08.2018 um 03:58 schrieb Jeff King:
> > Your suggestion can be implemented using khash (my patch below).
> > 
> >> Before:
> >> Benchmark #1: ./git-cat-file --batch-all-objects --buffer --unordered 
> >> --batch-check='%(objectname)'
> >>
> >>   Time (mean ± σ): 269.5 ms ±  26.7 ms[User: 247.7 ms, System: 
> >> 21.4 ms]
> >>
> >>   Range (min … max):   240.3 ms … 339.3 ms
> >>
> >> After:
> >> Benchmark #1: ./git-cat-file --batch-all-objects --buffer --unordered 
> >> --batch-check='%(objectname)'
> >>
> >>   Time (mean ± σ): 224.2 ms ±  18.2 ms[User: 201.7 ms, System: 
> >> 22.1 ms]
> >>
> >>   Range (min … max):   205.0 ms … 259.0 ms
> > 
> > Yeah. My best-of-five dropped from 300ms to 247ms. That 300 was using
> > the memory pool, though khash's deletion strategy isn't all that
> > different (the wasted memory hangs around until the next hash resize,
> > but if you're evenly dropping and adding, you likely won't need to
> > resize).
> 
> With your khash patch:
> 
> Benchmark #1: ./git-cat-file --batch-all-objects --buffer --unordered 
> --batch-check='%(objectname)'
> 
>   Time (mean ± σ): 159.1 ms ±  20.5 ms[User: 140.3 ms, System: 18.5 
> ms]
> 
>   Range (min … max):   140.0 ms … 214.0 ms
> 
> So it seems worth it.

Hmm, that really does. Which is a shame, because I hoped that one day we
could get rid of the nasty macro-infestation that is khash.h. But it
really is a lot faster than the alternatives.

> [...]

I agree with all of your comments here. What I posted was just trying to
do the least amount of work to get something we could time.

Do you want to pick up this topic and carry it forward?

-Peff


Re: [PATCH v2 1/2] fsck: use strbuf_getline() to read skiplist file

2018-08-27 Thread Jeff King
On Sat, Aug 25, 2018 at 08:50:28PM +0200, René Scharfe wrote:

> buffer is unlikely to contain a NUL character, so printing its contents
> using %s in a die() format is unsafe (detected with ASan).

Having mostly forgotten about our earlier discussion, I got confused by
this, thinking the problem was that there is some issue with missing
NULs in the input.

But it is really just:

  We read() into a buffer and on error format the contents using "%s".
  But read() does not NUL-terminate, so die() might walk past the end of
  the buffer.

We _might_ be saved by a NUL in the input, but that is not the primary
concern. ;)

Not worth a re-roll on its own, but since there is some other
discussion, I thought I'd mention my confusion. :)

> Added error check.
> Hopefully fixed my MUA config..
> 
>  fsck.c | 25 -
>  1 file changed, 12 insertions(+), 13 deletions(-)

Patch itself looks good to me.

-Peff


Re: Contributor Summit planning

2018-08-27 Thread Johannes Schindelin
Hi AEvar,

On Mon, 13 Aug 2018, Ævar Arnfjörð Bjarmason wrote:

>  * Re the second half of "Not everyone can travel or can afford to do
>so" from Derrick, there's been travel sponsorships in past years.

Just to make sure that you understand: there are many more reasons than
just travel costs involved. Just to name a few:

- visa issues.

- some people have to take care of family members and are not at liberty
  to leave for even so much as half a day.

- conflicting schedules.

- mental health. As a friend of mine once put it: nobody is completely
  "healthy", we all suffer, on a spectrum, from anxiety, trauma, and other
  issues. The more inclusive we are, the more we can benefit from
  everybody's talents and contributions.

Just saying. Money is sometimes not the solution.

Ciao,
Dscho

Re: [PATCH v2 0/4] fix "rebase -i --root" corrupting root commit

2018-08-27 Thread Johannes Schindelin
Hi,

On Mon, 6 Aug 2018, Eric Sunshine wrote:

> On Mon, Aug 6, 2018 at 9:20 PM Hilco Wijbenga  
> wrote:
> > But your suggestion did make me think about what behaviour I would
> > like to see, exactly. I like that Git removes commits that no longer
> > serve any purpose (because I've included their changes in earlier
> > commits). So I would not want to keep commits that become empty during
> > the rebase. What I would like to see is that commits that _start out_
> > as empty, are retained. Would such behaviour make sense? Or would that
> > be considered surprising behaviour?
> 
> I, personally, have no opinion since I don't use empty commits.
> Perhaps someone more experienced and more long-sighted will chime in.

I got aware about two weeks ago that my mail provider fails to deliver all
the mails that are addressed to me. Two days ago, I managed to get this
mail (and the thread) via public-inbox (thanks, Eric!!!). Hence my late
reply.

Hilco: if you provide a patch to extend the test suite to demonstrate the
breakage (via `test_expect_failure`), I promise to try my best to provide
a fix on top.

Ciao,
Johannes


Re: Would a config var for --force-with-lease be useful?

2018-08-27 Thread Junio C Hamano
Ævar Arnfjörð Bjarmason  writes:

> I.e. making plain --force-with-lease harder to use by hiding it behind a
> config option gives the user fewer options than with --force to recover.

I agree with that.  But I would consider it a good thing, if done
properly (i.e. suggest --force-with-lease that is not a lzzy form).

> So I think we should still recommend the longer and even safer variants
> of --force-with-lease, but being guaranteed to have the SHA-1 you just
> clobbered locally is *better*, and allows us to e.g. do this:
>
> $ git push --force-with-lease
> hint: You just clobbered  on . If you regret
> hint: this you can (until the object gets pruned) do:
> hint: git push  --force-with-lease=:

Until the object gets pruned, and until somebody else pushes there
to make the damage bigger, you can recover from a mistake with that
information.  It probably is a bug that the lazy force-with-lease
does not report what the remote tip you just overwrote was, and this
would help great deal.  It would be a good place to start.

One thing that I am still not clear how this line of thought truly
would help users is how to help them decide their answer to "If you
regret this".  An unthinking naïve user would say "of course I meant
it when I gave the option---why do you think I regret it?" without a
bit more hint, namely, " was taken from the remote tracking
branch, are you sure that is still what the newly prepared contents
you built to replace?  Did somebody clobber it while you are not
watching?"

These three lines however will be felt too loud by those who would
most benefit from them (i.e. those who do not know why they need
protection); I do not think advise.* to allow them to be squelched
would be appropriate.


[PATCH 1/2] t2013: add test for missing but active submodule

2018-08-27 Thread Stefan Beller
When cloning a superproject with the option
 --recurse-submodules='.', it is easy to find yourself wanting
a submodule active, but not having that submodule present in
the modules directory.

Signed-off-by: Stefan Beller 
---
 t/t2013-checkout-submodule.sh | 24 
 1 file changed, 24 insertions(+)

diff --git a/t/t2013-checkout-submodule.sh b/t/t2013-checkout-submodule.sh
index 6ef15738e44..c69640fc341 100755
--- a/t/t2013-checkout-submodule.sh
+++ b/t/t2013-checkout-submodule.sh
@@ -63,6 +63,30 @@ test_expect_success '"checkout " honors 
submodule.*.ignore from .git/
! test -s actual
 '
 
+test_expect_success 'setup superproject with historic submodule' '
+   test_create_repo super1 &&
+   test_create_repo sub1 &&
+   test_commit -C sub1 sub_content &&
+   git -C super1 submodule add ../sub1 &&
+   git -C super1 commit -a -m "sub1 added" &&
+   test_commit -C super1 historic_state &&
+   git -C super1 rm sub1 &&
+   git -C super1 commit -a -m "deleted sub" &&
+   test_commit -C super1 new_state &&
+   test_path_is_missing super1/sub &&
+
+   # The important part is to ensure sub1 is not in there any more.
+   # There is another series in flight, that may remove an
+   # empty .gitmodules file entirely.
+   test_must_be_empty super1/.gitmodules
+'
+
+test_expect_failure 'checkout old state with deleted submodule' '
+   test_when_finished "rm -rf super1 sub1 super1_clone" &&
+   git clone --recurse-submodules super1 super1_clone &&
+   git -C super1_clone checkout --recurse-submodules historic_state
+'
+
 KNOWN_FAILURE_DIRECTORY_SUBMODULE_CONFLICTS=1
 test_submodule_switch_recursing_with_args "checkout"
 
-- 
2.18.0



[PATCH 2/2] submodule.c: warn about missing submodule git directories

2018-08-27 Thread Stefan Beller
This is the continuation of f2d48994dc1 (submodule.c: submodule_move_head
works with broken submodules, 2017-04-18), which tones down the case of
"broken submodule" in case of a missing git directory of the submodule to
be only a warning.

Signed-off-by: Stefan Beller 
---
 submodule.c   | 16 
 t/t2013-checkout-submodule.sh |  2 +-
 2 files changed, 17 insertions(+), 1 deletion(-)

diff --git a/submodule.c b/submodule.c
index 50cbf5f13ed..689439a3d0c 100644
--- a/submodule.c
+++ b/submodule.c
@@ -1641,6 +1641,22 @@ int submodule_move_head(const char *path,
} else {
char *gitdir = xstrfmt("%s/modules/%s",
get_git_common_dir(), sub->name);
+
+   if (!is_git_directory(gitdir)) {
+   /*
+* It is safe to assume we could just clone
+* the submodule here, as we passed the
+* is_submodule_active test above (i.e. the
+* user is interested in this submodule.
+*
+* However as this code path is exercised
+* for operations that typically do not involve
+* network operations, let's not do that for 
now.
+*/
+   warning(_("Submodule '%s' missing"), path);
+   free(gitdir);
+   return 0;
+   }
connect_work_tree_and_git_dir(path, gitdir, 0);
free(gitdir);
 
diff --git a/t/t2013-checkout-submodule.sh b/t/t2013-checkout-submodule.sh
index c69640fc341..82ef4576b91 100755
--- a/t/t2013-checkout-submodule.sh
+++ b/t/t2013-checkout-submodule.sh
@@ -81,7 +81,7 @@ test_expect_success 'setup superproject with historic 
submodule' '
test_must_be_empty super1/.gitmodules
 '
 
-test_expect_failure 'checkout old state with deleted submodule' '
+test_expect_success 'checkout old state with deleted submodule' '
test_when_finished "rm -rf super1 sub1 super1_clone" &&
git clone --recurse-submodules super1 super1_clone &&
git -C super1_clone checkout --recurse-submodules historic_state
-- 
2.18.0



Re: [PATCH 0/6] Use generation numbers for --topo-order

2018-08-27 Thread Junio C Hamano
"Derrick Stolee via GitGitGadget"  writes:

> This patch series performs a decently-sized refactoring of the revision-walk
> machinery. Well, "refactoring" is probably the wrong word, as I don't
> actually remove the old code. Instead, when we see certain options in the
> 'rev_info' struct, we redirect the commit-walk logic to a new set of methods
> that distribute the workload differently. By using generation numbers in the
> commit-graph, we can significantly improve 'git log --graph' commands (and
> the underlying 'git rev-list --topo-order').

Finally ;-).


Re: Would a config var for --force-with-lease be useful?

2018-08-27 Thread Johannes Schindelin
Hi,

On Sat, 25 Aug 2018, Constantin Weißer wrote:

> I think there are two aspects to using "force with lease".

There is a third, very, very important aspect.

When you use --force-with-lease (and I, for one, do, all the time), keep
in mind that it assumes that you are at least aware of the latest fetched
ref.

Let's add a little color to this, to make it less theoretical, and more
relateable. Let's assume that you work on a branch, say,
`make-this-thing-work`, and you have to work on this branch on two
different machines, because it is a finicky thing you need to make work.
Let's add even more color by saying that one of your machines is a windows
machine, and the other one an old IRIX machine. We will call them
`windoze` and `irixxx`.

The common use case for `--force-with-lease`, at least as far as I know,
is that this branch is pushed from one of these setups first, say,
`windoze`, and fetched on the other, then a fixup is necessary and you
rebase the branch with `--autosquash` to make it compile on IRIX, and then
you push with `--force-with-lease` a day later, just in case that you
forgot to re-fetch something that you did on the `windoze` machine.

(Something like this happened to me recently, where one of my branches did
not compile in a Linux VM with an older cURL, and I had to fix a
Windows-targeting branch to keep it cross-platform)

The `--force-with-lease` option helps here, and quite a bit.

It would totally lose its usefulness, though, if you use a tool that
auto-fetches those remote branches. For example, "synchronizing" in
.

In that case, you did not see what was fetched, and you might have missed
updates, and you will overwrite them, even if you tried to be careful by
using `--for-ce-with-lease`.

I proposed, a couple of months ago, to either fix `--force-with-lease`, or
to deprecate it in favor of a new option, with a new behavior. The new
behavior would look at the *reflog*, much as the `--fork-point` option of
`git rebase`: in addition to the regular `--force-with-lease` server-side
checks, a client-side check *first* verifies that the remote-tracking
branch is reachable at least from *one* of the items in the reflog of the
branch we are about to push.

That is, it would ensure that even if we rebased locally, we did
incorporate the tip commit of the remote-tracking branch, at some stage.

Granted, there are probably cases where you would fetch, look at the
remote-tracking branch, and reject those changes without integrating those
into the local branch. In that case, you would want to relax to the
current behavior of `--force-with-lease`. But I would expect that to
happen only rarely.

The safety by the proposed behavior would make it a lot easier to accept a
config setting that makes this the default.

I guess that is the reason why that config setting does not exist yet: we
would want to have that new behavior in place first...

Ciao,
Johannes

> 
> Firstly, you, a person aware of the option, using it. In this case I
> think an alias is very fitting, because you get quickly used to just
> typing `git pf` or so. Plus, you don't have the disadvantage you
> described: if you’re working on a machine without your alias, you’ll
> just notice immediately and type the full option.
> 
> The other aspect is working in a team. The problem there is, that most
> (at least in my surroundings) use plain --force and you have to make
> them aware of --force-with-lease. But with an option or an alias, you
> depend on them using force with lease instead of plain force, so again I
> don't really see the advantage of such an option.
> 
> And lastly, a question: say you are using your proposed option and it is
> turned on. Now, git refuses to push, you clarify the situation and
> actually mean to push --force now. How would you do this? 1) turn off 2)
> push 3) turn option on again?
> 
> Regards,
> Constantin
> 
> Quoting Scott Johnson (2018-08-24 18:39:27)
> > Hello Everyone:
> > 
> > I'm considering writing a patch that adds a configuration variable
> > that will allow the user to default the command:
> > 
> > git push --force
> > 
> > to:
> > 
> > git push --force-with-lease
> > 
> > As discussed here:
> > 
> > https://stackoverflow.com/questions/30542491/push-force-with-lease-by-default
> > 
> > Now, I understand that there are downsides to having this enabled,
> > namely that a user who has this enabled might forget that they have it
> > enabled, and, as such, on a machine that _doesn't_ have it enabled (of
> > which they are unfamiliar) might then run the more consequential
> > command "git push --force", but my thinking is that adding this as a
> > feature to the git codebase as an _optional_ (i.e. not enabled by
> > default) configuration variable would then save some of us who use a
> > "rebase-then-force-push for pull request" workflow some time and
> > headaches.
> > 
> > Of course, I don't want to submit a patch if this is a feature that
> > isn't likely to be accepted, so I wanted to get 

Re: Would a config var for --force-with-lease be useful?

2018-08-27 Thread Ævar Arnfjörð Bjarmason


On Mon, Aug 27 2018, Junio C Hamano wrote:

[Scott, I hope you're still with us despite your recent attempt to
unsubscribe from git@ :)]

> Ævar Arnfjörð Bjarmason  writes:
>
>> This was after/during a long discussion starting with:
>> https://public-inbox.org/git/cacbzzx7mex-6rhgh2fa9+yl03mjxs8xmye86hnvxbxjmyiz...@mail.gmail.com/
>>
>> It appears the only patch that got in from that discussion was my
>> f17d642d3b ("push: document & test --force-with-lease with multiple
>> remotes", 2017-04-19) (https://github.com/git/git/commit/f17d642d3b)
>
> Thanks for pointing at the old thread.
>
> As far as our documentation is concerned, the invitation to improve
> the situation, offered in "git push --help", is still valid:
>
> Note that all forms other than `--force-with-lease=:`
> that specifies the expected current value of the ref explicitly are
> still experimental and their semantics may change as we gain experience
> with this feature.
>
> But I do not think (and I did not think back then) there is a magic
> bullet to make the lazy force-with-lease automatically safe for
> everybody, so it may be time to declare that the lazy force-with-lease
> was a failed experiment and move on, with a patch like the one
> suggested last year in the message:
>
>
> https://public-inbox.org/git/xmqq37a9fl8a.fsf...@gitster.mtv.corp.google.com/

With the benefit of hindsight I still agree with my counter-argument to
that in https://public-inbox.org/git/8760f4bmig@gmail.com/

I.e. making plain --force-with-lease harder to use by hiding it behind a
config option gives the user fewer options than with --force to recover.

So I think we should still recommend the longer and even safer variants
of --force-with-lease, but being guaranteed to have the SHA-1 you just
clobbered locally is *better*, and allows us to e.g. do this:

$ git push --force-with-lease
hint: You just clobbered  on . If you regret
hint: this you can (until the object gets pruned) do:
hint: git push  --force-with-lease=:

Or, doing the same with --force with some config option to use the
marginally safer (because at least you have a local copy)
--force-with-lease automatically.


[PATCH 1/6] prio-queue: add 'peek' operation

2018-08-27 Thread Derrick Stolee via GitGitGadget
From: Derrick Stolee 

When consuming a priority queue, it can be convenient to inspect
the next object that will be dequeued without actually dequeueing
it. Our existing library did not have such a 'peek' operation, so
add it as prio_queue_peek().

Add a reference-level comparison in t/helper/test-prio-queue.c
so this method is exercised by t0009-prio-queue.sh.

Signed-off-by: Derrick Stolee 
---
 prio-queue.c   |  9 +
 prio-queue.h   |  6 ++
 t/helper/test-prio-queue.c | 10 +++---
 3 files changed, 22 insertions(+), 3 deletions(-)

diff --git a/prio-queue.c b/prio-queue.c
index a078451872..d3f488cb05 100644
--- a/prio-queue.c
+++ b/prio-queue.c
@@ -85,3 +85,12 @@ void *prio_queue_get(struct prio_queue *queue)
}
return result;
 }
+
+void *prio_queue_peek(struct prio_queue *queue)
+{
+   if (!queue->nr)
+   return NULL;
+   if (!queue->compare)
+   return queue->array[queue->nr - 1].data;
+   return queue->array[0].data;
+}
diff --git a/prio-queue.h b/prio-queue.h
index d030ec9dd6..682e51867a 100644
--- a/prio-queue.h
+++ b/prio-queue.h
@@ -46,6 +46,12 @@ extern void prio_queue_put(struct prio_queue *, void *thing);
  */
 extern void *prio_queue_get(struct prio_queue *);
 
+/*
+ * Gain access to the "thing" that would be returned by
+ * prio_queue_get, but do not remove it from the queue.
+ */
+extern void *prio_queue_peek(struct prio_queue *);
+
 extern void clear_prio_queue(struct prio_queue *);
 
 /* Reverse the LIFO elements */
diff --git a/t/helper/test-prio-queue.c b/t/helper/test-prio-queue.c
index 9807b649b1..e817bbf464 100644
--- a/t/helper/test-prio-queue.c
+++ b/t/helper/test-prio-queue.c
@@ -22,9 +22,13 @@ int cmd__prio_queue(int argc, const char **argv)
struct prio_queue pq = { intcmp };
 
while (*++argv) {
-   if (!strcmp(*argv, "get"))
-   show(prio_queue_get());
-   else if (!strcmp(*argv, "dump")) {
+   if (!strcmp(*argv, "get")) {
+   void *peek = prio_queue_peek();
+   void *get = prio_queue_get();
+   if (peek != get)
+   BUG("peek and get results do not match");
+   show(get);
+   } else if (!strcmp(*argv, "dump")) {
int *v;
while ((v = prio_queue_get()))
   show(v);
-- 
gitgitgadget



[PATCH 2/6] test-reach: add run_three_modes method

2018-08-27 Thread Derrick Stolee via GitGitGadget
From: Derrick Stolee 

The 'test_three_modes' method assumes we are using the 'test-tool
reach' command for our test. However, we may want to use the data
shape of our commit graph and the three modes (no commit-graph,
full commit-graph, partial commit-graph) for other git commands.

Split test_three_modes to be a simple translation on a more general
run_three_modes method that executes the given command and tests
the actual output to the expected output.

While inspecting this code, I realized that the final test for
'commit_contains --tag' is silently dropping the '--tag' argument.
It should be quoted to include both.

Signed-off-by: Derrick Stolee 
---
 t/t6600-test-reach.sh | 14 +-
 1 file changed, 9 insertions(+), 5 deletions(-)

diff --git a/t/t6600-test-reach.sh b/t/t6600-test-reach.sh
index d139a00d1d..1b18e12a4e 100755
--- a/t/t6600-test-reach.sh
+++ b/t/t6600-test-reach.sh
@@ -53,18 +53,22 @@ test_expect_success 'setup' '
git config core.commitGraph true
 '
 
-test_three_modes () {
+run_three_modes () {
test_when_finished rm -rf .git/objects/info/commit-graph &&
-   test-tool reach $1 actual &&
+   $1 actual &&
test_cmp expect actual &&
cp commit-graph-full .git/objects/info/commit-graph &&
-   test-tool reach $1 actual &&
+   $1 actual &&
test_cmp expect actual &&
cp commit-graph-half .git/objects/info/commit-graph &&
-   test-tool reach $1 actual &&
+   $1 actual &&
test_cmp expect actual
 }
 
+test_three_modes () {
+   run_three_modes "test-tool reach $1"
+}
+
 test_expect_success 'ref_newer:miss' '
cat >input <<-\EOF &&
A:commit-5-7
@@ -219,7 +223,7 @@ test_expect_success 'commit_contains:hit' '
EOF
echo "commit_contains(_,A,X,_):1" >expect &&
test_three_modes commit_contains &&
-   test_three_modes commit_contains --tag
+   test_three_modes "commit_contains --tag"
 '
 
 test_expect_success 'commit_contains:miss' '
-- 
gitgitgadget



[PATCH 4/6] revision.c: begin refactoring --topo-order logic

2018-08-27 Thread Derrick Stolee via GitGitGadget
From: Derrick Stolee 

When running 'git rev-list --topo-order' and its kin, the topo_order
setting in struct rev_info implies the limited setting. This means
that the following things happen during prepare_revision_walk():

* revs->limited implies we run limit_list() to walk the entire
  reachable set. There are some short-cuts here, such as if we
  perform a range query like 'git rev-list COMPARE..HEAD' and we
  can stop limit_list() when all queued commits are uninteresting.

* revs->topo_order implies we run sort_in_topological_order(). See
  the implementation of that method in commit.c. It implies that
  the full set of commits to order is in the given commit_list.

These two methods imply that a 'git rev-list --topo-order HEAD'
command must walk the entire reachable set of commits _twice_ before
returning a single result.

If we have a commit-graph file with generation numbers computed, then
there is a better way. This patch introduces some necessary logic
redirection when we are in this situation.

In v2.18.0, the commit-graph file contains zero-valued bytes in the
positions where the generation number is stored in v2.19.0 and later.
Thus, we use generation_numbers_enabled() to check if the commit-graph
is available and has non-zero generation numbers.

When setting revs->limited only because revs->topo_order is true,
only do so if generation numbers are not available. There is no
reason to use the new logic as it will behave similarly when all
generation numbers are INFINITY or ZERO.

In prepare_revision_walk(), if we have revs->topo_order but not
revs->limited, then we trigger the new logic. It breaks the logic
into three pieces, to fit with the existing framework:

1. init_topo_walk() fills a new struct topo_walk_info in the rev_info
   struct. We use the presence of this struct as a signal to use the
   new methods during our walk. In this patch, this method simply
   calls limit_list() and sort_in_topological_order(). In the future,
   this method will set up a new data structure to perform that logic
   in-line.

2. next_topo_commit() provides get_revision_1() with the next topo-
   ordered commit in the list. Currently, this simply pops the commit
   from revs->commits.

3. expand_topo_walk() provides get_revision_1() with a way to signal
   walking beyond the latest commit. Currently, this calls
   add_parents_to_list() exactly like the old logic.

While this commit presents method redirection for performing the
exact same logic as before, it allows the next commit to focus only
on the new logic.

Signed-off-by: Derrick Stolee 
---
 revision.c | 42 ++
 revision.h |  4 
 2 files changed, 42 insertions(+), 4 deletions(-)

diff --git a/revision.c b/revision.c
index 3205a3947a..1db70dc951 100644
--- a/revision.c
+++ b/revision.c
@@ -25,6 +25,7 @@
 #include "worktree.h"
 #include "argv-array.h"
 #include "commit-reach.h"
+#include "commit-graph.h"
 
 volatile show_early_output_fn_t show_early_output;
 
@@ -2451,7 +2452,7 @@ int setup_revisions(int argc, const char **argv, struct 
rev_info *revs, struct s
if (revs->diffopt.objfind)
revs->simplify_history = 0;
 
-   if (revs->topo_order)
+   if (revs->topo_order && !generation_numbers_enabled(the_repository))
revs->limited = 1;
 
if (revs->prune_data.nr) {
@@ -2889,6 +2890,33 @@ static int mark_uninteresting(const struct object_id 
*oid,
return 0;
 }
 
+struct topo_walk_info {};
+
+static void init_topo_walk(struct rev_info *revs)
+{
+   struct topo_walk_info *info;
+   revs->topo_walk_info = xmalloc(sizeof(struct topo_walk_info));
+   info = revs->topo_walk_info;
+   memset(info, 0, sizeof(struct topo_walk_info));
+
+   limit_list(revs);
+   sort_in_topological_order(>commits, revs->sort_order);
+}
+
+static struct commit *next_topo_commit(struct rev_info *revs)
+{
+   return pop_commit(>commits);
+}
+
+static void expand_topo_walk(struct rev_info *revs, struct commit *commit)
+{
+   if (add_parents_to_list(revs, commit, >commits, NULL) < 0) {
+   if (!revs->ignore_missing_links)
+   die("Failed to traverse parents of commit %s",
+   oid_to_hex(>object.oid));
+   }
+}
+
 int prepare_revision_walk(struct rev_info *revs)
 {
int i;
@@ -2925,11 +2953,13 @@ int prepare_revision_walk(struct rev_info *revs)
commit_list_sort_by_date(>commits);
if (revs->no_walk)
return 0;
-   if (revs->limited)
+   if (revs->limited) {
if (limit_list(revs) < 0)
return -1;
-   if (revs->topo_order)
-   sort_in_topological_order(>commits, revs->sort_order);
+   if (revs->topo_order)
+   sort_in_topological_order(>commits, 
revs->sort_order);
+   } else if (revs->topo_order)
+   init_topo_walk(revs);
if 

[PATCH 3/6] test-reach: add rev-list tests

2018-08-27 Thread Derrick Stolee via GitGitGadget
From: Derrick Stolee 

The rev-list command is critical to Git's functionality. Ensure it
works in the three commit-graph environments constructed in
t6600-test-reach.sh. Here are a few important types of rev-list
operations:

* Basic: git rev-list --topo-order HEAD
* Range: git rev-list --topo-order compare..HEAD
* Ancestry: git rev-list --topo-order --ancestry-path compare..HEAD
* Symmetric Difference: git rev-list --topo-order compare...HEAD

Signed-off-by: Derrick Stolee 
---
 t/t6600-test-reach.sh | 84 +++
 1 file changed, 84 insertions(+)

diff --git a/t/t6600-test-reach.sh b/t/t6600-test-reach.sh
index 1b18e12a4e..2fcaa39077 100755
--- a/t/t6600-test-reach.sh
+++ b/t/t6600-test-reach.sh
@@ -243,4 +243,88 @@ test_expect_success 'commit_contains:miss' '
test_three_modes commit_contains --tag
 '
 
+test_expect_success 'rev-list: basic topo-order' '
+   git rev-parse \
+   commit-6-6 commit-5-6 commit-4-6 commit-3-6 commit-2-6 
commit-1-6 \
+   commit-6-5 commit-5-5 commit-4-5 commit-3-5 commit-2-5 
commit-1-5 \
+   commit-6-4 commit-5-4 commit-4-4 commit-3-4 commit-2-4 
commit-1-4 \
+   commit-6-3 commit-5-3 commit-4-3 commit-3-3 commit-2-3 
commit-1-3 \
+   commit-6-2 commit-5-2 commit-4-2 commit-3-2 commit-2-2 
commit-1-2 \
+   commit-6-1 commit-5-1 commit-4-1 commit-3-1 commit-2-1 
commit-1-1 \
+   >expect &&
+   run_three_modes "git rev-list --topo-order commit-6-6"
+'
+
+test_expect_success 'rev-list: first-parent topo-order' '
+   git rev-parse \
+   commit-6-6 \
+   commit-6-5 \
+   commit-6-4 \
+   commit-6-3 \
+   commit-6-2 \
+   commit-6-1 commit-5-1 commit-4-1 commit-3-1 commit-2-1 
commit-1-1 \
+   >expect &&
+   run_three_modes "git rev-list --first-parent --topo-order commit-6-6"
+'
+
+test_expect_success 'rev-list: range topo-order' '
+   git rev-parse \
+   commit-6-6 commit-5-6 commit-4-6 commit-3-6 commit-2-6 
commit-1-6 \
+   commit-6-5 commit-5-5 commit-4-5 commit-3-5 commit-2-5 
commit-1-5 \
+   commit-6-4 commit-5-4 commit-4-4 commit-3-4 commit-2-4 
commit-1-4 \
+   commit-6-3 commit-5-3 commit-4-3 \
+   commit-6-2 commit-5-2 commit-4-2 \
+   commit-6-1 commit-5-1 commit-4-1 \
+   >expect &&
+   run_three_modes "git rev-list --topo-order commit-3-3..commit-6-6"
+'
+
+test_expect_success 'rev-list: range topo-order' '
+   git rev-parse \
+   commit-6-6 commit-5-6 commit-4-6 \
+   commit-6-5 commit-5-5 commit-4-5 \
+   commit-6-4 commit-5-4 commit-4-4 \
+   commit-6-3 commit-5-3 commit-4-3 \
+   commit-6-2 commit-5-2 commit-4-2 \
+   commit-6-1 commit-5-1 commit-4-1 \
+   >expect &&
+   run_three_modes "git rev-list --topo-order commit-3-8..commit-6-6"
+'
+
+test_expect_success 'rev-list: first-parent range topo-order' '
+   git rev-parse \
+   commit-6-6 \
+   commit-6-5 \
+   commit-6-4 \
+   commit-6-3 \
+   commit-6-2 \
+   commit-6-1 commit-5-1 commit-4-1 \
+   >expect &&
+   run_three_modes "git rev-list --first-parent --topo-order 
commit-3-8..commit-6-6"
+'
+
+test_expect_success 'rev-list: ancestry-path topo-order' '
+   git rev-parse \
+   commit-6-6 commit-5-6 commit-4-6 commit-3-6 \
+   commit-6-5 commit-5-5 commit-4-5 commit-3-5 \
+   commit-6-4 commit-5-4 commit-4-4 commit-3-4 \
+   commit-6-3 commit-5-3 commit-4-3 \
+   >expect &&
+   run_three_modes "git rev-list --topo-order --ancestry-path 
commit-3-3..commit-6-6"
+'
+
+test_expect_success 'rev-list: symmetric difference topo-order' '
+   git rev-parse \
+   commit-6-6 commit-5-6 commit-4-6 \
+   commit-6-5 commit-5-5 commit-4-5 \
+   commit-6-4 commit-5-4 commit-4-4 \
+   commit-6-3 commit-5-3 commit-4-3 \
+   commit-6-2 commit-5-2 commit-4-2 \
+   commit-6-1 commit-5-1 commit-4-1 \
+   commit-3-8 commit-2-8 commit-1-8 \
+   commit-3-7 commit-2-7 commit-1-7 \
+   >expect &&
+   run_three_modes "git rev-list --topo-order commit-3-8...commit-6-6"
+'
+
 test_done
-- 
gitgitgadget



[PATCH 5/6] commit/revisions: bookkeeping before refactoring

2018-08-27 Thread Derrick Stolee via GitGitGadget
From: Derrick Stolee 

There are a few things that need to move around a little before
making a big refactoring in the topo-order logic:

1. We need access to record_author_date() and
   compare_commits_by_author_date() in revision.c. These are used
   currently by sort_in_topological_order() in commit.c.

2. Moving these methods to commit.h requires adding the author_slab
   definition to commit.h.

3. The add_parents_to_list() method in revision.c performs logic
   around the UNINTERESTING flag and other special cases depending
   on the struct rev_info. Allow this method to ignore a NULL 'list'
   parameter, as we will not be populating the list for our walk.

Signed-off-by: Derrick Stolee 
---
 commit.c   | 11 ---
 commit.h   |  8 
 revision.c |  6 --
 3 files changed, 16 insertions(+), 9 deletions(-)

diff --git a/commit.c b/commit.c
index 32d1234bd7..2dbe187b8c 100644
--- a/commit.c
+++ b/commit.c
@@ -655,11 +655,8 @@ struct commit *pop_commit(struct commit_list **stack)
 /* count number of children that have not been emitted */
 define_commit_slab(indegree_slab, int);
 
-/* record author-date for each commit object */
-define_commit_slab(author_date_slab, unsigned long);
-
-static void record_author_date(struct author_date_slab *author_date,
-  struct commit *commit)
+void record_author_date(struct author_date_slab *author_date,
+   struct commit *commit)
 {
const char *buffer = get_commit_buffer(commit, NULL);
struct ident_split ident;
@@ -684,8 +681,8 @@ fail_exit:
unuse_commit_buffer(commit, buffer);
 }
 
-static int compare_commits_by_author_date(const void *a_, const void *b_,
- void *cb_data)
+int compare_commits_by_author_date(const void *a_, const void *b_,
+  void *cb_data)
 {
const struct commit *a = a_, *b = b_;
struct author_date_slab *author_date = cb_data;
diff --git a/commit.h b/commit.h
index e2c99d9b04..51de10e698 100644
--- a/commit.h
+++ b/commit.h
@@ -8,6 +8,7 @@
 #include "gpg-interface.h"
 #include "string-list.h"
 #include "pretty.h"
+#include "commit-slab.h"
 
 #define COMMIT_NOT_FROM_GRAPH 0x
 #define GENERATION_NUMBER_INFINITY 0x
@@ -328,6 +329,13 @@ extern int remove_signature(struct strbuf *buf);
  */
 extern int check_commit_signature(const struct commit *commit, struct 
signature_check *sigc);
 
+/* record author-date for each commit object */
+define_commit_slab(author_date_slab, timestamp_t);
+
+void record_author_date(struct author_date_slab *author_date,
+   struct commit *commit);
+
+int compare_commits_by_author_date(const void *a_, const void *b_, void 
*unused);
 int compare_commits_by_commit_date(const void *a_, const void *b_, void 
*unused);
 int compare_commits_by_gen_then_commit_date(const void *a_, const void *b_, 
void *unused);
 
diff --git a/revision.c b/revision.c
index 1db70dc951..565f903e46 100644
--- a/revision.c
+++ b/revision.c
@@ -804,7 +804,8 @@ static int add_parents_to_list(struct rev_info *revs, 
struct commit *commit,
if (p->object.flags & SEEN)
continue;
p->object.flags |= SEEN;
-   commit_list_insert_by_date_cached(p, list, cached_base, 
cache_ptr);
+   if (list)
+   commit_list_insert_by_date_cached(p, list, 
cached_base, cache_ptr);
}
return 0;
}
@@ -843,7 +844,8 @@ static int add_parents_to_list(struct rev_info *revs, 
struct commit *commit,
p->object.flags |= left_flag;
if (!(p->object.flags & SEEN)) {
p->object.flags |= SEEN;
-   commit_list_insert_by_date_cached(p, list, cached_base, 
cache_ptr);
+   if (list)
+   commit_list_insert_by_date_cached(p, list, 
cached_base, cache_ptr);
}
if (revs->first_parent_only)
break;
-- 
gitgitgadget



[PATCH 0/6] Use generation numbers for --topo-order

2018-08-27 Thread Derrick Stolee via GitGitGadget
This patch series performs a decently-sized refactoring of the revision-walk
machinery. Well, "refactoring" is probably the wrong word, as I don't
actually remove the old code. Instead, when we see certain options in the
'rev_info' struct, we redirect the commit-walk logic to a new set of methods
that distribute the workload differently. By using generation numbers in the
commit-graph, we can significantly improve 'git log --graph' commands (and
the underlying 'git rev-list --topo-order').

On the Linux repository, I got the following performance results when
comparing to the previous version with or without a commit-graph:

Test: git rev-list --topo-order -100 HEAD
HEAD~1, no commit-graph: 6.80 s
HEAD~1, w/ commit-graph: 0.77 s
  HEAD, w/ commit-graph: 0.02 s

Test: git rev-list --topo-order -100 HEAD -- tools
HEAD~1, no commit-graph: 9.63 s
HEAD~1, w/ commit-graph: 6.06 s
  HEAD, w/ commit-graph: 0.06 s

If you want to read this series but are unfamiliar with the commit-graph and
generation numbers, then I recommend reading 
Documentation/technical/commit-graph.txt or a blob post [1] I wrote on the
subject. In particular, the three-part walk described in "revision.c:
refactor basic topo-order logic" is present (but underexplained) as an
animated PNG [2].

Since revision.c is an incredibly important (and old) portion of the
codebase -- and because there are so many orthogonal options in 'struct
rev_info' -- I consider this submission to be "RFC quality". That is, I am
not confident that I am not missing anything, or that my solution is the
best it can be. I did merge this branch with ds/commit-graph-with-grafts and
the "DO-NOT-MERGE: write and read commit-graph always" commit that computes
a commit-graph with every 'git commit' command. The test suite passed with
that change, available on GitHub [3]. To ensure that I cover at least the
case I think are interesting, I added tests to t6600-test-reach.sh to verify
the walks report the correct results for the three cases there (no
commit-graph, full commit-graph, and a partial commit-graph so the walk
starts at GENERATION_NUMBER_INFINITY).

One notable case that is not included in this series is the case of a
history comparison such as 'git rev-list --topo-order A..B'. The existing
code in limit_list() has ways to cut the walk short when all pending commits
are UNINTERESTING. Since this code depends on commit_list instead of the
prio_queue we are using here, I chose to leave it untouched for now. We can
revisit it in a separate series later. Since handle_commit() turns on
revs->limited when a commit is UNINTERESTING, we do not hit the new code in
this case. Removing this 'revs->limited = 1;' line yields correct results,
but the performance is worse.

This series is based on ds/reachable.

Thanks, -Stolee

[1] 
https://blogs.msdn.microsoft.com/devops/2018/07/09/supercharging-the-git-commit-graph-iii-generations/
Supercharging the Git Commit Graph III: Generations and Graph Algorithms

[2] 
https://msdnshared.blob.core.windows.net/media/2018/06/commit-graph-topo-order-b-a.png
Animation showing three-part walk

[3] https://github.com/derrickstolee/git/tree/topo-order/testA branch
containing this series along with commits to compute commit-graph in entire
test suite.

Derrick Stolee (6):
  prio-queue: add 'peek' operation
  test-reach: add run_three_modes method
  test-reach: add rev-list tests
  revision.c: begin refactoring --topo-order logic
  commit/revisions: bookkeeping before refactoring
  revision.c: refactor basic topo-order logic

 commit.c   |  11 +-
 commit.h   |   8 ++
 object.h   |   4 +-
 prio-queue.c   |   9 ++
 prio-queue.h   |   6 +
 revision.c | 232 -
 revision.h |   6 +
 t/helper/test-prio-queue.c |  10 +-
 t/t6600-test-reach.sh  |  98 +++-
 9 files changed, 361 insertions(+), 23 deletions(-)


base-commit: 6cc017431c1c48f80d1c6512fdcc9866cf4b7f55
Published-As: 
https://github.com/gitgitgadget/git/releases/tags/pr-25%2Fderrickstolee%2Ftopo-order%2Fprogress-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git 
pr-25/derrickstolee/topo-order/progress-v1
Pull-Request: https://github.com/gitgitgadget/git/pull/25
-- 
gitgitgadget


[PATCH 6/6] revision.c: refactor basic topo-order logic

2018-08-27 Thread Derrick Stolee via GitGitGadget
From: Derrick Stolee 

When running a command like 'git rev-list --topo-order HEAD',
Git performed the following steps:

1. Run limit_list(), which parses all reachable commits,
   adds them to a linked list, and distributes UNINTERESTING
   flags. If all unprocessed commits are UNINTERESTING, then
   it may terminate without walking all reachable commits.
   This does not occur if we do not specify UNINTERESTING
   commits.

2. Run sort_in_topological_order(), which is an implementation
   of Kahn's algorithm. It first iterates through the entire
   set of important commits and computes the in-degree of each
   (plus one, as we use 'zero' as a special value here). Then,
   we walk the commits in priority order, adding them to the
   priority queue if and only if their in-degree is one. As
   we remove commits from this priority queue, we decrement the
   in-degree of their parents.

3. While we are peeling commits for output, get_revision_1()
   uses pop_commit on the full list of commits computed by
   sort_in_topological_order().

In the new algorithm, these three steps correspond to three
different commit walks. We run these walks simultaneously,
and advance each only as far as necessary to satisfy the
requirements of the 'higher order' walk. We know when we can
pause each walk by using generation numbers from the commit-
graph feature.

Recall that the generation number of a commit satisfies:

* If the commit has at least one parent, then the generation
  number is one more than the maximum generation number among
  its parents.

* If the commit has no parent, then the generation number is one.

There are two special generation numbers:

* GENERATION_NUMBER_INFINITY: this value is 0x and
  indicates that the commit is not stored in the commit-graph and
  the generation number was not previously calculated.

* GENERATION_NUMBER_ZERO: this value (0) is a special indicator
  to say that the commit-graph was generated by a version of Git
  that does not compute generation numbers (such as v2.18.0).

Since we use generation_numbers_enabled() before using the new
algorithm, we do not need to worry about GENERATION_NUMBER_ZERO.
However, the existence of GENERATION_NUMBER_INFINITY implies the
following weaker statement than the usual we expect from
generation numbers:

If A and B are commits with generation numbers gen(A) and
gen(B) and gen(A) < gen(B), then A cannot reach B.

Thus, we will walk in each of our stages until the "maximum
unexpanded generation number" is strictly lower than the
generation number of a commit we are about to use.

The walks are as follows:

1. EXPLORE: using the explore_queue priority queue (ordered by
   maximizing the generation number), parse each reachable
   commit until all commits in the queue have generation
   number strictly lower than needed. During this walk, update
   the UNINTERESTING flags as necessary.

2. INDEGREE: using the indegree_queue priority queue (ordered
   by maximizing the generation number), add one to the in-
   degree of each parent for each commit that is walked. Since
   we walk in order of decreasing generation number, we know
   that discovering an in-degree value of 0 means the value for
   that commit was not initialized, so should be initialized to
   two. (Recall that in-degree value "1" is what we use to say a
   commit is ready for output.) As we iterate the parents of a
   commit during this walk, ensure the EXPLORE walk has walked
   beyond their generation numbers.

3. TOPO: using the topo_queue priority queue (ordered based on
   the sort_order given, which could be commit-date, author-
   date, or typical topo-order which treats the queue as a LIFO
   stack), remove a commit from the queue and decrement the
   in-degree of each parent. If a parent has an in-degree of
   one, then we add it to the topo_queue. Before we decrement
   the in-degree, however, ensure the INDEGREE walk has walked
   beyond that generation number.

The implementations of these walks are in the following methods:

* explore_walk_step and explore_to_depth
* indegree_walk_step and compute_indegrees_to_depth
* next_topo_commit and expand_topo_walk

These methods have some patterns that may seem strange at first,
but they are probably carry-overs from their equivalents in
limit_list and sort_in_topological_order.

One thing that is missing from this implementation is a proper
way to stop walking when the entire queue is UNINTERESTING, so
this implementation is not enabled by comparisions, such as in
'git rev-list --topo-order A..B'. This can be updated in the
future.

In my local testing, I used the following Git commands on the
Linux repository in three modes: HEAD~1 with no commit-graph,
HEAD~1 with a commit-graph, and HEAD with a commit-graph. This
allows comparing the benefits we get from parsing commits from
the commit-graph and then again the benefits we get by
restricting the set of commits we walk.

Test: git rev-list --topo-order -100 HEAD

unsub

2018-08-27 Thread Scott Johnson
unsubscribe git


Re: [PATCH v3 6/7] fsck: use oidset for skiplist

2018-08-27 Thread Ævar Arnfjörð Bjarmason


On Mon, Aug 27 2018, Ævar Arnfjörð Bjarmason wrote:

> From: René Scharfe 
>
> Object IDs to skip are stored in a shared static oid_array.  Lookups do
> a binary search on the sorted array.  The code checks if the object IDs
> are already in the correct order while loading and skips sorting in that
> case.  Lookups are done before reporting a (non-fatal) corruption and
> before checking .gitmodules files.
>
> Simplify the code by using an oidset instead.  Memory usage is a bit
> higher, but we don't need to worry about any sort order anymore.  Embed
> the oidset into struct fsck_options to make its ownership clear (no
> hidden sharing) and avoid unnecessary pointer indirection.
>
> Performance on repositories with a low number of reported issues and
> .gitmodules files (i.e. the usual case) won't be affected much.  The
> oidset should be a bit quicker with higher numbers of bad objects in
> the skipList.

I didn't change this commit message at all, but FWIW this still has me
somewhat confused. What is the interaction with .gitmodules being
described here? You also mentioned it in
https://public-inbox.org/git/113aa2d7-6f66-8d03-dda4-7337cda9b...@web.de/
(but I don't get that either)

Does that just mean that when cloning with --recursive with
transfer.fsckObjects=true we'll re-read the file for each "clone"
invocation, both for the main project and everything listed in
.gitmodules?

If so, I think something like this commit message would be clearer:

fsck: use oidset instead of oid_array for skipList

Change the implementation of the skipList feature to use oidset
instead of oid_array to store SHA-1s for later lookup.

This list is parsed once on startup by fsck, fetch-pack or
receive-pack depending on the *.skipList config in use, so for fetch
it's currently re-parsed for each submodule fetch.

Memory usage is a bit higher, but we don't need to keep track of the
sort order anymore. Embed the oidset into struct fsck_options to
make its ownership clear (no hidden sharing) and avoid unnecessary
pointer indirection.

Then let's just attach the test program you wrote in
https://public-inbox.org/git/113aa2d7-6f66-8d03-dda4-7337cda9b...@web.de/
and note the results and let them speak for themselves.

I can do all that if that seems fine to you. I also notice that the test
case only sets up a case where all the items on the skip list are bad
commits in the repo, it would be interesting to test with a few needles
in a large haystack (I can modify it to do that...).

> Helped-by: Ævar Arnfjörð Bjarmason 
> Signed-off-by: Rene Scharfe 
> Signed-off-by: Ævar Arnfjörð Bjarmason 
> ---
>  Documentation/config.txt | 11 ++-
>  fsck.c   | 23 ++-
>  fsck.h   |  8 +---
>  3 files changed, 13 insertions(+), 29 deletions(-)
>
> diff --git a/Documentation/config.txt b/Documentation/config.txt
> index a8dfafa61d..3d0556e85d 100644
> --- a/Documentation/config.txt
> +++ b/Documentation/config.txt
> @@ -1729,11 +1729,12 @@ all three of them they must all set to the same 
> values.
>  +
>  Older versions of Git (before 2.20) documented that the object names
>  list should be sorted. This was never a requirement, the object names
> -can appear in any order, but when reading the list we track whether
> -the list is sorted for the purposes of an internal binary search
> -implementation, which can save itself some work with an already sorted
> -list.  Unless you have a humongous list there's no reason to go out of
> -your way to pre-sort the list.
> +could appear in any order, but when reading the list we tracked whether
> +the list was sorted for the purposes of an internal binary search
> +implementation, which could save itself some work with an already sorted
> +list. Unless you had a humongous list there was no reason to go out of
> +your way to pre-sort the list. After Git version 2.20 a hash implementation
> +is used instead, so there's now no reason to pre-sort the list.
>
>  gc.aggressiveDepth::
>   The depth parameter used in the delta compression
> diff --git a/fsck.c b/fsck.c
> index 972a26b9ba..4c643f1d40 100644
> --- a/fsck.c
> +++ b/fsck.c
> @@ -10,7 +10,6 @@
>  #include "fsck.h"
>  #include "refs.h"
>  #include "utf8.h"
> -#include "sha1-array.h"
>  #include "decorate.h"
>  #include "oidset.h"
>  #include "packfile.h"
> @@ -182,19 +181,10 @@ static int fsck_msg_type(enum fsck_msg_id msg_id,
>
>  static void init_skiplist(struct fsck_options *options, const char *path)
>  {
> - static struct oid_array skiplist = OID_ARRAY_INIT;
> - int sorted;
>   FILE *fp;
>   struct strbuf sb = STRBUF_INIT;
>   struct object_id oid;
>
> - if (options->skiplist)
> - sorted = options->skiplist->sorted;
> - else {
> - sorted = 1;
> - options->skiplist = 
> - }
> -
>   fp = fopen(path, "r");
>   if (!fp)
>   die("Could not open skip list: %s", 

Re: Would a config var for --force-with-lease be useful?

2018-08-27 Thread Junio C Hamano
Ævar Arnfjörð Bjarmason  writes:

> This was after/during a long discussion starting with:
> https://public-inbox.org/git/cacbzzx7mex-6rhgh2fa9+yl03mjxs8xmye86hnvxbxjmyiz...@mail.gmail.com/
>
> It appears the only patch that got in from that discussion was my
> f17d642d3b ("push: document & test --force-with-lease with multiple
> remotes", 2017-04-19) (https://github.com/git/git/commit/f17d642d3b)

Thanks for pointing at the old thread.

As far as our documentation is concerned, the invitation to improve
the situation, offered in "git push --help", is still valid:

Note that all forms other than `--force-with-lease=:`
that specifies the expected current value of the ref explicitly are
still experimental and their semantics may change as we gain experience
with this feature.

But I do not think (and I did not think back then) there is a magic
bullet to make the lazy force-with-lease automatically safe for
everybody, so it may be time to declare that the lazy force-with-lease
was a failed experiment and move on, with a patch like the one
suggested last year in the message:

  https://public-inbox.org/git/xmqq37a9fl8a.fsf...@gitster.mtv.corp.google.com/



Re: [PATCH v3 0/7] use oidset for skiplist + docs + tests + comment support

2018-08-27 Thread Ævar Arnfjörð Bjarmason


On Mon, Aug 27 2018, Ævar Arnfjörð Bjarmason wrote:

> On Mon, Aug 27 2018, René Scharfe wrote:
>
>> Am 27.08.2018 um 09:37 schrieb Ævar Arnfjörð Bjarmason:
>>>
>>> On Sat, Aug 25 2018, René Scharfe wrote:
>>> [...]
>>> Now, I like yours much better. I'm just saying that currently the
>>> patch/commit message combo is confusing about *what* it's
>>> doing. I.e. let's not mix up the correction of docs that were always
>>> wrong with a non-change in the user facing implementation, and some
>>> internal optimization all in one patch.
>>
>> Now you have me confused.  Unsorted lists were always accepted, but the
>> documentation asked for a sorted one anyway, probably to avoid sorting
>> it with every use.  Switching the underlying data structure makes that a
>> moot point -- sortedness is no longer a concern at all -- not in the
>> code, and not for users.
>>
>> Inviting users to run the array implementation with unsorted lists is
>> not incorrect, but it also doesn't help anyone.  We could clarify that
>> sorted lists are preferred or recommended instead of required.  I don't
>> see value in perfecting the documentation of a quirk just before
>> removing it, though.
>
> I think it's easier to clarify step-by-step with code, so here's an my
> version of a v3 which implements what I was suggesting, but then of
> course while doing it I found other stuff to fix, changes since your
> v2:

Sorry for breaking threading, forgot In-Reply-To, which should be
https://public-inbox.org/git/c7cbc289-d86e-09dc-bdb3-b5496cf0b...@web.de/

> René Scharfe (2):
>   fsck: use strbuf_getline() to read skiplist file
>
> None to the code.
>
> I changed some docs I add in earlier patches to now describe behavior
> in a past tense (and the s/sorted // change is earlier), and to change
> the docs to say that sorting the list on-disk is now pointless for
> optimization purposes, but did something in the past.
>
>   fsck: use oidset for skiplist
>
> There is now a test for the bug you were fixing in your 1/2.
>
> Ævar Arnfjörð Bjarmason (5):
>   fsck tests: setup of bogus commit object
>
> Fixing some test redundancies while I'm at it.
>
>   fsck tests: add a test for no skipList input
>
> We didn't test the most basic skipList invocation, fixed while I was
> at it...
>
>   fsck: document and test sorted skipList input
>
> Test that sorted & unsorted skipList input, and document that in the
> past we said this was required, but it never was, but what (ever so
> slight) optimization this gives us.
>
>   fsck: document and test commented & empty line skipList input
>
> We don't support comments or empty lines. Add tests for this not
> working, and explicitly document that it doesn't work (I for one tried
> it).
>
>   fsck: support comments & empty lines in skipList
>
> Ignoring comments and empty lines is very useful for a file format
> that might be human-edited (I maintain one such file). Support that,
> and document & test for it.
>
>  Documentation/config.txt| 22 ++
>  fsck.c  | 48 ++---
>  fsck.h  |  8 +++--
>  t/t5504-fetch-receive-strict.sh | 53 ++---
>  4 files changed, 86 insertions(+), 45 deletions(-)


[PATCH v3 0/7] use oidset for skiplist + docs + tests + comment support

2018-08-27 Thread Ævar Arnfjörð Bjarmason
On Mon, Aug 27 2018, René Scharfe wrote:

> Am 27.08.2018 um 09:37 schrieb Ævar Arnfjörð Bjarmason:
>> 
>> On Sat, Aug 25 2018, René Scharfe wrote:
>> [...]
>> Now, I like yours much better. I'm just saying that currently the
>> patch/commit message combo is confusing about *what* it's
>> doing. I.e. let's not mix up the correction of docs that were always
>> wrong with a non-change in the user facing implementation, and some
>> internal optimization all in one patch.
>
> Now you have me confused.  Unsorted lists were always accepted, but the
> documentation asked for a sorted one anyway, probably to avoid sorting
> it with every use.  Switching the underlying data structure makes that a
> moot point -- sortedness is no longer a concern at all -- not in the
> code, and not for users.
>
> Inviting users to run the array implementation with unsorted lists is
> not incorrect, but it also doesn't help anyone.  We could clarify that
> sorted lists are preferred or recommended instead of required.  I don't
> see value in perfecting the documentation of a quirk just before
> removing it, though.

I think it's easier to clarify step-by-step with code, so here's an my
version of a v3 which implements what I was suggesting, but then of
course while doing it I found other stuff to fix, changes since your
v2:

René Scharfe (2):
  fsck: use strbuf_getline() to read skiplist file

None to the code.

I changed some docs I add in earlier patches to now describe behavior
in a past tense (and the s/sorted // change is earlier), and to change
the docs to say that sorting the list on-disk is now pointless for
optimization purposes, but did something in the past.

  fsck: use oidset for skiplist

There is now a test for the bug you were fixing in your 1/2.

Ævar Arnfjörð Bjarmason (5):
  fsck tests: setup of bogus commit object

Fixing some test redundancies while I'm at it.

  fsck tests: add a test for no skipList input

We didn't test the most basic skipList invocation, fixed while I was
at it...

  fsck: document and test sorted skipList input

Test that sorted & unsorted skipList input, and document that in the
past we said this was required, but it never was, but what (ever so
slight) optimization this gives us.

  fsck: document and test commented & empty line skipList input

We don't support comments or empty lines. Add tests for this not
working, and explicitly document that it doesn't work (I for one tried
it).

  fsck: support comments & empty lines in skipList

Ignoring comments and empty lines is very useful for a file format
that might be human-edited (I maintain one such file). Support that,
and document & test for it.

 Documentation/config.txt| 22 ++
 fsck.c  | 48 ++---
 fsck.h  |  8 +++--
 t/t5504-fetch-receive-strict.sh | 53 ++---
 4 files changed, 86 insertions(+), 45 deletions(-)

-- 
2.19.0.rc0.228.g281dcd1b4d0



[PATCH v3 1/7] fsck tests: setup of bogus commit object

2018-08-27 Thread Ævar Arnfjörð Bjarmason
Several fsck tests used the exact same git-hash-object output, but had
copy/pasted that part of the setup code. Let's instead do that setup
once and use it in subsequent tests.

Signed-off-by: Ævar Arnfjörð Bjarmason 
---
 t/t5504-fetch-receive-strict.sh | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/t/t5504-fetch-receive-strict.sh b/t/t5504-fetch-receive-strict.sh
index 62f3569891..6d268f3327 100755
--- a/t/t5504-fetch-receive-strict.sh
+++ b/t/t5504-fetch-receive-strict.sh
@@ -133,6 +133,10 @@ committer Bugs Bunny  1234567890 +
 This commit object intentionally broken
 EOF
 
+test_expect_success 'setup bogus commit' '
+   commit="$(git hash-object -t commit -w --stdin err &&
@@ -142,7 +146,6 @@ test_expect_success 'fsck with invalid or bogus skipList 
input' '
 '
 
 test_expect_success 'push with receive.fsck.skipList' '
-   commit="$(git hash-object -t commit -w --stdin  dies' '
 '
 
 test_expect_success 'push with receive.fsck.missingEmail=warn' '
-   commit="$(git hash-object -t commit -w --stdin 

[PATCH v3 3/7] fsck: document and test sorted skipList input

2018-08-27 Thread Ævar Arnfjörð Bjarmason
Ever since the skipList support was first added in cd94c6f91 ("fsck:
git receive-pack: support excluding objects from fsck'ing",
2015-06-22) the documentation for the format has that the file is a
sorted list of object names.

Thus, anyone using the feature would have thought the list needed to
be sorted. E.g. I recently in conjunction with my fetch.fsck.*
implementation in 1362df0d41 ("fetch: implement fetch.fsck.*",
2018-07-27) wrote some code to ship a skipList, and went out of my way
to sort it.

Doing so seems intuitive, since it contains fixed-width records, and
has no support for comments, so one might expect it to be binary
searched in-place on-disk.

However, as documented here this was never a requirement, so let's
change the documentation. Since this is a file format change let's
also document what was said about this in the past, so e.g. someone
like myself reading the new docs can see this never needed to be
sorted ("why do I have all this code to sort this thing...").

Signed-off-by: Ævar Arnfjörð Bjarmason 
---
 Documentation/config.txt| 10 +-
 t/t5504-fetch-receive-strict.sh | 19 +++
 2 files changed, 28 insertions(+), 1 deletion(-)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index 1c42364988..9359fb534e 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -1708,7 +1708,7 @@ doing the same for `receive.fsck.` and 
`fetch.fsck.`
 will only cause git to warn.
 
 fsck.skipList::
-   The path to a sorted list of object names (i.e. one SHA-1 per
+   The path to a list of object names (i.e. one SHA-1 per
line) that are known to be broken in a non-fatal way and should
be ignored. This feature is useful when an established project
should be accepted despite early commits containing errors that
@@ -1723,6 +1723,14 @@ Unlike variables like `color.ui` and `core.editor` the
 fall back on the `fsck.skipList` configuration if they aren't set. To
 uniformly configure the same fsck settings in different circumstances
 all three of them they must all set to the same values.
++
+Older versions of Git (before 2.20) documented that the object names
+list should be sorted. This was never a requirement, the object names
+can appear in any order, but when reading the list we track whether
+the list is sorted for the purposes of an internal binary search
+implementation, which can save itself some work with an already sorted
+list.  Unless you have a humongous list there's no reason to go out of
+your way to pre-sort the list.
 
 gc.aggressiveDepth::
The depth parameter used in the delta compression
diff --git a/t/t5504-fetch-receive-strict.sh b/t/t5504-fetch-receive-strict.sh
index cbae31f330..fa56052f0f 100755
--- a/t/t5504-fetch-receive-strict.sh
+++ b/t/t5504-fetch-receive-strict.sh
@@ -142,6 +142,25 @@ test_expect_success 'fsck with no skipList input' '
test_i18ngrep "missingEmail" err
 '
 
+test_expect_success 'setup sorted and unsorted skipLists' '
+   cat >SKIP.unsorted <<-EOF &&
+   0004
+   0002
+   $commit
+   0001
+   0003
+   EOF
+   sort SKIP.unsorted >SKIP.sorted
+'
+
+test_expect_success 'fsck with sorted skipList' '
+   git -c fsck.skipList=SKIP.sorted fsck
+'
+
+test_expect_success 'fsck with unsorted skipList' '
+   git -c fsck.skipList=SKIP.unsorted fsck
+'
+
 test_expect_success 'fsck with invalid or bogus skipList input' '
git -c fsck.skipList=/dev/null -c fsck.missingEmail=ignore fsck &&
test_must_fail git -c fsck.skipList=does-not-exist -c 
fsck.missingEmail=ignore fsck 2>err &&
-- 
2.19.0.rc0.228.g281dcd1b4d0



[PATCH v3 2/7] fsck tests: add a test for no skipList input

2018-08-27 Thread Ævar Arnfjörð Bjarmason
The recent 65a836fa6b ("fsck: add stress tests for fsck.skipList",
2018-07-27) added various stress tests for odd invocations of
fsck.skipList, but didn't tests for some very simple ones, such as
asserting that providing to skipList with a bad commit causes fsck to
exit with a non-zero exit code. Add such a test.

Signed-off-by: Ævar Arnfjörð Bjarmason 
---
 t/t5504-fetch-receive-strict.sh | 5 +
 1 file changed, 5 insertions(+)

diff --git a/t/t5504-fetch-receive-strict.sh b/t/t5504-fetch-receive-strict.sh
index 6d268f3327..cbae31f330 100755
--- a/t/t5504-fetch-receive-strict.sh
+++ b/t/t5504-fetch-receive-strict.sh
@@ -137,6 +137,11 @@ test_expect_success 'setup bogus commit' '
commit="$(git hash-object -t commit -w --stdin err &&
+   test_i18ngrep "missingEmail" err
+'
+
 test_expect_success 'fsck with invalid or bogus skipList input' '
git -c fsck.skipList=/dev/null -c fsck.missingEmail=ignore fsck &&
test_must_fail git -c fsck.skipList=does-not-exist -c 
fsck.missingEmail=ignore fsck 2>err &&
-- 
2.19.0.rc0.228.g281dcd1b4d0



[PATCH v3 5/7] fsck: use strbuf_getline() to read skiplist file

2018-08-27 Thread Ævar Arnfjörð Bjarmason
From: René Scharfe 

buffer is unlikely to contain a NUL character, so printing its contents
using %s in a die() format is unsafe (detected with ASan).

Use an idiomatic strbuf_getline() loop instead, which ensures the buffer
is always NUL-terminated, supports CRLF files as well, accepts files
without a newline after the last line, supports any hash length
automatically, and is shorter.

Helped-by: Jeff King 
Signed-off-by: Rene Scharfe 
Signed-off-by: Ævar Arnfjörð Bjarmason 
---
 fsck.c  | 25 -
 t/t5504-fetch-receive-strict.sh |  2 +-
 2 files changed, 13 insertions(+), 14 deletions(-)

diff --git a/fsck.c b/fsck.c
index a0cee0be59..972a26b9ba 100644
--- a/fsck.c
+++ b/fsck.c
@@ -183,8 +183,9 @@ static int fsck_msg_type(enum fsck_msg_id msg_id,
 static void init_skiplist(struct fsck_options *options, const char *path)
 {
static struct oid_array skiplist = OID_ARRAY_INIT;
-   int sorted, fd;
-   char buffer[GIT_MAX_HEXSZ + 1];
+   int sorted;
+   FILE *fp;
+   struct strbuf sb = STRBUF_INIT;
struct object_id oid;
 
if (options->skiplist)
@@ -194,25 +195,23 @@ static void init_skiplist(struct fsck_options *options, 
const char *path)
options->skiplist = 
}
 
-   fd = open(path, O_RDONLY);
-   if (fd < 0)
+   fp = fopen(path, "r");
+   if (!fp)
die("Could not open skip list: %s", path);
-   for (;;) {
+   while (!strbuf_getline(, fp)) {
const char *p;
-   int result = read_in_full(fd, buffer, sizeof(buffer));
-   if (result < 0)
-   die_errno("Could not read '%s'", path);
-   if (!result)
-   break;
-   if (parse_oid_hex(buffer, , ) || *p != '\n')
-   die("Invalid SHA-1: %s", buffer);
+   if (parse_oid_hex(sb.buf, , ) || *p != '\0')
+   die("Invalid SHA-1: %s", sb.buf);
oid_array_append(, );
if (sorted && skiplist.nr > 1 &&
oidcmp([skiplist.nr - 2],
   ) > 0)
sorted = 0;
}
-   close(fd);
+   if (ferror(fp))
+   die_errno("Could not read '%s'", path);
+   fclose(fp);
+   strbuf_release();
 
if (sorted)
skiplist.sorted = 1;
diff --git a/t/t5504-fetch-receive-strict.sh b/t/t5504-fetch-receive-strict.sh
index 38aaf3b928..c7224db3bb 100755
--- a/t/t5504-fetch-receive-strict.sh
+++ b/t/t5504-fetch-receive-strict.sh
@@ -185,7 +185,7 @@ test_expect_success 'fsck with invalid or bogus skipList 
input (comments & empty
test_i18ngrep "^fatal: Invalid SHA-1: " err-with-empty-line
 '
 
-test_expect_failure 'fsck no garbage output from comments & empty lines 
errors' '
+test_expect_success 'fsck no garbage output from comments & empty lines 
errors' '
test_line_count = 1 err-with-comment &&
test_line_count = 1 err-with-empty-line
 '
-- 
2.19.0.rc0.228.g281dcd1b4d0



[PATCH v3 7/7] fsck: support comments & empty lines in skipList

2018-08-27 Thread Ævar Arnfjörð Bjarmason
It's annoying not to be able to put comments and empty lines in the
skipList, when e.g. keeping a big central list of commits to skip in
/etc/gitconfig, which was my motivation for 1362df0d41 ("fetch:
implement fetch.fsck.*", 2018-07-27).

Implement that, and document what version of Git this was changed in,
since this on-disk format can be expected to be used by multiple
versions of git.

Signed-off-by: Ævar Arnfjörð Bjarmason 
---
 Documentation/config.txt| 4 ++--
 fsck.c  | 2 ++
 t/t5504-fetch-receive-strict.sh | 6 +++---
 3 files changed, 7 insertions(+), 5 deletions(-)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index 3d0556e85d..e6f95a7fb2 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -1710,8 +1710,8 @@ will only cause git to warn.
 fsck.skipList::
The path to a list of object names (i.e. one SHA-1 per
line) that are known to be broken in a non-fatal way and should
-   be ignored. Comments ('#') and empty lines are not supported, and
-   will error out.
+   be ignored. On versions of Git 2.20 and later comments ('#') and empty
+   lines are ignored, but will error out on older versions.
 +
 This feature is useful when an established project should be accepted
 despite early commits containing errors that can be safely ignored
diff --git a/fsck.c b/fsck.c
index 4c643f1d40..589548308a 100644
--- a/fsck.c
+++ b/fsck.c
@@ -190,6 +190,8 @@ static void init_skiplist(struct fsck_options *options, 
const char *path)
die("Could not open skip list: %s", path);
while (!strbuf_getline(, fp)) {
const char *p;
+   if (!strcmp(sb.buf, "") || starts_with(sb.buf, "#"))
+   continue;
if (parse_oid_hex(sb.buf, , ) || *p != '\0')
die("Invalid SHA-1: %s", sb.buf);
oidset_insert(>skiplist, );
diff --git a/t/t5504-fetch-receive-strict.sh b/t/t5504-fetch-receive-strict.sh
index c7224db3bb..a1bac164d1 100755
--- a/t/t5504-fetch-receive-strict.sh
+++ b/t/t5504-fetch-receive-strict.sh
@@ -169,20 +169,20 @@ test_expect_success 'fsck with invalid or bogus skipList 
input' '
test_i18ngrep "Invalid SHA-1: \[core\]" err
 '
 
-test_expect_success 'fsck with invalid or bogus skipList input (comments & 
empty lines)' '
+test_expect_success 'fsck with other accepted skipList input (comments & empty 
lines)' '
cat >SKIP.with-comment <<-EOF &&
# Some bad commit
0001
EOF
test_must_fail git -c fsck.skipList=SKIP.with-comment fsck 
2>err-with-comment &&
-   test_i18ngrep "^fatal: Invalid SHA-1: # Some bad commit$" 
err-with-comment &&
+   test_i18ngrep "missingEmail" err-with-comment &&
cat >SKIP.with-empty-line <<-EOF &&
0001
 
0002
EOF
test_must_fail git -c fsck.skipList=SKIP.with-empty-line fsck 
2>err-with-empty-line &&
-   test_i18ngrep "^fatal: Invalid SHA-1: " err-with-empty-line
+   test_i18ngrep "missingEmail" err-with-empty-line
 '
 
 test_expect_success 'fsck no garbage output from comments & empty lines 
errors' '
-- 
2.19.0.rc0.228.g281dcd1b4d0



[PATCH v3 6/7] fsck: use oidset for skiplist

2018-08-27 Thread Ævar Arnfjörð Bjarmason
From: René Scharfe 

Object IDs to skip are stored in a shared static oid_array.  Lookups do
a binary search on the sorted array.  The code checks if the object IDs
are already in the correct order while loading and skips sorting in that
case.  Lookups are done before reporting a (non-fatal) corruption and
before checking .gitmodules files.

Simplify the code by using an oidset instead.  Memory usage is a bit
higher, but we don't need to worry about any sort order anymore.  Embed
the oidset into struct fsck_options to make its ownership clear (no
hidden sharing) and avoid unnecessary pointer indirection.

Performance on repositories with a low number of reported issues and
.gitmodules files (i.e. the usual case) won't be affected much.  The
oidset should be a bit quicker with higher numbers of bad objects in
the skipList.

Helped-by: Ævar Arnfjörð Bjarmason 
Signed-off-by: Rene Scharfe 
Signed-off-by: Ævar Arnfjörð Bjarmason 
---
 Documentation/config.txt | 11 ++-
 fsck.c   | 23 ++-
 fsck.h   |  8 +---
 3 files changed, 13 insertions(+), 29 deletions(-)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index a8dfafa61d..3d0556e85d 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -1729,11 +1729,12 @@ all three of them they must all set to the same values.
 +
 Older versions of Git (before 2.20) documented that the object names
 list should be sorted. This was never a requirement, the object names
-can appear in any order, but when reading the list we track whether
-the list is sorted for the purposes of an internal binary search
-implementation, which can save itself some work with an already sorted
-list.  Unless you have a humongous list there's no reason to go out of
-your way to pre-sort the list.
+could appear in any order, but when reading the list we tracked whether
+the list was sorted for the purposes of an internal binary search
+implementation, which could save itself some work with an already sorted
+list. Unless you had a humongous list there was no reason to go out of
+your way to pre-sort the list. After Git version 2.20 a hash implementation
+is used instead, so there's now no reason to pre-sort the list.
 
 gc.aggressiveDepth::
The depth parameter used in the delta compression
diff --git a/fsck.c b/fsck.c
index 972a26b9ba..4c643f1d40 100644
--- a/fsck.c
+++ b/fsck.c
@@ -10,7 +10,6 @@
 #include "fsck.h"
 #include "refs.h"
 #include "utf8.h"
-#include "sha1-array.h"
 #include "decorate.h"
 #include "oidset.h"
 #include "packfile.h"
@@ -182,19 +181,10 @@ static int fsck_msg_type(enum fsck_msg_id msg_id,
 
 static void init_skiplist(struct fsck_options *options, const char *path)
 {
-   static struct oid_array skiplist = OID_ARRAY_INIT;
-   int sorted;
FILE *fp;
struct strbuf sb = STRBUF_INIT;
struct object_id oid;
 
-   if (options->skiplist)
-   sorted = options->skiplist->sorted;
-   else {
-   sorted = 1;
-   options->skiplist = 
-   }
-
fp = fopen(path, "r");
if (!fp)
die("Could not open skip list: %s", path);
@@ -202,19 +192,12 @@ static void init_skiplist(struct fsck_options *options, 
const char *path)
const char *p;
if (parse_oid_hex(sb.buf, , ) || *p != '\0')
die("Invalid SHA-1: %s", sb.buf);
-   oid_array_append(, );
-   if (sorted && skiplist.nr > 1 &&
-   oidcmp([skiplist.nr - 2],
-  ) > 0)
-   sorted = 0;
+   oidset_insert(>skiplist, );
}
if (ferror(fp))
die_errno("Could not read '%s'", path);
fclose(fp);
strbuf_release();
-
-   if (sorted)
-   skiplist.sorted = 1;
 }
 
 static int parse_msg_type(const char *str)
@@ -319,9 +302,7 @@ static void append_msg_id(struct strbuf *sb, const char 
*msg_id)
 
 static int object_on_skiplist(struct fsck_options *opts, struct object *obj)
 {
-   if (opts && opts->skiplist && obj)
-   return oid_array_lookup(opts->skiplist, >oid) >= 0;
-   return 0;
+   return opts && obj && oidset_contains(>skiplist, >oid);
 }
 
 __attribute__((format (printf, 4, 5)))
diff --git a/fsck.h b/fsck.h
index 0c7e8c9428..b95595ae5f 100644
--- a/fsck.h
+++ b/fsck.h
@@ -1,6 +1,8 @@
 #ifndef GIT_FSCK_H
 #define GIT_FSCK_H
 
+#include "oidset.h"
+
 #define FSCK_ERROR 1
 #define FSCK_WARN 2
 #define FSCK_IGNORE 3
@@ -35,12 +37,12 @@ struct fsck_options {
fsck_error error_func;
unsigned strict:1;
int *msg_type;
-   struct oid_array *skiplist;
+   struct oidset skiplist;
struct decoration *object_names;
 };
 
-#define FSCK_OPTIONS_DEFAULT { NULL, fsck_error_function, 0, NULL }
-#define FSCK_OPTIONS_STRICT { NULL, fsck_error_function, 1, NULL }
+#define 

[PATCH v3 4/7] fsck: document and test commented & empty line skipList input

2018-08-27 Thread Ævar Arnfjörð Bjarmason
There is currently no comment syntax for the fsck.skipList, this isn't
really by design, and it would be nice to have support for comments.

Document that this doesn't work, and test for how this errors
out. These tests reveal a current bug, if there's invalid input the
output will emit some of the next line, and then go into uninitialized
memory. This is fixed in a subsequent change.

Signed-off-by: Ævar Arnfjörð Bjarmason 
---
 Documentation/config.txt| 11 +++
 t/t5504-fetch-receive-strict.sh | 21 +
 2 files changed, 28 insertions(+), 4 deletions(-)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index 9359fb534e..a8dfafa61d 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -1710,10 +1710,13 @@ will only cause git to warn.
 fsck.skipList::
The path to a list of object names (i.e. one SHA-1 per
line) that are known to be broken in a non-fatal way and should
-   be ignored. This feature is useful when an established project
-   should be accepted despite early commits containing errors that
-   can be safely ignored such as invalid committer email addresses.
-   Note: corrupt objects cannot be skipped with this setting.
+   be ignored. Comments ('#') and empty lines are not supported, and
+   will error out.
++
+This feature is useful when an established project should be accepted
+despite early commits containing errors that can be safely ignored
+such as invalid committer email addresses.  Note: corrupt objects
+cannot be skipped with this setting.
 +
 Like `fsck.` this variable has corresponding
 `receive.fsck.skipList` and `fetch.fsck.skipList` variants.
diff --git a/t/t5504-fetch-receive-strict.sh b/t/t5504-fetch-receive-strict.sh
index fa56052f0f..38aaf3b928 100755
--- a/t/t5504-fetch-receive-strict.sh
+++ b/t/t5504-fetch-receive-strict.sh
@@ -169,6 +169,27 @@ test_expect_success 'fsck with invalid or bogus skipList 
input' '
test_i18ngrep "Invalid SHA-1: \[core\]" err
 '
 
+test_expect_success 'fsck with invalid or bogus skipList input (comments & 
empty lines)' '
+   cat >SKIP.with-comment <<-EOF &&
+   # Some bad commit
+   0001
+   EOF
+   test_must_fail git -c fsck.skipList=SKIP.with-comment fsck 
2>err-with-comment &&
+   test_i18ngrep "^fatal: Invalid SHA-1: # Some bad commit$" 
err-with-comment &&
+   cat >SKIP.with-empty-line <<-EOF &&
+   0001
+
+   0002
+   EOF
+   test_must_fail git -c fsck.skipList=SKIP.with-empty-line fsck 
2>err-with-empty-line &&
+   test_i18ngrep "^fatal: Invalid SHA-1: " err-with-empty-line
+'
+
+test_expect_failure 'fsck no garbage output from comments & empty lines 
errors' '
+   test_line_count = 1 err-with-comment &&
+   test_line_count = 1 err-with-empty-line
+'
+
 test_expect_success 'push with receive.fsck.skipList' '
git push . $commit:refs/heads/bogus &&
rm -rf dst &&
-- 
2.19.0.rc0.228.g281dcd1b4d0



Re: Would a config var for --force-with-lease be useful?

2018-08-27 Thread Ævar Arnfjörð Bjarmason


On Mon, Aug 27 2018, Junio C Hamano wrote:

> Scott Johnson  writes:
>
>> Hello Everyone:
>>
>> I'm considering writing a patch that adds a configuration variable
>> that will allow the user to default the command:
>>
>> git push --force
>>
>> to:
>>
>> git push --force-with-lease
>
> I actually consider "--force-with-lease" that does not say "this is
> what exactly I am expecting to replace with my version" a fairly
> dangerous form to recommend to the general public, unless their use
> of "git fetch" (or "git pull") is disciplined.  In the extreme case,
> if you habitually do "git fetch origin" only to update the remote
> tracking branches (so that you can do things like "git log ..origin"
> to see what others have been doing while you were offline), using
> "--force-with-lease" offers no value over "--force", as you're
> likely to find your remote-tracking ref to be up-to-date, but it no
> longer is what you based on your decision that replacing the tip
> with your version is safe.
>
> So, from that point of view, I would recommend thinking twice before
> considering to add such a configuration variable.

Last year there was a proposal for such a patch in:
https://public-inbox.org/git/1499116727-757-1-git-send-emai...@mazzo.li/

This was after/during a long discussion starting with:
https://public-inbox.org/git/cacbzzx7mex-6rhgh2fa9+yl03mjxs8xmye86hnvxbxjmyiz...@mail.gmail.com/

It appears the only patch that got in from that discussion was my
f17d642d3b ("push: document & test --force-with-lease with multiple
remotes", 2017-04-19) (https://github.com/git/git/commit/f17d642d3b)

I still think something like such a config variable would be useful, as
noted in https://public-inbox.org/git/8760f4bmig@gmail.com/ Junio
voiced similar objections at the time.

It would be great to have some patch like this for consideration, but
give that thread a read first to see what some of the objections were /
various points raised for/against doing that.


Re: worktree duplicates, was: [PATCH] SubmittingPatches: mention doc-diff

2018-08-27 Thread Jeff King
On Mon, Aug 27, 2018 at 05:55:43AM -0400, Eric Sunshine wrote:

> On Fri, Aug 24, 2018 at 7:25 PM Jeff King  wrote:
> > On Fri, Aug 24, 2018 at 06:55:24PM -0400, Eric Sunshine wrote:
> > > On Fri, Aug 24, 2018 at 10:47 AM Duy Nguyen  wrote:
> > > > > I was thinking that "worktree add" could start respecting the --force
> > > > > option as an escape hatch.
> > > >
> > > > Sounds good. Eric are you going to implement this? Just checking so
> > > > that I can (hopefully) cross this off my backlog ;-)
> > >
> > > It wasn't something I was planning on working on (at least not
> > > immediately) [...]
> > > As for the actual implementation, I haven't yet looked at how much
> > > surgery will be needed to make 'add' respect --force.
> >
> > Me either. I may take a look this weekend. [...]
> 
> Okay, I got an implementation up and running. It didn't require too
> much code, but neither was it a simple 1- or 2-liner.
> 
> I still need to update documentation, write tests, and compose the
> actual patch series (which will probably run to about 5 patches), so
> it's not quite ready to send out, but hopefully soon.

Great, and thanks for letting me know before we duplicated effort.

-Peff


Re: [PATCH v4 4/6] tests: use shorter here-docs in chainlint.sed for AIX sed

2018-08-27 Thread Junio C Hamano
Ævar Arnfjörð Bjarmason   writes:

> Improve the portability of chainlint by using shorter here-docs. On
> AIX sed will complain about:
>
> sed: 0602-417 The label :hereslurp is greater than eight
> characters

Remind me again not to forget doing s/here-doc/label/ on this patch
before queueing.  Other than that, looks good, together with 3/6 for
the the indented comment issue.



Re: [PATCH v4 10/11] rerere: teach rerere to handle nested conflicts

2018-08-27 Thread Junio C Hamano
Thomas Gummerer  writes:

> Fair enough.  I thought of the technical documentation as something
> that doesn't promise users anything, but rather describes how the
> internals work right now, which is what this bit of documentation
> attempted to write down.

That's fine.  I'd rather keep it but perhaps add a reminder to tell
readers that it works only when the merging of contents that already
records with nested conflict markers happen to "cleanly nest".

Thanks.


Re: [PATCH] read-cache.c: optimize reading index format v4

2018-08-27 Thread Junio C Hamano
Nguyễn Thái Ngọc Duy   writes:

> Running "test-tool read-cache 100" on webkit.git (275k files), reading
> v2 only takes 4.226 seconds, while v4 takes 5.711 seconds, 35% more
> time. The patch reduces read time on v4 to 4.319 seconds.

Nice.

> PS. I notice that v4 does not pad to align entries at 4 byte boundary
> like v2/v3. This could cause a slight slow down on x86 and segfault on
> some other platforms.

Care to elaborate?  

Long time ago, we used to mmap and read directly from the index file
contents, requiring either an unaligned read or padded entries.  But
that was eons ago and we first read and convert from on-disk using
get_be32() etc. to in-core structure, so I am not sure what you mean
by "segfault" here.

> @@ -1782,28 +1735,61 @@ static struct cache_entry *create_from_disk(struct 
> mem_pool *mem_pool,
>   extended_flags = get_be16(>flags2) << 16;
>   /* We do not yet understand any bit out of CE_EXTENDED_FLAGS */
>   if (extended_flags & ~CE_EXTENDED_FLAGS)
> - die("Unknown index entry format %08x", extended_flags);
> + die(_("unknown index entry format %08x"), 
> extended_flags);

Do this as a separate preparation patch that is not controversial
and can sail through without waiting for the rest of this patch.

In other words, don't slip in unreleted changes.

> - if (!previous_name) {
> - /* v3 and earlier */
> - if (len == CE_NAMEMASK)
> - len = strlen(name);
> - ce = cache_entry_from_ondisk(mem_pool, ondisk, flags, name, 
> len);
> + /*
> +  * Adjacent cache entries tend to share the leading paths, so it makes
> +  * sense to only store the differences in later entries.  In the v4
> +  * on-disk format of the index, each on-disk cache entry stores the
> +  * number of bytes to be stripped from the end of the previous name,
> +  * and the bytes to append to the result, to come up with its name.
> +  */
> + if (previous_ce) {
> + const unsigned char *cp = (const unsigned char *)name;
>  
> - *ent_size = ondisk_ce_size(ce);
> - } else {
> - unsigned long consumed;
> - consumed = expand_name_field(previous_name, name);
> - ce = cache_entry_from_ondisk(mem_pool, ondisk, flags,
> -  previous_name->buf,
> -  previous_name->len);
> + strip_len = decode_varint();
> + if (previous_ce->ce_namelen < strip_len)
> + die(_("malformed name field in the index, path '%s'"),
> + previous_ce->name);

The message is misleading; the previous is not the problematic one,
but the one that comes after it is.  Perhaps s/, path/, near path/
or something.

> + name = (const char *)cp;
> + }
>  
> - *ent_size = (name - ((char *)ondisk)) + consumed;
> + if (len == CE_NAMEMASK) {
> + len = strlen(name);
> + if (previous_ce)
> + len += previous_ce->ce_namelen - strip_len;

Nicely done.  If the result fits in that 12-bit truncated name, then
it is full so we do not need to adjust for strip.  Otherwise, we
know the length of this name is the sum of the part that is shared
with the previous one and the part that is unique to this one.

> + }
> +
> + ce = mem_pool__ce_alloc(mem_pool, len);
> + ce->ce_stat_data.sd_ctime.sec = get_be32(>ctime.sec);
> + ce->ce_stat_data.sd_mtime.sec = get_be32(>mtime.sec);
> + ce->ce_stat_data.sd_ctime.nsec = get_be32(>ctime.nsec);
> + ce->ce_stat_data.sd_mtime.nsec = get_be32(>mtime.nsec);
> + ce->ce_stat_data.sd_dev   = get_be32(>dev);
> + ce->ce_stat_data.sd_ino   = get_be32(>ino);
> + ce->ce_mode  = get_be32(>mode);
> + ce->ce_stat_data.sd_uid   = get_be32(>uid);
> + ce->ce_stat_data.sd_gid   = get_be32(>gid);
> + ce->ce_stat_data.sd_size  = get_be32(>size);
> + ce->ce_flags = flags & ~CE_NAMEMASK;
> + ce->ce_namelen = len;
> + ce->index = 0;
> + hashcpy(ce->oid.hash, ondisk->sha1);

Again, nice.  Now two callsites (both in this function) that call
cache_entry_from_ondisk() with slightly different parameters are
unified, there is no strong reason to have it as a single caller
helper function.

> @@ -1898,7 +1884,8 @@ int do_read_index(struct index_state *istate, const 
> char *path, int must_exist)
>   struct cache_header *hdr;
>   void *mmap;
>   size_t mmap_size;
> - struct strbuf previous_name_buf = STRBUF_INIT, *previous_name;
> + const struct cache_entry *previous_ce = NULL;
> + struct cache_entry *dummy_entry = NULL;
>  
>   if (istate->initialized)
>   return istate->cache_nr;
> @@ -1936,11 +1923,10 @@ int do_read_index(struct index_state *istate, const 
> char *path, int must_exist)
>   istate->initialized = 1;
>  
>   if 

Re: [ANNOUNCE] Git v2.19.0-rc0

2018-08-27 Thread Junio C Hamano
Jeff King  writes:

> Actually, what I showed earlier does seem to have some weirdness with
> else-if. But I finally stumbled on something even better:
>
>   - oidcmp(a, b) != 0
>   + !oideq(a, b)
>
> Because of the isomorphisms that coccinelle knows about, that catches
> everything we want.

Nice ;-)



Re: Would a config var for --force-with-lease be useful?

2018-08-27 Thread Junio C Hamano
Scott Johnson  writes:

> Hello Everyone:
>
> I'm considering writing a patch that adds a configuration variable
> that will allow the user to default the command:
>
> git push --force
>
> to:
>
> git push --force-with-lease

I actually consider "--force-with-lease" that does not say "this is
what exactly I am expecting to replace with my version" a fairly
dangerous form to recommend to the general public, unless their use
of "git fetch" (or "git pull") is disciplined.  In the extreme case,
if you habitually do "git fetch origin" only to update the remote
tracking branches (so that you can do things like "git log ..origin"
to see what others have been doing while you were offline), using
"--force-with-lease" offers no value over "--force", as you're
likely to find your remote-tracking ref to be up-to-date, but it no
longer is what you based on your decision that replacing the tip
with your version is safe.

So, from that point of view, I would recommend thinking twice before
considering to add such a configuration variable.


Re: [PATCH 12/21] patch-ids.c: remove implicit dependency on the_index

2018-08-27 Thread Stefan Beller
> -int init_patch_ids(struct patch_ids *ids)
> +int init_patch_ids(struct patch_ids *ids, struct repository *repo)
>  {
> memset(ids, 0, sizeof(*ids));
> -   diff_setup(>diffopts, the_repository);
> +   diff_setup(>diffopts, repo);

Just realized when looking at this diff, though it applies to
other patches as well. (and reading Documentation/technical/api-diff.txt
confirms my thinking IMHO)

What makes the repository argument any special compared
to the rest of the diff options?

So I would expect the setup to look like

memset(ids, 0, sizeof(*ids));
ids->diffopts->repo = the_repository;
diff_setup(>diffopts);

here and in diff_setup, we'd have

  if (!options->repo)
options->repo = the_repository;

or even put the_repository into default_diff_options,
but then I wonder how this deals with no-repo invocations
(git diff --no-index examples for bug reports)


Git clean removing tracked files semi-regularly

2018-08-27 Thread Brennan Conroy
Hello, I work on a project that uses git. "git clean -xdf" is a common command 
to run to clean up the environment, however sometimes this command deletes the 
entire contents of two of the folders and all the files it deletes are being 
tracked which according to the documentation should not be deleted.

The project is located at https://github.com/aspnet/SignalR and the two folders 
it likes to delete are 
https://github.com/aspnet/SignalR/tree/release/2.2/clients/ts/signalr-protocol-msgpack
 and https://github.com/aspnet/SignalR/tree/release/2.2/clients/ts/signalr 

If you need me to collect git logs etc. please don't hesitate to ask!



Re: [PATCH 02/21] read-cache.c: remove 'const' from index_has_changes()

2018-08-27 Thread Stefan Beller
On Sun, Aug 26, 2018 at 3:03 AM Nguyễn Thái Ngọc Duy  wrote:
>
> This function calls do_diff_cache() which eventually needs to set this
> "istate" to unpack_options->src_index (*). This is an unfornate fact

unfortunate

> diff --git a/diff.c b/diff.c

Unlike I thought in the cover letter, this is just adding the repository all
over the place and not adding new code, despite the size.

A cursory read seems to make sense, though I'd nitpick on the
choice of the variable name as I used to use 'r' for the repository
struct. I am not saying that is better, but we could think if we want to
strive for some consistency eventually. (for example most strbufs are
named 'sb', as they are temporary helpers with no explicit naming
required. So maybe we could strive to name all "repository pass throughs"
to be "repo" or "r").

It's sad to see the diff code invaded by the need of the_repository, but
the dependency is there, so it is better to have it explicit than implicit.

Thanks,
Stefan


Re: Automatic core.autocrlf?

2018-08-27 Thread Andrei Rybak
On 2018-08-27 19:32, Andrei Rybak wrote:
> 
> How about just using unconditional includes?
> 
> global.gitconfig (synced across machines):
> 
>   [include]
>   path = platform-specific.gitconfig
> 
> And two version of file named "platform-specific.gitconfig", which
> are not synced, and include only code.autocrlf setting.

Robert, in this setup you might also want (just for convenience)
to sync files "platform1.gitconfig" and "platform2.gitconfig", so
that they are available everywhere for editing and can be copied
into local version of platform-specific.gitconfig at any time.



Re: [PATCH v3] range-diff: update stale summary of --no-dual-color

2018-08-27 Thread Junio C Hamano
Jonathan Nieder  writes:

> Junio C Hamano wrote:
>> Jonathan Nieder  writes:
>>> Kyle Meyer wrote:
>
 Subject: [PATCH v3] range-diff: update stale summary of --no-dual-color
> [...]
>>> Reviewed-by: Jonathan Nieder 
>>
>> Sorry, too late.  I'll revert the merge of the previous round out of
>> 'next' and requeue this one, but that will have to wait until the
>> next integration cycle.
>
> Thanks for the heads up.  Sounds like a fine plan.

Having said that, I do not think the change from v2 to v3 is an
improvement.  At least the one in v2 explained what the input is to
the logic to determine colors, helping the users to understand what
is painted and why and decide if that coloring is useful to them.

The phrasing in v3, "use simple diff colors", does not give much
information over saying something like "paint it differently" (which
is silly because "differently" is a given, once you give an option
to cause a non-default behaviour).

Not limited to this particular case, but in general, subjective
words like "simple" have much less information density than more
specific words, and we need to be careful when spending bits on a
limited space (like option description) to them.



Re: [PATCH 0/1] Teach the builtin rebase about the builtin interactive rebase

2018-08-27 Thread Junio C Hamano
Johannes Schindelin  writes:

>> Please include this information in the commit message.  It's super
>> helpful to find this kind of information about why a patch does what
>> it does when encountering a patch later "in the wild" (in git log -S
>> output).
>
> I thought I did include the relevant part? As to the full back story: I
> was repeatedly dressed down by Junio in recent attempts to include more
> motivation in my commit messages. So I am reluctant to do as you say,
> because Junio is the BDFL here.

I do recall discouraging you from including irrelevant rant/whine in
the log message a few times in the recent past, and also I do recall
you never listening to me.  Don't make me an excuse.

I think what Jonathan finds helpful is the other half of the story
of what you did write in [1/1].  You wrote that it is no longer a
shell script and needs to follow a separate calling convention.
What was missing from that description that was given in [0/1] is
why the original "rebase-in-c" series was done while pretending that
the other effort "rebase-i-in-c" did not even exist, which made it
necessary to do this change as a separate step.

And I tend to agree that it _is_ a relevant story in this case.



Re: Possible bug: identical lines added/removed in git diff

2018-08-27 Thread Stefan Beller
On Sun, Aug 26, 2018 at 6:54 PM Gabriel Holodak  wrote:
>
> I think I'm running into a bug with git diff on v2.18.0.

I cannot reproduce with the two files attached.

I suspected you might have a different diff algorithm configured,
so I tested
git diff --no-index old new
git diff --patience --no-index old new
git diff --histogram --no-index old new

all of which do not reproduce the issue.

Are there any encoding issues locally (Which platform
are you on?) or in transit (Could you re-download the files
from [1] and confirm it still produces this bug?)

[1] 
https://public-inbox.org/git/CAE6=wb_4_phjfqpubfcyknkejfdr22s-y0npqkw5yd4gvan...@mail.gmail.com/


> where the "DWIDTH 8 0" and "BITMAP" lines are removed and added,
> despite being identical. It only seems to be happening around this
> section of the file, for the U+00F0 character. This also seems close
> to a minimal reproduction of the issue.

Could you cut down to a real minimal reproduction, i.e. just these 20
lines or so?

>  If I git add --patch and stage
> a few hunks, then the duplicated lines seem to disappear.

Do you have any smudge filters or configuration regarding
line endings?

Are the lines really different or the same ? (Can you inspect with a
hex editor, maybe there are different kinds of invisible white spaces?)

> Steps to reproduce:
> git diff --no-index unitera_bold_italic.bdf.old unitera_bold_italic.bdf.new
> (It also happens inside a repo, this just seemed the easiest way to
> demonstrate.)

Thanks,
Stefan


Re: [PATCH v4 10/11] rerere: teach rerere to handle nested conflicts

2018-08-27 Thread Junio C Hamano
Thomas Gummerer  writes:

> Agreed.  I think it may be solvable if we'd actually get the
> information about what belongs to which side from the merge algorithm
> directly.

The merge machinery may (eh, rather, "does") know, but we do not
have a way to express that in the working tree file that becomes the
input to the rerere algorithm, without making backward-incompatible
changes to the output format.

In a sense, that is already a solved problem, even though the
solution was done a bit differently ;-) If the end users need to
commit a half-resolved result with conflict markers (perhaps they
want to share it among themselves and work on resolving further),
what they can do is to also say that these are now part of contents,
not conflict markers, with conflict-marker-size attribute.  Perhaps
they prepare such a half-resolved result with unusual value of the
attribute, so that later merge of these with standard conflict
marker size will not get confused.

That reminds me of another thing.  I've been running with these in
my $GIT_DIR/info/attributes file for the past few years.  Perhaps we
should add them to Documentation/.gitattributes and t/.gitattributes
so that project participants would all benefit?

Documentation/git-merge.txt conflict-marker-size=32
Documentation/user-manual.txt   conflict-marker-size=32
t/t-*.shconflict-marker-size=32


Re: [PATCH 00/21] Kill the_index part 4

2018-08-27 Thread Stefan Beller
On Sun, Aug 26, 2018 at 3:03 AM Nguyễn Thái Ngọc Duy  wrote:
>
> This continues the journey of getting rid of the_index at least in
> library code. The focus of part 4 is diff code. Since 'struct
> repository *' is passed around more (and even more in part 5), I take
> this opportunity to remove some the_repository references too.

yay! Thank you for continuing on the_index!

>
> Besides some small conflicts on 'pu', like the previous part, it also
> breaks 'pu' because of API changes. The fix is trivial though, just
> prepend the_repository as the first argument for the broken function
> calls.

This sounds like a problem. I said the same when sending the
object store lookup series, which ended via
3a2a1dc1707 (Merge branch 'sb/object-store-lookup', 2018-08-02)
in master, but I do recall Junio being somewhat unhappy about it[1].

By just adding the argument to the function, it might merge easily
but not compile, which would have to be fixed up; and that object store
series seemed to touch a lot of functions that were also used in series
in-flight.

I have since lost track of the refactoring and focused more on
submodules again, but plan to come back to more 'repositorification'.

> After this and ~20 more patches in part5, the_index is gone from
> library code.

This sounds promising! I'll take a look.

>  diff.c | 259 +++--

Ugh? That sounds like there is an interesting change coming.

Stefan


Re: Automatic core.autocrlf?

2018-08-27 Thread Andrei Rybak
On 2018-08-27 17:52, Duy Nguyen wrote:
> On Mon, Aug 27, 2018 at 5:37 PM Torsten Bögershausen  wrote:
>>> In those cases, when it falls back to
>>> configuration for line ending management, I want it to be
>>> automatically configured based on the host platform.
>>
> 
> An alternative is supporting conditional config includes based on
> platform or host name, but I don't know if there are more use cases
> like this to justify it.
> 

How about just using unconditional includes?

global.gitconfig (synced across machines):

  [include]
  path = platform-specific.gitconfig

And two version of file named "platform-specific.gitconfig", which
are not synced, and include only code.autocrlf setting.

--
Best regards, Andrei R.


Re: Thank you for public-inbox!

2018-08-27 Thread Eric Wong


You're very welcome, Johannes.  And I'm hoping to have a few
more goodies live this fall/winter for public-inbox :>


Re: Automatic core.autocrlf?

2018-08-27 Thread Robert Dailey
On Mon, Aug 27, 2018 at 10:53 AM Duy Nguyen  wrote:
>
> On Mon, Aug 27, 2018 at 5:37 PM Torsten Bögershausen  wrote:
> > > In those cases, when it falls back to
> > > configuration for line ending management, I want it to be
> > > automatically configured based on the host platform.
> >
> > There is
> > git config core.eol native
>
> An alternative is supporting conditional config includes based on
> platform or host name, but I don't know if there are more use cases
> like this to justify it.

To Torsten's comment: Yes, I've looked at git-config. it doesn't have
the answer to my question there, hence I posted on the mailing list.
To your point, eol being native doesn't matter if text=auto can't be
simulated. If there was a version of `autocrlf` that set `eol=native`
and `text=auto`, that would work as well. But the `true` setting sets
`text=auto` and `eol=crlf`.

Duy: There are more use cases I've run into, but they are not related to the OP:

* Different user email between machines (work vs personal)
* Different tooling on platform (e.g. `nano` on Ubuntu, Notepad++ on Windows)

Centralizing .gitconfig, especially when you have tons of aliases,
becomes more important. 95% of my .gitconfig is platform-agnostic, if
there was an `#ifdef` mechanism, per your suggestion, this would allow
a lot more flexibility. Maybe I should have worded my OP more in terms
of platform-specific configuration sections, instead of specifically
about autocrlf, since that's just one symptom of the real problem.


YOUR PRODUCT

2018-08-27 Thread Rafaa Esawi




Greetings,

We are rebuilding Iraq after years of conflicts and we are inviting you to
take up contracts. We are determined to purchase your products in large
quantities, for use in all over our 18 governorates(provinces) as the task
of re-building Iraq covers every single sectormand facet of our society.
We'll submit your products information to
the Iraqi Project and Contracting Office. They will examine the propriety
and necessity of your product and approve for bulk supply contracting
relationship.

I am currently on the board of the Iraqi Project and Contracting Office,
With my connections in the corridors of power, we are quite confident of
securing approval. Also of note is the issue of different financial
regulations between my country Iraq and your country. As such you will be
paid 100% through the Iraqi ministry of Finance
before you commence supplies. When you've received payment, we would be
expecting a monthly supply; as the sum budgeted for product may be quite
enormous as to outstrip your capacity and capability to supply.

A consideration also is that your quotation must be CIF Port of Umm Qasr.
Please send response so that I will reveal more procedural information to
you upon your re-confirmation.

Thany You.

Rafaa Esawi



Re: $GIT_DIR is no longer set when pre-commit hooks are called

2018-08-27 Thread Johannes Schindelin
Hi Peff,

On Sat, 25 Aug 2018, Jeff King wrote:

> On Wed, Aug 22, 2018 at 04:16:00PM -0700, Gregory Oschwald wrote:
> 
> diff --git a/builtin/commit.c b/builtin/commit.c
> index 3bfeabc463..3670024a25 100644
> --- a/builtin/commit.c
> +++ b/builtin/commit.c
> @@ -1440,6 +1440,7 @@ int run_commit_hook(int editor_is_used, const char 
> *index_file, const char *name
>   int ret;
>  
>   argv_array_pushf(_env, "GIT_INDEX_FILE=%s", index_file);
> + argv_array_pushf(_env, "GIT_DIR=%s", get_git_dir());

We did something similar in sequencer.c, and it required setting
`GIT_WORK_TREE`, too, to avoid problems in worktrees. Do you need the same
here, too?

Ciao,
Dscho

>   /*
>* Let the hook know that no editor will be launched.
> 
> -Peff
> 


Re: Measuring Community Involvement (was Re: Contributor Summit planning)

2018-08-27 Thread Johannes Schindelin
Hi Duy,

On Wed, 15 Aug 2018, Duy Nguyen wrote:

> On Tue, Aug 14, 2018 at 7:43 PM Derrick Stolee  wrote:
> > 2. Number of other commit tag-lines (Reviewed-By, Helped-By,
> > Reported-By, etc.).
> >
> >  Using git repo:
> >
> >  $ git log --since=2018-01-01 junio/next|grep by:|grep -v
> > Signed-off-by:|sort|uniq -c|sort -nr|head -n 20
> >
> >   66 Reviewed-by: Stefan Beller 
> >   22 Reviewed-by: Jeff King 
> >   19 Reviewed-by: Jonathan Tan 
> >   12 Helped-by: Eric Sunshine 
> >   11 Helped-by: Junio C Hamano 
> >9 Helped-by: Jeff King 
> >8 Reviewed-by: Elijah Newren 
> >7 Reported-by: Ramsay Jones 
> >7 Acked-by: Johannes Schindelin 
> >7 Acked-by: Brandon Williams 
> >6 Reviewed-by: Eric Sunshine 
> >6 Helped-by: Johannes Schindelin 
> >5 Mentored-by: Christian Couder 
> >5 Acked-by: Johannes Schindelin 
> >4 Reviewed-by: Jonathan Nieder 
> >4 Reviewed-by: Johannes Schindelin 
> >4 Helped-by: Stefan Beller 
> >4 Helped-by: René Scharfe 
> >3 Reviewed-by: Martin Ågren 
> >3 Reviewed-by: Lars Schneider 
> >
> >  (There does not appear to be enough density here to make a useful
> > metric.)
> 
> If your database keeps mail relationship (e.g. what mail is replied to
> what according to In-Reply-To header) then look for mail replies to
> patches. I think we have a rough picture who are active reviewers with
> that.

Not really, as there is a high percentage of "on a tangent" replies in
many, many patch threads.

Ciao,
Dscho

Re: Measuring Community Involvement (was Re: Contributor Summit planning)

2018-08-27 Thread Johannes Schindelin
Hi Junio,

On Tue, 14 Aug 2018, Junio C Hamano wrote:

> Jeff King  writes:
> 
> > On Tue, Aug 14, 2018 at 01:43:38PM -0400, Derrick Stolee wrote:
> >
> >> On 8/13/2018 5:54 PM, Jeff King wrote:
> >> > So I try not to think too hard on metrics, and just use them to get a
> >> > rough view on who is active.
> >> 
> >> I've been very interested in measuring community involvement, with the
> >> knowledge that any metric is flawed and we should not ever say "this metric
> >> is how we measure the quality of a contributor". It can be helpful, though,
> >> to track some metrics and their change over time.
> >> 
> >> Here are a few measurements we can make:
> >
> > Thanks, it was nice to see a more comprehensive list in one spot.
> >
> > It would be neat to have a tool that presents all of these
> > automatically, but I think the email ones are pretty tricky (most people
> > don't have the whole list archive sitting around).
> 
> I do not think it covered e-mail at all, but there was git stats
> project several years ago (perhaps part of GSoC IIRC).
> 
> > I think I mentioned "surviving lines" elsewhere, which I do like this
> > (and almost certainly stole from Junio a long time ago):
> 
> Yeah, I recall that one as part of counting how many of 1244 lines
> Linus originally wrote still were in our codebase at around v1.6.0
> timeframe (the answer was ~220 IIRC) ;-)

And if you do not remember precisely, you can easily re-run `Linus` from
here: https://github.com/git/git/blob/todo/Linus

Ciao,
Dscho


Re: Automatic core.autocrlf?

2018-08-27 Thread Duy Nguyen
On Mon, Aug 27, 2018 at 5:37 PM Torsten Bögershausen  wrote:
> > In those cases, when it falls back to
> > configuration for line ending management, I want it to be
> > automatically configured based on the host platform.
>
> There is
> git config core.eol native

An alternative is supporting conditional config includes based on
platform or host name, but I don't know if there are more use cases
like this to justify it.
-- 
Duy


Re: Automatic core.autocrlf?

2018-08-27 Thread Torsten Bögershausen
On Mon, Aug 27, 2018 at 09:10:33AM -0500, Robert Dailey wrote:
> Is there an 'auto' setting for the 'core.autocrlf' config? Reason I
> ask is, I want that setting to be 'input' on linux but 'true' on
> Windows. I have a global .gitconfig that I sync across different
> platforms with Google Drive, and I hate to manage 2 copies of it on
> each platform (linux and Windows). If there's some way to make the
> line ending configuration the same between both, that would be
> awesome.
> 
> Note that I do rely mostly on git attributes files for this, however
> not all upstream repositories have one, and I don't always have
> permission to create one there.

How many repos do you look at, and how big is the "pain level" here?
Would it be possible to submit pull-requests ?

> In those cases, when it falls back to
> configuration for line ending management, I want it to be
> automatically configured based on the host platform.

There is
git config core.eol native

But that is already the default, and it needs the 'text' attribute
to be set (or auto), the simplest solution is to have
* text=auto
in .gitattributes

> 
> Any advice is appreciated here. Unfortunately Google isn't much help
> on this topic, Stack Overflow is a swamp full of different information
> and none of it seems authoritative.


Out of interest, not to blame anybody:
Did you ever look at

https://git-scm.com/docs/git-config

The authoritative answer is in the Git project itself,
under Documentation/config.txt

In summary, if I understand you right,
the solution would be to use .gitattributes in every project.

HTH



Re: [PATCH v2 2/2] fsck: use oidset for skiplist

2018-08-27 Thread René Scharfe
Am 27.08.2018 um 09:37 schrieb Ævar Arnfjörð Bjarmason:
> 
> On Sat, Aug 25 2018, René Scharfe wrote:
> 
>> diff --git a/Documentation/config.txt b/Documentation/config.txt
>> index 2fa65b7516..80ab570579 100644
>> --- a/Documentation/config.txt
>> +++ b/Documentation/config.txt
>> @@ -1715,7 +1715,7 @@ doing the same for `receive.fsck.` and 
>> `fetch.fsck.`
>>  will only cause git to warn.
>>
>>  fsck.skipList::
>> -The path to a sorted list of object names (i.e. one SHA-1 per
>> +The path to a list of object names (i.e. one SHA-1 per
>>  line) that are known to be broken in a non-fatal way and should
>>  be ignored. This feature is useful when an established project
>>  should be accepted despite early commits containing errors that
> 
> I was going to say that since this is a file format we're likely to
> support across versions we should make a note that "up to version
> so-and-so this needed to be sorted, but that's longer the case. So
> e.g. someone wouldn't test this on 2.20 (or read the online docs) and
> then deploy this for older clients..
> 
> But...
> 
>> -if (options->skiplist)
>> -sorted = options->skiplist->sorted;
>> -else {
>> -sorted = 1;
>> -options->skiplist = 
>> -}
>> -
>>  fp = fopen(path, "r");
>>  if (!fp)
>>  die("Could not open skip list: %s", path);
>> @@ -202,19 +192,12 @@ static void init_skiplist(struct fsck_options 
>> *options, const char *path)
>>  const char *p;
>>  if (parse_oid_hex(sb.buf, , ) || *p != '\0')
>>  die("Invalid SHA-1: %s", sb.buf);
>> -oid_array_append(, );
>> -if (sorted && skiplist.nr > 1 &&
>> -oidcmp([skiplist.nr - 2],
>> -   ) > 0)
>> -sorted = 0;
>> +oidset_insert(>skiplist, );
> 
> ...reading this implementation, where we always called
> oid_array_append(), but then just kept track of whether the set was
> sorted...
> 
>>  }
>>  if (ferror(fp))
>>  die_errno("Could not read '%s'", path);
>>  fclose(fp);
>>  strbuf_release();
>> -
>> -if (sorted)
>> -skiplist.sorted = 1;
> 
> ...and here where we assigned to the .sorted member of the oid_array...
> 
>>  static int object_on_skiplist(struct fsck_options *opts, struct object *obj)
>>  {
>> -if (opts && opts->skiplist && obj)
>> -return oid_array_lookup(opts->skiplist, >oid) >= 0;
>> -return 0;
>> +return opts && obj && oidset_contains(>skiplist, >oid);
>>  }
> 
> and here where we'd always do the lookup if the skiplist was
> initialized, *not* just if it's sorted, and how the sha1-array.c code
> has looked ever since cd94c6f91 ("fsck: git receive-pack: support
> excluding objects from fsck'ing", 2015-06-22) when this was first added:
> 
> $ git show cd94c6f91:sha1-array.c|grep -A5 sha1_array_lookup
> int sha1_array_lookup(struct sha1_array *array, const unsigned char *sha1)
> {
> if (!array->sorted)
> sha1_array_sort(array);
> return sha1_pos(sha1, array->sha1, array->nr, sha1_access);
> }
> 
> So I think it makes sense to make this series a three-part, where in the
> first part we only change these docs to say s/sorted //, and modify the
> tests I added in 65a836fa6b ("fsck: add stress tests for fsck.skipList",
> 2018-07-27) to assert that an unsorted & sorted list of SHA-1s works
> just as well.
> 
> Then following up with your [12]/2, where the internal implementation is
> changed, but we make it clear that it's *just* the internal
> implementation. I.e. from a UI perspective the list never had to be
> pre-sorted, we'd just spend some work sorting it on the first lookup if
> it wasn't sorted already.
> 
> Now without some very careful reading it's not clear what "we don't need
> to worry about any sort order anymore" in the commit message means,
> i.e. what it really means is "for the purposes of the internal
> implementation, and as an opt-in user-side optimization ...".
> 
> I.e. an alternate version of this whole patch series could also be:
> 
> diff --git a/Documentation/config.txt b/Documentation/config.txt
> index 1c4236498..930807e43 100644
> --- a/Documentation/config.txt
> +++ b/Documentation/config.txt
> @@ -1709,5 +1709,5 @@ will only cause git to warn.
> 
>  fsck.skipList::
> -   The path to a sorted list of object names (i.e. one SHA-1 per
> +   The path to a list of object names (i.e. one SHA-1 per
> line) that are known to be broken in a non-fatal way and should
> be ignored. This feature is useful when an established project
> diff --git a/fsck.c b/fsck.c
> index a0cee0be5..9d4e938ad 100644
> --- a/fsck.c
> +++ b/fsck.c
> @@ -184,14 +184,10 @@ static void init_skiplist(struct fsck_options 
> *options, const char *path)
>  

Re: Get "Your branch is ahead of 'origin/master'" message when explicitly passing origin url at push command

2018-08-27 Thread Bentzy Sagiv
Thanks, it really helped to understand.
Since what I want to do is to run git pull/push from within a script in a 
Jenkins job and Jenkins already has the git user stored, I thougth is 
odd to store them at a git credential helper.
My solution is running (before git pull/push):
git config remote.origin.url 
https://$GIT_USER:$git_passw...@bitbucket.org/...
while at the start of the script trapping EXIT signal and resetting origin as 
following:
trap "sudo git config remote.origin.url 
https://$git_u...@bitbucket.org/...; EXIT
Make sense or am I missing something?

Thanks in advance,
Bentzy Sagiv
   

 




From: Bryan Turner 
Sent: Sunday, August 26, 2018 9:22 PM
To: Bentzy Sagiv
Cc: Git Users
Subject: Re: Get "Your branch is ahead of 'origin/master'" message when 
explicitly passing origin url at push command
  

On Sun, Aug 26, 2018 at 4:39 AM Bentzy Sagiv  wrote:
>
> git version 2.7.4
> __
>
> First try: NOT passing origin url explicitly
>
> ubuntu@ci-bentzy:/var/lib/jenkins$ sudo git push

Since you didn't specify a remote here, Git assumes origin. It uses
your configured "push.default" behavior ("simple" by default) to
determine which refs to push and pushes your master branch to the
origin's master branch. Since it _knows_ it pushed to origin, it
updates your local "refs/remotes/origin/master" ref with the same
commit it just pushed, resulting in an "up-to-date" message.

>
> ubuntu@ci-bentzy:/var/lib/jenkins$ git status
> On branch master
> Your branch is up-to-date with 'origin/master'.
> nothing to commit, working directory clean
> ubuntu@ci-bentzy:/var/lib/jenkins$
>
> __
>
> Second try: passing origin url explicitily
>
> ubuntu@ci-bentzy:/var/lib/jenkins$ sudo git push  
> https://bitbucket.org/OWNER/jenkinsconf-repo.git

This, on the other hand, _is not pushing to a configured remote_. It's
pushing to an explicit URL, which happens to match the URL of a
configured remote. But it's still not a configured remote. It's using
origin's URL, but you didn't push to origin. As a result,
"refs/remotes/origin/master" is not updated, and you get an "ahead"
message.

>
> ubuntu@ci-bentzy:/var/lib/jenkins$ git status
> On branch master
> Your branch is ahead of 'origin/master' by 1 commit.
>   (use "git push" to publish your local commits)
> nothing to commit, working directory clean
>
> __
>
> An additional " sudo git push" (without explicit origin) solves the issue

Everything here is working as intended. If you want to push to a
_remote_, you either need to:
- Name the remote ("git push origin"), or
- Leave it off, so Git will assume origin ("git push")

Pushing to a URL that matches a remote's URL is _not_ pushing to a
remote. It's pushing to an explicit URL.

Hope this helps,
Bryan Turner
>
>
>


Thank you for public-inbox!

2018-08-27 Thread Johannes Schindelin
Hi Eric,

I would like to take five minutes to thank you for public-inbox. It is
invaluable for me in the meantime. And I think I will never be able to
thank you enough for it.

Just a couple of things where it is super useful to me:

- Recently, my mail provider started dropping mails left and right. They
  might even be addressed to me, and they never arrive in my inbox (and
  not even in the spam folder: they just never arrive). I spent some ten
  hours this past weekend to script identifying the mails in public-inbox
  that I never received, and to weed through them.

  It seems I missed some 4,000 mails. Thanks to you, I now saw those mails
  (although I had to delete most of them after reading only their subject,
  in the interest of time).

- Many a times when my automated builds identified a problem with the test
  suite, it was a lot quicker to use public inbox to identify the mail to
  respond to than to use my mail program.

- Sometimes, I find myself in want of replying to a past patch, but back
  in the day when it was sent I thought it would not be interesting to me,
  so I deleted it. With public-inbox, I can easily get it in raw format,
  i.e. put it back into my inbox so I can reply.

- I cannot tell you how many times I send a link to public-inbox to my
  colleagues rather than forwarding a mail, because the former will give
  them more context (and also semi-live updates in case somebody replied
  to said mail after I sent the link).

A couple of things in the future that public-inbox will make possible for
me:

- You probably are aware of my GitGitGadget endeavor, a project similar in
  aim to SubmitGit, but a lot more integrated with the GitHub experience
  (and not requiring you to hand over your mail sending credentials to
  AWS).

  One particular feature I found myself really wanting in SubmitGit (but
  not possible due to its one-way design): I want my Pull Requests to be
  closed once the patches are integrated into git.git's `master` branch.

  While this feature is not available in GitGitGadget yet, I am well
  underway there. I already have a notes ref (`commit-to-mail`, available
  via https://github.git/gitgitgadget/git) that annotates commits in
  git.git with the Message-ID of the corresponding mail. By "I have", I
  mean: there is an automated task that uses public-inbox to keep that ref
  up to date.

  I also have an accompanying `mail-to-commit` notes ref that maps
  Message-IDs back to the corresponding commit in git.git. That notes ref
  "annotates" the (non-existing) blob you get when piping the Message-ID
  with a trailing newline to `git hash-object`.

  Again, this is information that would be absolutely unobtainable without
  public-inbox.

- Related, I want to annotate the GitHub Pull Requests handled by
  GitGitGadvget with the corresponding name of the branch in
  https://github.com/gitster/git.

  This requires that `mail-to-commit` I mentioned in the previous bullet
  point, and therefore would not be possible without public-inbox.

- A feature I plan on introducing into GitGitGadget is to attach comments
  to the GitHub Pull Request when anybody replies to the patch thread sent
  out by GitGitGadget.

  Also this feature would be impossible without public-inbox.

- Another really useful feature I plan on introducing is to attach
  comments to those PRs whenever a What's Cooking is talking about the
  corresponding branch.

  Once again, would be impossible without public-inbox.

So thank you, thank you, thank you, for public-inbox!

Ciao,
Dscho

P.S.: FWIW I added a mirror of public-inbox to
https://git-for-windows.visualstudio.com/git.public-inbox, so that my
automated tasks, as well as my playing around, does not stress your server
too much.



Automatic core.autocrlf?

2018-08-27 Thread Robert Dailey
Is there an 'auto' setting for the 'core.autocrlf' config? Reason I
ask is, I want that setting to be 'input' on linux but 'true' on
Windows. I have a global .gitconfig that I sync across different
platforms with Google Drive, and I hate to manage 2 copies of it on
each platform (linux and Windows). If there's some way to make the
line ending configuration the same between both, that would be
awesome.

Note that I do rely mostly on git attributes files for this, however
not all upstream repositories have one, and I don't always have
permission to create one there. In those cases, when it falls back to
configuration for line ending management, I want it to be
automatically configured based on the host platform.

Any advice is appreciated here. Unfortunately Google isn't much help
on this topic, Stack Overflow is a swamp full of different information
and none of it seems authoritative.


Re: Contributor Summit planning

2018-08-27 Thread Johannes Schindelin
Hi Peff,

On Mon, 13 Aug 2018, Jeff King wrote:

> For the past several years, we've held a Git Contributor Summit as part
> of the Git Merge conference. I'd like to get opinions from the community
> to help plan future installments. Any feedback or opinion is welcome,
> but some obvious things to think about:
> 
>   - where, when, and how often?
> 
> Plans are shaping up to have Git Merge 2019 in Brussels right after
> FOSDEM in February (like it was two years ago), with a contributor
> summit attached.
> 
> Are there people who would be more likely to attend a contributor
> summit if it were held elsewhere (e.g., in North America, probably
> in the Bay Area)? Are people interested in attending a separate
> contributor summit not attached to the larger Git Merge (and if so,
> is there any other event it might be worth connecting it with,
> time-wise)?  Are people interested in going to two summits in a year
> (e.g., Brussels in February, and then maybe some in North America
> later in the year), or is that diminishing returns?

I cannot speak for "the people", but for myself: Brussels is an ideal
location *for me*. I would probably be unable to physically go to a
second, in-person meeting in the same year, but I would of course love to
attend remotely.

>   - format
> 
> For those who haven't attended before, it's basically 25-ish Git
> (and associated project) developers sitting in a room for a day
> chatting about the project. Topics go on a whiteboard in the
> morning, and then we discuss each for 30-60 minutes.
> 
> We could do multiple days (which might give more room for actually
> working collaboratively instead of just discussing). We could do
> something more formal (like actual talks). We could do something
> less formal (like an all-day spaghetti buffet, where conversation
> happens only between mouthfuls). The sky is the limit. Some of those
> ideas may be better than others.

I found the unconference-style, one day meeting to be most productive.

As to more formal? I don't know... talks seem to be fun and all, but they
require a lot of preparation. Something championed in our standups are
"chalk talks", i.e. somebody presenting in a bit more detail what they are
working on, in particular explaining the context (think: Stolee
enlightening the audience about finer points of computational graph
theory) *without* preparing for it specifically. That makes for fun
presentations, if a bit more chaotic than a real "conference talk". This
format obviously lends itself to Google Hangouts.

As to multiple days: Of course it would be nice to have a kind of a "hack
day", but I wonder how productive this would be in the context of Git,
where interests very so widely.

Rather than have a "hack day", I would actually prefer to work with other
contributors in a way that we have not done before, but which I had the
pleasure of "test ballooning" with Pratik: using Visual Studio Code Live
Share. This allows multiple users to work on the same code base, in the
same worktree, seeing what each other is doing. It requires a separate
communication channel to talk; Pratik & I used IRC, but I think Google
Hangout (or Skype or WhatsApp or ) would
have worked a bit better. It's kind of pair programming, but with some of
the limitations removed.

I guess I went off on a tangent here...

Ciao,
Dscho


Re: Contributor Summit planning

2018-08-27 Thread Derrick Stolee

On 8/27/2018 9:22 AM, Johannes Schindelin wrote:

Point in favor of the pure-online meeting: the informal standup on IRC
every second Friday. I really try to attend it (it is a bit awkward
because it is on a Friday evening in my timezone, right at the time when I
want to unwind from the work week), as it does have a similar effect to
in-person standups: surprising collaborations spring up, unexpected help,
and a general sense of belonging.

Of course, the value of these standups comes from the makeup of the
participants: Stefan, Brandon, Stolee, JeffH, Jonathan and other *very*
active core contributors hang out for roughly half an hour, sharing what
they are working on, exchanging ideas, etc.


A focused aside, since you brought up the online "standup": it seems the 
IRC channel has been less than ideal, with people trying to participate 
but having nickname issues or being muted. You also describe another 
issue: the timing. Having a real-time discussion has its benefits, but 
also it leaves many people out.


One idea to try next time is to create a mailing list thread asking for 
statuses, and each person can chime in asynchronously and spawn a new 
discussion based on that status. Perhaps we can try that next time.


Thanks,

-Stolee



Re: Contributor Summit planning

2018-08-27 Thread Johannes Schindelin
Hi Peff & everybody,

On Mon, 13 Aug 2018, Jeff King wrote:

> On Mon, Aug 13, 2018 at 12:58:54PM -0400, Derrick Stolee wrote:
> 
> > I would be up for two meetings a year. I would expect that the variety of
> > locations would allow a larger set of contributors to make at least one
> > meeting a year. This may come at a cost of a smaller group in each summit.
> 
> Yeah, I do worry about it splitting the attendance. It could also be a
> thing we do _this_ year (if we care about having something in North
> America), and then try to make different plans in a future year.

While I think it is a great thing to have Git contributors so close to Git
users every year, I could imagine that a good compromise would be to have
a Git Contributor Summit the day before GitMerge, as we already have it,
and then roughly half a year later an "online-only" Virtual Summit.

> > The one thing I found missing that could be good is to have a remote
> > option.  Not everyone can travel or can afford to do so. I wonder if a
> > simple Google Hangout could allow more participation from the
> > community, even in a passive sense (those still at their day jobs
> > listening in). It could even facilitate remote presenters, if
> > applicable.
> 
> One year we had Dscho remote on a laptop sitting on a stool. I'm not
> sure how great that was for him. ;)

It was great. It was truly great. Thanks again for letting me do that!

> I agree it would be nice to include remote people, but I think it would
> be very important to have a good A/V setup. Passive involvement is not
> too hard, but I would love a setup where they could actually participate
> in discussions. I've seen that work in 5-10 people conferences, but I'm
> not sure how well even good A/V scales to 20-30.

Speaking from personal experience in my day job, I agree that it is easy
to have a discussion with a few remotes when you have only 5-10 team
members, and it gets pretty difficult when there are 20-30 people or more.

However, I found that those problems are rooted more in lack of
discipline, such as the people with physical presence simply talking over
the remote ones, seemingly without noticing what they are doing there.

Having said that, I believe that we core contributors can learn to have a
fruitful online meeting. With 30+ participants, too.

Learning from my past life in academia (it is hard for me to imagine a
less disciplined crowd than a bunch of white, male, old scientists), we
would need a moderator, and some forum that allows to "give somebody the
mic". That software/platform should exist somewhere.

> One other thought on remote folks: IMHO one of the most valuable things
> about these kinds of events (especially the first ones I attended) is
> the informal interactions. The hallway talks and meals provide a venue
> for spontaneous conversation, but they also just help us understand in a
> visceral way that the people on the other end of our emails are actual
> humans. Which I think can help smooth subsequent online interactions.
> I'm not sure how much of that can be gained remotely.

Indeed. That part is almost lost online. But not quite.

> (I don't think that's an argument against remote inclusion, just an
> opinion that we should continue to encourage in-person interaction).

I would love to have the best of both worlds. For example, it is an annual
annoyance that we are discussion all kinds of things regarding Git, trying
to steer the direction, to form collaborations on certain features, and
the person at the helm is not even there.

Maybe *two* meetings per year, one attached to GitMerge, and one purely
online, would help.

Point in favor of the pure-online meeting: the informal standup on IRC
every second Friday. I really try to attend it (it is a bit awkward
because it is on a Friday evening in my timezone, right at the time when I
want to unwind from the work week), as it does have a similar effect to
in-person standups: surprising collaborations spring up, unexpected help,
and a general sense of belonging.

Of course, the value of these standups comes from the makeup of the
participants: Stefan, Brandon, Stolee, JeffH, Jonathan and other *very*
active core contributors hang out for roughly half an hour, sharing what
they are working on, exchanging ideas, etc.

Such an online summit as I suggested above would really only work if
enough frequent contributors would attend. If enough people like you,
Junio, and the standup regulars would say: yep, we're willing to plan and
attend an online summit, where we try to have a timezone-friendly
"unconference"-style meeting on one day (on which we would of course try
to free ourselves from our regular work obligations).

I guess I am asking for a "raise your hands", with mine high up in the
air.

Ciao,
Dscho


Re: [PATCH 2/2] rebase -i: fix SIGSEGV when 'merge ' fails

2018-08-27 Thread Johannes Schindelin
Hi,

On Thu, 16 Aug 2018, Junio C Hamano wrote:

> Phillip Wood  writes:
> 
> > This commit implements a minimal fix which fixes the crash and allows
> > the user to successfully commit a conflict resolution with 'git rebase
> > --continue'. It does not write .git/rebase-merge/patch,
> > .git/rebase-merge/stopped-sha or update REBASE_HEAD.
> 
> I think that should be OK.  When merging, a patch that shows the
> diff from the merge base to the tip indeed is an interesting and
> useful reference material to help the conflict resolution, but it is
> not even clear what the latter two should mean while merging.

Late reply, I know.

It is indeed ambiguous what the REBASE_HEAD should be... While a natural
choice would be the commit to be merged, that would be inconsistent with
the `-c`/`-C` version of `merge`.

But yes, the `patch` should not be written, and the `stopped-sha` does not
make sense with a `merge` that has neither `-c ` nor `-C `
(although, please note, that this breaks a subsequent `fixup` if there is
one directly after the `merge`).

> > diff --git a/t/t3430-rebase-merges.sh b/t/t3430-rebase-merges.sh
> > index 31fe4268d5..2e767d4f1e 100755
> > --- a/t/t3430-rebase-merges.sh
> > +++ b/t/t3430-rebase-merges.sh
> > @@ -129,7 +129,7 @@ test_expect_success '`reset` refuses to overwrite 
> > untracked files' '
> > git rebase --abort
> >  '
> >  
> > -test_expect_success 'failed `merge` writes patch (may be rescheduled, 
> > too)' '
> > +test_expect_success 'failed `merge -C` writes patch (may be rescheduled, 
> > too)' '
> > test_when_finished "test_might_fail git rebase --abort" &&
> > git checkout -b conflicting-merge A &&
> >  
> > @@ -151,6 +151,19 @@ test_expect_success 'failed `merge` writes patch (may 
> > be rescheduled, too)' '
> > test_path_is_file .git/rebase-merge/patch
> >  '
> >  
> > +SQ="'"
> 
> A low-hanging fruit tangent, but somebody may want to go through the
> output from
> 
> $ git grep '[Ss][Qq]_*=' t
> 
> and come up with a shared "convenience" definition of this, which
> perhaps sits next to the definition of $_x40 etc.

If only we had a nice bug tracker with labels. I guess
https://github.com/git/git would be a good place, but it *is* discouraged
by us, which is a pity.

Ciao,
Dscho


Re: [PATCH 0/9] introducing oideq()

2018-08-27 Thread Derrick Stolee

On 8/26/2018 4:56 PM, brian m. carlson wrote:

On Sat, Aug 25, 2018 at 04:00:31AM -0400, Jeff King wrote:

This is a follow-up to the discussion in:

   https://public-inbox.org/git/20180822030344.ga14...@sigill.intra.peff.net/

The general idea is that the majority of callers don't care about actual
plus/minus ordering from oidcmp() and hashcmp(); they just want to know
if two oids are equal. The stricter equality check can be optimized
better by the compiler.

Due to the simplicity of the current code and our inlining, the compiler
can usually figure this out for now. So I wouldn't expect this patch to
actually improve performance right away. But as that discussion shows,
we are likely to take a performance hit as we move to more runtime
determination of the_hash_algo parameters. Having these callers use the
more strict form will potentially help us recover that.

So in that sense we _could_ simply punt on this series until then (and
it's certainly post-v2.19 material). But I think it's worth doing now,
simply from a readability/annotation standpoint. IMHO the resulting code
is more clear (though I've long since taught myself to read !foocmp() as
equality).

I would quite like to see this series picked up for v2.20.  If we want
to minimize performance regressions with the SHA-256 work, I think it's
important.


Seconded.


Applying the following patch on top of this series causes gcc to inline
both branches, which is pretty much the best we can expect.  I haven't
benchmarked it, though, so I can't say what the actual performance
consequence is.


We should definitely measure the cost here, but when we have the option 
to move to newhash, that cost should be acceptable.


Thanks,

-Stolee



Re: [PATCH 3/9] convert "oidcmp() == 0" to oideq()

2018-08-27 Thread Derrick Stolee

On 8/27/2018 8:31 AM, Derrick Stolee wrote:

On 8/25/2018 4:36 AM, Jeff King wrote:

On Sat, Aug 25, 2018 at 04:07:15AM -0400, Jeff King wrote:

diff --git a/contrib/coccinelle/object_id.cocci 
b/contrib/coccinelle/object_id.cocci

index 5869979be7..548c02336d 100644
--- a/contrib/coccinelle/object_id.cocci
+++ b/contrib/coccinelle/object_id.cocci
@@ -108,3 +108,9 @@ expression E1, E2;
  @@
  - hashcpy(E1.hash, E2->hash)
  + oidcpy(, E2)


Is this change intended? It doesn't seem to match the intention of the 
rest of the patch.


Ignore me. It's just confusing to read the +/- notation from a cocci 
script alongside the file diff.




Re: [PATCH 3/9] convert "oidcmp() == 0" to oideq()

2018-08-27 Thread Derrick Stolee

On 8/25/2018 4:36 AM, Jeff King wrote:

On Sat, Aug 25, 2018 at 04:07:15AM -0400, Jeff King wrote:


diff --git a/contrib/coccinelle/object_id.cocci 
b/contrib/coccinelle/object_id.cocci
index 5869979be7..548c02336d 100644
--- a/contrib/coccinelle/object_id.cocci
+++ b/contrib/coccinelle/object_id.cocci
@@ -108,3 +108,9 @@ expression E1, E2;
  @@
  - hashcpy(E1.hash, E2->hash)
  + oidcpy(, E2)


Is this change intended? It doesn't seem to match the intention of the 
rest of the patch.



+
+@@
+expression E1, E2;
+@@
+- oidcmp(E1, E2) == 0
++ oideq(E1, E2)

[...]

diff --git a/cache.h b/cache.h
index f6d227fac7..d090f71706 100644
--- a/cache.h
+++ b/cache.h
@@ -1100,7 +1100,7 @@ static inline int is_empty_blob_sha1(const unsigned char 
*sha1)
  
  static inline int is_empty_blob_oid(const struct object_id *oid)

  {
-   return !oidcmp(oid, the_hash_algo->empty_blob);
+   return oideq(oid, the_hash_algo->empty_blob);
  }

By the way, one interesting thing I noticed is that coccinelle generates
the hunk for cache.h for _every_ file that includes it, and the
resulting patch is annoying to apply. I think this is related to the
discussion we were having in:

   https://public-inbox.org/git/20180802115522.16107-1-szeder@gmail.com/

Namely that we might do better to invoke one big spatch (and let it
parallelize internally) instead of one perfile. Presumably that would
also avoid looking at the headers repeatedly (which would be both faster
and produce better output).

I'm not planning to pursue that immediately, so just food for thought at
this point.


The more we use Coccinelle, the more we will learn how to improve the 
workflow.


Thanks,

-Stolee



Re: [PATCH 04/11] builtin rebase: support --quiet

2018-08-27 Thread Johannes Schindelin
Hi Junio,

On Wed, 8 Aug 2018, Junio C Hamano wrote:

> Stefan Beller  writes:
> 
> > On Wed, Aug 8, 2018 at 6:51 AM Pratik Karki  wrote:
> >>
> >> This commit introduces a rebase option `--quiet`. While `--quiet` is
> >> commonly perceived as opposite to `--verbose`, this is not the case for
> >> the rebase command: both `--quiet` and `--verbose` default to `false` if
> >> neither `--quiet` nor `--verbose` is present.
> >>
> >> This commit goes further and introduces `--no-quiet` which is the
> >> contrary of `--quiet` and it's introduction doesn't modify any
> >> behaviour.
> 
> Why?  Is it for completeness (i.e. does the scripted version take
> such an option and addition of --no-quiet makes the C rewrite behave
> the same)?

Ah. I mentioned that an explanation for this is needed in the commit
message, and I guess that it is a bit too subtle. The part you clipped
from your quoted text says:

[... `--quiet`] switches off --verbose and --stat, and as
--verbose switches off --quiet, we use the (negated)
REBASE_NO_QUIET instead of REBASE_QUIET: this allows us to turn
off the quiet mode and turn on the verbose and diffstat mode in a
single OPT_BIT(), and the opposite in a single OPT_NEGBIT().

I agree that this is a pretty convoluted way to express the issue. See
below for an attempt at a clearer commit message.

> >> Note: The `flags` field in `rebase_options` will accumulate more bits in
> >> subsequent commits, in particular a verbose and a diffstat flag. And as
> >> --quoet inthe shell scripted version of the rebase command switches off
> >
> >   --quote in the
> >
> > (in case a resend is needed)
> 
> Meaning --quiet?

Yep. I should have paid more attention in my pre-submission review, sorry.

I changed the commit message to read like this:

builtin rebase: support --quiet

This commit introduces a rebase option `--quiet`. While `--quiet` is
commonly perceived as opposite to `--verbose`, this is not the case for
the rebase command: both `--quiet` and `--verbose` default to `false` if
neither `--quiet` nor `--verbose` is present.

Despite the default being `false` for both verbose and quiet mode,
passing the `--quiet` option will turn off verbose mode, and `--verbose`
will turn off quiet mode.

This patch introduces the `flags` bit field, with `REBASE_NO_QUIET`
as first user (with many more to come).

We do *not* use `REBASE_QUIET` here for an important reason: To keep the
implementation simple, this commit introduces `--no-quiet` instead of
`--quiet`, so that a single `OPT_NEGBIT()` can turn on quiet mode and
turn off verbose and diffstat mode at the same time. Likewise, the
companion commit which will introduce support for `--verbose` will have
a single `OPT_BIT()` that turns off quiet mode and turns on verbose and
diffstat mode at the same time.

Ciao,
Dscho


Re: [PATCH 03/11] builtin rebase: handle the pre-rebase hook (and add --no-verify)

2018-08-27 Thread Johannes Schindelin
Hi Junio,

On Wed, 8 Aug 2018, Junio C Hamano wrote:

> Pratik Karki  writes:
> 
> > This commit converts the equivalent part of the shell script
> > `git-legacy-rebase.sh` to run the pre-rebase hook (unless disabled), and
> > to interrupt the rebase with error if the hook fails.
> >
> > Signed-off-by: Pratik Karki 
> > ---
> 
> Introduction of upstream_arg in this step looked a bit
> surprising, but the hook invocation is the only thing that uses it,
> so it is understandable.

Yep, that's literally all that `upstream_arg` is used for:

$ git grep upstream_arg v2.19.0-rc0
v2.19.0-rc0:git-rebase.sh:  upstream_arg="$upstream_name"
v2.19.0-rc0:git-rebase.sh:  upstream_arg=--root
v2.19.0-rc0:git-rebase.sh:run_pre_rebase_hook "$upstream_arg" "$@"

> "rebase: handle the re-rebase hook and --no-verify" would have been
> sufficient, without "add" or parentheses.

Fixed.

> > diff --git a/builtin/rebase.c b/builtin/rebase.c
> > index 38c496dd10..b79f9b0a9f 100644
> > --- a/builtin/rebase.c
> > +++ b/builtin/rebase.c
> > @@ -317,6 +319,8 @@ int cmd_rebase(int argc, const char **argv, const char 
> > *prefix)
> > OPT_STRING(0, "onto", _name,
> >N_("revision"),
> >N_("rebase onto given branch instead of upstream")),
> > +   OPT_BOOL(0, "no-verify", _to_skip_pre_rebase,
> > +N_("allow pre-rebase hook to run")),
> 
> Do we need to catch "--no-no-verify" ourselves with NONEG bit, or is
> this sufficient to tell parse_options() machinery to take care of
> it?

I just issued

$ ./git rebase --verify --no-no-verify --xyz

and it showed

error: unknown option `xyz'
[... usage ...]

I vaguely remembered that the parse_options() machinery special-cases
"no-" prefixes, and my test seems to confirm it.

Holler if you want a more in-depth analysis.

Ciao,
Dscho


Re: [PATCH] t5310-pack-bitmaps: fix bogus 'pack-objects to file can use bitmap' test

2018-08-27 Thread Kirill Smelkov
On Tue, Aug 14, 2018 at 01:47:21PM +0200, SZEDER Gábor wrote:
> The test 'pack-objects to file can use bitmap' added in 645c432d61
> (pack-objects: use reachability bitmap index when generating
> non-stdout pack, 2016-09-10) is silently buggy and doesn't check what
> it's supposed to.
>
> In 't5310-pack-bitmaps.sh', the 'list_packed_objects' helper function
> does what its name implies by running:
>
>   git show-index <"$1" | cut -d' ' -f2
>
> The test in question invokes this function like this:
>
>   list_packed_objects packa.objects &&
>   list_packed_objects packb.objects &&
>   test_cmp packa.objects packb.objects
>
> Note how these two callsites don't specify the name of the pack index
> file as the function's parameter, but redirect the function's standard
> input from it.  This triggers an error message from the shell, as it
> has no filename to redirect from in the function, but this error is
> ignored, because it happens upstream of a pipe.  Consequently, both
> invocations produce empty 'pack{a,b}.objects' files, and the
> subsequent 'test_cmp' happily finds those two empty files identical.
>
> Fix these two 'list_packed_objects' invocations by specifying the pack
> index files as parameters.  Furthermore, eliminate the pipe in that
> function by replacing it with an &&-chained pair of commands using an
> intermediate file, so a failure of 'git show-index' or the shell
> redirection will fail the test.
>
> Signed-off-by: SZEDER Gábor 
> ---
>  t/t5310-pack-bitmaps.sh | 7 ---
>  1 file changed, 4 insertions(+), 3 deletions(-)
>
> diff --git a/t/t5310-pack-bitmaps.sh b/t/t5310-pack-bitmaps.sh
> index 6ee4d3f2d9..557bd0d0c0 100755
> --- a/t/t5310-pack-bitmaps.sh
> +++ b/t/t5310-pack-bitmaps.sh
> @@ -9,7 +9,8 @@ objpath () {
>
>  # show objects present in pack ($1 should be associated *.idx)
>  list_packed_objects () {
> - git show-index <"$1" | cut -d' ' -f2
> + git show-index <"$1" >object-list &&
> + cut -d' ' -f2 object-list
>  }
>
>  # has_any pattern-file content-file
> @@ -204,8 +205,8 @@ test_expect_success 'pack-objects to file can use bitmap' 
> '
>   # verify equivalent packs are generated with/without using bitmap index
>   packasha1=$(git pack-objects --no-use-bitmap-index --all packa 
>packbsha1=$(git pack-objects --use-bitmap-index --all packb  &&
> - list_packed_objects packa.objects &&
> - list_packed_objects packb.objects &&
> + list_packed_objects packa-$packasha1.idx >packa.objects &&
> + list_packed_objects packb-$packbsha1.idx >packb.objects &&
>   test_cmp packa.objects packb.objects
>  '

Thanks for catching and correcting this.

A minor comment from outside observer: running tests under something
like

-e and -o pipefail

would automatically catch the mistake in the first place. Maybe `-o
pipefail` is bashism (I had not checked), but `git grep " | " t/` shows
there are a lot of pipelines being used, and thus similar errors might be
silently resting there. Something like -o pipefail would catch all such
problems automatically.

Kirill



Re: worktree duplicates, was: [PATCH] SubmittingPatches: mention doc-diff

2018-08-27 Thread Eric Sunshine
On Fri, Aug 24, 2018 at 7:25 PM Jeff King  wrote:
> On Fri, Aug 24, 2018 at 06:55:24PM -0400, Eric Sunshine wrote:
> > On Fri, Aug 24, 2018 at 10:47 AM Duy Nguyen  wrote:
> > > > I was thinking that "worktree add" could start respecting the --force
> > > > option as an escape hatch.
> > >
> > > Sounds good. Eric are you going to implement this? Just checking so
> > > that I can (hopefully) cross this off my backlog ;-)
> >
> > It wasn't something I was planning on working on (at least not
> > immediately) [...]
> > As for the actual implementation, I haven't yet looked at how much
> > surgery will be needed to make 'add' respect --force.
>
> Me either. I may take a look this weekend. [...]

Okay, I got an implementation up and running. It didn't require too
much code, but neither was it a simple 1- or 2-liner.

I still need to update documentation, write tests, and compose the
actual patch series (which will probably run to about 5 patches), so
it's not quite ready to send out, but hopefully soon.


Urgent,

2018-08-27 Thread Juliet Muhammad
i have been trying to contact you


Re: [PATCH v2 2/2] fsck: use oidset for skiplist

2018-08-27 Thread Ævar Arnfjörð Bjarmason


On Sat, Aug 25 2018, René Scharfe wrote:

> diff --git a/Documentation/config.txt b/Documentation/config.txt
> index 2fa65b7516..80ab570579 100644
> --- a/Documentation/config.txt
> +++ b/Documentation/config.txt
> @@ -1715,7 +1715,7 @@ doing the same for `receive.fsck.` and 
> `fetch.fsck.`
>  will only cause git to warn.
>
>  fsck.skipList::
> - The path to a sorted list of object names (i.e. one SHA-1 per
> + The path to a list of object names (i.e. one SHA-1 per
>   line) that are known to be broken in a non-fatal way and should
>   be ignored. This feature is useful when an established project
>   should be accepted despite early commits containing errors that

I was going to say that since this is a file format we're likely to
support across versions we should make a note that "up to version
so-and-so this needed to be sorted, but that's longer the case. So
e.g. someone wouldn't test this on 2.20 (or read the online docs) and
then deploy this for older clients..

But...

> - if (options->skiplist)
> - sorted = options->skiplist->sorted;
> - else {
> - sorted = 1;
> - options->skiplist = 
> - }
> -
>   fp = fopen(path, "r");
>   if (!fp)
>   die("Could not open skip list: %s", path);
> @@ -202,19 +192,12 @@ static void init_skiplist(struct fsck_options *options, 
> const char *path)
>   const char *p;
>   if (parse_oid_hex(sb.buf, , ) || *p != '\0')
>   die("Invalid SHA-1: %s", sb.buf);
> - oid_array_append(, );
> - if (sorted && skiplist.nr > 1 &&
> - oidcmp([skiplist.nr - 2],
> -) > 0)
> - sorted = 0;
> + oidset_insert(>skiplist, );

...reading this implementation, where we always called
oid_array_append(), but then just kept track of whether the set was
sorted...

>   }
>   if (ferror(fp))
>   die_errno("Could not read '%s'", path);
>   fclose(fp);
>   strbuf_release();
> -
> - if (sorted)
> - skiplist.sorted = 1;

...and here where we assigned to the .sorted member of the oid_array...

>  static int object_on_skiplist(struct fsck_options *opts, struct object *obj)
>  {
> - if (opts && opts->skiplist && obj)
> - return oid_array_lookup(opts->skiplist, >oid) >= 0;
> - return 0;
> + return opts && obj && oidset_contains(>skiplist, >oid);
>  }

and here where we'd always do the lookup if the skiplist was
initialized, *not* just if it's sorted, and how the sha1-array.c code
has looked ever since cd94c6f91 ("fsck: git receive-pack: support
excluding objects from fsck'ing", 2015-06-22) when this was first added:

$ git show cd94c6f91:sha1-array.c|grep -A5 sha1_array_lookup
int sha1_array_lookup(struct sha1_array *array, const unsigned char *sha1)
{
if (!array->sorted)
sha1_array_sort(array);
return sha1_pos(sha1, array->sha1, array->nr, sha1_access);
}

So I think it makes sense to make this series a three-part, where in the
first part we only change these docs to say s/sorted //, and modify the
tests I added in 65a836fa6b ("fsck: add stress tests for fsck.skipList",
2018-07-27) to assert that an unsorted & sorted list of SHA-1s works
just as well.

Then following up with your [12]/2, where the internal implementation is
changed, but we make it clear that it's *just* the internal
implementation. I.e. from a UI perspective the list never had to be
pre-sorted, we'd just spend some work sorting it on the first lookup if
it wasn't sorted already.

Now without some very careful reading it's not clear what "we don't need
to worry about any sort order anymore" in the commit message means,
i.e. what it really means is "for the purposes of the internal
implementation, and as an opt-in user-side optimization ...".

I.e. an alternate version of this whole patch series could also be:

diff --git a/Documentation/config.txt b/Documentation/config.txt
index 1c4236498..930807e43 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -1709,5 +1709,5 @@ will only cause git to warn.

 fsck.skipList::
-   The path to a sorted list of object names (i.e. one SHA-1 per
+   The path to a list of object names (i.e. one SHA-1 per
line) that are known to be broken in a non-fatal way and should
be ignored. This feature is useful when an established project
diff --git a/fsck.c b/fsck.c
index a0cee0be5..9d4e938ad 100644
--- a/fsck.c
+++ b/fsck.c
@@ -184,14 +184,10 @@ static void init_skiplist(struct fsck_options 
*options, const char *path)
 {
static struct oid_array skiplist = OID_ARRAY_INIT;
-   int sorted, fd;
+   int fd;
char buffer[GIT_MAX_HEXSZ + 1];
struct object_id oid;

-   if