Re: [PATCH v4 10/12] Add a base implementation of SHA-256 support

2018-10-31 Thread Junio C Hamano
"brian m. carlson"  writes:

> I need to do a reroll to address some other issues people brought up, so
> I can remove this line.

OK.  I actually do not feel too strongly about this one, by the way.


Re: [PATCH v4 0/7] Use generation numbers for --topo-order

2018-10-31 Thread Junio C Hamano
"Derrick Stolee via GitGitGadget"  writes:

> This patch series performs a decently-sized refactoring of the revision-walk
> machinery. Well, "refactoring" is probably the wrong word, as I don't
> actually remove the old code. Instead, when we see certain options in the
> 'rev_info' struct, we redirect the commit-walk logic to a new set of methods
> that distribute the workload differently. By using generation numbers in the
> commit-graph, we can significantly improve 'git log --graph' commands (and
> the underlying 'git rev-list --topo-order').

Review discussions seem to have petered out.  Would we merge this to
'next' and start cooking, perhaps for the remainder of this cycle?


Re: [PATCH v3 7/8] push: add DWYM support for "git push refs/remotes/...:"

2018-10-31 Thread Junio C Hamano
Ævar Arnfjörð Bjarmason  writes:

>> On the other hand, I do not think I mind all that much if a src that
>> is a tag object to automatically go to refs/tags/ (having a tag
>> object in refs/remotes/** is rare enough to matter in the first
>> place).
>
> Yeah, maybe this is going too far. I don't think so, but happy to be
> challenged on that point :)
>
> I don't think so because the only reason I've ever needed this is
> because I deleted some branch accidentally and am using a push from
> "remotes/*" to bring it back. I.e. I'll always want branch-for-branch,
> not to push that as a tag.

Oh, I didn't consider pushing it out as a tag, but now you bring it
up, I think that it also would make sense in a workflow to tell your
colleagues to look at (sort of like how people use pastebin---"look
here, this commit has the kind of change I have in mind in this
discussion") some random commit and the commit happens to be sitting
at a tip of a remote-tracking branch.  Instead of pushing it out as a
branch or a remote-tracking branch, which has strong connotations of
inviting others to build on top, pushing it out as a tag would make
more sense in that context.

And as I mentioned already, I think it would be equally likely, if not
more likely, for people like me to push remotes/** out as a
remote-tracking branch (rather than a local branch) of the
repository I'm pushing into.

So I tend to agree that this is going too far.  If the original
motivating example was not an ingredient of everyday workflow, but
was a one-off "recovery", I'd rather see people forced to be more
careful by requiring "push origin/frotz:refs/heads/frotz" rather
than incorrectly DWIDNM "push origin/frotz:frotz" and ending up with
creating refs/tags/frotz or refs/remotes/origin/frotz, which also
are plausible choices depending on what the user is trying to
recover from, which the sending end would not know (the side on
which the accidental loss of a ref happened earlier is on the remote
repository that would be receiving this push, and it _might_ know).
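The explicit recovery push Junio recommends already works today; here is a
throwaway demonstration (the repository layout and the branch name "frotz"
are invented for the example):

```shell
set -e
tmp=$(mktemp -d)
cd "$tmp"
git init -q --bare origin.git
git clone -q origin.git work 2>/dev/null
cd work
git config user.email you@example.com
git config user.name "You"
git checkout -q -b frotz
echo a >a && git add a && git commit -q -m a
git push -q -u origin frotz
# accident: the branch is lost on the remote (deleted server-side)
git --git-dir="$tmp/origin.git" update-ref -d refs/heads/frotz
# recovery: push the remote-tracking ref back, fully spelled out,
# so there is no guessing about where the destination ref lives
git push -q origin refs/remotes/origin/frotz:refs/heads/frotz
echo "branch restored"
```

Because the destination is fully qualified, no DWIM rules are involved and
the push behaves the same before and after this series.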

As to the entirety of the series,

 - I do not think this step 7, and its documentation in step 8, are
   particularly a good idea, in their current shape.  Pushing tag
   objects to refs/tags/ is probably a good idea, but pushing a
   commit out as a local branch head is not necessarily so.

 - Step 6 is probably good documentation of the cases in which we
   make and do not make a guess on the unqualified push destination.

 - Step 5 and earlier looked like good changes.

If we were to salvage some parts of step 7 (and step 8), we'd
probably need fb7c2268 ("SQUASH???", 2018-10-29) to number all the
placeholders in the printf format string.


Re: [PATCH 2/3] tests: mark those tests where "git fsck" fails at the end

2018-10-31 Thread Junio C Hamano
Ævar Arnfjörð Bjarmason   writes:

> Mark the tests where "git fsck" fails at the end with extra test code
> to check the fsck output. There fsck.{err,out} has been created for
> us.
>
> A later change will add the support for GIT_TEST_FSCK_TESTS. They're
> being added first to ensure the test suite will never fail with
> GIT_TEST_FSCK=true during bisect.

I am sympathetic to what step 3/3 (eh, rather, an earlier "let's not
leave the repository in a corrupt state, as that would make it
inconvenient for us to later append new tests") wants to do, but not
this one---these markings at the end make it inconvenient for us to
later add new tests to these scripts before them.



[PATCH v4fixed 4/5] add read_author_script() to libgit

2018-10-31 Thread Junio C Hamano
From: Phillip Wood 

Add read_author_script() to sequencer.c based on the implementation in
builtin/am.c and update read_am_author_script() to use
read_author_script(). The sequencer code that reads the author script
will be updated in the next commit.

Signed-off-by: Phillip Wood 
Signed-off-by: Junio C Hamano 
---
 builtin/am.c |  86 +
 sequencer.c  | 105 +++
 sequencer.h  |   3 ++
 3 files changed, 110 insertions(+), 84 deletions(-)

diff --git a/builtin/am.c b/builtin/am.c
index c78a745289..83685180e0 100644
--- a/builtin/am.c
+++ b/builtin/am.c
@@ -260,36 +260,6 @@ static int read_state_file(struct strbuf *sb, const struct am_state *state,
die_errno(_("could not read '%s'"), am_path(state, file));
 }
 
-/**
- * Take a series of KEY='VALUE' lines where VALUE part is
- * sq-quoted, and append <KEY, VALUE> at the end of the string list
- */
-static int parse_key_value_squoted(char *buf, struct string_list *list)
-{
-   while (*buf) {
-   struct string_list_item *item;
-   char *np;
-   char *cp = strchr(buf, '=');
-   if (!cp) {
-   np = strchrnul(buf, '\n');
-   return error(_("unable to parse '%.*s'"),
-(int) (np - buf), buf);
-   }
-   np = strchrnul(cp, '\n');
-   *cp++ = '\0';
-   item = string_list_append(list, buf);
-
-   buf = np + (*np == '\n');
-   *np = '\0';
-   cp = sq_dequote(cp);
-   if (!cp)
-   return error(_("unable to dequote value of '%s'"),
-item->string);
-   item->util = xstrdup(cp);
-   }
-   return 0;
-}
-
 /**
  * Reads and parses the state directory's "author-script" file, and sets
  * state->author_name, state->author_email and state->author_date accordingly.
@@ -309,65 +279,13 @@ static int parse_key_value_squoted(char *buf, struct string_list *list)
 static int read_am_author_script(struct am_state *state)
 {
const char *filename = am_path(state, "author-script");
-   struct strbuf buf = STRBUF_INIT;
-   struct string_list kv = STRING_LIST_INIT_DUP;
-   int retval = -1; /* assume failure */
-   int i, name_i = -2, email_i = -2, date_i = -2, err = 0;
-   int fd;
 
assert(!state->author_name);
assert(!state->author_email);
assert(!state->author_date);
 
-   fd = open(filename, O_RDONLY);
-   if (fd < 0) {
-   if (errno == ENOENT)
-   return 0;
-   return error_errno(_("could not open '%s' for reading"),
-  filename);
-   }
-   strbuf_read(&buf, fd, 0);
-   close(fd);
-   if (parse_key_value_squoted(buf.buf, &kv))
-   goto finish;
-
-   for (i = 0; i < kv.nr; i++) {
-   if (!strcmp(kv.items[i].string, "GIT_AUTHOR_NAME")) {
-   if (name_i != -2)
-   name_i = error(_("'GIT_AUTHOR_NAME' already given"));
-   else
-   name_i = i;
-   } else if (!strcmp(kv.items[i].string, "GIT_AUTHOR_EMAIL")) {
-   if (email_i != -2)
-   email_i = error(_("'GIT_AUTHOR_EMAIL' already given"));
-   else
-   email_i = i;
-   } else if (!strcmp(kv.items[i].string, "GIT_AUTHOR_DATE")) {
-   if (date_i != -2)
-   date_i = error(_("'GIT_AUTHOR_DATE' already given"));
-   else
-   date_i = i;
-   } else {
-   err = error(_("unknown variable '%s'"),
-   kv.items[i].string);
-   }
-   }
-   if (name_i == -2)
-   error(_("missing 'GIT_AUTHOR_NAME'"));
-   if (email_i == -2)
-   error(_("missing 'GIT_AUTHOR_EMAIL'"));
-   if (date_i == -2)
-   error(_("missing 'GIT_AUTHOR_DATE'"));
-   if (date_i < 0 || email_i < 0 || date_i < 0 || err)
-   goto finish;
-   state->author_name = kv.items[name_i].util;
-   state->author_email = kv.items[email_i].util;
-   state->author_date = kv.items[date_i].util;
-   retval = 0;
-finish:
-   string_list_clear(&kv, !!retval);
-   strbuf_release(&buf);
-   return retval;
+   return read_author_script(filename, &state->author_name,
+ &state->author_email, &state->author_date, 1);
 }
 
 /**
diff --git a/sequencer.c b/sequencer.c
index 6387c9ee6e..bf84a4f8ea 100644
--- a/sequencer.c
+++ b/sequencer.c
@@ -664,6 +664,111 @@ static int write_author_script(const char *message)
return res;
 }
 
+/**
+ * Take a series of KEY='VALUE' lines where 

[PATCH v4fixed 2/5] am: improve author-script error reporting

2018-10-31 Thread Junio C Hamano
From: Phillip Wood 

If there are errors in a user-edited author-script there is no
indication of what is wrong. This commit adds some specific error messages
depending on the problem. It also relaxes the requirement that the
variables appear in a specific order in the file to match the behavior
of 'rebase --interactive'.

Signed-off-by: Phillip Wood 
Signed-off-by: Junio C Hamano 
---
 builtin/am.c | 49 +++--
 1 file changed, 39 insertions(+), 10 deletions(-)

diff --git a/builtin/am.c b/builtin/am.c
index 4f7f28a9dc..ffca4479d7 100644
--- a/builtin/am.c
+++ b/builtin/am.c
@@ -270,8 +270,11 @@ static int parse_key_value_squoted(char *buf, struct string_list *list)
struct string_list_item *item;
char *np;
char *cp = strchr(buf, '=');
-   if (!cp)
-   return -1;
+   if (!cp) {
+   np = strchrnul(buf, '\n');
+   return error(_("unable to parse '%.*s'"),
+(int) (np - buf), buf);
+   }
np = strchrnul(cp, '\n');
*cp++ = '\0';
item = string_list_append(list, buf);
@@ -280,7 +283,8 @@ static int parse_key_value_squoted(char *buf, struct string_list *list)
*np = '\0';
cp = sq_dequote(cp);
if (!cp)
-   return -1;
+   return error(_("unable to dequote value of '%s'"),
+item->string);
item->util = xstrdup(cp);
}
return 0;
@@ -308,6 +312,7 @@ static int read_author_script(struct am_state *state)
struct strbuf buf = STRBUF_INIT;
struct string_list kv = STRING_LIST_INIT_DUP;
int retval = -1; /* assume failure */
+   int i, name_i = -2, email_i = -2, date_i = -2, err = 0;
int fd;
 
assert(!state->author_name);
@@ -326,14 +331,38 @@ static int read_author_script(struct am_state *state)
	if (parse_key_value_squoted(buf.buf, &kv))
goto finish;
 
-   if (kv.nr != 3 ||
-   strcmp(kv.items[0].string, "GIT_AUTHOR_NAME") ||
-   strcmp(kv.items[1].string, "GIT_AUTHOR_EMAIL") ||
-   strcmp(kv.items[2].string, "GIT_AUTHOR_DATE"))
+   for (i = 0; i < kv.nr; i++) {
+   if (!strcmp(kv.items[i].string, "GIT_AUTHOR_NAME")) {
+   if (name_i != -2)
+   name_i = error(_("'GIT_AUTHOR_NAME' already given"));
+   else
+   name_i = i;
+   } else if (!strcmp(kv.items[i].string, "GIT_AUTHOR_EMAIL")) {
+   if (email_i != -2)
+   email_i = error(_("'GIT_AUTHOR_EMAIL' already given"));
+   else
+   email_i = i;
+   } else if (!strcmp(kv.items[i].string, "GIT_AUTHOR_DATE")) {
+   if (date_i != -2)
+   date_i = error(_("'GIT_AUTHOR_DATE' already given"));
+   else
+   date_i = i;
+   } else {
+   err = error(_("unknown variable '%s'"),
+   kv.items[i].string);
+   }
+   }
+   if (name_i == -2)
+   error(_("missing 'GIT_AUTHOR_NAME'"));
+   if (email_i == -2)
+   error(_("missing 'GIT_AUTHOR_EMAIL'"));
+   if (date_i == -2)
+   error(_("missing 'GIT_AUTHOR_DATE'"));
+   if (date_i < 0 || email_i < 0 || date_i < 0 || err)
goto finish;
-   state->author_name = kv.items[0].util;
-   state->author_email = kv.items[1].util;
-   state->author_date = kv.items[2].util;
+   state->author_name = kv.items[name_i].util;
+   state->author_email = kv.items[email_i].util;
+   state->author_date = kv.items[date_i].util;
retval = 0;
 finish:
	string_list_clear(&kv, !!retval);
-- 
2.19.1-708-g4ede3d42df



Re: [PATCH v4 0/5] am/rebase: share read_author_script()

2018-10-31 Thread Junio C Hamano
Phillip Wood  writes:

> From: Phillip Wood 
>
> Sorry for the confusion with v3, here are the updated patches.
>
> Thanks to Junio for the feedback on v2. I've updated patch 4 based on
> those comments, the rest are unchanged.

The mistake of overwriting -1 (i.e. we earlier detected a dup) with
the index of the third instance of the same variable originates at
[2/5], so updating [4/5] without fixing it at its source would mean
[4/5] is not a pure code movement to make it available to libgit
users---it instead hides a (not so important) bugfix in it.

>
> v1 cover letter:
>
> This is a follow up to pw/rebase-i-author-script-fix, it reduces code
> duplication and improves rebase's parsing of the author script. After
> this I'll do another series to share the code to write the author
> script.
>
>
> Phillip Wood (5):
>   am: don't die in read_author_script()
>   am: improve author-script error reporting
>   am: rename read_author_script()
>   add read_author_script() to libgit
>   sequencer: use read_author_script()
>
>  builtin/am.c |  60 ++--
>  sequencer.c  | 192 ---
>  sequencer.h  |   3 +
>  3 files changed, 128 insertions(+), 127 deletions(-)


Re: [PATCH 1/1] Use correct /dev/null for UNIX and Windows

2018-10-31 Thread Junio C Hamano
Johannes Schindelin  writes:

> Indeed, the patch in question regards something I consider outside Git for
> Windows' realm. As Chris said, you can run this script from a PowerShell
> prompt, without any Git Bash (and without Git's Perl) involved.
>
> I am fine with this patch, as long as the author name is fixed to match
> the name in the Signed-off-by: footer ;-) [*1*]

Thanks, I'll find a corrected patch on the list (or manufacture it
out of the original) and queue, then.



Re: [PATCH v2] completion: use builtin completion for format-patch

2018-10-31 Thread Junio C Hamano
Duy Nguyen  writes:

>> -__git_format_patch_options="
>> -   --stdout --attach --no-attach --thread --thread= --no-thread
>> -   --numbered --start-number --numbered-files --keep-subject --signoff
>> -   --signature --no-signature --in-reply-to= --cc= --full-index --binary
>> -   --not --all --cover-letter --no-prefix --src-prefix= --dst-prefix=
>> -   --inline --suffix= --ignore-if-in-upstream --subject-prefix=
>> -   --output-directory --reroll-count --to= --quiet --notes
>> -"
>> -
>>  _git_format_patch ()
>>  {
>> case "$cur" in
>> @@ -1550,7 +1541,7 @@ _git_format_patch ()
>> return
>> ;;
>> --*)
>> -   __gitcomp "$__git_format_patch_options"
>> +   __gitcomp_builtin format-patch
>
> We do want to keep some extra options back because __gitcomp_builtin
> cannot show all supported options of format-patch. The reason is a
> subset of options is handled by setup_revisions() which does not have
> --git-completion-helper support.

Yeah, that is one of the differences I saw compared to the older
one; thanks for being a careful reviewer.


>> @@ -2080,16 +2071,19 @@ _git_send_email ()
>> return
>> ;;
>> --*)
>> -   __gitcomp "--annotate --bcc --cc --cc-cmd --chain-reply-to
>> -   --compose --confirm= --dry-run --envelope-sender
>> -   --from --identity
>> -   --in-reply-to --no-chain-reply-to 
>> --no-signed-off-by-cc
>> -   --no-suppress-from --no-thread --quiet --reply-to
>> -   --signed-off-by-cc --smtp-pass --smtp-server
>> -   --smtp-server-port --smtp-encryption= --smtp-user
>> -   --subject --suppress-cc= --suppress-from --thread 
>> --to
>> -   --validate --no-validate
>> -   $__git_format_patch_options"
>> +   __gitcomp "--all --annotate --attach --bcc --binary --cc 
>> --cc= --cc-cmd
>> +   --chain-reply-to --compose --confirm= --cover-letter 
>> --dry-run
>> +   --dst-prefix= --envelope-sender --from --full-index 
>> --identity
>> +   --ignore-if-in-upstream --inline --in-reply-to 
>> --in-reply-to=
>> +   --keep-subject --no-attach --no-chain-reply-to 
>> --no-prefix
>> +   --no-signature --no-signed-off-by-cc 
>> --no-suppress-from --not --notes
>> +   --no-thread --no-validate --numbered --numbered-files
>> +   --output-directory --quiet --reply-to --reroll-count 
>> --signature
>> +   --signed-off-by-cc --signoff --smtp-encryption= 
>> --smtp-pass
>> +   --smtp-server --smtp-server-port --smtp-user 
>> --src-prefix=
>> +   --start-number --stdout --subject --subject-prefix= 
>> --suffix=
>> +   --suppress-cc= --suppress-from --thread --thread= 
>> --to --to=
>> +   --validate"
>
> I have no comment about this. In an ideal world, sendemail.perl could
> be taught to support --git-completion-helper but I don't think my
> little remaining Perl knowledge (or time) is enough to do it. Perhaps
> this will do. I don't know.

So "all", "attach", etc. are added to this list while these similar
options are lost from the other variable?  Is this a good trade-off?
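The gap Duy describes can be seen by asking the builtin itself;
`--git-completion-helper` (the machinery behind `__gitcomp_builtin`,
supported by parse-options builtins since Git 2.18) lists only the options
format-patch parses directly. This sketch runs in a scratch repository:

```shell
set -e
tmp=$(mktemp -d)
cd "$tmp"
git init -q   # format-patch needs a repository even just to list options
# options parsed by format-patch itself are advertised:
git format-patch --git-completion-helper | tr ' ' '\n' | grep -x -- --stdout
# options handled by setup_revisions(), e.g. --not, are absent:
if git format-patch --git-completion-helper | tr ' ' '\n' | grep -qx -- --not; then
  echo "--not advertised"
else
  echo "--not missing (handled by setup_revisions)"
fi
```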


Re: [PATCH 12/12] fsck: mark strings for translation

2018-10-31 Thread Jiang Xin
Ævar Arnfjörð Bjarmason  wrote on Tue, Oct 30, 2018 at 1:43 AM:
> On Mon, Oct 29 2018, Ævar Arnfjörð Bjarmason wrote:
> Unlike the rest of our stack where we need to support however many years
> old tools we can freely rely on bleeding-edge gettext features, since
> the only person we need to convince to upgrade is Jiang. I.e. he accepts
> the PRs from translators, (presumably) runs msgfmt --check and then
> submits the result to Junio.

I used a shell script to check commits before I send a pull request to Junio.
This script is in the po-helper branch, see:
https://github.com/git-l10n/git-po/tree/po-helper

It can catch unmatched '\n' errors (missing or unnecessary '\n').

$ git diff
   diff --git a/po/zh_CN.po b/po/zh_CN.po
   index eabd4d1f8e..6b0d9ebc60 100644
   --- a/po/zh_CN.po
   +++ b/po/zh_CN.po
   @@ -5157,7 +5157,7 @@ msgstr  "略过补丁 '%s'。"
#
#, perl-format
msgid   "Skipping %s with backup suffix '%s'.\n"
   -msgstr  "略过 %s 含备份后缀 '%s'。\n"
   +msgstr  "略过 %s 含备份后缀 '%s'。"
#
#, c-format
msgid   "Skipping repository %s\n"

   $ LC_ALL=C po-helper.sh check
   
   bg.po : 3958 translated messages.
   ca.po : 3328 translated messages, 18 fuzzy translations, 30
untranslated messages.
   de.po : 3958 translated messages.
   es.po : 3958 translated messages.
   fr.po : 3957 translated messages, 1 fuzzy translation.
   is.po : 14 translated messages.
   it.po : 716 translated messages, 350 untranslated messages.
   ko.po : 3608 translated messages.
   pt_PT.po  : 3198 translated messages.
   ru.po : 3366 translated messages, 594 untranslated messages.
   sv.po : 3958 translated messages.
   vi.po : 3958 translated messages.
   zh_CN.po  : po/zh_CN.po:19717: 'msgid' and 'msgstr' entries do not
both end with '\n'
   msgfmt: found 1 fatal error
   3958 translated messages.
   
   Show updates of git.pot...

   # Nothing changed. (run 'make pot' first)
   
   Check commits...

   0 commits checked complete.
   
   Note: If you want to check for upstream l10n update, run:
   Note:
   Note: po-helper.sh check update 
   

So, no worry about it. BTW, I agree with Jonathan.

> Jonathan said:
> IMHO the advantage of leaving the \n out in the message is not so much
> that we don't trust translators but more that where we can make their
> lives easier, we should.

--
Jiang Xin


Re: using --force-with-lease after git rebase

2018-10-31 Thread Junio C Hamano
Alexander Mills  writes:

> I have been confused about the need for --force-with-lease after rebasing
>
> Imagine I have a feature branch:
>
> git checkout --no-track -b 'feature' 'origin/dev'
> git push -u origin feature
>
> I do some work, and then I rebase against origin/dev to keep up to
> date with the integration branch.
>
> git fetch origin/dev
> git rebase origin/dev
>
> then I try to push to the remote
>
> git push origin feature
>
> but that is rejected, I have to do:

This all depends on how the "dev" branch at the "origin" remote
relates to the branch at "origin" you are updating with the commit at
the tip of the "feature" branch.  Are you pushing to the "feature"
branch?

If so, then the rejection is very much expected.  At this point, the
history that the "feature" branch at the "origin" remote is seeing
would look like this:

---X---o---o---o---o---A
    \
     x---x---x---x---B

where (X) is where you started the old iteration of the "feature"
branch forking off of the "dev" branch, (A) is the tip of that old
iteration of the "feature" branch, and (B) is the tip of the new
iteration of the "feature" branch you are trying to update with.

The "origin" repository does not know WHY B is not a fast-forward of
A.  The only thing it knows is that you are discarding the work done
in commits (o) while attempting to publish commits (x).  If it is
intentional, then that's fine, but it does not know (x) are
replacements for (o) due to rebasing, so it errs on the side of the
caution.

With the "--force-with-lease=feature:A" option, you can tell the
other side: "it is OK if this push does not fast-forward, as long as
I am updating from A" [*1*].

"--force-with-lease" without saying which commit you are expecting
to discard makes Git on the sending side _guess_.  Depending on what
you do locally, it can make a wrong guess, so I would not recommend
such a use, but if you saw it succeed and if you did not lose
commits at the "origin", then it may have guessed correctly ;-)


[Footnote]

*1* Telling what the value of 'A' is to the other side is important,
 as you are essentially saying that 'B' has everything you want to
 resurrect from 'A'.  Imagine that somebody else pushed to update
 "feature" at the "origin" remote from 'A' to 'C' (or if you did so
 and forgot about it) and then you tried to push 'B' after rebasing.

                         C
                        /
 ---X---o---o---o---o---A
     \
      x---x---x---x---B

 As far as you (who rebased) were concerned, 'B' is equivalent to
 (or "an improved version of") 'A' and you want the push that does
 not fast-forward to go through to replace 'A' with 'B'.  By telling
 "I am replacing A with B" (instead of saying "I am replacing
 whatever with B", which is what "--force" is), the receiving side
 at the "origin" repository can notice that there was another update
 by somebody else's push to the branch while you are preparing 'B'
 and the tip of "feature" is no longer at 'A', and reject the push
 in order to prevent you from losing the work between 'A' and 'C'.


Re: using --force-with-lease after git rebase

2018-10-31 Thread brian m. carlson
On Wed, Oct 31, 2018 at 12:13:06PM -0700, Alexander Mills wrote:
> I have been confused about the need for --force-with-lease after rebasing
> 
> Imagine I have a feature branch:
> 
> git checkout --no-track -b 'feature' 'origin/dev'
> git push -u origin feature
> 
> I do some work, and then I rebase against origin/dev to keep up to
> date with the integration branch.
> 
> git fetch origin/dev
> git rebase origin/dev
> 
> then I try to push to the remote
> 
> git push origin feature
> 
> but that is rejected, I have to do:
> 
> git push --force-with-lease origin feature
> 
> why is that? Why do I need to force push my feature branch to the
> remote tracking branch after rebasing against the integration branch?

When you perform a push to an existing branch, Git checks to make sure
that the push is a fast-forward; that is, that your changes are a strict
superset of the commits that are in that branch.  When you rebase a
branch, you rewrite the commits and their object IDs (their hashes).
Since you've rewritten some of the commits that were on the version of
your branch that you pushed to the server, the commits you want to push
are no longer a strict superset, and Git requires a force push.

This provision is in place to prevent data loss.  For example, if you
and a colleague are both collaborating on a branch that isn't rebased,
you would want to avoid pushing changes that overwrote those that your
colleague had made.  Git making your push fail is its way of helping
you realize that there are changes you have not included and making you
decide how you want to handle them.

You strictly do not need to use --force-with-lease; you could just use
--force instead.  But using --force-with-lease over --force when
possible ensures that you are overwriting what you think you're
overwriting, and not additional working changes that someone else has
made, so it's a good practice.
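Both the rejection and the lease can be reproduced with a pair of
throwaway repositories (all repository and branch names here are invented
for the example):

```shell
set -e
tmp=$(mktemp -d)
cd "$tmp"
git init -q --bare origin.git
git clone -q origin.git work 2>/dev/null
cd work
git config user.email you@example.com
git config user.name "You"
git checkout -q -b dev
echo base >file && git add file && git commit -q -m base        # X
git push -q origin dev
git checkout -q -b feature
echo feature >feature.txt && git add feature.txt
git commit -q -m feature                                        # A
git push -q -u origin feature
git checkout -q dev                                             # upstream moves on
echo more >>file && git add file && git commit -q -m upstream   # o
git push -q origin dev
git checkout -q feature
git rebase -q dev                                               # rewrites A into B
if git push origin feature >/dev/null 2>&1; then
  echo "unexpected: plain push accepted"
else
  echo "plain push rejected (non-fast-forward)"
fi
# refs/remotes/origin/feature still records A, the remote is still at A,
# so the lease holds and the rewritten history is accepted:
git push -q --force-with-lease origin feature
echo "force-with-lease push accepted"
```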
-- 
brian m. carlson: Houston, Texas, US
OpenPGP: https://keybase.io/bk2204


signature.asc
Description: PGP signature


Re: [BUG?] protocol.version=2 sends HTTP "Expect" headers

2018-10-31 Thread brian m. carlson
On Wed, Oct 31, 2018 at 12:03:53PM -0400, Jeff King wrote:
> Since 959dfcf42f (smart-http: Really never use Expect: 100-continue,
> 2011-03-14), we try to avoid sending "Expect" headers, since some
> proxies apparently don't handle them well. There we have to explicitly
> tell curl not to use them.
> 
> The exception is large requests with GSSAPI, as explained in c80d96ca0c
> (remote-curl: fix large pushes with GSSAPI, 2013-10-31).
> 
> However, Jon Simons noticed that when using protocol.version=2, we've
> started sending Expect headers again. That's because rather than going
> through post_rpc(), we push the stateless data through a proxy_state
> struct. And in proxy_state_init(), when we set up the headers, we do not
> disable curl's Expect handling.
> 
> So a few questions:
> 
>   - is this a bug or not? I.e., do we still need to care about proxies
> that can't handle Expect? The original commit was from 2011. Maybe
> things are better now. Or maybe that's blind optimism.
> 
> Nobody has complained yet, but that's probably just because v2 isn't
> widely deployed yet.

HTTP/1.1 requires support for 100 Continue on the server side and in
proxies (it is mandatory to implement).  The original commit disabling
it (959dfcf42f ("smart-http: Really never use Expect: 100-continue",
2011-03-14)), describes proxies as the reason for disabling it.

It's my understanding that all major proxies (including, as of version
3.2, Squid) support HTTP/1.1 at this point.  Moreover, Kerberos is more
likely to be used in enterprises, where proxies (especially poorly
behaved and outright broken proxies) are more common.  We haven't seen
any complaints about Kerberos support in a long time.  This leads me to
believe that things generally work[0].

Finally, modern versions of libcurl also have a timeout after which they
assume that the server is not going to respond and just send the request
body anyways.  This makes the issue mostly moot.

>   - alternatively, should we just leave it on for v2, and provide a
> config switch to disable it if you have a crappy proxy? I don't know
> how widespread the problem is, but I can imagine that the issue is
> subtle enough that most users wouldn't even know.

For the reasons I mentioned above, I'd leave it on for now.  Between
libcurl and better proxy support, I think this issue is mostly solved.
If we need to fix it in the future, we can, and people can fall back to
older protocols via config in the interim.

[0] In some environments, people use SSH because the proxy breaks
everything that looks like HTTP, but that's beyond the scope of this
discussion.
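The fallback brian mentions is a single config setting; `protocol.version`
accepts 0, 1, or 2. This sketch points HOME at a scratch directory so it
does not touch any real configuration:

```shell
set -e
tmp=$(mktemp -d)
# pin the old wire protocol for every remote operation of this "user"
HOME="$tmp" XDG_CONFIG_HOME="$tmp" git config --global protocol.version 0
HOME="$tmp" XDG_CONFIG_HOME="$tmp" git config --global protocol.version
```

Dropping the setting again (or setting it to 2) restores the default
negotiation.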
-- 
brian m. carlson: Houston, Texas, US
OpenPGP: https://keybase.io/bk2204


signature.asc
Description: PGP signature


Re: [PATCH] pretty: Add %(trailer:X) to display single trailer

2018-10-31 Thread Anders Waldenborg


Jeff King writes:

> On the other hand, if the rule were not "this affects the next
> placeholder" but had a true ending mark, then we could make a real
> parse-tree out of it, and format chunks of placeholders. E.g.:
>
>   %(format:lpad=30,filename)%(subject) %(authordate)%(end)
>
> would pad and format the whole string with two placeholders. I know that
> going down this road eventually involves reinventing XML, but I think
> having an actual tree structure may not be an unreasonable thing to
> shoot for.

Yes. I'm thinking that with [] for formatting specifiers and () for
placeholders, {} would be available for nesting. E.g:

   %[lpad=30,mangle]{%(subject) %ad%}


> My main concern for now is to avoid introducing new
> syntax that we'll be stuck with forever, even though it may later become
> redundant (or worse, create parsing ambiguities).

Agreed.

I'm planning to work on the initial "trailer:key=" part later this
week. Maybe I can play around with different formatting options and see
how it affects the parser.
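For reference, the closest feature that stock Git ended up shipping spells
this `%(trailers:key=<key>)` rather than `%(trailer:X)` (it needs Git 2.22
or later); a quick sketch in a scratch repository:

```shell
set -e
tmp=$(mktemp -d)
cd "$tmp"
git init -q
git config user.email you@example.com
git config user.name "You"
# second -m paragraph becomes the trailer block
git commit -q --allow-empty -m "subject" \
    -m "Reviewed-by: A U Thor <author@example.com>"
git log -1 --format='%(trailers:key=Reviewed-by)'
# prints: Reviewed-by: A U Thor <author@example.com>
```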


[PATCH v2 0/1] DiffHighlight.pm: Use correct /dev/null for UNIX and Windows

2018-10-31 Thread Chris. Webster via GitGitGadget
Use File::Spec->devnull() for output redirection to avoid messages when
Windows version of Perl is first in path. The message 'The system cannot
find the path specified.' is displayed each time git is run to get colors.

Chris. Webster (1):
  diff-highlight: Use correct /dev/null for UNIX and Windows

 contrib/diff-highlight/DiffHighlight.pm | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)


base-commit: c670b1f876521c9f7cd40184bf7ed05aad843433
Published-As: 
https://github.com/gitgitgadget/git/releases/tags/pr-59%2Fwebstech%2Fmaster-v2
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git 
pr-59/webstech/master-v2
Pull-Request: https://github.com/gitgitgadget/git/pull/59

Range-diff vs v1:

 1:  8159cbd1b8 ! 1:  bcbffa1411 Use correct /dev/null for UNIX and Windows
 @@ -1,6 +1,6 @@
 -Author: chris 
 +Author: Chris. Webster 
  
 -Use correct /dev/null for UNIX and Windows
 +diff-highlight: Use correct /dev/null for UNIX and Windows
  
  Use File::Spec->devnull() for output redirection to avoid messages
  when Windows version of Perl is first in path.  The message 'The

-- 
gitgitgadget


Re: [PATCH v4 10/12] Add a base implementation of SHA-256 support

2018-10-31 Thread brian m. carlson
On Mon, Oct 29, 2018 at 09:39:33AM +0900, Junio C Hamano wrote:
> "brian m. carlson"  writes:
> > This may not strictly be needed, but removing it makes the header no
> > longer self-contained, which blows up my (and others') in-editor
> > linting.
> 
> That sounds like bending backwards to please tools, though.  Can't
> these in-editor linting learn the local rules of the codebase they
> are asked to operate on?

Doing so involves writing (in my case) Vim code to configure settings
for this repo.

Since language linting tools invoke compilers and other language
runtimes, you need to specify command-line arguments to those tools, and
in general, that's not a safe thing you can read from the repository
configuration, since just viewing files should not be able to execute
arbitrary code[0].  Languages such as Perl which can execute arbitrary
code with compile checks have to be enabled explicitly with ALE (which
is what I'm using).

I need to do a reroll to address some other issues people brought up, so
I can remove this line.

[0] Pass "-o .bashrc" to the preprocessor, for example.
-- 
brian m. carlson: Houston, Texas, US
OpenPGP: https://keybase.io/bk2204


signature.asc
Description: PGP signature


Re: Git Evolve

2018-10-31 Thread Stefan Xenos
Sorry it's taken awhile for me to get back to this thread. I've been
keeping my open source contributions timeboxed, and had to work
through a bit of a backlog before this email thread got back to the
front of the queue.

> What would the command-line experience look like for this workflow? Be 
> specific, including examples!

Yes, I agree that's critical. I'm not going to try to improvise an
answer over email, but I will be sure to include such examples when I
submit the patch to Documentation/technical, after some careful
thought.

> How does one create a commit that obsoletes another?

I was planning to classify all git commands that create commits as
either "modifications" or "clones". Modification commands (such as
rebase or commit --amend) would automatically obsolete their
predecessors. Clone commands (such as cherry-pick) would create copies
without obsoleting the original.
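The modification half of that distinction can be illustrated with today's commands: an amend already creates a new commit object while the predecessor survives (still reachable via the reflog), which is exactly the old-to-new pair an obsolescence marker would record explicitly. A minimal sketch, using only plain git in a throwaway repo:

```shell
set -e
tmp=$(mktemp -d)
git init -q "$tmp/repo" && cd "$tmp/repo"
git -c user.name=t -c user.email=t@example.com \
	commit -q --allow-empty -m original
old=$(git rev-parse HEAD)
git -c user.name=t -c user.email=t@example.com \
	commit -q --amend --allow-empty -m amended
new=$(git rev-parse HEAD)
# the amended-away commit object still exists; an evolve-style
# marker would record the old -> new edge instead of leaving it
# implicit in the reflog
test "$old" != "$new"
git cat-file -t "$old"
```

The `cat-file -t` call confirms the predecessor is still a live commit object after the amend.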

> Are we in the middle of an interactive rebase, or are we simply checking out 
> the commit?

To obsolete a commit, you'd just check it out and amend. You really
wouldn't want to have to do anything extra or put yourself in a
special mode to mark things as obsolete or they wouldn't be much of a
convenience. However, if you were to run the "evolve" command (the
thing that reads the obsolescence markers) and it encountered errors,
you'd be in the middle of something very much like an interactive
rebase.

> How does a user keep track of their progress in a topic?
> How do I view which commits in my topic are obsolete, and to what commits?
> Do obsolescence markers live in the ref space? (This is one way to help 
> answer the question above.)

The answers to these questions are more complicated. I'll be sure to
address them in the technical proposal, but I wouldn't want create a
misunderstanding with an incomplete explanation.

> Can I make multiple commits obsolete into one commit (merge patches)? Can I 
> make my commit obsolete in multiple ways (split a patch)? How is this 
> communicated to the user?

Yes and yes. When a single commit is obsoleted by one or more
non-obsolete commits, this is called divergence and it needs to be
resolved by the user (the evolve command will stop when it encounters
divergence, much like a rebase will stop when it encounters a merge
conflict). Possible resolutions are to discard one of the two
non-obsolete commits by marking it as obsolete, merge the two, or
keep both by cloning one of them and marking it as no longer
obsoleting its predecessor.

Unnecessary divergence isn't a good thing since it requires user
intervention to resolve. Splitting a patch doesn't need to create
divergence (and probably shouldn't). If you split a commit, the
earlier one would be a completely new commit and the latter one would
obsolete the original. Doing it the other way around creates a merge
conflict between the later commit and anything that followed the
original. If both commits obsolete the original, it creates divergence
- and either divergence or a merge would get the user involved
unnecessarily.

A more likely scenario where divergence would occur is if two
different users try to amend the same commit and then share their
results -- or if a user amends a commit, resets back to the original,
and then amends it again.

> Count me in to review the technical details.

Will do!

On Mon, Oct 1, 2018 at 5:37 AM, Derrick Stolee  wrote:
> On 9/29/2018 7:00 PM, Stefan Xenos wrote:
>>
>> Hello, List!
>
>
> Hello! Welcome.
>
>> I'm interested in porting something like Mercurial's evolve command to
>> Git. I'll be following up with a formal proposal shortly, but before I
>> do I thought I'd introduce myself to the list and find out if anyone
>> else is interested in this topic.
>
>
> I'm CC'ing some contributors who have also expressed interest in this topic.
>
>> What is the evolve command?
>
>
> I'm snipping the rest of your thread because I'm vaguely familiar with how
> this is used in hg, but I haven't used it myself. Instead, I'm going to ask
> you the same questions I asked the last time I had a conversation about this
> with someone. In my opinion, these questions should have good answers before
> we start working on the solution, or else we could paint ourselves into a
> corner as we build the first pieces.
>
> ---
>
> What would the command-line experience look like for this workflow? Be
> specific, including examples!
>
> How does one create a commit that obsoletes another? Are we in the middle of
> an interactive rebase, or are we simply checking out the commit? How does a
> user keep track of their progress in a topic?
>
> How do I view which commits in my topic are obsolete, and to what commits?
>
> If I want to obsolete commits on one machine and then finish the work on
> another machine, how do I do that? Similarly: how can I share obsolete
> commits with other users so they can apply them (or not)?
>
> Do obsolescence markers live in the ref space? (This is one way to help
> answer the question above.)
>

[PATCH 1/1] poll: use GetTickCount64() to avoid wrap-around issues

2018-10-31 Thread Steve Hoelzer via GitGitGadget
From: Steve Hoelzer 

From Visual Studio 2015 Code Analysis: Warning C28159: Consider using
'GetTickCount64' instead of 'GetTickCount'.

Reason: GetTickCount() overflows roughly every 49 days. Code that does
not take that into account can loop indefinitely. GetTickCount64()
operates on 64 bit values and does not have that problem.

Note: this patch has been carried in Git for Windows for almost two
years, but with a fallback for Windows XP, as the GetTickCount64()
function is only available on Windows Vista and later. However, in the
meantime we require Vista or later, hence we can drop that fallback.

Signed-off-by: Steve Hoelzer 
Signed-off-by: Johannes Schindelin 
---
 compat/poll/poll.c | 10 +++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/compat/poll/poll.c b/compat/poll/poll.c
index ad5dcde439..4abbfcb6a4 100644
--- a/compat/poll/poll.c
+++ b/compat/poll/poll.c
@@ -18,6 +18,9 @@
You should have received a copy of the GNU General Public License along
with this program; if not, see .  */
 
+/* To bump the minimum Windows version to Windows Vista */
+#include "git-compat-util.h"
+
 /* Tell gcc not to warn about the (nfd < 0) tests, below.  */
 #if (__GNUC__ == 4 && 3 <= __GNUC_MINOR__) || 4 < __GNUC__
 # pragma GCC diagnostic ignored "-Wtype-limits"
@@ -449,7 +452,8 @@ poll (struct pollfd *pfd, nfds_t nfd, int timeout)
   static HANDLE hEvent;
   WSANETWORKEVENTS ev;
   HANDLE h, handle_array[FD_SETSIZE + 2];
-  DWORD ret, wait_timeout, nhandles, start = 0, elapsed, orig_timeout = 0;
+  DWORD ret, wait_timeout, nhandles, elapsed, orig_timeout = 0;
+  ULONGLONG start = 0;
   fd_set rfds, wfds, xfds;
   BOOL poll_again;
   MSG msg;
@@ -465,7 +469,7 @@ poll (struct pollfd *pfd, nfds_t nfd, int timeout)
   if (timeout != INFTIM)
 {
   orig_timeout = timeout;
-  start = GetTickCount();
+  start = GetTickCount64();
 }
 
   if (!hEvent)
@@ -614,7 +618,7 @@ restart:
 
   if (!rc && orig_timeout && timeout != INFTIM)
 {
-  elapsed = GetTickCount() - start;
+  elapsed = (DWORD)(GetTickCount64() - start);
   timeout = elapsed >= orig_timeout ? 0 : orig_timeout - elapsed;
 }
 
-- 
gitgitgadget


[PATCH 0/1] Make compat/poll safer on Windows

2018-10-31 Thread Johannes Schindelin via GitGitGadget
This is yet another piece from the Git for Windows cake. It avoids a
wrap-around in the poll emulation on Windows that occurs every 49 days.

Steve Hoelzer (1):
  poll: use GetTickCount64() to avoid wrap-around issues

 compat/poll/poll.c | 10 +++---
 1 file changed, 7 insertions(+), 3 deletions(-)


base-commit: 4ede3d42dfb57f9a41ac96a1f216c62eb7566cc2
Published-As: 
https://github.com/gitgitgadget/git/releases/tags/pr-64%2Fdscho%2Fmingw-safer-compat-poll-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git 
pr-64/dscho/mingw-safer-compat-poll-v1
Pull-Request: https://github.com/gitgitgadget/git/pull/64
-- 
gitgitgadget


Re: [RFC v1] Add virtual file system settings and hook proc

2018-10-31 Thread Ben Peart




On 10/31/2018 3:11 PM, Duy Nguyen wrote:

not really a review, just a couple quick notes..



Perfect!  As an RFC, I'm more looking for high level thoughts/notes than 
a style/syntax code review.



On Tue, Oct 30, 2018 at 9:40 PM Ben Peart  wrote:


From: Ben Peart 

On index load, clear/set the skip worktree bits based on the virtual
file system data. Use virtual file system data to update skip-worktree
bit in unpack-trees. Use virtual file system data to exclude files and
folders not explicitly requested.

Signed-off-by: Ben Peart 
---

We have taken several steps to make git perform well on very large repos.
Some of those steps include: improving underlying algorithms, utilizing
multi-threading where possible, and simplifying the behavior of some commands.
These changes typically benefit all git repos to varying degrees.  While
these optimizations all help, they are insufficient to provide adequate
performance on the very large repos we often work with.

To make git perform well on the very largest repos, we had to make more
significant changes.  The biggest performance win by far is the work we have
done to make git operations O(modified) instead of O(size of repo).  This
takes advantage of the fact that the number of files a developer has modified
is a tiny fraction of the overall repo size.

We accomplished this by utilizing the existing internal logic for the skip
worktree bit and excludes to tell git to ignore all files and folders other
than those that have been modified.  This logic is driven by an external
process that monitors writes to the repo and communicates the list of files
and folders with changes to git via the virtual file system hook in this patch.

The external process maintains a list of files and folders that have been
modified.  When git runs, it requests the list of files and folders that
have been modified via the virtual file system hook.  Git then sets/clears
the skip-worktree bit on the cache entries and builds a hashmap of the
modified files/folders that is used by the excludes logic to avoid scanning
the entire repo looking for changes and untracked files.
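The skip-worktree state being driven here is ordinary per-entry index data that can be inspected and flipped by hand; a small sketch of the bit the hook sets in bulk, using plain git with no VFS hook assumed:

```shell
set -e
tmp=$(mktemp -d)
git init -q "$tmp/repo" && cd "$tmp/repo"
echo data >tracked.txt
git add tracked.txt
git -c user.name=t -c user.email=t@example.com commit -qm init
# flip the same per-entry bit the virtual file system data drives:
# git will now assume the file is unchanged and skip stat()ing it
git update-index --skip-worktree tracked.txt
flag=$(git ls-files -v tracked.txt | cut -c1)
echo "$flag"   # 'S' marks skip-worktree entries
```

`git ls-files -v` prints a lowercase/uppercase status letter per entry; `S` is the skip-worktree marker.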

With this system, we have been able to make local git command performance on
extremely large repos (millions of files, 1/2 million folders) entirely
manageable (30 second checkout, 3.5 seconds status, 4 second add, 7 second
commit, etc).

Our desire is to eliminate all custom patches in our fork of git.  To that
end, I'm submitting this as an RFC to see how much interest there is and how
much willingness to take this type of change into git.


Most of these paragraphs (perhaps except the last one) should be part
of the commit message. You describe briefly what the patch does but
it's even more important to say why you want to do it.


+core.virtualFilesystem::
+   If set, the value of this variable is used as a command which
+   will identify all files and directories that are present in
+   the working directory.  Git will only track and update files
+   listed in the virtual file system.  Using the virtual file system
+   will supersede the sparse-checkout settings which will be ignored.
+   See the "virtual file system" section of linkgit:githooks[6].


It sounds like "virtual file system" is just one of the use cases for
this feature, which is more about a dynamic source of sparse-checkout
bits. Perhaps name the config key after sparse checkout
instead of naming it after a use case.


It's more than a dynamic sparse-checkout because the same list is also 
used to exclude any file/folder not listed.  That means any file not 
listed won't ever be updated by git (like in 'checkout' for example) so 
'stale' files could be left in the working directory.  It also means git 
won't find new/untracked files unless they are specifically added to the 
list.




This is a hook. I notice we start to avoid adding real hooks and just
add config keys instead. Eventually we should have config-based hooks,
but if we're going to add more like this, I think these should be in a
separate section, hook.virtualFileSystem or something.



That is a great idea.  I don't personally like specifying the hook as 
the 'flag' for whether a feature should be used.  I'd rather have it be 
a bool (enable the feature? true/false) and 1) either have the hook name 
hard coded (like most existing hooks) or 2) as you suggest add a 
consistent way to have config-based hooks.  Config based hooks could 
also help provide a consistent way to configure them using GIT_TEST_* 
environment variables for testing.



I don't think the superseding makes sense. There's no reason this
could not be used in combination with $GIT_DIR/info/sparse-checkout.
If you don't want both, disable the other.

One last note: since this is related to the filesystem, shouldn't it
be part of fsmonitor (the protocol, not the implementation)? Then
watchman users could use it too.



To get this to work properly takes a lot more logic than exists in 

Re: [PATCH 3/3] cat-file: handle streaming failures consistently

2018-10-31 Thread Jeff King
On Wed, Oct 31, 2018 at 01:38:59PM -0400, Eric Sunshine wrote:

> On Tue, Oct 30, 2018 at 07:23:38PM -0400, Jeff King wrote:
> > There are three ways to convince cat-file to stream a blob:
> > 
> >   - cat-file -p $blob
> > 
> >   - cat-file blob $blob
> > 
> >   - echo $batch | cat-file --batch
> > 
> > In the first two, we simply exit with the error code of
> > streaw_blob_to_fd(). That means that an error will cause us
> 
> Your "m" got confused and ended up upside-down.

Heh. I'm not sure how I managed that. They're not exactly next to each
other on a qwerty keyboard.

-Peff


Re: [PATCH] pretty: Add %(trailer:X) to display single trailer

2018-10-31 Thread Jeff King
On Mon, Oct 29, 2018 at 06:05:34PM +0100, Anders Waldenborg wrote:

> I'll start by reworking my patch to handle %(trailers:key=X)  (I'll
> assume keys never contain ')' or ','), and ignore any formatting until
> the way forward there is decided (see below).

IMHO that is probably an acceptable tradeoff. We haven't really made any
rules for quoting arbitrary values in other %() sequences. I think it's
something we may want to have eventually, but as long as the rule for
now is "you can't do that", I think it would be OK to loosen it later.
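For the key-matching half, the trailer parsing machinery already exists outside the pretty-printer; a quick sketch with `git interpret-trailers`, whose output is what a per-key placeholder would then filter at log time (a repo is initialized only because git commands generally want one):

```shell
set -e
tmp=$(mktemp -d)
git init -q "$tmp/repo" && cd "$tmp/repo"
msg='subject line

body text

Ticket: ABC-123
Signed-off-by: A U Thor <author@example.com>'
# --parse is shorthand for --only-trailers --only-input --unfold:
# emit just the trailer block of the message
out=$(printf '%s\n' "$msg" | git interpret-trailers --parse)
echo "$out"
```

The output is the two trailer lines only; filtering that down to the `Ticket:` line is the piece the proposed `%(trailers:key=X)` syntax adds.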

> > But my second question is whether we want to provide something more
> > flexible than the always-parentheses that "%d" provides. That has been a
> > problem in the past when people want to format the decoration in some
> > other way.
> 
> Maybe just like +/-/space can be used directly after %, a () pair can
> be allowed..   E.g "%d" would just be an alias for "%()D",  and for
> trailers it would be something like "%()(trailers:key=foo)"

Yeah, I was thinking that "(" was taken as a special character, but I
guess immediately followed by ")" it is easy to parse left-to-right with
no ambiguity.

Would it include the leading space, too? It would be nice if it could be
combined with "% " in an orthogonal way. I guess in theory "% ()D" would
work, but it may need some tweaks to the parsing.

> There is another special cased placeholder %f (sanitized subject line,
> suitable for a filename), which could also be changed to be a format
> specifier, allowing sanitizing anything, e.g. "%!an" for sanitized
> author name.

Yeah, I agree we should be able to sanitize anything. It's not strictly
related to your patch, though, so you may or may not want to go down
this rabbit hole. :)

> Is even the linebreak-to-comma separation a generic thing?
> "% ,()(trailers:key=Ticket)"   it starts to look a bit silly.

In theory, yeah. I agree it's getting a bit magical.

> Then there are the padding modifiers. %<() %<|(). They operate on next
> placeholder. "%<(10)%s" Is that a better syntax?
> "%()%(trailers:key=Ticket,comma)"
> 
> I can also imagine moving all these modifiers into a generic modifier
> syntax in brackets (and keeping old for backwards compat)
> %[lpad=10,ltrunc=10]s ==  %<(10,trunc)%s
> %[nonempty-prefix="%n"]GS ==  %+GS
> %[nonempty-prefix=" (",nonempty-suffix=")"]D ==  %d
> Which would mean something like this for tickets thing:
> %[nonempty-prefix=" 
> (Tickets:",nonempty-suffix=")",commaseparatelines](trailers:key=Ticket,nokey)
> which is kinda verbose.
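The padding modifiers mentioned above are already live in the pretty-printer, so their "applies to the next placeholder" behavior can be checked directly; a small sketch in a throwaway repo:

```shell
set -e
tmp=$(mktemp -d)
git init -q "$tmp/repo" && cd "$tmp/repo"
git -c user.name=t -c user.email=t@example.com commit -q --allow-empty \
	-m "a subject line longer than ten characters"
# %<(10,trunc) pads/truncates whatever the *next* placeholder
# expands to; the trailing '|' makes the column edge visible
out=$(git log -1 --format='%<(10,trunc)%s|')
echo "$out"
```

The subject is squeezed into exactly 10 columns, with `..` marking right-truncation, before the `|`.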

Yes. I had dreams of eventually stuffing all of those as options into
the placeholders themselves. So "%s" would eventually have a long-form
of "%(subject)", and in that syntax it could be:

  %(subject:lpad=10,filename)

or something. I'm not completely opposed to:

  %[lpad=10,filename]%(subject)

which keeps the "formatting" arguments out of the regular placeholders.
On the other hand, if the rule were not "this affects the next
placeholder" but had a true ending mark, then we could make a real
parse-tree out of it, and format chunks of placeholders. E.g.:

  %(format:lpad=30,filename)%(subject) %(authordate)%(end)

would pad and format the whole string with two placeholders. I know that
going down this road eventually involves reinventing XML, but I think
having an actual tree structure may not be an unreasonable thing to
shoot for.

I dunno. You certainly don't need to solve all of these issues for what
you want to do. My main concern for now is to avoid introducing new
syntax that we'll be stuck with forever, even though it may later become
redundant (or worse, create parsing ambiguities).

> > We have formatting magic for "if this thing is non-empty, then show this
> > prefix" in the for-each-ref formatter, but I'm not sure that we do in
> > the commit pretty-printer beyond "% ". I wonder if we could/should add a
> > a placeholder for "if this thing is non-empty, put in a space and
> > enclose it in parentheses".
> 
> Would there be any interest in consolidating those formatters? Even
> though they are totally separate beasts today. I think having all
> attributes available on long form (e.g "%(authorname)") in addition to
> existing short forms in pretty-formatter would make sense.

Yes, there's great interest. :)

The formats are not mutually incompatible at this point, so we should be
able to come up with a unified language that maintains backwards
compatibility. One of the tricky parts is that some of the formatters
have more information than others (for-each-ref has a ref, which may
resolve to any object type; cat-file has objects only; --pretty has only
commits).

This was the subject of last year's Outreachy work. There's still a ways
to go, but you can find some of the previous discussions and work by
searching for Olga's work in the archive:

  https://public-inbox.org/git/?q=olga+telezhnaya

I've also cc'd her here, as she's still been doing some work since then.

-Peff


Re: [RFC v1] Add virtual file system settings and hook proc

2018-10-31 Thread Ben Peart




On 10/30/2018 7:07 PM, Junio C Hamano wrote:

Ben Peart  writes:


diff --git a/config.c b/config.c
index 4051e38823..96e05ee0f1 100644
--- a/config.c
+++ b/config.c
...
@@ -2307,6 +2311,37 @@ int git_config_get_index_threads(void)
return 0; /* auto */
  }
  
+int git_config_get_virtualfilesystem(void)

+{
+   if (git_config_get_pathname("core.virtualfilesystem", &core_virtualfilesystem))
+   core_virtualfilesystem = getenv("GIT_TEST_VIRTUALFILESYSTEM");
+
+   if (core_virtualfilesystem && !*core_virtualfilesystem)
+   core_virtualfilesystem = NULL;
+
+   if (core_virtualfilesystem) {
+   /*
+* Some git commands spawn helpers and redirect the index to a 
different
+* location.  These include "difftool -d" and the sequencer
+* (i.e. `git rebase -i`, `git cherry-pick` and `git revert`) 
and others.
+* In those instances we don't want to update their temporary 
index with
+* our virtualization data.
+*/
+   char *default_index_file = xstrfmt("%s/%s", the_repository->gitdir, 
"index");
+   int should_run_hook = !strcmp(default_index_file, 
the_repository->index_file);
+
+   free(default_index_file);
+   if (should_run_hook) {
+   /* virtual file system relies on the sparse checkout 
logic so force it on */
+   core_apply_sparse_checkout = 1;
+   return 1;
+   }
+   core_virtualfilesystem = NULL;


It would be a small leak if this came from config_get_pathname(),
but if it came from $GIT_TEST_VFS env, we cannot free it X-<.

A helper function called *_get_X() that does not return X as its
return value or updating the location pointed by its *dst parameter,
and instead only stores its finding in a global variable feels
somewhat odd.  It smells more like "find out", "probe", "check",
etc.



I agree.  Frankly, I think it should just return whether it should be 
used or not (bool) and the hook name should be fixed.  I got push back 
when I did that for fsmonitor so I made this the same in an effort to 
head off that same feedback.



diff --git a/dir.c b/dir.c
index 47c2fca8dc..3097b0e446 100644
--- a/dir.c
+++ b/dir.c
@@ -21,6 +21,7 @@
  #include "ewah/ewok.h"
  #include "fsmonitor.h"
  #include "submodule-config.h"
+#include "virtualfilesystem.h"
  
  /*

   * Tells read_directory_recursive how a file or directory should be treated.
@@ -1109,6 +1110,18 @@ int is_excluded_from_list(const char *pathname,
  struct exclude_list *el, struct index_state *istate)
  {
struct exclude *exclude;
+
+   /*
+* The virtual file system data is used to prevent git from traversing
+* any part of the tree that is not in the virtual file system.  Return
+* 1 to exclude the entry if it is not found in the virtual file system,
+* else fall through to the regular excludes logic as it may further 
exclude.
+*/


This comment will sit better immediately in front of the call to "is
excluded from vfs?" helper function.


+   if (*dtype == DT_UNKNOWN)
+   *dtype = get_dtype(NULL, istate, pathname, pathlen);


We try to defer paying cost to determine unknown *dtype as late as
possible by having this call in last_exclude_matching_from_list(),
and not here.  If we are doing this, we probably should update the
callpaths that call last_exclude_matching_from_list() to make the
caller responsible for doing get_dtype() and drop the lazy finding
of dtype from the callee.  Alternatively, the new "is excluded from
vfs" helper can learn to do the lazy get_dtype() just like the
existing last_exclude_matching_from_list() does.  I suspect the
latter may be simpler.


In is_excluded_from_virtualfilesystem() dtype can't be lazy because it 
is always needed (which is why I test and die if it isn't known).  I 
considered doing the test/call to get_dtype() within 
is_excluded_from_virtualfilesystem() but didn't like making it dependent 
on istate just so I could move the get_dtype() call in one level.  It is 
functionally identical so I can easily move it in if that is preferred.





+   if (is_excluded_from_virtualfilesystem(pathname, pathlen, *dtype) > 0)
+   return 1;
+
exclude = last_exclude_matching_from_list(pathname, pathlen, basename,
  dtype, el, istate);
if (exclude)
@@ -1324,8 +1337,20 @@ struct exclude *last_exclude_matching(struct dir_struct 
*dir,
  int is_excluded(struct dir_struct *dir, struct index_state *istate,
const char *pathname, int *dtype_p)
  {
-   struct exclude *exclude =
-   last_exclude_matching(dir, istate, pathname, dtype_p);
+   struct exclude *exclude;
+
+   /*
+* The virtual file system data is used to prevent git from traversing

[PATCH 3/3] tests: optionally skip `git rebase -p` tests

2018-10-31 Thread Johannes Schindelin via GitGitGadget
From: Johannes Schindelin 

The `--preserve-merges` mode of the `rebase` command is slated to be
deprecated soon, as the more powerful `--rebase-merges` mode is
available now, and the latter was designed with the express intent to
address the shortcomings of `--preserve-merges`' design (e.g. the
inability to reorder commits in an interactive rebase).

As such, we will eventually even remove the `--preserve-merges` support,
and along with it, its tests.

In preparation for this, and also to allow the Windows phase of our
automated tests to save some much-needed time when running the test
suite, this commit introduces a new prerequisite REBASE_P, which can be
forced to being unmet by setting the environment variable
`GIT_TEST_SKIP_REBASE_P` to any non-empty string.

Signed-off-by: Johannes Schindelin 
---
 t/t3404-rebase-interactive.sh |  8 ++---
 t/t3408-rebase-multi-line.sh  |  2 +-
 t/t3409-rebase-preserve-merges.sh |  5 
 t/t3410-rebase-preserve-dropped-merges.sh |  5 
 t/t3411-rebase-preserve-around-merges.sh  |  5 
 t/t3412-rebase-root.sh| 12 
 t/t3414-rebase-preserve-onto.sh   |  5 
 t/t3418-rebase-continue.sh|  4 +--
 t/t3421-rebase-topology-linear.sh | 36 +++
 t/t3425-rebase-topology-merges.sh |  5 
 t/t5520-pull.sh   |  6 ++--
 t/t7505-prepare-commit-msg-hook.sh|  2 +-
 t/t7517-per-repo-email.sh |  6 ++--
 t/test-lib.sh |  4 +++
 14 files changed, 69 insertions(+), 36 deletions(-)

diff --git a/t/t3404-rebase-interactive.sh b/t/t3404-rebase-interactive.sh
index 99d1fb79a8..68ca8dc9bb 100755
--- a/t/t3404-rebase-interactive.sh
+++ b/t/t3404-rebase-interactive.sh
@@ -312,7 +312,7 @@ test_expect_success 'retain authorship when squashing' '
git show HEAD | grep "^Author: Twerp Snog"
 '
 
-test_expect_success '-p handles "no changes" gracefully' '
+test_expect_success REBASE_P '-p handles "no changes" gracefully' '
HEAD=$(git rev-parse HEAD) &&
set_fake_editor &&
git rebase -i -p HEAD^ &&
@@ -322,7 +322,7 @@ test_expect_success '-p handles "no changes" gracefully' '
test $HEAD = $(git rev-parse HEAD)
 '
 
-test_expect_failure 'exchange two commits with -p' '
+test_expect_failure REBASE_P 'exchange two commits with -p' '
git checkout H &&
set_fake_editor &&
FAKE_LINES="2 1" git rebase -i -p HEAD~2 &&
@@ -330,7 +330,7 @@ test_expect_failure 'exchange two commits with -p' '
test G = $(git cat-file commit HEAD | sed -ne \$p)
 '
 
-test_expect_success 'preserve merges with -p' '
+test_expect_success REBASE_P 'preserve merges with -p' '
git checkout -b to-be-preserved master^ &&
: > unrelated-file &&
git add unrelated-file &&
@@ -373,7 +373,7 @@ test_expect_success 'preserve merges with -p' '
test $(git show HEAD:unrelated-file) = 1
 '
 
-test_expect_success 'edit ancestor with -p' '
+test_expect_success REBASE_P 'edit ancestor with -p' '
set_fake_editor &&
FAKE_LINES="1 2 edit 3 4" git rebase -i -p HEAD~3 &&
echo 2 > unrelated-file &&
diff --git a/t/t3408-rebase-multi-line.sh b/t/t3408-rebase-multi-line.sh
index e7292f5b9b..d2bd7c17b0 100755
--- a/t/t3408-rebase-multi-line.sh
+++ b/t/t3408-rebase-multi-line.sh
@@ -52,7 +52,7 @@ test_expect_success rebase '
test_cmp expect actual
 
 '
-test_expect_success rebasep '
+test_expect_success REBASE_P rebasep '
 
git checkout side-merge &&
git rebase -p side &&
diff --git a/t/t3409-rebase-preserve-merges.sh 
b/t/t3409-rebase-preserve-merges.sh
index 8c251c57a6..3b340f1ece 100755
--- a/t/t3409-rebase-preserve-merges.sh
+++ b/t/t3409-rebase-preserve-merges.sh
@@ -8,6 +8,11 @@ Run "git rebase -p" and check that merges are properly carried 
along
 '
 . ./test-lib.sh
 
+if ! test_have_prereq REBASE_P; then
+   skip_all='skipping git rebase -p tests, as asked for'
+   test_done
+fi
+
 GIT_AUTHOR_EMAIL=bogus_email_address
 export GIT_AUTHOR_EMAIL
 
diff --git a/t/t3410-rebase-preserve-dropped-merges.sh 
b/t/t3410-rebase-preserve-dropped-merges.sh
index 6f73b95558..2e29866993 100755
--- a/t/t3410-rebase-preserve-dropped-merges.sh
+++ b/t/t3410-rebase-preserve-dropped-merges.sh
@@ -11,6 +11,11 @@ rewritten.
 '
 . ./test-lib.sh
 
+if ! test_have_prereq REBASE_P; then
+   skip_all='skipping git rebase -p tests, as asked for'
+   test_done
+fi
+
 # set up two branches like this:
 #
 # A - B - C - D - E
diff --git a/t/t3411-rebase-preserve-around-merges.sh 
b/t/t3411-rebase-preserve-around-merges.sh
index dc81bf27eb..fb45e7bf7b 100755
--- a/t/t3411-rebase-preserve-around-merges.sh
+++ b/t/t3411-rebase-preserve-around-merges.sh
@@ -10,6 +10,11 @@ a merge to before the merge.
 '
 . ./test-lib.sh
 
+if ! test_have_prereq REBASE_P; then
+   skip_all='skipping git rebase -p tests, as asked for'
+   

[PATCH 2/3] t3418: decouple test cases from a previous `rebase -p` test case

2018-10-31 Thread Johannes Schindelin via GitGitGadget
From: Johannes Schindelin 

It is in general a good idea for regression test cases to be as
independent of each other as possible (with the one exception of an
initial `setup` test case, which is only a test case because Git's test
suite does not have a notion of a fixture or setup).

This patch addresses one particular instance of this principle being
violated: a few test cases in t3418-rebase-continue.sh depend on a side
effect of a test case that verifies a specific `rebase -p` behavior. The
later test cases should, however, still succeed even if the `rebase -p`
test case is skipped.

Signed-off-by: Johannes Schindelin 
---
 t/t3418-rebase-continue.sh | 1 +
 1 file changed, 1 insertion(+)

diff --git a/t/t3418-rebase-continue.sh b/t/t3418-rebase-continue.sh
index 25099d715c..031e3d8ddb 100755
--- a/t/t3418-rebase-continue.sh
+++ b/t/t3418-rebase-continue.sh
@@ -177,6 +177,7 @@ test_expect_success 'setup rerere database' '
git checkout master &&
test_commit "commit-new-file-F3" F3 3 &&
test_config rerere.enabled true &&
+   git update-ref refs/heads/topic commit-new-file-F3-on-topic-branch &&
test_must_fail git rebase -m master topic &&
echo "Resolved" >F2 &&
cp F2 expected-F2 &&
-- 
gitgitgadget



[PATCH 1/3] t3404: decouple some test cases from outcomes of previous test cases

2018-10-31 Thread Johannes Schindelin via GitGitGadget
From: Johannes Schindelin 

Originally, the `--preserve-merges` option of the `git rebase` command
piggy-backed on top of the `--interactive` feature. For that reason, the
early test cases were added to the very same test script that contains
the `git rebase -i` tests: `t3404-rebase-interactive.sh`.

However, since c42abfe7857 (rebase: introduce a dedicated backend for
--preserve-merges, 2018-05-28), the `--preserve-merges` feature got its
own backend, in preparation for converting the rest of the
`--interactive` code to built-in code, written in C rather than shell.

The reason why the `--preserve-merges` feature was not converted at the
same time is that we have something much better now: `--rebase-merges`.
That option intends to supersede `--preserve-merges`, and we will
probably deprecate the latter soon.

Once `--preserve-merges` has been deprecated for a good amount of time,
it will be time to remove it, and along with it, its tests.

In preparation for that, let's make the rest of the test cases in
`t3404-rebase-interactive.sh` independent of the test cases dedicated to
`--preserve-merges`.

Signed-off-by: Johannes Schindelin 
---
 t/t3404-rebase-interactive.sh | 15 ---
 1 file changed, 8 insertions(+), 7 deletions(-)

diff --git a/t/t3404-rebase-interactive.sh b/t/t3404-rebase-interactive.sh
index ff89b6341a..99d1fb79a8 100755
--- a/t/t3404-rebase-interactive.sh
+++ b/t/t3404-rebase-interactive.sh
@@ -387,6 +387,7 @@ test_expect_success 'edit ancestor with -p' '
 '
 
 test_expect_success '--continue tries to commit' '
+   git reset --hard D &&
test_tick &&
set_fake_editor &&
test_must_fail git rebase -i --onto new-branch1 HEAD^ &&
@@ -426,7 +427,7 @@ test_expect_success C_LOCALE_OUTPUT 'multi-fixup does not 
fire up editor' '
git rebase -i $base &&
test $base = $(git rev-parse HEAD^) &&
test 0 = $(git show | grep NEVER | wc -l) &&
-   git checkout to-be-rebased &&
+   git checkout @{-1} &&
git branch -D multi-fixup
 '
 
@@ -441,7 +442,7 @@ test_expect_success 'commit message used after conflict' '
git rebase --continue &&
test $base = $(git rev-parse HEAD^) &&
test 1 = $(git show | grep ONCE | wc -l) &&
-   git checkout to-be-rebased &&
+   git checkout @{-1} &&
git branch -D conflict-fixup
 '
 
@@ -456,7 +457,7 @@ test_expect_success 'commit message retained after conflict' '
git rebase --continue &&
test $base = $(git rev-parse HEAD^) &&
test 2 = $(git show | grep TWICE | wc -l) &&
-   git checkout to-be-rebased &&
+   git checkout @{-1} &&
git branch -D conflict-squash
 '
 
@@ -481,7 +482,7 @@ test_expect_success C_LOCALE_OUTPUT 'squash and fixup generate correct log messa
grep "^# This is a combination of 3 commits\."  &&
git cat-file commit HEAD@{3} |
grep "^# This is a combination of 2 commits\."  &&
-   git checkout to-be-rebased &&
+   git checkout @{-1} &&
git branch -D squash-fixup
 '
 
@@ -494,7 +495,7 @@ test_expect_success C_LOCALE_OUTPUT 'squash ignores comments' '
git rebase -i $base &&
test $base = $(git rev-parse HEAD^) &&
test 1 = $(git show | grep ONCE | wc -l) &&
-   git checkout to-be-rebased &&
+   git checkout @{-1} &&
git branch -D skip-comments
 '
 
@@ -507,7 +508,7 @@ test_expect_success C_LOCALE_OUTPUT 'squash ignores blank lines' '
git rebase -i $base &&
test $base = $(git rev-parse HEAD^) &&
test 1 = $(git show | grep ONCE | wc -l) &&
-   git checkout to-be-rebased &&
+   git checkout @{-1} &&
git branch -D skip-blank-lines
 '
 
@@ -648,7 +649,7 @@ test_expect_success 'rebase with a file named HEAD in worktree' '
) &&
 
set_fake_editor &&
-   FAKE_LINES="1 squash 2" git rebase -i to-be-rebased &&
+   FAKE_LINES="1 squash 2" git rebase -i @{-1} &&
test "$(git show -s --pretty=format:%an)" = "Squashed Away"
 
 '
-- 
gitgitgadget
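
As an aside for readers: the `@{-1}` notation the patch switches to resolves to the previously checked-out branch, which is why the tests no longer need to hard-code `to-be-rebased`. A quick check in a throwaway repo (all names here are illustrative):

```shell
# @{-1} names whatever branch was checked out before the current one.
set -e
tmp=$(mktemp -d) && cd "$tmp"
git init -q repo && cd repo
git config user.email you@example.com && git config user.name You
git commit -q --allow-empty -m c0
git checkout -qb to-be-rebased
git checkout -qb conflict-fixup
git rev-parse --abbrev-ref @{-1}    # to-be-rebased
```

Each test checks out a topic branch from `to-be-rebased`, so `@{-1}` points back at it without naming it.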



[PATCH 0/3] tests: allow to skip git rebase -p tests

2018-10-31 Thread Johannes Schindelin via GitGitGadget
The --preserve-merges mode of the git rebase command is on its way out,
being superseded by the --rebase-merges mode. My plan is to contribute
patches to deprecate the former in favor of the latter before long.

In the meantime, it seems a bit pointless to keep running the git rebase -p 
tests, in particular in the Windows phase of the automated testing. In
preparation for skipping those tests, this patch series starts out by
decoupling test cases so that no non-rebase -p ones depend on side effects
of rebase -p ones, and it concludes with a patch that allows skipping the
rebase -p ones by setting the environment variable GIT_TEST_SKIP_REBASE_P.

In a quick 'n dirty test, skipping the rebase -p tests seems to shave off
about 8 minutes from the 1h20 running time of the test suite on Windows
(without the git svn tests, which we have skipped for a long time already, as
they are really, really slow on Windows).

Johannes Schindelin (3):
  t3404: decouple some test cases from outcomes of previous test cases
  t3418: decouple test cases from a previous `rebase -p` test case
  tests: optionally skip `git rebase -p` tests

 t/t3404-rebase-interactive.sh | 23 ---
 t/t3408-rebase-multi-line.sh  |  2 +-
 t/t3409-rebase-preserve-merges.sh |  5 
 t/t3410-rebase-preserve-dropped-merges.sh |  5 
 t/t3411-rebase-preserve-around-merges.sh  |  5 
 t/t3412-rebase-root.sh| 12 
 t/t3414-rebase-preserve-onto.sh   |  5 
 t/t3418-rebase-continue.sh|  5 ++--
 t/t3421-rebase-topology-linear.sh | 36 +++
 t/t3425-rebase-topology-merges.sh |  5 
 t/t5520-pull.sh   |  6 ++--
 t/t7505-prepare-commit-msg-hook.sh|  2 +-
 t/t7517-per-repo-email.sh |  6 ++--
 t/test-lib.sh |  4 +++
 14 files changed, 78 insertions(+), 43 deletions(-)


base-commit: 4ede3d42dfb57f9a41ac96a1f216c62eb7566cc2
Published-As: https://github.com/gitgitgadget/git/releases/tags/pr-63%2Fdscho%2Fsplit-out-rebase-p-tests-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-63/dscho/split-out-rebase-p-tests-v1
Pull-Request: https://github.com/gitgitgadget/git/pull/63
-- 
gitgitgadget


Re: Import/Export as a fast way to purge files from Git?

2018-10-31 Thread Lars Schneider



> On Sep 24, 2018, at 7:24 PM, Elijah Newren  wrote:
> 
> On Sun, Sep 23, 2018 at 6:08 AM Lars Schneider  
> wrote:
>> 
>> Hi,
>> 
>> I recently had to purge files from large Git repos (many files, many 
>> commits).
>> The usual recommendation is to use `git filter-branch --index-filter` to 
>> purge
>> files. However, this is *very* slow for large repos (e.g. it takes 45min to
>> remove the `builtin` directory from git core). I realized that I can remove
>> files *way* faster by exporting the repo, removing the file references,
>> and then importing the repo (see Perl script below, it takes ~30sec to remove
>> the `builtin` directory from git core). Do you see any problem with this
>> approach?
> 
> It looks like others have pointed you at other tools, and you're
> already shifting to that route.  But I think it's a useful question to
> answer more generally, so for those that are really curious...
> 
> 
> The basic approach is fine, though if you try to extend it much you
> can run into a few possible edge/corner cases (more on that below).
> I've been using this basic approach for years and even created a
> mini-python library[1] designed specifically to allow people to create
> "fast-filters", used as
>   git fast-export  | your-fast-filter | git fast-import 
> 
> But that library didn't really take off; even I have rarely used it,
> often opting for filter-branch despite its horrible performance or a
> simple fast-export | long-sed-command | fast-import (with some extra
> pre-checking to make sure the sed wouldn't unintentionally munge other
> data).  BFG is great, as long as you're only interested in removing a
> few big items, but otherwise doesn't seem very useful (to be fair,
> it's very upfront about only wanting to solve that problem).
> Recently, due to continuing questions on filter-branch and folks still
> getting confused with it, I looked at existing tools, decided I didn't
> think any quite fit, and started looking into converting
> git_fast_filter into a filter-branch-like tool instead of just a
> libary.  Found some bugs and missing features in fast-export along the
> way (and have some patches I still need to send in).  But I kind of
> got stuck -- if the tool is in python, will that limit adoption too
> much?  It'd be kind of nice to have this tool in core git.  But I kind
> of like leaving open the possibility of using it as a tool _or_ as a
> library, the latter for the special cases where case-specific
> programmatic filtering is needed.  But a developer-convenience library
> makes almost no sense unless in a higher level language, such as
> python.  I'm still trying to make up my mind about what I want (and
> what others might want), and have been kind of blocking on that.  (If
> others have opinions, I'm all ears.)

That library sounds like a very interesting idea. Unfortunately, the 
referenced repo seems not to be available anymore:
git://gitorious.org/git_fast_filter/mainline.git

I very much like Python. However, more recently I started to
write Git tools in Perl as they work out of the box on every
machine with Git installed ... and I think Perl can be quite
readable if no shortcuts are used :-). 
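
For readers following along, the export/filter/import approach under discussion can be sketched in a throwaway repo like this (the `secrets/` path is hypothetical, and the line-based sed filter is deliberately naive: it could also match blob contents, which is one of the pitfalls mentioned in this thread):

```shell
# Purge every change under secrets/ by filtering the fast-export stream.
set -e
tmp=$(mktemp -d) && cd "$tmp"
git init -q repo && cd repo
git config user.email you@example.com && git config user.name You
mkdir secrets
echo keep >keep.txt && echo leak >secrets/key.txt
git add . && git commit -qm initial
echo more >>keep.txt && git commit -qam update

# Drop the filemodify/filedelete lines that touch secrets/, re-import.
git fast-export --all --signed-tags=strip |
	sed -e '/^[MD] .*secrets\//d' |
	git fast-import --force --quiet

git reset -q --hard    # resync index and worktree with the rewritten HEAD
```

After this, `git log --all --name-only` shows no trace of `secrets/`; commits that become empty and stale commit-id references in messages are the edge cases that still need separate handling.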


> Anyway, the edge/corner cases you can watch out for:
> 
>  - Signed tags are a problem; you may need to specify
> --signed-tags=strip to fast-export
> 
>  - References to other commits in your commit messages will now be
> incorrect.  I think a good tool should either default to rewriting
> commit ids in commit messages or at least have an option to do so
> (BFG does this; filter-branch doesn't; fast-export format makes it
> really hard for a filter based on it to do so)
> 
>  - If the paths you remove are the only paths modified in a commit,
> the commit can become empty.  If you're only filtering a few paths
> out, this might be nothing more than a minor inconvenience for you.
> However, if you're trying to prune directories (and perhaps several
> toplevel ones), then it can be extremely annoying to have a new
> history with the vast majority of all commits being empty.
> (filter-branch has an option for this; BFG does not; tools based on
> fast-export output can do it with sufficient effort).
> 
>  - If you start pruning empty commits, you have to worry about
> rewriting branches and tags to remaining parents.  This _might_ happen
> for free depending on your history's structure and the fast-export
> stream, but to be correct in general you will have to specify the new
> commit for some branches or tags.
> 
>  - If you start pruning empty commits, you have to decide whether to
> allow pruning of merge commits.  Your first reaction might be to not
> allow it, but if one parent and its entire history are all pruned,
> then transforming the merge commit to a normal commit and then
> considering whether it is empty and allowing it to be pruned is much
> better.
> 
>  - If you start pruning empty commits, you also have to worry about
> history topology changing, beyond 

using --force-with-lease after git rebase

2018-10-31 Thread Alexander Mills
I have been confused about the need for --force-with-lease after rebasing

Imagine I have a feature branch:

git checkout --no-track -b 'feature' 'origin/dev'
git push -u origin feature

I do some work, and then I rebase against origin/dev to keep up to
date with the integration branch.

git fetch origin/dev
git rebase origin/dev

then I try to push to the remote

git push origin feature

but that is rejected, I have to do:

git push --force-with-lease origin feature

why is that? Why do I need to force push my feature branch to the
remote tracking branch after rebasing against the integration branch?

-alex

related question:
https://stackoverflow.com/questions/52823692/git-push-force-with-lease-vs-force/53042745#53042745
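
For concreteness, the non-fast-forward situation can be reproduced locally in a throwaway repo, no remote needed: after the rebase, the previously pushed tip is no longer an ancestor of the new tip, which is exactly what an un-forced push refuses.

```shell
# Rebase rewrites the feature commits, so the old tip is not an
# ancestor of the new one -- hence the push is not a fast-forward.
set -e
tmp=$(mktemp -d) && cd "$tmp"
git init -q repo && cd repo
git config user.email you@example.com && git config user.name You
echo base >f && git add f && git commit -qm base
git checkout -qb feature
echo work >w && git add w && git commit -qm work      # "pushed" earlier
git checkout -q @{-1}
echo up >u && git add u && git commit -qm upstream    # integration moves on
git checkout -q feature
git rebase -q @{-1}
git merge-base --is-ancestor feature@{1} feature || echo "not a fast-forward"
```

`--force-with-lease` additionally checks that the remote tip is still the one you last saw, so the forced push cannot silently discard work someone else pushed in the meantime.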



-- 
Alexander D. Mills
¡¡¡ New cell phone number: (415)730-1805 !!!
alexander.d.mi...@gmail.com

www.linkedin.com/pub/alexander-mills/b/7a5/418/


Re: [RFC v1] Add virtual file system settings and hook proc

2018-10-31 Thread Duy Nguyen
not really a review, just a couple of quick notes..

On Tue, Oct 30, 2018 at 9:40 PM Ben Peart  wrote:
>
> From: Ben Peart 
>
> On index load, clear/set the skip worktree bits based on the virtual
> file system data. Use virtual file system data to update skip-worktree
> bit in unpack-trees. Use virtual file system data to exclude files and
> folders not explicitly requested.
>
> Signed-off-by: Ben Peart 
> ---
>
> We have taken several steps to make git perform well on very large repos.
> Some of those steps include: improving underlying algorithms, utilizing
> multi-threading where possible, and simplifying the behavior of some commands.
> These changes typically benefit all git repos to varying degrees.  While
> these optimizations all help, they are insufficient to provide adequate
> performance on the very large repos we often work with.
>
> To make git perform well on the very largest repos, we had to make more
> significant changes.  The biggest performance win by far is the work we have
> done to make git operations O(modified) instead of O(size of repo).  This
> takes advantage of the fact that the number of files a developer has modified
> is a tiny fraction of the overall repo size.
>
> We accomplished this by utilizing the existing internal logic for the skip
> worktree bit and excludes to tell git to ignore all files and folders other
> than those that have been modified.  This logic is driven by an external
> process that monitors writes to the repo and communicates the list of files
> and folders with changes to git via the virtual file system hook in this 
> patch.
>
> The external process maintains a list of files and folders that have been
> modified.  When git runs, it requests the list of files and folders that
> have been modified via the virtual file system hook.  Git then sets/clears
> the skip-worktree bit on the cache entries and builds a hashmap of the
> modified files/folders that is used by the excludes logic to avoid scanning
> the entire repo looking for changes and untracked files.
>
> With this system, we have been able to make local git command performance on
> extremely large repos (millions of files, 1/2 million folders) entirely
> manageable (30 second checkout, 3.5 seconds status, 4 second add, 7 second
> commit, etc).
>
> Our desire is to eliminate all custom patches in our fork of git.  To that
> end, I'm submitting this as an RFC to see how much interest there is and how
> much willingness to take this type of change into git.

Most of these paragraphs (perhaps except the last one) should be part
of the commit message. You describe briefly what the patch does but
it's even more important to say why you want to do it.

> +core.virtualFilesystem::
> +   If set, the value of this variable is used as a command which
> +   will identify all files and directories that are present in
> +   the working directory.  Git will only track and update files
> +   listed in the virtual file system.  Using the virtual file system
> +   will supersede the sparse-checkout settings which will be ignored.
> +   See the "virtual file system" section of linkgit:githooks[6].

It sounds like "virtual file system" is just one of the use cases for
this feature, which is more about a dynamic source of sparse-checkout
bits. Perhaps name the config key with something along sparse checkout
instead of naming it after a use case.

This is a hook. I notice we start to avoid adding real hooks and just
add config keys instead. Eventually we should have config-based hooks,
but if we're going to add more like this, I think these should be in a
separate section, hook.virtualFileSystem or something.

I don't think the superseding makes sense. There's no reason this
could not be used in combination with $GIT_DIR/info/sparse-checkout.
If you don't want both, disable the other.
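
For readers unfamiliar with the mechanism being driven here: the skip-worktree bit makes git treat the worktree copy as unmodified, which is what lets the hook keep operations O(modified). A hand-run sketch (throwaway repo):

```shell
# With skip-worktree set, git pretends the worktree file matches the
# index, so even deleting it is invisible to status.
set -e
tmp=$(mktemp -d) && cd "$tmp"
git init -q repo && cd repo
git config user.email you@example.com && git config user.name You
echo a >tracked && git add tracked && git commit -qm init
git update-index --skip-worktree tracked
rm tracked
git ls-files -v tracked    # "S tracked" -- skip-worktree is set
git status --porcelain     # prints nothing
```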

One last note, since this is related to the filesystem: shouldn't it be
part of fsmonitor (the protocol, not the implementation)? Then
watchman users could use it too.
-- 
Duy


Re: [PATCH v2] sequencer: break out of loop explicitly

2018-10-31 Thread Martin Ågren
On Wed, 31 Oct 2018 at 18:28, Eric Sunshine  wrote:
>
> On Wed, Oct 31, 2018 at 10:54 AM Johannes Schindelin
>  wrote:

> > ACK. Thanks for cleaning up after me,
>
> Looks good to me, as well. Thanks for working on it.

Thanks, both of you.

Martin


Re: Using --word-diff breaks --color-moved

2018-10-31 Thread Stefan Beller
On Wed, Oct 31, 2018 at 12:07 AM james harvey  wrote:

> I think these options can co-exist.  I could be wrong, but I'm betting
> the code for "--color-moved" was only written with the typical full
> line(s) diff in mind, and wasn't written with "--word-diff" in mind.

I think it was brought up, but neglected at the time.


Re: Using --word-diff breaks --color-moved

2018-10-31 Thread Stefan Beller
On Tue, Oct 30, 2018 at 7:06 PM james harvey  wrote:
>
> If you use both "--word-diff" and "--color-moved", regardless of the
> order of arguments, "--word-diff" takes precedence and "--color-moved"
> isn't allowed to do anything.

The order of arguments doesn't matter here, as these just set internal
flags at parse time, which determine what later stages do.

Git uses the xdiff library internally for producing diffs[1].
To produce a diff, we have to feed two "streams of symbols"
to the library which then figures out the diff.
Usually a symbol is a whole line. Once we have the diff
we need to make it look nice again (i.e. put file names,
context markers and lines around the diff), which happens
in diff.c.

But when --word-diff is given, each line is broken up
into words and those are used as symbols for finding
the diff[2]. See the function fn_out_consume() [3],
for example 'ecbdata->diff_words' is set on '--word-diff'.

When it is not set we fall down to the switch case that
will call emit_{add, del, context}_line(), which in turn
emits the lines.
The --color-moved step is performed after all diffing
(and nicing up) is done already and solely works on
the add/del lines. The word diff is piecing together lines
for output, which are completely ignored for move
detection.

[1] see the xdiff/ dir in your copy of git. We have some
substantial changes compared to unmaintained upstream
http://www.xmailserver.org/xdiff-lib.html
http://www.xmailserver.org/xdiff.html

[2] https://github.com/git/git/blob/master/diff.c#L1872

[3] https://github.com/git/git/blob/master/diff.c#L2259
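
To see the line-reassembly concretely (throwaway repo): with `--word-diff`, the changed content comes out as a single reassembled line with inline `[-...-]`/`{+...+}` markers, so there are no separate added/deleted lines for the move detection to inspect:

```shell
set -e
tmp=$(mktemp -d) && cd "$tmp"
git init -q repo && cd repo
git config user.email you@example.com && git config user.name You
echo 'the quick brown fox' >f && git add f && git commit -qm one
echo 'the slow brown fox' >f
git diff --word-diff f    # hunk body: the [-quick-]{+slow+} brown fox
```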

> I think "--color-moved" should have precedence over "--word-diff".

I agree for precedence as in "work well together". Now we'd need
to figure out what that means. In its current form, the move
detection can detect moved lines across diff hunks or file
boundaries.

Should that also be the case for word diffing?
I think word diffing is mostly used for free text, which has different
properties compared to code, that the color-moved was originally
intended for.

For example in code we often have few characters on a line
such as " }" which is found often in gits code base.
We added some heuristics that lines showing up often with
few characters would not be detected on their own as a moved
block [4]. I would expect we'd have to figure out a similar heuristic
for word diffing, if we go down that route.

But that is a detail; we'd first have to figure out how to make the
words work with the move detection.

[4] https://github.com/git/git/commit/f0b8fb6e591b50b72b921f2c4cf120ebd284f510


>   I
> cannot think of a scenario where a user would supply both options, and
> actually want "--word-diff" to take precedence.  If I'm not thinking
> of a scenario where this wouldn't be desired, perhaps whichever is
> first as an argument could take precedence.

word diffing and move detection are completely orthogonal at the moment.
Instead of option order, I'd rather introduce a new option that tells us
how to resolve some corner case. Or in the short term we might just
want to raise an error?

> (The same behavior happens if 4+ lines are moved and
> "--color-moved{default=zebra}" is used, but below
> "--color-moved=plain" is used to be a smaller testcase.)
>
> [...]

This sounds like you are asking for two things:
(1) make color-moved work with words (somehow)
(2) allow the user to fine tune the heuristics for a block,
such that default=zebra would still work.


Re: [PATCH 3/3] cat-file: handle streaming failures consistently

2018-10-31 Thread Eric Sunshine
On Tue, Oct 30, 2018 at 07:23:38PM -0400, Jeff King wrote:
> There are three ways to convince cat-file to stream a blob:
> 
>   - cat-file -p $blob
> 
>   - cat-file blob $blob
> 
>   - echo $batch | cat-file --batch
> 
> In the first two, we simply exit with the error code of
> streaw_blob_to_fd(). That means that an error will cause us

Your "m" got confused and ended up upside-down.

> to exit with "-1" (which we try to avoid) without printing
> any kind of error message (which is confusing to the user).
> 
> Instead, let's match the third case, which calls die() on an
> error. Unfortunately we cannot be more specific, as
> stream_blob_to_fd() does not tell us whether the problem was
> on reading (e.g., a corrupt object) or on writing (e.g.,
> ENOSPC). That might be an opportunity for future work, but
> for now we will at least exit with a sane message and exit
> code.
> 
> Signed-off-by: Jeff King 
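
For context, the three invocations enumerated in the quoted message, run against a throwaway repo:

```shell
set -e
tmp=$(mktemp -d) && cd "$tmp"
git init -q repo && cd repo
git config user.email you@example.com && git config user.name You
echo hello >f && git add f && git commit -qm c
blob=$(git rev-parse :f)
git cat-file -p "$blob"                 # hello
git cat-file blob "$blob"               # hello
echo "$blob" | git cat-file --batch     # "<oid> blob 6" header, then hello
```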


Re: [PATCH v2] sequencer: break out of loop explicitly

2018-10-31 Thread Eric Sunshine
On Wed, Oct 31, 2018 at 10:54 AM Johannes Schindelin
 wrote:
> On Tue, 30 Oct 2018, Martin Ågren wrote:
> > Rewrite the loop to a more idiomatic variant which doesn't muck with
> > `len` in the loop body. That should help compilers and human readers
> > figure out what is going on here. But do note that we need to update
> > `len` since it is not only used just after this loop (where we could
> > have used `i` directly), but also later in this function.
> >
> > Signed-off-by: Martin Ågren 
> > ---
>
> ACK. Thanks for cleaning up after me,

Looks good to me, as well. Thanks for working on it.


[BUG?] protocol.version=2 sends HTTP "Expect" headers

2018-10-31 Thread Jeff King
Since 959dfcf42f (smart-http: Really never use Expect: 100-continue,
2011-03-14), we try to avoid sending "Expect" headers, since some
proxies apparently don't handle them well. There we have to explicitly
tell curl not to use them.

The exception is large requests with GSSAPI, as explained in c80d96ca0c
(remote-curl: fix large pushes with GSSAPI, 2013-10-31).

However, Jon Simons noticed that when using protocol.version=2, we've
started sending Expect headers again. That's because rather than going
through post_rpc(), we push the stateless data through a proxy_state
struct. And in proxy_state_init(), when we set up the headers, we do not
disable curl's Expect handling.

So a few questions:

  - is this a bug or not? I.e., do we still need to care about proxies
that can't handle Expect? The original commit was from 2011. Maybe
things are better now. Or maybe that's blind optimism.

Nobody has complained yet, but that's probably just because v2 isn't
widely deployed yet.

  - if it is a bug, how can we handle it like the v0 code? There we
enable it only for GSSAPI on large requests. But I'm not sure we can
know here whether the request is large, since we're inherently just
streaming through chunked data. It looks like post_rpc tries to read
a single packet first, and considers anything over 64k to be large.

  - alternatively, should we just leave it on for v2, and provide a
config switch to disable it if you have a crappy proxy? I don't know
how widespread the problem is, but I can imagine that the issue is
subtle enough that most users wouldn't even know.

I think I've convinced myself that we probably do need to do the "peek
at the first packet" thing like post_rpc() does, but I think it might be
tricky with the way the proxy_state code is structured.

Thoughts from people with more HTTP knowledge/experience?

-Peff


Re: Lost changes after merge

2018-10-31 Thread Brandon McCaig
Gray:

On Tue, Oct 30, 2018 at 03:46:28AM +0100, Gray King wrote:
> * Before merge run `git log --format="%h %p %d" -n 20 --all --graph`:
> 
> https://cfp.vim-cn.com/cbfq6
> 
> * After merged run `git log --format="%h %p %d" -n 20 --all --graph`:
> 
> https://cfp.vim-cn.com/cbfq7

I also cannot see anything wrong here. The two commits you
mentioned earlier are not in either paste, as was already stated.
It might help us understand if you are more explicit about what
you think is wrong. Maybe verify that your pastes have captured
all of the necessary information to understand the problem.

The only other thing that I note is that 6d6ed669d1 and
3c792ffaf0 appear to be merging the same two parents, so
logically they should be identical (unless merge conflicts were
resolved in different ways). Is that related to the issue? Was
one of the merges done incorrectly, and you're trying to fix it?
Or are you just confused about how you got into this state in the
first place?

Regards,


-- 
Brandon McCaig  
Castopulence Software 
Blog 
perl -E '$_=q{V zrna gur orfg jvgu jung V fnl. }.
q{Vg qbrfa'\''g nyjnlf fbhaq gung jnl.};
tr/A-Ma-mN-Zn-z/N-Zn-zA-Ma-m/;say'



signature.asc
Description: PGP signature


Re: [PATCH v3 8/8] merge-recursive: improve rename/rename(1to2)/add[/add] handling

2018-10-31 Thread Derrick Stolee

On 10/19/2018 3:31 PM, Elijah Newren wrote:

[snip]

+   char *new_path = NULL;
+   if (dir_in_way(b->path, !o->call_depth, 0)) {
+   new_path = unique_path(o, b->path, ci->branch2);
+   output(o, 1, _("%s is a directory in %s adding "
+  "as %s instead"),
+  b->path, ci->branch1, new_path);


I tried really hard, but failed to get a test to cover the block below.
I was able to find that the "check handling of differently renamed file
with D/F conflicts" test in t6022-merge-rename.sh covers the block above.
Trying to tweak the example using untracked files seems to hit an error
message from unpack-trees.c instead.


+   } else if (would_lose_untracked(b->path)) {
+   new_path = unique_path(o, b->path, ci->branch2);
+   output(o, 1, _("Refusing to lose untracked file"
+  " at %s; adding as %s instead"),
+  b->path, new_path);


It could also be that I failed because I'm less familiar with this part
of the codebase. Elijah, do you think it is possible to hit this block?

Thanks,
-Stolee


Re: [PATCH v2] sequencer: break out of loop explicitly

2018-10-31 Thread Johannes Schindelin
Hi Martin,

On Tue, 30 Oct 2018, Martin Ågren wrote:

> It came up in review [1, 2] that this non-idiomatic loop is a bit tricky.
> When we find a space, we set `len = i`, which gives us the answer we are
> looking for, but which also breaks out of the loop.
> 
> It turns out that this loop can confuse compilers as well. My copy of
> gcc 7.3.0 realizes that we are essentially evaluating `(len + 1) < len`
> and warns that the behavior is undefined if `len` is `INT_MAX`. (Because
> the assignment `len = i` is guaranteed to decrease `len`, such undefined
> behavior is not actually possible.)
> 
> Rewrite the loop to a more idiomatic variant which doesn't muck with
> `len` in the loop body. That should help compilers and human readers
> figure out what is going on here. But do note that we need to update
> `len` since it is not only used just after this loop (where we could
> have used `i` directly), but also later in this function.
> 
> While at it, reduce the scope of `i`.
> 
> [1] 
> https://public-inbox.org/git/CAPig+cQbG2s-LrAo9+7C7=dXifbWFJ3SzuNa-QePHDk7egK=j...@mail.gmail.com/
> 
> [2] 
> https://public-inbox.org/git/capig+crju6nixpt2frdwz0x1hmgf1ojvzj3uk2qxege-s7i...@mail.gmail.com/
> 
> Helped-by: Eric Sunshine 
> Signed-off-by: Martin Ågren 
> ---

ACK. Thanks for cleaning up after me,
Dscho

>  Thanks for the comments on v1. Based on them, I decided to go for
>  Eric's option 2, and to make the log message less technical in favor of
>  "compilers and humans alike can get this wrong".
> 
>  sequencer.c | 7 +--
>  1 file changed, 5 insertions(+), 2 deletions(-)
> 
> diff --git a/sequencer.c b/sequencer.c
> index 0c164d5f98..e7aa4d5020 100644
> --- a/sequencer.c
> +++ b/sequencer.c
> @@ -2829,7 +2829,7 @@ static int do_reset(const char *name, int len, struct replay_opts *opts)
>   struct tree_desc desc;
>   struct tree *tree;
>   struct unpack_trees_options unpack_tree_opts;
> - int ret = 0, i;
> + int ret = 0;
>  
>   if (hold_locked_index(&lock, LOCK_REPORT_ON_ERROR) < 0)
>   return -1;
> @@ -2849,10 +2849,13 @@ static int do_reset(const char *name, int len, struct replay_opts *opts)
>   }
>   oidcpy(&oid, &opts->squash_onto);
>   } else {
> + int i;
> +
>   /* Determine the length of the label */
>   for (i = 0; i < len; i++)
>   if (isspace(name[i]))
> - len = i;
> + break;
> + len = i;
>  
>   strbuf_addf(&ref_name, "refs/rewritten/%.*s", len, name);
>   if (get_oid(ref_name.buf, &oid) &&
> -- 
> 2.19.1.593.gc670b1f876.dirty
> 
> 

Re: Re*: [PATCH v3] fetch: replace string-list used as a look-up table with a hashmap

2018-10-31 Thread Johannes Schindelin
Hi Junio,

On Sat, 27 Oct 2018, Junio C Hamano wrote:

> Johannes Schindelin  writes:
> 
> > Just one thing^W^Wa couple of things:
> >
> > It would probably make more sense to `hashmap_get_from_hash()` and
> > `strhash()` here (and `strhash()` should probably be used everywhere
> > instead of `memhash(str, strlen(str))`).
> 
> hashmap_get_from_hash() certainly is much better suited for simpler
> usage pattern like these callsites, and the ones in sequencer.c.  It
> is a shame that a more complex variant takes the shorter-and-sweeter
> name hashmap_get().

I agree, at least in part.

From what I understand, hashmap_get_from_hash() needs a little assistance
from the comparison function with which the hashmap is configured, see
e.g. this function in the sequencer:

static int labels_cmp(const void *fndata, const struct labels_entry *a,
  const struct labels_entry *b, const void *key)
{
return key ? strcmp(a->label, key) : strcmp(a->label, b->label);
}

See how that first tests whether `key` is non-`NULL`, and then takes a
shortcut, not even looking at `b`? This is important, because `b` does not
refer to a complete `labels_entry` when we call `hashmap_get_from_hash()`.
It only refers to a `hashmap_entry`. Looking at `b->label` would access
some random memory, and do most certainly the wrong thing.

> I wish we named the latter hashmap_get_fullblown_feature_rich() and
> called the _from_hash() thing a simple hashmap_get() from day one,
> but it is way too late.
> 
> I looked briefly the users of the _get() variant, and some of their
> uses are legitimately not-simple and cannot be reduced to use the
> simpler _get_from_hash variant, it seems.  But others like those in
> builtin/difftool.c should be straight-forward to convert to use the
> simpler get_from_hash variant.  It could be a low-hanging fruit left
> for later clean-up, perhaps.

Right. #leftoverbits

> >> @@ -271,10 +319,10 @@ static void find_non_local_tags(const struct ref *refs,
> >>!has_object_file_with_flags(&ref->old_oid,
> >>OBJECT_INFO_QUICK) &&
> >>!will_fetch(head, ref->old_oid.hash) &&
> >> -  !has_sha1_file_with_flags(item->util,
> >> +  !has_sha1_file_with_flags(item->oid.hash,
> >
> > I am not sure that we need to test for null OIDs here, given that...
> > ...
> > Of course, `has_sha1_file_with_flags()` is supposed to return `false` for
> > null OIDs, I guess.
> 
> Yup.  An alternative is to make item->oid a pointer to oid, not an
> oid object itself, so that we can express "no OID for this ref" in a
> more explicit way, but is_null_oid() is already used as "no OID" in
> many other codepaths, so...

Right, and it would complicate the code. So I am fine with your version of
it.

> >> +  for_each_string_list_item(remote_ref_item, &remote_refs_list) {
> >> +  const char *refname = remote_ref_item->string;
> >> +  struct hashmap_entry key;
> >> +
> >> +  hashmap_entry_init(&key, memhash(refname, strlen(refname)));
> >> +  item = hashmap_get(&remote_refs, &key, refname);
> >> +  if (!item)
> >> +  continue; /* can this happen??? */
> >
> > This would indicate a BUG, no?
> 
> Possibly.  Alternatively, we can just use item without checking and
> let the runtime segfault.

Hahaha! Yep. We could also cause a crash. I do prefer the BUG() call.

> Here is an incremental on top that can be squashed in to turn v3
> into v4.

Nice.

Thanks!
Dscho

> 
> diff --git a/builtin/fetch.c b/builtin/fetch.c
> index 0f8e333022..aee1d9bf21 100644
> --- a/builtin/fetch.c
> +++ b/builtin/fetch.c
> @@ -259,7 +259,7 @@ static struct refname_hash_entry *refname_hash_add(struct hashmap *map,
>   size_t len = strlen(refname);
>  
>   FLEX_ALLOC_MEM(ent, refname, refname, len);
> - hashmap_entry_init(ent, memhash(refname, len));
> + hashmap_entry_init(ent, strhash(refname));
>   oidcpy(&ent->oid, oid);
>   hashmap_add(map, ent);
>   return ent;
> @@ -282,11 +282,7 @@ static void refname_hash_init(struct hashmap *map)
>  
>  static int refname_hash_exists(struct hashmap *map, const char *refname)
>  {
> - struct hashmap_entry key;
> - size_t len = strlen(refname);
> - hashmap_entry_init(&key, memhash(refname, len));
> -
> - return !!hashmap_get(map, &key, refname);
> + return !!hashmap_get_from_hash(map, strhash(refname), refname);
>  }
>  
>  static void find_non_local_tags(const struct ref *refs,
> @@ -365,12 +361,10 @@ static void find_non_local_tags(const struct ref *refs,
>*/
>   for_each_string_list_item(remote_ref_item, &remote_refs_list) {
>   const char *refname = remote_ref_item->string;
> - struct hashmap_entry key;
>  
> - hashmap_entry_init(&key, memhash(refname, strlen(refname)));
> - item = hashmap_get(&remote_refs, &key, refname);
> +   

Re: [PATCH 3/3] cat-file: handle streaming failures consistently

2018-10-31 Thread Jeff King
On Wed, Oct 31, 2018 at 11:23:48PM +0900, Junio C Hamano wrote:

> Torsten Bögershausen  writes:
> 
> >> +static int stream_blob(const struct object_id *oid)
> >
> > Sorry for nit-picking:
> > could this be renamed into stream_blob_to_stdout() ?
> 
> I think that name makes sense, even though stream_blob() is just
> fine for a function that takes a single parameter oid, as there is no
> other sane choice than streaming to the standard output stream the
> blob data.

I was trying to keep the name small since it is a static-local
convenience helper. I'd rather write it as:

  stream_blob(1, oid);

than change the name. ;)

> >> +{
> >> +  if (stream_blob_to_fd(1, oid, NULL, 0))
> >
> > And I wonder if we could make things clearer:
> >  s/1/STDOUT_FILENO/
> 
> What would benefit from symbolic constant more in that function call
> may be CAN_SEEK thing, but s/1/STDOUT_FILENO/ adds negative value to
> that line, I would think.  The name of the function already makes it
> clear this is sending the output to a file descriptor, and an
> integer 1 that specifies a file descriptor cannot mean anything
> other than the standard output stream.

Yes, I'd agree (there are very few cases where I think STDOUT_FILENO
actually increases the readability, since it is usually pretty clear
from the context when something is a descriptor).

-Peff


Re: [PATCH 3/3] cat-file: handle streaming failures consistently

2018-10-31 Thread Junio C Hamano
Torsten Bögershausen  writes:

>> +static int stream_blob(const struct object_id *oid)
>
> Sorry for nit-picking:
> could this be renamed into stream_blob_to_stdout() ?

I think that name makes sense, even though stream_blob() is just
fine for a function that takes a single parameter oid, as there is no
other sane choice than streaming to the standard output stream the
blob data.

>> +{
>> +if (stream_blob_to_fd(1, oid, NULL, 0))
>
> And I wonder if we could make things clearer:
>  s/1/STDOUT_FILENO/

What would benefit from symbolic constant more in that function call
may be CAN_SEEK thing, but s/1/STDOUT_FILENO/ adds negative value to
that line, I would think.  The name of the function already makes it
clear this is sending the output to a file descriptor, and an
integer 1 that specifies a file descriptor cannot mean anything
other than the standard output stream.


Re: [PATCH v3 2/8] t6036, t6042: testcases for rename collision of already conflicting files

2018-10-31 Thread Derrick Stolee

On 10/19/2018 3:31 PM, Elijah Newren wrote:

+test_expect_success "setup nested conflicts" '


nit: should these test names be single-quoted? I see you using double-quotes
in PATCH 1/8 as well, but that seems to be because there are variables in
the test names.


...

+test_expect_failure "check nested conflicts" '


Same here.


+test_expect_success "setup nested conflicts from rename/rename(2to1)" '



+test_expect_failure "check nested conflicts from rename/rename(2to1)" '


Thanks,
-Stolee


Re: [PATCH v3 4/8] merge-recursive: new function for better colliding conflict resolutions

2018-10-31 Thread Derrick Stolee

On 10/31/2018 9:53 AM, Derrick Stolee wrote:

On 10/19/2018 3:31 PM, Elijah Newren wrote:
+#if 0 // #if-0-ing avoids unused function warning; will make live in 
next commit

+static int handle_file_collision(struct merge_options *o,
+ const char *collide_path,
+ const char *prev_path1,
+ const char *prev_path2,
+ const char *branch1, const char *branch2,
+ const struct object_id *a_oid,
+ unsigned int a_mode,
+ const struct object_id *b_oid,
+ unsigned int b_mode)
+{
+    struct merge_file_info mfi;
+    struct diff_filespec null, a, b;
+    char *alt_path = NULL;
+    const char *update_path = collide_path;
+
+    /*
+ * In the recursive case, we just opt to undo renames
+ */
+    if (o->call_depth && (prev_path1 || prev_path2)) {
+    /* Put first file (a_oid, a_mode) in its original spot */
+    if (prev_path1) {
+    if (update_file(o, 1, a_oid, a_mode, prev_path1))
+    return -1;
+    } else {
+    if (update_file(o, 1, a_oid, a_mode, collide_path))


The latest test coverage report [1] shows this if statement is never
run, so it appears that every call to this method in the test suite
has either o->call_depth positive, prev_path1 non-NULL, or both
prev_path1 and prev_path2 NULL.

Is there a way we can add a test case that calls this method with
o->call_depth positive, prev_path1 NULL, and prev_path2 non-NULL?


+    return -1;
+    }
+
+    /* Put second file (b_oid, b_mode) in its original spot */
+    if (prev_path2) {
+    if (update_file(o, 1, b_oid, b_mode, prev_path2))


Since this line is covered, we _do_ call the method with prev_path2
non-NULL, but prev_path1 must be non-NULL in all cases.

I may have found a reason why this doesn't happen in one of the 
callers you introduced.

I'm going to comment on PATCH 8/8 to see if that is the case.


Nevermind on the PATCH 8/8 situation. I thought I saw you pass
(a->path, NULL) and (b->path, NULL) into the (prev_path1, prev_path2)
pairs, but in each case the non-NULL parameter is actually for
'collide_path'.

It is still interesting if we can hit this case. Perhaps we need a
different kind of conflict, like (rename, delete) [but I struggle to
make sense of how to do that].


Thanks,
-Stolee


Re: [PATCH v3 4/8] merge-recursive: new function for better colliding conflict resolutions

2018-10-31 Thread Derrick Stolee

On 10/19/2018 3:31 PM, Elijah Newren wrote:

+#if 0 // #if-0-ing avoids unused function warning; will make live in next 
commit
+static int handle_file_collision(struct merge_options *o,
+const char *collide_path,
+const char *prev_path1,
+const char *prev_path2,
+const char *branch1, const char *branch2,
+const struct object_id *a_oid,
+unsigned int a_mode,
+const struct object_id *b_oid,
+unsigned int b_mode)
+{
+   struct merge_file_info mfi;
+   struct diff_filespec null, a, b;
+   char *alt_path = NULL;
+   const char *update_path = collide_path;
+
+   /*
+* In the recursive case, we just opt to undo renames
+*/
+   if (o->call_depth && (prev_path1 || prev_path2)) {
+   /* Put first file (a_oid, a_mode) in its original spot */
+   if (prev_path1) {
+   if (update_file(o, 1, a_oid, a_mode, prev_path1))
+   return -1;
+   } else {
+   if (update_file(o, 1, a_oid, a_mode, collide_path))


The latest test coverage report [1] shows this if statement is never run, so
it appears that every call to this method in the test suite has either
o->call_depth positive, prev_path1 non-NULL, or both prev_path1 and
prev_path2 NULL.

Is there a way we can add a test case that calls this method with
o->call_depth positive, prev_path1 NULL, and prev_path2 non-NULL?


+   return -1;
+   }
+
+   /* Put second file (b_oid, b_mode) in its original spot */
+   if (prev_path2) {
+   if (update_file(o, 1, b_oid, b_mode, prev_path2))


Since this line is covered, we _do_ call the method with prev_path2
non-NULL, but prev_path1 must be non-NULL in all cases.

I may have found a reason why this doesn't happen in one of the callers 
you introduced.

I'm going to comment on PATCH 8/8 to see if that is the case.

Thanks,
-Stolee

[1] 
https://public-inbox.org/git/62f0bcf6-aa73-c192-d804-e6d69cac1...@gmail.com/


Re: [PATCH 18/19] submodule: use submodule repos for object lookup

2018-10-31 Thread Derrick Stolee

On 10/16/2018 7:35 PM, Stefan Beller wrote:

@@ -482,14 +483,46 @@ void prepare_submodule_repo_env(struct argv_array *out)
 DEFAULT_GIT_DIR_ENVIRONMENT);
  }
  
-/* Helper function to display the submodule header line prior to the full
- * summary output. If it can locate the submodule objects directory it will
- * attempt to lookup both the left and right commits and put them into the
- * left and right pointers.
+/*
+ * Initialize 'out' based on the provided submodule path.
+ *
+ * Unlike repo_submodule_init, this tolerates submodules not present
+ * in .gitmodules. This function exists only to preserve historical behavior,
+ *
+ * Returns 0 on success, -1 when the submodule is not present.
+ */
+static int open_submodule(struct repository *out, const char *path)
+{
+   struct strbuf sb = STRBUF_INIT;
+
> +   if (submodule_to_gitdir(&sb, path) || repo_init(out, sb.buf, NULL)) {
> +   strbuf_release(&sb);
+   return -1;
+   }
+
+   out->submodule_prefix = xstrdup(path);
+   out->submodule_prefix = xstrfmt("%s%s/",
+   the_repository->submodule_prefix ?
+   the_repository->submodule_prefix :
+   "", path);
+
> +   strbuf_release(&sb);
+   return 0;
+}


Based on the recent test coverage report [1], this xstrfmt() call is never
run with a non-null the_repository->submodule_prefix. Is there a way we can
exercise that branch?

Thanks,
-Stolee

[1] 
https://public-inbox.org/git/62f0bcf6-aa73-c192-d804-e6d69cac1...@gmail.com/


Re: [PATCH 3/3] cat-file: handle streaming failures consistently

2018-10-31 Thread Torsten Bögershausen
On Tue, Oct 30, 2018 at 07:23:38PM -0400, Jeff King wrote:
> There are three ways to convince cat-file to stream a blob:
> 
>   - cat-file -p $blob
> 
>   - cat-file blob $blob
> 
>   - echo $batch | cat-file --batch
> 
> In the first two, we simply exit with the error code of
> stream_blob_to_fd(). That means that an error will cause us
> to exit with "-1" (which we try to avoid) without printing
> any kind of error message (which is confusing to the user).
> 
> Instead, let's match the third case, which calls die() on an
> error. Unfortunately we cannot be more specific, as
> stream_blob_to_fd() does not tell us whether the problem was
> on reading (e.g., a corrupt object) or on writing (e.g.,
> ENOSPC). That might be an opportunity for future work, but
> for now we will at least exit with a sane message and exit
> code.
> 
> Signed-off-by: Jeff King 
> ---
>  builtin/cat-file.c | 16 
>  1 file changed, 12 insertions(+), 4 deletions(-)
> 
> diff --git a/builtin/cat-file.c b/builtin/cat-file.c
> index 8d97c84725..0d403eb77d 100644
> --- a/builtin/cat-file.c
> +++ b/builtin/cat-file.c
> @@ -50,6 +50,13 @@ static int filter_object(const char *path, unsigned mode,
>   return 0;
>  }
>  
> +static int stream_blob(const struct object_id *oid)

Sorry for nit-picking:
could this be renamed into stream_blob_to_stdout() ?

> +{
> + if (stream_blob_to_fd(1, oid, NULL, 0))

And I wonder if we could make things clearer:
 s/1/STDOUT_FILENO/
 
 (Stolen from fast-import.c)

> + die("unable to stream %s to stdout", oid_to_hex(oid));
> + return 0;
> +}
> +
[]


Re: [RFC] Generation Number v2

2018-10-31 Thread Derrick Stolee

On 10/31/2018 8:54 AM, Ævar Arnfjörð Bjarmason wrote:

On Tue, Oct 30 2018, Junio C Hamano wrote:


Derrick Stolee  writes:

In contrast, maximum generation numbers and corrected commit
dates both performed quite well. They are frequently the top
two performing indexes, and rarely significantly different.

The trade-off here now seems to be: which _property_ is more important,
locally-computable or backwards-compatible?

Nice summary.

As I already said, I personally do not think being compatible with
currently deployed clients is important at all (primarily because I
still consider the whole thing experimental), and there is a clear
way forward once we correct the mistake of not having a version
number in the file format that tells the updated clients to ignore
the generation numbers.  For longer term viability, we should pick
something that is immutable, reproducible, computable with minimum
input---all of which would lead to being incrementally computable, I
would think.

I think it depends on what we mean by backwards compatibility. None of
our docs are saying this is experimental right now, just that it's
opt-in like so many other git-config(1) options.

So if we mean breaking backwards compatibility in that we'll write a new
file or clobber the existing one with a version older clients can't use
as an optimization, fine.

But it would be bad to produce a hard error on older clients, but
avoiding that seems as easy as just creating a "commit-graph2" file in
.git/objects/info/.


Well, we have a 1-byte version number following the "CGPH" header in
the commit-graph file, and clients will ignore the commit-graph file
if that number is not "1". My hope for backwards-compatibility was
to avoid incrementing this value and instead use the unused 8th byte.

However, it appears that we are destined to increment that version
number, anyway. Here is my list for what needs to be in the next
version of the commit-graph file format:

1. A four-byte hash version.

2. File incrementality (split commit-graph).

3. Reachability Index versioning

Most of these changes will happen in the file header. The chunks
themselves don't need to change, but some chunks may be added that
only make sense in v2 commit-graphs.

Thanks,
-Stolee


Re: [RFC] Generation Number v2

2018-10-31 Thread Ævar Arnfjörð Bjarmason


On Tue, Oct 30 2018, Junio C Hamano wrote:

> Derrick Stolee  writes:
>> In contrast, maximum generation numbers and corrected commit
>> dates both performed quite well. They are frequently the top
>> two performing indexes, and rarely significantly different.
>>
>> The trade-off here now seems to be: which _property_ is more important,
>> locally-computable or backwards-compatible?
>
> Nice summary.
>
> As I already said, I personally do not think being compatible with
> currently deployed clients is important at all (primarily because I
> still consider the whole thing experimental), and there is a clear
> way forward once we correct the mistake of not having a version
> number in the file format that tells the updated clients to ignore
> the generation numbers.  For longer term viability, we should pick
> something that is immutable, reproducible, computable with minimum
> input---all of which would lead to being incrementally computable, I
> would think.

I think it depends on what we mean by backwards compatibility. None of
our docs are saying this is experimental right now, just that it's
opt-in like so many other git-config(1) options.

So if we mean breaking backwards compatibility in that we'll write a new
file or clobber the existing one with a version older clients can't use
as an optimization, fine.

But it would be bad to produce a hard error on older clients, but
avoiding that seems as easy as just creating a "commit-graph2" file in
.git/objects/info/.


[PATCH 3/3] tests: add a special test setup that runs "git fsck" before exiting

2018-10-31 Thread Ævar Arnfjörð Bjarmason
Add the ability to run the tests with GIT_TEST_FSCK=true in the
environment. If set we'll run "git fsck" at the end of every test, and
those tests that fail need to annotate what their failure was.

The goal is to detect regressions in fsck that our tests might
otherwise miss. We had one such regression in c68b489e56 ("fsck: parse
loose object paths directly", 2017-01-13) released with Git 2.12.0,
which wasn't spotted until more than a year and a half later, during
the 2.20.0 window.

As it turns out there already was a test for what triggered that bug
all along in the form of t5000-tar-tree.sh, we just weren't running
"git fsck" at the end[1].

That specific bug has been fixed in ("check_stream_sha1(): handle
input underflow", 2018-10-30)[2], but since we have a demonstrable
history of not anticipating which tests which would make "git fsck"
fail need to be made part of the "git fsck" test suite let's add this
test mode to cover potential blind spots. The "git fsck" command is
also something where we might expect that during our RC windows users
aren't actively testing on already corrupt repositories, so "in the
wild" test coverage will be spotty, so we need all the help we can
get.

1. https://public-inbox.org/git/878t2fkxrn@evledraar.gmail.com/
2. https://public-inbox.org/git/20181030232312.gb32...@sigill.intra.peff.net/

Signed-off-by: Ævar Arnfjörð Bjarmason 
---
 t/README|  5 +
 t/t-basic.sh| 26 ++
 t/test-lib-functions.sh |  2 ++
 t/test-lib.sh   | 33 +
 4 files changed, 66 insertions(+)

diff --git a/t/README b/t/README
index 8847489640..092f78b3d7 100644
--- a/t/README
+++ b/t/README
@@ -343,6 +343,11 @@ of the index for the whole test suite by bypassing the 
default number of
 cache entries and thread minimums. Setting this to 1 will make the
 index loading single threaded.
 
GIT_TEST_FSCK=<boolean> if true arranges for "git fsck" to be run at
+the end of the test scripts. Those tests that fail will need to set a
+"GIT_TEST_FSCK_TESTS" variable before we enter "test_done" with a test
+fragment to test that fsck.{out,err} is the expected failure.
+
 Naming Tests
 
 
diff --git a/t/t-basic.sh b/t/t-basic.sh
index 4d23373526..8e667e6691 100755
--- a/t/t-basic.sh
+++ b/t/t-basic.sh
@@ -19,6 +19,7 @@ modification *should* take notice and update the test vectors 
here.
 '
 
 . ./test-lib.sh
+unset GIT_TEST_FSCK
 
 try_local_x () {
local x="local" &&
@@ -393,6 +394,31 @@ test_expect_success 'GIT_SKIP_TESTS sh pattern' "
)
 "
 
+test_expect_success 'GIT_TEST_FSCK=true' "
+   test_when_finished 'sane_unset GIT_TEST_FSCK' &&
+   GIT_TEST_FSCK=true &&
+   export GIT_TEST_FSCK &&
+   run_sub_test_lib_test run-git-fsck-test \
+   '--run basic' --run='1 3 5' <<-\\EOF &&
+   for i in 1 2 3 4 5 6
+   do
+   test_expect_success \"passing test #\$i\" 'true'
+   done
+   GIT_TEST_FSCK=true test_done
+   EOF
+   check_sub_test_lib_test run-git-fsck-test <<-\\EOF
+   > ok 1 - passing test #1
+   > ok 2 # skip passing test #2 (--run)
+   > ok 3 - passing test #3
+   > ok 4 # skip passing test #4 (--run)
+   > ok 5 - passing test #5
+   > ok 6 # skip passing test #6 (--run)
+   > ok 7 # skip git fsck at end (due to GIT_TEST_FSCK) (expected to 
succeed) (--run)
+   > # passed all 7 test(s)
+   > 1..7
+   EOF
+"
+
 test_expect_success '--run basic' "
run_sub_test_lib_test run-basic \
'--run basic' --run='1 3 5' <<-\\EOF &&
diff --git a/t/test-lib-functions.sh b/t/test-lib-functions.sh
index 78d8c3783b..7d002ff5aa 100644
--- a/t/test-lib-functions.sh
+++ b/t/test-lib-functions.sh
@@ -470,6 +470,7 @@ test_expect_success () {
 # Usage: test_external description command arguments...
 # Example: test_external 'Perl API' perl ../path/to/test.pl
 test_external () {
+   unset GIT_TEST_FSCK
test "$#" = 4 && { test_prereq=$1; shift; } || test_prereq=
test "$#" = 3 ||
error >&5 "bug in the test script: not 3 or 4 parameters to 
test_external"
@@ -511,6 +512,7 @@ test_external () {
 # Like test_external, but in addition tests that the command generated
 # no output on stderr.
 test_external_without_stderr () {
+   unset GIT_TEST_FSCK
# The temporary file has no (and must have no) security
# implications.
tmp=${TMPDIR:-/tmp}
diff --git a/t/test-lib.sh b/t/test-lib.sh
index 897e6fcc94..5f7f5595e3 100644
--- a/t/test-lib.sh
+++ b/t/test-lib.sh
@@ -454,6 +454,8 @@ GIT_EXIT_OK=
 trap 'die' EXIT
 trap 'exit $?' INT
 
+GIT_TEST_FSCK_TESTS=
+
 # The user-facing functions are loaded from a separate file so that
 # test_perf subshells can have them too
 . "$TEST_DIRECTORY/test-lib-functions.sh"
@@ -789,7 +791,36 @@ test_at_end_hook_ () {
:
 }
 
+_test_done_fsck() {
+   desc='git fsck at end (due to GIT_TEST_FSCK)'
+   

[PATCH 1/3] tests: add a "env-bool" helper to test-tool

2018-10-31 Thread Ævar Arnfjörð Bjarmason
This new helper is a wrapper around the git_env_bool() function. There
are various GIT_TEST_* variables described in "Running tests with
special setups" in t/README that use git_env_bool().

A GIT_TEST_* variable implemented in shellscript won't have access to
the same semantics (historically we've used "test -n" for many of
these).

So let's add this helper so we can expose the same environment
variable behavior without exposing the implementation detail of
whether that variable happens to be checked in C or shellscript.

Signed-off-by: Ævar Arnfjörð Bjarmason 
---
 Makefile | 1 +
 t/helper/test-env-bool.c | 9 +
 t/helper/test-tool.c | 1 +
 t/helper/test-tool.h | 1 +
 4 files changed, 12 insertions(+)
 create mode 100644 t/helper/test-env-bool.c

diff --git a/Makefile b/Makefile
index b08d5ea258..ca624c381f 100644
--- a/Makefile
+++ b/Makefile
@@ -723,6 +723,7 @@ TEST_BUILTINS_OBJS += test-dump-fsmonitor.o
 TEST_BUILTINS_OBJS += test-dump-split-index.o
 TEST_BUILTINS_OBJS += test-dump-untracked-cache.o
 TEST_BUILTINS_OBJS += test-example-decorate.o
+TEST_BUILTINS_OBJS += test-env-bool.o
 TEST_BUILTINS_OBJS += test-genrandom.o
 TEST_BUILTINS_OBJS += test-hashmap.o
 TEST_BUILTINS_OBJS += test-index-version.o
diff --git a/t/helper/test-env-bool.c b/t/helper/test-env-bool.c
new file mode 100644
index 00..956b0aa88e
--- /dev/null
+++ b/t/helper/test-env-bool.c
@@ -0,0 +1,9 @@
+#include "test-tool.h"
+#include "cache.h"
+#include "config.h"
+
+int cmd__env_bool(int argc, const char **argv)
+{
+   assert(argc == 2);
+   return !git_env_bool(argv[1], 0);
+}
diff --git a/t/helper/test-tool.c b/t/helper/test-tool.c
index 5df8b682aa..c4481085c4 100644
--- a/t/helper/test-tool.c
+++ b/t/helper/test-tool.c
@@ -17,6 +17,7 @@ static struct test_cmd cmds[] = {
{ "dump-fsmonitor", cmd__dump_fsmonitor },
{ "dump-split-index", cmd__dump_split_index },
{ "dump-untracked-cache", cmd__dump_untracked_cache },
+   { "env-bool", cmd__env_bool },
{ "example-decorate", cmd__example_decorate },
{ "genrandom", cmd__genrandom },
{ "hashmap", cmd__hashmap },
diff --git a/t/helper/test-tool.h b/t/helper/test-tool.h
index 71f470b871..f7845fbc56 100644
--- a/t/helper/test-tool.h
+++ b/t/helper/test-tool.h
@@ -13,6 +13,7 @@ int cmd__dump_cache_tree(int argc, const char **argv);
 int cmd__dump_fsmonitor(int argc, const char **argv);
 int cmd__dump_split_index(int argc, const char **argv);
 int cmd__dump_untracked_cache(int argc, const char **argv);
+int cmd__env_bool(int argc, const char **argv);
 int cmd__example_decorate(int argc, const char **argv);
 int cmd__genrandom(int argc, const char **argv);
 int cmd__hashmap(int argc, const char **argv);
-- 
2.19.1.899.g0250525e69



[PATCH 0/3] Add a GIT_TEST_FSCK test mode

2018-10-31 Thread Ævar Arnfjörð Bjarmason
This goes on top of Jeff's "cat-file: handle streaming failures
consistently" and implements the test mode I suggested in
https://public-inbox.org/git/877ehzksjd@evledraar.gmail.com/

In the process I didn't find any other bugs than the 2.12..2.19
regression which is already fixed, but as noted in 3/3 I think it's
worth it to stress test fsck like this. I'll be adding this to my
regular build.

Ævar Arnfjörð Bjarmason (3):
  tests: add a "env-bool" helper to test-tool
  tests: mark those tests where "git fsck" fails at the end
  tests: add a special test setup that runs "git fsck" before exiting

 Makefile|  1 +
 t/README|  5 
 t/helper/test-env-bool.c|  9 +++
 t/helper/test-tool.c|  1 +
 t/helper/test-tool.h|  1 +
 t/t-basic.sh| 26 +++
 t/t1006-cat-file.sh |  5 
 t/t1305-config-include.sh   |  4 +++
 t/t1404-update-ref-errors.sh|  4 +++
 t/t1410-reflog.sh   |  4 +++
 t/t1515-rev-parse-outside-repo.sh   |  4 +++
 t/t3008-ls-files-lazy-init-name-hash.sh |  4 +++
 t/t3103-ls-tree-misc.sh |  6 +
 t/t3430-rebase-merges.sh|  6 +
 t/t4046-diff-unmerged.sh|  4 +++
 t/t4058-diff-duplicates.sh  |  5 
 t/t4212-log-corrupt.sh  |  6 +
 t/t5000-tar-tree.sh |  5 
 t/t5300-pack-object.sh  |  5 
 t/t5303-pack-corruption-resilience.sh   |  8 ++
 t/t5307-pack-missing-commit.sh  |  7 ++
 t/t5312-prune-corruption.sh |  4 +++
 t/t5504-fetch-receive-strict.sh |  4 +++
 t/t5601-clone.sh|  8 ++
 t/t6007-rev-list-cherry-pick-file.sh|  4 +++
 t/t6011-rev-list-with-bad-commit.sh |  7 ++
 t/t6030-bisect-porcelain.sh |  6 +
 t/t7007-show.sh |  6 +
 t/t7106-reset-unborn-branch.sh  |  4 +++
 t/t7415-submodule-names.sh  |  4 +++
 t/t7416-submodule-dash-url.sh   |  4 +++
 t/t7417-submodule-path-url.sh   |  4 +++
 t/t7509-commit-authorship.sh|  4 +++
 t/t8003-blame-corner-cases.sh   |  4 +++
 t/t9130-git-svn-authors-file.sh |  7 ++
 t/test-lib-functions.sh |  2 ++
 t/test-lib.sh   | 33 +
 37 files changed, 225 insertions(+)
 create mode 100644 t/helper/test-env-bool.c

-- 
2.19.1.899.g0250525e69



[PATCH 2/3] tests: mark those tests where "git fsck" fails at the end

2018-10-31 Thread Ævar Arnfjörð Bjarmason
Mark the tests where "git fsck" fails at the end with extra test code
to check the fsck output. By that point fsck.{err,out} will have been
created for us.

A later change will add the support for GIT_TEST_FSCK_TESTS. They're
being added first to ensure the test suite will never fail with
GIT_TEST_FSCK=true during bisect.

Signed-off-by: Ævar Arnfjörð Bjarmason 
---
 t/t1006-cat-file.sh | 5 +
 t/t1305-config-include.sh   | 4 
 t/t1404-update-ref-errors.sh| 4 
 t/t1410-reflog.sh   | 4 
 t/t1515-rev-parse-outside-repo.sh   | 4 
 t/t3008-ls-files-lazy-init-name-hash.sh | 4 
 t/t3103-ls-tree-misc.sh | 6 ++
 t/t3430-rebase-merges.sh| 6 ++
 t/t4046-diff-unmerged.sh| 4 
 t/t4058-diff-duplicates.sh  | 5 +
 t/t4212-log-corrupt.sh  | 6 ++
 t/t5000-tar-tree.sh | 5 +
 t/t5300-pack-object.sh  | 5 +
 t/t5303-pack-corruption-resilience.sh   | 8 
 t/t5307-pack-missing-commit.sh  | 7 +++
 t/t5312-prune-corruption.sh | 4 
 t/t5504-fetch-receive-strict.sh | 4 
 t/t5601-clone.sh| 8 
 t/t6007-rev-list-cherry-pick-file.sh| 4 
 t/t6011-rev-list-with-bad-commit.sh | 7 +++
 t/t6030-bisect-porcelain.sh | 6 ++
 t/t7007-show.sh | 6 ++
 t/t7106-reset-unborn-branch.sh  | 4 
 t/t7415-submodule-names.sh  | 4 
 t/t7416-submodule-dash-url.sh   | 4 
 t/t7417-submodule-path-url.sh   | 4 
 t/t7509-commit-authorship.sh| 4 
 t/t8003-blame-corner-cases.sh   | 4 
 t/t9130-git-svn-authors-file.sh | 7 +++
 29 files changed, 147 insertions(+)

diff --git a/t/t1006-cat-file.sh b/t/t1006-cat-file.sh
index 43c4be1e5e..12b69e6fbe 100755
--- a/t/t1006-cat-file.sh
+++ b/t/t1006-cat-file.sh
@@ -588,4 +588,9 @@ test_expect_success 'cat-file --unordered works' '
test_cmp expect actual
 '
 
+GIT_TEST_FSCK_TESTS='
+   test_i18ngrep "unable to unpack header of" fsck.err &&
+   test_i18ngrep "object corrupt or missing" fsck.err
+'
+
 test_done
diff --git a/t/t1305-config-include.sh b/t/t1305-config-include.sh
index 635918505d..890d307d4e 100755
--- a/t/t1305-config-include.sh
+++ b/t/t1305-config-include.sh
@@ -313,4 +313,8 @@ test_expect_success 'include cycles are detected' '
test_i18ngrep "exceeded maximum include depth" stderr
 '
 
+GIT_TEST_FSCK_TESTS='
+   test_i18ngrep "exceeded maximum include depth" fsck.err
+'
+
 test_done
diff --git a/t/t1404-update-ref-errors.sh b/t/t1404-update-ref-errors.sh
index 51a4f4c0ac..6095b2d4b9 100755
--- a/t/t1404-update-ref-errors.sh
+++ b/t/t1404-update-ref-errors.sh
@@ -618,4 +618,8 @@ test_expect_success 'delete fails cleanly if packed-refs 
file is locked' '
test_cmp unchanged actual
 '
 
+GIT_TEST_FSCK_TESTS='
+   test_i18ngrep "invalid sha1 pointer" fsck.err
+'
+
 test_done
diff --git a/t/t1410-reflog.sh b/t/t1410-reflog.sh
index 388b0611d8..43b8e0c9c5 100755
--- a/t/t1410-reflog.sh
+++ b/t/t1410-reflog.sh
@@ -368,4 +368,8 @@ test_expect_success 'continue walking past root commits' '
)
 '
 
+GIT_TEST_FSCK_TESTS='
+   test_i18ngrep "invalid reflog entry" fsck.err
+'
+
 test_done
diff --git a/t/t1515-rev-parse-outside-repo.sh 
b/t/t1515-rev-parse-outside-repo.sh
index 3ec2971ee5..1d8fc3ad70 100755
--- a/t/t1515-rev-parse-outside-repo.sh
+++ b/t/t1515-rev-parse-outside-repo.sh
@@ -42,4 +42,8 @@ test_expect_success 'rev-parse --resolve-git-dir' '
test_cmp expect actual
 '
 
+GIT_TEST_FSCK_TESTS='
+   test_i18ngrep "not a git repository" fsck.err
+'
+
 test_done
diff --git a/t/t3008-ls-files-lazy-init-name-hash.sh 
b/t/t3008-ls-files-lazy-init-name-hash.sh
index 64f047332b..7fb2e5c177 100755
--- a/t/t3008-ls-files-lazy-init-name-hash.sh
+++ b/t/t3008-ls-files-lazy-init-name-hash.sh
@@ -24,4 +24,8 @@ test_expect_success 'no buffer overflow in 
lazy_init_name_hash' '
test-tool lazy-init-name-hash -m
 '
 
+GIT_TEST_FSCK_TESTS='
+   test_i18ngrep "notice: No default references" fsck.err
+'
+
 test_done
diff --git a/t/t3103-ls-tree-misc.sh b/t/t3103-ls-tree-misc.sh
index 14520913af..b7d8ae2e81 100755
--- a/t/t3103-ls-tree-misc.sh
+++ b/t/t3103-ls-tree-misc.sh
@@ -22,4 +22,10 @@ test_expect_success 'ls-tree fails with non-zero exit code 
on broken tree' '
test_must_fail git ls-tree -r HEAD
 '
 
+GIT_TEST_FSCK_TESTS='
+   test_i18ngrep "invalid sha1 pointer in cache-tree" fsck.err &&
+   test_i18ngrep "broken link from.*tree" fsck.out &&
+   test_i18ngrep "missing tree" fsck.out
+'
+
 test_done
diff --git a/t/t3430-rebase-merges.sh b/t/t3430-rebase-merges.sh
index aa7bfc88ec..efac3a792b 100755
--- a/t/t3430-rebase-merges.sh
+++ b/t/t3430-rebase-merges.sh
@@ -396,4 +396,10 @@ 

Re: [RFC] Generation Number v2

2018-10-31 Thread Derrick Stolee

On 10/29/2018 11:59 PM, Junio C Hamano wrote:

Derrick Stolee  writes:


**V3: Corrected Commit Date.**
For a commit C, let its _corrected commit date_ (denoted by cdate(C))
be the maximum of the commit date of C and the commit dates of its
parents.

"maximum of the commit date of C and the corrected commit dates of
its parents"?


That's what I mean. Thanks.



We've talked about exactly this one in the past (long before any of
Microsoft folks other than Dscho came to the scene) but without an
implementation, or detailed design and analysis.  I am very happy to
see the idea among other possibilities to be considered again.  This
time around, we may finally come up with something better than the
"commit dates with SLOP" thing ;-).


Essentially, the felineY order is selected with the goal of swapping
positions of topologically-independent commits relative to the felineX
ordering. The resulting reachability index is as follows:

If felineX(A) < felineY(B), then A cannot reach B.
If felineY(A) < felineY(B), then A cannot reach B.

I presume that the first line is a typo and you compare the same X index?


Yes, sorry for the typos. I fixed them in the report on GitHub.




* **Compatible?** In our test implementation, we use a previously unused
   byte of data in the commit-graph format to indicate which reachability
   index version we are using. Existing clients ignore this value, so we
   will want to consider if these new indexes are _backwards compatible_.
   That is, will they still report correct values if they ignore this byte
   and use the generation number column from the commit-graph file assuming
   the values are minimum generation numbers?

I personally consider that the current commit-graph with generation
numbers experimental, so I am not sure how much we care about this.

Having said that.

By the above definition, any new index that is wider than the
current generation number cannot be compatible (can we even tell the
existing clients how wide each elements in the ix array is?)

In any case, perhaps the first thing to do is to update the clients
so that they stop ignoring the version number field, and instead
work without generation number when there is no version of reach.ix
available in the file?  That way, a better reachablility index can
be chosen freely without having to worry about the compatibility.


I can work on that. It should be as simple as setting commit->generation to
GENERATION_NUMBER_ZERO in fill_commit_in_graph when the graph
has a different version.




* **Immutable?** Git objects are _immutable_. If you change an object you
   actually create a new object with a new object ID. Are the values we
store
   for these reachability indexes also immutable?

Even if we do not embed the reachability ix in commit objects,
having an immutable value is probably a must if we want to make them
incrementally computable, so this is a very good property to have.
Unless there is a clever idea to incrementally compute a mutable
reach.ix, my gut instinct says that this property is a must.

Another thing, perhaps related to "local" below, is if exactly the
same reach.ix is computed by anybody, given an identical commit
history graph (perhaps "reproducibility"?).  I think most of the
candidates you listed are reproducible without a fixed tie-breaker,
but I am not sure about felineY() thing.


* **Local?** Are these values **locally computable**? That is, do we only
   need to look at the parents of a commit (assuming those parents have
   computed values) in order to determine the value at that commit?

A subset of non-local reachability ix, for example, the ones that
need to know what each commit's children are, cannot be immutable,
as adding new objects to the graph (either with locally committing,
or transferring objects from other repositories) would affect the
ix; is this true for all non-local reachability ix, I wonder?


As a thought experiment, we could define a function size(C) to be the
number of commits reachable from C. This is not locally computable
from the size values at C's parents due to the inclusion-exclusion
principle. We would need to compute it by walking the reachable set
and counting (resulting in quadratic performance overall), but it is
immutable. Since the performance cost is so expensive (unlike the
linear costs in the other non-local versions) I didn't include it
in my comparison.




We focused on three types of performance tests that test the indexes
in different ways. Each test lists the `git` command that is used,
and the table lists which repository is used and which inputs.

### Test 1: `git log --topo-order -N`

This test focuses on the number of commits that are parsed during
a `git log --topo-order` before writing `N` commits to output.

A devil's advocate comment.  Your patches seem to be very focused on
this "unlimited" case for the past few weeks, but I am not so sure
if that is a case worth optimizing for.  If "git log --topo-order -N
HEAD~M.." 

Re: [PATCH 0/3] Make add_missing_tags() linear

2018-10-31 Thread Derrick Stolee


On 10/31/2018 2:04 AM, Elijah Newren wrote:
> On Tue, Oct 30, 2018 at 7:16 AM Derrick Stolee via GitGitGadget
>  wrote:
>>
>> As reported earlier [1], the add_missing_tags() method in remote.c has
>> quadratic performance. Some of that performance is curbed due to the
>> generation-number cutoff in in_merge_bases_many(). However, that fix doesn't
>> help users without a commit-graph, and it can still be painful if that
>> cutoff is sufficiently low compared to the tags we are using for
>> reachability testing.
>>
>> Add a new method in commit-reach.c called get_reachable_subset() which does
>> a many-to-many reachability test. Starting at the 'from' commits, walk until
>> the generation is below the smallest generation in the 'to' commits, or all
>> 'to' commits have been discovered. This performs only one commit walk for
>> the entire add_missing_tags() method, giving linear performance in the worst
>> case.
>>
>> Tests are added in t6600-test-reach.sh to ensure get_reachable_subset()
>> works independently of its application in add_missing_tags().
>
> On the original repo where the topic was brought up, with commit-graph
> NOT turned on and using origin/master, I see:
>
> $ time git push --dry-run --follow-tags /home/newren/repo-mirror
> To /home/newren/repo-mirror
>  * [new branch]   test5 -> test5
>
> real 1m20.081s
> user 1m19.688s
> sys 0m0.292s
>
> Merging this series in, I now get:
>
> $ time git push --dry-run --follow-tags /home/newren/repo-mirror
> To /home/newren/repo-mirror
>  * [new branch]   test5 -> test5
>
> real 0m2.857s
> user 0m2.580s
> sys 0m0.328s
>
> which provides a very nice speedup.
>
> Oddly enough, if I _also_ do the following:
> $ git config core.commitgraph true
> $ git config gc.writecommitgraph true
> $ git gc
>
> then my timing actually slows down just slightly:
> $ time git push --follow-tags --dry-run /home/newren/repo-mirror
> To /home/newren/repo-mirror
>  * [new branch]  test5 -> test5
>
> real 0m3.027s
> user 0m2.696s
> sys 0m0.400s

So you say that the commit-graph is off in the 2.8s case, but not here
in the 3.1s case? I would expect _at minimum_ that the cost of parsing
commits would have a speedup in the commit-graph case.  There may be
something else going on here, since you are timing a `push` event that
is doing more than the current walk.

> (run-to-run variation seems pretty consistent, < .1s variation, so
> this difference is just enough to notice.)  I wouldn't be that
> surprised if that means there's some really old tags with very small
> generation numbers, meaning it's not gaining anything in this special
> case from the commit-graph, but it does pay the cost of loading the
> commit-graph.

While you have this test environment, do you mind applying the diff
below and re-running the tests? It will output a count for how many
commits are walked by the algorithm. This should help us determine if
this is another case where generation numbers are worse than commit-date,
or if there is something else going on. Thanks!

-->8--

From 2115e7dcafb2770455b7b4793f90edc2254bad97 Mon Sep 17 00:00:00 2001
From: Derrick Stolee 
Date: Wed, 31 Oct 2018 11:40:50 +
Subject: [PATCH] DO-NOT-MERGE: count commits in get_reachable_subset

Signed-off-by: Derrick Stolee 
---
 commit-reach.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/commit-reach.c b/commit-reach.c
index a98532ecc8..b512461cf7 100644
--- a/commit-reach.c
+++ b/commit-reach.c
@@ -700,6 +700,7 @@ struct commit_list *get_reachable_subset(struct commit 
**from, int nr_from,
struct commit **from_last = from + nr_from;
uint32_t min_generation = GENERATION_NUMBER_INFINITY;
int num_to_find = 0;
+   int count = 0;
 
struct prio_queue queue = { compare_commits_by_gen_then_commit_date };
 
@@ -729,6 +730,8 @@ struct commit_list *get_reachable_subset(struct commit 
**from, int nr_from,
	while (num_to_find && (current = prio_queue_get(&queue)) != NULL) {
struct commit_list *parents;
 
+   count++;
+
if (current->object.flags & PARENT1) {
current->object.flags &= ~PARENT1;
current->object.flags |= reachable_flag;
@@ -755,6 +758,8 @@ struct commit_list *get_reachable_subset(struct commit 
**from, int nr_from,
clear_commit_marks_many(nr_to, to, PARENT1);
clear_commit_marks_many(nr_from, from, PARENT2);
 
+   fprintf(stderr, "count: %d\n", count);
+
return found_commits;
 }
 
-- 
2.19.1.542.gc4df23f792



Re: [PATCH 1/3] commit-reach: implement get_reachable_subset

2018-10-31 Thread Derrick Stolee

On 10/30/2018 11:35 PM, Junio C Hamano wrote:

"Derrick Stolee via GitGitGadget"  writes:


+struct commit_list *get_reachable_subset(struct commit **from, int nr_from,
+struct commit **to, int nr_to,
+int reachable_flag)

This is OR'ed into object.flags, and I somehow expected to see it
as 'unsigned int', not a signed one.


Will do. Thanks.




+{
+   struct commit **item;
+   struct commit *current;
+   struct commit_list *found_commits = NULL;
+   struct commit **to_last = to + nr_to;
+   struct commit **from_last = from + nr_from;
+   uint32_t min_generation = GENERATION_NUMBER_INFINITY;
+   int num_to_find = 0;
+
+   struct prio_queue queue = { compare_commits_by_gen_then_commit_date };
+
+   for (item = to; item < to_last; item++) {
+   struct commit *c = *item;
+   
+   parse_commit(c);
+   if (c->generation < min_generation)
+   min_generation = c->generation;
+
+   if (!(c->object.flags & PARENT1)) {
+   c->object.flags |= PARENT1;
+   num_to_find++;
+   }
+   }
+
+   for (item = from; item < from_last; item++) {
+   struct commit *c = *item;
+   if (!(c->object.flags & PARENT2)) {
+   c->object.flags |= PARENT2;
+   parse_commit(c);
+
+   prio_queue_put(&queue, *item);
+   }
+   }

OK, we marked "to" with PARENT1 and counted them in num_to_find
without dups.  We also marked "from" with PARENT2 and threw them in
the "queue" without dups.

Mental note: the caller must guarantee that everybody reachable from
"to" and "from" have PARENT1 and PARENT2 clear.  This might deserve
to be in the comment before the function.


I'll put that in the header file.

[snip]

OK, this all makes sense.  Unlike merge-base traversals, this does
not have to traverse from the "to" side at all, which makes it quite
a bit simpler and more straightforward.

I do wonder if we can now reimplement in_merge_bases_many() in terms
of this helper, and if that gives us a better performance.  It asks
"is 'commit', i.e. a single 'to', an ancestor of, i.e. reachable
from, one of the 'references', i.e.  'from'"?


We could do this, but it does come with a performance hit when the following
are all true:

1. 'to' is not reachable from any 'from' commits.

2. The 'to' and 'from' commits are close in commit-date.

3. Generation numbers are not available, or the topology is skewed to have
   commits with high commit date and low generation number.

Since in_merge_bases_many() calls paint_down_to_common(), it has the same
issues with the current generation numbers. This can be fixed when we have
the next version of generation numbers available.

I'll make a note to have in_merge_bases_many() call get_reachable_subset()
conditionally (like the generation_numbers_available() trick in the
--topo-order series) after the generation numbers are settled and
implemented.

Thanks,
-Stolee


Re: [PATCH 1/3] commit-reach: implement get_reachable_subset

2018-10-31 Thread Derrick Stolee

On 10/31/2018 2:07 AM, Elijah Newren wrote:

On Tue, Oct 30, 2018 at 7:16 AM Derrick Stolee via GitGitGadget
 wrote:

--- a/commit-reach.c
+++ b/commit-reach.c
@@ -688,3 +688,73 @@ int can_all_from_reach(struct commit_list *from, struct 
commit_list *to,
object_array_clear(&from_objs);
 return result;
  }
+
+struct commit_list *get_reachable_subset(struct commit **from, int nr_from,
+struct commit **to, int nr_to,
+int reachable_flag)
+{
+   struct commit **item;
+   struct commit *current;
+   struct commit_list *found_commits = NULL;
+   struct commit **to_last = to + nr_to;
+   struct commit **from_last = from + nr_from;
+   uint32_t min_generation = GENERATION_NUMBER_INFINITY;
+   int num_to_find = 0;
+
+   struct prio_queue queue = { compare_commits_by_gen_then_commit_date };
+
+   for (item = to; item < to_last; item++) {
+   struct commit *c = *item;
+
+   parse_commit(c);
+   if (c->generation < min_generation)
+   min_generation = c->generation;

So, when we don't have a commit-graph, is c->generation just going to
be 0, making min_generation also be 0? (meaning we get no possible
speed benefit from the commit-graph, since we just don't have that
information available)?


If we don't have a commit-graph, then we can only terminate the loop early
if we discover all of the commits (num_to_find == 0). Otherwise, we need to
walk the entire graph in order to determine non-reachability. This relates
to Junio's point about in_merge_bases_many(), which I'll address in more
detail in my reply to his message.

Thanks,
-Stolee



Re: [PATCH 1/1] Use correct /dev/null for UNIX and Windows

2018-10-31 Thread Johannes Schindelin
Hi,

On Wed, 31 Oct 2018, Junio C Hamano wrote:

> > From: chris 
> 
> Please make this line read like
> 
>   From: Chris Webster 
> 
> i.e. the author should be the person who is signing off that patch.

This is most likely recorded as the commit's author in the commit
object... Chris, to fix it, make sure that your `user.name` is configured
correctly, and then call `git commit --amend --reset-author`.

> > Use File::Spec->devnull() for output redirection to avoid messages
> > when Windows version of Perl is first in path.  The message 'The
> > system cannot find the path specified.' is displayed each time git is
> > run to get colors.
> >
> > Signed-off-by: Chris. Webster 
> > ---
> >  contrib/diff-highlight/DiffHighlight.pm | 7 ++-
> >  1 file changed, 6 insertions(+), 1 deletion(-)
> 
> There are a handful more instances of /dev/null found if you do
> 
>   $ git grep /dev/null -- \*.pl \*.pm
> 
> The one in perl/Git.pm must be shared by scripts written in Perl, so
> it may be worth giving the same tweak to it, like this patch does to
> the highlight script.

I do not think that perl/Git.pm is intended to run with any random Perl
interpreter. It has to be one that has been verified to work correctly
with the Perl code in perl/, and that code is notoriously reliant on POSIX
behavior, hence our choice to go with MSYS2 Perl (there *is* a MINGW Perl
package in Git for Windows' SDK, but it will most likely not work, in
particular because of the missing Subversion bindings).

So I would restrict the search to contrib/\*.pl, contrib/\*.perl and
contrib/\*.pm. The stuff in contrib/ is supposed to be semi-independent
from the particular Git one is using (and from whatever Perl is shipped
with it, if any).

Ciao,
Johannes

> > diff --git a/contrib/diff-highlight/DiffHighlight.pm 
> > b/contrib/diff-highlight/DiffHighlight.pm
> > index 536754583..7440aa1c4 100644
> > --- a/contrib/diff-highlight/DiffHighlight.pm
> > +++ b/contrib/diff-highlight/DiffHighlight.pm
> > @@ -4,6 +4,11 @@ use 5.008;
> >  use warnings FATAL => 'all';
> >  use strict;
> >  
> > +# Use the correct value for both UNIX and Windows (/dev/null vs nul)
> > +use File::Spec;
> > +
> > +my $NULL = File::Spec->devnull();
> > +
> >  # Highlight by reversing foreground and background. You could do
> >  # other things like bold or underline if you prefer.
> >  my @OLD_HIGHLIGHT = (
> > @@ -134,7 +139,7 @@ sub highlight_stdin {
> >  # fallback, which means we will work even if git can't be run.
> >  sub color_config {
> > my ($key, $default) = @_;
> > -   my $s = `git config --get-color $key 2>/dev/null`;
> > +   my $s = `git config --get-color $key 2>$NULL`;
> > return length($s) ? $s : $default;
> >  }
> 


Re: [PATCH 1/1] Use correct /dev/null for UNIX and Windows

2018-10-31 Thread Johannes Schindelin
Hi Junio,

On Wed, 31 Oct 2018, Junio C Hamano wrote:

> Chris Webster  writes:
> 
> >>> > Use File::Spec->devnull() for output redirection to avoid messages
> >>> > when Windows version of Perl is first in path.  The message 'The
> >>>
> >>> Dscho, "Windows version of Perl is first in path" somehow feels
> >>> contradicting with what one of the topics I saw from you were trying
> >>> to enforce (or, at least, "set as the supported configuration").
> >>>
> >>> I am guessing that the Perl you are building and shipping with Git
> >>> for Windows would yield what the shell that ends up running the
> >>> scriptlet `git config --get-color $key` prefers when asked for
> >>> File::Spec->devnull(), and nothing will break with this patch even
> >>> if that is "/dev/null", but I thought I'd double check.
> >>>
> >>> Thanks.
> >>>
> > This problem originally showed up in the
> > https://github.com/so-fancy/diff-so-fancy project, which has a copy of
> > DiffHighlight.pm.   That project allows diff-so-fancy (Perl) to be run
> > from the command line without requiring the bash environment ((well,
> > sort of) including the associated perl).
> 
> Thanks for additional comments.  
> 
> In any case, Windows is not my bailiwick, so I'll hope that the
> above comments from you would help Dscho in his response and wait.
> I know use of File::Spec->devnull() won't hurt POSIX folks so making
> sure this won't break Git for Windows is the primary thing I would
> worry about with this patch.

Indeed, the patch in question regards something I consider outside Git for
Windows' realm. As Chris said, you can run this script from a PowerShell
prompt, without any Git Bash (and without Git's Perl) involved.

I am fine with this patch, as long as the author name is fixed to match
the name in the Signed-off-by: footer ;-) [*1*]

Ciao,
Dscho

Footnote *1*: This patch came in via GitGitGadget, and if I had infinite
amounts of time, I would probably implement some rudimentary pre-checks,
such as: does the Author: header match the first Signed-off-by: footer, is
the commit message wrapped correctly, does the oneline have a prefix and
continues lower-case, etc. And GitGitGadget would then point out the
issues, possibly even try to fix them and push up the fixed commits.

If anybody agrees with these goals and is curious enough to dive into some
Typescript programming, I'd be very happy to guide that person through
implementing this ;-)


Re: [PATCH 0/4] mingw: prevent external PERL5LIB from interfering

2018-10-31 Thread Johannes Schindelin
Hi Junio,

On Wed, 31 Oct 2018, Junio C Hamano wrote:

> "Johannes Schindelin via GitGitGadget" 
> writes:
> 
> > An alternative approach which was rejected at the time (because it
> > interfered with the then-ongoing work to compile Git for Windows using
> > MS Visual C++) would patch the make_environment_block() function to
> > skip the specified environment variables before handing down the
> > environment block to the spawned process. Currently it would interfere
> > with the mingw-utf-8-env patch series I sent earlier today
> > [https://public-inbox.org/git/pull.57.git.gitgitgad...@gmail.com].
> 
> So the rejected approach that was not used in this series would
> interfere with the other one, but I do not have to worry about it
> because this series does not use that approach?  I had to read the six
> lines above twice to figure out that it essentially is saying "I
> shouldn't care", but please let me know if I misread the paragraph and I
> need to care ;-)

Well, you might want to worry about it. Or not.

The approach taken by this patch series is to call `unsetenv()` for the
variable names listed in `core.unsetenvvars` (if any), just before
spawning off the first process (if any).

What I meant by this comment in the cover letter is that I thought about
doing it differently. We already have a perfectly fine function called
`make_environment_block()` that takes a "deltaenv", and then constructs a
new environment block from the current environment plus deltaenv. And this
would be an obvious alternative place to "unset" the variables, as this
function is only called just before spawning new processes.

I was weighing both options, and both back then as right now, there are
other patches in flight that conflict with the second approach, so the
first approach is what I went with.

From a purely aesthetic point of view, the second approach looks nicer, as
it really only affects the spawned processes (and not the current one),
and it allows for changing the list between spawning processes.

But to do it right, I would also have to use a hash set, and it would
complicate the code of `make_environment_block()` even further. And it
sounds a bit like overkill to me, too.

So I sided with the approach presented in the current revision of the
patch series, but I wanted to describe the other approach in case you (or
other reviewers) have a different opinion.

> > While at it, this patch series also cleans up house and moves the
> > Windows-specific core.* variable handling to compat/mingw.c rather
> > than cluttering environment.c and config.c with things that e.g.
> > developers on Linux do not want to care about.
> 
> Or Macs.

Or macOS. Or FreeBSD. Or Irix. Or any other, that's right ;-)

Traditionally, we did not really care all that much about platforms other
than Linux, though, which is what made me write what I wrote. Having said
that, I get the impression that this is changing slowly. The benefits are
pretty clear, too. After all, it was a *Windows* build failure recently
that let Peff identify and fix a *non-Windows* bug...

> When I skimmed this series earlier, I found that patches 2 & 3 sensibly
> implemented to achieve this goal.

Thanks!
Dscho

> 
> >
> > Johannes Schindelin (4):
> >   config: rename `dummy` parameter to `cb` in git_default_config()
> >   Allow for platform-specific core.* config settings
> >   Move Windows-specific config settings into compat/mingw.c
> >   mingw: unset PERL5LIB by default
> >
> >  Documentation/config.txt |  6 
> >  cache.h  |  8 -
> >  compat/mingw.c   | 58 +++-
> >  compat/mingw.h   |  3 ++
> >  config.c | 18 ---
> >  environment.c|  1 -
> >  git-compat-util.h|  8 +
> >  t/t0029-core-unsetenvvars.sh | 30 +++
> >  8 files changed, 109 insertions(+), 23 deletions(-)
> >  create mode 100755 t/t0029-core-unsetenvvars.sh
> >
> >
> > base-commit: 4ede3d42dfb57f9a41ac96a1f216c62eb7566cc2
> > Published-As: 
> > https://github.com/gitgitgadget/git/releases/tags/pr-62%2Fdscho%2Fperl5lib-v1
> > Fetch-It-Via: git fetch https://github.com/gitgitgadget/git 
> > pr-62/dscho/perl5lib-v1
> > Pull-Request: https://github.com/gitgitgadget/git/pull/62
> 


[PATCH v4 2/5] am: improve author-script error reporting

2018-10-31 Thread Phillip Wood
From: Phillip Wood 

If there are errors in a user-edited author-script there is no
indication of what is wrong. This commit adds some specific error messages
depending on the problem. It also relaxes the requirement that the
variables appear in a specific order in the file to match the behavior
of 'rebase --interactive'.

Signed-off-by: Phillip Wood 
---
 builtin/am.c | 49 +++--
 1 file changed, 39 insertions(+), 10 deletions(-)

diff --git a/builtin/am.c b/builtin/am.c
index b68578bc3f..d42b725273 100644
--- a/builtin/am.c
+++ b/builtin/am.c
@@ -270,8 +270,11 @@ static int parse_key_value_squoted(char *buf, struct 
string_list *list)
struct string_list_item *item;
char *np;
char *cp = strchr(buf, '=');
-   if (!cp)
-   return -1;
+   if (!cp) {
+   np = strchrnul(buf, '\n');
+   return error(_("unable to parse '%.*s'"),
+(int) (np - buf), buf);
+   }
np = strchrnul(cp, '\n');
*cp++ = '\0';
item = string_list_append(list, buf);
@@ -280,7 +283,8 @@ static int parse_key_value_squoted(char *buf, struct 
string_list *list)
*np = '\0';
cp = sq_dequote(cp);
if (!cp)
-   return -1;
+   return error(_("unable to dequote value of '%s'"),
+item->string);
item->util = xstrdup(cp);
}
return 0;
@@ -308,6 +312,7 @@ static int read_author_script(struct am_state *state)
struct strbuf buf = STRBUF_INIT;
struct string_list kv = STRING_LIST_INIT_DUP;
int retval = -1; /* assume failure */
+   int i, name_i = -2, email_i = -2, date_i = -2, err = 0;
int fd;
 
assert(!state->author_name);
@@ -326,14 +331,38 @@ static int read_author_script(struct am_state *state)
	if (parse_key_value_squoted(buf.buf, &kv))
goto finish;
 
-   if (kv.nr != 3 ||
-   strcmp(kv.items[0].string, "GIT_AUTHOR_NAME") ||
-   strcmp(kv.items[1].string, "GIT_AUTHOR_EMAIL") ||
-   strcmp(kv.items[2].string, "GIT_AUTHOR_DATE"))
+   for (i = 0; i < kv.nr; i++) {
+   if (!strcmp(kv.items[i].string, "GIT_AUTHOR_NAME")) {
+   if (name_i >= 0)
+   name_i = error(_("'GIT_AUTHOR_NAME' already 
given"));
+   else
+   name_i = i;
+   } else if (!strcmp(kv.items[i].string, "GIT_AUTHOR_EMAIL")) {
+   if (email_i >= 0)
+   email_i = error(_("'GIT_AUTHOR_EMAIL' already 
given"));
+   else
+   email_i = i;
+   } else if (!strcmp(kv.items[i].string, "GIT_AUTHOR_DATE")) {
+   if (date_i >= 0)
+   date_i = error(_("'GIT_AUTHOR_DATE' already 
given"));
+   else
+   date_i = i;
+   } else {
+   err = error(_("unknown variable '%s'"),
+   kv.items[i].string);
+   }
+   }
+   if (name_i == -2)
+   error(_("missing 'GIT_AUTHOR_NAME'"));
+   if (email_i == -2)
+   error(_("missing 'GIT_AUTHOR_EMAIL'"));
+   if (date_i == -2)
+   error(_("missing 'GIT_AUTHOR_DATE'"));
+   if (date_i < 0 || email_i < 0 || date_i < 0 || err)
goto finish;
-   state->author_name = kv.items[0].util;
-   state->author_email = kv.items[1].util;
-   state->author_date = kv.items[2].util;
+   state->author_name = kv.items[name_i].util;
+   state->author_email = kv.items[email_i].util;
+   state->author_date = kv.items[date_i].util;
retval = 0;
 finish:
	string_list_clear(&kv, !!retval);
-- 
2.19.1



[PATCH v4 0/5] am/rebase: share read_author_script()

2018-10-31 Thread Phillip Wood
From: Phillip Wood 

Sorry for the confusion with v3, here are the updated patches.

Thanks to Junio for the feedback on v2. I've updated patch 4 based on
those comments, the rest are unchanged.

v1 cover letter:

This is a follow up to pw/rebase-i-author-script-fix, it reduces code
duplication and improves rebase's parsing of the author script. After
this I'll do another series to share the code to write the author
script.


Phillip Wood (5):
  am: don't die in read_author_script()
  am: improve author-script error reporting
  am: rename read_author_script()
  add read_author_script() to libgit
  sequencer: use read_author_script()

 builtin/am.c |  60 ++--
 sequencer.c  | 192 ---
 sequencer.h  |   3 +
 3 files changed, 128 insertions(+), 127 deletions(-)

-- 
2.19.1



[PATCH v4 4/5] add read_author_script() to libgit

2018-10-31 Thread Phillip Wood
From: Phillip Wood 

Add read_author_script() to sequencer.c based on the implementation in
builtin/am.c and update read_am_author_script() to use
read_author_script(). The sequencer code that reads the author script
will be updated in the next commit.

Signed-off-by: Phillip Wood 
---

Notes:
changes since v2:
 - tweaked an error message as suggested by Junio
 - fixed corner case where a key is given three times (Thanks to Junio
   for pointing this out)

changes since v1:
 - added comment above read_author_script()
 - rebased to reflect changes added in patch 2

 builtin/am.c |  86 +
 sequencer.c  | 105 +++
 sequencer.h  |   3 ++
 3 files changed, 110 insertions(+), 84 deletions(-)

diff --git a/builtin/am.c b/builtin/am.c
index 991d13f9a2..c5373158c0 100644
--- a/builtin/am.c
+++ b/builtin/am.c
@@ -260,36 +260,6 @@ static int read_state_file(struct strbuf *sb, const struct 
am_state *state,
die_errno(_("could not read '%s'"), am_path(state, file));
 }
 
-/**
- * Take a series of KEY='VALUE' lines where VALUE part is
- * sq-quoted, and append <KEY, VALUE> at the end of the string list
- */
-static int parse_key_value_squoted(char *buf, struct string_list *list)
-{
-   while (*buf) {
-   struct string_list_item *item;
-   char *np;
-   char *cp = strchr(buf, '=');
-   if (!cp) {
-   np = strchrnul(buf, '\n');
-   return error(_("unable to parse '%.*s'"),
-(int) (np - buf), buf);
-   }
-   np = strchrnul(cp, '\n');
-   *cp++ = '\0';
-   item = string_list_append(list, buf);
-
-   buf = np + (*np == '\n');
-   *np = '\0';
-   cp = sq_dequote(cp);
-   if (!cp)
-   return error(_("unable to dequote value of '%s'"),
-item->string);
-   item->util = xstrdup(cp);
-   }
-   return 0;
-}
-
 /**
  * Reads and parses the state directory's "author-script" file, and sets
  * state->author_name, state->author_email and state->author_date accordingly.
@@ -309,65 +279,13 @@ static int parse_key_value_squoted(char *buf, struct 
string_list *list)
 static int read_am_author_script(struct am_state *state)
 {
const char *filename = am_path(state, "author-script");
-   struct strbuf buf = STRBUF_INIT;
-   struct string_list kv = STRING_LIST_INIT_DUP;
-   int retval = -1; /* assume failure */
-   int i, name_i = -2, email_i = -2, date_i = -2, err = 0;
-   int fd;
 
assert(!state->author_name);
assert(!state->author_email);
assert(!state->author_date);
 
-   fd = open(filename, O_RDONLY);
-   if (fd < 0) {
-   if (errno == ENOENT)
-   return 0;
-   return error_errno(_("could not open '%s' for reading"),
-  filename);
-   }
-   strbuf_read(&buf, fd, 0);
-   close(fd);
-   if (parse_key_value_squoted(buf.buf, &kv))
-   goto finish;
-
-   for (i = 0; i < kv.nr; i++) {
-   if (!strcmp(kv.items[i].string, "GIT_AUTHOR_NAME")) {
-   if (name_i >= 0)
-   name_i = error(_("'GIT_AUTHOR_NAME' already 
given"));
-   else
-   name_i = i;
-   } else if (!strcmp(kv.items[i].string, "GIT_AUTHOR_EMAIL")) {
-   if (email_i >= 0)
-   email_i = error(_("'GIT_AUTHOR_EMAIL' already 
given"));
-   else
-   email_i = i;
-   } else if (!strcmp(kv.items[i].string, "GIT_AUTHOR_DATE")) {
-   if (date_i >= 0)
-   date_i = error(_("'GIT_AUTHOR_DATE' already 
given"));
-   else
-   date_i = i;
-   } else {
-   err = error(_("unknown variable '%s'"),
-   kv.items[i].string);
-   }
-   }
-   if (name_i == -2)
-   error(_("missing 'GIT_AUTHOR_NAME'"));
-   if (email_i == -2)
-   error(_("missing 'GIT_AUTHOR_EMAIL'"));
-   if (date_i == -2)
-   error(_("missing 'GIT_AUTHOR_DATE'"));
-   if (date_i < 0 || email_i < 0 || date_i < 0 || err)
-   goto finish;
-   state->author_name = kv.items[name_i].util;
-   state->author_email = kv.items[email_i].util;
-   state->author_date = kv.items[date_i].util;
-   retval = 0;
-finish:
-   string_list_clear(&kv, !!retval);
-   strbuf_release(&buf);
-   return retval;
	return read_author_script(filename, &state->author_name,
			  &state->author_email, 

[PATCH v4 1/5] am: don't die in read_author_script()

2018-10-31 Thread Phillip Wood
From: Phillip Wood 

The caller is already prepared to handle errors returned from this
function so there is no need for it to die if it cannot read the file.

Suggested-by: Eric Sunshine 
Signed-off-by: Phillip Wood 
---
 builtin/am.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/builtin/am.c b/builtin/am.c
index 5e866d17c7..b68578bc3f 100644
--- a/builtin/am.c
+++ b/builtin/am.c
@@ -318,7 +318,8 @@ static int read_author_script(struct am_state *state)
if (fd < 0) {
if (errno == ENOENT)
return 0;
-   die_errno(_("could not open '%s' for reading"), filename);
+   return error_errno(_("could not open '%s' for reading"),
+  filename);
}
	strbuf_read(&buf, fd, 0);
close(fd);
-- 
2.19.1



[PATCH v4 5/5] sequencer: use read_author_script()

2018-10-31 Thread Phillip Wood
From: Phillip Wood 

Use the new function added in the last commit to read the author
script, updating read_env_script() and read_author_ident(). We now
have a single code path that reads the author script for am and all
flavors of rebase. This changes the behavior of read_env_script() as
previously it would set any environment variables that were in the
author-script file. Now it is an error if the file contains other
variables or any of GIT_AUTHOR_NAME, GIT_AUTHOR_EMAIL and
GIT_AUTHOR_DATE are missing. This is what am and the non-interactive
version of rebase have been doing for several years so hopefully it
will not cause a problem for interactive rebase users. The advantage
is that we are reusing existing code from am which uses sq_dequote()
to properly dequote variables. This fixes potential problems with user
edited scripts as read_env_script() which did not track quotes
properly.

This commit also removes the fallback code for checking for a broken
author script after git is upgraded when a rebase is stopped. Now that
the parsing uses sq_dequote() it will reliably return an error if the
quoting is broken and the user will have to abort the rebase and
restart. This isn't ideal but it's a corner case and the detection of
the broken quoting could be confused by user edited author scripts.

Signed-off-by: Phillip Wood 
---

Notes:
changes since v1
 - use argv_array_pushf() as suggested by Eric
 - fixed strbuf handling as suggested by Eric
 - fix comments and commit message to reflect changed behavior of
   read_env_script()

 sequencer.c | 97 -
 1 file changed, 21 insertions(+), 76 deletions(-)

diff --git a/sequencer.c b/sequencer.c
index af9987c807..09dc200b4f 100644
--- a/sequencer.c
+++ b/sequencer.c
@@ -767,53 +767,24 @@ int read_author_script(const char *path, char **name, 
char **email, char **date,
 }
 
 /*
- * write_author_script() used to fail to terminate the last line with a "'" and
- * also escaped "'" incorrectly as "'''" rather than "'\\''". We check for
- * the terminating "'" on the last line to see how "'" has been escaped in case
- * git was upgraded while rebase was stopped.
- */
-static int quoting_is_broken(const char *s, size_t n)
-{
-   /* Skip any empty lines in case the file was hand edited */
-   while (n > 0 && s[--n] == '\n')
-   ; /* empty */
-   if (n > 0 && s[n] != '\'')
-   return 1;
-
-   return 0;
-}
-
-/*
- * Read a list of environment variable assignments (such as the author-script
- * file) into an environment block. Returns -1 on error, 0 otherwise.
+ * Read a GIT_AUTHOR_NAME, GIT_AUTHOR_EMAIL AND GIT_AUTHOR_DATE from a
+ * file with shell quoting into struct argv_array. Returns -1 on
+ * error, 0 otherwise.
  */
 static int read_env_script(struct argv_array *env)
 {
-   struct strbuf script = STRBUF_INIT;
-   int i, count = 0, sq_bug;
-   const char *p2;
-   char *p;
+   char *name, *email, *date;
 
-   if (strbuf_read_file(&script, rebase_path_author_script(), 256) <= 0)
+   if (read_author_script(rebase_path_author_script(),
+  &name, &email, &date, 0))
return -1;
-   /* write_author_script() used to quote incorrectly */
-   sq_bug = quoting_is_broken(script.buf, script.len);
-   for (p = script.buf; *p; p++)
-   if (sq_bug && skip_prefix(p, "'''", &p2))
-   strbuf_splice(&script, p - script.buf, p2 - p, "'", 1);
-   else if (skip_prefix(p, "'\\''", &p2))
-   strbuf_splice(&script, p - script.buf, p2 - p, "'", 1);
-   else if (*p == '\'')
-   strbuf_splice(&script, p-- - script.buf, 1, "", 0);
-   else if (*p == '\n') {
-   *p = '\0';
-   count++;
-   }
 
-   for (i = 0, p = script.buf; i < count; i++) {
-   argv_array_push(env, p);
-   p += strlen(p) + 1;
-   }
+   argv_array_pushf(env, "GIT_AUTHOR_NAME=%s", name);
+   argv_array_pushf(env, "GIT_AUTHOR_EMAIL=%s", email);
+   argv_array_pushf(env, "GIT_AUTHOR_DATE=%s", date);
+   free(name);
+   free(email);
+   free(date);
 
return 0;
 }
@@ -833,54 +804,28 @@ static char *get_author(const char *message)
 /* Read author-script and return an ident line (author <email> timestamp) */
 static const char *read_author_ident(struct strbuf *buf)
 {
-   const char *keys[] = {
-   "GIT_AUTHOR_NAME=", "GIT_AUTHOR_EMAIL=", "GIT_AUTHOR_DATE="
-   };
struct strbuf out = STRBUF_INIT;
-   char *in, *eol;
-   const char *val[3];
-   int i = 0;
+   char *name, *email, *date;
 
-   if (strbuf_read_file(buf, rebase_path_author_script(), 256) <= 0)
+   if (read_author_script(rebase_path_author_script(),
+  &name, &email, &date, 0))
return NULL;
 
-   /* dequote values and construct 

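[Editor's note: the sq-quoting bug this patch retires is easy to state. POSIX shell single-quoting has no escape character, so an embedded `'` must be written as `'\''` (close the quote, emit an escaped quote, reopen), whereas the old writer emitted `'''` and dropped the final closing quote. A rough Python translation of the removed `quoting_is_broken()` heuristic, for illustration only (this is not git's C code):]

```python
def quoting_is_broken(s: str) -> bool:
    """Detect the old writer's output: the last non-empty line of a
    correctly sq-quoted author-script must end with a closing quote."""
    s = s.rstrip("\n")  # skip empty lines in case the file was hand edited
    return bool(s) and not s.endswith("'")

# Correct sq-quoting: an embedded ' is written as '\''
good = "GIT_AUTHOR_NAME='O'\\''Connor'\n"
# Old buggy writer: ' became ''' and the final closing quote was missing
broken = "GIT_AUTHOR_NAME='O'''Connor\n"
```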
[PATCH v4 3/5] am: rename read_author_script()

2018-10-31 Thread Phillip Wood
From: Phillip Wood 

Rename read_author_script() in preparation for adding a shared
read_author_script() function to libgit.

Signed-off-by: Phillip Wood 
---
 builtin/am.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/builtin/am.c b/builtin/am.c
index d42b725273..991d13f9a2 100644
--- a/builtin/am.c
+++ b/builtin/am.c
@@ -306,7 +306,7 @@ static int parse_key_value_squoted(char *buf, struct string_list *list)
  * script, and thus if the file differs from what this function expects, it is
  * better to bail out than to do something that the user does not expect.
  */
-static int read_author_script(struct am_state *state)
+static int read_am_author_script(struct am_state *state)
 {
const char *filename = am_path(state, "author-script");
struct strbuf buf = STRBUF_INIT;
@@ -441,7 +441,7 @@ static void am_load(struct am_state *state)
BUG("state file 'last' does not exist");
state->last = strtol(sb.buf, NULL, 10);
 
-   if (read_author_script(state) < 0)
+   if (read_am_author_script(state) < 0)
die(_("could not parse author script"));
 
read_commit_msg(state);
-- 
2.19.1



Re: [PATCH v3 0/5] am/rebase: share read_author_script()

2018-10-31 Thread Phillip Wood
On 31/10/2018 02:50, Junio C Hamano wrote:
> Phillip Wood  writes:
> 
>> From: Phillip Wood 
>>
>> Thanks to Junio for the feedback on v2. I've updated patch 4 based on
>> those comments, the rest are unchanged.
> 
> Hmph, all these five patches seem to be identical to what I have in
> 'pu'.  Did you send the right version?

Oh dear that's embarrassing. I updated the patches on my laptop and
forget to sync before sending them from my desktop. I'll send v4 now.

Sorry for the confusion

Phillip

> 
>> v1 cover letter:
>>
>> This is a follow up to pw/rebase-i-author-script-fix, it reduces code
>> duplication and improves rebase's parsing of the author script. After
>> this I'll do another series to share the code to write the author
>> script.
>>
>> Phillip Wood (5):
>>   am: don't die in read_author_script()
>>   am: improve author-script error reporting
>>   am: rename read_author_script()
>>   add read_author_script() to libgit
>>   sequencer: use read_author_script()
>>
>>  builtin/am.c |  60 ++--
>>  sequencer.c  | 192 ---
>>  sequencer.h  |   3 +
>>  3 files changed, 128 insertions(+), 127 deletions(-)



Re: Using --word-diff breaks --color-moved

2018-10-31 Thread james harvey
On Wed, Oct 31, 2018 at 12:27 AM Junio C Hamano  wrote:
>
> james harvey  writes:
>
> > If you use both "--word-diff" and "--color-moved", regardless of the
> > order of arguments, "--word-diff" takes precedence and "--color-moved"
> > isn't allowed to do anything.
> >
> > I think "--color-moved" should have precedence over "--word-diff".  I
> > cannot think of a scenario where a user would supply both options, and
> > actually want "--word-diff" to take precedence.
>
> I am not sure if I follow.  If these two cannot work well together,
> then we should just reject the request as asking for incompatible
> combination of options while we are parsing the command line
> arguments, rather than arguing which one should trump the other
> one---that would simply lead to "in my opinion, word-diff is more
> important" vs "in mine, color-moved is more important", no?

I should have been more clear in my original message.  I don't mean
that if "--color-moved" is given, that the argument "--word-diff"
should be completely ignored as if it weren't given as an option.

I'm not too concerned about my reduced test case scenario.  I'm
concerned about a larger diff, where there's some areas that got
moved, some lines that got deleted, some added, and some lines with
just a word or two changed.

In those larger scenarios, WITHOUT using BOTH "--color-moved" and
"--word-diff", and INSTEAD just using "git diff --color-moved", a
typical full line(s) diff occurs for changed areas that weren't moved,
as if it were given as a hidden/default option.  It's analyzing each
differing area to see if it's going to show each of those differing
areas as a move or a full line(s) diff.  Here, "--color-moved" takes
precedence (in the way I'm trying to use the word) over the typical
full line(s) diff.

I could be wrong, but I don't see why "--color-moved" can't operate
the same way, with "--word-diff" taking the place of the typical full
line(s) diff.  So, if it would be technically accurate to show
something that was moved using either method, that it would show moved
areas as a move rather than as word-diffs.  This would leave areas not
moved to be word-diffed.

I think these options can co-exist.  I could be wrong, but I'm betting
the code for "--color-moved" was only written with the typical full
line(s) diff in mind, and wasn't written with "--word-diff" in mind.


Re: [PATCH 01/24] Makefile: add pending semantic patches

2018-10-31 Thread Junio C Hamano
Stefan Beller  writes:

> [Missing: SZEDERs sign off, so I also do not sign off]

At least to me, based on my reading of DCO in
Documentation/SubmittingPatches, this reasoning does not make much
sense.


Re: [PATCHv2 00/24] Bring more repository handles into our code base]

2018-10-31 Thread Junio C Hamano
Stefan Beller  writes:

> I also picked up the patch for pending semantic patches, as the
> first patch, can I have your sign off, please?

I find this step quite lacking.  

What was posted would have been perfectly fine as a "how about doing
it this way" weather-balloon patch, but as a part of a real series, it
needs to explain to the developers what the distinctions between two
classes are, and who is to use the cocci patch at what point in the
development cycle, etc., in an accompanying documentation update.

So can we have both sign-off and write-up (perhaps cut from
the original e-mail discussion)?

> Stefan Beller (23):
>   sha1_file: allow read_object to read objects in arbitrary repositories
>   packfile: allow has_packed_and_bad to handle arbitrary repositories
>   object-store: allow read_object_file_extended to read from arbitrary
> repositories
>   object-store: prepare read_object_file to deal with arbitrary
> repositories
>   object-store: prepare has_{sha1, object}_file[_with_flags] to handle
> arbitrary repositories
>   object: parse_object to honor its repository argument
>   commit: allow parse_commit* to handle arbitrary repositories
>   commit-reach.c: allow paint_down_to_common to handle arbitrary
> repositories
>   commit-reach.c: allow merge_bases_many to handle arbitrary
> repositories
>   commit-reach.c: allow remove_redundant to handle arbitrary
> repositories
>   commit-reach.c: allow get_merge_bases_many_0 to handle arbitrary
> repositories
>   commit-reach: prepare get_merge_bases to handle arbitrary repositories
>   commit-reach: prepare in_merge_bases[_many] to handle arbitrary
> repositories
>   commit: prepare get_commit_buffer to handle arbitrary repositories
>   commit: prepare repo_unuse_commit_buffer to handle arbitrary
> repositories
>   commit: prepare logmsg_reencode to handle arbitrary repositories
>   pretty: prepare format_commit_message to handle arbitrary repositories
>   submodule: use submodule repos for object lookup
>   submodule: don't add submodule as odb for push
>   commit-graph: convert remaining function to handle arbitrary
> repositories
>   commit: make free_commit_buffer and release_commit_memory repository
> agnostic
>   path.h: make REPO_GIT_PATH_FUNC repository agnostic
>   t/helper/test-repository: celebrate independence from the_repository

It seems that this topic is full of commits with overly long title.

>
>  Makefile  |   6 +-
>  builtin/fsck.c|   3 +-
>  builtin/log.c |   6 +-
>  builtin/rev-list.c|   3 +-
>  cache.h   |   2 +
>  commit-graph.c|  40 +++--
>  commit-reach.c|  73 +
>  commit-reach.h|  38 +++--
>  commit.c  |  41 ++---
>  commit.h  |  43 +-
>  .../coccinelle/the_repository.pending.cocci   | 144 ++
>  object-store.h|  35 -
>  object.c  |   8 +-
>  packfile.c|   5 +-
>  packfile.h|   2 +-
>  path.h|   2 +-
>  pretty.c  |  28 ++--
>  pretty.h  |   7 +-
>  sha1-file.c   |  34 +++--
>  streaming.c   |   2 +-
>  submodule.c   |  79 +++---
>  t/helper/test-repository.c|  10 ++
>  22 files changed, 459 insertions(+), 152 deletions(-)
>  create mode 100644 contrib/coccinelle/the_repository.pending.cocci
>
> git range-diff origin/sb/more-repo-in-api...
> [...] # rebased to origin/master

I offhand do not recall what happened during these 100+ patches.
Did we acquire something significantly new that would have
conflicted with this new round of the topic?  I do not mind at all
seeing that a series gets adjusted to the updated codebase, but I do
dislike seeing it done without explanation---an unexplained rebase
to a new base is a wasted opportunity to learn what areas of the
codebase are so hot that multiple and separate topics would want to
touch them.

>   -:  -- > 116:  4ede3d42df Seventh batch for 2.20
>   -:  -- > 117:  b1de196491 Makefile: add pending semantic patches
>   1:  1b9b5c695e = 118:  f1be5eb487 sha1_file: allow read_object to read 
> objects in arbitrary repositories
>   2:  33b94066f2 = 119:  9100a5705d packfile: allow has_packed_and_bad to 
> handle arbitrary repositories
>   3:  5217b6b1e1 = 120:  a4ad7791da object-store: allow 
> read_object_file_extended to read from arbitrary repositories
>   4:  2b7239b55b ! 121:  5d7b3a6dd9 

Re: [PATCH v4 9/9] Documentation/config: add odb..promisorRemote

2018-10-31 Thread Christian Couder
Hi Jonathan,

On Tue, Oct 16, 2018 at 7:43 PM Jonathan Nieder  wrote:
>
> Hi Christian,
>
> On Tue, Sep 25, 2018, Christian Couder wrote:
>
> > In the cover letter there is a "Discussion" section which is about
> > this, but I agree that it might not be very clear.
> >
> > The main issue that this patch series tries to solve is that
> > extensions.partialclone config option limits the partial clone and
> > promisor features to only one remote. One related issue is that it
> > also prevents to have other kind of promisor/partial clone/odb
> > remotes. By other kind I mean remotes that would not necessarily be
> > git repos, but that could store objects (that's where ODB, for Object
> > DataBase, comes from) and could provide those objects to Git through a
> > helper (or driver) script or program.
>
> Thanks for this explanation.  I took the opportunity to learn more
> while you were in the bay area for the google summer of code mentor
> summit and learned a little more, which was very helpful to me.

Thanks for inviting me at the Google offices in Sunnyvale and San
Francisco to discuss about this and other issues.

> The broader picture is that this is meant to make Git natively handle
> large blobs in a nicer way.  The design in this series has a few
> components:
>
>  1. Teaching partial clone to attempt to fetch missing objects from
> multiple remotes instead of only one.  This is useful because you
> can have a server that is nearby and cheaper to serve from (some
> kind of local cache server) that you make requests to first before
> falling back to the canonical source of objects.
>
>  2. Simplifying the protocol for fetching missing objects so that it
> can be satisfied by a lighter weight object storage system than
> a full Git server.  The ODB helpers introduced in this series are
> meant to speak such a simpler protocol since they are only used
> for one-off requests of a collection of missing objects instead of
> needing to understand refs, Git's negotiation, etc.
>
>  3. (possibly, though not in this series) Making the criteria for what
> objects can be missing more aggressive, so that I can "git add"
> a large file and work with it using Git without even having a
> second copy of that object in my local object store.

Yeah, I think this is a good summary of the issues I have been trying
to address.

> For (2), I would like to see us improve the remote helper
> infrastructure instead of introducing a new ODB helper.  Remote
> helpers are already permitted to fetch some objects without listing
> refs --- perhaps we will want to
>
>  i. split listing refs to a separate capability, so that a remote
> helper can advertise that it doesn't support that.  (Alternatively
> the remote could advertise that it has no refs.)
>
>  ii. Use the "long-running process" mechanism to improve how Git
>  communicates with a remote helper.

Yeah, I agree that improving the remote helper infrastructure is
probably better than what I have been trying to add. And I agree with
the above 2 steps you propose.

> For (1), things get more tricky.  In an object store from a partial
> clone today, we relax the ordinary "closure under reachability"
> invariant but in a minor way.  We'll need to work out how this works
> with multiple promisor remotes.
>
> The idea today is that there are two kinds of packs: promisor packs
> (from the promisor remote) and non-promisor packs.  Promisor packs are
> allowed to have reachability edges (for example a tree->blob edge)
> that point to a missing object, since the promisor remote has promised
> that we will be able to access that object on demand.  Non-promisor
> packs are also allowed to have reachability edges that point to a
> missing object, as long as there is a reachability edge from an object
> in a promisor pack to the same object (because of the same promise).
> See "Handling Missing Objects" in Documentation/technical/partial-clone.txt
> for more details.
>
> To prevent older versions of Git from being confused by partial clone
> repositories, they use the repositoryFormatVersion mechanism:
>
> [core]
> repositoryFormatVersion = 1
> [extensions]
> partialClone = ...
>
> If we change the invariant, we will need to use a new extensions.* key
> to ensure that versions of Git that are not aware of the new invariant
> do not operate on the repository.
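[Editor's note: for concreteness, the single-promisor arrangement gated by that extension key looks like this in `.git/config`; the remote name, URL, and filter value are illustrative:]

```ini
[core]
	repositoryformatversion = 1
[extensions]
	partialclone = origin
[remote "origin"]
	url = https://example.com/repo.git
	promisor = true
	partialclonefilter = blob:none
```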

Maybe the versions of Git that are not aware of the new invariant
could still work using only the specified remote while the new
versions would know that they can use other remotes by looking at
other config variables.

> A promisor pack is indicated by there being a .promisor file next to
> the usual .pack file.  Currently the .promisor file is empty.  The
> previous idea was that once we want more metadata (e.g. for the sake of
> multiple promisor remotes), we could write it in that file.  For
> example, remotes could be 

Re: [PATCH 1/3] commit-reach: implement get_reachable_subset

2018-10-31 Thread Elijah Newren
On Tue, Oct 30, 2018 at 7:16 AM Derrick Stolee via GitGitGadget
 wrote:
> --- a/commit-reach.c
> +++ b/commit-reach.c
> @@ -688,3 +688,73 @@ int can_all_from_reach(struct commit_list *from, struct commit_list *to,
> object_array_clear(&from_objs);
> return result;
>  }
> +
> +struct commit_list *get_reachable_subset(struct commit **from, int nr_from,
> +struct commit **to, int nr_to,
> +int reachable_flag)
> +{
> +   struct commit **item;
> +   struct commit *current;
> +   struct commit_list *found_commits = NULL;
> +   struct commit **to_last = to + nr_to;
> +   struct commit **from_last = from + nr_from;
> +   uint32_t min_generation = GENERATION_NUMBER_INFINITY;
> +   int num_to_find = 0;
> +
> +   struct prio_queue queue = { compare_commits_by_gen_then_commit_date };
> +
> +   for (item = to; item < to_last; item++) {
> +   struct commit *c = *item;
> +
> +   parse_commit(c);
> +   if (c->generation < min_generation)
> +   min_generation = c->generation;

So, when we don't have a commit-graph, is c->generation just going to
be 0, making min_generation also be 0? (meaning we get no possible
speed benefit from the commit-graph, since we just don't have that
information available)?

> +
> +   if (!(c->object.flags & PARENT1)) {
> +   c->object.flags |= PARENT1;
> +   num_to_find++;
> +   }
> +   }
> +
> +   for (item = from; item < from_last; item++) {
> +   struct commit *c = *item;
> +   if (!(c->object.flags & PARENT2)) {
> +   c->object.flags |= PARENT2;
> +   parse_commit(c);
> +
> +   prio_queue_put(&queue, *item);
> +   }
> +   }
> +
> +   while (num_to_find && (current = prio_queue_get(&queue)) != NULL) {
> +   struct commit_list *parents;
> +
> +   if (current->object.flags & PARENT1) {
> +   current->object.flags &= ~PARENT1;
> +   current->object.flags |= reachable_flag;
> +   commit_list_insert(current, &found_commits);
> +   num_to_find--;
> +   }
> +
> +   for (parents = current->parents; parents; parents = parents->next) {
> +   struct commit *p = parents->item;
> +
> +   parse_commit(p);
> +
> +   if (p->generation < min_generation)
> +   continue;
> +
> +   if (p->object.flags & PARENT2)
> +   continue;
> +
> +   p->object.flags |= PARENT2;
> +   prio_queue_put(&queue, p);
> +   }
> +   }
> +
> +   clear_commit_marks_many(nr_to, to, PARENT1);
> +   clear_commit_marks_many(nr_from, from, PARENT2);
> +
> +   return found_commits;
> +}
> +
> diff --git a/commit-reach.h b/commit-reach.h
> index 7d313e297..43bd50a70 100644
> --- a/commit-reach.h
> +++ b/commit-reach.h
> @@ -74,4 +74,14 @@ int can_all_from_reach_with_flag(struct object_array *from,
>  int can_all_from_reach(struct commit_list *from, struct commit_list *to,
>int commit_date_cutoff);
>
> +
> +/*
> + * Return a list of commits containing the commits in the 'to' array
> + * that are reachable from at least one commit in the 'from' array.
> + * Also add the given 'flag' to each of the commits in the returned list.
> + */
> +struct commit_list *get_reachable_subset(struct commit **from, int nr_from,
> +struct commit **to, int nr_to,
> +int reachable_flag);
> +
>  #endif
> --
> gitgitgadget
>

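[Editor's note: to make the walk's shape concrete, here is a compact Python sketch of the same many-to-many reachability idea: walk from the 'from' commits with a max-priority queue keyed on generation number, and stop once every 'to' commit has been found or the generation cutoff prunes the rest. This is an illustration of the algorithm, not git's C implementation; all names and the dict-based graph are invented. Note that with no commit-graph every generation is 0, the cutoff never prunes, and the walk degrades to a full traversal:]

```python
import heapq

def get_reachable_subset(parents, generation, frm, to):
    """Return the commits in 'to' reachable from any commit in 'frm'.

    parents: dict mapping commit -> list of parent commits
    generation: dict mapping commit -> generation number (0 if unknown)
    """
    # Nothing in 'to' can have a parent below the smallest 'to' generation,
    # so the walk never needs to descend past it.
    min_generation = min(generation.get(c, 0) for c in to)
    targets = set(to)
    found = []
    seen = set(frm)
    # Negate generations to turn heapq's min-heap into a max-heap:
    # newer (higher-generation) commits are visited first.
    queue = [(-generation.get(c, 0), c) for c in frm]
    heapq.heapify(queue)
    while targets and queue:
        _, current = heapq.heappop(queue)
        if current in targets:
            targets.discard(current)
            found.append(current)
        for p in parents.get(current, []):
            if p in seen or generation.get(p, 0) < min_generation:
                continue  # already queued, or below the cutoff
            seen.add(p)
            heapq.heappush(queue, (-generation.get(p, 0), p))
    return found

# Tiny DAG: A <- B <- C on one line of history, D unrelated.
sample_parents = {"C": ["B"], "B": ["A"], "A": [], "D": []}
sample_generation = {"A": 1, "B": 2, "C": 3, "D": 1}
```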

Re: [PATCH 1/1] Use correct /dev/null for UNIX and Windows

2018-10-31 Thread Junio C Hamano
Chris Webster  writes:

>>> > Use File::Spec->devnull() for output redirection to avoid messages
>>> > when Windows version of Perl is first in path.  The message 'The
>>>
>>> Dscho, "Windows version of Perl is first in path" somehow feels
>>> contradicting with what one of the topics I saw from you were trying
>>> to enforce (or, at least, "set as the supported configuration").
>>>
>>> I am guessing that the Perl you are building and shipping with Git
>>> for Windows would yield what the shell that ends up running the
>>> scriptlet `git config --get-color $key` prefers when asked for
>>> File::Spec->devnull(), and nothing will break with this patch even
>>> if that is "/dev/null", but I thought I'd double check.
>>>
>>> Thanks.
>>>
This problem originally showed up in the
https://github.com/so-fancy/diff-so-fancy project, which has a copy of
DiffHighlight.pm.  That project allows diff-so-fancy (Perl) to be run
from the command line without requiring the bash environment (well,
sort of), including the associated Perl.

Thanks for additional comments.  

In any case, Windows is not my bailiwick, so I'll hope that the
above comments from you would help Dscho in his response and wait.
I know use of File::Spec->devnull() won't hurt POSIX folks so making
sure this won't break Git for Windows is the primary thing I woudl
worry about this patch.



Re: [PATCH 0/3] Make add_missing_tags() linear

2018-10-31 Thread Elijah Newren
On Tue, Oct 30, 2018 at 7:16 AM Derrick Stolee via GitGitGadget
 wrote:
>
> As reported earlier [1], the add_missing_tags() method in remote.c has
> quadratic performance. Some of that performance is curbed due to the
> generation-number cutoff in in_merge_bases_many(). However, that fix doesn't
> help users without a commit-graph, and it can still be painful if that
> cutoff is sufficiently low compared to the tags we are using for
> reachability testing.
>
> Add a new method in commit-reach.c called get_reachable_subset() which does
> a many-to-many reachability test. Starting at the 'from' commits, walk until
> the generation is below the smallest generation in the 'to' commits, or all
> 'to' commits have been discovered. This performs only one commit walk for
> the entire add_missing_tags() method, giving linear performance in the worst
> case.
>
> Tests are added in t6600-test-reach.sh to ensure get_reachable_subset()
> works independently of its application in add_missing_tags().

On the original repo where the topic was brought up, with commit-graph
NOT turned on and using origin/master, I see:

$ time git push --dry-run --follow-tags /home/newren/repo-mirror
To /home/newren/repo-mirror
 * [new branch]  test5 -> test5

real 1m20.081s
user 1m19.688s
sys 0m0.292s

Merging this series in, I now get:

$ time git push --dry-run --follow-tags /home/newren/repo-mirror
To /home/newren/repo-mirror
 * [new branch]  test5 -> test5

real 0m2.857s
user 0m2.580s
sys 0m0.328s

which provides a very nice speedup.

Oddly enough, if I _also_ do the following:
$ git config core.commitgraph true
$ git config gc.writecommitgraph true
$ git gc

then my timing actually slows down just slightly:
$ time git push --follow-tags --dry-run /home/newren/repo-mirror
To /home/newren/repo-mirror
 * [new branch]  test5 -> test5

real 0m3.027s
user 0m2.696s
sys 0m0.400s

(run-to-run variation seems pretty consistent, < .1s variation, so
this difference is just enough to notice.)  I wouldn't be that
surprised if that means there's some really old tags with very small
generation numbers, meaning it's not gaining anything in this special
case from the commit-graph, but it does pay the cost of loading the
commit-graph.


Anyway, looks good in my testing.  Thanks much for working on this!


Re: [PATCH v7 09/10] submodule: support reading .gitmodules when it's not in the working tree

2018-10-31 Thread Junio C Hamano
Antonio Ospite  writes:

> I see, this is also mentioned in t/README, I had overlooked that part.
> Thank you for reporting.
>
>> Without this fix, your new test case will fail on Windows all the time,
>> see e.g.
>> https://git-for-windows.visualstudio.com/git/_build/results?buildId=22913=logs
>> 
>
> Junio, what is the plan for 'ao/submodule-wo-gitmodules-checked-out'?
>

I did not and do not have a specific plan ;-) 

If the only remaining issue in the previous round of the topic were
what you said in <20181010205645.e1529eff9099805029b1d...@ao2.it>,
which you addressed in this round, and given that Stefan (who is
likely to be the person who would need to work with you if there is
any issues later found in this topic) seemed to be happy with it in
,
I'd say with Dscho's bug fixed, it should be ready for 'next'.

> I see it's not in next yet; do you want me to resend the whole series
> with this fixup in or would it be less overhead for you to apply it
> directly to patch 9/10 from v7 of the series?

In either way, this involves rebuilding ao/* topic and then redoing
sb/submodule-recursive-fetch-gets-the-tip topic on top, before I can
do the 'pu' with them, so I cannot promise it will happen today, but
let's see.  I think I have enough material to do the fix-up locally
without any additional thing sent from you.

Thanks.


> P.S. I was wondering if it is worth having patchset versions mentioned
> somewhere in pu/, maybe in merge commits if not in branch names?

No, not in branch names.

The tip date published in the "What's cooking" report is taken from
the committer date but we may want to use the author date instead,
which may help (and encourage people to be careful _before_ sending
things out, to avoid doing many rerolls in a day).