Re: [PATCH] docs/gitweb.conf: config variable typo

2018-12-07 Thread Jonathan Nieder
Hi,

Frank Dana wrote:

> The documentation for the feature 'snapshot' claimed
> "This feature can be configured on a per-repository basis via
> repository's `gitweb.blame` configuration variable"
>
> Fixed to specify `gitweb.snapshot` as the variable name.
> ---
>  Documentation/gitweb.conf.txt | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)

Thanks for fixing it.  May we forge your sign-off?  See
Documentation/SubmittingPatches section "Certify your work" for what
this means.

Sincerely,
Jonathan


Re: [PATCH] commit: abort before commit-msg if empty message

2018-12-07 Thread Jonathan Nieder
Hi,

Jonathan Tan wrote:

> (The implementation in this commit reads the commit message twice even
> if there is no commit-msg hook. I think that this is fine, since this is
> for interactive use - an alternative would be to plumb information about
> the absence of the hook from run_hook_ve() upwards, which seems more
> complicated.)

Sounds like a good followup for an interested person to do later.  Can
you include a NEEDSWORK comment describing this?

> Signed-off-by: Jonathan Tan 
> ---
> This was noticed with the commit-msg hook that comes with Gerrit, which
> basically just calls git interpret-trailers to add a Change-Id trailer.

Thanks for fixing it.

This kind of context tends to be very useful when looking back at a
commit later, so I'd be happy to see it in the commit message too.

[...]
> --- a/builtin/commit.c
> +++ b/builtin/commit.c
> @@ -652,6 +652,21 @@ static void adjust_comment_line_char(const struct strbuf 
> *sb)
>   comment_line_char = *p;
>  }
>  
> +static void read_and_clean_commit_message(struct strbuf *sb)
> +{
> + if (strbuf_read_file(sb, git_path_commit_editmsg(), 0) < 0) {
> + int saved_errno = errno;
> + rollback_index_files();
> + die(_("could not read commit message: %s"), 
> strerror(saved_errno));

Long line.

More importantly, I wonder if this can use die_errno.
rollback_index_files calls delete_tempfile, which can clobber errno, so
that would require restoring errno either here or in that function:

diff --git i/lockfile.h w/lockfile.h
index 35403ccc0d..d592f384e7 100644
--- i/lockfile.h
+++ w/lockfile.h
@@ -298,10 +298,14 @@ static inline int commit_lock_file_to(struct lock_file 
*lk, const char *path)
  * remove the lockfile. It is a NOOP to call `rollback_lock_file()`
  * for a `lock_file` object that has already been committed or rolled
  * back.
+ *
+ * Saves and restores errno for more convenient use during error handling.
  */
 static inline void rollback_lock_file(struct lock_file *lk)
 {
+   int save_errno = errno;
delete_tempfile(>tempfile);
+   errno = save_errno;
 }
 
 #endif /* LOCKFILE_H */

It doesn't make the code more obvious so what you have is probably
better.

Also, on second glance, this is just moved code.  Still hopefully some
of the above is amusing (and rewrapping would be fine to do during the
move).

[...]
> @@ -970,6 +985,22 @@ static int prepare_to_commit(const char *index_file, 
> const char *prefix,
>   argv_array_clear();
>   }
>  
> + if (use_editor && !no_verify) {
> + /*
> +  * Abort the commit if the user supplied an empty commit
> +  * message in the editor. (Because the commit-msg hook is to be
> +  * run, we need to check this now, since that hook may change
> +  * the commit message.)
> +  */
> + read_and_clean_commit_message();
> + if (message_is_empty(, cleanup_mode) && 
> !allow_empty_message) {
> + rollback_index_files();
> + fprintf(stderr, _("Aborting commit due to empty commit 
> message.\n"));
> + exit(1);

What about the template_untouched case?

Should the two call sites share code?

[...]
> + }
> + strbuf_release();
> + }
> +
>   if (!no_verify &&
>   run_commit_hook(use_editor, index_file, "commit-msg", 
> git_path_commit_editmsg(), NULL)) {
>   return 0;

This means we have a little duplication of code: first we check whether
to abort here in prepare_to_commit, and then again after the hook is run
in cmd_commit.

Is there some other code structure that would make things more clear?

cmd_commit also seems to be rather long --- is there some logical way
to split it up that would make the code clearer (as a preparatory or
followup patch)?

In cmd_commit, we spend a while (in number of lines, not actual
running time) to determine parents before deciding whether the user
has chosen to abort by writing an empty message.  Should we perform
that check sooner, closer to the prepare_to_commit call?

> @@ -1608,17 +1639,7 @@ int cmd_commit(int argc, const char **argv, const char 
> *prefix)
>  
>   /* Finally, get the commit message */
>   strbuf_reset();
> - if (strbuf_read_file(, git_path_commit_editmsg(), 0) < 0) {
> - int saved_errno = errno;
> - rollback_index_files();
> - die(_("could not read commit message: %s"), 
> strerror(saved_errno));
> - }
> -
> - if (verbose || /* Truncate the message just before the diff, if any. */
> - cleanup_mode == COMMIT_MSG_CLEANUP_SCISSORS)
> - strbuf_setlen(, wt_status_locate_end(sb.buf, sb.len));
> - if (cleanup_mode != COMMIT_MSG_CLEANUP_NONE)
> - strbuf_stripspace(, cleanup_mode == COMMIT_MSG_CLEANUP_ALL);
> + read_and_clean_commit_message();
>  
>   if (message_is_empty(, cleanup_mode) && 

Re: enhancement: support for author.email and author.name in "git config"

2018-12-07 Thread Jonathan Nieder
William Hubbs wrote:
> On Thu, Dec 06, 2018 at 11:20:07PM +0100, Johannes Schindelin wrote:

>> There *is* a way to get what you want that is super easy and will
>> definitely work: if you sit down and do it ;-)
>>
>> Please let us know if you need any additional information before you can
>> start.
>
> Which branch should I work off of in the repo?

"master".

Also, please make sure the documentation (e.g. in
Documentation/config/user.txt) describes when a user would want to set
this option.

See also:
- Documentation/SubmittingPatches
- the DISCUSSION section of "git help format-patch"
- [1]
- https://www.kernel.org/pub/software/scm/git/docs/user-manual.html#hacking-git

Thanks,
Jonathan

[1] 
https://github.com/gitgitgadget/gitgitgadget#a-bot-to-serve-as-glue-between-github-pull-requests-and-the-git-mailing-list


Re: RFE: git-patch-id should handle patches without leading "diff"

2018-12-07 Thread Jonathan Nieder
Ævar Arnfjörð Bjarmason wrote:
> On Fri, Dec 07 2018, Jonathan Nieder wrote:

>> The patch-id appears to only care about the diff text, so it should be
>> able to handle this.  So if we have a better heuristic for where the
>> diff starts, it would be good to use it.
>
> No, the patch-id doesn't just care about the diff, it cares about the
> context before the diff too.

Sorry, I did a bad job of communicating.  When I said "diff text", I was
including context.

[...]
> Observe that the diff --git line matters, we hash it:
>
> $ git diff-tree -p HEAD~.. | git patch-id
> 5870d115b7e2a9a936ab8fdc254932234413c710 
> 
> $ git diff-tree --src-prefix=a/ --dst-prefix=b/ -p HEAD~.. | git patch-id 
> --stable
> 5870d115b7e2a9a936ab8fdc254932234413c710 
> 
> $ git diff-tree --src-prefix=x/ --dst-prefix=y/ -p HEAD~.. | git patch-id 
> --stable
> 4cd136f2b98760150f700ac6a5b126389d6d05a7 
> 

Oh, hm.  That's unfortunate.

[...]
> So it seems most sensible to me if this is going to be supported that we
> go a bit beyond the call of duty and fake up the start of it, namely:
>
> --- a/arch/x86/kernel/process.c
> +++ b/arch/x86/kernel/process.c
>
> To be:
>
> diff --git a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
> --- a/arch/x86/kernel/process.c
> +++ b/arch/x86/kernel/process.c

Right.  We may want to handle diff.mnemonicPrefix as well.

Jonathan


Re: [PATCH] mailmap: update brandon williams's email address

2018-12-07 Thread Jonathan Nieder
Hi,

Stefan Beller wrote:

> What would be more of interest is why we'd be interested in this patch
> as there is no commit/patch sent by Brandon with this email in gits history.

I think there's an implicit assumption in this question that isn't
spelled out.  Do I understand correctly that you're saying the main
purpose of .mailmap is to figure out whether two commits are by the
same author?

My own uses of .mailmap primarily have a different purpose: to find
out the preferred contact address for the author of a given commit.

Thanks,
Jonathan


Re: RFE: git-patch-id should handle patches without leading "diff"

2018-12-07 Thread Jonathan Nieder
Hi,

Konstantin Ryabitsev wrote:

> Every now and again I come across a patch sent to LKML without a leading
> "diff a/foo b/foo" -- usually produced by quilt. E.g.:
>
> https://lore.kernel.org/lkml/20181125185004.151077...@linutronix.de/
>
> I am guessing quilt does not bother including the leading "diff a/foo
> b/foo" because it's redundant with the next two lines, however this
> remains a valid patch recognized by git-am.
>
> If you pipe that patch via git-patch-id, it produces nothing, but if I
> put in the leading "diff", like so:
>
> diff a/arch/x86/kernel/process.c b/arch/x86/kernel/process.c
>
> then it properly returns "fb3ae17451bc619e3d7f0dd647dfba2b9ce8992e".

Interesting.  As Ævar mentioned, the relevant code is

/* Ignore commit comments */
if (!patchlen && !starts_with(line, "diff "))
continue;

which is trying to handle a case where a line that is special to the
parser appears before the diff begins.

The patch-id appears to only care about the diff text, so it should be
able to handle this.  So if we have a better heuristic for where the
diff starts, it would be good to use it.

"git apply" uses apply.c::find_header, which is more permissive.
Maybe it would be possible to unify these somehow.  (I haven't looked
closely enough to tell how painful that would be.)

Thanks and hope that helps,
Jonathan


Re: [PATCH] mailmap: update brandon williams's email address

2018-12-07 Thread Jonathan Nieder
Brandon Williams wrote:

> Signed-off-by: Brandon Williams 
> ---
>  .mailmap | 1 +
>  1 file changed, 1 insertion(+)

I can confirm that this is indeed the same person.

Reviewed-by: Jonathan Nieder 

Welcome back!


gitweb: local configuration not found

2018-12-05 Thread Jonathan Nieder
Martin Mares wrote[1]:

> After upgrade to Stretch, gitweb no longer finds the configuration file
> "gitweb_config.perl" in the current directory. However, "man gitweb" still
> mentions this as one of the possible locations of the config file (and
> indeed a useful one when using multiple instances of gitweb).
>
> It was probably broken by Perl dropping "." from the default search path
> for security reasons.

Indeed, perldelta(1) tells me that in 5.24.1 (and 5.26, etc),

  Core modules and tools no longer search "." for optional modules

gitweb.perl contains

 sub read_config_file {
 my $filename = shift;
 return unless defined $filename;
 # die if there are errors parsing config file
 if (-e $filename) {
 do $filename;

which implies an @INC search but it is silly --- as the "-e" test
illustrates, this never intended to search @INC.

Documentation says "If you are absolutely certain that you want your
script to load and execute a file from the current directory, then use
a ./ prefix".  We can do that, like so:

diff --git i/gitweb/Makefile w/gitweb/Makefile
index cd194d057f..3160b6cc5d 100644
--- i/gitweb/Makefile
+++ w/gitweb/Makefile
@@ -18,7 +18,7 @@ RM ?= rm -f
 INSTALL ?= install
 
 # default configuration for gitweb
-GITWEB_CONFIG = gitweb_config.perl
+GITWEB_CONFIG = ./gitweb_config.perl
 GITWEB_CONFIG_SYSTEM = /etc/gitweb.conf
 GITWEB_CONFIG_COMMON = /etc/gitweb-common.conf
 GITWEB_HOME_LINK_STR = projects

but that does not help if someone overrides GITWEB_CONFIG, and besides,
it would be nicer to avoid the possibility of an @INC search altogether.
Another alternative would be to use

local @INC = ('.');

Would that be better?

Advice from someone more versed than I am in perl would be very welcome
(hence the cc to Ævar).

Thanks for reporting and hope that helps,
Jonathan

[1] https://bugs.debian.org/915632


Re: [PATCH] sideband: color lines with keyword only

2018-12-03 Thread Jonathan Nieder
Stefan Beller wrote:
> On Mon, Dec 3, 2018 at 3:23 PM Jonathan Nieder  wrote:

>> I was curious about what versions of Gerrit this is designed to
>> support (or in other words whether it's a bug fix or a feature).
>> Looking at examples like [1], it seems that Gerrit historically always
>> used "ERROR:" so the 59a255aef0 logic would work for it.  More
>> recently, [2] (ReceiveCommits: add a "SUCCESS" marker for successful
>> change updates, 2018-08-21) put SUCCESS on a line of its own.  That
>> puts this squarely in the new-feature category.
>
> Ooops. From the internal bug, I assumed this to be long standing Gerrit
> behavior, which is why I sent it out in -rc to begin with.

No worries.  Can't hurt for Junio to have a few patches to apply to
"pu" or "next" to practice using the release candidates. :)

[...]
>> In the old code, we would escape early if 'n == len', but we didn't
>> need to.  If 'n == len', then
>>
>> src[len] == '\0'
>
> src[len] could also be one of "\n\r", see the caller
> recv_sideband for sidebase case 2.

Yes, I noticed too late[*].  Sorry for the noise.

The patch still looks good.

Jonathan

[*] https://public-inbox.org/git/20181203233439.gb157...@google.com/


Re: [PATCH] sideband: color lines with keyword only

2018-12-03 Thread Jonathan Nieder
Jonathan Nieder wrote:
> Stefan Beller wrote:

>>  /*
>>   * Match case insensitively, so we colorize output from existing
>> @@ -95,7 +95,8 @@ static void maybe_colorize_sideband(struct strbuf *dest, 
>> const char *src, int n)
>>   * messages. We only highlight the word precisely, so
>>   * "successful" stays uncolored.
>>   */
>> -if (!strncasecmp(p->keyword, src, len) && !isalnum(src[len])) {
>> +if (!strncasecmp(p->keyword, src, len) &&
>> +(len == n || !isalnum(src[len]))) {
>
> Our custom isalnum treats '\0' as not alphanumeric (sane_ctype[0] ==
> GIT_CNTRL) so this part of the patch is unnecessary.  That said, it's
> good for clarity and defensive programming.

Correction: I am being silly here.  src[len] can be '\0', '\n', or
'\r' --- it's not always '\0'.  And the contract of this function is
that src[len] could be anything.  Thanks for having handled it
correctly. :)

Jonathan


Re: [PATCH] sideband: color lines with keyword only

2018-12-03 Thread Jonathan Nieder
Hi,

Stefan Beller wrote:

> When bf1a11f0a1 (sideband: highlight keywords in remote sideband output,
> 2018-08-07) was introduced, it was carefully considered which strings
> would be highlighted. However 59a255aef0 (sideband: do not read beyond
> the end of input, 2018-08-18) brought in a regression that the original
> did not test for. A line containing only the keyword and nothing else
> ("SUCCESS") should still be colored.
>
> Signed-off-by: Stefan Beller 
> ---
>  sideband.c  | 5 +++--
>  t/t5409-colorize-remote-messages.sh | 2 ++
>  2 files changed, 5 insertions(+), 2 deletions(-)

Thanks for writing this.

I was curious about what versions of Gerrit this is designed to
support (or in other words whether it's a bug fix or a feature).
Looking at examples like [1], it seems that Gerrit historically always
used "ERROR:" so the 59a255aef0 logic would work for it.  More
recently, [2] (ReceiveCommits: add a "SUCCESS" marker for successful
change updates, 2018-08-21) put SUCCESS on a line of its own.  That
puts this squarely in the new-feature category.

"success" on its own line is even less likely to be a false positive
than "success" followed by punctuation (for example a period marking
the end of a sentence).  So I like this change.

[1] https://gerrit-review.googlesource.com/c/gerrit/+/22361
[2] https://gerrit-review.googlesource.com/c/gerrit/+/193570

> diff --git a/sideband.c b/sideband.c
> index 368647acf8..7c3d33d3f8 100644
> --- a/sideband.c
> +++ b/sideband.c
> @@ -87,7 +87,7 @@ static void maybe_colorize_sideband(struct strbuf *dest, 
> const char *src, int n)
>   struct keyword_entry *p = keywords + i;
>   int len = strlen(p->keyword);
>  
> - if (n <= len)
> + if (n < len)
>   continue;

In the old code, we would escape early if 'n == len', but we didn't
need to.  If 'n == len', then

src[len] == '\0'
src .. [len-1] is a valid buffer to read from

so the strncasecmp and strbuf_add operations used in this function are
valid.  Good.

>   /*
>* Match case insensitively, so we colorize output from existing
> @@ -95,7 +95,8 @@ static void maybe_colorize_sideband(struct strbuf *dest, 
> const char *src, int n)
>* messages. We only highlight the word precisely, so
>* "successful" stays uncolored.
>*/
> - if (!strncasecmp(p->keyword, src, len) && !isalnum(src[len])) {
> + if (!strncasecmp(p->keyword, src, len) &&
> + (len == n || !isalnum(src[len]))) {

Our custom isalnum treats '\0' as not alphanumeric (sane_ctype[0] ==
GIT_CNTRL) so this part of the patch is unnecessary.  That said, it's
good for clarity and defensive programming.

>   strbuf_addstr(dest, p->color);
>   strbuf_add(dest, src, len);
>   strbuf_addstr(dest, GIT_COLOR_RESET);
> diff --git a/t/t5409-colorize-remote-messages.sh 
> b/t/t5409-colorize-remote-messages.sh
> index f81b6813c0..2a8c449661 100755
> --- a/t/t5409-colorize-remote-messages.sh
> +++ b/t/t5409-colorize-remote-messages.sh
> @@ -17,6 +17,7 @@ test_expect_success 'setup' '
>   echo " " "error: leading space"
>   echo ""
>   echo Err
> + echo SUCCESS
>   exit 0
>   EOF
>   echo 1 >file &&
> @@ -35,6 +36,7 @@ test_expect_success 'keywords' '
>   grep "error: error" decoded &&
>   grep "hint:" decoded &&
>   grep "success:" decoded &&
> + grep "SUCCESS" decoded &&
>   grep "warning:" decoded
>  '

Nice tests.

The "hinting: not highlighted" example shows that we aren't
introducing false positives here, so the coverage seems sufficient.
It might be nice to include a line

echo ERROR:

as well to match another idiom that Gerrit sometimes uses.

Reviewed-by: Jonathan Nieder 

Thanks again for a pleasant read.


Re: [PATCH] rebase: mark the C reimplementation as an experimental opt-in feature (was Re: [ANNOUNCE] Git v2.20.0-rc1)

2018-11-27 Thread Jonathan Nieder
Hi,

Junio C Hamano wrote:
>> Ævar Arnfjörð Bjarmason  writes:

>>> Given that we're still finding regressions bugs in the rebase-in-C
>>> version should we be considering reverting 5541bd5b8f ("rebase: default
>>> to using the builtin rebase", 2018-08-08)?
>>>
>>> I love the feature, but fear that the current list of known regressions
>>> serve as a canary for a larger list which we'd discover if we held off
>>> for another major release (and would re-enable rebase.useBuiltin=true in
>>> master right after 2.20 is out the door).
[...]
> So, in a more concrete form, what you want to see is something like
> this in -rc2 and later?
>
> -- >8 --
> Subject: [PATCH] rebase: mark the C reimplementation as an experimental 
> opt-in feature
>
> It turns out to be a bit too early to unleash the reimplementation
> to the general public.  Let's rewrite some documentation and make it
> an opt-in feature.
>
> Signed-off-by: Junio C Hamano 
> ---
>  Documentation/config/rebase.txt | 16 ++--
>  builtin/rebase.c|  2 +-
>  t/README|  4 ++--
>  3 files changed, 9 insertions(+), 13 deletions(-)

I thought I should weigh in on how this would affect Debian's and
Google's deployments.

First of all, I've looked over the revert patch carefully and it is
well written and does what it says on the tin.

At https://bugs.debian.org/914695 is a report of a test regression in
an outside project that is very likely to have been triggered by the
new faster rebase code.  The issue has not been triaged, so I don't
know yet whether it's a problem in rebase-in-c or a manifestation of a
bug in the test.

That said, Google has been running with the new rebase since ~1 month
ago when it became the default, with no issues reported by users.  As
a result, I am confident that it can cope with what most users of
"next" throw at it, which means that if we are to find more issues to
polish it better, it will need all the exposure it can get.

In the Google deployment, we will keep using rebase-in-c even if it
gets disabled by default, in order to help with that.

>From the Debian point of view, it's only a matter of time before
rebase-in-c becomes the default: even if it's not the default in 2.20,
it would presumably be so in 2.21 or 2.22.  That means the community's
attention when resolving security and reliability bugs would be on the
rebase-in-c implementation.  As a result, the Debian package will most
likely enable rebase-in-c by default even if upstream disables it, in
order to increase the package's shelf life (i.e. to ease the
maintenance burden of supporting whichever version of the package ends
up in the next Debian stable).

So with either hat on, it doesn't matter whether you apply this patch
upstream.

Having two pretty different deployments end up with the same
conclusion leads me to suspect that it's best for upstream not to
apply the revert patch, unless either

  (a) we have a concrete regression to address and then try again, or
  (b) we have a test or other plan to follow before trying again.

Thanks and hope that helps,
Jonathan


Re: [PATCH 5/5] index: offer advice for unknown index extensions

2018-11-20 Thread Jonathan Nieder
Junio C Hamano wrote:
> Jonathan Nieder  writes:

>> I don't *think* you intend to say "sure, you got user reports, but
>> (those users are wrong | those users are not real | you are not
>> interpreting those users correctly)", but that is what I am hearing.
>
> What I have been saying is "we are sending a wrong message to those
> users by not clearly saying 'optional' (i.e. it is OK for your Git
> not to understand this optional bits of information---you do not
> have to get alarmed immediately) and also not hinting where that
> optional thing comes from (i.e. if users realized they come from the
> future, the coalmine canary message will serve its purpose of
> reminding them that a newer Git is not just available but has been
> used already in their repository and help them to rectify the
> situation sooner)".
>
> As the deployed versions of Git will keep sending the wrong message,
> I do not mind applying 1/5 and 2/5, given especially that Ben seems
> to be OK with the plan.  I however do not think 3 thru 5 is ready
> yet with this round---there were some discussions on phrasing in
> this thread.

Thanks much --- that helps a lot.

Would you mind taking patch 4/5 as well?  (It's a tweak to the
configuration introduced in patches 1 and 2 that addresses a concern
Ben Peart had.)

As for patches 3 and 5, I agree.  In particular, patch 5 needs an
s/performance//, and it seems that the commit messages need some work
as well.

Sorry for getting the conversation in the wrong direction, and I'm
glad to hear we have a good way forward.

Sincerely,
Jonathan


Re: [PATCH 5/5] index: offer advice for unknown index extensions

2018-11-20 Thread Jonathan Nieder
Junio C Hamano wrote:
> Jonathan Nieder  writes:

>> Now, a meta point.  Throughout this discussion, I have been hoping for
>> some acknowledgement of the problem --- e.g. an "I am sympathetic to
>> what you are trying to do, but ".  I wasn't able to find that, and
>> that is part of what contributed to the feeling of not being heard.
[...]
> And seeing the same "let's not enable the new extension" patch again
> without much improved justification contributed greatly to the
> feeling of not being heard.

Thanks for the pointer.  I had not understood before that you were
unhappy with those commit messages.

The commit message describes symptoms and the motivation for the
change.  I was confused at the original replies to patch 1 and 2 that
seemed to be more about patch 3; patch 1 and 2 are meaningful without
patch 3, so it would be odd to include a justification for patch 3 in
their commit message.

That said, it sounds like their commit messages are not adequate.  I'd
appreciate help from someone else to improve them.

>  The feeling is mutual.

I was trying to diagnose what was going wrong with the conversation so
as to move things forward on a better footing.  It seems I only
escalated things more. :(

Sorry about that, and I hope there's some way to move forward.

What is the best way to handle this?  I am feeling somewhat burnt by
this review process.  If Ben and I, working together, are able to come
up with a series that we both like, will you consider it for 2.20?  Is
there some other trusted contributor, such as Peff or Duy, that you
would trust to represent your wishes so I can pursue their Reviewed-by
without risking getting burnt in the same way again?

Jonathan


Re: [PATCH] technical doc: add a design doc for the evolve command

2018-11-20 Thread Jonathan Nieder
Stefan Xenos wrote:
> Jonathan Nieder wrote:

>> putting it in the commit message is a way to
>> experiment with the workflow without changing the object format
>
> As long as we're talking about a temporary state of affairs for users
> that have opted in, and we're explicit about the fact that future
> versions of git won't understand the change graphs that are produced
> during that temporary state of affairs, I'm fine with using the commit
> message. We can move it to the header prior to enabling the feature by
> default.

Yay!  I think that addresses both my and Ævar's concerns.  Also, if
you run into an issue that requires changing the object format
earlier, that's fine and we can handle the situation when it comes.

I don't have a strong opinion about whether this would go in the
design doc.  I suppose the doc could have an "implementation plan"
section describing temporary stopping points on the way to the final
result, but it's not necessary to include that.

Thanks for the quick and thoughtful replies.

Jonathan


Re: [PATCH 5/5] index: offer advice for unknown index extensions

2018-11-20 Thread Jonathan Nieder
Junio C Hamano wrote:

>  This series has a strong smell of pushing back by the
> toolsmiths who refuse to promptly upgrade to help their users, and
> that is why I do not feel entirely happy with this series.

Last reply, I promise. :)

This sentence might have the key to the misunderstanding.  Let me say
a little more about where this showed up in the internal deployment
here, to clarify things a little.

At Google we deploy snapshots of the "next" branch approximately
weekly so that we can find problems early before they affect a
published release.  We rely on the ability to roll back quickly when a
problem is discovered, and we might care more about compatibility than
some others because of that.

A popular tool within Google has a bundled copy of Git (also a
snapshot of the "next" branch, but from a few weeks prior) and when we
deployed Git with the EOIE and IEOT extensions, users of that tool
very quickly reported the mysterious message.

That said, the maintainers of that tool did not complain at all, so
hopefully I can allay your worries about toolsmiths pushing back.
Once the problem reached my attention (a few days later than I would
have liked it to), the Git team at Google knew that we could not roll
back and were certainly alarmed about what that means about our
ability to cope with other problems should we need to.  But we were
able to quickly update that popular tool --- no issue.

Instead, we ran into a number of other users running into the same
problem, when sharing repositories between machines using sshfs, etc.
That, plus the aforementioned inability to roll back Git if we need
to, meant that this was a serious issue so we quickly addressed it in
the internal installation.

In general, we haven't had much trouble getting people to use Git
2.19.1 or newer.  So the problem here does not have to do with users
being slow to upgrade.

Instead, it's simply that upgrading Git should not cause the older,
widely deployed version of Git to complain about the repositories it
acts on.  That's a recipe for difficult debugging situations, it can
lead to people upgrading less quickly and reporting bugs later, and
all in all it's a bad situation to be in.  I've used tools like
Subversion that would upgrade repositories so they are unusable by the
previous version and experienced all of these problems.

So I consider it important *to Git upstream* to handle this well in
the Git 2.20 release.  We can flip the default soon after, even as
soon as 2.21.

Moreover, I am not the only one who ran into this --- e.g. from [1],
2018-10-19:

  17:10  jrnieder: Yes, I noticed that annoyance myself. ;)
  17:11  Yeah, I saw that message a few times and was slightly
 annoyed as well.

Now, a meta point.  Throughout this discussion, I have been hoping for
some acknowledgement of the problem --- e.g. an "I am sympathetic to
what you are trying to do, but ".  I wasn't able to find that, and
that is part of what contributed to the feeling of not being heard.

Thanks for your patient explanations, and hope that helps,
Jonathan

[1] https://colabti.org/irclogger/irclogger_log/git-devel?date=2018-10-19#l114


Re: [PATCH 5/5] index: offer advice for unknown index extensions

2018-11-20 Thread Jonathan Nieder
onathan Nieder wrote:
> Junio C Hamano wrote:

>> I am still puzzled by the insistence of 3/5 and this step that wants
>> to kill the coalmine canary.  But I am even more puzzled by the
>> first two steps that want to disable the two optional extensions.
[...]
> I acknowledge your puzzlement.  I'm not sure what to do about it.
>
> There are a few significant differences from the REUC case:
>
>  1. This happens whenever the index is refreshed.  REUC, as you
> mentioned, only affected resolutions of conflicted merges.  So
> users ran into it less often.
>
>  2. I never ran into the REUC case.  If I had, I would have sent the
> same patch then.
>
>  3. Time has passed and people's standards may have gone up.
>
> I wish I had been around when the message was added in the first
> place, so that I could have provided the same feedback about the
> message then.  But I do not think that that should be held against me.
> I'm describing a real user problem.

And to be clear, it is the first two patches that address the
immediate user problem.  Whatever improvements we make to the warning
message today, we cannot retroactively change the other versions of
Git that users are using that want to access the same repository.

Jonathan


Re: [PATCH 5/5] index: offer advice for unknown index extensions

2018-11-20 Thread Jonathan Nieder
Hi,

Junio C Hamano wrote:

> I am still puzzled by the insistence of 3/5 and this step that wants
> to kill the coalmine canary.  But I am even more puzzled by the
> first two steps that want to disable the two optional extensions.
>
> What's so different this time with the new optional extensions?
>
> The other early optional extensions like cache-tree or resolve-undo
> were added unconditionally and by definition appeared much earlier
> in git-core than any other Git reimplementations.  Everbody who
> recorded the fact that s/he resolved merge conflicts got REUC, and
> we would have given warning when an older Git did not understand
> these extensions [*1*].  We knudged users to more modern Git by
> preparing the old Gits to warn when there are unknown extensions,
> either by upgrading their Git themselves, or by bugging their
> toolsmiths.  Nobody complained to propose to rip the messages like
> this round.  This series has a strong smell of pushing back by the
> toolsmiths who refuse to promptly upgrade to help their users, and
> that is why I do not feel entirely happy with this series.

I acknowledge your puzzlement.  I'm not sure what to do about it.

There are a few significant differences from the REUC case:

 1. This happens whenever the index is refreshed.  REUC, as you
mentioned, only affected resolutions of conflicted merges.  So
users ran into it less often.

 2. I never ran into the REUC case.  If I had, I would have sent the
same patch then.

 3. Time has passed and people's standards may have gone up.

I wish I had been around when the message was added in the first
place, so that I could have provided the same feedback about the
message then.  But I do not think that that should be held against me.
I'm describing a real user problem.

Are the commit messages unclear?  Is there some missing use case that
this version of the patch misses?

I don't *think* you intend to say "sure, you got user reports, but
(those users are wrong | those users are not real | you are not
interpreting those users correctly)", but that is what I am hearing.
On the other hand, I don't want to discourage useful review feedback,
and I think adding the advise() call was a real improvement.  I'm just
getting confused about why I am still not being heard.

Jonathan


Re: [PATCH] technical doc: add a design doc for the evolve command

2018-11-20 Thread Jonathan Nieder
Stefan Xenos wrote:
> On Tue, Nov 20, 2018 at 1:43 AM Ævar Arnfjörð Bjarmason
>  wrote:

>> I think it sounds better to just make it, in the header:
>>
>> x-evolve-pt content
>> x-evolve-pt obsolete
>> x-evolve-pt origin
>>
>> Where "pt = parent-type", we could of course spell that out too, but in
>> this case it's "x-evolve-pt" is the exact same number of bytes as
>> "parent-type", so nobody can object that it takes more space:)
>>
>> We'd then carry some documentation where we say everything except "x-*-"
>> is reserved, and that we'd like to know about new "*" there before it's
>> used, so it can be documented.
[...]
>  that should
> probably be the subject of a separate proposal (who owns the content
> of a namespace, what is the process for adding a new namespace or a
> new attribute within a namespace, what order should the header
> attributes appear in, what problem is namespacing there to solve, when
> do we use a namespaced attribute versus a "reserved" attribute, etc.).

Agreed.  There are reasons that I prefer not to go in this direction,
but regardless, it would be the subject of a separate thread if you want
to pursue it.

>> Putting it in the commit message just sounds like a hack around not
>> having namespaced headers. If we'd like to keep this then tools would
>> need to parse both (potentially unpacking a lot of the commit message
>> object, it can be quite big in some cases...).

On the contrary: putting it in the commit message is a way to
experiment with the workflow without changing the object format at
all.

I don't think we should underestimate the value of that ability.

I don't understand what you're referring to by parsing both.  Are you
saying that if the experiment proves successful, we wouldn't be able
to migrate completely to a new format?  That sounds worrying to me ---
I want the ability to experiment and to act on what we learn from an
experiment, including when it touches on formats.

Thanks,
Jonathan


Re: [PATCH] .mailmap: record canonical email address for Sven Strickroth

2018-11-20 Thread Jonathan Nieder
Hi,

Sven Strickroth wrote:

> Subject: .mailmap: record canonical email address for Sven Strickroth
>
> Signed-off-by: Sven Strickroth 
> ---
>  .mailmap | 2 ++
>  1 file changed, 2 insertions(+)

Thanks for taking care of it.

[...]
> --- a/.mailmap
> +++ b/.mailmap
> @@ -235,6 +235,8 @@ Steven Grimm  
> 
>  Steven Grimm  kor...@midwinter.com
>  Steven Walter  
>  Steven Walter  
> +Sven Strickroth  

Is the above line needed?  It's not clear to me from the commit message
what it does.

> +Sven Strickroth  

This line looks good.

>  Sven Verdoolaege  
>  Sven Verdoolaege  
>  SZEDER G??bor  

This line does not seem to have survived mailing.  Fortunately it's a
context line, but thought I should mention it for the future.

Thanks and hope that helps,
Jonathan


[PATCH 5/5] index: offer advice for unknown index extensions

2018-11-19 Thread Jonathan Nieder
It is not unusual for multiple distinct versions of Git to act on a
single repository.  For example, some IDEs bundle a particular version
of Git, which can be a different version from the system copy of Git,
or on a Mac, /usr/bin/git quickly goes out of sync with the Homebrew
git in /usr/local/bin/git.

When a newer version of Git writes an index file that an older version
of Git does not know how to make full use of, this is a teaching
opportunity.  The user may not be aware of what version of Git they
are using.  Print an advice message to help the user to use the most
full featured version of Git (e.g. by futzing with their PATH).

  warning: ignoring optional IEOT index extension
  hint: This is likely due to the file having been written by a newer
  hint: version of Git than is reading it.  You can upgrade Git to
  hint: take advantage of performance improvements from the updated
  hint: file format.
  hint:
  hint: You can run "git config advice.unknownIndexExtension false"
  hint: to suppress this message.

This replaces the message

  ignoring IEOT extension

that existed previously and did not provide enough detail for a user
to act on it or suppress it.

Helped-by: Junio C Hamano 
Signed-off-by: Jonathan Nieder 
---
New, based on Junio's hints about the message removed in patch 3/5.

That's the end of the series.  Thanks for reading, and thanks again
for your help so far.

 advice.c |  2 ++
 advice.h |  1 +
 read-cache.c | 11 +++
 3 files changed, 14 insertions(+)

diff --git a/advice.c b/advice.c
index 5f35656409..91a55046fd 100644
--- a/advice.c
+++ b/advice.c
@@ -24,6 +24,7 @@ int advice_add_embedded_repo = 1;
 int advice_ignored_hook = 1;
 int advice_waiting_for_editor = 1;
 int advice_graft_file_deprecated = 1;
+int advice_unknown_index_extension = 1;
 int advice_checkout_ambiguous_remote_branch_name = 1;
 
 static int advice_use_color = -1;
@@ -78,6 +79,7 @@ static struct {
{ "ignoredHook", _ignored_hook },
{ "waitingForEditor", _waiting_for_editor },
{ "graftFileDeprecated", _graft_file_deprecated },
+   { "unknownIndexExtension", _unknown_index_extension },
{ "checkoutAmbiguousRemoteBranchName", 
_checkout_ambiguous_remote_branch_name },
 
/* make this an alias for backward compatibility */
diff --git a/advice.h b/advice.h
index 696bf0e7d2..8da0845cfc 100644
--- a/advice.h
+++ b/advice.h
@@ -24,6 +24,7 @@ extern int advice_add_embedded_repo;
 extern int advice_ignored_hook;
 extern int advice_waiting_for_editor;
 extern int advice_graft_file_deprecated;
+extern int advice_unknown_index_extension;
 extern int advice_checkout_ambiguous_remote_branch_name;
 
 int git_default_advice_config(const char *var, const char *value);
diff --git a/read-cache.c b/read-cache.c
index 002ed2c1e4..d1d903e5a1 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -1727,6 +1727,17 @@ static int read_index_extension(struct index_state 
*istate,
return error("index uses %.4s extension, which we do 
not understand",
 ext);
trace_printf("ignoring %.4s extension\n", ext);
+   if (advice_unknown_index_extension) {
+   warning(_("ignoring optional %.4s index extension"), 
ext);
+   advise(_("This is likely due to the file having been 
written by a newer\n"
+"version of Git than is reading it. You can 
upgrade Git to\n"
+"take advantage of performance improvements 
from the updated\n"
+"file format.\n"
+"\n"
+"Run \"%s\"\n"
+"to suppress this message."),
+  "git config advice.unknownIndexExtension false");
+   }
break;
}
return 0;
-- 
2.20.0.rc0.387.gc7a69e6b6c



[PATCH 4/5] index: make index.threads=true enable ieot and eoie

2018-11-19 Thread Jonathan Nieder
If a user explicitly sets

[index]
threads = true

to read the index using multiple threads, ensure that index writes
include the offset table by default to make that possible.  This
ensures that the user's intent of turning on threading is respected.

In other words, permit the following configurations:

- index.threads and index.recordOffsetTable unspecified: do not write
  the offset table yet (to avoid alarming the user with "ignoring IEOT
  extension" messages when an older version of Git accesses the
  repository) but do make use of multiple threads to read the index if
  the supporting offset table is present.

  This can also be requested explicitly by setting index.threads=true,
  0, or >1 and index.recordOffsetTable=false.

- index.threads=false or 1: do not write the offset table, and do not
  make use of the offset table.

  One can set index.recordOffsetTable=false as well, to be more
  explicit.

- index.threads=true, 0, or >1 and index.recordOffsetTable unspecified:
  write the offset table and make use of threads at read time.

  This can also be requested by setting index.threads=true, 0, >1, or
  unspecified and index.recordOffsetTable=true.

Fortunately the complication is temporary: once most Git installations
have upgraded to a version with support for the IEOT and EOIE
extensions, we can flip the defaults for index.recordEndOfIndexEntries
and index.recordOffsetTable to true and eliminate the settings.

Helped-by: Ben Peart 
Signed-off-by: Jonathan Nieder 
---
New, based on Ben Peart's feedback.  Turned out simpler than I feared
--- thanks, Ben, for pushing for this.

 Documentation/config/index.txt |  6 --
 config.c   | 17 ++---
 config.h   |  2 +-
 read-cache.c   | 23 +--
 4 files changed, 32 insertions(+), 16 deletions(-)

diff --git a/Documentation/config/index.txt b/Documentation/config/index.txt
index de44183235..f181503041 100644
--- a/Documentation/config/index.txt
+++ b/Documentation/config/index.txt
@@ -3,14 +3,16 @@ index.recordEndOfIndexEntries::
Entry" section. This reduces index load time on multiprocessor
machines but produces a message "ignoring EOIE extension" when
reading the index using Git versions before 2.20. Defaults to
-   'false'.
+   'true' if index.threads has been explicitly enabled, 'false'
+   otherwise.
 
 index.recordOffsetTable::
Specifies whether the index file should include an "Index Entry
Offset Table" section. This reduces index load time on
multiprocessor machines but produces a message "ignoring IEOT
extension" when reading the index using Git versions before 2.20.
-   Defaults to 'false'.
+   Defaults to 'true' if index.threads has been explicitly enabled,
+   'false' otherwise.
 
 index.threads::
Specifies the number of threads to spawn when loading the index.
diff --git a/config.c b/config.c
index 04286f7717..ff521eb27a 100644
--- a/config.c
+++ b/config.c
@@ -2294,22 +2294,25 @@ int git_config_get_fsmonitor(void)
return 0;
 }
 
-int git_config_get_index_threads(void)
+int git_config_get_index_threads(int *dest)
 {
-   int is_bool, val = 0;
+   int is_bool, val;
 
val = git_env_ulong("GIT_TEST_INDEX_THREADS", 0);
-   if (val)
-   return val;
+   if (val) {
+   *dest = val;
+   return 0;
+   }
 
if (!git_config_get_bool_or_int("index.threads", _bool, )) {
if (is_bool)
-   return val ? 0 : 1;
+   *dest = val ? 0 : 1;
else
-   return val;
+   *dest = val;
+   return 0;
}
 
-   return 0; /* auto */
+   return 1;
 }
 
 NORETURN
diff --git a/config.h b/config.h
index a06027e69b..ee5d3fa7b4 100644
--- a/config.h
+++ b/config.h
@@ -246,11 +246,11 @@ extern int git_config_get_bool(const char *key, int 
*dest);
 extern int git_config_get_bool_or_int(const char *key, int *is_bool, int 
*dest);
 extern int git_config_get_maybe_bool(const char *key, int *dest);
 extern int git_config_get_pathname(const char *key, const char **dest);
+extern int git_config_get_index_threads(int *dest);
 extern int git_config_get_untracked_cache(void);
 extern int git_config_get_split_index(void);
 extern int git_config_get_max_percent_split_change(void);
 extern int git_config_get_fsmonitor(void);
-extern int git_config_get_index_threads(void);
 
 /* This dies if the configured or default date is in the future */
 extern int git_config_get_expiry(const char *key, const char **output);
diff --git a/read-cache.c b/read-cache.c
index 83d24357a6..002ed2c1e4 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -2176,7 +2176,8 @@ int do_read_index(struct index_state *istate, const char 
*path,

[PATCH 3/5] index: do not warn about unrecognized extensions

2018-11-19 Thread Jonathan Nieder
Documentation/technical/index-format explains:

 4-byte extension signature. If the first byte is 'A'..'Z' the
 extension is optional and can be ignored.

This allows gracefully introducing a new index extension without
having to rely on all readers having support for it.  Mandatory
extensions start with a lowercase letter and optional ones start with
a capital.  Thus the versions of Git acting on a shared local
repository do not have to upgrade in lockstep.

We almost obey that convention, but there is a problem: when
encountering an unrecognized optional extension, we write

ignoring FNCY extension

to stderr, which alarms users.  This means that in practice we have
had to introduce index extensions in two steps: first add read
support, and then a while later, start writing by default.  This
delays when users can benefit from improvements to the index format.

We cannot change the past, but for index extensions of the future,
there is a straightforward improvement: silence that message except
when tracing.  This way, the message is still available when
debugging, but in everyday use it does not show up so (once most Git
users have this patch) we can turn on new optional extensions right
away without alarming people.

Signed-off-by: Jonathan Nieder 
---
Unchanged.

 read-cache.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/read-cache.c b/read-cache.c
index f3d5638d9e..83d24357a6 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -1726,7 +1726,7 @@ static int read_index_extension(struct index_state 
*istate,
if (*ext < 'A' || 'Z' < *ext)
return error("index uses %.4s extension, which we do 
not understand",
 ext);
-   fprintf(stderr, "ignoring %.4s extension\n", ext);
+   trace_printf("ignoring %.4s extension\n", ext);
break;
}
return 0;
-- 
2.20.0.rc0.387.gc7a69e6b6c



[PATCH 2/5] ieot: default to not writing IEOT section

2018-11-19 Thread Jonathan Nieder
As with EOIE, popular versions of Git do not support the new IEOT
extension yet.  When accessing a Git repository written by a more
modern version of Git, they correctly ignore the unrecognized section,
but in the process they loudly warn

ignoring IEOT extension

resulting in confusion for users.  Introduce the index extension more
gently by not writing it yet in this first version with support for
it.  Soon, once sufficiently many users are running a modern version
of Git, we can flip the default so users benefit from this index
extension by default.

Introduce a '[index] recordOffsetTable' configuration variable to
control whether the new index extension is written.

Signed-off-by: Jonathan Nieder 
---
As with patch 1/5, no change from v1 other than rebasing.

 Documentation/config/index.txt |  7 +++
 read-cache.c   | 11 ++-
 2 files changed, 17 insertions(+), 1 deletion(-)

diff --git a/Documentation/config/index.txt b/Documentation/config/index.txt
index 8e138aba7a..de44183235 100644
--- a/Documentation/config/index.txt
+++ b/Documentation/config/index.txt
@@ -5,6 +5,13 @@ index.recordEndOfIndexEntries::
reading the index using Git versions before 2.20. Defaults to
'false'.
 
+index.recordOffsetTable::
+   Specifies whether the index file should include an "Index Entry
+   Offset Table" section. This reduces index load time on
+   multiprocessor machines but produces a message "ignoring IEOT
+   extension" when reading the index using Git versions before 2.20.
+   Defaults to 'false'.
+
 index.threads::
Specifies the number of threads to spawn when loading the index.
This is meant to reduce index load time on multiprocessor machines.
diff --git a/read-cache.c b/read-cache.c
index 1e9c772603..f3d5638d9e 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -2698,6 +2698,15 @@ static int record_eoie(void)
return 0;
 }
 
+static int record_ieot(void)
+{
+   int val;
+
+   if (!git_config_get_bool("index.recordoffsettable", ))
+   return val;
+   return 0;
+}
+
 /*
  * On success, `tempfile` is closed. If it is the temporary file
  * of a `struct lock_file`, we will therefore effectively perform
@@ -2761,7 +2770,7 @@ static int do_write_index(struct index_state *istate, 
struct tempfile *tempfile,
else
nr_threads = 1;
 
-   if (nr_threads != 1) {
+   if (nr_threads != 1 && record_ieot()) {
int ieot_blocks, cpus;
 
/*
-- 
2.20.0.rc0.387.gc7a69e6b6c-goog



[PATCH 1/5] eoie: default to not writing EOIE section

2018-11-19 Thread Jonathan Nieder
Since 3b1d9e04 (eoie: add End of Index Entry (EOIE) extension,
2018-10-10) Git defaults to writing the new EOIE section when writing
out an index file.  Usually that is a good thing because it improves
threaded performance, but when a Git repository is shared with older
versions of Git, it produces a confusing warning:

  $ git status
  ignoring EOIE extension
  HEAD detached at 371ed0defa
  nothing to commit, working tree clean

Let's introduce the new index extension more gently.  First we'll roll
out the new version of Git that understands it, and then once
sufficiently many users are using such a version, we can flip the
default to writing it by default.

Introduce a '[index] recordEndOfIndexEntries' configuration variable
to allow interested users to benefit from this index extension early.

Signed-off-by: Jonathan Nieder 
---
Rebased.  No other change from v1.

As Jonathan pointed out, it would be nice to have tests here.  Ben,
any advice for how I could write some in a followup change?  E.g. does
Derrick Stolee's tracing based testing trick apply here?

 Documentation/config/index.txt |  7 +++
 read-cache.c   | 11 ++-
 t/t1700-split-index.sh | 11 +++
 3 files changed, 24 insertions(+), 5 deletions(-)

diff --git a/Documentation/config/index.txt b/Documentation/config/index.txt
index 4b94b6bedc..8e138aba7a 100644
--- a/Documentation/config/index.txt
+++ b/Documentation/config/index.txt
@@ -1,3 +1,10 @@
+index.recordEndOfIndexEntries::
+   Specifies whether the index file should include an "End Of Index
+   Entry" section. This reduces index load time on multiprocessor
+   machines but produces a message "ignoring EOIE extension" when
+   reading the index using Git versions before 2.20. Defaults to
+   'false'.
+
 index.threads::
Specifies the number of threads to spawn when loading the index.
This is meant to reduce index load time on multiprocessor machines.
diff --git a/read-cache.c b/read-cache.c
index 4ca81286c0..1e9c772603 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -2689,6 +2689,15 @@ void update_index_if_able(struct index_state *istate, 
struct lock_file *lockfile
rollback_lock_file(lockfile);
 }
 
+static int record_eoie(void)
+{
+   int val;
+
+   if (!git_config_get_bool("index.recordendofindexentries", ))
+   return val;
+   return 0;
+}
+
 /*
  * On success, `tempfile` is closed. If it is the temporary file
  * of a `struct lock_file`, we will therefore effectively perform
@@ -2936,7 +2945,7 @@ static int do_write_index(struct index_state *istate, 
struct tempfile *tempfile,
 * read.  Write it out regardless of the strip_extensions parameter as 
we need it
 * when loading the shared index.
 */
-   if (offset) {
+   if (offset && record_eoie()) {
struct strbuf sb = STRBUF_INIT;
 
write_eoie_extension(, _c, offset);
diff --git a/t/t1700-split-index.sh b/t/t1700-split-index.sh
index 2ac47aa0e4..0cbac64e28 100755
--- a/t/t1700-split-index.sh
+++ b/t/t1700-split-index.sh
@@ -25,14 +25,17 @@ test_expect_success 'enable split index' '
git update-index --split-index &&
test-tool dump-split-index .git/index >actual &&
indexversion=$(test-tool index-version <.git/index) &&
+
+   # NEEDSWORK: Stop hard-coding checksums.
if test "$indexversion" = "4"
then
-   own=3527df833c6c100d3d1d921a9a782d62a8be4b58
-   base=746f7ab2ed44fb839efdfbffcf399d0b113fb4cb
+   own=432ef4b63f32193984f339431fd50ca796493569
+   base=508851a7f0dfa8691e9f69c7f055865389012491
else
-   own=5e9b60117ece18da410ddecc8b8d43766a0e4204
-   base=4370042739b31cd17a5c5cd6043a77c9a00df113
+   own=8299b0bcd1ac364e5f1d7768efb62fa2da79a339
+   base=39d890139ee5356c7ef572216cebcd27aa41f9df
fi &&
+
cat >expect <<-EOF &&
own $own
base $base
-- 
2.20.0.rc0.387.gc7a69e6b6c



[PATCH v2 0/5] Avoid confusing messages from new index extensions

2018-11-19 Thread Jonathan Nieder
Junio C Hamano wrote:
> Ben Peart  writes:

>> Why introduce a new setting to disable writing the IEOT extension
>> instead of just using the existing index.threads setting?
>
> But index.threads is about what the reader does, not about the
> writer who does not even know who will be reading the resulting
> index, no?

It affects the writer, too, since it affects the number of blocks, but
from an end user's point of view, I agree.

Here's an updated version of the series.

Patches 1-3 are as before, except that they are rebased to avoid
conflicting with nd/config-split.

Patch 4 allows enabling the new index extensions with a single config
setting, to address the feedback above.

Patch 5 revives the noisiness when encountering an unknown index
extension, guarded with an advice setting.

Sorry for the delay in getting this out.  Thoughts of all kinds
welcome, as always.

Sincerely,
Jonathan Nieder (5):
  eoie: default to not writing EOIE section
  ieot: default to not writing IEOT section
  index: do not warn about unrecognized extensions
  index: make index.threads=true enable ieot and eoie
  index: offer advice for unknown index extensions

 Documentation/config/index.txt | 16 ++
 advice.c   |  2 ++
 advice.h   |  1 +
 config.c   | 17 ++-
 config.h   |  2 +-
 read-cache.c   | 54 +-
 t/t1700-split-index.sh | 11 ---
 7 files changed, 84 insertions(+), 19 deletions(-)


Re: [PATCH] technical doc: add a design doc for the evolve command

2018-11-19 Thread Jonathan Nieder
Ævar Arnfjörð Bjarmason wrote:
> On Thu, Nov 15 2018, sxe...@google.com wrote:

>> +Parent-type
>> +---
>> +The “parent-type” field in the commit header identifies a commit as a
>> +meta-commit and indicates the meaning for each of its parents. It is never
>> +present for normal commits.
[...]
> I think it's worth pointing out for those that are rusty on commit
> object details (but I checked) is that the reason for it not being:
>
> tree 4b825dc642cb6eb9a060e54bf8d69288fbee4904
> parent aa7ce55545bf2c14bef48db91af1a74e2347539a
> parent-type content
> parent d64309ee51d0af12723b6cb027fc9f195b15a5e9
> parent-type obsolete
> parent 7e1bbcd3a0fa854a7a9eac9bf1eea6465de98136
> parent-type origin
> author Stefan Xenos  1540841596 -0700
> committer Stefan Xenos  1540841596 -0700
>
> Which would be easier to read, is that we're very sensitive to the order
> of the first few fields (tree -> parent -> author -> committer) and fsck
> will error out if we interjected a new field.

By the way, in the spirit of limiting the initial scope, I wonder
whether the parent-type fields can be stored in the commit message
initially.

Elsewhere in this thread it was mentioned that the parent-type is a
field to allow tools like "git fsck" to understand the meaning of
these parent relationships (for example, to forbid a commit
referencing a meta-commit).  The same could be done using special
commit message text, though.

The advantage of such an approach would be that we could experiment
without changing the official object format at all.  If experiments
revealed a different set of information to store, we could update the
format without having to maintain the memory of the older format in
"git fsck"'s understanding of commit object fields.  So even though I
think that in the end we would want to put this information in the
commit object header, I'm tempted to suspect that the benefits of
putting it in the commit message to start outweigh the costs (in
particular, of having to migrate to another format later).

Thanks,
Jonathan


Re: [PATCH] technical doc: add a design doc for the evolve command

2018-11-19 Thread Jonathan Nieder
Hi,

Stefan Xenos wrote:

> But since several comments have focused on the commands, let's brainstorm!
>
> Here's some options that occur to me:
>
> 1. Three commands: evolve + change + obslog as top-level commands (the
> current proposal). Change gets a bunch of subcommands for manipulating
> the change graph and metas/ namespace.
>
> 2. All top-level: evolve + lschange + mkchange + rmchange +
> prunechange + obslog. None of the commands get subcommands.
>
> 3. Everything under change: "change evolve", "change obslog" become
> new change subcommands.
>
> 4. Evolve as a rebase argument, obslog as a log argument. Use "rebase
> --evolve" to initiate evolve and use "log --obslog" to initiate
> obslog. The change command stays as it is in the proposal.
>
> 5. Two commands: evolve + change. obslog becomes a "log" argument.
>
> Note that there will be more "evolve"-specific arguments in the
> future. For most transformations that evolve uses, there will be a
> matching argument to enable or disable that transformation and as we
> add transformations we'll also add arguments.
>
> If we go with option 3, it would make for a very cluttered help page.
> For example, if you're looking for information on how to use evolve,
> you'd have to scroll past a bunch of formatting information that are
> only relevant to obslog... and if you're looking for the formatting
> options, you'd have to scroll past a bunch of
> transformation-enablement options that are only relevant to evolve.
>
> Based on your log feedback above, I'm thinking #5 may make sense.

(5) sounds good to me, too.  Thanks, both, for your thoughtfulness.

Jonathan


Re: [PATCH] technical doc: add a design doc for the evolve command

2018-11-19 Thread Jonathan Nieder
Hi,

Xenos wrote:

> Lets explore the "when" question. I think there's a compelling reason
> to add them as soon as possible - namely, gerrit. If and when we come
> to some sort of agreement on this proposal, gerrit could start adding
> tooling to understand change graphs as an alternative to change-id
> footers. That work could proceed in parallel with the work in git-core
> once we know what the data structures look like, but it can't start
> until the data structures are sufficient to address all the use cases
> that were previously covered by change-id. At the moment, meta-commits
> without origin parents would not cover all of gerrit's use-cases so
> this would block adoption in gerrit.

By this, are you referring to the "Cherry-picks" list in the Gerrit
web UI?

Thanks,
Jonathan


Re: [PATCH] tests: add a special setup where rebase.useBuiltin is off (Re: Git Test Coverage Report (v2.20.0-rc0))

2018-11-19 Thread Jonathan Nieder
Ævar Arnfjörð Bjarmason wrote:
> On Mon, Nov 19 2018, Derrick Stolee wrote:

>> [...]
>> builtin/rebase.c
>> 62c23938fa 55) return env;
>> [...]
>> Ævar Arnfjörð Bjarmason 62c23938f: tests: add a special setup
>> where rebase.useBuiltin is off
>
> This one would be covered with
> GIT_TEST_REBASE_USE_BUILTIN=false. Obviously trivial, but I wonder if
> the rest of the coverage would look different when passed through the various 
> GIT_TEST_* options.

I wonder if we can do better for this kind of thing.

When I do routine development, I am not running tests with any
non-default flags.  So why should tests run with non-default flags
count toward coverage?  Is there a way to make the default test
settings dip their feet into some non-default configurations, without
running the full battery of tests and slowing tests down accordingly?
E.g. is there some kind of smoke test that rebase with
useBuiltin=false works at all that could run, even if I am not running
the full battery of rebase tests?

That's a bit of a non sequitor for this example, which is actual code
to handle GIT_TEST_REBASE_USE_BUILTIN, though.  For it, I wonder why
we need rebase.c to understand the envvar --- couldn't test-lib.sh
take care of setting rebase.useBuiltin to false when it's set?

Thanks,
Jonathan


Re: [PATCH] INSTALL: add macOS gettext and sdk header explanation to INSTALL

2018-11-14 Thread Jonathan Nieder
Hi!

yanke131...@gmail.com wrote:

> From: out0fmemory 
> 
> ---
>  INSTALL | 7 +++
>  1 file changed, 7 insertions(+)

Thanks for writing.  A few bits of administrivia, from
https://www.kernel.org/pub/software/scm/git/docs/SubmittingPatches.html:

Can we forge your sign-off?  See the section "certify your work" in
SubmittingPatches for what this means.

What name should we attribute this change to?  Documentation/SubmittingPatches
explains:

 Also notice that a real name is used in the Signed-off-by: line.
 Please don’t hide your real name.

[...]
> --- a/INSTALL
> +++ b/INSTALL
> @@ -165,6 +165,9 @@ Issues of note:
> use English. Under autoconf the configure script will do this
> automatically if it can't find libintl on the system.
>  
> +In macOS, can install and link gettext with brew as "brew install 
> gettext"
> +and "brew link --force gettext", the link operation is necessary.
> +

As context (e.g. to go in the commit message), can you tell us more
about this?  E.g. what happens if you don't perform the link
operation?

[...]
> @@ -178,6 +181,10 @@ Issues of note:
> will include them.  Note that config.mak is not distributed;
> the name is reserved for local settings.
>  
> +  - In macOs 10.14, the Command Line Tool not contains the headers as 
> before, so it
> +need install Command Line Tool 10.14 and install the headers which 
> command
> +"sudo installer -pkg 
> /Library/Developer/CommandLineTools/Packages/macOS_SDK_headers_for_macOS_10.14.pkg
>  -target".
> +

Likewise: what is the symptom if this isn't done?

Is there more general advice we can put here that will survive past
macOS 10.14?

Thanks and hope that helps,
Jonathan


Re: [PATCH 1/3] eoie: default to not writing EOIE section

2018-11-14 Thread Jonathan Nieder
Junio C Hamano wrote:
> Jonathan Nieder  writes:

>>  1. Using multiple versions of Git on a single machine.  For example,
>> some IDEs bundle a particular version of Git, which can be a
>> different version from the system copy, or on a Mac, /usr/bin/git
>> quickly goes out of sync with the Homebrew git in
>> /usr/local/bin/git.
>
> Exactly this, especially the latter, is the answer to your
> question in an earlier message:
>
>>> Am I understanding correctly?  Can you give an example of when a user
>>> would *want* to see this message and what they would do in response?
>
> The user may not be even aware of using another version of Git that
> does not know how to take advantage of the version of Git you have
> used in the repository, and it can be a mistake the user may want to
> fix (e.g. by futzing with PATH).

Ah, thanks much.  I'll add a hint along those lines (e.g.

 warning: ignoring optional IEOT index extension
 hint: This is likely due to the file having been written by a newer
 hint: version of Git than is reading it.  You can upgrade Git to
 hint: take advantage of performance improvements from the updated
 hint: file format.
 hint:
 hint: You can run "git config advice.unknownIndexExtension true" to
 hint: suppress this message.

I am still vaguely uncomfortable with this since it seems analogous to
warning that the server is advertising an unrecognized capability, but
I can live with it. :)

Patch coming in a few moments.

Jonathan


Re: [PATCH 2/3] ieot: default to not writing IEOT section

2018-11-14 Thread Jonathan Nieder
Hi,

Ben Peart wrote:

> There is no way to get multi-threaded reads and NOT get the scary message
> with older versions of git.  Multi-threaded reads require the IEOT extension
> to be written into the index and the existence of the IEOT extension in the
> index will always generate the scary warning.

This is where I think we differ.  I want my local copy of Git to get
multi-threaded reads as long as IEOT happens to be there, even if I am
not ready to write IEOT myself yet.

I understand that this differs from what you would prefer, so I'd like
to find some compromise that makes us both happy.  I've tried to
suggest one:

   Make explicitly setting index.threads=true imply
   index.recordOffsetTable=true.  That way, the default behavior is the
   behavior I prefer, and a client can simply set index.threads=true to
   get the behavior I think you are describing preferring.

My preference is instead what I sent in patch 2/3 (for simplicity,
especially since the default of index.recordOffsetTable=false would be
only temporary), but this would work okay for me.

I'll send this as a patch.  If there is a reason it won't work for
you, I would be very happy to learn more about why.

Thanks,
Jonathan


Re: git-scm.com GUI client links

2018-11-13 Thread Jonathan Nieder
+git@vger.kernel.org, git-secur...@googlegroups.com -> bcc
Paul J Sanchez wrote:
> On Nov 13, 2018, at 2:25 PM, Stefan Beller  wrote:

>> The link seems to be https://aurees.com/ ?
>>
>> They seem to have
>> https://aurees.com/legal/license-agreement
>> which is a hilarious read:
>>
>> You agree that each and every e-mail address, which You use during
>> registration or to commit changes into a Git repository, is
>> automatically sent to and stored by Nezaboodka for verification
>> purposes;
>> You agree with Nezaboodka's and their partners' advertising to be
>> shown by the Software and to be sent to Your registration e-mail;
>> You may neither disable nor block automatic updates of the Software;
>> You may neither disable nor block sending of anonymous usage
>> statistics to Nezaboodka;
>> You may download, install and any number of copies of the Software
>> registered under Free License;
>>
>> Further:
>>   The Agreement is a public agreement (offer) as defined by the law
>> of Republic of Belarus (article 396 of Civil Code of Republic of
>> Belarus). This Agreement is governed by the laws of Republic of
>> Belarus
>>
>> ... I did not know English is an official language in Belarus.
>
> I saw the link on git-scm.com.com.  You’re correct, the site is
> https://aurees.com.com/>.
>
> And no, I hadn’t yet gotten as far as the license-agreement.  Egad!
> Total show-stopper.
>
> After seeing the licensing terms, I’d agree with Sophos.  Software
> which harvests my e-mail addresses and usage data and has
> autoupdates which cannot be disabled or blocked qualifies as malware
> in my opinion.


Re: [PATCH 02/10] git-fast-export.txt: clarify misleading documentation about rev-list args

2018-11-13 Thread Jonathan Nieder
Elijah Newren wrote:

> Actually, no, it actually needs to be inconsistent.
>
> Different Input Choices (neither backslashed, both backslashed, then just 
> one):
>   master~9 and master~10
>   master\~9 and master\~10
>   master\~9 and master~10
>
> What the outputs look like:
>   master9 and master10
>   master~9 and master\~10
>   master~9 and master~10
>
> I have no idea why asciidoc behaves this way, but it appears my
> backslash escaping of just one of the two was necessary.

{tilde} should work consistently.

Thanks,
Jonathan


Re: git-scm.com GUI client links

2018-11-13 Thread Jonathan Nieder
+cc: git@vger.kernel.org, git-secur...@googlegroups.com -> bcc
Hi!

Paul J Sanchez wrote:

> Over the weekend I saw a link to a Mac git client I had not seen
> before:  Aurees.  When I went to the linked site to download a copy,
> my antivirus software (Sophos) warned me that it contains malware.
> I immediately threw it away without installing, but figured that
> git-scm.com  should be aware of this.

See https://groups.google.com/d/msg/msysgit/br212yYOZ70/bp5t4QpZk10J
for a similar symptom in the past.

See that thread for some advice on how to track this down.  Sadly,
my experience with antivirus software is that it is just not very
reliable.

That said, it's possible that this Mac Git client is indeed infected.
Can you point me to the page where you found it?  The git-scm.com web
site is maintained at https://github.com/git/git-scm.com; that page
has instructions for contributing to it.

Thanks and hope that helps,
Jonathan


Re: [PATCH 2/3] ieot: default to not writing IEOT section

2018-11-13 Thread Jonathan Nieder
Hi again,

Ben Peart wrote:
> On 11/13/2018 1:18 PM, Jonathan Nieder wrote:
>> Ben Peart wrote:

>>> Why introduce a new setting to disable writing the IEOT extension instead of
>>> just using the existing index.threads setting?  If index.threads=1 then the
>>> IEOT extension isn't written which (I believe) will accomplish the same
>>> goal.
>>
>> Do you mean defaulting to index.threads=1?  I don't think that would
>> be a good default, but if you have a different change in mind then I'd
>> be happy to hear it.
>>
>> Or do you mean that if the user has explicitly specified index.threads=true,
>> then that should imply index.recordOffsetTable=true so users only have
>> to set one setting to turn it on?  I can imagine that working well.
>
> Reading the index with multiple threads requires the EOIE and IEOT
> extensions to exist in the index.  If either extension doesn't exist, then
> the code falls back to the single threaded path.  That means you can't have
> both 1) no warning for old versions of git and 2) multi-threaded reading for
> new versions of git.
>
> If you set index.threads=1, that will prevent the IEOT extension from being
> written and there will be no "ignoring IEOT extension" warning in older
> versions of git.
>
> With this patch 'as is' you would have to set both index.threads=true and
> index.recordOffsetTable=true to get multi-threaded index reads.  If either
> is set to false, it will silently drop back to single threaded reads.

Sorry, I'm still not understanding what you're proposing.  What would be

- the default behavior
- the mechanism for changing that behavior

under your proposal?

I consider index.threads=1 to be a bad default.  I would understand if
you are saying that that should be the default, and I tried to propose
a different way to achieve what you're looking for in the quoted reply
above (but I'm not understanding whether you like that proposal or
not).

Jonathan


Re: [PATCH 1/3] eoie: default to not writing EOIE section

2018-11-13 Thread Jonathan Nieder
Hi,

Ben Peart wrote:

> While I can understand the user confusion the warning about ignoring an
> extension could cause I guess I'm a little surprised that people would see
> it that often.  To see the warning means they are running a new version of
> git in the same repo as they are running an old version of git.  I just
> haven't ever experienced that (I only ever have one copy of git installed)
> so am surprised it comes up often enough to warrant this change.

Great question.  There are a few contexts where it comes up:

 1. Using multiple versions of Git on a single machine.  For example,
some IDEs bundle a particular version of Git, which can be a
different version from the system copy, or on a Mac, /usr/bin/git
quickly goes out of sync with the Homebrew git in
/usr/local/bin/git.

 2. Sharing a single Git repository between multiple machines.  This is
not unusual, using NFS or sshfs, for example.

 3. Downgrading after trying a new version of Git.

To support these, Git is generally careful to avoid writing
repositories that older versions of Git do not understand.  The EOIE
extension was almost perfect in this respect: it works fine with older
versions of Git, except for the alarming error message.

> That said, if it _is_ that much of an issue, this patch makes sense and
> provides a way to more gracefully transition into this feature.  Even if we
> had some logic to dynamically determine whether to write it or not, we'd
> still want to avoid confusing users when it did get written out.

Yes.  An earlier version of this patch defaulted to writing EOIE if
and only if the .git/index file already has an EOIE extension.  There
were enough holes in that (commands like "git reset" that do not read
the existing index file) and enough complexity that it didn't seem
worth it.

Really in this series, patch 3/3 is the one I care most about.  I wish
we had had it years ago. :)  It would make patches 1 and 2
unnecessary.

Thanks,
Jonathan


Re: [PATCH 2/3] ieot: default to not writing IEOT section

2018-11-13 Thread Jonathan Nieder
Hi,

Ben Peart wrote:
> On 11/12/2018 7:39 PM, Jonathan Nieder wrote:

>> As with EOIE, popular versions of Git do not support the new IEOT
>> extension yet.  When accessing a Git repository written by a more
>> modern version of Git, they correctly ignore the unrecognized section,
>> but in the process they loudly warn
>>
>>  ignoring IEOT extension
>>
>> resulting in confusion for users.  Introduce the index extension more
>> gently by not writing it yet in this first version with support for
>> it.
[...]
>> Introduce a '[index] recordOffsetTable' configuration variable to
>> control whether the new index extension is written.
>
> Why introduce a new setting to disable writing the IEOT extension instead of
> just using the existing index.threads setting?  If index.threads=1 then the
> IEOT extension isn't written which (I believe) will accomplish the same
> goal.

Do you mean defaulting to index.threads=1?  I don't think that would
be a good default, but if you have a different change in mind then I'd
be happy to hear it.

Or do you mean that if the user has explicitly specified index.threads=true,
then that should imply index.recordOffsetTable=true so users only have
to set one setting to turn it on?  I can imagine that working well.

Thanks,
Jonathan


Re: [PATCH 2/3] ieot: default to not writing IEOT section

2018-11-13 Thread Jonathan Nieder
Hi again,

Junio C Hamano wrote:

> Then removing the message is throwing it with bathwater.  First
> think about which part of the message is confusiong and then make it
> less confusing.
>
> How about
>
>   hint: ignoring an optional IEOT extension
>
> to make it clear that it is totally harmless?
>
> With that, we can add advise.unknownIndexExtension=false to turn all
> of them off with a single switch.

After having slept on it, this doesn't seem like a good fit for the
advice subsystem.  The advice subsystem provides hints about suggested
actions for new users to understand what to do about a condition.  In
this example, the message is not suggesting a particular user action
--- instead, it's describing state, which would seem to be a better
fit for tracing, as in the patch 3/3 I sent.

Am I understanding correclty?  Can you give an example of when a user
would *want* to see this message and what they would do in response?

Thanks,
Jonathan


Re: [PATCH 2/3] ieot: default to not writing IEOT section

2018-11-12 Thread Jonathan Nieder
Junio C Hamano wrote:

> How about
>
>   hint: ignoring an optional IEOT extension
>
> to make it clear that it is totally harmless?
>
> With that, we can add advise.unknownIndexExtension=false to turn all
> of them off with a single switch.

I like it.  Expect a patch soon (tonight or tomorrow) that does that.

We'll have to find some appropriate place in the documentation to
explain what the message is about, still.

Thanks,
Jonathan


[PATCH 3/3] index: do not warn about unrecognized extensions

2018-11-12 Thread Jonathan Nieder
Documentation/technical/index-format explains:

 4-byte extension signature. If the first byte is 'A'..'Z' the
 extension is optional and can be ignored.

This allows gracefully introducing a new index extension without
having to rely on all readers having support for it.  Mandatory
extensions start with a lowercase letter and optional ones start with
a capital.  Thus the versions of Git acting on a shared local
repository do not have to upgrade in lockstep.

We almost obey that convention, but there is a problem: when
encountering an unrecognized optional extension, we write

ignoring FNCY extension

to stderr, which alarms users.  This means that in practice we have
had to introduce index extensions in two steps: first add read
support, and then a while later, start writing by default.  This
delays when users can benefit from improvements to the index format.

We cannot change the past, but for index extensions of the future,
there is a straightforward improvement: silence that message except
when tracing.  This way, the message is still available when
debugging, but in everyday use it does not show up so (once most Git
users have this patch) we can turn on new optional extensions right
away without alarming people.

Signed-off-by: Jonathan Nieder 
---
Thanks for reading.  Thoughts?

Sincerely,
Jonathan

 read-cache.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/read-cache.c b/read-cache.c
index 290bd54708..65530a68c2 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -1720,7 +1720,7 @@ static int read_index_extension(struct index_state 
*istate,
if (*ext < 'A' || 'Z' < *ext)
return error("index uses %.4s extension, which we do 
not understand",
 ext);
-   fprintf(stderr, "ignoring %.4s extension\n", ext);
+   trace_printf("ignoring %.4s extension\n", ext);
break;
}
return 0;
-- 
2.19.1.930.g4563a0d9d0



[PATCH 2/3] ieot: default to not writing IEOT section

2018-11-12 Thread Jonathan Nieder
As with EOIE, popular versions of Git do not support the new IEOT
extension yet.  When accessing a Git repository written by a more
modern version of Git, they correctly ignore the unrecognized section,
but in the process they loudly warn

ignoring IEOT extension

resulting in confusion for users.  Introduce the index extension more
gently by not writing it yet in this first version with support for
it.  Soon, once sufficiently many users are running a modern version
of Git, we can flip the default so users benefit from this index
extension by default.

Introduce a '[index] recordOffsetTable' configuration variable to
control whether the new index extension is written.

Signed-off-by: Jonathan Nieder 
---
 Documentation/config.txt |  7 +++
 read-cache.c | 11 ++-
 2 files changed, 17 insertions(+), 1 deletion(-)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index d702379db4..cc66fb7de3 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -2195,6 +2195,13 @@ index.recordEndOfIndexEntries::
reading the index using Git versions before 2.20. Defaults to
'false'.
 
+index.recordOffsetTable::
+   Specifies whether the index file should include an "Index Entry
+   Offset Table" section. This reduces index load time on
+   multiprocessor machines but produces a message "ignoring IEOT
+   extension" when reading the index using Git versions before 2.20.
+   Defaults to 'false'.
+
 index.threads::
Specifies the number of threads to spawn when loading the index.
This is meant to reduce index load time on multiprocessor machines.
diff --git a/read-cache.c b/read-cache.c
index 4bfe93c4c2..290bd54708 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -2707,6 +2707,15 @@ static int record_eoie(void)
return 0;
 }
 
+static int record_ieot(void)
+{
+   int val;
+
+   if (!git_config_get_bool("index.recordoffsettable", ))
+   return val;
+   return 0;
+}
+
 /*
  * On success, `tempfile` is closed. If it is the temporary file
  * of a `struct lock_file`, we will therefore effectively perform
@@ -2767,7 +2776,7 @@ static int do_write_index(struct index_state *istate, 
struct tempfile *tempfile,
 
 #ifndef NO_PTHREADS
nr_threads = git_config_get_index_threads();
-   if (nr_threads != 1) {
+   if (nr_threads != 1 && record_ieot()) {
int ieot_blocks, cpus;
 
/*
-- 
2.19.1.930.g4563a0d9d0



[PATCH 1/3] eoie: default to not writing EOIE section

2018-11-12 Thread Jonathan Nieder
Since 3b1d9e04 (eoie: add End of Index Entry (EOIE) extension,
2018-10-10) Git defaults to writing the new EOIE section when writing
out an index file.  Usually that is a good thing because it improves
threaded performance, but when a Git repository is shared with older
versions of Git, it produces a confusing warning:

  $ git status
  ignoring EOIE extension
  HEAD detached at 371ed0defa
  nothing to commit, working tree clean

Let's introduce the new index extension more gently.  First we'll roll
out the new version of Git that understands it, and then once
sufficiently many users are using such a version, we can flip the
default to writing it by default.

Introduce a '[index] recordEndOfIndexEntries' configuration variable
to allow interested users to benefit from this index extension early.

Signed-off-by: Jonathan Nieder 
---
 Documentation/config.txt |  7 +++
 read-cache.c | 11 ++-
 t/t1700-split-index.sh   | 11 +++
 3 files changed, 24 insertions(+), 5 deletions(-)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index 41a9ff2b6a..d702379db4 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -2188,6 +2188,13 @@ imap::
The configuration variables in the 'imap' section are described
in linkgit:git-imap-send[1].
 
+index.recordEndOfIndexEntries::
+   Specifies whether the index file should include an "End Of Index
+   Entry" section. This reduces index load time on multiprocessor
+   machines but produces a message "ignoring EOIE extension" when
+   reading the index using Git versions before 2.20. Defaults to
+   'false'.
+
 index.threads::
Specifies the number of threads to spawn when loading the index.
This is meant to reduce index load time on multiprocessor machines.
diff --git a/read-cache.c b/read-cache.c
index f3a848d61c..4bfe93c4c2 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -2698,6 +2698,15 @@ void update_index_if_able(struct index_state *istate, 
struct lock_file *lockfile
rollback_lock_file(lockfile);
 }
 
+static int record_eoie(void)
+{
+   int val;
+
+   if (!git_config_get_bool("index.recordendofindexentries", ))
+   return val;
+   return 0;
+}
+
 /*
  * On success, `tempfile` is closed. If it is the temporary file
  * of a `struct lock_file`, we will therefore effectively perform
@@ -2945,7 +2954,7 @@ static int do_write_index(struct index_state *istate, 
struct tempfile *tempfile,
 * read.  Write it out regardless of the strip_extensions parameter as 
we need it
 * when loading the shared index.
 */
-   if (offset) {
+   if (offset && record_eoie()) {
struct strbuf sb = STRBUF_INIT;
 
write_eoie_extension(, _c, offset);
diff --git a/t/t1700-split-index.sh b/t/t1700-split-index.sh
index 2ac47aa0e4..0cbac64e28 100755
--- a/t/t1700-split-index.sh
+++ b/t/t1700-split-index.sh
@@ -25,14 +25,17 @@ test_expect_success 'enable split index' '
git update-index --split-index &&
test-tool dump-split-index .git/index >actual &&
indexversion=$(test-tool index-version <.git/index) &&
+
+   # NEEDSWORK: Stop hard-coding checksums.
if test "$indexversion" = "4"
then
-   own=3527df833c6c100d3d1d921a9a782d62a8be4b58
-   base=746f7ab2ed44fb839efdfbffcf399d0b113fb4cb
+   own=432ef4b63f32193984f339431fd50ca796493569
+   base=508851a7f0dfa8691e9f69c7f055865389012491
else
-   own=5e9b60117ece18da410ddecc8b8d43766a0e4204
-   base=4370042739b31cd17a5c5cd6043a77c9a00df113
+   own=8299b0bcd1ac364e5f1d7768efb62fa2da79a339
+   base=39d890139ee5356c7ef572216cebcd27aa41f9df
fi &&
+
cat >expect <<-EOF &&
own $own
base $base
-- 
2.19.1.930.g4563a0d9d0



[PATCH 0/3] Avoid confusing messages from new index extensions (Re: [PATCH v8 0/7] speed up index load through parallelization)

2018-11-12 Thread Jonathan Nieder
Hi,

Ben Peart wrote:

> Ben Peart (6):
>   read-cache: clean up casting and byte decoding
>   eoie: add End of Index Entry (EOIE) extension
>   config: add new index.threads config setting
>   read-cache: load cache extensions on a worker thread
>   ieot: add Index Entry Offset Table (IEOT) extension
>   read-cache: load cache entries on worker threads

I love this, but when deploying it I ran into a problem.

How about these patches?

Thanks,
Jonathan Nieder (3):
  eoie: default to not writing EOIE section
  ieot: default to not writing IEOT section
  index: do not warn about unrecognized extensions

 Documentation/config.txt | 14 ++
 read-cache.c | 24 +---
 t/t1700-split-index.sh   | 11 +++
 3 files changed, 42 insertions(+), 7 deletions(-)


Re: Git Reference Manual enhance

2018-11-12 Thread Jonathan Nieder
Hi Fredi,

Fredi Fowler wrote:

> Is there any way to create pull request to git man (https://git-scm.com/docs)?
>
> I found there some inconsistencies. For example, almost in all pages
> are using [no-], but at https://git-scm.com/docs/git-merge each
> command (with [no-] or without) write separately.
>
> There are some same inconsistencies that a easy to fix. So, if I can
> sent a pull-request for such fix – inform me.

Welcome!  Yes, feel free to make improvements to the manual.  First,
you'll want to clone the repo:

 git clone https://kernel.googlesource.com/pub/scm/git/git

Then make changes.  You can test using "make":

 make -C Documentation git-merge.html
 open Documentation/git-merges.html

Once you're happy with the patch, it's time to send it out for review.
The Git project uses a decentralized review process using email.  See
https://www.kernel.org/pub/software/scm/git/docs/SubmittingPatches.html
for details about how it works.

If you are used to the GitHub pull request process, you may enjoy
GitGitGadget, which acts as a sort of bridge.  See [1] for
instructions.  Please do also keep in mind the hints from
SubmittingPatches e.g.  about how to describe your changes and how to
certify your work.

Thanks, and looking forward to seeing your contributions,
Jonathan

[1] 
https://github.com/gitgitgadget/gitgitgadget/blob/master/README.md#a-bot-to-serve-as-glue-between-github-pull-requests-and-the-git-mailing-list


Re: Design of multiple hash support

2018-11-05 Thread Jonathan Nieder
Hi,

Duy Nguyen wrote:
> On Mon, Nov 5, 2018 at 2:02 AM brian m. carlson
>  wrote:

>> There are basically two approaches I can take.  The first is to provide
>> each command that needs to learn about this with its own --hash
>> argument.  So we'd have:
>>
>>   git init --hash=sha256
>>   git show-index --hash=sha256 >
>> The other alternative is that we provide a global option to git, which
>> is parsed by all programs, like so:
>>
>>   git --hash=sha256 init
>>   git --hash=sha256 show-index  I'm leaning towards "git foo --hash=".

Can you say a little more about the semantics of the option?  For
commands like "git init", I tend to agree with Duy here, since it
allows each command's manual to describe what the option means in the
context of that command.

For "git show-index", ideally Git should use the object format named
in the idx file.

>> There's also the question of what we want to call the option.  The
>> obvious name is --hash, which is intuitive and straightforward.
>> However, the transition plan names the config option
>> extensions.objectFormat,
[...]
> --object-format is less vague than --hash. The downside is it's longer
> (more to type) but I'm counting on git-completion.bash and the guess
> that people rarely need to use this option.

Agreed.  --object-format makes more sense to me than --hash, since
it's more precise about what the option affects.

Thanks for looking into this.

Sincerely,
Jonathan


Re: [PATCH 12/12] fsck: mark strings for translation

2018-10-30 Thread Jonathan Nieder
Hi,

Ævar Arnfjörð Bjarmason wrote:
>> On Mon, Oct 29, 2018 at 3:09 PM Junio C Hamano  wrote:
>>> SZEDER Gábor  writes:
 Nguyễn Thái Ngọc Duy wrote:

> -fprintf(stderr, "%s in %s %s: %s\n",
> -msg_type, printable_type(obj), describe_object(obj), err);
> +fprintf_ln(stderr, _("%s in %s %s: %s"),

 Are the (f)printf() -> (f)printf_ln() changes all over
 'builtin/fsck.c' really necessary to mark strings for translation?
>>>
>>> It is beyond absolute minimum but I saw it argued here that this
>>> makes it easier to manage the .po and .pot files if your message
>>> strings do not end with LF, a you are much less likely to _add_
>>> unneeded LF to the translated string than _lose_ LF at the end of
>>> translated string.
[...]
> As Jonathan pointed out in the follow-up message[1] this sort of thing
> is checked for in msgfmt, so sure, let's strip the \n, but it's really
> not something we need to worry about. Likewise with translators turning
> "%s" into "%d" (or "% s") or whatever.

IMHO the advantage of leaving the \n out in the message is not so much
that we don't trust translators but more that where we can make their
lives easier, we should.

In other words, I'm glad the patch does that, and Ævar, I agree.

Thanks, both.

Jonathan

> 1. 
> https://public-inbox.org/git/cacsjy8acuy9fziiehgc7mel4i+xp6u0pmh1rgor-wznhyt1...@mail.gmail.com/


Re: Bug with "git mv" and submodules, and with "git submodule add something --force"

2018-10-19 Thread Jonathan Nieder
Stefan Beller wrote:

> Maybe for now we can do with just an update of the documentation/bugs
> section and say we cannot move files in and out of submodules?

I think we have some existing logic to prevent "git add"-ing a file
within a submodule to the superproject, for example.

So "git mv" should learn the same trick.  And perhaps the trick needs
to be moved down a layer (e.g. into the index API).  Hints?

Thanks,
Jonathan


Re: [PATCH v4 9/9] Documentation/config: add odb..promisorRemote

2018-10-18 Thread Jonathan Nieder
Junio C Hamano wrote:
> Jonathan Tan  writes:
>> Jonathan Nieder wrote:

>>> [object]
>>> missingObjectRemote = local-cache-remote
>>> missingObjectRemote = origin
>>
>> In the presence of missingObjectRemote, old versions of Git, when lazily
>> fetching, would only know to try the extensions.partialClone remote. But
>> this is safe because existing data wouldn't be clobbered (since we're
>> not using ideas like adding meaning to the contents of the .promisor
>> file). Also, other things like fsck and gc still work.
>
> It is a good idea to implicitly include the promisor-remote to the
> set of secondary places to consult to help existing versions of Git,
> but once the repository starts fetching incomplete subgraphs and
> adding new object.missingobjectremote [*1*], these versions of Git
> will stop working correctly, so I am not sure if it is all that
> useful approach for compatibility in practice.

Can you spell this out for me more?  Do you mean that a remote from
this list might make a promise that the original partialClone remote
can't keep?

If we're careful to only page in objects that were promised by the
original partialClone remote, then a promise "I promise to supply you
on demand with any object directly or indirectly reachable from any
object you have fetched from me" from the partialClone remote should
be enough.

> [Footnote]
>
> *1* That name with two "object" in it sounds horrible.

Sorry about that.  Another name used while discussing this was
objectAccess.promisorRemote.  As long as the idea is clear (that this
means "remotes to use when attempting to obtain an object that is not
already available locally"), I am not attached to any particular name.

Thanks,
Jonathan


Re: [PATCH] builtin/submodule--helper: remove debugging leftover tracing

2018-10-16 Thread Jonathan Nieder
Stefan Beller wrote:

> I noticed 74d4731da1 (submodule--helper: replace connect-gitdir-workingtree
> by ensure-core-worktree, 2018-08-13) had two leftover debugging statements
> when reading The coverage report [1]. Remove them.
>
> https://public-inbox.org/git/e30a9c05-87d8-1f2b-182c-6d6a5fefe...@gmail.com/
>
> Signed-off-by: Stefan Beller 
> ---
>  builtin/submodule--helper.c | 2 --
>  1 file changed, 2 deletions(-)

Doh.  Glad you caught it!

Is there some reference for The Coverage Report other than the mailing
list?  E.g. I suspect a reference to

make coverage-test
make coverage-report

would be useful to readers finding this commit later.

> To be applied on (or squashed into the tip of)
>   sb/submodule-update-in-c

Looks like that's already in "master", so not a candidate for
squashing.

Reviewed-by: Jonathan Nieder 


Re: [PATCH 1/1] subtree: make install targets depend on build targets

2018-10-16 Thread Jonathan Nieder
Hi,

Christian Hesse wrote:

> --- a/contrib/subtree/Makefile
> +++ b/contrib/subtree/Makefile
> @@ -69,11 +69,11 @@ install: $(GIT_SUBTREE)
[...]
> -install-html: $(GIT_SUBTREE_HTML)
> +install-html: html

This broke the build for me:

 make[2]: Entering directory '/build/git-2.19.1+next.20181016/contrib/subtree'
 install -m 644 html 
/build/git-2.19.1+next.20181016/debian/tmp/usr/share/doc/git/html
 install: cannot stat 'html': No such file or directory
 make[2]: *** [Makefile:78: install-html] Error 1

The rule says

 install-html: html
$(INSTALL) -d -m 755 $(DESTDIR)$(htmldir)
$(INSTALL) -m 644 $^ $(DESTDIR)$(htmldir)

and $^ substitutes to "html" after this change.  How was this patch
tested?

Thanks,
Jonathan


Re: On overriding make variables from the environment...

2018-10-16 Thread Jonathan Nieder
SZEDER Gábor wrote:
> On Tue, Oct 16, 2018 at 02:54:56PM -0700, Jonathan Nieder wrote:
>> SZEDER Gábor wrote:

>>> Our Makefile has lines like these:
>>>
>>>   CFLAGS = -g -O2 -Wall
>>>   CC = cc
>>>   AR = ar
>>>   SPATCH = spatch
[...]
>>> I'm not sure what to do.  I'm fine with updating our 'ci/' scripts to
>>> explicitly respect CC in the environment (either by running 'make
>>> CC=$CC' or by writing $CC into 'config.mak').  Or I could update our
>>> Makefile to use '?=' for specific variables, but:
>>
>> That's a good question.  I don't have a strong opinion myself, so I
>> tend to trust larger projects like Linux to have thought this through
>> more, and they use 'CC = cc' as well.
>
> I don't think Linux is a good example to follow in this case, see e.g.
> 6d62c983f7 (Makefile: Change the default compiler from "gcc" to "cc",
> 2011-12-20) (in short: Git is supposed to be buildable with compilers
> other than GCC as well, while Linux not really, so they can hardcode
> 'CC = gcc').

Nowadays Linux builds with clang as well.  People also have other
reasons to override the CC setting (cross-compiling, etc) and to
override other settings like CFLAGS.

> Also, the projects I have on hand usually have 'CC = gcc' hardcoded in
> their Makefiles, too, but those Makefiles were generated by their
> ./configure script (which in turn by ./autogen.sh...), and those tend
> to write CC from the environment into the generated Makefiles.

Yes, I think that's what makes travis's setup normally work.  It makes
sense to me since ./configure is expected to have more implicit or
automatic behavior.  For "make", I kind of like that it doesn't depend
on environment variables that I am not thinking about when debugging a
reported build problem.

When building Git without autoconf, the corresponding place for
settings is config.mak.  Would it make sense for the ci scripts to
write a config.mak file?

Thanks,
Jonathan


Re: On overriding make variables from the environment...

2018-10-16 Thread Jonathan Nieder
Hi,

SZEDER Gábor wrote:

> Our Makefile has lines like these:
>
>   CFLAGS = -g -O2 -Wall
>   CC = cc
>   AR = ar
>   SPATCH = spatch
>
> Note the use of '=', not '?='.
[...]
> I'm not sure what to do.  I'm fine with updating our 'ci/' scripts to
> explicitly respect CC in the environment (either by running 'make
> CC=$CC' or by writing $CC into 'config.mak').  Or I could update our
> Makefile to use '?=' for specific variables, but:

That's a good question.  I don't have a strong opinion myself, so I
tend to trust larger projects like Linux to have thought this through
more, and they use 'CC = cc' as well.  So I'd lean toward the updating
'ci/' scripts approach, to do something like

make ${CC:+"CC=$CC"} ...

(or the equivalent multi-line construction).

That also has the bonus of being explicit.

Just my two cents,
Jonathan


Re: [PATCH] submodule helper: convert relative URL to absolute URL if needed

2018-10-16 Thread Jonathan Nieder
Stefan Beller wrote:

> Reported-by: Jaewoong Jung 
> Signed-off-by: Stefan Beller 
> ---
>  builtin/submodule--helper.c | 51 -
>  t/t7400-submodule-basic.sh  | 24 +
>  2 files changed, 58 insertions(+), 17 deletions(-)

Reviewed-by: Jonathan Nieder 

Thanks for your patient work.


Re: [PATCH v4 9/9] Documentation/config: add odb..promisorRemote

2018-10-16 Thread Jonathan Nieder
Hi Christian,

On Tue, Sep 25, 2018, Christian Couder wrote:

> In the cover letter there is a "Discussion" section which is about
> this, but I agree that it might not be very clear.
>
> The main issue that this patch series tries to solve is that
> extensions.partialclone config option limits the partial clone and
> promisor features to only one remote. One related issue is that it
> also prevents to have other kind of promisor/partial clone/odb
> remotes. By other kind I mean remotes that would not necessarily be
> git repos, but that could store objects (that's where ODB, for Object
> DataBase, comes from) and could provide those objects to Git through a
> helper (or driver) script or program.

Thanks for this explanation.  I took the opportunity to learn more
while you were in the bay area for the google summer of code mentor
summit and learned a little more, which was very helpful to me.

The broader picture is that this is meant to make Git natively handle
large blobs in a nicer way.  The design in this series has a few
components:

 1. Teaching partial clone to attempt to fetch missing objects from
multiple remotes instead of only one.  This is useful because you
can have a server that is nearby and cheaper to serve from (some
kind of local cache server) that you make requests to first before
falling back to the canonical source of objects.

 2. Simplifying the protocol for fetching missing objects so that it
can be satisfied by a lighter weight object storage system than
a full Git server.  The ODB helpers introduced in this series are
meant to speak such a simpler protocol since they are only used
for one-off requests of a collection of missing objects instead of
needing to understand refs, Git's negotiation, etc.

 3. (possibly, though not in this series) Making the criteria for what
objects can be missing more aggressive, so that I can "git add"
a large file and work with it using Git without even having a
second copy of that object in my local object store.

For (2), I would like to see us improve the remote helper
infrastructure instead of introducing a new ODB helper.  Remote
helpers are already permitted to fetch some objects without listing
refs --- perhaps we will want to

 i. split listing refs to a separate capability, so that a remote
helper can advertise that it doesn't support that.  (Alternatively
the remote could advertise that it has no refs.)

 ii. Use the "long-running process" mechanism to improve how Git
 communicates with a remote helper.

For (1), things get more tricky.  In an object store from a partial
clone today, we relax the ordinary "closure under reachability"
invariant but in a minor way.  We'll need to work out how this works
with multiple promisor remotes.

The idea today is that there are two kinds of packs: promisor packs
(from the promisor remote) and non-promisor packs.  Promisor packs are
allowed to have reachability edges (for example a tree->blob edge)
that point to a missing object, since the promisor remote has promised
that we will be able to access that object on demand.  Non-promisor
packs are also allowed to have reachability edges that point to a
missing object, as long as there is a reachability edge from an object
in a promisor pack to the same object (because of the same promise).
See "Handling Missing Objects" in Documentation/technical/partial-clone.txt
for more details.

To prevent older versions of Git from being confused by partial clone
repositories, they use the repositoryFormatVersion mechanism:

[core]
repositoryFormatVersion = 1
[extensions]
partialClone = ...

If we change the invariant, we will need to use a new extensions.* key
to ensure that versions of Git that are not aware of the new invariant
do not operate on the repository.

A promisor pack is indicated by there being a .promisor file next to
the usual .pack file.  Currently the .promisor file is empty.  The
previous idea was that once we want more metadata (e.g. for the sake of
multiple promisor remotes), we could write it in that file.  For
example, remotes could be associated to a  and the
.promisor file could indicate which  has promised to serve
requests for objects reachable from objects in this pack.

That will complicate the object access code as well, since currently
we only find who has promised an object during "git fsck" and similar
operations.  During everyday access we do not care which promisor
pack caused the object to be promised, since there is only one promisor
remote to fetch from anyway.

So much for the current setup.  For (1), I believe you are proposing to
still have only one effective , so it doesn't necessarily
require modifying the extensions.* configuration.  Instead, the idea is
that when trying to access an object, we would follow one of a list of
steps:

 1. First, check the local object store. If it's there, we're done.
 2. Second, try 

Re: [PATCH] submodule helper: convert relative URL to absolute URL if needed

2018-10-15 Thread Jonathan Nieder
Hi,

Stefan Beller wrote:

> The submodule helper update_clone called by "git submodule update",
> clones submodules if needed. As submodules used to have the URL indicating
> if they were active, the step to resolve relative URLs was done in the
> "submodule init" step. Nowadays submodules can be configured active without
> calling an explicit init, e.g. via configuring submodule.active.
>
> When trying to obtain submodules that are set active this way, we'll
> fallback to the URL found in the .gitmodules, which may be relative to the
> superproject, but we do not resolve it, yet:
> 
> git clone https://gerrit.googlesource.com/gerrit
> cd gerrit && grep url .gitmodules
>   url = ../plugins/codemirror-editor
>   ...
> git config submodule.active .
> git submodule update
> fatal: repository '../plugins/codemirror-editor' does not exist
> fatal: clone of '../plugins/codemirror-editor' into submodule path 
> '/tmp/gerrit/plugins/codemirror-editor' failed
> Failed to clone 'plugins/codemirror-editor'. Retry scheduled
> [...]
> fatal: clone of '../plugins/codemirror-editor' into submodule path 
> '/tmp/gerrit/plugins/codemirror-editor' failed
> Failed to clone 'plugins/codemirror-editor' a second time, aborting
> [...]

Thanks.

"git log" and other tools have the ability to rewrap lines and will get
confused by this transcript.  Can you indent it to unconfuse them?

> Signed-off-by: Stefan Beller 

Please also credit the bug reporter:

Reported-by: Jaewoong Jung 

[...]
> --- a/builtin/submodule--helper.c
> +++ b/builtin/submodule--helper.c
> @@ -584,6 +584,27 @@ static int module_foreach(int argc, const char **argv, 
> const char *prefix)
>   return 0;
>  }
>  
> +

nit: inconsistent vertical whitespace (extra blank line?)

> +static char *compute_submodule_clone_url(const char *rel_url)
> +{
> + char *remoteurl, *relurl;
> + char *remote = get_default_remote();
> + struct strbuf remotesb = STRBUF_INIT;
> +
> + strbuf_addf(, "remote.%s.url", remote);
> + free(remote);
> +
> + if (git_config_get_string(remotesb.buf, )) {
> + warning(_("could not lookup configuration '%s'. Assuming this 
> repository is its own authoritative upstream."), remotesb.buf);
> + remoteurl = xgetcwd();
> + }
> + relurl = relative_url(remoteurl, rel_url, NULL);
> + strbuf_release();
> + free(remoteurl);
> +
> + return relurl;
> +}

I think this would be easier to read with all the release commands
together:

...

free(remote);
free(remoteurl);
strbuf_release();
return relurl;

[...]
> @@ -634,21 +655,9 @@ static void init_submodule(const char *path, const char 
> *prefix,
[...]
> - relurl = relative_url(remoteurl, url, NULL);
> - strbuf_release();
> - free(remoteurl);
> - free(url);
> - url = relurl;
> + char *to_free = url;
> + url = compute_submodule_clone_url(url);
> + free(to_free);

I still think this would be easier to read with a style that makes
the ownership passing clearer:

char *oldurl = url;
url = compute_submodule_clone_url(oldurl);
free(oldurl);

With whatever subset of the above tweaks makes sense,
Reviewed-by: Jonathan Nieder 

Thanks.


Re: [PATCH 1/1] protocol: limit max protocol version per service

2018-10-12 Thread Jonathan Nieder
Josh Steadmon wrote:
> On 2018.10.12 16:32, Jonathan Nieder wrote:
>> Josh Steadmon wrote:

>>> For now, I'm going to try adding an --allowed_versions flag for the
>>> remote helpers, but if anyone has a better idea, let me know.
>>
>> I forgot to mention: the stateless-connect remote helper capability is
>> still experimental so you're free to change its interface (e.g. to
>> change the syntax of the stateless-connect command it provides).
>
> For v2 of this patch, I ended up using GIT_PROTOCOL to pass the version
> string to the remote helpers.

Makes sense.  Thanks. :)


Re: [PATCH 1/1] protocol: limit max protocol version per service

2018-10-12 Thread Jonathan Nieder
Josh Steadmon wrote:

> For now, I'm going to try adding an --allowed_versions flag for the
> remote helpers, but if anyone has a better idea, let me know.

I forgot to mention: the stateless-connect remote helper capability is
still experimental so you're free to change its interface (e.g. to
change the syntax of the stateless-connect command it provides).

Thanks again,
Jonathan


Re: [PATCH 1/1] protocol: limit max protocol version per service

2018-10-12 Thread Jonathan Nieder
Hi,

Josh Steadmon wrote:

> So this runs into problems with remote-curl (and possibly other remote
> helpers):
>
> builtin/push.c can declare whatever allowed versions it wants, but those
> are not carried over when remote-curl is started to actually talk to the
> remote. What's worse, remote-curl starts its HTTP connection before it
> knows what command it's actually acting as a helper for.
>
> For now, I'm going to try adding an --allowed_versions flag for the
> remote helpers, but if anyone has a better idea, let me know.

There are some external remote helpers (see "git help remote-helpers"
for the documented interface), so alas, they can't take new flags.

That said, you can add a new remote helper capability and then
communicate on stdin, or in a pinch you can use the existing "option"
capability.  Remote helpers are also allowed to query "git config" if
they want to (either using the config machinery in config.h or by
running "git config").

Thanks,
Jonathan


Re: [PATCH] submodule helper: convert relative URL to absolute URL if needed

2018-10-12 Thread Jonathan Nieder
Hi,

Stefan Beller wrote:

> The submodule helper update_clone called by "git submodule update",
> clones submodules if needed. As submodules used to have the URL indicating
> if they were active, the step to resolve relative URLs was done in the
> "submodule init" step. Nowadays submodules can be configured active without
> calling an explicit init, e.g. via configuring submodule.active.
>
> Then we'll fallback to the URL found in the .gitmodules, which may be
> relative to the superproject, but we do not resolve it, yet.

Oh!  Good catch.

> To do so, factor out the function that resolves the relative URLs in
> "git submodule init" (in the submodule helper in the init_submodule
> function) and cal it at the appropriate place in the update_clone helper.

s/cal/call/

> Signed-off-by: Stefan Beller 
> ---
>  builtin/submodule--helper.c | 48 -
>  t/t7400-submodule-basic.sh  | 24 +++
>  2 files changed, 55 insertions(+), 17 deletions(-)

What is the symptom when this fails?

> diff --git a/builtin/submodule--helper.c b/builtin/submodule--helper.c
> index f6fb8991f3..a9a3ac38be 100644
> --- a/builtin/submodule--helper.c
> +++ b/builtin/submodule--helper.c
> @@ -584,6 +584,27 @@ static int module_foreach(int argc, const char **argv, 
> const char *prefix)
>   return 0;
>  }
>  
> +
> +char *compute_submodule_clone_url(const char *rel_url)

Should be static.

Is the caller responsible for freeing the returned buffer?

> +{
> + char *remoteurl, *relurl;
> + char *remote = get_default_remote();
> + struct strbuf remotesb = STRBUF_INIT;

optional: could rename, something like

struct strbuf key = STRBUF_INIT;

remote = get_default_remote();
strbuf_addf(, "remote.%s.url", remote);

 [...]
strbuf_release();
free(remote);
return result;


> +
> + strbuf_addf(, "remote.%s.url", remote);
> + free(remote);
> +
> + if (git_config_get_string(remotesb.buf, )) {
> + warning(_("could not lookup configuration '%s'. Assuming this 
> repository is its own authoritative upstream."), remotesb.buf);

nit: lookup is the noun, "look up" is the verb

> + remoteurl = xgetcwd();
> + }
> + relurl = relative_url(remoteurl, rel_url, NULL);
> + strbuf_release();
> + free(remoteurl);
> +
> + return relurl;
> +}
> +
>  struct init_cb {
>   const char *prefix;
>   unsigned int flags;
> @@ -634,21 +655,9 @@ static void init_submodule(const char *path, const char 
> *prefix,
>   /* Possibly a url relative to parent */
>   if (starts_with_dot_dot_slash(url) ||
>   starts_with_dot_slash(url)) {
> - char *remoteurl, *relurl;
> - char *remote = get_default_remote();
> - struct strbuf remotesb = STRBUF_INIT;
> - strbuf_addf(, "remote.%s.url", remote);
> - free(remote);
> -
> - if (git_config_get_string(remotesb.buf, )) {
> - warning(_("could not lookup configuration '%s'. 
> Assuming this repository is its own authoritative upstream."), remotesb.buf);
> - remoteurl = xgetcwd();
> - }
> - relurl = relative_url(remoteurl, url, NULL);
> - strbuf_release();
> - free(remoteurl);
> - free(url);
> - url = relurl;

Ah, this is moved code. I should have used --color-moved. ;-)

> + char *to_free = url;
> + url = compute_submodule_clone_url(url);
> + free(to_free);

Maybe:

char *old_url = url;
url = ...(old_url);
free(old_url);

>   }
>  
>   if (git_config_set_gently(sb.buf, url))
> @@ -1562,8 +1571,13 @@ static int prepare_to_clone_next_submodule(const 
> struct cache_entry *ce,
>  
>   strbuf_reset();
>   strbuf_addf(, "submodule.%s.url", sub->name);
> - if (repo_config_get_string_const(the_repository, sb.buf, ))
> - url = sub->url;
> + if (repo_config_get_string_const(the_repository, sb.buf, )) {
> + if (starts_with_dot_slash(sub->url) ||
> + starts_with_dot_dot_slash(sub->url))
> + url = compute_submodule_clone_url(sub->url);
> + else
> + url = sub->url;
> + }

Nice.

>  
>   strbuf_reset();
>   strbuf_addf(, "%s/.git", ce->name);
> diff --git a/t/t7400-submodule-basic.sh b/t/t7400-submodule-basic.sh
> index c0ffc1022a..3f5dd5e4ef 100755
> --- a/t/t7400-submodule-basic.sh
> +++ b/t/t7400-submodule-basic.sh
> @@ -1224,6 +1224,30 @@ test_expect_success 'submodule update and setting 
> submodule..active' '
>   test_cmp expect actual
>  '
>  
> +test_expect_success 'clone 

Re: [PATCH] config.mak.dev: add -Wformat

2018-10-12 Thread Jonathan Nieder
Jeff King wrote:
> On Fri, Oct 12, 2018 at 07:40:37PM +0100, Thomas Gummerer wrote:

>> 801fa63a90 ("config.mak.dev: add -Wformat-security", 2018-09-08) added
>> the -Wformat-security to the flags set in config.mak.dev.  In the gcc
>> man page this is documented as:
>>
>>  If -Wformat is specified, also warn about uses of format
>>  functions that represent possible security problems.  [...]
>>
>> That commit did however not add the -Wformat flag, and -Wformat is not
>> specified anywhere else by default, so the added -Wformat-security had
>> no effect.  Newer versions of gcc (gcc 8.2.1 in this particular case)
>> warn about this and thus compilation fails with this option set.
[...]
> -Wformat is part of -Wall, which we already turn on by default (even for
> non-developer builds).
>
> So I don't think we need to do anything more, though I'm puzzled that
> you saw a failure. Do you set CFLAGS explicitly in your config.mak to
> something that doesn't include -Wall?

Thomas, do you use autoconf to generate config.mak.autogen?  I'm
wondering if that produces a CFLAGS that doesn't include -Wall.

> I'm not opposed to making config.mak.dev a bit more redundant to handle
> this case, but we'd probably want to include all of -Wall, since it
> contains many other warnings we'd want to make sure are enabled.

Do you mean putting -Wall instead of -Wformat?

Should we add -Wextra too?  From a quick test, it seems to build okay.

Thanks,
Jonathan


Re: [RFC PATCH 00/19] Bring more repository handles into our code base

2018-10-12 Thread Jonathan Nieder
Hi,

Stefan Beller wrote:

> This applies on nd/the-index (b3c7eef9b05) and is the logical continuation of
> the object store series, which I sent over the last year.
>
> The previous series did take a very slow and pedantic approach,
> using a #define trick, see cfc62fc98c for details, but it turns out,
> that it doesn't work:

Thanks for the heads up --- this will remind me to review this new
series more carefully, since it differs from what was reviewed before.

I think this will be easiest to review with --function-context.  I can
generate that diff locally, so no need to resend.

>When changing the signature of widely used functions, it burdens the
>maintainer in resolving the semantic conflicts.
>
>In the orginal approach this was called a feature, as then we can ensure
>that not bugs creep into the code base during the merge window (while such
>a refactoring series wanders from pu to master). It turns out this
>was not well received and was just burdensome.

I don't agree with this characterization.

The question of who resolves conflicts is separate from the question
of whether conflicts appear, which is in turn separate from the
question of whether the build breaks.

I consider making the build break when a caller tries to use a
half-converted function too early to be a very useful feature.  There
is a way to do that in C++ that allows decoupled conversions, but the
C version forced an ordering of the conversions.  It seems that the
pain was caused by the combination of

 1. that coupling, which forced an ordering on the conversions and
prevented us from ordering the patches in an order based on
convenience of integration (unlike e.g. the "struct object_id"
series which was able to proceed by taking a batch covering a
quiet area of the tree at a time)

 2. as you mentioned, removal of old API at the same time of addition
 of new API forced callers across the tree to update at once

 3. the lack of having decided how to handle the anticipated churn

Now most of the conversions are done (thanks much for that) so the
ordering (1) is not the main remaining pain point.  Thanks for
tackling the other two in this series.

I want future API changes to be easier.  That means tackling the
following questions up front:

 i. Where does this fit in Rusty's API rating scheme
?
Does misuse (or misconverted callers) break the build, break
visibly at runtime, or are the effects more subtle?

 ii. Is there good test coverage for the new API?  Are there tests
 that need to be migrated?

 iii. Is there a way to automatically migrate callers, or does this
  require manual, error-prone work (thanks for tackling that in
  this one.)

 iv. How are we planning to handle multiple patches in flight?  Will
 the change produce merge conflicts?  How can others on the list
 help the maintainer with integrating this set of changes?

 iv. Is the ending point cleaner than where we started?

The #define trick you're referring to was a way of addressing (i).

[...]
>  79 files changed, 571 insertions(+), 278 deletions(-)

Most of the increase is in the coccinelle file and in improved
documentation.

It appears that some patches use a the_index-style
NO_THE_REPOSITORY_COMPATIBILITY_MACROS backward compatibility synonym
and others don't.  Can you say a little more about this aspect of the
approach?  Would the compatibility macros go away eventually?

Thanks,
Jonathan


Re: [RFC PATCH 6/6] commit-reach: fix first-parent heuristic

2018-10-10 Thread Jonathan Nieder
Hi,

Derrick Stolee wrote:

>  commit-reach.c| 4 +++-
>  t/t6600-test-reach.sh | 2 +-
>  2 files changed, 4 insertions(+), 2 deletions(-)

I like this testing technique, and the test passes for me.

Except: if I put

CC = cc -m32
NO_OPENSSL = YesPlease
NO_CURL = YesPlease

in config.mak (the first line to force 32-bit pointers, the others
to avoid some dependencies on libs that I don't have 32-bit versions
of), then the test fails for me:

 $ ./t6600-test-reach.sh -v -x -i
 [...]
 expecting success:
 cp commit-graph-full .git/objects/info/commit-graph &&
 run_and_check_trace2 can_all_from_reach_with_flag num_walked 19 input \
 "test-tool reach can_all_from_reach"

 ++ cp commit-graph-full .git/objects/info/commit-graph
 ++ run_and_check_trace2 can_all_from_reach_with_flag num_walked 19 input 
'test-tool reach can_all_from_r
 each'
 ++ CATEGORY=can_all_from_reach_with_flag
 ++ KEY=num_walked
 ++ VALUE=19
 ++ INPUT=input
 ++ COMMAND='test-tool reach can_all_from_reach'
 +++ pwd
 ++ GIT_TR2_PERFORMANCE='/usr/local/google/home/jrn/src/git/t/trash 
directory.t6600-test-reach/perf-log.t
 xt'
 ++ test-tool reach can_all_from_reach
 can_all_from_reach(X,Y):1
 ++ cat perf-log.txt
 ++ grep 'category:can_all_from_reach_with_flag key:num_walked value:19'
 error: last command exited with $?=1
 not ok 11 - can_all_from_reach:perf
 #
 #   cp commit-graph-full .git/objects/info/commit-graph &&
 #   run_and_check_trace2 can_all_from_reach_with_flag num_walked 
19 input \
 #   "test-tool reach can_all_from_reach"
 #

When I cat perf-log.txt, I see

  ..category:can_all_from_reach_with_flag key:num_walked value:20

instead of the expected 19.

Known issue?

Thanks,
Jonathan


Re: [PATCH] gc: introduce an --auto-exit-code option for undoing 3029970275

2018-10-10 Thread Jonathan Nieder
Ævar Arnfjörð Bjarmason wrote:

> Which is what I'm doing by running "gc --auto" across a set of servers
> and looking at the exit code. If it's been failing I get an error, if
> there's no need to gc nothing happens, and if it hasn't been failing and
> it just so happens that it's time to GC then fine, now was as good a
> time as any.

For this, a simple "git gc --detached-status" would work.  It sounds
like the bonus gc --auto run was a side effect instead of being a
requirement.

> So if we assume that for the sake of argument there's no point in a
> --detached-status either. My only reason for ever caring about that
> status is when I run "gc --auto" and it says it can't fork() itself so
> it fails. Since I'm using "gc --auto" I have zero reason to even ask
> that question unless I'm OK with kicking off a gc run as a side-effect,
> so why split up the two? It just introduces a race condition for no
> benefit.

What I am trying to do is design an interface which is simple to
explain in the manual.  The existing "git gc --auto" interface since
detaching was introduced is super confusing to me, so I'm going
through the thought exercise of "If we were starting over, what would
we build instead?"

Part of the answer to that question might include a

--report-from-the-last-time-you-detached

option.  I'm still failing to come up of a case where the answer to
that question would include a

--report-from-the-last-time-you-detached-and-if-it-went-okay\
-then-run-another-detached-gc

option.

In other words, I think our disconnect is that you are describing
things in terms of "I have been happy with the existing git gc --auto
detaching behavior, so how can we maintain something as close as
possible to that"?  And I am trying to describe things in terms of
"What is the simplest, most maintainable, and easiest to explain way
to keep Ævar's servers working well"?

Jonathan


Re: [PATCH] gc: introduce an --auto-exit-code option for undoing 3029970275

2018-10-10 Thread Jonathan Nieder
Junio C Hamano wrote:
> Jonathan Nieder  writes:

>> Perhaps this reporting could also print the message from a previous
>> run, so you could write:
>>
>>  git gc --detached-status || exit
>>  git gc --auto; # perhaps also passing --detach
>>
>> (Names still open for bikeshedding.)
>
> When the command is given --detached-exit-code/status option, what
> does it do?  Does it perform the "did an earlier run left gc.log?"
> and report the result and nothing else?  In other words, is it a
> pure replacement for "test -e .git/gc.log"?

My intent was the latter.  In other words, in the idiom

do_something_async &
... a lot of time passes ...
wait

it is something like the replacement for "wait".

More precisely,

git gc --detached-status || exit

would mean something like

if test -e .git/gc.log  # Error from previous gc --detach?
then
cat >&2 .git/gc.log # Report the error.
exit 1
fi

>  Or does it do some of
> the "auto-gc" prep logic like guestimating loose object count and
> have that also in its exit status (e.g. "from the gc.log left
> behind, we know that we failed to reduce loose object count down
> sufficiently after finding there are more than 6700 earlier, but now
> we do not have that many loose object, so there is nothing to
> complain about the presence of gc.log")?

Depending on the use case, a user might want to avoid losing
information about the results of a previous "git gc --detach" run,
even if they no longer apply.  For example, a user might want to
collect the error message for monitoring or later log analysis, to
track down intermittent gc errors that go away on their own.

A separate possible use case might be a

git gc --needs-auto-gc

command that detects whether an auto gc is needed.  With that, a
caller that only wants to learn about errors if auto gc is needed
could run

if git gc --needs-auto-gc
then
git gc --detached-status || exit
fi

> I am bad at naming myself, but worse at guessing what others meant
> with a new thing that was given a new name whose name is fuzzy,
> so... ;-)

No problem.  I'm mostly trying to tease out more details about the use
case.

Thanks,
Jonathan


Re: [PATCH] gc: introduce an --auto-exit-code option for undoing 3029970275

2018-10-10 Thread Jonathan Nieder
Ævar Arnfjörð Bjarmason wrote:

> Right. I know. What I mean is now I can (and do) use it to run 'git gc
> --auto' across our server fleet and see whether I have any of #3, or
> whether it's all #1 or #2. If there's nothing to do in #1 that's fine,
> and it just so happens that I'll run gc due to #2 that's also fine, but
> I'd like to see if gc really is stuck.
>
> This of course relies on them having other users / scripts doing normal
> git commands which would trigger previous 'git gc --auto' runs.
>
> I.e. with your change that command:
>
> git gc --auto
>
> Would change to something like:
>
> git gc --auto && ! test -e .git/gc.log
>
> Which, as noted is a bit of a nasty breaker of the encapsulation

That helps.  What if we package up the "test -e .git/gc.log" bit
*without* having the side effect of running git gc --auto, so that you
can run

if ! git gc --detached-exit-code
then
... handle the error ...
fi
git gc --auto; # perhaps also with --detach

?

I'm not great at naming options, so the --detached-exit-code name is
bikesheddable.  What I really mean to ask about is: what if the status
reporting goes in a separate command from running gc --auto?

Perhaps this reporting could also print the message from a previous
run, so you could write:

git gc --detached-status || exit
git gc --auto; # perhaps also passing --detach

(Names still open for bikeshedding.)

Thanks and hope that helps,
Jonathan


Re: [PATCH] gc: introduce an --auto-exit-code option for undoing 3029970275

2018-10-10 Thread Jonathan Nieder
Hi,

Ævar Arnfjörð Bjarmason wrote:

> Add an --auto-exit-code variable and a corresponding 'gc.autoExitCode'
> configuration option to optionally bring back the 'git gc --auto' exit
> code behavior as it existed between 2.6.3..2.19.0 (inclusive).

Hm.  Can you tell me more about the use case where this would be
helpful to you?  That would help us come up with a better name for it.

Thanks,
Jonathan


Re: git svn clone/fetch hits issues with gc --auto

2018-10-10 Thread Jonathan Nieder
Hi,

Ævar Arnfjörð Bjarmason wrote:

> I'm just saying it's hard in this case to remove one piece without the
> whole Jenga tower collapsing, and it's probably a good idea in some of
> these cases to pester the user about what he wants, but probably not via
> gc --auto emitting the same warning every time, e.g. in one of these
> threads I suggested maybe "git status" should emit this.

I have to say, I don't have a lot of sympathy for this.

I've been running with the patches I sent before for a while now, and
the behavior that they create is great.  I think we can make further
refinements on top.  To put it another way, I haven't actually
experienced any bad knock-on effects, and I think other feature
requests can be addressed separately.

I do have sympathy for some wishes for changes to "git gc --auto"
behavior (I think it should be synchronous regardless of config and
the asynchrony should move to being requested explicitly through a
command line option by the callers within Git) but I don't understand
why this holds up a change that IMHO is wholly positive for users.

To put it another way, I am getting the feeling that the objections to
that series were theoretical, while the practical benefits of the
patch are pretty immediate and real.  I'm happy to help anyone who
wants to polish it but time has shown no one is working on that, so...

Thanks,
Jonathan


Re: [RFC/PATCH] drop vcs-svn experiment

2018-10-03 Thread Jonathan Nieder
Jeff King wrote:
> On Fri, Aug 17, 2018 at 10:26:05PM -0700, Jonathan Nieder wrote:

>>  - testsvn:: is a demo and at a minimum we ought not to install it
>>with "make install"
>>
>>  - keeping this in-tree for the benefit of just one user is excessive,
>>so removing it is probably the right thing
>>
>>  - it would be nice if the commit removing this code from Git includes
>>a note to help people find its new home
>>
>> Would you mind holding off until I'm able to arrange that last bit?
>
> Any further thoughts / movement on this? Not urgent, but I saw you
> mention it in another thread, and it's on my "to be decided" pile.

I'd like to move this to contrib so it's all in one place.  Then I
should be able to put it in a separate repo and drop it from Git.

If someone has a spare moment to help, I'd be happy to review patches
that move vcs-svn/* and git-remote-testsvn to contrib/svn-fe
(including the tests, contrib/subtree-style; in a separate repo the
tests would then work fine using sharness).

Thanks,
Jonathan


Re: git-remote-helper list command question

2018-10-03 Thread Jonathan Nieder
Hi,

Jeff King wrote:
> On Tue, Oct 02, 2018 at 11:43:50AM +0200, Stanisław Drozd wrote:

>> I'm trying to write a fast-import-based git remote helper, but I'm not
>> sure what the output of the `list` command should look like. How can I
>> find an example of the format in action?
>
> There's some documentation in "git help remote-helpers".
>
> For working examples (of this or any other remote-helper stuff), your
> best bet is the git-remote-http helper that ships with Git. E.g., you
> should be able to do:
>
>   echo list | git remote-http https://github.com/git/git
>
> to see sample output.

Other examples to look at:

 git cinnabar 

 remote-testsvn.c in git.git (though it will be moving to contrib/ and
 then out of git.git fairly soon)

These might be more helpful than git-remote-http because they are
fast-import-based helpers.

Thanks and hope that helps,
Jonathan


Re: [PATCH] git doc: direct bug reporters to mailing list archive (Re: [PATCH v2] git.txt: mention mailing list archive)

2018-09-28 Thread Jonathan Nieder
Jonathan Nieder wrote:
> Junio C Hamano wrote:

>> OK.  This unfortunately came a bit too late for today's integration
>> cycle, so I'll leave this in my inbox and replace what is queued
>> with it later.
>>
>> Unless there is another one to supersede this proposal comes before
>> that happens, that is.
>>
>> Thanks.
>
> Sounds good.  Thanks for the heads up, and thanks for the heads up.

Ahem, what I meant is "and thanks for the help with writing the
patch".

Sincerely,
Jonathan


Re: [PATCH] git doc: direct bug reporters to mailing list archive (Re: [PATCH v2] git.txt: mention mailing list archive)

2018-09-28 Thread Jonathan Nieder
Junio C Hamano wrote:

> OK.  This unfortunately came a bit too late for today's integration
> cycle, so I'll leave this in my inbox and replace what is queued
> with it later.
>
> Unless there is another one to supersede this proposal comes before
> that happens, that is.
>
> Thanks.

Sounds good.  Thanks for the heads up, and thanks for the heads up.

Jonathan


[PATCH] git doc: direct bug reporters to mailing list archive (Re: [PATCH v2] git.txt: mention mailing list archive)

2018-09-28 Thread Jonathan Nieder
Subject: git doc: direct bug reporters to mailing list archive

The mailing list archive can help a user encountering a bug to tell
whether a recent regression has already been reported and whether a
longstanding bug has already had some discussion to start their
thinking.

Based-on-patch-by: Martin Ågren 
Improved-by: Junio C Hamano 
Signed-off-by: Jonathan Nieder 
---
Junio C Hamano wrote:
>> Jonathan Nieder wrote:

>>> Hm.  I think this encourages a behavior that I want to discourage:
>>> assuming that if a bug has already been reported then there's nothing
>>> more for the new user to add.
[...]
> Yup, in short I think any one of the above three is good enough.
> Let's just pick one and move on.  Unless somebody sends in an
> improvement that can be applied readily, by default I'll just "git
> am" the one Martin sent, as that is the easiest thing to do ;-).

I assume this is meant as a nudge, in which case nudge successful. ;-)

My experience is that bug reporters are very sensitive to hints the
project gives about what kind of bugs they want to receive.  I'd
rather make use of that lesson now instead of waiting to relearn it in
the wild.  Here goes.

Thanks, both.

 Documentation/git.txt | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/Documentation/git.txt b/Documentation/git.txt
index 74a9d7edb4..8e6a92e8ba 100644
--- a/Documentation/git.txt
+++ b/Documentation/git.txt
@@ -858,7 +858,9 @@ Reporting Bugs
 
 Report bugs to the Git mailing list  where the
 development and maintenance is primarily done.  You do not have to be
-subscribed to the list to send a message there.
+subscribed to the list to send a message there.  See the list archive
+at https://public-inbox.org/git for previous bug reports and other
+discussions.
 
 Issues which are security relevant should be disclosed privately to
 the Git Security mailing list .
-- 
2.19.0.605.g01d371f741



Re: Git for Windows for Unix?

2018-09-28 Thread Jonathan Nieder
Ævar Arnfjörð Bjarmason wrote:
> On Thu, Sep 27 2018, Jonathan Nieder wrote:

>> That said, that seems to me like a lot of work to avoid adding some
>> patches to "next" that belong in "next" anyway.  I understand why the
>> Git for Windows maintainer does not always have time to upstream
>> promptly, which is why I suggest working with him to find a way to
>> help with that.
>>
>> If there's something I'm missing and Git is actually an uncooperative
>> upstream like the cases you've mentioned, then I'd be happy to learn
>> about that so we can fix it, too.
>
> That's one and valid way to look at it, convergence would be ideal.
>
> Another way to look at it, which is closer to what I was thinking about,
> is to just view GFW as some alternate universe "next" branch (which by
> my count is ~2-3k commits ahead of master[1]).

You could view it that way, but I don't.  Many Git for Windows patches
have never even visited the Git mailing list.

Thanks,
Jonathan


Re: [PATCH] Documentation/CodingGuidelines: How to document new APIs

2018-09-27 Thread Jonathan Nieder
Hi,

Stefan Beller wrote:

> There are different opinions on how to document an API properly.
> Discussion turns out nobody disagrees with documenting new APIs on the
> function level in the header file and high level concepts in
> Documentation/technical, so let's state that in the guidelines.

I disagree with this.  I think we should put all the API docs in the
header file, like we're already doing in e.g. strbuf.h.

Do you have a link to where in the discussion this split-docs strategy
was decided?

Thanks,
Jonathan


Re: [PATCH] FYI / RFC: submodules: introduce repo-like workflow

2018-09-27 Thread Jonathan Nieder
(dropping cc-s to my internal address that I don't use on this list
 and to git-c...@google.com which bounces)
Hi,

Stefan Beller wrote:

> Internally we have rolled out this as an experiment for
> "submodules replacing the repo tool[1]". The repo tool is described as:
>
> Repo unifies Git repositories when necessary, performs uploads to the
> Gerrit revision control system, and automates parts of the Android
> development workflow. Repo is not meant to replace Git, only to make
> it easier to work with Git in the context of Android. The repo command
> is an executable Python script that you can put anywhere in your path.
[...]
> Submodules can also be understood as unifying Git repositories, even more,
> in the superproject the submodules have relationships between their
> versions, which the repo tool only provides in releases.
>
> The repo tool does not provide these relationships between versions, but
> all the repositories (in case of Android think of ~1000 git repositories)
> are put in their place without depending on each other.
>
> This comes with a couple of advantages and disadvantages:

Thanks for describing this background.

[...]
> This changes the following commands in the superproject:
>
>   checkout -B/-b create branches in subs, too
>   checkout (-f): update branch in submodule (create if needed) and check
>  it out; Pass the argument down literally if it is a branch
>  name (e.g. "checkout -f master" will run a
> "checkout -f master" in the submodule as well)
>   clone: see checkout
>   reset --hard: see checkout

As you mentioned, I've been using this submodule.repoLike=true mode
for my own use for a while.  You did a nice job of explaining on how
it fits into a Gerrit-driven workflow; I'd like to add that I find it
pleasant in non-Gerrit-driven contexts as well.

The primary difference from repoLike=false is that this makes the
normal state to have branches checked out in submodules.  For example,
if I run

git checkout --recurse-submodules -B master origin/master

then this will create and check out a "master" branch in all
submodules instead of only in the superproject.  This helps avoid some
issues in Git's submodule handling where submodule commits can be
pruned if they have not been checked out in a while because there is
no ref pointing to them.

Some next steps:

- now that we have a repository object, some of the implementation can
  be simplified and made more robust.  I expect that will also make
  these patches easier to review

- also in the direction of reviewability, at that point we may want to
  split this into multiple patches

- gitsubmodules.txt and config.txt should describe the new option, to
  help new users understand what this new repoLike workflow does

- there are some edge cases in the UX that get... messy that I should
  describe in another message

All that said, thanks for sending this out, and I'd be happy to hear
from any interested people --- feedback from anyone adventurous enough
to try this out would be very welcome.

Happy hacking,
Jonathan


Re: [PATCH v2 0/4] Add proto v2 archive command with HTTP support

2018-09-27 Thread Jonathan Nieder
Stefan Beller wrote:
> On Wed, Sep 26, 2018 at 6:25 PM Josh Steadmon  wrote:

>> I've been discussing workarounds for this with Jonathan Nieder, but
>> please let me know if you have any suggestions for v3 of this series.
>
> Care to open the discussion to the list? What are the different
> approaches, what are the pros/cons?

Do you mean sending video of chatting in the office?

Josh and I discussed that

 1. Clients sending version=2 when they do not, in fact, speak protocol
v2 for a service is a (serious) bug.  (Separately from this
series) we should fix it.

 2. That bug is already in the wild, alas.  Fortunately the semantics of
GIT_PROTOCOL as a list of key/value pairs is well defined.  So we
have choices of (a) bump version to version=3 (b) pass another
value 'version=2:yesreallyversion=2' (c) etc.

 3. This is likely to affect push, too.

Thanks and hope that helps,
Jonathan


Re: Git for Windows for Unix?

2018-09-27 Thread Jonathan Nieder
Hi,

Ævar Arnfjörð Bjarmason wrote:

> So it's similar to various packages that have "alternates" and are semi
> or permanently forked, like emacs & xemacs, JDK etc., although I can't
> recall one offhand that's quite similar to GFW v.s. git.git.
>
> My only stake in this is I thought it would be neat to be able to "apt
> install git-for-windows", but I understand there's a support burden, but
> if some *nix packagers are interested, maybe never taking it out of the
> Debian equivalent of "experimental" and saying "this is unsupported, go
> to the GFW tracker..." when bugs are filed would cut down on the support
> burden.

If someone else wants to package git-for-windows for Debian, I am
happy to offer them advice and will not stop them.

That said, that seems to me like a lot of work to avoid adding some
patches to "next" that belong in "next" anyway.  I understand why the
Git for Windows maintainer does not always have time to upstream
promptly, which is why I suggest working with him to find a way to
help with that.

If there's something I'm missing and Git is actually an uncooperative
upstream like the cases you've mentioned, then I'd be happy to learn
about that so we can fix it, too.

Sincerely,
Jonathan


Re: Git for Windows for Unix?

2018-09-27 Thread Jonathan Nieder
Ævar Arnfjörð Bjarmason wrote:

> GFW is a "friendly fork", but a permanent one it seems. The diff between
> it and 2.19.0 proper is ~10k lines, and e.g. this last release had
> experimental stash/rebase in C that 2.19.0 didn't.
>
> So it would be great if this were packaged up by linux distro as some
> "alterate" package of git. I'm putting Jonathan in the "To" line because
> I'm mainly interested in this for Debian, but maybe there's wider
> interest at git-packagers...

Please coordinate with Dscho to get these patches into "next" upstream.

Thanks and hope that helps,
Jonathan


Re: [PATCH v2] git.txt: mention mailing list archive

2018-09-27 Thread Jonathan Nieder
Hi,

Martin Ågren wrote:

> --- a/Documentation/git.txt
> +++ b/Documentation/git.txt
> @@ -859,6 +859,9 @@ Reporting Bugs
>  Report bugs to the Git mailing list  where the
>  development and maintenance is primarily done.  You do not have to be
>  subscribed to the list to send a message there.
> +If you want to check to see if the issue has
> +been reported already, the list archive can be found at
> + and other places.

Hm.  I think this encourages a behavior that I want to discourage:
assuming that if a bug has already been reported then there's nothing
more for the new user to add.

Especially because the mailing list is not an issue tracker, this
would make it too easy for the project to miss important bugs.

Can this say something more neutral, like

See the list archive at https://public-inbox.org/git/ for
previous bug reports and other discussions.

?  Or if we want to encourage a particular behavior, should we say
something about "To coordinate with others experiencing the same
problem" or something else that encourages joining in with the
thread instead of assuming it's taken care of?

Thanks,
Jonathan


Re: On shipping more of our technical docs as manpages

2018-09-27 Thread Jonathan Nieder
Hi,

Jeff King wrote:
> On Wed, Sep 26, 2018 at 10:44:33PM +0200, Ævar Arnfjörð Bjarmason wrote:

>> In terms of getting an overview it's indistinguishable from
>> comments. I.e. there's nothing like an index of:
>>
>> man git-api-strbuf ==> working with strings
>> man git-api-sha1-array ==> list, iterate and binary lookup SHA1s
>
> I agree that is a problem, especially for contributors less familiar
> with the code base. But I think generating an index is a separate (and
> much easier) problem than formatting all of the documentation.
>
> We already have the /** convention I mentioned above. Could we have
> another micro-format like:
>
>   /** API:strbuf - working with strings */
>
> in each header file? That would make generating such an index pretty
> trivial.

Can you spell out the problem for me a little more?  E.g. if we had a
convention that the first comment in strbuf.h should say

/* strbuf - Git's standard string type */

or even just

/* Git's standard string type */

would that do the trick?

>> I'm also not in the habit of opening the *.h file for something, I
>> usually just start reading the *.c and only later discover there's some
>> API docs in the relevant *.h (or not).
>
> Interesting. I'm not totally opposed to putting the documentation
> alongside the actual code. It does feel a bit cluttered to me, but I
> think you're right that it keeps everything as close together as
> possible.

I've experienced projects following both conventions.  One thing I like
a lot about documentation in the header file is that it encourages
people to make the API documentation self-contained.  That is, you
describe the contract in a way that doesn't require reading the
implementation for details.

It took me a while to get used to this kind of convention but now I
really like it.  So I really do prefer to keep putting API
documentation in the header files in Git.  (Implementation
documentation in the source files is of course also very welcome.)

>> It means you can't avoid seeing it or updating it when source
>> spelunking, and it leaves header files short, which is useful when you'd
>> like to get a general API overview without all the docs. Our documented
>> headers are quite fat, e.g. strbuf.h is 60% of the size of strbuf.c, but
>> less than 20% when you strip the docs.
>
> I don't use folds in my editor, and I guess nobody else in this thread
> does, either. But they may be a reasonable tool for "wow, there are
> comments, declarations, and definitions all together and I just want to
> view one of them". In vim, try "set foldmethod=syntax" and then "zc" to
> close the folds.

I use kythe to get an outline view of the header files.

[...]
>> E.g. on Debian you can "apt install git-doc" and browse stuff like
>> file:///usr/share/doc/git-doc/technical/api-argv-array.html which is the
>> HTML formatted version, same for all the other api-*.txt docs.
>
> IMHO this is just silly. What am I as an end user going to do with
> api-argv-array.html?

In Debian we just ship all the docs.  For something like
technical/pack-heuristics, it's quite nice.  For the API docs, I think
they belong in the headers.

Thanks,
Jonathan


Re: [PATCH v2] mailmap: consistently normalize brian m. carlson's name

2018-09-24 Thread Jonathan Nieder
brian m. carlson wrote:
> On Mon, Sep 24, 2018 at 10:39:02AM -0700, Jonathan Nieder wrote:
>> brian m. carlson wrote:

>>> I think this commit message makes sense.
[...]
>>  What would it take to make the patch make
>> sense, too? ;-)
>
> I certainly didn't mean to imply a failing on your part for explaining
> the change adequately.  I've just always found the format confusing and
> I know others do, too.

No worries.  I took the opportunity because the patch isn't in "next"
yet so I was looking for a way to nudge it forward.  I think v2 is
simpler than v1.  Thanks for your help.

[...]
> This has been a really helpful explanation.  Thanks.
>
> Maybe I'll have some time over the next week or so to send a patch to
> the documentation to make it more understandable to past me.

Sounds good.  I think the (non-)handling of the 'name  name'
case is likely to be a bug.

>> How about this?
[...]
> Having read your explanation, this looks good.  Thanks for fixing this.

\o/ Thanks for looking it over.

Sincerely,
Jonathan


[PATCH v2] mailmap: consistently normalize brian m. carlson's name

2018-09-24 Thread Jonathan Nieder
v2.18.0-rc0~70^2 (mailmap: update brian m. carlson's email address,
2018-05-08) changed the mailmap to map

  sand...@crustytoothpaste.ath.cx
   -> brian m. carlson 

instead of

  sand...@crustytoothpaste.net
-> brian m. carlson 

That means the mapping

  Brian M. Carlson 
-> brian m. carlson 

is redundant, so we can remove it.  More importantly, it means that
the identity "Brian M. Carlson " used in
some commits is not normalized any more.  Add a mapping for it.

Noticed while updating Debian's Git packaging, which uses "git
shortlog --no-merges" to produce a list of changes in each version,
grouped by author's (normalized) name.

Signed-off-by: Jonathan Nieder 
---
Hi,

brian m. carlson wrote:

> I think this commit message makes sense.  I apparently still fail to
> understand how the .mailmap format works, so I can't tell you if the
> patch is correct.

Thanks for looking it over.  What would it take to make the patch make
sense, too? ;-)

Most mailmap entries are of the form

Some Name 

which means "Wherever you see the email address someem...@example.com,
canonicalize the author's name to Some Name".  We can use that:

brian m. carlson 

When we see sand...@crustytoothpaste.ath.cx, we also want to
canonicalize the email address.  For that, we can do

brian m. carlson  


There's only one person who has used these email addresses, so we
don't have to do matching by name.  If we wanted to tighten the name
normalization to match by name, I think we'd do something like

brian m. carlson  Brian M. Carlson

but I can't get that to seem to have any effect when I test with the
"git check-mailmap" command --- for example, "git check-mailmap 'Dana
How '" does not map and "git check-mailmap
'Random Name '" maps to 'Dana L. How
'.

The even tighter matching used in v1

brian m. carlson  Brian M. Carlson 


does work, but it's unnecessary complexity.  We don't need it.

How about this?

Changes since v1:
- loosened the matching to only look at email and ignore name
- no other changes

 .mailmap | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/.mailmap b/.mailmap
index f165222a78..bef3352b0d 100644
--- a/.mailmap
+++ b/.mailmap
@@ -25,7 +25,7 @@ Ben Walton  
 Benoit Sigoure  
 Bernt Hansen  
 Brandon Casey  
-brian m. carlson  Brian M. Carlson 

+brian m. carlson 
 brian m. carlson  

 Bryan Larsen  
 Bryan Larsen  
-- 
2.19.0.444.g18242da7ef



Re: Hash algorithm analysis

2018-09-18 Thread Jonathan Nieder
Hi,

A quick note.

Joan Daemen wrote:

> when going over my todo list I was confronted with the mail of Dan
> Shumow on the successor of SHA-1 for git. I know the decision was
> made and it is not my intention to change it, but please see below
> some comments on Dan's arguments.

When the time comes for the next hash change in Git, it will be useful
to be able to look back over this discussion.  Thanks for adding
details.

[...]
> On 30/07/2018 22:01, Dan Shumow wrote:

>> So, I also want to state my biases in favor of SHA2 as an employee
>> of Microsoft. [...] As such, and reflecting this bias, in the
>> internal discussions that Johannes alluded to, SHA2 and SHA3 were
>> the primary suggestions.  There was a slight preference for SHA2
>> because SHA3 is not exposed through the windows cryptographic APIs
>> (though Git does not use those, so this is a nonissue for this
>> discussion.)
>
> We find it cynical to bring up a Microsoft-internal argument that is
> actually not relevant to Git.

On the contrary, I am quite grateful that Dan was up front about where
his preference comes from, *especially* when the reasons are not
relevant to Git.  It is useful background for better understanding his
rationale and understanding the ramifications for some subset of
users.

In other words, consider someone active in the Git project that
disagrees with the decision to use SHA2.  This explanation by Dan can
help such a person understand where the disagreement is coming from
and whether we are making the decision for the wrong reasons (because
Git on Windows does not even use those APIs).

[...]
> 3) The relatively large state in the sponge construction increases
> the generic strength against attacks when the input contains
> redundancy or has a certain form. For instance, if the input is
> restricted to be text in ASCII (such as source code), then the
> collision-resistance grows higher than the nominal 2^{c/2}. Such an
> effect does not exist with narrow-pipe Merkle-Damgård. (This may be
> what Linus had intuitively in mind.)

Interesting.

[...]
> [2] Daniel J. Bernstein, Cost analysis of hash collisions: Will
> quantum computers make SHARCS obsolete? Workshop Record of
> SHARCS'09.

I remember that paper!  Thanks for the pointer.

Sincerely,
Jonathan


Re: [Question] Alternative to git-lfs under go

2018-09-17 Thread Jonathan Nieder
(+cc: Taylor Blau, git-lfs expert)
Randall S. Becker wrote:
> On September 17, 2018 3:28 PM, Jonathan Nieder wrote:
>> Randall S. Becker wrote:

>>> Does anyone know whether it is practical to rework git-lfs under a
>>> language other than "go"? GCC is not even close to being possible to
>>> port to my NonStop platform (may have tried, some have died - joke -
>>> trying). I would like to convert this directly to C or something more
>>> widely portable. Is there a protocol doc out there I can reference?
>>
>> Can you say more about the context?  You might like
>>
>>  git clone --filter=blob:limit=512m 
>>
>> which tells Git to avoid downloading any blobs larger than 512 megabytes
>> until you know they need them.  See Documentation/technical/partial-clone.txt
>> for more details.
>
> Sorry, I was not clear. I am not having issues with large files or blob
> limits.  Members of my community wish to use Git LFS support on their
> enterprise git servers, so as platform maintainer for git on NonStop, I am
> trying to accommodate them. The stumbling block is that "Go" language will
> not port to the platform.

Thanks.  Then the answer is "no": there has not been a
reimplementation of Git LFS in another language, and my advice to
someone pursuing that would be to use the native Git feature described
above instead (or to get Go working on their platform).

If I'm reading
https://github.com/git-lfs/git-lfs/blob/master/CONTRIBUTING.md
correctly, the preferred way to contact the Git LFS maintainers is
using Github's issue tracker.

Thanks and sorry I don't have better news,
Jonathan


Re: [PATCH] t5551: compare sorted cookies files

2018-09-17 Thread Jonathan Nieder
Thomas Gummerer wrote:
> On 09/17, Junio C Hamano wrote:

>> The other
>> effort implicitly depends on the expected output is kept sorted, but
>> this one is more explicit---I tend to prefer this approach as tools
>> and automation is easier to maintain than having to remember that
>> the source must be sorted.
>
> I'm happy going with either patch, but if we want to go with mine, I'd
> like to make sure Todd is credited appropriately, as he sent a very
> similar patch first.  Not sure what the appropriate way here is
> though?

Thanks for asking.  Credit is a subject that is dear to my heart.

You can for example use
Reported-by: Todd Zullinger 

to credit him for the patch and analysis that appears to have helped
with reviews (and to signal that this fixes the bug he reported).

[...]
>>> --- a/t/t5551-http-fetch-smart.sh
>>> +++ b/t/t5551-http-fetch-smart.sh
>>> @@ -206,7 +206,7 @@ test_expect_success 'dumb clone via http-backend 
>>> respects namespace' '
>>>  cat >cookies.txt <>>  127.0.0.1  FALSE   /smart_cookies/ FALSE   0   othername   
>>> othervalue
>>>  EOF
>>> -cat >expect_cookies.txt <>> +cat expect_cookies.txt <<\EOF

?  That is simpler since it avoids a pipe and means the reader doesn't
have to look out for shell metacharacters like $ inside the text.

Bonus points if this kind of setup moves to inside the test (using
<<-\EOF), which can make the test script easier to read.

Thanks and hope that helps,
Jonathan


Re: [PATCH 3/3] doc/Makefile: drop doc-diff worktree and temporary files on "make clean"

2018-09-17 Thread Jonathan Nieder
Junio C Hamano wrote:
> Jonathan Nieder  writes:

>> I'd rather that we revert this change altogether.  I have nothing
>> against a convenient command to do this kind of non build related
>> cleanup, but it shouldn't be spelled as "make clean".
>
> OK, let's do this for now as I wanted to merge the remainder to
> 'master' today.
>
> -- >8 --
> Subject: Revert "doc/Makefile: drop doc-diff worktree and temporary files on 
> "make clean""
>
> This reverts commit 6f924265a0bf6efa677e9a684cebdde958e5ba06, which
> started to require that we have an executable git available in order
> to say "make clean", which gives us a chicken-and-egg problem.
>
> Having to have Git installed, or be in a repository, in order to be
> able to run an optional "doc-diff" tool is fine.  Requiring either
> in order to run "make clean" is a different story.
>
> Reported by Jonathan Nieder .
> ---
>  Documentation/Makefile | 1 -
>  1 file changed, 1 deletion(-)

Does this want a sign-off?

In any event, this matches what I had applied in Debian[1] and is

Reviewed-by: Jonathan Nieder 

Thanks,
Jonathan

[1] 
http://repo.or.cz/git/debian.git/blob/refs/heads/debian-experimental:/debian/patches/0002-Revert-doc-Makefile-drop-doc-diff-worktree-and-tempor.diff
aka 
http://repo.or.cz/git/debian.git/blob/9872ab1d87634a9288266de290571928e5b9346f:/debian/patches/0002-Revert-doc-Makefile-drop-doc-diff-worktree-and-tempor.diff


Re: [PATCH 3/3] doc/Makefile: drop doc-diff worktree and temporary files on "make clean"

2018-09-17 Thread Jonathan Nieder
Junio C Hamano wrote:
> Jonathan Nieder  writes:
>> Eric Sunshine wrote:

>>> +   '$(SHELL_PATH_SQ)' ./doc-diff --clean
>>
>> This means I need a copy of git in order to run "make clean".  That
>> was never required before.  It makes bootstrapping difficult --- do we
>> really need it?
>
> Gahh, you are absolutely right.  Also "doc-diff --clean", if I am
> reading the code correctly, requires us to be in a Git repository,
> not a tarball extract.
>
> Having to have Git installed, or be in a repository, in order to be
> able to run an optional "doc-diff" tool is fine.  Requiring either
> in order to run "make clean" is a different story.
>
> Thanks for spotting.  We can just prefix the line with '-'?  Or does
> the script badly misbehave (due to lack of CEILING_DIRECTORY) when
> run in a tarball extract inside somebody else's repository?

I'd rather that we revert this change altogether.  I have nothing
against a convenient command to do this kind of non build related
cleanup, but it shouldn't be spelled as "make clean".

That said, for my particular use, prefixing with '-' would work okay.

Thanks,
Jonathan


Re: [Question] Alternative to git-lfs under go

2018-09-17 Thread Jonathan Nieder
Hi,

Randall S. Becker wrote:

> Does anyone know whether it is practical to rework git-lfs under a language
> other than "go"? GCC is not even close to being possible to port to my
> NonStop platform (may have tried, some have died - joke -  trying). I would
> like to convert this directly to C or something more widely portable. Is
> there a protocol doc out there I can reference?

Can you say more about the context?  You might like

 git clone --filter=blob:limit=512m 

which tells Git to avoid downloading any blobs larger than 512 megabytes
until you know they need them.  See Documentation/technical/partial-clone.txt
for more details.

Thanks and hope that helps,
Jonathan


Re: [PATCH v2] linear-assignment: fix potential out of bounds memory access

2018-09-17 Thread Jonathan Nieder
Thomas Gummerer wrote:

> Currently the 'compute_assignment()' function may read memory out
> of bounds, even if used correctly.  Namely this happens when we only
> have one column.
[...]
> Reported-by: ryenus 
> Helped-by: Derrick Stolee 
> Signed-off-by: Thomas Gummerer 
> ---
>  linear-assignment.c   | 6 ++
>  t/t3206-range-diff.sh | 5 +
>  2 files changed, 11 insertions(+)

Here's a bit of a non-review, but hopefully it pokes others into
doing a proper review.

I haven't carefully examined the checks you're adding here.  Your
goals seem to be right.  I've been running with this patch since last
Thursday, with no problems yet.  Thanks for writing it.

Sincerely,
Jonathan


Re: [PATCH 2/3] gc: exit with status 128 on failure

2018-09-17 Thread Jonathan Nieder
Hi,

Jeff King wrote:
> On Tue, Jul 17, 2018 at 03:59:47PM -0400, Jeff King wrote:
>> On Mon, Jul 16, 2018 at 11:54:16PM -0700, Jonathan Nieder wrote:

>>> A value of -1 returned from cmd_gc gets propagated to exit(),
>>> resulting in an exit status of 255.  Use die instead for a clearer
>>> error message and a controlled exit.
>>
>> This feels a little funny because we know we're going to turn some of
>> these back in the next patch. That said, I'm OK with it, since this
>> version is already written.
>
> There's discussion elsewhere[1] of applying just up to patch 2.
>
> Do we still want to convert these cases to die() as their end-state?

IMHO yes, we do.  die() is the function that you can use to exit with
a fatal error.

If we want to get rid of die(), that would be a tree-wide effort, not
something that should hold up this patch.

[...]
> It also makes the code more flexible and lib-ifiable (since the caller
> can decide how to handle the errors). That probably doesn't matter much
> since this is all static-local to builtin/gc.c,

Exactly.  I'm a strong believer in http://wiki.c2.com/?YouArentGonnaNeedIt.

Thanks,
Jonathan


Re: [PATCH 3/3] doc/Makefile: drop doc-diff worktree and temporary files on "make clean"

2018-09-17 Thread Jonathan Nieder
Hi,

Eric Sunshine wrote:

> doc-diff creates a temporary working tree (git-worktree) and generates a
> bunch of temporary files which it does not remove since they act as a
> cache to speed up subsequent runs. Although doc-diff's working tree and
> generated files are not strictly build products of the Makefile (which,
> itself, never runs doc-diff), as a convenience, update "make clean" to
> clean up doc-diff's working tree and generated files along with other
> development detritus normally removed by "make clean".
>
> Signed-off-by: Eric Sunshine 
> ---
>  Documentation/Makefile | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/Documentation/Makefile b/Documentation/Makefile
> index a42dcfc745..26e268ae8d 100644
> --- a/Documentation/Makefile
> +++ b/Documentation/Makefile
> @@ -332,6 +332,7 @@ clean:
>   $(RM) SubmittingPatches.txt
>   $(RM) $(cmds_txt) $(mergetools_txt) *.made
>   $(RM) manpage-base-url.xsl
> + '$(SHELL_PATH_SQ)' ./doc-diff --clean

This means I need a copy of git in order to run "make clean".  That
was never required before.  It makes bootstrapping difficult --- do we
really need it?

Thanks,
Jonathan


Re: What's cooking in git.git (Sep 2018, #03; Fri, 14)

2018-09-17 Thread Jonathan Nieder
Jeff King wrote:

> Let me inject some more uncertainty, then. ;)
>
> If we are not going to do 3/3, then should 2/3 simply avoid passing "-1"
> back via return from main? I guess I don't have a strong opinion, but
> one of the things I noted was that we converted those die() calls
> introduced in 2/3 back into returns in 3/3. Do we want to leave it in
> the state where we are calling die() a lot more?

Would you mind replying in the patch thread instead of this what's
cooking email?

That way, I can understand your suggestion better in context, I can
find it more easily later, I would feel less bad about adding noise by
replying, etc.

Thanks,
Jonathan


[PATCH] mailmap: consistently normalize brian m. carlson's name

2018-09-17 Thread Jonathan Nieder
v2.18.0-rc0~70^2 (mailmap: update brian m. carlson's email address,
2018-05-08) changed the mailmap to map

  sand...@crustytoothpaste.ath.cx
   -> brian m. carlson 

instead of

  sand...@crustytoothpaste.net
-> brian m. carlson 

That means the mapping

  Brian M. Carlson 
-> brian m. carlson 

is redundant, so we can remove it.  More importantly, it means that
the identity "Brian M. Carlson " used in
some commits is not normalized any more.  Add a mapping for it.

Noticed while updating Debian's Git packaging, which uses "git
shortlog --no-merges" to produce a list of changes in each version,
grouped by author's (normalized) name.

Signed-off-by: Jonathan Nieder 
---
brian m. carlson wrote:
> On Tue, May 22, 2018 at 03:08:26PM -0700, Jonathan Nieder wrote:

>> These commits use author Brian M. Carlson .
>> Previously they matched
>>
>>  brian m. carlson  
>> 
>>
>> but now they don't match any line in the .mailmap file.
>>
>> Should we add a line
>>
>>  brian m. carlson 
>>
>> to handle these commits?
>
> Ah, good point.  Probably so.

Sorry to leave this hanging for so long.  How about this patch?

Thanks,
Jonathan

 .mailmap | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/.mailmap b/.mailmap
index f165222a78..29047a110a 100644
--- a/.mailmap
+++ b/.mailmap
@@ -25,7 +25,7 @@ Ben Walton  
 Benoit Sigoure  
 Bernt Hansen  
 Brandon Casey  
-brian m. carlson  Brian M. Carlson 

+brian m. carlson  Brian M. Carlson 

 brian m. carlson  

 Bryan Larsen  
 Bryan Larsen  
-- 
2.19.0.397.gdd90340f6a



Re: [PATCH v1 1/2] t/*: fix pipe placement and remove \'s

2018-09-17 Thread Jonathan Nieder
Matthew DeVore wrote:

> Subject: t/*: fix pipe placement and remove \'s
>
> Where ever there was code in the tests like this:
>
>   foo \
>   | bar

Language nits:
- s/Where ever/Wherever/
- Git's commit messages use the present tense to describe the existing
  previous state of the codebase, as though reporting a bug.

Maybe something like

tests: standardize pipe placement

Instead of using a line-continuation and pipe on the second
line, take advantage of the shell's implicit line continuation
after a pipe character.  So for example, instead of

some long line \
| next line

use

some long line |
next line

At this point, it would be useful to say something about rationale ---
for example,

This better matches the coding style documented in
Documentation/CodingGuidelines and used in shell scripts
elsewhere in Git.

Except: is this documented in Documentation/CodingGuidelines?  Or,
better, is there a linter that we can run in the test-lint target of
t/Makefile to ensure we keep sticking to this style?

[...]
> --- a/t/lib-gpg.sh
> +++ b/t/lib-gpg.sh
> @@ -57,8 +57,8 @@ then
>   echo | gpgsm --homedir "${GNUPGHOME}" 2>/dev/null \
>   --passphrase-fd 0 --pinentry-mode loopback \
>   --import "$TEST_DIRECTORY"/lib-gpg/gpgsm_cert.p12 &&
> - gpgsm --homedir "${GNUPGHOME}" 2>/dev/null -K \
> - | grep fingerprint: | cut -d" " -f4 | tr -d '\n' > \
> + gpgsm --homedir "${GNUPGHOME}" 2>/dev/null -K |
> + grep fingerprint: | cut -d" " -f4 | tr -d '\n' > \
>   ${GNUPGHOME}/trustlist.txt &&

I think this would be more readable with one item from the pipeline
per line:

gpgsm --homedir ... |
grep ... |
cut ... |
tr ... >... &&

[...]
> --- a/t/t1006-cat-file.sh
> +++ b/t/t1006-cat-file.sh
> @@ -218,8 +218,8 @@ test_expect_success "--batch-check for a non-existent 
> hash" '
>  test "0042 missing
>  0084 missing" = \
>  "$( ( echo 0042;
> - echo_without_newline 0084; ) \
> -   | git cat-file --batch-check)"
> + echo_without_newline 0084; ) |
> +   git cat-file --batch-check)"

This test is problematic in a lot of ways.  Most importantly, it ignores
the exist status from git cat-file.

So it should say something like:

cat >expect <<-\EOF &&
foobar42 missing
foobar84 missing
EOF
printf "foobar42\nfoobar84" |
git cat-file --batch-check >actual &&
test_cmp expect actual

If we want to restrict to the pipeline style fixes, we could do

test "..." = "$(
{   # Couldn't resist changing the ( to {!
echo ... && # Couldn't resist changing the ; to &&!
echo_without_newline ...
} |
git cat-file --batch-check
)"

but unless there's a linter that we're helping support, it's probably
better to skip this file and use a dedicated patch to modernize its
style more generally.

[...]
> --- a/t/t1300-config.sh
> +++ b/t/t1300-config.sh
> @@ -1770,8 +1770,9 @@ test_expect_success '--show-origin stdin with file 
> include' '
>   cat >expect <<-EOF &&
>   file:$INCLUDE_DIR/stdin.include include
>   EOF
> - echo "[include]path=\"$INCLUDE_DIR\"/stdin.include" \
> - | git config --show-origin --includes --file - user.stdin 
> >output &&
> + echo "[include]path=\"$INCLUDE_DIR\"/stdin.include" |
> + git config --show-origin --includes --file - user.stdin >output &&

Okay.

[...]
> --- a/t/t5317-pack-objects-filter-objects.sh
> +++ b/t/t5317-pack-objects-filter-objects.sh
> @@ -20,17 +20,20 @@ test_expect_success 'setup r1' '
>  '
>  
>  test_expect_success 'verify blob count in normal packfile' '
> - git -C r1 ls-files -s file.1 file.2 file.3 file.4 file.5 \
> - | awk -f print_2.awk \
> - | sort >expected &&
> + git -C r1 ls-files -s file.1 file.2 file.3 file.4 file.5 |
> + awk -f print_2.awk |
> + sort >expected &&

This loses the exit status from git, so we should make it write to a
temporary file instead (as a separate patch).

[...]
> - git -C r1 verify-pack -v ../all.pack \
> - | grep blob \
> - | awk -f print_1.awk \
> - | sort >observed &&
> +
> + git -C r1 verify-pack -v ../all.pack |

Likewise (and likewise for the rest in this file).

[...]
> --- a/t/t5500-fetch-pack.sh
> +++ b/t/t5500-fetch-pack.sh
> @@ -50,8 +50,9 @@ pull_to_client () {
>   case "$heads" 

Re: Git for games working group

2018-09-17 Thread Jonathan Nieder
Taylor Blau wrote:
> On Sun, Sep 16, 2018 at 03:05:48PM -0700, Jonathan Nieder wrote:
> > On Sun, Sep 16, 2018 at 11:17:27AM -0700, John Austin wrote:
> > > Taylor Blau wrote:

>>>> Right, though this still subjects the remote copy to all of the
>>>> difficulty of packing large objects (though Christian's work to support
>>>> other object database implementations would go a long way to help this).
>>>
>>> Ah, interesting -- I didn't realize this step was part of the
>>> bottleneck. I presumed git didn't do much more than perhaps gzip'ing
>>> binary files when it packed them up. Or do you mean the growing cost
>>> of storing the objects locally as you work? Perhaps that could be
>>> solved by allowing the client more control (ie. delete the oldest
>>> blobs that exist on the server).
>>
>> John, I believe you are correct.  Taylor, can you elaborate about what
>> packing overhead you are referring to?
>
> Jonathan, you are right. I was also referring about the increased time
> that Git would spend trying to find good packfile chains with larger,
> non-textual objects. I haven't done any hard benchmarking work on this,
> so it may be a moot point.

Ah, thanks.  See git-config(1):

core.bigFileThreshold
Files larger than this size are stored deflated,
without attempting delta compression.

Default is 512 MiB on all platforms.

If that's failing on your machine then it would be a bug, so we'd
definitely want to know.

Jonathan


Re: Question - no space in smtp-server-option

2018-09-16 Thread Jonathan Nieder
On Mon, Sep 17, 2018 at 03:27:21AM +0200, Chris Coutinho wrote:
> On Sep-16-18, Jonathan Nieder wrote:
>> Chris Coutinho wrote:

>>> Currently my gitconfig contains the following line:
>>>
>>> sendemail.smtpserveroption=-a
>>>
>>> Whereas, the following results in an 'account' not found error:
>>>
>>> sendemail.smtpserveroption=-a 
>>
>> Do you mean that your ~/.gitconfig literally contains that exact line?
[...]
> Yes that's the exact line in my gitconfig file, which correctly mails using
> the non-default account I'm after - I'm assuming you're noticing the lack of
> camelCase? To be honest, that came from zsh autosuggestions, which are all
> lower-case for some reason.

No, case shouldn't matter.  I'm noticing the it looks like

foo.bar=baz

instead of

[foo]
bar = baz

(i.e. it seems to be some syntax other than ini syntax).  E.g. I tried

echo a.b=c >test.config
git config -f test.config -l

and get

fatal: bad config line 1 in file test.config

Thanks,
Jonathan


Re: Question - no space in smtp-server-option

2018-09-16 Thread Jonathan Nieder
Hi,

Chris Coutinho wrote:

> Currently my gitconfig contains the following line:
>
>   sendemail.smtpserveroption=-a
>
> Whereas, the following results in an 'account' not found error:
>
>   sendemail.smtpserveroption=-a 

Do you mean that your ~/.gitconfig literally contains that exact line?
I would be surprised to hear that syntax works --- see [1] for the
syntax I would expect to work.

If you have more details, that would help.

Thanks,
Jonathan

[1] https://www.kernel.org/pub/software/scm/git/docs/git-config.html


Re: Git for games working group

2018-09-16 Thread Jonathan Nieder
Hi,

On Sun, Sep 16, 2018 at 11:17:27AM -0700, John Austin wrote:
> Taylor Blau wrote:

>> Right, though this still subjects the remote copy to all of the
>> difficulty of packing large objects (though Christian's work to support
>> other object database implementations would go a long way to help this).
>
> Ah, interesting -- I didn't realize this step was part of the
> bottleneck. I presumed git didn't do much more than perhaps gzip'ing
> binary files when it packed them up. Or do you mean the growing cost
> of storing the objects locally as you work? Perhaps that could be
> solved by allowing the client more control (ie. delete the oldest
> blobs that exist on the server).

John, I believe you are correct.  Taylor, can you elaborate about what
packing overhead you are referring to?

One thing I would like to see in the long run to help Git cope with
very large files is adding something similar to bup's "bupsplit" to
the packfile format (or even better, to the actual object format, so
that it affects object names).  In other words, using a rolling hash
to decide where to split a blob and use a tree-like structure so that
(1) common portions between files can deduplicated and (2) portions
can be hashed in parallel.  I haven't heard of these things being the
bottleneck for anyone in practice today, though.

Thanks,
Jonathan


  1   2   3   4   5   6   7   8   9   10   >