Re: which files are "known to git"?

2018-05-21 Thread Jonathan Nieder
Robert P. J. Day wrote:
> On Mon, 21 May 2018, Elijah Newren wrote:

>> Hi Robert,
>>
>> I had always assumed prior to your email that 'known to Git' meant
>> 'tracked' or 'recorded in the index'...
>
>   i *know* i've been in this discussion before, but i don't remember
> where, i *assume* it was on this list, and i recall someone (again,
> don't remember who) who opined that there are two categories of files
> that are "known to git":

My understanding was the same as Elijah's.

I would be in favor of a patch that replaces the phrase "known to Git"
in Git's documentation with something less confusing.
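
For reference, the narrower reading ("tracked", i.e. recorded in the
index) is easy to inspect from the command line; a quick illustration
(file names are made up):

```shell
git ls-files            # paths recorded in the index ("tracked")
git ls-files --others   # worktree paths the index knows nothing about
```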

Thanks,
Jonathan


Re: issues(?) installing git-lfs via fedora "dnf" command

2018-05-21 Thread Jonathan Nieder
Hi,

Robert P. J. Day wrote:
> On Mon, 21 May 2018, Jonathan Nieder wrote:

>> The packager should be able to find out whether it's an issue in
>> git-lfs upstream and report it to that project if it is.  Git-lfs is
>> not part of git.git; it's a separate project:
>> https://github.com/git-lfs/git-lfs/blob/master/CONTRIBUTING.md
>> I believe they use github's issue tracker to track bugs.
>
>   it would *appear* that this is a combination of both a git issue,
> and a red hat packaging issue:
>
> https://bugzilla.redhat.com/show_bug.cgi?id=1580357
> https://github.com/git-lfs/git-lfs/issues/3013

Can you clarify?  Neither of those bugs points to a git issue if I
understand correctly.  They may be git-lfs issues, though.

This kind of confusion is exactly why the Git project adopted a
trademark policy to prevent outside projects from identifying
themselves with Git: https://git-scm.com/about/trademark.
Unfortunately Git LFS's existence predates that policy.  I believe
they got permission to continue using the name, confusing as it is.

Thanks,
Jonathan


Re: issues(?) installing git-lfs via fedora "dnf" command

2018-05-21 Thread Jonathan Nieder
Robert P. J. Day wrote:

>   $ sudo dnf install git-lfs
[...]
>   Running transaction
> Preparing:
> Installing   : git-lfs-2.4.0-1.fc28.x86_64
> Running scriptlet: git-lfs-2.4.0-1.fc28.x86_64
>   Error: Failed to call git rev-parse --git-dir --show-toplevel: "fatal:
>   not a git repository (or any of the parent directories): .git\n"
[...]
> is one supposed to be *in* a git repository when installing, because i
> was in fact at the top level of my linux kernel source repo, so i'm
> unclear on what that "Error" is trying to tell me. am i just being
> clueless? is this something that i should submit as a fedora packaging
> issue?

Yes, this looks like something that should be reported as a Fedora
packaging issue.
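
For context, the command the packaging scriptlet apparently runs fails
whenever it is executed outside a repository, which is presumably the
working directory rpm scriptlets actually get, regardless of where the
user invoked dnf:

```shell
cd "$(mktemp -d)"                          # anywhere outside a repository
git rev-parse --git-dir --show-toplevel    # fails: "fatal: not a git repository ..."
```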

The packager should be able to find out whether it's an issue in
git-lfs upstream and report it to that project if it is.  Git-lfs is
not part of git.git; it's a separate project:
https://github.com/git-lfs/git-lfs/blob/master/CONTRIBUTING.md
I believe they use github's issue tracker to track bugs.

Thanks,
Jonathan


Re: [PATCH] git-submodule.sh: try harder to fetch a submodule

2018-05-15 Thread Jonathan Nieder
Stefan Beller wrote:

> I'll resend it with a warning (using say()).

Thanks, makes sense.

> I think we have 2 bugs and this is merely fixing the second bug.

I'm fearing that there are more than two.

[...]
>   $ git init confused-head
>   $ (cd confused-head && git branch test \
> $(git commit-tree $(git write-tree) -m test))
>   $ git clone --no-checkout  --depth=1 \
> --separate-git-dir=test.git confused-head/.git test
> Cloning into 'test'...
> warning: --depth is ignored in local clones; use file:// instead.
> done.
>
>   $ git -C test.git config remote.origin.fetch
>   $ echo $?
> 1
>
> (A) Despite the warning of --depth having no impact, the
>   omission thereof changes the repository state.
> (B) There is no remote.origin.fetch configuration, which
>   is weird. See builtin/clone.c:830, that states for this case:

I can reproduce the issue without submodules and without --local,
as follows:

git init --bare empty.git
git init --bare almost-empty.git
git -C ~/src/git push $(pwd)/almost-empty HEAD:refs/heads/upstream

git clone --single-branch file://$(pwd)/empty.git
git clone --single-branch file://$(pwd)/almost-empty.git

git -C almost-empty.git branch -D upstream

git -C empty fetch
git -C almost-empty fetch

Expected result:
Both fetches succeed.

Actual result:
First fetch succeeds, second produces
"fatal: Couldn't find remote ref HEAD".

Note that empty.git and almost-empty.git are basically identical.
The difference instead lies in the clones' .git/config files:

diff --git 1/empty/.git/config 2/almost-empty/.git/config
index b51bb0d..ee21198 100644
--- 1/empty/.git/config
+++ 2/almost-empty/.git/config
@@ -4,7 +4,4 @@
bare = false
logallrefupdates = true
 [remote "origin"]
-   url = file:///tmp/t/empty.git
-[branch "master"]
-   remote = origin
-   merge = refs/heads/master
+   url = file:///tmp/t/almost-empty.git

Thanks,
Jonathan


Re: [PATCH] git-submodule.sh: try harder to fetch a submodule

2018-05-11 Thread Jonathan Nieder
Hi,

Stefan Beller wrote:

> This is the logical continuation of fb43e31f2b4 (submodule: try harder to
> fetch needed sha1 by direct fetching sha1, 2016-02-23) and fixes it as
> some assumptions were not correct.

Interesting.

I think what would help most is an example set of commands I can use
to reproduce this (bonus points if in the form of a test).

> > If $sha1 was not part of the default fetch ... fail ourselves here
> assumes that the fetch_in_submodule only fails when the serverside does

I'm having some trouble with the formatting here.  Is the part
preceded by '>' a quote, and if so a quote from what?

> not support fetching by sha1.
>
> There are other reasons why such a fetch may fail, such as
> fatal: Couldn't find remote ref HEAD
> which can happen if the remote side doesn't advertise HEAD. Not advertising

nit: it can be useful to have a blank line before and after such
example output to help both my eyes and tools like "git log
--format='%w(100)%b'" to understand the formatting.

> HEAD is allowed by the protocol spec and would happen if HEAD points at a
> ref that this user cannot see (due to ACLs, for example).

A more typical example would be if the ref simply doesn't exist (i.e.,
is a branch yet to be born).
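
The yet-to-be-born case is easy to reproduce without any ACL setup; a
minimal sketch:

```shell
git init --bare unborn.git          # HEAD points at an unborn branch, so
git clone unborn.git client         # the remote never advertises HEAD...
git -C client fetch origin HEAD     # ...and this fails with the quoted
                                    # "Couldn't find remote ref HEAD" error
```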

> So do try even harder for a submodule by ignoring the exit code of the
> first fetch and rather relying on the following is_tip_reachable to
> see if we try fetching again.
>
> Signed-off-by: Stefan Beller 
> ---
>  git-submodule.sh | 1 -
>  1 file changed, 1 deletion(-)
> 
> diff --git a/git-submodule.sh b/git-submodule.sh
> index 24914963ca2..13b378a6c8f 100755
> --- a/git-submodule.sh
> +++ b/git-submodule.sh
> @@ -614,7 +614,6 @@ cmd_update()
>   # is not reachable from a ref.
>   is_tip_reachable "$sm_path" "$sha1" ||
>   fetch_in_submodule "$sm_path" $depth ||

Is keeping the '||' at the end of this line intended?
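
For what it's worth, the consequence of the dangling `||` is that the
two stanzas fuse into one chain, where each command runs only if
everything before it failed; a toy illustration with stand-in
functions (hypothetical, not the real helpers):

```shell
# Stand-ins that report themselves and then fail:
is_tip_reachable() { echo "is_tip_reachable $*"; false; }
fetch_in_submodule() { echo "fetch_in_submodule $*"; false; }

# With the die removed but its `||` kept, the patch effectively leaves:
is_tip_reachable sm sha1 ||
fetch_in_submodule sm ||
is_tip_reachable sm sha1 ||
fetch_in_submodule sm sha1 ||
echo "die: fetched, but commit still missing"
```

so the second `is_tip_reachable` is reached only while the chain keeps
failing, which may or may not be what the patch intends.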

> - die "$(eval_gettext "Unable to fetch in 
> submodule path '\$displaypath'")"
>  
>   # Now we tried the usual fetch, but $sha1 may
>   # not be reachable from any of the refs
>   is_tip_reachable "$sm_path" "$sha1" ||
>   fetch_in_submodule "$sm_path" $depth "$sha1" ||
>   die "$(eval_gettext "Fetched in submodule path 
> '\$displaypath', but it did not contain \$sha1. Direct fetching of that 
> commit failed.")"

Should this error message be changed?

Thanks,
Jonathan


Re: [PATCH] sha1dc: fix for compiling on AIX using IBM XLC compiler

2018-05-10 Thread Jonathan Nieder
(cc-ing Marc Stevens for real this time. Sorry for the noise)
Ævar Arnfjörð Bjarmason wrote:
> On Wed, May 09 2018, jens persson wrote:

>> Hello, first patch. I'm having trouble compiling on AIX using IBMs
>> compiler, leading to
>> unusable binaries. The following patch solved the problem for 2.17.0.
>> The patch below is cut&pasted via gmail to allow for firewalls, but
>> exists in an unmolested form on github:
>> https://github.com/MrShark/git/commit/44bfcaca6637e24548ec06f46fb6035a846b14af
>>
>> Best regards
>> /jp

Thanks for the patch.

>> Building on AIX using XLC every checkout gives an error:
>> fatal: pack is corrupted (SHA1 mismatch)
>> fatal: index-pack failed
>>
>> Back tracking it was introduced in 2.13.2, most likely in [1]
>>
>> Add a #ifdef guard based on macros defined at [2] and [3].
>>
>> Perhaps __xlc__ should be changed to or combined with _AIX
>> based on the behaviour of GCC on AIX or XL C on Linux.
>>
>> 1. https://github.com/git/git/commit/6b851e536b05e0c8c61f77b9e4c3e7cedea39ff8
>> 2. 
>> https://www.ibm.com/support/knowledgecenter/SSGH2K_13.1.3/com.ibm.xlc1313.aix.doc/compiler_ref/macros_platform.html
>> 3. 
>> https://www.ibm.com/support/knowledgecenter/SSGH2K_13.1.3/com.ibm.xlc1313.aix.doc/compiler_ref/xlmacros.html
>>
>> Signed-off-by: jens persson 
>> ---
>>  sha1dc/sha1.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/sha1dc/sha1.c b/sha1dc/sha1.c
>> index 25eded139..68a8a0180 100644
>> --- a/sha1dc/sha1.c
>> +++ b/sha1dc/sha1.c
>> @@ -84,7 +84,7 @@
>>  /* Not under GCC-alike or glibc or *BSD or newlib */
>>  #elif (defined(__ARMEB__) || defined(__THUMBEB__) || defined(__AARCH64EB__) 
>> || \
>> defined(__MIPSEB__) || defined(__MIPSEB) || defined(_MIPSEB) || \
>> -   defined(__sparc))
>> +   defined(__sparc) || (defined(__powerpc) && defined(__xlc__)))

I wonder if there's a simpler way to detect this.

__powerpc seems orthogonal to the goal, since there are little-endian
powerpc machines.

It appears that XLC defines _BIG_ENDIAN on big-endian machines.  I
wonder if we should do

 #elif defined(_BYTE_ORDER) && defined(_BIG_ENDIAN) && defined(_LITTLE_ENDIAN)
  ... as today ...
 #elif !defined(_BYTE_ORDER) && defined(_BIG_ENDIAN) && !defined(_LITTLE_ENDIAN)
 # define SHA1DC_BIGENDIAN
 #elif !defined(_BYTE_ORDER) && !defined(_BIG_ENDIAN) && defined(_LITTLE_ENDIAN)
  /* little endian. */
 #else
  ...

It also seems to me that Git should enable the #error in the
fallthrough case.  The test suite would catch this kind of problem but
it does not seem that everyone runs the test suite, so a compiler
error is more robust.  Is there a #define we can set to enable that?

>>  /*
>>   * Should define Big Endian for a whitelist of known processors. See
>>   * https://sourceforge.net/p/predef/wiki/Endianness/ and
>
> This patch looks sane to me, but we don't manage this software but
> instead try to pull it as-is from upstream at
> https://github.com/cr-marcstevens/sha1collisiondetection
>
> While we could apply this, it would be much better if you could submit a
> pull request with this to upstream, and then we can update our copy as I
> did in e.g. 9936c1b52a ("sha1dc: update from upstream", 2017-07-01).

Thanks,
Jonathan


Re: [PATCH] sha1dc: fix for compiling on AIX using IBM XLC compiler

2018-05-09 Thread Jonathan Nieder
(+cc: Marc Stevens)
Ævar Arnfjörð Bjarmason wrote:
> On Wed, May 09 2018, jens persson wrote:

>> Hello, first patch. I'm having trouble compiling on AIX using IBMs
>> compiler, leading to
>> unusable binaries. The following patch solved the problem for 2.17.0.
>> The patch below is cut&pasted via gmail to allow for firewalls, but
>> exists in an unmolested form on github:
>> https://github.com/MrShark/git/commit/44bfcaca6637e24548ec06f46fb6035a846b14af
>>
>> Best regards
>> /jp

Thanks for the patch.

>> Building on AIX using XLC every checkout gives an error:
>> fatal: pack is corrupted (SHA1 mismatch)
>> fatal: index-pack failed
>>
>> Back tracking it was introduced in 2.13.2, most likely in [1]
>>
>> Add a #ifdef guard based on macros defined at [2] and [3].
>>
>> Perhaps __xlc__ should be changed to or combined with _AIX
>> based on the behaviour of GCC on AIX or XL C on Linux.
>>
>> 1. https://github.com/git/git/commit/6b851e536b05e0c8c61f77b9e4c3e7cedea39ff8
>> 2. 
>> https://www.ibm.com/support/knowledgecenter/SSGH2K_13.1.3/com.ibm.xlc1313.aix.doc/compiler_ref/macros_platform.html
>> 3. 
>> https://www.ibm.com/support/knowledgecenter/SSGH2K_13.1.3/com.ibm.xlc1313.aix.doc/compiler_ref/xlmacros.html
>>
>> Signed-off-by: jens persson 
>> ---
>>  sha1dc/sha1.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/sha1dc/sha1.c b/sha1dc/sha1.c
>> index 25eded139..68a8a0180 100644
>> --- a/sha1dc/sha1.c
>> +++ b/sha1dc/sha1.c
>> @@ -84,7 +84,7 @@
>>  /* Not under GCC-alike or glibc or *BSD or newlib */
>>  #elif (defined(__ARMEB__) || defined(__THUMBEB__) || defined(__AARCH64EB__) 
>> || \
>> defined(__MIPSEB__) || defined(__MIPSEB) || defined(_MIPSEB) || \
>> -   defined(__sparc))
>> +   defined(__sparc) || (defined(__powerpc) && defined(__xlc__)))

I wonder if there's a simpler way to detect this.

__powerpc seems orthogonal to the goal, since there are little-endian
powerpc machines.

It appears that XLC defines _BIG_ENDIAN on big-endian machines.  I
wonder if we should do

 #elif defined(_BYTE_ORDER) && defined(_BIG_ENDIAN) && defined(_LITTLE_ENDIAN)
  ... as today ...
 #elif !defined(_BYTE_ORDER) && defined(_BIG_ENDIAN) && !defined(_LITTLE_ENDIAN)
 # define SHA1DC_BIGENDIAN
 #elif !defined(_BYTE_ORDER) && !defined(_BIG_ENDIAN) && defined(_LITTLE_ENDIAN)
  /* little endian. */
 #else
  ...

It also seems to me that Git should enable the #error in the
fallthrough case.  The test suite would catch this kind of problem but
it does not seem that everyone runs the test suite, so a compiler
error is more robust.  Is there a #define we can set to enable that?

>>  /*
>>   * Should define Big Endian for a whitelist of known processors. See
>>   * https://sourceforge.net/p/predef/wiki/Endianness/ and
>
> This patch looks sane to me, but we don't manage this software but
> instead try to pull it as-is from upstream at
> https://github.com/cr-marcstevens/sha1collisiondetection
>
> While we could apply this, it would be much better if you could submit a
> pull request with this to upstream, and then we can update our copy as I
> did in e.g. 9936c1b52a ("sha1dc: update from upstream", 2017-07-01).

Thanks,
Jonathan


Re: Implementing reftable in Git

2018-05-09 Thread Jonathan Nieder
Carlos Martín Nieto wrote:
> On Wed, 2018-05-09 at 09:48 -0700, Jonathan Nieder wrote:

>> If you would like the patches at https://git.eclipse.org/r/q/topic:reftable
>> relicensed for Git's use so that you don't need to include that
>> license header, let me know.  Separate from any legal concerns, if
>> you're doing a straight port, a one-line comment crediting the JGit
>> project would still be appreciated, of course.
[...]
> Would you expect that this port would keep the Eclipse Distribution
> License or would it get relicensed to GPLv2?

I think you're way overcomplicating things.

The patches are copyright Google.  We can handle issues as they come.

Jonathan


Re: Implementing reftable in Git

2018-05-09 Thread Jonathan Nieder
Stefan Beller wrote:

> * We *might* be able to use reftables in negotiation later
>   ("client: Last I fetched, you said your latest transaction
>   number was '5' with the hash over all refs to be ;
>   server: ok, here are the refs and the pack, you're welcome").

Do you mean that reftable's reflog layout makes this easier?

It's not clear to me why this wouldn't work with the current
reflogs.

[...]
> On Wed, May 9, 2018 at 7:33 AM, Christian Couder
>  wrote:

>> During the last Git Merge conference last March Stefan talked about
>> reftable. In Alex Vandiver's notes [1] it is asked that people
>> announce it on the list when they start working on it,
>
> Mostly because many parties want to see it implemented
> and were not sure when they could start implementing it.

And to coordinate / help each other!

[...]
> I volunteer for reviewing.

\o/

[...]
> With that said, please implement it in a way that it can not just be used as
> a refs backend, but can easily be re-used to write ref advertisements
> onto the wire?

Can you spell this out a little more for me?  At first glance it's not
obvious to me how knowing about this potential use would affect the
initial code.

Thanks,
Jonathan


Re: Implementing reftable in Git

2018-05-09 Thread Jonathan Nieder
Hi,

Christian Couder wrote:

> I might start working on implementing reftable in Git soon.

Yay!

[...]
> So I think the most straightforward and compatible way to do it would
> be to port the JGit implementation.

I suspect following the spec[1] would be even more compatible, since it
would force us to tighten the spec where it is unclear.

> It looks like the
> JGit repo and the reftable code there are licensed under the Eclipse
> Distribution License v1.0 [7], which is very similar to the 3-Clause
> BSD License, also called the Modified BSD License.

If you would like the patches at https://git.eclipse.org/r/q/topic:reftable
relicensed for Git's use so that you don't need to include that
license header, let me know.  Separate from any legal concerns, if
you're doing a straight port, a one-line comment crediting the JGit
project would still be appreciated, of course.

That said, I would not be surprised if going straight from the spec is
easier than porting the code.

Thanks,
Jonathan

[1] 
https://eclipse.googlesource.com/jgit/jgit/+/master/Documentation/technical/reftable.md


[PATCH 2/2 v2] Makefile: quote $INSTLIBDIR when passing it to sed

2018-04-23 Thread Jonathan Nieder
f6a0ad4b (Makefile: generate Perl header from template file,
2018-04-10) moved some code for generating the 'use lib' lines at the
top of perl scripts from the $(SCRIPT_PERL_GEN) rule to a separate
GIT-PERL-HEADER rule.

This rule first populates INSTLIBDIR and then substitutes it into the
GIT-PERL-HEADER using sed:

INSTLIBDIR=... something ...
sed -e 's=@@INSTLIBDIR@@='$$INSTLIBDIR'=g' $< > $@

Because $INSTLIBDIR is not surrounded by double quotes, the shell
splits it at each space, causing errors if INSTLIBDIR contains a
space:

 sed: 1: "s=@@INSTLIBDIR@@=/usr/l ...": unescaped newline inside substitute 
pattern

Add back the missing double-quotes to make it work again.

Improved-by: Junio C Hamano 
Signed-off-by: Jonathan Nieder 
---
Hi,

Junio C Hamano wrote:
> Jonathan Nieder  writes:

>> +++ b/Makefile
>> @@ -2108,7 +2108,7 @@ GIT-PERL-HEADER: $(PERL_HEADER_TEMPLATE) 
>> GIT-PERL-DEFINES Makefile
>>  INSTLIBDIR_EXTRA='$(PERLLIB_EXTRA_SQ)' && \
>>  INSTLIBDIR="$$INSTLIBDIR$${INSTLIBDIR_EXTRA:+:$$INSTLIBDIR_EXTRA}" && \
>>  sed -e 's=@@PATHSEP@@=$(pathsep)=g' \
>> --e 's=@@INSTLIBDIR@@='$$INSTLIBDIR'=g' \
>> +-e 's=@@INSTLIBDIR@@='"$$INSTLIBDIR"'=g' \
>
> Good find.  FWIW, I'd find it a lot easier to read if the whole
> thing were enclosed inside a single pair of dq.

Thanks. I agree, so here's an updated version doing that.
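To see the failure mode outside the Makefile, here is a minimal
reproduction of the quoting difference (template contents and paths
are made up):

```shell
printf 'use lib "@@INSTLIBDIR@@";\n' >template
INSTLIBDIR='/usr/local/lib with space/perl5'

# Unquoted: the shell splits $INSTLIBDIR at the space, handing sed a
# truncated script plus stray arguments, so the substitution errors out.
sed -e 's=@@INSTLIBDIR@@='$INSTLIBDIR'=g' template 2>/dev/null ||
echo "unquoted form failed"

# Quoted: sed receives one well-formed script and the substitution works.
sed -e "s=@@INSTLIBDIR@@=$INSTLIBDIR=g" template
```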

 Makefile | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Makefile b/Makefile
index 2327ccb906..5e25441861 100644
--- a/Makefile
+++ b/Makefile
@@ -2116,7 +2116,7 @@ GIT-PERL-HEADER: $(PERL_HEADER_TEMPLATE) GIT-PERL-DEFINES 
Makefile
INSTLIBDIR_EXTRA='$(PERLLIB_EXTRA_SQ)' && \
INSTLIBDIR="$$INSTLIBDIR$${INSTLIBDIR_EXTRA:+:$$INSTLIBDIR_EXTRA}" && \
sed -e 's=@@PATHSEP@@=$(pathsep)=g' \
-   -e 's=@@INSTLIBDIR@@='$$INSTLIBDIR'=g' \
+   -e "s=@@INSTLIBDIR@@=$$INSTLIBDIR=g" \
-e 's=@@PERLLIBDIR@@='$(perllibdir_SQ)'=g' \
-e 's=@@PERLLIBDIR_REL@@=$(perllibdir_relative_SQ)=g' \
-e 's=@@GITEXECDIR_REL@@=$(gitexecdir_relative_SQ)=g' \
-- 
2.17.0.441.gb46fe60e1d



[PATCH 2/2] Makefile: quote $INSTLIBDIR when passing it to sed

2018-04-23 Thread Jonathan Nieder
f6a0ad4b (Makefile: generate Perl header from template file,
2018-04-10) moved code for generating the 'use lib' lines at the top
of perl scripts from the $(SCRIPT_PERL_GEN) rule to a separate
GIT-PERL-HEADER rule.

This rule first populates INSTLIBDIR and then substitutes it into the
GIT-PERL-HEADER using sed:

INSTLIBDIR=... something ...
sed -e 's=@@INSTLIBDIR@@='$$INSTLIBDIR'=g' $< > $@

Because $INSTLIBDIR is not surrounded by double quotes, the shell
splits it at each space, causing errors if INSTLIBDIR contains an $IFS
character:

 sed: 1: "s=@@INSTLIBDIR@@=/usr/l ...": unescaped newline inside substitute 
pattern

Add back the missing double-quotes to make it work again.

Signed-off-by: Jonathan Nieder 
---
Thanks for reading.

 Makefile | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Makefile b/Makefile
index 8f4cb506ff..727eca5d0a 100644
--- a/Makefile
+++ b/Makefile
@@ -2108,7 +2108,7 @@ GIT-PERL-HEADER: $(PERL_HEADER_TEMPLATE) GIT-PERL-DEFINES 
Makefile
INSTLIBDIR_EXTRA='$(PERLLIB_EXTRA_SQ)' && \
INSTLIBDIR="$$INSTLIBDIR$${INSTLIBDIR_EXTRA:+:$$INSTLIBDIR_EXTRA}" && \
sed -e 's=@@PATHSEP@@=$(pathsep)=g' \
-   -e 's=@@INSTLIBDIR@@='$$INSTLIBDIR'=g' \
+   -e 's=@@INSTLIBDIR@@='"$$INSTLIBDIR"'=g' \
-e 's=@@PERLLIBDIR_REL@@=$(perllibdir_relative_SQ)=g' \
-e 's=@@GITEXECDIR_REL@@=$(gitexecdir_relative_SQ)=g' \
-e 's=@@LOCALEDIR_REL@@=$(localedir_relative_SQ)=g' \
-- 
2.17.0.441.gb46fe60e1d



[PATCH 1/2] Makefile: remove unused @@PERLLIBDIR@@ substitution variable

2018-04-23 Thread Jonathan Nieder
Junio noticed that this variable is not quoted correctly when it is
passed to sed.  As a shell-quoted string, it should be inside
single-quotes like $(perllibdir_relative_SQ), not outside them like
$INSTLIBDIR.

In fact, this substitution variable is not used.  Simplify by removing
it.

Reported-by: Junio C Hamano 
Signed-off-by: Jonathan Nieder 
---
An unrelated cleanup noticed while looking over this code.

 Makefile | 1 -
 1 file changed, 1 deletion(-)

diff --git a/Makefile b/Makefile
index 154929f1c8..8f4cb506ff 100644
--- a/Makefile
+++ b/Makefile
@@ -2109,7 +2109,6 @@ GIT-PERL-HEADER: $(PERL_HEADER_TEMPLATE) GIT-PERL-DEFINES 
Makefile
INSTLIBDIR="$$INSTLIBDIR$${INSTLIBDIR_EXTRA:+:$$INSTLIBDIR_EXTRA}" && \
sed -e 's=@@PATHSEP@@=$(pathsep)=g' \
-e 's=@@INSTLIBDIR@@='$$INSTLIBDIR'=g' \
-   -e 's=@@PERLLIBDIR@@='$(perllibdir_SQ)'=g' \
-e 's=@@PERLLIBDIR_REL@@=$(perllibdir_relative_SQ)=g' \
-e 's=@@GITEXECDIR_REL@@=$(gitexecdir_relative_SQ)=g' \
-e 's=@@LOCALEDIR_REL@@=$(localedir_relative_SQ)=g' \
-- 
2.17.0.441.gb46fe60e1d



[PATCH dj/runtime-prefix 0/2] Handle $IFS in $INSTLIBDIR

2018-04-23 Thread Jonathan Nieder
Hi,

Johannes Sixt wrote:
> Am 05.12.2017 um 22:35 schrieb Junio C Hamano:
> > Dan Jacques  writes:

>>> Thanks for checking! The patch that you quoted above looks like it's from
>>> this "v4" thread; however, the patch that you are diffing against in your
>>> latest reply seems like it is from an earlier version.
>>>
>>> I believe that the $(pathsep) changes in your proposed patch are already
>>> present in v4,...
>>
>> You're of course right.  The patches I had in my tree are outdated.
>>
>> Will replace, even though I won't be merging them to 'pu' while we
>> wait for Ævar's perl build procedure update to stabilize.
>
> The updated series works for me now. Nevertheless, I suggest to squash
> in the following change to protect against IFS and globbing characters in
> $INSTLIBDIR.
>
> diff --git a/Makefile b/Makefile
> index 7ac4458f11..08c78a1a63 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -2072,7 +2072,7 @@ GIT-PERL-HEADER: $(PERL_HEADER_TEMPLATE) 
> GIT-PERL-DEFINES perl/perl.mak Makefile
>   INSTLIBDIR_EXTRA='$(PERLLIB_EXTRA_SQ)' && \
>   INSTLIBDIR="$$INSTLIBDIR$${INSTLIBDIR_EXTRA:+:$$INSTLIBDIR_EXTRA}" && \
>   sed -e 's=@@PATHSEP@@=$(pathsep)=g' \
> - -e 's=@@INSTLIBDIR@@='$$INSTLIBDIR'=g' \
> +     -e 's=@@INSTLIBDIR@@='"$$INSTLIBDIR"'=g' \
>   -e 's=@@GITEXECDIR@@=$(gitexecdir_relative_SQ)=g' \
>   -e 's=@@PERLLIBDIR@@=$(perllibdir_relative_SQ)=g' \

I just ran into this.  Here's a pair of patches to fix it.

Thanks,
Jonathan Nieder (2):
  Makefile: remove unused substitution variable @@PERLLIBDIR@@
  Makefile: quote $INSTLIBDIR when passing it to sed

 Makefile | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

-- 
2.17.0.441.gb46fe60e1d



Re: [RFC 00/10] Make .the gitmodules file path configurable

2018-04-23 Thread Jonathan Nieder
Hi,

Antonio Ospite wrote:

> vcsh[1] uses bare git repositories and detached work-trees to manage
> *distinct* sets of configuration files directly into $HOME.

Cool!  I like the tooling you're creating for this, though keep in mind
that Git has some weaknesses as a tool for deployment.

In particular, keep in mind that when git updates a file, there is a
period of time while it is missing from the filesystem, which can be
problematic for dotfiles.

[...]
> However when multiple repositories take turns using the same directory
> as their work-tree, and more than one of them want to use submodules,
> there could still be conflicts about the '.gitmodules' file because git
> hardcodes this path.
>
> For comparison, in case of '.gitignore' a similar conflict might arise,
> but git has alternative ways to specify exclude files, so vcsh solves
> this by setting core.excludesFile for each repository and track ignored
> files somewhere else (in ~/.gitignore.d/$VCSH_REPO_NAME).

For reference:

core.excludesFile
Specifies the pathname to the file that contains
patterns to describe paths that are not meant to be
tracked, in addition to .gitignore (per-directory) and
.git/info/exclude. Defaults to
$XDG_CONFIG_HOME/git/ignore. If $XDG_CONFIG_HOME is
either not set or empty, $HOME/.config/git/ignore is
used instead. See gitignore(5).

Using this as a substitute for /.gitignore is a bit of a
hack.  It happens to work, though, so reading on. :)
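
(For concreteness, the per-repository arrangement described above
amounts to something like the following, with an illustrative path:)

```shell
git config core.excludesFile "$HOME/.gitignore.d/myrepo"
git config core.excludesFile     # confirm what was set
```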

[...]
> So this series proposes a mechanism to set an alternative path for the
> submodules configuration file (from now on "gitmodules file").

I am nervous about this.  I wonder if there is another way to
accomplish the goal.

One possibility would be to handle the case where .gitmodules is
excluded by a sparse checkout specification and use .gitmodules from
the index in that case.  Would that work for you?
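
Plumbing for reading a blob from the index rather than the worktree
already exists, which is part of what makes this plausible; for
example:

```shell
# The ":<path>" syntax names the staged copy, even when the file is
# absent from the worktree (e.g. under a sparse checkout):
git show :.gitmodules
git cat-file blob :.gitmodules     # equivalent plumbing form
```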

Thanks,
Jonathan


Re: [RFC/PATCH] upload-pack: disable object filtering when disabled by config

2018-03-29 Thread Jonathan Nieder
Jeff King wrote:
> On Wed, Mar 28, 2018 at 01:33:03PM -0700, Jonathan Nieder wrote:

>> When upload-pack gained partial clone support (v2.17.0-rc0~132^2~12,
>> 2017-12-08), it was guarded by the uploadpack.allowFilter config item
>> to allow server operators to control when they start supporting it.
>>
>> That config item didn't go far enough, though: it controls whether the
>> 'filter' capability is advertised, but if a (custom) client ignores
>> the capability advertisement and passes a filter specification anyway,
>> the server would handle that despite allowFilter being false.
[...]
> Great catch. If I understand correctly, this is an issue in the 2.17.0
> release candidates. Is this worth delaying the release?

Yes, IMHO this is a release blocker.  (To put it another way, if this is
going to delay the release, then we need to revert that patch.)

Thanks,
Jonathan


[RFC/PATCH] upload-pack: disable object filtering when disabled by config

2018-03-28 Thread Jonathan Nieder
When upload-pack gained partial clone support (v2.17.0-rc0~132^2~12,
2017-12-08), it was guarded by the uploadpack.allowFilter config item
to allow server operators to control when they start supporting it.

That config item didn't go far enough, though: it controls whether the
'filter' capability is advertised, but if a (custom) client ignores
the capability advertisement and passes a filter specification anyway,
the server would handle that despite allowFilter being false.

This is particularly significant if a security bug is discovered in
this new experimental partial clone code.  Installations without
uploadpack.allowFilter ought not to be affected since they don't
intend to support partial clone, but they would be swept up into being
vulnerable.

Simplify and limit the attack surface by making uploadpack.allowFilter
disable the feature, not just the advertisement of it.

NEEDSWORK: tests

Signed-off-by: Jonathan Nieder 
---
Noticed while reviewing the corresponding JGit code.

If this change seems like a good idea, I can add tests and re-send it
for real.

Thanks,
Jonathan

 Documentation/config.txt | 2 +-
 upload-pack.c| 8 
 2 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index ce9102cea8..4e0cff87f6 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -3364,7 +3364,7 @@ uploadpack.packObjectsHook::
stdout.
 
 uploadpack.allowFilter::
-   If this option is set, `upload-pack` will advertise partial
+   If this option is set, `upload-pack` will support partial
clone and partial fetch object filtering.
 +
 Note that this configuration variable is ignored if it is seen in the
diff --git a/upload-pack.c b/upload-pack.c
index f51b6cfca9..4a82602be5 100644
--- a/upload-pack.c
+++ b/upload-pack.c
@@ -69,7 +69,7 @@ static int stateless_rpc;
 static const char *pack_objects_hook;
 
 static int filter_capability_requested;
-static int filter_advertise;
+static int allow_filter;
 static struct list_objects_filter_options filter_options;
 
 static void reset_timeout(void)
@@ -846,7 +846,7 @@ static void receive_needs(void)
no_progress = 1;
if (parse_feature_request(features, "include-tag"))
use_include_tag = 1;
-   if (parse_feature_request(features, "filter"))
+   if (allow_filter && parse_feature_request(features, "filter"))
filter_capability_requested = 1;
 
o = parse_object(&oid_buf);
@@ -976,7 +976,7 @@ static int send_ref(const char *refname, const struct 
object_id *oid,
 " allow-reachable-sha1-in-want" : "",
 stateless_rpc ? " no-done" : "",
 symref_info.buf,
-filter_advertise ? " filter" : "",
+allow_filter ? " filter" : "",
 git_user_agent_sanitized());
strbuf_release(&symref_info);
} else {
@@ -1056,7 +1056,7 @@ static int upload_pack_config(const char *var, const char 
*value, void *unused)
if (!strcmp("uploadpack.packobjectshook", var))
return git_config_string(&pack_objects_hook, var, 
value);
} else if (!strcmp("uploadpack.allowfilter", var)) {
-   filter_advertise = git_config_bool(var, value);
+   allow_filter = git_config_bool(var, value);
}
return parse_hide_refs_config(var, value, "uploadpack");
 }
-- 
2.17.0.rc1.321.gba9d0f2565



Re: [PATCH] branch: implement shortcut to delete last branch

2018-03-27 Thread Jonathan Nieder
Hi,

Aaron Greenberg wrote:

> This patch gives git-branch the ability to delete the previous
> checked-out branch using the "-" shortcut. This shortcut already exists
> for git-checkout, git-merge, and git-revert. A common workflow is
>
> 1. Do some work on a local topic-branch and push it to a remote.
> 2. 'remote/topic-branch' gets merged in to 'remote/master'.
> 3. Switch back to local master and fetch 'remote/master'.
> 4. Delete previously checked-out local topic-branch.

Thanks for a clear example.
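
In today's Git, step 4 already works via the @{-1} notation; the patch
adds "-" as a shorthand for it (branch names illustrative, push/merge
steps omitted):

```shell
git checkout -b topic        # 1. work on a topic branch
git checkout master          # 3. back to master
git branch -d @{-1}          # 4. delete the branch we just left
                             #    (with this patch: git branch -d -)
```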

[...]
>  builtin/branch.c  | 3 +++
>  t/t3200-branch.sh | 8 
>  2 files changed, 11 insertions(+)
[...]
> With the approvals listed in [*1*] and in accordance with the
> guidelines set out in Documentation/SubmittingPatches, I am submitting
> this patch to be applied upstream.
>
> After work on this patch is done, I'll look into picking up where the
> prior work done in [*2*] left off.
>
> Is there anything else that needs to be done before this can be
> accepted?
>
> [Reference]
>
> *1* 
> https://public-inbox.org/git/1521844835-23956-2-git-send-emai...@aaronjgreenberg.com/
> *2* 
> https://public-inbox.org/git/1488007487-12965-1-git-send-email-kannan.siddhart...@gmail.com/

For the future, please don't use a separate cover letter message in a
single-patch series like this one.  Instead, please put any discussion
that you don't want to go in the commit message after the three-dash
divider in the same message as the patch, like the diffstat.  See the
section "Sending your patches" in Documentation/SubmittingPatches for
more details:

| You often want to add additional explanation about the patch,
| other than the commit message itself.  Place such "cover letter"
| material between the three-dash line and the diffstat.  For
| patches requiring multiple iterations of review and discussion,
| an explanation of changes between each iteration can be kept in
| Git-notes and inserted automatically following the three-dash
| line via `git format-patch --notes`.

That makes it easier for reviewers to see all the information in one
place and in particular can help them in fleshing out the commit
message if it is missing details.

[...]
> diff --git a/builtin/branch.c b/builtin/branch.c
> index 6d0cea9..9e37078 100644
> --- a/builtin/branch.c
> +++ b/builtin/branch.c
> @@ -221,6 +221,9 @@ static int delete_branches(int argc, const char **argv, 
> int force, int kinds,
>   char *target = NULL;
>   int flags = 0;
>  
> + if (!strcmp(argv[i], "-"))
> + argv[i] = "@{-1}";
> +
>   strbuf_branchname(&bname, argv[i], allowed_interpret);

This makes me wonder: should the "-" shortcut be handled in
strbuf_branchname itself?  That would presumably simplify callers like
this one.

[...]
> --- a/t/t3200-branch.sh
> +++ b/t/t3200-branch.sh
> @@ -776,6 +776,14 @@ test_expect_success 'deleting currently checked out 
> branch fails' '
>   test_must_fail git branch -d my7
>  '
>  
> +test_expect_success 'test deleting last branch' '
> + git checkout -b my7.1 &&

This naming scheme feels likely to conflict with other patches.
How about something like

git checkout -B previous &&
git checkout -B new-branch &&
git show-ref --verify refs/heads/previous &&
git branch -d - &&
test_must_fail git show-ref --verify refs/heads/previous

?

> + git checkout  - &&
> + test_path_is_file .git/refs/heads/my7.1 &&
> + git branch -d - &&
> + test_path_is_missing .git/refs/heads/my7.1

not specific to this test, but this is relying on low-level details
and means that an implementation that e.g. deleted a loose ref but
kept a packed ref would pass the test despite being broken.

Some of the other tests appear to use show-ref, so that might work
well.

No need to act on this, since what you have here is at least
consistent with some of the other tests in the file.  In other words,
it might be even better to address this throughout the file in a
separate patch.

> +'
> +

A few questions that the tests leave unanswered for me:

 1. Does "git branch -d -" refuse to delete an un-merged branch
like "git branch -d topic" would?  (That seems like a valuable
thing to test for typo protection reasons.)

 2. What happens if there is no previous branch, as in e.g. a new
clone?

 3. What does the error message look like when it cannot delete the
previous branch for whatever reason?  Does it identify the branch
that can't be deleted?
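Questions 1 and 2 can be probed today with the existing "@{-1}" spelling,
which the proposed "-" shortcut would alias.  A throwaway-repository sketch
(this assumes a reasonably recent git on PATH; the repository and branch
names are invented for illustration):

```shell
# Scratch repo; nothing here touches any existing repository.
tmp=$(mktemp -d)
cd "$tmp"
git init -q repo
cd repo
git config user.email you@example.com
git config user.name "A U Thor"
git commit -q --allow-empty -m init
first=$(git symbolic-ref --short HEAD)

# Question 2: in a fresh repo there is no previous branch, so "@{-1}"
# cannot resolve and the deletion is refused.
if git branch -d "@{-1}" >/dev/null 2>&1
then
    no_prev=deleted
else
    no_prev=refused
fi
echo "no previous branch: $no_prev"

# Question 1: "-d" still refuses to delete an unmerged previous branch.
git checkout -q -b topic
git commit -q --allow-empty -m work
git checkout -q "$first"
if git branch -d "@{-1}" >/dev/null 2>&1
then
    unmerged=deleted
else
    unmerged=refused
fi
echo "unmerged previous branch: $unmerged"

# Forcing with "-D" does delete it, and the message names the real branch.
git branch -D "@{-1}"
```

Whether "-" behaves identically in every case is exactly what the questions
above ask the patch to pin down with tests.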

>  test_expect_success 'test --track without .fetch entries' '
>   git branch --track my8 &&
>   test "$(git config branch.my8.remote)" &&

Thanks and hope that helps,
Jonathan


Re: [PATCH v3] git{,-blame}.el: remove old bitrotting Emacs code

2018-03-27 Thread Jonathan Nieder
Hi,

Ævar Arnfjörð Bjarmason wrote[1]:

> The git-blame.el mode has been superseded by Emacs's own
> vc-annotate (invoked by C-x v g). Users of the git.el mode are now
> much better off using either Magit or the Git backend for Emacs's own
> VC mode.
>
> These modes were added over 10 years ago when Emacs's own Git support
> was much less mature, and there weren't other mature modes in the wild
> or shipped with Emacs itself.
>
> These days these modes have few if any users, and users of git aren't
> well served by us shipping these (some OS's install them alongside git
> by default, which is confusing and leads users astray).
>
> So let's remove these per Alexandre Julliard's message to the
> ML[1]. If someone still wants these for some reason they're better
> served by hosting these elsewhere (e.g. on ELPA), instead of us
> distributing them with git.

The trouble with removing these so abruptly is that it makes for a bad
user experience.

  Warning (initialization): An error occurred while loading ‘/home/jrn/.emacs’:

  File error: Cannot open load file, No such file or directory, git

In some sense that is the distributor's fault: just because Git
upstream stops shipping the git.el file doesn't mean that the
distributor has to stop.  But the same thing would happen if the user
symlinked git.el into a place that emacs could find when using
upstream Git directly.  And we are putting the distributor in a bad
place.

Ami Fischman (cc-ed) writes:

| IMO a placeholder git.el that did something like:
|
|   (error "git.el is no more; replace (require 'git) with (require 'magit) or
|   simply delete the former in your initialization file(s)")
|
| ideally with a pointer to a short URL explaining the rationale would have
| been fine.
| (note that though I've seen
| https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=893734 I'm _still_ unclear
| as to why the change was made; you might want to clarify in that bug and
| point to it from this, or something else)

What do you think?  Would adding such a placeholder during a
transitional period work well for you?

Thanks,
Jonathan

[1] https://public-inbox.org/git/20180310184545.16950-1-ava...@gmail.com/


Re: get commit ID from a tree object ID

2018-03-26 Thread Jonathan Nieder
Hi,

Stefan Beller wrote:
> On Sat, Mar 17, 2018 at 10:57 AM Junio C Hamano  wrote:
>> Jeff King  writes:

>>> If you want to dig further, you can use the diff machinery to show which
>>> commit introduced a particular tree, like:
>>>
>>>   git rev-list --all |
>>>   git diff-tree --stdin --pretty=raw --raw -t -r |
>>>   less +/$desired_tree
>>>
>>> That "less" will find the mentioned tree, and then you'll have to
>>> manually read the commit. It would be possible to do it mechanically
>>> with a short perl script, but I'll leave that as an exercise for the
>>> reader.
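For the curious, here is one shot at that exercise in shell/awk rather than
perl, wrapped in a scratch repository so it can be run as-is.  Note that
`--root` is added to the quoted pipeline so that trees introduced by a root
commit are found as well; the `sub` directory and the two commits are just
illustration:

```shell
tmp=$(mktemp -d)
cd "$tmp"
git init -q
git config user.email you@example.com
git config user.name "A U Thor"

# Two commits; only the first introduces the tree "sub".
mkdir sub
echo hello >sub/file
git add sub/file
git commit -q -m "add sub"
c1=$(git rev-parse HEAD)
echo world >other
git add other
git commit -q -m "add other"

desired_tree=$(git rev-parse "$c1:sub")

# Walk all commits; print the first whose diff introduces the tree.
found=$(git rev-list --all |
    git diff-tree --stdin --root --pretty=raw --raw -t -r |
    awk -v tree="$desired_tree" '
        /^commit / { commit = $2 }
        # raw diff lines start with ":"; field 4 is the post-image id
        substr($0, 1, 1) == ":" && index($4, tree) == 1 {
            print commit
            exit
        }
    ')
echo "tree $desired_tree introduced in commit $found"
```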
>
>> Before Stefan jumps in ;-) I wonder if a recently materialized
>> "find-object" option to the diff family can be used here as a
>> sugar-coated way.
>
> I am late to jump in, but testing the 'git log --find-object'
> seems to have issues with trees named by sha1 here,
> but the named tree via <commit>:<path> still seems to work.

Experimenting a little more, I wondered if "-t" wasn't being passed
by default:

  $ git log --oneline -t --find-object=$(git rev-parse 
HEAD~30:Documentation/technical)
  $

No, that's not it.  Could it have to do with merges?

  $ git log --oneline -m --first-parent --find-object=$(git rev-parse 
HEAD~30:Documentation/technical)
  df6cfe8c30 Merge branch 'debian-experimental' into google3
  f86d5fd2e4 Merge branch 'debian-experimental' into google3

Yes.

That doesn't explain why <commit>:<path> worked for you, though. :)

Thanks,
Jonathan


Including object type and size in object id (Re: Git Merge contributor summit notes)

2018-03-26 Thread Jonathan Nieder
(administrivia: please omit parts of the text you are replying to that
 are not relevant to the reply.  This makes it easier to see what you're
 replying to, especially in mail readers that don't hide quoted text by
 default)
Hi Jeff,

Jeff Hostetler wrote:
[long quote snipped]

> While we are converting to a new hash function, it would be nice
> if we could add a couple of fields to the end of the OID:  the object
> type and the raw uncompressed object size.
>
> If would be nice if we could extend the OID to include 6 bytes of data
> (4 or 8 bits for the type and the rest for the raw object size), and
> just say that an OID is a {hash,type,size} tuple.
>
> There are lots of places where we open an object to see what type it is
> or how big it is.  This requires uncompressing/undeltafying the object
> (or at least decoding enough to get the header).  In the case of missing
> objects (partial clone or a gvfs-like projection) it requires either
> dynamically fetching the object or asking an object-size-server for the
> data.
>
> All of these cases could be eliminated if the type/size were available
> in the OID.

This implies a limit on the object size (e.g. 5 bytes in your
example).  What happens when someone wants to encode an object larger
than that limit?
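For concreteness, the arithmetic behind that limit, assuming the six
trailing bytes of the proposal (the exact split between type bits and size
bits is the open parameter Jeff mentions):

```shell
# 6 bytes = 48 bits shared between the type field and the size field.
for type_bits in 4 8
do
    size_bits=$((48 - type_bits))
    max=$(( (1 << size_bits) - 1 ))
    echo "$type_bits type bits leave $size_bits size bits: objects up to $max bytes"
done
```

With 8 type bits that caps objects at about 1 TiB, which is the kind of
limit the question above is probing.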

This also decreases the number of bits available for the hash, but
that shouldn't be a big issue.

Aside from those two, I don't see any downsides.  It would mean that
tree objects contain information about the sizes of blobs contained
there, which helps with virtual file systems.  It's also possible to
do that without putting the size in the object id, but maybe having it
in the object id is simpler.

Will think more about this.

Thanks for the idea,
Jonathan


Per-object encryption (Re: Git Merge contributor summit notes)

2018-03-26 Thread Jonathan Nieder
Hi Ævar,

Ævar Arnfjörð Bjarmason wrote:

> It occurred to me recently that once we have such a layer it could be
> (ab)used with some relatively minor changes to do any arbitrary
> local-to-remote object content translation, unless I've missed something
> (but I just re-read hash-function-transition.txt now...).
>
> E.g. having a SHA-1 (or NewHash) local repo, but interfacing with a
> remote server so that you upload a GPG encrypted version of all your
> blobs, and have your trees reference those blobs.

Interesting!

To be clear, this would only work with deterministic encryption.
Normal GPG encryption would not have the round-tripping properties
required by the design.

If I understand correctly, it also requires both sides of the
connection to have access to the encryption key.  Otherwise they
cannot perform ordinary operations like revision walks.  So I'm not
seeing a huge advantage over ordinary transport-layer encryption.

That said, it's an interesting idea --- thanks for that.  I'm changing
the subject line since otherwise there's no way I'll find this again. :)

Jonathan


Re: [PATCH v3] json_writer: new routines to create data in JSON format

2018-03-23 Thread Jonathan Nieder
g...@jeffhostetler.com wrote:

> From: Jeff Hostetler 
>
> Add basic routines to generate data in JSON format.
>
> Signed-off-by: Jeff Hostetler 

If I understand the cover letter correctly, this is a JSON-like
format, not actual JSON.  Am I understanding correctly?

What are the requirements for consuming this output?  Will a typical
JSON library be able to handle it without trouble?  If not, is there
some subset of cases where a typical JSON library is able to handle it
without trouble?
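One way to answer the "typical JSON library" question empirically.  This
sketch uses python3 as a stand-in strict consumer (an assumption about the
environment), and the sample path value is invented:

```shell
tmp=$(mktemp -d)
# A string value holding a raw latin-1 byte (0xE9), as a Git pathname
# might; this is syntactically JSON-shaped but not valid UTF-8.
printf '{"path":"caf\351"}' >"$tmp/sample.json"
if python3 -c 'import json, sys; json.load(open(sys.argv[1], encoding="utf-8"))' \
        "$tmp/sample.json" 2>/dev/null
then
    result=accepted
else
    result=rejected
fi
echo "strict UTF-8 JSON parser: $result"
```

A consumer that insists on UTF-8 input rejects such output outright, which
is why the "JSON-like" caveat matters.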

Can you say a little about the motivation here?  (I know you already
put some of that in the cover letter, but since that doesn't become
part of permanent history, it doesn't help the audience that matters.)

This would also be easier to review if there were an application of it
in the same series.  It's fine to send an RFC like this without such
an application, but I think we should hold off on merging it until we
have one.  Having an application makes review much easier since we can
see how the API works in practice, how well the approach fits what
users need, etc.

Thanks and hope that helps,
Jonathan


Re: [PATCH v3] routines to generate JSON data

2018-03-23 Thread Jonathan Nieder
Hi,

g...@jeffhostetler.com wrote:

> From: Jeff Hostetler 
>
> This is version 3 of my JSON data format routines.
>
> This version addresses the variable name changes in [v2] and adds additional
> test cases.  I also changed the BUG() calls to die() to help with testing.

Can the information below go in the commit message?

Usually to avoid that kind of problem, I don't send a cover letter in
one-patch series.  Information that I don't want to make part of
permanent history (like the above) can go after the three-dash line to
ensure it doesn't go in the commit message.

Thanks,
Jonathan

> The json-writer routines can be used generate structured data in a JSON-like
> format.  I say "JSON-like" because we don't enforce the Unicode/UTF-8
> requirement [3,4] on string values.  This was discussed on the mailing list
> in the [v1] and [v2] threads, but to summarize here: Git doesn't know if
> various fields, such as Unix pathnames and author names, are Unicode or just
> 8-bit character data, so Git would not know how to properly encode such
> fields and the consumer of such output would not know these strings were
> encoded (once or twice).  So, until we have a pressing need to generate
> proper Unicode data, we avoid it for now.
>
> The initial use for the json-writer routines is for generating telemetry data
> for executed Git commands.  Later, we might want to use them in other commands
> such as status.
>
> [v1] 
> https://public-inbox.org/git/20180316194057.77513-1-...@jeffhostetler.com/
> [v2] 
> https://public-inbox.org/git/20180321192827.44330-1-...@jeffhostetler.com/
> [3]  http://www.ietf.org/rfc/rfc7159.txt
> [4]  http://json.org/


Re: What's cooking in git.git (Mar 2018, #03; Wed, 14)

2018-03-16 Thread Jonathan Nieder
Hi,

Junio C Hamano wrote:
> Jonathan Tan  writes:
>> On Wed, 14 Mar 2018 18:34:49 -0700
>> Junio C Hamano  wrote:

>>> * sb/object-store (2018-03-05) 27 commits
>>
>> [snip list of commits]
>>
>>>  (this branch is used by sb/packfiles-in-repository; uses 
>>> nd/remove-ignore-env-field.)
>>>
>>>  Refactoring the internal global data structure to make it possible
>>>  to open multiple repositories, work with and then close them.
>>>
>>>  Rerolled by Duy on top of a separate preliminary clean-up topic.
>>>  The resulting structure of the topics looked very sensible.
>>>
>>>  Waiting for a follow-up discussion.
>>
>> Would it be possible for this set to go in independently of
>> nd/remove-ignore-env-field? I understand that some patches might be
>> cleaner if ignore_env is first removed, but this patch set has already
>> undergone several rounds of review and (I think) is an improvement to
>> the codebase on its own.
>
> I thought the "remove-ignore-env-field" thing is a quite small and
> more-or-less straightforward improvements that would serve as a good
> preparatory change to give a solid foundation to the object-store
> topic.
>
> I was hoping to hear quick Acks for remove-ignore-env (and also
> Duy's reroll of their topics on it) from people involved in all the
> related topics, so that we can advance them more-or-less at the same
> time.

I can go along with this for this series, but I want to explain why I
am not thrilled with the process in general.

The series "Moving global state into the repository object (part 1)"
had gone through three revisions with relatively minor changes between
each and we were in the middle of reviewing the fourth.  In response
to a series that comes after it, "(part 2)", Duy created
nd/remove-ignore-env, a series that I quite like.

But with this reroll, the result is:

 * The patches that have already been reviewed 3 times (in their
   current incarnation; parts of this series have also appeared on
   list earlier too) and we were in the middle of reviewing a fourth
   time are now on a new base.

 * These three series, each of a manageable size, have been combined
   into a larger series of 44 patches.

 * I have fears about when they're ever going to land and be part of
   an API I can start making use of.

So even though I agree with the goal and I like the improved
initialization code, I am not happy.

I would be happier with just the new initialization code being its own
series that we can fast-track, then deal with part 1 separately on
top, and then deal with part 2 after that.

Thanks,
Jonathan


Re: getting fatal error trying to open git gui

2018-03-16 Thread Jonathan Nieder
Briggs, John wrote:
> Jonathan Nieder wrote:
>> Briggs, John wrote:

>>> I just installed git for windows 10 and am getting "git-gui: fatal
>>> error" "Cannot parse Git version string.
>>>
>>> When I execute "git version" in the command prompt I get Git version
>>> 2.16.2.windows.1
>>>
>>> Everything else seems to be working. How can I get the gui to work?
>>
>> That's strange indeed.  Why is Git capitalized when you run "git version"?
>
> Got it figured out. Git gui must be ran as administrator.

Hm, that leaves me even more mystified.

Before v1.7.4-rc0~155^2~4 (git-gui: generic version trimming,
2010-10-07), git-gui was not able to handle "git version" output like
"git version 2.16.2.windows.1", but since then, it should have been
able to cope fine with it.

I wonder: do you have multiple versions of git gui installed?

Thanks,
Jonathan


Re: getting fatal error trying to open git gui

2018-03-16 Thread Jonathan Nieder
Hi,

Briggs, John wrote:

> I just installed git for windows 10 and am getting "git-gui: fatal error" 
> "Cannot parse Git version string.
>
> When I execute "git version" in the command prompt I get
> Git version 2.16.2.windows.1
>
> Everything else seems to be working. How can I get the gui to work?

That's strange indeed.  Why is Git capitalized when you run
"git version"?

Thanks,
Jonathan


Re: [PATCH] http: fix an unused variable warning for 'curl_no_proxy'

2018-03-14 Thread Jonathan Nieder
Hi,

Ramsay Jones wrote:

> Signed-off-by: Ramsay Jones 
> ---
>
> Hi Junio,
>
> I happened to be building git on an _old_ laptop earlier this evening
> and gcc complained, thus:
>
>   CC http.o
>   http.c:77:20: warning: ‘curl_no_proxy’ defined but not used 
> [-Wunused-variable]
>static const char *curl_no_proxy;
>   ^
> The version of libcurl installed was 0x070f04. So, while it was fresh in my
> mind, I applied and tested this patch.

Mind including this in the commit message?  Especially the error message
can be very useful.

With or without such a commit message tweak,
Reviewed-by: Jonathan Nieder 

This variable has been unused in the old-curl case since it was
introduced in v2.8.0-rc2~2^2 (http: honor no_http env variable to
bypass proxy, 2016-02-29).  Thanks for fixing it.

Sincerely,
Jonathan

> ATB,
> Ramsay Jones
> 
>  http.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/http.c b/http.c
> index 8c11156ae..a5bd5d62c 100644
> --- a/http.c
> +++ b/http.c
> @@ -69,6 +69,9 @@ static const char *ssl_key;
>  #if LIBCURL_VERSION_NUM >= 0x070908
>  static const char *ssl_capath;
>  #endif
> +#if LIBCURL_VERSION_NUM >= 0x071304
> +static const char *curl_no_proxy;
> +#endif
>  #if LIBCURL_VERSION_NUM >= 0x072c00
>  static const char *ssl_pinnedkey;
>  #endif
> @@ -77,7 +80,6 @@ static long curl_low_speed_limit = -1;
>  static long curl_low_speed_time = -1;
>  static int curl_ftp_no_epsv;
>  static const char *curl_http_proxy;
> -static const char *curl_no_proxy;
>  static const char *http_proxy_authmethod;
>  static struct {
>   const char *name;


Re: [PATCH] Documentation/githooks: Clarify the behavior of post-checkout hook

2018-03-13 Thread Jonathan Nieder
Hi,

Magne Land wrote:

> From: Magne Land 
>
> This can happen when using 'git rebase -i’:
> could not detach HEAD
>
> Based on discovering this Stack Overflow discussion:
> https://stackoverflow.com/questions/25561485/git-rebase-i-with-squash-cannot-detach-head
> ---
>  Documentation/githooks.txt | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)

Thanks for investigating and writing this.

May we forge your sign-off?  See Documentation/SubmittingPatches
section [[sign-off]] 'Certify your work' for more about what this
means.

The above leaves one question unanswered: is this the *right* behavior
for "git checkout" to have?  I.e. is it useful for "git checkout" to
fail when the post-checkout hook fails, or would it be better for it
to e.g. simply print a message and exit with status 0?

Not a rhetorical question: I'm asking because I don't know the answer.
What do you think?

Thanks,
Jonathan

> --- a/Documentation/githooks.txt
> +++ b/Documentation/githooks.txt
> @@ -166,7 +166,9 @@ worktree.  The hook is given three parameters: the ref of 
> the previous HEAD,
>  the ref of the new HEAD (which may or may not have changed), and a flag
>  indicating whether the checkout was a branch checkout (changing branches,
>  flag=1) or a file checkout (retrieving a file from the index, flag=0).
> -This hook cannot affect the outcome of 'git checkout'.
> +
> +If this hook exits with a non-zero status, 'git checkout' will exit with the
> +same status.
>  
>  It is also run after 'git clone', unless the --no-checkout (-n) option is
>  used. The first parameter given to the hook is the null-ref, the second the


Re: allow "~" to be present in a tag name

2018-03-13 Thread Jonathan Nieder
Hi Michal,

Michal Novotny wrote:

> currently, if I try to create a tag that has tilde "~"  in name, an
> error is raised. E.g.
>
> $ git tag rpkg-util-1.4~rc1
> fatal: 'rpkg-util-1.4~rc1' is not a valid tag name.

As Ævar mentioned, this is disallowed to prevent a collision with
Git's revision specifying syntax.

While I'm sympathetic to wanting the tag name to match the version
number used by the package manager, the line has to be drawn
somewhere.  "git help check-ref-format" describes the current
namespace:

Git imposes the following rules on how references are named:

1. They can include slash / for hierarchical (directory)
   grouping, but no slash-separated component can begin with a
   dot . or end with the sequence .lock.

2. They must contain at least one /. This enforces the
   presence of a category like heads/, tags/ etc. but the
   actual names are not restricted. If the --allow-onelevel
   option is used, this rule is waived.

3. They cannot have two consecutive dots .. anywhere.

4. They cannot have ASCII control characters (i.e. bytes whose
   values are lower than \040, or \177 DEL), space, tilde ~,
   caret ^, or colon : anywhere.

5. They cannot have question-mark ?, asterisk *, or open
   bracket [ anywhere. See the --refspec-pattern option below
   for an exception to this rule.

6. They cannot begin or end with a slash / or contain multiple
   consecutive slashes (see the --normalize option below for
   an exception to this rule)

7. They cannot end with a dot ..

8. They cannot contain a sequence @{.

9. They cannot be the single character @.

10. They cannot contain a \.
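These rules can be checked mechanically with `git check-ref-format`; a
quick illustration using made-up tag names:

```shell
ok=
bad=
for name in v1.0 good/name bad~tilde bad^caret bad..dots 'bad name'
do
    if git check-ref-format "refs/tags/$name" >/dev/null 2>&1
    then
        ok="$ok $name"     # passes the ref-name rules
    else
        bad="$bad $name"   # rejected (tilde, caret, "..", space, ...)
    fi
done
echo "valid: $ok"
echo "invalid:$bad"
```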

If anything, I suspect the current namespace is too wide.  For
example, it has issues with Unicode normalization in filenames on some
platforms, and it allows some potentially problematic characters like
` and |.

So my first instinct is to recommend that you apply some mapping
between your package manager's version syntax and Git's tag syntax
--- e.g. using -rc1 as Ævar suggested, or using urlencoding %7Erc1 as
you hinted.
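A sketch of such a mapping in shell; the %7E percent-encoding and the
`rpkg-util-` prefix are just the example from this thread, not an
established convention:

```shell
version=1.4~rc1        # the package manager's version string

# Encode: replace each "~" with its percent-encoding before tagging.
tag="rpkg-util-$(printf '%s' "$version" | sed 's/~/%7E/g')"
echo "tag name: $tag"

# The encoded name passes Git's ref-name rules ("%" is allowed).
git check-ref-format "refs/tags/$tag" && echo "valid ref name"

# Decode: recover the original version from the tag.
decoded=$(printf '%s' "$tag" | sed 's/^rpkg-util-//; s/%7E/~/g')
echo "decoded version: $decoded"
```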

That isn't to say that this would be impossible to loosen.  But my
worry is that it's hard to decide where to draw the line: there are a
number of sets of names that might want to be valid tags, and it is
hard to say which are worth the complexity of expanding the set of
valid ref names.  That's why my first reaction is to look around for
another way to accomplish your goal.

Thanks,
Jonathan


Re: Opinions on changing add/add conflict resolution?

2018-03-13 Thread Jonathan Nieder
Hi,

Elijah Newren wrote:

> However, my question here about what to write to the working tree for
> a rename/rename(2to1) conflict in one particular corner case still
> remains.  Should a two-way merge be performed even if it may result in
> nested sets of conflict markers, or is that a sufficiently bad outcome
> for the user that it's the one case we do want to write colliding
> files out to different temporary paths?

Nested conflict markers only happen in the conflictstyle=diff3 case, I
would think.

merge-recursive writes them already.  I've often wished that it would
use a union merge strategy when building the common ancestor to avoid
the nested conflicts that rerere doesn't understand.  But anyway,
that's an orthogonal issue: in the rename/rename context, it should be
fine to write nested conflict markers since that's consistent with
what merge-recursive already does.

Thanks,
Jonathan


Re: [PATCH 3/3] shortlog: do not accept revisions when run outside repo

2018-03-13 Thread Jonathan Nieder
Martin Ågren wrote:
> On 13 March 2018 at 20:56, Jonathan Nieder  wrote:
>> Martin Ågren wrote:

>>>   (So yes, after
>>> this patch, we will still silently ignore stdin for confused usage such
>>> as `git log v2.15.0.. | git shortlog v2.16.0..`. But at least that does
>>> not crash.)
>>
>> I don't follow here.  Are you saying this command should notice that
>> there is input in stdin?  How would it notice?
>
> I have no idea how it would notice (portably!) and the gain seems
> minimal. I added this to keep the reader from wondering "but wait a
> minute, doesn't that mean that we fail to catch this bad usage if we're
> in a repo?". So my answer would be "yep, but it's not a huge problem".
> Of course, my attempt to pre-emptively answer a question only provoked
> another one. :-) I could phrase this better.

Ah, I think I see what I was missing.  Let me look again at the whole
paragraph in context:

[...]
>>> Disallow left-over arguments when run from outside a repo. Another
>>> approach would be to disallow them when reading from stdin. However, our
>>> logic is currently the other way round: we check the number of revisions
>>> in order to decide whether we should read from stdin. (So yes, after
>>> this patch, we will still silently ignore stdin for confused usage such
>>> as `git log v2.15.0.. | git shortlog v2.16.0..`. But at least that does
>>> not crash.)

How about something like this?

Disallow left-over arguments when run from outside a repo.  This
way, such invalid usage is diagnosed correctly:

$ git shortlog v2.16.0..
error: [...]
[...]

while still permitting the piped form:

$ git -C ~/src/git log --pretty=short | git shortlog
A Large Angry SCM (15):
[...]

This cannot catch an incorrect usage that combines the piped and
<revs> forms:

$ git log --pretty=short | git shortlog v2.16.0..
Alban Gruin (1)
[...]

but (1) the operating system does not provide a straightforward
way to detect that and (2) at least it doesn't crash.

That detail is mostly irrelevant to what the patch does, though.  I
wouldn't expect git to be able to diagnose that third variant anyway.
If we want to make the command less error-prone, I think a good path
forward would be to introduce an explicit --stdin option.  So I'd be
tempted to go with the short and sweet:

Disallow left-over arguments when run from outside a repo.

[...]
>>> + error(_("no revisions can be given when running "
>>> + "from outside a repository"));
>>> + usage_with_options(shortlog_usage, options);
>>> + }
>>> +
>>
>> The error message is
>>
>> error: no revisions can be given when running from outside a 
>> repository
>> usage: ...
>>
>> Do we need to dump usage here?  I wonder if a simple die() call would
>> be easier for the user to act on.
>
> I can see an argument for "dumping the usage makes the error harder than
> necessary to find". I mainly went for consistency. This ties into your
> other observations below: what little consistency do we have and in
> which direction do we want to push it...
[...]
> I think it would be a larger project to make these consistent. The one
> I'm adding here is at least consistent with the other one in this file.

Ah, thanks.  That makes sense.

>> Separate from that, I wonder if the error message can be made a little
>> shorter and clearer.  E.g.
>>
>> fatal: shortlog <revs> can only be used inside a git repository
>
> Some grepping suggests we do not usually name the command ("shortlog
> ..."), perhaps to play well with aliasing, nor do we use "such "
> very often, but it does happen. Quoting and allowing for options might
> make this more correct, but perhaps less readable: "'shortlog [...]
> <revs>' can only ...". Slightly better than what I had, "revisions can
> only be given inside a git repository" would avoid some negating.

Sounds good to me.

Thanks,
Jonathan


Re: [PATCH 3/3] shortlog: do not accept revisions when run outside repo

2018-03-13 Thread Jonathan Nieder
Hi,

Martin Ågren wrote:

> If we are outside a repo and have any arguments left after
> option-parsing, `setup_revisions()` will try to do its job and
> something like this will happen:
>
>  $ git shortlog v2.16.0..
>  BUG: environment.c:183: git environment hasn't been setup
>  Aborted (core dumped)

Yikes.  Thanks for fixing it.

[...]
>   (So yes, after
> this patch, we will still silently ignore stdin for confused usage such
> as `git log v2.15.0.. | git shortlog v2.16.0..`. But at least that does
> not crash.)

I don't follow here.  Are you saying this command should notice that
there is input in stdin?  How would it notice?

[...]
> --- a/t/t4201-shortlog.sh
> +++ b/t/t4201-shortlog.sh
> @@ -127,6 +127,11 @@ test_expect_success !MINGW 'shortlog can read 
> --format=raw output' '
>   test_cmp expect out
>  '
>  
> +test_expect_success 'shortlog from non-git directory refuses revisions' '
> + test_must_fail env GIT_DIR=non-existing git shortlog HEAD 2>out &&
> + test_i18ngrep "no revisions can be given" out
> +'

\o/

[...]
> --- a/builtin/shortlog.c
> +++ b/builtin/shortlog.c
> @@ -293,6 +293,12 @@ int cmd_shortlog(int argc, const char **argv, const char 
> *prefix)
>  parse_done:
>   argc = parse_options_end(&ctx);
>  
> + if (nongit && argc != 1) {

Just curious: would argc ever be 0 here?  'argc <= 1' might be clearer.

> + error(_("no revisions can be given when running "
> + "from outside a repository"));
> + usage_with_options(shortlog_usage, options);
> + }
> +

The error message is

error: no revisions can be given when running from outside a repository
usage: ...

Do we need to dump usage here?  I wonder if a simple die() call would
be easier for the user to act on.

Not about this patch: I was a little surprised to see 'error:' instead
of 'usage:' or 'fatal:'.  It turns out git is pretty inconsistent
about that: e.g. there is

error(_("no remote specified"));
usage_with_options(builtin_remote_setbranches_usage, options);

Some other callers just use usage_with_options without describing the
error.  check-attr has a private error_with_usage() helper to implement
the error() followed by usage_with_options() idiom.  Most callers just
use die(), like

die(_("'%s' cannot be used with %s"), "--merge", "--patch");

Documentation/technical/api-error-handling.txt says

 - `usage` is for errors in command line usage.  After printing its
   message, it exits with status 129.  (See also `usage_with_options`
   in the link:api-parse-options.html[parse-options API].)

which is not prescriptive enough to help.

Separate from that, I wonder if the error message can be made a little
shorter and clearer.  E.g.

fatal: shortlog <revs> can only be used inside a git repository

Thanks and hope that helps,
Jonathan


Re: [PATCH v3 04/35] upload-pack: convert to a builtin

2018-03-12 Thread Jonathan Nieder
Jeff King wrote:

> We could even give it an environment variable, which would allow
> something like:
>
>   tar xf maybe-evil.git.tar
>   cd maybe-evil
>   export GIT_TRUST_REPO=false
>   git log

Interesting idea.  Putting it in an envvar means it gets inherited by
child processes, which if I understand you correctly is a good thing.

[...]
>   1. We have to manually annotate any "dangerous" code to act more
>  safely when it sees the flag. Which means it's highly likely to miss
>  a spot, or to add a new feature which doesn't respect it. And
>  suddenly that's a security hole. So I'm concerned it may create a
>  false sense of security and actually make things worse.

As an internal implementation detail, this is so obviously fragile
that it wouldn't give me any feeling of security. ;-)  So it should be
strictly an improvement.

As a public-facing feature, I suspect it's a bad idea for exactly that
reason.

FWIW for pager specifically I am going for a whitelisting approach:
new commands would have to explicitly set ALLOW_PAGER if they want to
respect pager config.  That doesn't guarantee people think about it
again as things evolve but it should at least help with getting the
right setting for new plumbing.

Thanks,
Jonathan


Re: Opinions on changing add/add conflict resolution?

2018-03-12 Thread Jonathan Nieder
Hi,

Hilco Wijbenga wrote:
> On Mon, Mar 12, 2018 at 2:35 PM, Jonathan Nieder  wrote:

>> Interesting.  I would be tempted to resolve this inconsistency the
>> other way: by doing a half-hearted two-way merge (e.g. by picking one
>> of the two versions of the colliding file) and marking the path as
>> conflicted in the index.  That way it's more similar to edit/edit,
>> too.
>
> If work is going to be done in this area, would it be possible to
> include making auto-merging (in general) optional? Preferably,
> configurable by file (or glob) but I'd already be happy with a global
> setting to opt out.

Have you experimented with the 'merge' attribute (see "git help
attributes")?  E.g. you can put

 * -merge

in .gitattributes or .git/info/attributes.
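To illustrate what that attribute does (file and branch names here are invented for the sketch): with the merge driver disabled for a path, a conflicting merge stops, records all three stages in the index, and leaves the current branch's version in the worktree instead of a two-way merged file:

```shell
# Sketch: the -merge attribute suppresses text-level auto-merging.
cd "$(mktemp -d)"
git init -q
git config user.name "A U Thor"
git config user.email author@example.com

# Disable the text merge driver for this path.
echo 'greeting.txt -merge' >.gitattributes
echo base >greeting.txt
git add .gitattributes greeting.txt
git commit -qm base

git checkout -qb side
echo side >greeting.txt
git commit -qam side

git checkout -q '@{-1}'            # back to the initial branch
echo ours >greeting.txt
git commit -qam ours

git merge side || true             # stops with a conflict...
git ls-files -u greeting.txt       # ...three stages recorded in the index
cat greeting.txt                   # worktree keeps "ours" unmodified
```

The conflict still has to be resolved by hand; the attribute only prevents git from writing a combined version to the worktree.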

If that helps, then a patch adding a pointer to the most helpful place
(maybe git-merge.txt?) would be very welcome.

Thanks and hope that helps,
Jonathan


Re: [PATCH v3 04/35] upload-pack: convert to a builtin

2018-03-12 Thread Jonathan Nieder
Hi,

Jeff King wrote:
> On Thu, Feb 22, 2018 at 01:26:34PM -0800, Jonathan Nieder wrote:

>> Keep in mind that git upload-archive (a read-only command, just like
>> git upload-pack) also already has the same issues.
>
> Yuck. I don't think we've ever made a historical promise about that. But
> then, I don't think the promise about upload-pack has ever really been
> documented, except in mailing list discussions.

Sorry to revive this old side-thread.  Good news: for a dashed command
like git-upload-archive, the pager selection code only runs for
commands with RUN_SETUP or RUN_SETUP_GENTLY:

if (use_pager == -1 && p->option & (RUN_SETUP | RUN_SETUP_GENTLY) &&
!(p->option & DELAY_PAGER_CONFIG))
use_pager = check_pager_config(p->cmd);

None of upload-pack, receive-pack, git-serve, or upload-archive set
those flags, so we (narrowly) escape trouble here.

Later today I should be able to send a cleanup to make the behavior
more obvious.

Thanks again,
Jonathan


Re: Opinions on changing add/add conflict resolution?

2018-03-12 Thread Jonathan Nieder
Hi again,

Elijah Newren wrote:
> On Mon, Mar 12, 2018 at 11:47 AM, Jonathan Nieder  wrote:

>> Would this behavior be configurable or unconditional?  I suspect I
>> would want it turned off in my own use.
>>
>> On the other hand, in the case of wild difference between the two
>> files, skipping the two-way merge and just writing one of the versions
>> to the worktree (like we do for binary files) sounds like something I
>> would like in my own use.
>
> I think you just said the exact opposite thing in these last two
> paragraphs; that you wouldn't want my proposed behavior and that you'd
> want it.  I suspect that may mean that I misunderstood something you
> said here.  Could you clarify?

Sorry for the lack of clarity.  My understanding was that the proposed
behavior was to write two files:

${path}~HEAD
${path}~MERGE

My proposal is instead to write one file:

${path}

with the content that would have gone to ${path}~HEAD.  This is what
already happens when trying to merge binary files.

[...]
>> Can you add something more about the motivation to the commit message?
>> E.g. is this about performance, interaction with some tools, to
>> support some particular workflow, etc?
>
> To be honest, I'm a little unsure how without even more excessive and
> repetitive wording across commits.

Simplest way IMHO is to just put the rationale in patch 5/5. :)  In
other words, explain the rationale for the end-user facing change in the
same patch that changes the end-user facing behavior.

> Let me attempt here, and maybe you
> can suggest how to change my commit messages?
>
>   * When files are wildly dissimilar -- as you mentioned -- it'd be
> easier for users to resolve conflicts if we wrote files out to
> separate paths instead of two-way merging them.

Today what we do (in both the wildly-dissimilar case and the
less-dissimilar case) is write one proposed resolution to the worktree
and put the competing versions in the index.  Tools like "git
mergetool" are then able to pull the competing versions out of the
index to allow showing them at the same time.
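To make that concrete (paths and branch names invented for the sketch), the competing versions of a conflicted path can be read out of the index by stage number: stage 1 is the merge base, stage 2 is "ours", stage 3 is "theirs":

```shell
# Sketch: reading the three competing versions of a conflicted path
# out of the index, the way "git mergetool" does.
cd "$(mktemp -d)"
git init -q
git config user.name "A U Thor"
git config user.email author@example.com

echo base >file.txt
git add file.txt && git commit -qm base

git checkout -qb topic
echo topic >file.txt && git commit -qam topic

git checkout -q '@{-1}'
echo ours >file.txt && git commit -qam ours

git merge topic || true        # leaves file.txt conflicted

git show :1:file.txt           # merge base
git show :2:file.txt           # our side
git show :3:file.txt           # their side
```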

My bias is that I've used VCSes before that wrote multiple competing
files to the worktree and I have been happier with my experience
resolving conflicts in git.  E.g. at any step I can run a build to try
out the current proposed resolution, and there's less of a chance of
accidentally committing a ~HEAD file.

[...]
> There are three types of conflicts representing two (possibly
> unrelated) files colliding at the same path: add/add, rename/add, and
> rename/rename(2to1).  add/add does the two-way merge of the colliding
> files, and the other two conflict types write the two colliding files
> out to separate paths.

Interesting.  I would be tempted to resolve this inconsistency the
other way: by doing a half-hearted two-way merge (e.g. by picking one
of the two versions of the colliding file) and marking the path as
conflicted in the index.  That way it's more similar to edit/edit,
too.

Thanks,
Jonathan


Re: Opinions on changing add/add conflict resolution?

2018-03-12 Thread Jonathan Nieder
Hi,

Elijah Newren wrote:

> Hi everyone,
>
> I'd like to change add/add conflict resolution.  Currently when such a
> conflict occurs (say at ${path}), git unconditionally does a two-way
> merge of the two files and sticks the result in the working tree at
> ${path}.
>
> I would like to make it instead first check whether the two files are
> similar; if they are, then do the two-way merge, but if they're not,
> then instead write the two files out to separate paths (${path}~HEAD
> and ${path}~$MERGE, while making sure that ${path} is removed from the
> working copy).
>
> Thoughts?

My immediate reaction is that it seems inconsistent with the rest of
merge behavior.  Why would add/add behave this way but edit/edit not
behave this way?

Would this behavior be configurable or unconditional?  I suspect I
would want it turned off in my own use.

On the other hand, in the case of wild difference between the two
files, skipping the two-way merge and just writing one of the versions
to the worktree (like we do for binary files) sounds like something I
would like in my own use.

Can you add something more about the motivation to the commit message?
E.g. is this about performance, interaction with some tools, to
support some particular workflow, etc?

Thanks and hope that helps,
Jonathan

> I have a patch series[1] with more details and other changes, but
> wanted to especially get feedback on this issue even from folks that
> didn't have enough time to read the patches or even the cover letter.
>
> Thanks,
> Elijah
>
> [1] https://public-inbox.org/git/20180305171125.22331-1-new...@gmail.com/


Re: recent glob expansion breakage on Windows?

2018-03-08 Thread Jonathan Nieder
+git-for-windows
Hi,

Laszlo Ersek wrote:

> Jaben reports that git-send-email is suddenly failing to expand the
> "*.patch" glob for him, at the Windows CMD prompt:
>
> -
> E:\...>git send-email --suppress-cc=author --suppress-cc=self 
> --suppress-cc=cc --suppress-cc=sob --dry-run *.patch
>
> No patch files specified!
> -
>
> Whereas, moving the same patch files to another subdir, and then passing
> the subdir to git-send-email, works fine.
>
> I seem to have found some $^O based perl code in the git tree that
> expands the glob manually (in place of the shell) on Windows -- however,
> that code looks ancient ^W very stable, and doesn't seem to explain the
> sudden breakage.
>
> Is it possible that a separate perl update on Jaben's Windows box is
> causing this? Or does the breakage look consistent with a recent git change?
>
> Has anyone else reported something similar recently?

This reminds me of https://github.com/git-for-windows/git/issues/339.
There, Johannes Schindelin writes (about a different command):

| This is expected because neither PowerShell nor cmd.exe nor git.exe
| expand wildcards. Those examples you found were written with a shell
| in mind, and the shell expands wildcards (hence Git does not think
| it needs to).

That may or may not also apply to send-email.  In what version did it
work?

Thanks,
Jonathan

> Thanks (and sorry about the noise; this list might not be the best place
> to ask)!
> Laszlo


Re: [RFC PATCH] git-submodule.sh:cmd_update: if submodule branch exists, fetch that instead of default

2018-03-06 Thread Jonathan Nieder
(cc list snipped)
Hi,

Eddy Petrișor wrote:

> Cc: [a lot of people]

Can you say a little about how this cc list was created?  E.g. should
"git send-email" get a feature to warn about long cc lists?

> Signed-off-by: Eddy Petrișor 
> ---
>
> There are projects such as llvm/clang which use several repositories, and they
> might be forked for providing support for various features such as adding 
> Redox
> awareness to the toolchain. This typically means the superproject will use
> another branch than master, occasionally even use an old commit from that
> non-master branch.
>
> Combined with the fact that when incorporating such a hierarchy of repositories
> usually the user is interested in just the exact commit specified in the
> submodule info, it follows that a desirable usecase is to be also able to
> provide '--depth 1' to avoid waiting for ages for the clone operation to
> finish.

Some previous discussion is at
https://public-inbox.org/git/CAGZ79ka6UXKyVLmdLg_M5-sB1x96g8FRzRZy=eny5ajbqf9...@mail.gmail.com/.

In theory this should be straightforward: Git protocol allows fetching
an arbitrary commit, so "git submodule update" and similar commands
could fetch the submodule commit by SHA-1 instead of by refname.  Poof!
Problem gone.

In practice, some complications:

 - some servers do not permit fetch-by-sha1.  For example, github does
   not permit it.  This is governed by the
   uploadpack.allowReachableSHA1InWant / uploadpack.allowAnySHA1InWant
   configuration items.

   That should be surmountable by making the behavior conditional, but
   it's a complication.

 - When the user passes --depth=, do they mean that to apply to
   the superproject, to the submodules, or both?  Documentation should
   make the behavior clear.

   Fortunately I believe this complication has been taken care of using
   the --shallow-submodules option.
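For reference, the server-side switch looks roughly like this (repository names invented for the sketch; without the config line, upload-pack refuses a "want" for an object it did not advertise):

```shell
# Sketch: fetch-by-sha1, gated by uploadpack.allowAnySHA1InWant.
cd "$(mktemp -d)"
git init -q server
git -C server config user.name "A U Thor"
git -C server config user.email author@example.com
git -C server commit -q --allow-empty -m one
git -C server commit -q --allow-empty -m two

# An older (non-tip, therefore unadvertised) commit:
sha=$(git -C server rev-parse HEAD~1)

# Allow clients to ask for arbitrary object IDs.
git -C server config uploadpack.allowAnySHA1InWant true

git init -q client
git -C client fetch ../server "$sha"
git -C client cat-file -t "$sha"
```

A submodule update could use the same mechanism to fetch exactly the gitlinked commit, but only against servers configured to allow it.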

> Git submodule seems to be very stubborn and cloning master, although the
> wrapper script and the gitmodules-helper could work together to clone directly
> the branch specified in the .gitmodules file, if specified.

This could make sense.  For the same reason as --depth in the
superproject gives ambiguous signals about what should happen in
submodules, --single-branch in the superproject gives ambiguous
signals about what branch to fetch in submodules.

> Another wrinkle is that when the commit is not the tip of the branch, the 
> depth
> parameter should somehow be stored in the .gitmodules info, but any change in
> the submodule will break the supermodule submodule depth info sooner or later,
> which is definitely fragile.

Hm, this seems to go in another direction.  I don't think we should
store the depth parameter in the .gitmodules file, since different
users are likely to have different preferences about what to make
shallow.  If we make --depth easy enough to use at the superproject
level then the user can specify what they want there.

> I tried digging into this section of the code and debugging with bashdb to see
> where --depth might fit, but I got stuck on the shell-to-helper interaction 
> and
> the details of the submodule implementation, so I want to lay out this first
> patch as starting point for the discussion in the hope somebody else picks it
> up or can provide some inputs. I have the feeling there are multiple code 
> paths
> that are being ran, depending on the moment (initial clone, submodule
> recursive, post-clone update etc.) and I have a gut feeling there shouldn't be
> any code duplication just because the operation is different.
>
> This first patch is only trying to use a non-master branch, I have some 
> changes
> for the --depth part, but I stopped working on it due to the "default depth"
> issue above.
>
> Does any of this sound reasonable?
> Is this patch idea usable or did I managed to touch the part of the code that
> should not be touched?

I agree with the goal.  As mentioned above, I'm not confident about
the particular mechanism --- e.g. something using fetch-by-sha1 seems
likely to be more intuitive.

Today, the 'branch' setting in .gitmodules is only for "git submodule
update --remote".  This patch would be a significant expansion in
scope for it.  Hopefully others on the list can talk more about how
that fits into various workflows and whether it would work out well.

Thanks and hope that helps,
Jonathan

>  git-submodule.sh | 5 -
>  1 file changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/git-submodule.sh b/git-submodule.sh
> index 2491496..370f19e 100755
> --- a/git-submodule.sh
> +++ b/git-submodule.sh
> @@ -589,8 +589,11 @@ cmd_update()
>   branch=$(git submodule--helper remote-branch "$sm_path")
>   if test -z "$nofetch"
>   then
> + # non-default branch
> + rbranch=$(git config -f .gitmodules 
> submodule.$sm_path.branch)
> + 
> br_refspec=${rbanch:+"refs/heads/$rbranch

Re: man gittutorial: patch proposal

2018-03-05 Thread Jonathan Nieder
Hi,

kalle wrote:

> see attachment.

Thanks for writing.  A few questions:

 1. Can you say a little more about the rationale for this change?
E.g. is the current formatting sending a confusing message?  Is the
current formatting intending to use '' as quotes and using italics
instead?  If so, should this use "" to fix it?

 2. May we forge your sign-off? See the section "Certify your work"
in Documentation/SubmittingPatches for more about what this means.
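(In case it helps: the sign-off is just a trailer that "git commit -s" appends from your configured identity. Names below are invented for the sketch.)

```shell
# Sketch: what a forged/added sign-off looks like in practice.
cd "$(mktemp -d)"
git init -q
git config user.name "A U Thor"
git config user.email author@example.com

git commit -q --allow-empty -s -m 'example: demonstrate sign-off'
git log -1 --format=%B     # message body ends with a Signed-off-by trailer
```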

Thanks,
Jonathan


Re: Bug report: "Use of uninitialized value $_ in print"

2018-03-05 Thread Jonathan Nieder
Jeff King wrote:
> On Fri, Mar 02, 2018 at 09:15:44AM -0800, Junio C Hamano wrote:
>> Jeff King  writes:

>>> That's probably a reasonable sanity check, but I think we need to abort
>>> and not just have a too-small DISPLAY array. Because later code like the
>>> hunk-splitting is going to assume that there's a 1:1 line
>>> correspondence. We definitely don't want to end up in a situation where
>>> we show one thing but apply another.
>>
>> Yes, agreed completely.
>
> Let's add this sanity check while we're thinking about it. Here's a
> series.
>
>   [1/2]: t3701: add a test for interactive.diffFilter
>   [2/2]: add--interactive: detect bogus diffFilter output
>
>  git-add--interactive.perl  |  8 
>  t/t3701-add-interactive.sh | 20 ++++
>  2 files changed, 28 insertions(+)

With or without the tweak Ævar Arnfjörð Bjarmason suggested,
Reviewed-by: Jonathan Nieder 

Thanks.  It's probably also worth adding Sam's reported-by to patch 2/2:
Reported-by: Sam Kuper 


Re: [PATCH v3 12/35] serve: introduce git-serve

2018-03-05 Thread Jonathan Nieder
Hi,

Jeff King wrote:

> I agree that would be a lot more pleasant for adding protocol features.
> But I just worry that the stateful protocols get a lot less efficient.
> I'm having trouble coming up with an easy reproduction, but my
> recollection is that http has some nasty corner cases, because each
> round of "have" lines sent to the server has to summarize the previous
> conversation. So you can get a case where the client's requests keep
> getting bigger and bigger during the negotiation (and eventually getting
> large enough to cause problems).

That's not so much a corner case as just how negotiation works over
http.

We want to do better (e.g. see [1]) but that's a bigger change than
the initial protocol v2.

As Brandon explained it to me, we really do want to use stateless-rpc
semantics by default, since that's just better for maintainability.
Instead of having two protocols, one that is sane and one that
struggles to hoist that into stateless-rpc, there would be one
stateless baseline plus capabilities to make use of state.

For example, it would be nice to have a capability to remember
negotiation state between rounds, to get around exactly the problem
you're describing when using a stateful protocol.  Stateless backends
would just not advertise such a capability.  But doing that without [1]
still sort of feels like a cop-out.  If we can get a reasonable
baseline using ideas like [1] and then have a capability to keep
server-side state as icing on the cake instead of having a negotiation
process that only really makes sense when you have server-side state,
then that would be even better.

> If anything, I wish we could push the http protocol in a more stateful
> direction with something like websockets. But I suspect that's an
> unrealistic dream, just because not everybody's http setup (proxies,
> etc) will be able to handle that.

Agreed.  I think we have to continue to deal with stateless-rpc
semantics, at least for the near future.

Jonathan

[1] 
https://public-inbox.org/git/20180227054638.gb65...@aiede.svl.corp.google.com/


Re: Contributor Summit planning

2018-03-05 Thread Jonathan Nieder
Lars Schneider wrote:

> Thanks for starting this. I would like to propose the following topics:

Cool!  Do you mind starting threads for these so people who aren't there
can provide input into the discussion, too?  In other words, I'm
imagining

 1. Thread starts on mailing list

 2. Contributor summit: in-person presentation, discussion, etc lead to
people having better ideas

 3. On-list thread goes really well as a result of aforementioned
in-person discussion

Quick feedback:

> - hooks: Discuss a proposal for multiple local Git hooks of the same type.

I'd be happy to weigh in on a mailing list thread about this.  It's
also related to
https://public-inbox.org/git/20171002234517.gv19...@aiede.mtv.corp.google.com/
which is an interest of mine.

> - error reporting: Git is distributed and therefore lots of errors are only
>   reported locally. That makes it hard for administrators in larger
>   companies to see trouble. Would it make sense to add a config option that
>   would push recent errors along with "git push" to the server?

I'm interested in instrumentation but worried about the privacy
ramifications of this particular proposal.  I'd be happy to see some
built-in instrumentation hooks (or even a standard instrumentation
approach, if the mailing list comes up with a good one that respects
privacy).

> - fuzzing: Would it make sense to register Git to Google's OSS fuzzing
>   program https://github.com/google/oss-fuzz ?

Of course!

Alongside the obvious security benefit, there is money available to
support someone working on this:
https://opensource.googleblog.com/2017/05/oss-fuzz-five-months-later-and.html
https://www.google.com/about/appsecurity/patch-rewards/ clarifies that
the reward goes to the contributor, so you don't even necessarily have
to share your reward with the Git project. ;-)

Thanks,
Jonathan


Re: [PATCH v4 13/35] ls-refs: introduce ls-refs server command

2018-03-05 Thread Jonathan Nieder
Hi,

On Mon, Mar 05, 2018 at 10:21:55AM -0800, Brandon Williams wrote:
> On 03/02, Jeff King wrote:

>> It also accepts "refs/h*" to get "refs/heads" and "refs/hello".  I think
>> it's worth going for the most-restrictive thing to start with, since
>> that enables a lot more server operations without worrying about
>> breaking compatibility.
>
> And just to clarify what do you see as being the most-restrictive case
> of patterns that we should use?

Peff, can you say a little more about the downsides of accepting
refs/h*?

IIRC the "git push" command already accepts such refspecs, so there's a
benefit to accepting them.  Reftable and packed-refs support such
queries about as efficiently as refs/heads/*.  For loose refs, readdir
doesn't provide a way to restrict which files you look at, but loose
refs are always slow anyway. :)
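For comparison, "git ls-remote" already accepts such globs today; a quick sketch (branch and tag names invented):

```shell
# Sketch: glob patterns against ref names, as ls-remote does it.
cd "$(mktemp -d)"
git init -q
git config user.name "A U Thor"
git config user.email author@example.com
git commit -q --allow-empty -m init

git branch hello
git tag v1.0

# "refs/h*" matches refs/heads/* (and would match a hypothetical
# refs/hello too), but not refs/tags/v1.0:
git ls-remote . 'refs/h*'
```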

In other words, I see real benefits and I don't see much in the way of
costs, so I'm not seeing why not to support this.

Thanks,
Jonathan


Re: [RFC] Contributing to Git (on Windows)

2018-03-05 Thread Jonathan Nieder
Hi again,

Back on topic, some quick clarifications to tie up loose ends.

Also I want to thank you for continuing to push the project to work
better (especially to work better for newcomers).  We don't seem to
have a habit of saying often enough how important that goal is.  Of
course we'll disagree from time to time in minor ways about particular
aspects of how to change the development workflow, but the progress
you've already made (e.g. via tools like SubmitGit) is a huge deal.

[...]
Johannes Schindelin wrote:
> On Sat, 3 Mar 2018, Jonathan Nieder wrote:

>>  1. Approximately 1/2 of the differences you describe apply to Mac as
>> well as Windows.
>
> No. The executable bit, the native line endings, most of the issues I
> listed do not catch macOS-based developers off guard.

Yeah, my wording was really sloppy.

I meant that one of the differences you described half-applies to Mac
and the rest don't apply to Mac.  I should have proofread.

[...]
> Stolee agreed in the PR to mention alternatives to Hyper-V, such as
> VirtualBox, which would help macOS-based developers here.

I have no opinion about that (maybe it will make the text too long and
overwhelming, or maybe it will help people on balance).

[...]
> The Google gang (you & Junio included) uses Linux. Peff uses Linux. From
> what I can see Duy, Eric and Jake use Linux. That covers already the most
> active reviewers right there.

We are not as uniform as it may seem.  The Google gang all uses Linux
on workstations but some use Mac laptops as well.  We deal with user
reports all the time from all three popular platforms.

IIRC then Junio has a test setup that tests on Linux, FreeBSD, and
some other platforms.  IIRC Microsoft provides a way to run a Windows
VM for development purposes that he could use for testing in the same
way as he tests on FreeBSD if there are clear enough instructions for
doing it (hint, hint). :)

Of course it's even better if there is some public shared build/test
dashboard that we can all work together to at least keep green.

[...]
> So where is that formatter call that fixes your code?

There's "make style", and people still continue to work on improving
its configuration (thanks!).

[...]
> However, VSTS is free for teams up to five, meaning that any developer can
> register a project, and once I manage to get build definitions in shape, I
> can make them public and anybody can test their code on the major three
> platforms, in their personal VSTS account.

Thanks.  That sounds potentially useful.  (Though a shared dashboard
that we all keep green might be even more so.)

[...]
> So everything is a legal text.

Yeah.  In this context I should have instead said something like
"lawyer-reviewed text".

[...]
> Put another way: I indulged in veering off on a tangent. You know, like we
> do for fun here ;-)

Feel free to call me on it when my tangents are hurting, or when
they're helping for that matter.  That way I have training data to
improve my model of how to make a good tangent. :)

Sincerely,
Jonathan


Re: [RFC] Contributing to Git (on Windows)

2018-03-05 Thread Jonathan Nieder
Derrick Stolee wrote:
> Dereck Stolee wrote:
>
> nit: s/Dereck/Derrick/ Is my outgoing email name misspelled, or do you have
> a misspelled contact info for me?

A manual typo.  Sorry about that.

[... a bunch snipped ...]
> I have a habit of being too loose in language around lawyer-speak. I should
> not have attempted to summarize what "Signed-off-by:" means and will use
> that helpful link for the description instead.

No worries.  I make that kind of mistake all the time but just thought
it worth pointing out.

BTW, thanks again for writing and submitting this document.  It can't
land soon enough. :)

Jonathan


Re: [RFC] Contributing to Git (on Windows)

2018-03-05 Thread Jonathan Nieder
Hi Dscho,

Johannes Schindelin wrote:
> On Sat, 3 Mar 2018, Jonathan Nieder wrote:

>> Hopefully the clarifications and suggestions higher in this message
>> help.  If they don't, then I'm nervous about our ability to understand
>> each other.
>
> Okay, let me state what I think the goal of this document should be:
>
>   To help Windows-based developers who want to contribute their first
>   patch to the Git project.
>
> Whatever makes it easier to contributors and is easily explained, should
> go in, in my opinion.
>
> Wishful thinking, and philosophical considerations, should probably stay
> out.

I think this conversation has gone way off the rails.

I certainly share some blame for that: in
https://public-inbox.org/git/20180303182723.ga76...@aiede.svl.corp.google.com/
I let my emotions get the better of me and let my bafflement show
instead of focusing on my gratitude for the context and clarifications
you were providing.  And I am grateful for those.

What went wrong is that I somehow turned it into a debate.  That's not
the point of a patch review.  After all, we have the same goals for
this document!  So I am happy to continue to work with Derrick Stolee
(and anyone else interested) on whatever improvements he would like to
pursue, but I am going to bow out of the arguing with you part, if
that's okay.

Jonathan


Re: [RFC] Contributing to Git (on Windows)

2018-03-03 Thread Jonathan Nieder
Hi Dscho,

Johannes Schindelin wrote:
>> Jonathan Nieder  writes:
>>> Dereck Stolee wrote:

>>>> +Test Your Changes on Linux
>>>> +--
>>>> +
>>>> +It can be important to work directly on the [core Git
>>>> codebase](https://github.com/git/git), +such as a recent commit into
>>>> the `master` or `next` branch that has not been incorporated +into
>>>> Git for Windows. Also, it can help to run functional and performance
>>>> tests on your +code in Linux before submitting patches to the
>>>> Linux-focused mailing list.
>>>
>>> I'm surprised at this advice.  Does it actually come up?
>
> Yes.
>
> I personally set up the automated builds on Windows, Linux and macOS for
> our team, and as a rule we always open an internal Pull Request on our
> topic branches as we develop them, and you would probably not believe the
> number of issues we caught before sending the patches to this list. Issues
> including
[nice list snipped]

Thanks for explaining.  I still am going to push back on the wording
here, and here is why:

 1. Approximately 1/2 of the differences you describe apply to Mac as
well as Windows.  The advice certainly does not apply on Mac.

You might object: Mac readers would not be reading this text!  But
that is not how humans work: if project documentation (e.g. the
CONTRIBUTING.md on GitHub!) says that the project is Linux-focused
and if you don't test on Linux then you might as well not bother,
then people are going to believe it.

 2. It is not unusual for Linux users to make portability mistakes that
are quickly pointed out on list.  If anything, the advice to test on
other platforms should apply to contributors on Linux even more.
This happens especially often to new contributors, who sometimes use
bashisms, etc that get quickly pointed out.

 3. I do not *want* Git to be a Linux-focused project; I want the code
to perform well on all popular platforms and for people not to be
afraid to make it so.

If the docs say that all we care about is Linux, then people are
likely to be too scared to do the necessary and valuable work of
making it work well on Mac, Windows, etc.  The actual mix of
contributors doesn't bear it out anyway: a large number of
contributors are already on Mac or Windows.

Fortunately this is pretty straightforward to fix in the doc: it could
say something like "to the multi-platform focused mailing list", for
example.

[...]
> To my chagrin, this idea of making most of the boring stuff (and I include
> formatting in that category, but I am probably special in that respect) as
> automatable, and as little of an issue for review as possible, leaving
> most brain cycles to work on the correctness of the patches instead, did
> not really receive a lot of traction on this mailing list.

Huh?  I'm confident that this is a pretty widely supported idea within
the Git project.

I get the feeling you must have misread something or misinterpreted a
different response.

[...]
> No, this advice comes straight from my personal observation that the
> reviewers on the Git mailing list are Linux-centric.

Hopefully the clarifications and suggestions higher in this message
help.  If they don't, then I'm nervous about our ability to understand
each other.

[...]
> Now, how reasonable do I think it is to ask those contributors to purchase
> an Apple machine to test their patches on macOS (you cannot just download
> an .iso nor would it be legal to run a macOS VM on anything but Apple
> hardware)? You probably guessed my answer: not at all.

Agreed, this is something that needs to be automated (and not via a
CONTRIBUTING.md file).  As a stopgap, having a section in the
contribution instructions about testing using Windows's Linux
subsystem is a valuable thing, and thanks for that; I never meant to
imply otherwise.

[...]
> On Fri, 2 Mar 2018, Junio C Hamano wrote:

>> In fact, portability issues in a patch originally written for a platform
>> is rather quickly discovered if the original platform is more minor than
>> the others, so while it is good advice to test your ware before you make
>> it public, singling out the portability issues may not add much value.
>> The fewer number of obvious bugs remaining, the fewer number of
>> iterations it would take for a series to get in a good shape.
[...]
> For you, Junio, however, the task *now* is to put yourself into the shoes
> of a Computer Science student in their 2nd year who wants to become an
> Open Source contributor and is a little afraid to talk directly to "the
> core committers", and quite scared what negative feedback they might
> receive.
>

Re: Bug report: "Use of uninitialized value $_ in print"

2018-03-01 Thread Jonathan Nieder
Hi,

Sam Kuper wrote:

> First, background. I encountered a bug on Debian Stretch, using this
> git version:
>
> $ git --version
> git version 2.11.0
>
> The bug is that in the midst of running
>
> git -c interactive.diffFilter="git diff --word-diff --color" add --patch
>
> and after having answered "y" to some patches and "n" to others, git
> suddenly spewed the following output:
>
> Use of uninitialized value $_ in print at
> /usr/lib/git-core/git-add--interactive line 1371,  line 74.
> Stage this hunk [y,n,q,a,d,/,K,j,J,g,e,?]? n
> Use of uninitialized value $_ in print at
> /usr/lib/git-core/git-add--interactive line 1371,  line 75.
[...]

Strange.  The relevant line, for reference:

$ dpkg-deb --fsys-tarfile git_2.11.0-3+deb9u2_amd64.deb |
  tar Oxf - ./usr/lib/git-core/git-add--interactive |
  sed -n '1370,1372 p'

for (@{$hunk[$ix]{DISPLAY}}) {
print; <-- this one
}

This is a foreach loop, so it's supposed to have set $_ to each value
in the list @{$hunk[$ix]{DISPLAY}).  So why is Perl considering it
uninitialized?

Is this reproducible for you?  Do you have more details about how I
can reproduce it?

What arch are you on?  What perl version do you use?  Can you report
this using "reportbug git"?

Thanks,
Jonathan


Re: [RFC] Contributing to Git (on Windows)

2018-03-01 Thread Jonathan Nieder
Hi,

Derrick Stolee wrote:

> Now, we'd like to make that document publicly available. These steps are
> focused on a Windows user, so we propose putting them in the
> git-for-windows/git repo under CONTRIBUTING.md. I have a pull request open
> for feedback [1]. I'll read comments on that PR or in this thread.

Thanks!  I wonder if we can put this in the standard Documentation/
directory as well.  E.g. the Windows CONTRIBUTING.md could say say
"See Documentation/Contributing-On-Windows.md" so that the bulk of the
text would not have to be git-for-windows specific.

[...]
> +++ b/CONTRIBUTING.md
> @@ -0,0 +1,401 @@
> +How to Contribute to Git for Windows
> +

Would it make sense for this to be checked in with LF instead of CRLF
line endings, so that clones use the user's chosen / platform native
line ending?  The .gitattributes file could include

/CONTRIBUTING.md text

so that line ending conversion takes place even if the user hasn't
enabled core.autocrlf.

[...]
> +The SDK uses a different credential manager, so you may still want to 
> use Visual Studio
> +or normal Git Bash for interacting with your remotes.  Alternatively, 
> use SSH rather
> +than HTTPS and avoid credential manager problems.

Where can I read more about the problems in question?

[...]
> +Many new contributors ask: What should I start working on?
> +
> +One way to win big with the maintainer of an open-source project is to look 
> at the
> +[issues page](https://github.com/git-for-windows/git/issues) and see if 
> there are any issues that
> +you can fix quickly, or if anything catches your eye.

You can also look at https://crbug.com/git for non-Windows-specific
issues.  And you can look at recent user questions
on the mailing list: https://public-inbox.org/git

[...]
> +If you are adding new functionality, you may need to create low-level tests 
> by creating
> +helper commands that test a very limited action. These commands are stored 
> in `t/helpers`.
> +When adding a helper, be sure to add a line to `t/Makefile` and to the 
> `.gitignore` for the
> +binary file you add. The Git community prefers functional tests using the 
> full `git`
> +executable, so be sure the test helper is required.

Maybe s/low-level/unit/?

[...]
> +Read [`t/README`](https://github.com/git/git/blob/master/t/README) for more 
> details.

Forgive my ignorance: does github flavored markdown allow relative
links?  (I.e., could this say [`t/README`](t/README)?)

[...]
> +You can also set certain environment variables to help test the performance 
> on different
> +repositories or with more repetitions. The full list is available in
> +[the `t/perf/README` 
> file](https://github.com/git/git/blob/master/t/perf/README),

Likewise.

[...]
> +Test Your Changes on Linux
> +--
> +
> +It can be important to work directly on the [core Git 
> codebase](https://github.com/git/git),
> +such as a recent commit into the `master` or `next` branch that has not been 
> incorporated
> +into Git for Windows. Also, it can help to run functional and performance 
> tests on your
> +code in Linux before submitting patches to the Linux-focused mailing list.

I'm surprised at this advice.  Does it actually come up?  When I was
on Mac I never worried about this, nor when I was on Windows.

I'm personally happy to review patches that haven't been tested on
Linux, though it's of course even nicer if the patch author mentions
what platforms they've tested on.

Maybe this can be reframed to refer specifically to cases where you've
modified some Linux platform-specific code (e.g. to add a new feature
to run-command.c)?

[...]
> +When preparing your patch, it is important to put yourself in the shoes of 
> the maintainer.

... and in the shoes of other users and developers working with Git down
the line who will interact with the patch later!

Actually, the maintainer is one of the least important audiences for a
commit message.  But maybe 'the maintainer' is a stand-in for 'the
project' here?

[...]
> +* Make sure the commits are signed off using `git commit (-s|--signoff)`. 
> This means that you
> +  testify to be allowed to contribute your changes, in particular that if 
> you did this on company
> +  time, said company approved to releasing your work as Open Source under 
> the GNU Public License v2.

Can this either point to or quote the relevant legal text from
Documentation/SubmittingPatches?  It feels dangerous (in the sense of,
potentially leading to misunderstandings and major future pain) to ask
people to sign off without them knowing exactly what that means.

The rest also looks nice.  Thanks for working on this.

Sincerely,
Jonathan


Re: [PATCH 2/2] parse-options: remove the unused parse_opt_commits() function

2018-03-01 Thread Jonathan Nieder
Ramsay Jones wrote:

> Commit fcfba37337 ('ref-filter: make "--contains <commit>" less chatty if
> <commit> is invalid', 2018-02-23), removed the last use of the callback
> function parse_opt_commits(). Remove this function declaration and
> definition, since it is now dead code.
>
> Signed-off-by: Ramsay Jones 
> ---
>  parse-options-cb.c | 16 
>  parse-options.h|  1 -
>  2 files changed, 17 deletions(-)

Reviewed-by: Jonathan Nieder 
Thanks.


Re: [PATCH 1/2] ref-filter: mark a file-local symbol as static

2018-03-01 Thread Jonathan Nieder
Hi,

Ramsay Jones wrote:

> Commit fcfba37337 ('ref-filter: make "--contains <commit>" less chatty if
> <commit> is invalid', 2018-02-23) added the add_str_to_commit_list()
> function, which causes sparse to issue a "... not declared. Should it
> be static?" warning for that symbol.

Thanks for catching it!

> In order to suppress the warning, mark that function as static.

Isn't this closer to

Indeed, the function is only used in this one compilation
unit. Mark it static.

?  In other words, sparse's warning is accurate, and this is not about
trying to quiet a false positive but about addressing a true positive.

> Signed-off-by: Ramsay Jones 
> ---
>  ref-filter.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)

Thanks,
Jonathan


Re: The case for two trees in a commit ("How to make rebase less modal")

2018-03-01 Thread Jonathan Nieder
Hi,

Stefan Beller wrote:

> $ git hash-object --stdin -w -t commit <<EOF
> tree c70b4a33a0089f15eb3b38092832388d75293e86
> parent 105d5b91138ced892765a84e771a061ede8d63b8
> author Stefan Beller  1519859216 -0800
> committer Stefan Beller  1519859216 -0800
> tree 5495266479afc9a4bd9560e9feac465ed43fa63a
> test commit
> EOF
> 19abfc3bf1c5d782045acf23abdf7eed81e16669
> $ git fsck |grep 19abfc3bf1c5d782045acf23abdf7eed81e16669
> $
>
> So it is technically possible to create a commit with two tree entries
> and fsck is not complaining.

As others mentioned, this is essentially a fancy way to experiment
with adding a new header (with the same name as an existing header) to
a commit.  It is kind of a scary thing to do because anyone trying to
parse commits, including old versions of git, is likely to get
confused by the multiple trees.  The extra tree doesn't affect the
reachability calculation in the way that it should, so this ends up
being something that could be done just as well with a message in the
commit body instead.

To affect reachability, you could use multiple parent lines instead.
You'd need synthetic commits to hang the trees on.  This is similar to
how "git stash" stores the index state.

In other words, I think what you are trying to do is feasible, but not
in the exact way you described.
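
To make the parsing hazard concrete, here is a rough sketch (Python,
purely illustrative; this is not git's actual parser) of how commit
object headers are laid out and what a reader sees when "tree" appears
twice:

```python
# Commit headers are "key value" lines before the first blank line.
# Nothing in the on-disk layout forbids a repeated key, so a second
# "tree" line parses fine -- but consumers expect exactly one.
def parse_commit_headers(raw):
    headers = []
    for line in raw.split("\n"):
        if line == "":
            break  # blank line ends the headers; the message follows
        key, _, value = line.partition(" ")
        headers.append((key, value))
    return headers

commit = (
    "tree c70b4a33a0089f15eb3b38092832388d75293e86\n"
    "parent 105d5b91138ced892765a84e771a061ede8d63b8\n"
    "tree 5495266479afc9a4bd9560e9feac465ed43fa63a\n"
    "\n"
    "test commit\n"
)
trees = [v for k, v in parse_commit_headers(commit) if k == "tree"]
print(len(trees))
```

A reader written against the one-tree assumption will silently take
whichever entry its parser happens to keep, which is why old versions
of git get confused.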

[...]
> * porcelain to modify the repo "at larger scale", such as rebase,
> cherrypicking, reverting
>   involving more than 1 commit.
>
> These large scale operations involving multiple commits however
> are all modal in its nature. Before doing anything else, you have to
> finish or abort the rebase or you need expert knowledge how to
> go otherwise.
>
> During the rebase there might be a hard to resolve conflict, which
> you may not want to resolve right now, but defer to later.  Deferring a
> conflict is currently impossible, because precisely one tree is recorded.

Junio mentions you'd want to record:
 - stages of the index, to re-create a conflicted index
 - working tree files, with conflict markers

In addition you may also want to record:
 - state (todo list) from .git/rebase-merge, to allow picking up where
   you left off in such a larger operation
 - similar state for other commands --- e.g. MERGE_MSG

Recording this work-in-progress state is in the spirit of what "git
stash" does.  People also sometimes like to record their state in progress with
a "wip commit" at the tip of a branch.  Both of those workflows would
benefit from something like this, I'd think.

So I kind of like this.  Maybe a "git save-wip" command that is like
"git stash" but records state to the current branch?  And similarly
improving "git stash" to record the same richer state.

And in the spirit of "git stash" I think it is possible without
even modifying the commit object format.

[...]
> I'd be advocating for having multiple trees in a commit
> possible locally; it might be a bad idea to publish such trees.

I think such "WIP state" may also be useful for publishing, to allow
collaborating on a thorny rebase or merge.

Thanks and hope that helps,
Jonathan


Re: [Problem] test_must_fail makes possibly questionable assumptions about exit_code.

2018-02-28 Thread Jonathan Nieder
Randall S. Becker wrote:

> The original thread below has details of what the original issue was and
> why. It hit three tests specifically on this platform where die was invoked
> (at least on one of them). Perl was invoked under the covers and the
> completion code of 169 propagated back through git to the test framework.
> https://public-inbox.org/git/499fb29f-ca34-8d28-256d-896107c29...@kdbg.org/T
> /#m0b30f0857feae2044f38e04dc6b8565b68b7d52b

That is:

test_must_fail git send-email --dump-aliases --to=jan...@example.com -1 
refs/heads/accounting

Which matches the case described elsewhere in this thread.  It's
git-send-email.perl.  test_must_fail is doing its job correctly by
diagnosing a real bug in git-send-email.perl, which we're grateful for
you having discovered. :)

Thanks,
Jonathan


Re: [PATCH 00/11] Moving global state into the repository object (part 2)

2018-02-28 Thread Jonathan Nieder
Hi,

Stefan Beller wrote:
> On Wed, Feb 28, 2018 at 9:57 AM, Junio C Hamano  wrote:

>> Wait a minute.  Is that topic ever shown to work well together with
>> other topics in flight and are now ready to be built upon?  I had an
>> impression that it is just starting to get serious reviews.
>
> And I had the impression the serious reviews were done and fine;
> the only issue would be demonstrating its working fine with other
> series, where I was also worrying about conflicts with
> brians series. And to address that, I'd just send series in small sizes.

Some of the patches looked cooked to me and others still do not look
cooked yet.  I marked the former with Reviewed-by.  In general, a few
things can help to make the process easier for me:

 1. Giving a quick reply to a review to say how the comments were
resolved, sometimes even with a resend of that one patch to
illustrate.  That way the conversation can continue and the
individual patch can get to a reviewed state faster, without
having to chase between different rerolls of the entire series.

This also has an effect of making the review process more
collaborative: perhaps after seeing how you address their
comments, a reviewer may have another idea that they suggest via a
patch to squash in, etc.

 2. In a reroll, summarizing the result of previous reviews by
including acks as appropriate and Reviewed-by if a reviewer
granted it.  This helps with reviewing the reroll since it tells
people where to focus their attention.

[...]
> Is there anything that a contributor can help with that eases
> refactoring series in flight?

For helping reviewers, see above.

For helping Junio, what I've seen people occasionally do is to locally
run a "trial merge" against next and pu and see what semantic or
lexical conflicts arise.  In the cover letter you can describe these
and give Junio advice to make applying the patch easier for him.

[...]
> Sorry for the miscommunication, though,

FWIW, even though part 1 doesn't look done to me yet, it looks *close*
to done, and I was happy to see the sneak peek at part 2.

Thanks,
Jonathan


Re: [Problem] test_must_fail makes possibly questionable assumptions about exit_code.

2018-02-28 Thread Jonathan Nieder
Randall S. Becker wrote:
> On February 28, 2018 12:44 PM, Jonathan Nieder wrote:
>> Randall S. Becker wrote:

>>> The problem is actually in git code in its test suite that uses perl
>>> inline, not in my test code itself.
[...]
>> Can you elaborate with an example?  My understanding was that
>> test_must_fail is only for running git.
[...]
> Have a look at a recent t1404 as a sample. Line 615 is the one causing the
> platform grief, because it triggers a 'die'. However, the particular test
> case #54, had no difference on platform with test_must_fail or !, which has
> the same underlying EBADF completion after digging and digging.

Sorry to be dense: what I see on that line is

test_must_fail git update-ref -d $prefix/foo >out 2>err &&

How is perl involved?

Puzzled,
Jonathan


Re: [PATCH] protocol: treat unrecognized protocol.version setting as 0

2018-02-28 Thread Jonathan Nieder
Hi,

Junio C Hamano wrote:
> Jonathan Nieder  writes:

>> If I share my .gitconfig or .git/config file between multiple machines
>> (or between multiple Git versions on a single machine) and set
>>
>>  [protocol]
>>  version = 2
>>
>> then running "git fetch" with a Git version that does not support
>> protocol v2 errors out with
>>
>>  fatal: unknown value for config 'protocol.version': 2
>>
>> In the spirit of v1.7.6-rc0~77^2~1 (Improve error handling when
>> parsing dirstat parameters, 2011-04-29), it is better to (perhaps
>> after warning the user) ignore the unrecognized protocol version.
>
> I do not agree with the analogy at all.
>
> Showing dirstat with different tweaks than the user expected to see
> is a local and read-only thing.  Talking to the other side over a
> protocol the user explicitly wanted to avoid (e.g. imagine the case
> where your upstream's protocol version 1 implementation is
> critically buggy and you want to use version 2 if you talk with
> them) by accident is a more grave error, possibly affecting the
> other side that you may not have enough power to recover from
> (e.g. damaging the remote repository to which you only have push
> access and not interactive shell).

Fair enough about the analogy being a poor one.

I disagree with characterizing using protocol v0 as a grave error,
because the scenario seems far-fetched to me.  I can imagine the
opposite scenario, wanting to use protocol v0 because a server's
implementation of v2 is buggy (for example, producing wrong
negotiation results and wasting bandwidth, or erroring out for a
request that should not be an error).  I am having trouble imagining a
broken v0 implementation doing anything worse than being slow or
erroring out.

That said, erroring out to catch spelling errors is nice and helpful,
so I would be okay with continuing to apply this as a local patch and
it not landing upstream.
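
For reference, the behavior the patch implements can be modeled in a
few lines (an illustrative Python sketch, not the actual C code; the
version table is made up -- imagine a build that predates v2):

```python
# Sketch of "ignore and fall back": an unrecognized protocol.version
# value selects v0 instead of dying.
KNOWN_VERSIONS = {"0": 0, "1": 1}

def get_protocol_version_config(value):
    # Unknown value (e.g. "2" on an old git) -> fall back to v0.
    return KNOWN_VERSIONS.get(value, 0)

print(get_protocol_version_config("2"))  # 0, instead of a fatal error
```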

The following third way would also be possible, but I'm pretty sure I
don't like it:

- As Duy suggested, allowing multiple 'protocol.version' settings.
  The last setting that the Git implementation recognizes wins.

- If there is at least one 'protocol.version' setting but the Git
  implementation doesn't recognize any of them, error out.

The reason I don't like it is that it doesn't make your example case
work significantly better.  If I have

[protocol]
version = 1

in my ~/.gitconfig and

[protocol]
version = v2

in .git/config, then that means I intended to use protocol v2 and
misspelled it, but this heuristic would ignore the v2 and fall back to
version 1.
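
Spelled out, the heuristic being rejected here would look something
like this (illustrative Python, not git code; the version table is
made up for the example):

```python
# "Last recognized value wins": scan all protocol.version settings in
# order and keep the last one this implementation understands; error
# out only if settings exist but none are recognized.
RECOGNIZED = {"0": 0, "1": 1}  # imagine a build that predates v2

def choose_protocol_version(values):
    chosen = None
    for v in values:
        if v in RECOGNIZED:
            chosen = RECOGNIZED[v]
    if values and chosen is None:
        raise ValueError("unknown value for config 'protocol.version'")
    return 0 if chosen is None else chosen

# ~/.gitconfig says "1", .git/config misspells "2" as "v2":
print(choose_protocol_version(["1", "v2"]))
```

The misspelled "v2" is silently dropped and version 1 wins -- exactly
the misfire described in the scenario above.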

As a side note, the exact same problem would happen if I make a typo
in the config key:

[protocol]
vesion = 2

Treating unrecognized values from the growing set of values as an
error while not treating unrecognized keys from the growing set of
keys as an error feels odd to me.  I think it would be useful to
document whatever we decide about this subject in
Documentation/CodingGuidelines.

Thanks,
Jonathan


Re: [Problem] test_must_fail makes possibly questionable assumptions about exit_code.

2018-02-28 Thread Jonathan Nieder
Randall S. Becker wrote:

> The problem is actually in git code in its test suite that uses perl
> inline, not in my test code itself. The difficulty I'm having is
> placing this appropriate so that the signal handler gets used
> throughout the test suite including in the perl -e invocations. This
> is more a lack of my own understanding of plumbing of git test
> framework rather than of using or coding perl.

Can you elaborate with an example?  My understanding was that
test_must_fail is only for running git.  If a test is running perl and
wants to check its exit code, the test is supposed to use !, not
test_must_fail.

t/README backs me up:

 - Don't use '! git cmd' when you want to make sure the git command exits
   with failure in a controlled way by calling "die()".  Instead,
   use 'test_must_fail git cmd'.  This will signal a failure if git
   dies in an unexpected way (e.g. segfault).

   On the other hand, don't use test_must_fail for running regular
   platform commands; just use '! cmd'.  We are not in the business
   of verifying that the world given to us sanely works.

So I don't consider the initial issue you raised a test issue at all!
It's a bug in the git commands, and a fix for it should not be
specific to the test suite.

And now it sounds like there is a second issue: the test suite is
overusing test_must_fail in some context and that needs to be fixed as
well.

Thanks,
Jonathan


Re: [PATCH] protocol: treat unrecognized protocol.version setting as 0

2018-02-27 Thread Jonathan Nieder
Duy Nguyen wrote:
> On Wed, Feb 28, 2018 at 8:02 AM, Brandon Williams  wrote:
>> On 02/27, Jonathan Nieder wrote:

>>> If I share my .gitconfig or .git/config file between multiple machines
>>> (or between multiple Git versions on a single machine) and set
>>>
>>>   [protocol]
>>>   version = 2
>>>
>>> then running "git fetch" with a Git version that does not support
>>> protocol v2 errors out with
>>>
>>>   fatal: unknown value for config 'protocol.version': 2
>>>
>>> In the spirit of v1.7.6-rc0~77^2~1 (Improve error handling when
>>> parsing dirstat parameters, 2011-04-29), it is better to (perhaps
>>> after warning the user) ignore the unrecognized protocol version.
>>> After all, future Git versions might add even more protocol versions,
>>> and using two different Git versions with the same Git repo, machine,
>>> or home directory should not cripple the older Git version just
>>> because of a parameter that is only understood by a more recent Git
>>> version.
>
> I wonder if it's better to specify multiple versions. If v2 is not
> recognized by this git but v0 is, then it can pick that up. But if you
> explicitly tell it to choose between v2 and v3 only and it does not
> understand either, then it dies. Not sure if this is a good idea
> though.

Interesting thought.  Something roughly like this (on top of the patch
this is a reply to)?

diff --git i/protocol.c w/protocol.c
index ce9c634a3a..6aa8857a11 100644
--- i/protocol.c
+++ w/protocol.c
@@ -1,4 +1,5 @@
 #include "cache.h"
+#include "string-list.h"
 #include "config.h"
 #include "protocol.h"
 
@@ -14,14 +15,18 @@ static enum protocol_version parse_protocol_version(const char *value)
 
 enum protocol_version get_protocol_version_config(void)
 {
-   const char *value;
-   if (!git_config_get_string_const("protocol.version", &value)) {
-   enum protocol_version version = parse_protocol_version(value);
-   if (version != protocol_unknown_version)
-   return version;
+   const struct string_list *values;
+   const struct string_list_item *value;
+   enum protocol_version result = protocol_v0;
+
+   values = git_config_get_value_multi("protocol.version");
+   if (values)
+   for_each_string_list_item(value, values) {
+   enum protocol_version v = parse_protocol_version(value->string);
+   if (v != protocol_unknown_version)
+   result = v;
}
 
-   return protocol_v0;
+   return result;
 }
 
 enum protocol_version determine_protocol_version_server(void)


[PATCH] protocol: treat unrecognized protocol.version setting as 0

2018-02-27 Thread Jonathan Nieder
If I share my .gitconfig or .git/config file between multiple machines
(or between multiple Git versions on a single machine) and set

[protocol]
version = 2

then running "git fetch" with a Git version that does not support
protocol v2 errors out with

fatal: unknown value for config 'protocol.version': 2

In the spirit of v1.7.6-rc0~77^2~1 (Improve error handling when
parsing dirstat parameters, 2011-04-29), it is better to (perhaps
after warning the user) ignore the unrecognized protocol version.
After all, future Git versions might add even more protocol versions,
and using two different Git versions with the same Git repo, machine,
or home directory should not cripple the older Git version just
because of a parameter that is only understood by a more recent Git
version.

So ignore the unrecognized value.  It may be useful for spell checking
(for instance, if I put "version = v1" intending "version = 1") to
warn about such settings, but this patch does not, since at least in
these early days for protocol v2 it is expected for configurations
that want to opportunistically use protocol v2 if available not to be
unusual.

Signed-off-by: Jonathan Nieder 
---
Google has been running with a patch like this internally for a while,
since we have been changing the protocol.version number to a new value
like 20180226 each time a minor tweak to the protocol v2 RFC occurred.

The bit I have doubts about is whether to warn.  What do you think?

Thanks,
Jonathan

 protocol.c |  8 ++--
 t/t5700-protocol-v1.sh | 12 
 2 files changed, 14 insertions(+), 6 deletions(-)

diff --git a/protocol.c b/protocol.c
index 43012b7eb6..ce9c634a3a 100644
--- a/protocol.c
+++ b/protocol.c
@@ -17,12 +17,8 @@ enum protocol_version get_protocol_version_config(void)
const char *value;
if (!git_config_get_string_const("protocol.version", &value)) {
enum protocol_version version = parse_protocol_version(value);
-
-   if (version == protocol_unknown_version)
-   die("unknown value for config 'protocol.version': %s",
-   value);
-
-   return version;
+   if (version != protocol_unknown_version)
+   return version;
}
 
return protocol_v0;
diff --git a/t/t5700-protocol-v1.sh b/t/t5700-protocol-v1.sh
index ba86a44eb1..c35767ab01 100755
--- a/t/t5700-protocol-v1.sh
+++ b/t/t5700-protocol-v1.sh
@@ -31,6 +31,18 @@ test_expect_success 'clone with git:// using protocol v1' '
grep "clone< version 1" log
 '
 
+test_expect_success 'unrecognized protocol versions fall back to v0' '
+   GIT_TRACE_PACKET=1 git -c protocol.version= \
+   clone "$GIT_DAEMON_URL/parent" v 2>log &&
+
+   git -C daemon_child log -1 --format=%s >actual &&
+   git -C "$daemon_parent" log -1 --format=%s >expect &&
+   test_cmp expect actual &&
+
+   # Client requested and server responded using protocol v0
+   ! grep version log
+'
+
 test_expect_success 'fetch with git:// using protocol v1' '
test_commit -C "$daemon_parent" two &&
 
-- 
2.16.2.395.g2e18187dfd



Re: [Problem] test_must_fail makes possibly questionable assumptions about exit_code.

2018-02-27 Thread Jonathan Nieder
Hi,

Randall S. Becker wrote:

> After months of arguing with some platform developers on this subject, the
> perl spec was held over my head repeatedly about a few lines that are
> causing issues. The basic problem is this line (test-lib-functions.sh, line
> 633, still in ffa952497)
>
>>elif test $exit_code -gt 129 && test $exit_code -le 192
>>   then
>>   echo >&2 "test_must_fail: died by signal $(($exit_code - 128)):
>
> According to the perl spec http://perldoc.perl.org/functions/die.html, die
> basically takes whatever errno is, mods it with 256 and there you go. EBADF
> is what is used when perl reads from stdin and calls die - that's standard
> perl. In most systems, you end up with something useful, when EBADF is 9.
> But when it is 4009, you get a garbage answer (4009 mod 256 a.k.a. 169).
> However, only 128-165 are technically reserved for signals, rather than all
> the way up to 192, which may be true in some places but not everywhere.
>
> The advice (I'm putting that nicely) I received was to use exit so that the
> result is predictable - unlikely to be useful in the 15K test suites in git.

The fundamental thing is the actual Git commands, not the tests in the
testsuite, no?

In the rest of git, die() makes a command exit with status 128.  The
trouble here is that our code in Perl is assuming the same meaning for
die() but using perl's die builtin instead.  That suggests a few
options:

 a) We could override the meaning of die() in Git.pm.  This feels
ugly but if it works, it would be a very small patch.

 b) We could forbid use of die() and use some git_die() instead (but
with a better name) for our own error handling.

 c) We could have a special different exit code convention for
commands written in Perl.  And then change expectations whenever a
command is rewritten in C.  As you might expect, I don't like this
option.

 d) We could wrap each command in an eval {...} block to convert the
result from die() to exit 128.

Option (b) feels simplest to me.

Thoughts?

Thanks,
Jonathan


Re: [PATCH v3 34/35] remote-curl: implement stateless-connect command

2018-02-27 Thread Jonathan Nieder
Hi,

Brandon Williams wrote:

> Teach remote-curl the 'stateless-connect' command which is used to
> establish a stateless connection with servers which support protocol
> version 2.  This allows remote-curl to act as a proxy, allowing the git
> client to communicate natively with a remote end, simply using
> remote-curl as a pass through to convert requests to http.

Cool!  I better look at the spec for that first.

*looks at the previous patch*

Oh, there is no documented spec. :/  I'll muddle through this instead,
then.

[...]
> --- a/remote-curl.c
> +++ b/remote-curl.c
> @@ -188,7 +188,10 @@ static struct ref *parse_git_refs(struct discovery 
> *heads, int for_push)
>   heads->version = discover_version(&reader);
>   switch (heads->version) {
>   case protocol_v2:
> - die("support for protocol v2 not implemented yet");
> + /*
> +  * Do nothing.  Client should run 'stateless-connect' and
> +  * request the refs themselves.
> +  */
>   break;

This is the 'list' command, right?  Since we expect the client to run
'stateless-connect' instead, can we make it error out?

[...]
> @@ -1082,6 +1085,184 @@ static void parse_push(struct strbuf *buf)
>   free(specs);
>  }
>  
> +struct proxy_state {
> + char *service_name;
> + char *service_url;
> + struct curl_slist *headers;
> + struct strbuf request_buffer;
> + int in;
> + int out;
> + struct packet_reader reader;
> + size_t pos;
> + int seen_flush;
> +};

Can this have a comment describing what it is/does?  It's not obvious
to me at first glance.

It doesn't have to go in a lot of detail since this is an internal
implementation detail, but something saying e.g. that this represents
a connection to an HTTP server (that's an example; I'm not saying
that's what it represents :)) would help.

> +
> +static void proxy_state_init(struct proxy_state *p, const char *service_name,
> +  enum protocol_version version)
[...]
> +static void proxy_state_clear(struct proxy_state *p)

Looks sensible.

[...]
> +static size_t proxy_in(char *buffer, size_t eltsize,
> +size_t nmemb, void *userdata)

Can this have a comment describing the interface?  (E.g. does it read
a single pkt_line?  How is the caller expected to use it?  Does it
satisfy the interface of some callback?)

libcurl's example https://curl.haxx.se/libcurl/c/ftpupload.html just
calls this read_callback.  Such a name plus a pointer to
CURLOPT_READFUNCTION should do the trick; bonus points if the comment 
says what our implementation of the callback does.

Is this about having peek ability?

> +{
> + size_t max = eltsize * nmemb;

Can this overflow?  st_mult can avoid having to worry about that.

> + struct proxy_state *p = userdata;
> + size_t avail = p->request_buffer.len - p->pos;
> +
> + if (!avail) {
> + if (p->seen_flush) {
> + p->seen_flush = 0;
> + return 0;
> + }
> +
> + strbuf_reset(&p->request_buffer);
> + switch (packet_reader_read(&p->reader)) {
> + case PACKET_READ_EOF:
> + die("unexpected EOF when reading from parent process");
> + case PACKET_READ_NORMAL:
> + packet_buf_write_len(&p->request_buffer, p->reader.line,
> +  p->reader.pktlen);
> + break;
> + case PACKET_READ_DELIM:
> + packet_buf_delim(&p->request_buffer);
> + break;
> + case PACKET_READ_FLUSH:
> + packet_buf_flush(&p->request_buffer);
> + p->seen_flush = 1;
> + break;
> + }
> + p->pos = 0;
> + avail = p->request_buffer.len;
> + }
> +
> + if (max < avail)
> + avail = max;
> + memcpy(buffer, p->request_buffer.buf + p->pos, avail);
> + p->pos += avail;
> + return avail;

This is a number of bytes, but CURLOPT_READFUNCTION expects a number
of items, fread-style.  That is:

if (avail < eltsize)
... handle somehow, maybe by reading in more? ...

avail_memb = avail / eltsize;
memcpy(buffer,
   p->request_buffer.buf + p->pos,
   st_mult(avail_memb, eltsize));
p->pos += st_mult(avail_memb, eltsize);
return avail_memb;

But https://curl.haxx.se/libcurl/c/CURLOPT_READFUNCTION.html says

Your function must then return the actual number of bytes that
it stored in that memory area.

Does this mean eltsize is always 1?  This is super confusing...

... ok, a quick grep for fread_func in libcurl reveals that eltsize is
indeed always 1.  Can we add an assertion so we notice if that
changes?

if (eltsize != 1)
BUG("curl read callback called with size = %zu != 1", eltsize);

Re: [PATCH v3 28/35] transport-helper: introduce stateless-connect

2018-02-27 Thread Jonathan Nieder
Brandon Williams wrote:

> Introduce the transport-helper capability 'stateless-connect'.  This
> capability indicates that the transport-helper can be requested to run
> the 'stateless-connect' command which should attempt to make a
> stateless connection with a remote end.  Once established, the
> connection can be used by the git client to communicate with
> the remote end natively in a stateless-rpc manner as supported by
> protocol v2.  This means that the client must send everything the server
> needs in a single request as the client must not assume any
> state-storing on the part of the server or transport.
>
> If a stateless connection cannot be established then the remote-helper
> will respond in the same manner as the 'connect' command indicating that
> the client should fallback to using the dumb remote-helper commands.
>
> Signed-off-by: Brandon Williams 
> ---
>  transport-helper.c | 8 
>  transport.c| 1 +
>  transport.h| 6 ++
>  3 files changed, 15 insertions(+)

Please add documentation for this command to
Documentation/gitremote-helpers.txt.

That helps reviewers, since it means reviewers can get a sense of what
the interface is meant to be.  It helps remote helper implementers as
well: it tells them what they can rely on and what can't rely on in
this interface.  For the same reason it helpers remote helper callers
as well.

Thanks,
Jonathan


Re: [PATCH v3 31/35] remote-curl: store the protocol version the server responded with

2018-02-27 Thread Jonathan Nieder
Brandon Williams wrote:

> Store the protocol version the server responded with when performing
> discovery.  This will be used in a future patch to either change the
> 'Git-Protocol' header sent in subsequent requests or to determine if a
> client needs to fallback to using a different protocol version.

nit: s/fallback/fall back/ (fallback is the noun/adjective, fall back
the verb)

> Signed-off-by: Brandon Williams 

With or without that tweak,
Reviewed-by: Jonathan Nieder 

Thanks.


Re: [PATCH v3 29/35] pkt-line: add packet_buf_write_len function

2018-02-27 Thread Jonathan Nieder
Brandon Williams wrote:

> Add the 'packet_buf_write_len()' function which allows for writing an
> arbitrary length buffer into a 'struct strbuf' and formatting it in
> packet-line format.

Makes sense.
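
As background, the framing itself is simple; here is a rough Python
model of a single pkt-line (not git's implementation):

```python
# Each pkt-line is a 4-hex-digit length -- counting the 4 header
# bytes themselves -- followed by the payload.  65520 mirrors
# LARGE_PACKET_MAX from pkt-line.h.
def packet_line(data):
    n = len(data) + 4
    if n > 65520:
        raise ValueError("protocol error: impossibly long line")
    return b"%04x" % n + data

print(packet_line(b"hello"))  # b'0009hello'
```

packet_buf_write_len appends exactly this framing to a strbuf,
reserving the four header bytes first and filling them in once the
payload length is known.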

[...]
> --- a/pkt-line.h
> +++ b/pkt-line.h
> @@ -26,6 +26,7 @@ void packet_buf_flush(struct strbuf *buf);
>  void packet_buf_delim(struct strbuf *buf);
>  void packet_write(int fd_out, const char *buf, size_t size);
>  void packet_buf_write(struct strbuf *buf, const char *fmt, ...) 
> __attribute__((format (printf, 2, 3)));
> +void packet_buf_write_len(struct strbuf *buf, const char *data, size_t len);

I wonder if we should rename packet_buf_write to something like
packet_buf_writef.  Right now there's a kind of confusing collection of
functions without much symmetry.

Alternatively, the _buf_ ones could become strbuf_* functions:

strbuf_add_packet(&buf, data, len);
strbuf_addf_packet(&buf, fmt, ...);

That would make it clearer that these append to buf.

I'm just thinking out loud.  For this series, the API you have here
looks fine, even if it is a bit inconsistent.  (In other words, even
if you agree with me, this would probably be best addressed as a patch
on top.)

[...]
> --- a/pkt-line.c
> +++ b/pkt-line.c
> @@ -215,6 +215,22 @@ void packet_buf_write(struct strbuf *buf, const char 
> *fmt, ...)
>   va_end(args);
>  }
>  
> +void packet_buf_write_len(struct strbuf *buf, const char *data, size_t len)
> +{
> + size_t orig_len, n;
> +
> + orig_len = buf->len;
> + strbuf_addstr(buf, "0000");
> + strbuf_add(buf, data, len);
> + n = buf->len - orig_len;
> +
> + if (n > LARGE_PACKET_MAX)
> + die("protocol error: impossibly long line");

Could the error message describe the long line (e.g.

...impossibly long line %.*s...", 256, data);

)?

> +
> + set_packet_header(&buf->buf[orig_len], n);
> + packet_trace(buf->buf + orig_len + 4, n - 4, 1);

Could do, more simply:

packet_trace(data, len, 1);

Thanks,
Jonathan


Re: [PATCH v3 26/35] transport-helper: remove name parameter

2018-02-27 Thread Jonathan Nieder
Brandon Williams wrote:

> Commit 266f1fdfa (transport-helper: be quiet on read errors from
> helpers, 2013-06-21) removed a call to 'die()' which printed the name of
> the remote helper passed in to the 'recvline_fh()' function using the
> 'name' parameter.  Once the call to 'die()' was removed the parameter
> was no longer necessary but wasn't removed.  Clean up 'recvline_fh()'
> parameter list by removing the 'name' parameter.
>
> Signed-off-by: Brandon Williams 
> ---
>  transport-helper.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)

Nice.

Reviewed-by: Jonathan Nieder 


Re: [PATCH] docs/pretty-formats: fix typo '% <()' -> '%<|()'

2018-02-27 Thread Jonathan Nieder
Mårten Kongstad wrote:

> Remove erroneous space between % and < in '% <()'.
>
> Signed-off-by: Mårten Kongstad 
> ---
>  Documentation/pretty-formats.txt | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)

Looks correct to me.  The space was introduced in v1.8.3-rc0~22^2
(pretty: support %>> that steals trailing spaces, 2013-04-19) and
appears to be a plain typo.  Thanks for fixing it.

Reviewed-by: Jonathan Nieder 

Thanks.

> --- a/Documentation/pretty-formats.txt
> +++ b/Documentation/pretty-formats.txt
> @@ -202,7 +202,7 @@ endif::git-rev-list[]
>  - '%>>()', '%>>|()': similar to '%>()', '%>|()'
>respectively, except that if the next placeholder takes more spaces
>than given and there are spaces on its left, use those spaces
> -- '%><()', '%><|()': similar to '% <()', '%<|()'
> +- '%><()', '%><|()': similar to '%<()', '%<|()'
>respectively, but padding both sides (i.e. the text is centered)
>  - %(trailers[:options]): display the trailers of the body as interpreted
>by linkgit:git-interpret-trailers[1]. The `trailers` string may be


Re: [PATCH v3 06/35] transport: use get_refs_via_connect to get refs

2018-02-27 Thread Jonathan Nieder
Brandon Williams wrote:
> On 02/26, Jonathan Nieder wrote:
>> Brandon Williams wrote:

>>> +++ b/transport.c
>>> @@ -230,12 +230,8 @@ static int fetch_refs_via_pack(struct transport 
>>> *transport,
>>> args.cloning = transport->cloning;
>>> args.update_shallow = data->options.update_shallow;
>>>  
>>> -   if (!data->got_remote_heads) {
>>> -   connect_setup(transport, 0);
>>> -   get_remote_heads(data->fd[0], NULL, 0, &refs_tmp, 0,
>>> -NULL, &data->shallow);
>>> -   data->got_remote_heads = 1;
>>> -   }
>>> +   if (!data->got_remote_heads)
>>> +   refs_tmp = get_refs_via_connect(transport, 0);
>>
>> The only difference between the old and new code is that the old code
>> passes NULL as 'extra_have' and the new code passes &data->extra_have.
>>
>> That means this populates the data->extra_have oid_array.  Does it
>> matter?
[...]
> I don't think its a problem to have extra_have populated, least I
> haven't seen anything to lead me to believe it would be a problem.

Assuming it gets properly freed later, the only effect I can imagine
is some increased memory usage.

I'm inclined to agree with you that the simplicity is worth it.  It
seems worth mentioning in the commit message, though.

[...]
>>> @@ -541,14 +537,8 @@ static int git_transport_push(struct transport 
>>> *transport, struct ref *remote_re
>>> struct send_pack_args args;
>>> int ret;
>>>  
>>> -   if (!data->got_remote_heads) {
>>> -   struct ref *tmp_refs;
>>> -   connect_setup(transport, 1);
>>> -
>>> -   get_remote_heads(data->fd[0], NULL, 0, &tmp_refs, REF_NORMAL,
>>> -NULL, &data->shallow);
>>> -   data->got_remote_heads = 1;
>>> -   }
>>> +   if (!data->got_remote_heads)
>>> +   get_refs_via_connect(transport, 1);
>>
>> not a new problem, just curious: Does this leak tmp_refs?
>
> Maybe, though its removed by this patch.

Sorry for the lack of clarity.  If it was leaked before, then it is
still leaked now, via the discarded return value from
get_refs_via_connect.

Any idea how we can track that down?  E.g. are there ways to tell leak
checkers "just tell me about this particular allocation"?

Thanks,
Jonathan


Re: [PATCH v3 02/35] pkt-line: introduce struct packet_reader

2018-02-27 Thread Jonathan Nieder
Brandon Williams wrote:
> On 02/12, Jonathan Nieder wrote:

>>> --- a/pkt-line.h
>>> +++ b/pkt-line.h
>>> @@ -111,6 +111,64 @@ char *packet_read_line_buf(char **src_buf, size_t 
>>> *src_len, int *size);
>>>   */
>>>  ssize_t read_packetized_to_strbuf(int fd_in, struct strbuf *sb_out);
>>>  
>>> +struct packet_reader {
>>> +   /* source file descriptor */
>>> +   int fd;
>>> +
>>> +   /* source buffer and its size */
>>> +   char *src_buffer;
>>> +   size_t src_len;
>>
>> Can or should this be a strbuf?
>>
>>> +
>>> +   /* buffer that pkt-lines are read into and its size */
>>> +   char *buffer;
>>> +   unsigned buffer_size;
>>
>> Likewise.
>
> This struct is setup to be a drop in replacement for the existing
> read_packet() family of functions.  Because of this I tried to make the
> interface as similar as possible to make it easy to convert to using it
> as well as having no need to clean anything up (because the struct is
> really just a wrapper and doesn't own anything).

Sorry, I don't completely follow.  Are you saying some callers play
with the buffer, or are you saying you haven't checked?  (If the
latter, that's perfectly fine; I'm just trying to understand the API.)

Either way, can you add some comments about ownership / who is allowed
to write to it / etc to make it easier to clean up later?

Thanks,
Jonathan


Re: [PATCH v3 22/35] upload-pack: support shallow requests

2018-02-27 Thread Jonathan Nieder
Hi,

Brandon Williams wrote:

> Add the 'shallow' feature to the protocol version 2 command 'fetch'
> which indicates that the server supports shallow clients and deepen
> requets.
>
> Signed-off-by: Brandon Williams 
> ---
>  Documentation/technical/protocol-v2.txt |  67 +++-
>  serve.c |   2 +-
>  t/t5701-git-serve.sh|   2 +-
>  upload-pack.c   | 138 
> +++-
>  upload-pack.h   |   3 +
>  5 files changed, 173 insertions(+), 39 deletions(-)

Yay!  We've been running with this for a while at Google (for file://
fetches, at least) and it's been working well.

[...]
> --- a/Documentation/technical/protocol-v2.txt
> +++ b/Documentation/technical/protocol-v2.txt
> @@ -201,12 +201,42 @@ packet-lines:
>   to its base by position in pack rather than by an oid.  That is,
>   they can read OBJ_OFS_DELTA (ake type 6) in a packfile.
>  
> +shallow 
> + A client must notify the server of all objects for which it only
> + has shallow copies of (meaning that it doesn't have the parents

Grammar nit: "for which it only has shallow copies of" should be e.g.
"for which it only has shallow copies" or "that it only has shallow
copies of" or "that it only has shallow copies for".

I think s/objects/commits/ would also help clarify.

> + of a commit) by supplying a 'shallow ' line for each such
> + object so that the serve is aware of the limitations of the

s/serve/server/

> + client's history.

Is it worth mentioning that this is about negotiation?  E.g. "so that
the server is aware that the client may not have all objects reachable
from such commits".

> +
> +deepen 
> + Request that the fetch/clone should be shallow having a commit depth of

nit: s/Request/Requests/, for consistency with the others?

> +  relative to the remote side.

What does the value of <depth> mean? E.g. does a depth of 1 mean to
fetch only the commits named in "have", 2 to fetch those commits plus
their parents, etc, or am I off by one?

Is <depth> always a positive number?

What happens if <depth> starts with a 0?  Is that a client error?

> +
> +deepen-relative
> + Requests that the semantics of the "deepen" command be changed
> + to indicate that the depth requested is relative to the clients
> + current shallow boundary, instead of relative to the remote
> + refs.

s/clients/client's/

s/remote refs/requested commits/ or "wants" or something.

> +
> +deepen-since 
> + Requests that the shallow clone/fetch should be cut at a
> + specific time, instead of depth.  Internally it's equivalent of
> + doing "rev-list --max-age=". Cannot be used with
> + "deepen".

Nits:
  s/rev-list/git rev-list/
  s/equivalent of/equivalent to/ or 'the equivalent of'.

Since the git-rev-list(1) manpage doesn't tell me: what is the format
of <timestamp>?  And is the requested time interval inclusive or
exclusive?

> +
> +deepen-not 
> + Requests that the shallow clone/fetch should be cut at a
> + specific revision specified by '', instead of a depth.
> + Internally it's equivalent of doing "rev-list --not ".
> + Cannot be used with "deepen", but can be used with
> + "deepen-since".

Interesting.

nit: s/rev-list/git rev-list/

What is the format of <rev>?  E.g. can it be an arbitrary revision
specifier or is it an oid?

[...]
>  output = *section
> -section = (acknowledgments | packfile)
> +section = (acknowledgments | shallow-info | packfile)
> (flush-pkt | delim-pkt)

It looks like sections can go in an arbitrary order.  Are there
tests to make sure the server can cope with reordering?  (I ask
not because I mistrust the server but because I have some vague
hope that other server implementations might be inspired by our
tests.)

[...]
> @@ -215,6 +245,11 @@ header.
>  nak = PKT-LINE("NAK" LF)
>  ack = PKT-LINE("ACK" SP obj-id LF)
>  
> +shallow-info = PKT-LINE("shallow-info" LF)
> +*PKT-LINE((shallow | unshallow) LF)
> +shallow = "shallow" SP obj-id
> +unshallow = "unshallow" SP obj-id

Likewise: it looks like shallows and unshallows can be intermixed; can
this be either (a) tightened or (b) covered by tests to make sure a
later refactoring doesn't accidentally tighten it?

[...]
> @@ -247,6 +282,36 @@ header.
> determined the objects it plans to send to the client and no
> further negotiation is needed.
>  
> +
> +shallow-info section
> + If the client has requested a shallow fetch/clone, a shallow
> + client requests a fetch or the server is shallow then the
> + server's response may include a shallow-info section.

I'm having trouble parsing this sentence.

>  The
> + shallow-info section will be include if (due to one of the above

nit: s/include/included/

> + conditions) the server needs to info

Re: [PATCH v3 18/35] fetch: pass ref patterns when fetching

2018-02-26 Thread Jonathan Nieder
Brandon Williams wrote:

> Construct a list of ref patterns to be passed to
> 'transport_get_remote_refs()' from the refspec to be used during the
> fetch.  This list of ref patterns will be used to allow the server to
> filter the ref advertisement when communicating using protocol v2.
>
> Signed-off-by: Brandon Williams 
> ---
>  builtin/fetch.c | 12 +++-
>  1 file changed, 11 insertions(+), 1 deletion(-)

Reviewed-by: Jonathan Nieder 
Nice.

I take it that tests covering this come later in the series?


Re: [PATCH v3 14/35] connect: request remote refs using v2

2018-02-26 Thread Jonathan Nieder
Brandon Williams wrote:

> Teach the client to be able to request a remote's refs using protocol
> v2.  This is done by having a client issue a 'ls-refs' request to a v2
> server.

Yay, ls-remote support!

[...]
> --- a/builtin/upload-pack.c
> +++ b/builtin/upload-pack.c
> @@ -5,6 +5,7 @@
>  #include "parse-options.h"
>  #include "protocol.h"
>  #include "upload-pack.h"
> +#include "serve.h"

nit, no change needed in this patch: What is a good logical order for
the #includes here?  Bonus points if there's a tool to make it happen
automatically.

Asking since adding #includes like this at the end tends to result in
a harder-to-read list of #includes, sometimes with duplicates, and
often producing conflicts when multiple patches in flight add a
#include to the same file.

[...]
> @@ -48,11 +50,9 @@ int cmd_upload_pack(int argc, const char **argv, const 
> char *prefix)
>  
>   switch (determine_protocol_version_server()) {
>   case protocol_v2:
> - /*
> -  * fetch support for protocol v2 has not been implemented yet,
> -  * so ignore the request to use v2 and fallback to using v0.
> -  */
> - upload_pack(&opts);
> + serve_opts.advertise_capabilities = opts.advertise_refs;
> + serve_opts.stateless_rpc = opts.stateless_rpc;
> + serve(&serve_opts);

Interesting!  daemon.c has its own (file-local) serve() function;
can one of the two be renamed?

I actually like both names: serve() is a traditional name for the
function a server calls when it's done setting up and is ready to
serve.  But the clash is confusing.

[...]
> +++ b/connect.c
> @@ -12,9 +12,11 @@
>  #include "sha1-array.h"
>  #include "transport.h"
>  #include "strbuf.h"
> +#include "version.h"
>  #include "protocol.h"
>  
>  static char *server_capabilities;
> +static struct argv_array server_capabilities_v2 = ARGV_ARRAY_INIT;

Can a quick doc comment describe these and how they relate?

Is only one of them set, based on which protocol version is in use?
Should server_capabilities be renamed to server_capabilities_v1?

>  static const char *parse_feature_value(const char *, const char *, int *);
>  
>  static int check_ref(const char *name, unsigned int flags)
> @@ -62,6 +64,33 @@ static void die_initial_contact(int unexpected)
> "and the repository exists."));
>  }
>  
> +/* Checks if the server supports the capability 'c' */
> +int server_supports_v2(const char *c, int die_on_error)
> +{
> + int i;
> +
> + for (i = 0; i < server_capabilities_v2.argc; i++) {
> + const char *out;
> + if (skip_prefix(server_capabilities_v2.argv[i], c, &out) &&
> + (!*out || *out == '='))
> + return 1;
> + }
> +
> + if (die_on_error)
> + die("server doesn't support '%s'", c);
> +
> + return 0;
> +}

Nice.
> +
> +static void process_capabilities_v2(struct packet_reader *reader)
> +{
> + while (packet_reader_read(reader) == PACKET_READ_NORMAL)
> + argv_array_push(&server_capabilities_v2, reader->line);
> +
> + if (reader->status != PACKET_READ_FLUSH)
> + die("protocol error");

Can this say more?  E.g. "expected flush after capabilities, got "?

[...]
> @@ -85,7 +114,7 @@ enum protocol_version discover_version(struct 
> packet_reader *reader)
>   /* Maybe process capabilities here, at least for v2 */

Is this comment out of date now?

>   switch (version) {
>   case protocol_v2:
> - die("support for protocol v2 not implemented yet");
> + process_capabilities_v2(reader);
>   break;
>   case protocol_v1:
>   /* Read the peeked version line */
> @@ -293,6 +322,98 @@ struct ref **get_remote_heads(struct packet_reader 
> *reader,
>   return list;
>  }
>  
> +static int process_ref_v2(const char *line, struct ref ***list)

What does the return value represent?

Could it return the more typical 0 on success, -1 on error?

> +{
> + int ret = 1;
> + int i = 0;
> + struct object_id old_oid;
> + struct ref *ref;
> + struct string_list line_sections = STRING_LIST_INIT_DUP;
> +
> + if (string_list_split(&line_sections, line, ' ', -1) < 2) {

Can there be a comment describing the expected format?

> + ret = 0;
> + goto out;
> + }
> +
> + if (get_oid_hex(line_sections.items[i++].string, &old_oid)) {
> + ret = 0;
> + goto out;
> + }
> +
> + ref = alloc_ref(line_sections.items[i++].string);

Ref names cannot contains a space, so this is safe.  Good.

> +
> + oidcpy(&ref->old_oid, &old_oid);
> + **list = ref;
> + *list = &ref->next;
> +
> + for (; i < line_sections.nr; i++) {
> + const char *arg = line_sections.items[i].string;
> + if (skip_prefix(arg, "symref-target:", &arg))
> + ref->symref = xstrdup(arg);

Using space-delimited fie

Re: [PATCH v3 14/35] connect: request remote refs using v2

2018-02-26 Thread Jonathan Nieder
Jonathan Tan wrote:
> On Thu, 22 Feb 2018 13:26:58 -0500
> Jeff King  wrote:

>> I agree that it shouldn't matter much here. But if the name argv_array
>> is standing in the way of using it, I think we should consider giving it
>> a more general name. I picked that not to evoke "this must be arguments"
>> but "this is terminated by a single NULL".
[...]
> This sounds reasonable - I withdraw my comment about using struct
> string_list.

Marking with #leftoverbits as a reminder to think about what such a
more general name would be (or what kind of docs to put in
argv-array.h) and make it so the next time I do a search for that
keyword.

Thanks,
Jonathan


Re: [PATCH v3 10/35] protocol: introduce enum protocol_version value protocol_v2

2018-02-26 Thread Jonathan Nieder
Hi,

Brandon Williams wrote:

> Introduce protocol_v2, a new value for 'enum protocol_version'.
> Subsequent patches will fill in the implementation of protocol_v2.
>
> Signed-off-by: Brandon Williams 
> ---

Yay!

[...]
> +++ b/builtin/fetch-pack.c
> @@ -201,6 +201,9 @@ int cmd_fetch_pack(int argc, const char **argv, const 
> char *prefix)
>  PACKET_READ_GENTLE_ON_EOF);
>  
>   switch (discover_version(&reader)) {
> + case protocol_v2:
> + die("support for protocol v2 not implemented yet");
> + break;

This code goes away in a later patch, so no need to do anything about
this, but the 'break' is redundant after the 'die'.

[...]
> --- a/builtin/receive-pack.c
> +++ b/builtin/receive-pack.c
> @@ -1963,6 +1963,12 @@ int cmd_receive_pack(int argc, const char **argv, 
> const char *prefix)
>   unpack_limit = receive_unpack_limit;
>  
>   switch (determine_protocol_version_server()) {
> + case protocol_v2:
> + /*
> +  * push support for protocol v2 has not been implemented yet,
> +  * so ignore the request to use v2 and fallback to using v0.
> +  */
> + break;

As you mentioned in the cover letter, it's probably worth doing the
same fallback on the client side (send-pack), too.

Otherwise when this client talks to a new-enough server, it would
request protocol v2 and then get confused when the server responds
with the protocol v2 it requested.

Thanks,
Jonathan


Re: [PATCH v3 02/35] pkt-line: introduce struct packet_reader

2018-02-26 Thread Jonathan Nieder
Jonathan Nieder wrote:
> Brandon Williams wrote:

>> Sometimes it is advantageous to be able to peek the next packet line
>> without consuming it (e.g. to be able to determine the protocol version
>> a server is speaking).  In order to do that introduce 'struct
>> packet_reader' which is an abstraction around the normal packet reading
>> logic.  This enables a caller to be able to peek a single line at a time
>> using 'packet_reader_peek()' and having a caller consume a line by
>> calling 'packet_reader_read()'.
>>
>> Signed-off-by: Brandon Williams 
>> ---
>>  pkt-line.c | 59 +++
>>  pkt-line.h | 58 ++
>>  2 files changed, 117 insertions(+)
>
> I like it!
>
> The questions and nits from
> https://public-inbox.org/git/20180213004937.gb42...@aiede.svl.corp.google.com/
> still apply.  In particular, the ownership of the buffers inside the
> 'struct packet_reader' is still unclear; could the packet_reader create
> its own (strbuf) buffers so that the contract around them (who is allowed
> to write to them; who is responsible for freeing them) is more obvious?

Just to be clear: I sent that review after you sent this patch, so
there should not have been any reason for me to expect the q's and
nits to magically not apply. ;-)


Re: [PATCH v3 06/35] transport: use get_refs_via_connect to get refs

2018-02-26 Thread Jonathan Nieder
Brandon Williams wrote:

> Remove code duplication and use the existing 'get_refs_via_connect()'
> function to retrieve a remote's heads in 'fetch_refs_via_pack()' and
> 'git_transport_push()'.
> 
> Signed-off-by: Brandon Williams 
> ---
>  transport.c | 18 --
>  1 file changed, 4 insertions(+), 14 deletions(-)

I like the diffstat.

[...]
> +++ b/transport.c
> @@ -230,12 +230,8 @@ static int fetch_refs_via_pack(struct transport 
> *transport,
>   args.cloning = transport->cloning;
>   args.update_shallow = data->options.update_shallow;
>  
> - if (!data->got_remote_heads) {
> - connect_setup(transport, 0);
> - get_remote_heads(data->fd[0], NULL, 0, &refs_tmp, 0,
> -  NULL, &data->shallow);
> - data->got_remote_heads = 1;
> - }
> + if (!data->got_remote_heads)
> + refs_tmp = get_refs_via_connect(transport, 0);

The only difference between the old and new code is that the old code
passes NULL as 'extra_have' and the new code passes &data->extra_have.

That means this populates the data->extra_have oid_array.  Does it
matter?

> @@ -541,14 +537,8 @@ static int git_transport_push(struct transport 
> *transport, struct ref *remote_re
>   struct send_pack_args args;
>   int ret;
>  
> - if (!data->got_remote_heads) {
> - struct ref *tmp_refs;
> - connect_setup(transport, 1);
> -
> - get_remote_heads(data->fd[0], NULL, 0, &tmp_refs, REF_NORMAL,
> -  NULL, &data->shallow);
> - data->got_remote_heads = 1;
> - }
> + if (!data->got_remote_heads)
> + get_refs_via_connect(transport, 1);

not a new problem, just curious: Does this leak tmp_refs?

Same question as the other caller about whether we mind getting
extra_have populated.

Thanks,
Jonathan


Re: [PATCH v3 02/35] pkt-line: introduce struct packet_reader

2018-02-26 Thread Jonathan Nieder
Hi,

Brandon Williams wrote:

> Sometimes it is advantageous to be able to peek the next packet line
> without consuming it (e.g. to be able to determine the protocol version
> a server is speaking).  In order to do that introduce 'struct
> packet_reader' which is an abstraction around the normal packet reading
> logic.  This enables a caller to be able to peek a single line at a time
> using 'packet_reader_peek()' and having a caller consume a line by
> calling 'packet_reader_read()'.
>
> Signed-off-by: Brandon Williams 
> ---
>  pkt-line.c | 59 +++
>  pkt-line.h | 58 ++
>  2 files changed, 117 insertions(+)

I like it!

The questions and nits from
https://public-inbox.org/git/20180213004937.gb42...@aiede.svl.corp.google.com/
still apply.  In particular, the ownership of the buffers inside the
'struct packet_reader' is still unclear; could the packet_reader create
its own (strbuf) buffers so that the contract around them (who is allowed
to write to them; who is responsible for freeing them) is more obvious?

Thanks,
Jonathan


Re: [PATCH v2 12/27] serve: introduce git-serve

2018-02-26 Thread Jonathan Nieder
Hi Duy,

Duy Nguyen wrote:
> On Fri, Jan 26, 2018 at 6:58 AM, Brandon Williams  wrote:

>> + stateless-rpc
>> +---
>> +
>> +If advertised, the `stateless-rpc` capability indicates that the server
>> +supports running commands in a stateless-rpc mode, which means that a
>> +command lasts for only a single request-response round.
>> +
>> +Normally a command can last for as many rounds as are required to
>> +complete it (multiple for negotiation during fetch or no additional
>> +trips in the case of ls-refs).  If the client sends the `stateless-rpc`
>> +capability with a value of `true` (in the form `stateless-rpc=true`)
>> +then the invoked command must only last a single round.
>
> Speaking of stateless-rpc, I remember last time this topic was brought
> up, there was some discussion to kind of optimize it for http as well,
> to fit the "client sends request, server responds data" model and
> avoid too many round trips (ideally everything happens in one round
> trip). Does it evolve to anything real? All the cool stuff happened
> while I was away, sorry if this was discussed and settled.

We have a few different ideas for improving negotiation.  They were
speculative enough that we didn't want to make them part of the
baseline protocol v2.  Feel free to poke me in a new thread. :)

Some teasers:

- allow both client and server to suggest commits in negotiation,
  instead of just the client?

- send a bloom filter for the peer to filter their suggestions
  against?

- send other basic information like maximum generation number or
  maximum commit date?

- exponential backoff in negotiation instead of linear walking?
  prioritizing ref tips?  Imitating the bitmap selection algorithm?

- at the "end" of negotiation, sending a graph data structure instead
  of a pack, to allow an extra round trip to produce a truly minimal
  pack?

Those are some initial ideas, but it's also likely that someone can
come up with some other experiments to try, too.  (E.g. we've looked
at various papers on set reconciliation, but they don't make enough
use of the graph structure to help much.)

Thanks,
Jonathan


Re: [PATCH v3 13/35] ls-refs: introduce ls-refs server command

2018-02-26 Thread Jonathan Nieder
Ævar Arnfjörð Bjarmason wrote:
> On Sat, Feb 24 2018, Jeff King jotted:

>> I actually wonder if we should just specify that the patterns must
>> _always_ be fully-qualified, but may end with a single "/*" to iterate
>> over wildcards. Or even simpler, that "refs/heads/foo" would find that
>> ref itself, and anything under it.
>
> I agree that this is a very good trade-off for now, but I think having
> an escape hatch makes sense. It looks like the protocol is implicitly
> extendible since another parameter could be added, but maybe having such
> a parameter from the get-go would make sense:

I prefer to rely on the implicit extensibility (following the general
YAGNI principle).

In other words, we can introduce a pattern-type later and make the
current pattern-type the default.

Thanks for looking to the future.

[...]
> E.g. if the refs were stored indexed using the method described at
> https://swtch.com/~rsc/regexp/regexp4.html tail matching becomes no less
> efficient than prefix matching, but a function of how many trigrams in
> your index match the pattern given.

I think the nearest planned change to ref storage is [1], which is
still optimized for prefix matching.  Longer term, maybe some day
we'll want a secondary index that supports infix matching, or maybe
we'll never need it. :)

Sincerely,
Jonathan

[1] 
https://public-inbox.org/git/CAJo=hjszcam9sipdvr7tmd-fd2v2w6_pvmq791egcdsdkq0...@mail.gmail.com/#t


Re: [BUG] [git 2.16.1] yeeek ... my files are gone .. by git pull

2018-02-22 Thread Jonathan Nieder
Hi Marcel,

"Marcel 'childNo͡.de' Trautwein" wrote:

> I think we have a problem … or at least I had
> and I’m not quite sure if this is „working as designed“
> but I’m sure it „should not work as it did“.
[...]
> I wanted to clone another repository … but yeah … it’s late for me today and 
> I put
> in s.th. `git pull 
> g...@private.gitlab.instance.example.com:aGroup/repository.git`
>
> next … all committed files are zapped and the repository given has
> been checked out in my home directory 🤯👻
>
> what? Shouldn’t this just fail? Why can I pass another remote to pull?

Sorry, this is not the most helpful reply but:

Can you describe a reproduction recipe so that I can experience the
same thing?

That is:

 1. steps to reproduce
 2. expected result
 3. actual result
 4. the difference and why it was unexpected

I suspect that this information is in your message, somewhere, but it
is (understandably) unfocussed and I am having trouble pulling it out.

[...]
> trying to fix this up by doing another pull failed:
> ```
> -bash:$ git remote -v
> origing...@bitbucket.org:childnode/marcel.git (fetch)
> origing...@bitbucket.org:childnode/marcel.git (push)
>
> -bash:$ git pull
> fatal: refusing to merge unrelated histories

Ok, this part is something I might be able to help shed some light on.

Searching for 'unrelated' in "git help pull" finds:

   --allow-unrelated-histories
   By default, git merge command refuses to merge histories that do not
   share a common ancestor. This option can be used to override this
   safety when merging histories of two projects that started their
   lives independently. As that is a very rare occasion, no
   configuration variable to enable this by default exists and will not
   be added.

So that explains the "what" of that error message.

The "why" is a separate question.  Could you share output from

  git log --all --graph --decorate --oneline --simplify-by-decoration

and

  git status

to help us understand your current state?

Also, suggestions for improvements to the 'refusing to merge' message
would be very welcome.

Thanks and hope that helps,
Jonathan


Re: [PATCH v3 04/35] upload-pack: convert to a builtin

2018-02-22 Thread Jonathan Nieder
Jeff King wrote:

> All of that said, I think the current code is quite dangerous already,
> and maybe even broken.  upload-pack may run sub-commands like rev-list
> or pack-objects, which are themselves builtins.

Sounds like more commands to set the IGNORE_PAGER_CONFIG flag for in
git.c.

Thanks for looking this over thoughtfully.

[...]
> I couldn't quite get it to work, but I think it's because I'm doing
> something wrong with the submodules. But I also think this attack would
> _have_ to be done over ssh, because on a local system the submodule
> clone would a hard-link rather than a real fetch.

What happens if the submodule URL starts with file://?

Thanks,
Jonathan


Re: [PATCH v3 04/35] upload-pack: convert to a builtin

2018-02-22 Thread Jonathan Nieder
Jeff King wrote:

> The current property is that it's safe to fetch from an
> untrusted repository, even over ssh. If we're keeping that for protocol
> v1, we'd want it to apply to protocol v2, as well.

Ah, this is what I had been missing (the non-ssh case).

I see your point.  I think we need to fix the pager config issue and add
some clarifying documentation to git.c so that people know what to look
out for.

Keep in mind that git upload-archive (a read-only command, just like
git upload-pack) also already has the same issues.

Thanks,
Jonathan


Re: [PATCH v3 04/35] upload-pack: convert to a builtin

2018-02-22 Thread Jonathan Nieder
Hi,

Jeff King wrote:
> On Thu, Feb 22, 2018 at 11:38:14AM -0800, Jonathan Nieder wrote:

>> To be clear, which of the following are you (most) worried about?
>>
>>  1. being invoked with --help and spawning a pager
>>  2. receiving and acting on options between 'git' and 'upload-pack'
>>  3. repository discovery
>>  4. pager config
>>  5. alias discovery
>>  6. increased code surface / unknown threats
>
> My immediate concern is (4).

Thanks for clarifying.

>  But my greater concern is that people who
> work on git.c should not have to worry about accidentally violating this
> principle when they add a new feature or config option.

That sounds like a combination of (6) and insufficient documentation
or tests.  Ideas for how we can help prevent such accidents?

> In other words, it seems like an accident waiting to happen. I'd be more
> amenable to it if there was some compelling reason for it to be a
> builtin, but I don't see one listed in the commit message. I see only
> "let's make it easier to share the code", which AFAICT is equally served
> by just lib-ifying the code and calling it from the standalone
> upload-pack.c.

If we have so little control of the common code used by git commands
that could be invoked by a remote user, I think we're in trouble
already.  I don't think being a builtin vs not makes that
significantly different, since there are plenty of builtins that can
be triggered by remote users.  Further, if we have so little control
over the security properties of git.c, what hope do we have of making
the rest of libgit.a usable in secure code?

In other words, having to pay more attention to the git wrapper from a
security pov actually feels to me like a *good* thing.  The git
wrapper is the entry point to almost all git commands.  If it is an
accident waiting to happen, then anything that calls git commands is
already an accident waiting to happen.  So how can we make it not an
accident waiting to happen? :)

>> Although in most setups the user does not control the config files on
>> a server, item (4) looks like a real issue worth solving.  I think we
>> should introduce a flag to skip looking for pager config.  We could
>> use it for receive-pack, too.
>
> There's not much point for receive-pack. It respects hooks, so any
> security ship has already sailed there.

Yet there are plenty of cases where people who can push are not
supposed to have root privilege.  I am not worried about hooks
specifically (although the changes described at [1] might help and I
still plan to work on those) but I am worried about e.g. commandline
injection issues.  I don't think we can treat receive-pack as out of
scope.

And to be clear, I don't think you were saying receive-pack *is* out
of scope.  But you seem to be trying to draw some boundary, where I
only see something fuzzier (e.g. if a bug only applies to
receive-pack, then that certainly decreases its impact, but it doesn't
make the impact go away).

Thanks,
Jonathan

[1] https://public-inbox.org/git/20171002234517.gv19...@aiede.mtv.corp.google.com/


Re: [PATCH v3 04/35] upload-pack: convert to a builtin

2018-02-22 Thread Jonathan Nieder
Hi,

Jeff King wrote:
> On Thu, Feb 22, 2018 at 10:07:15AM -0800, Brandon Williams wrote:
>> On 02/22, Jeff King wrote:
>>> On Wed, Feb 21, 2018 at 01:44:22PM -0800, Jonathan Tan wrote:
>>>> On Tue,  6 Feb 2018 17:12:41 -0800
>>>> Brandon Williams  wrote:
>>>>
>>>>> In order to allow for code sharing with the server-side of fetch in
>>>>> protocol-v2 convert upload-pack to be a builtin.
[...]
>>>> As Stefan mentioned in [1], also mention in the commit message that this
>>>> means that the "git-upload-pack" invocation gains additional
>>>> capabilities (for example, invoking a pager for --help).
>>>
>>> And possibly respecting pager.upload-pack, which would violate our rule
>>> that it is safe to run upload-pack in untrusted repositories.
>>
>> And this isn't an issue with receive-pack because this same guarantee
>> doesn't exist?
>
> Yes, exactly (which is confusing and weird, yes, but that's how it is).

To be clear, which of the following are you (most) worried about?

 1. being invoked with --help and spawning a pager
 2. receiving and acting on options between 'git' and 'upload-pack'
 3. repository discovery
 4. pager config
 5. alias discovery
 6. increased code surface / unknown threats

For (1), "--help" has to be the first argument.  "git daemon" passes
--strict so it doesn't happen there.  "git http-backend" passes
--stateless-rpc so it doesn't happen there.  "git shell" sanitizes
input to avoid it happening there.

A custom setup could provide their own entry point that doesn't do
such sanitization.  I think that in some sense it's out of our hands,
but it would be nice to harden as described upthread.

For (2), I am having trouble imagining a setup where it would happen.

upload-pack doesn't have the RUN_SETUP or RUN_SETUP_GENTLY flag set,
so (3) doesn't apply.

Although in most setups the user does not control the config files on
a server, item (4) looks like a real issue worth solving.  I think we
should introduce a flag to skip looking for pager config.  We could
use it for receive-pack, too.

Builtins are handled before aliases, so (5) doesn't apply.

(6) is a real issue: it is why "git shell" is not a builtin, for
example.  But I would rather that we use appropriate sanitization
before upload-pack is invoked than live in fear.  git upload-pack is
sufficiently complicated that I don't think the git.c wrapper
increases the complexity by a significant amount.

That leaves me with a personal answer of only being worried about (4)
and not the rest.  What do you think?  Is one of the other items I
listed above worrisome, or is there another item I missed?

Thanks,
Jonathan


Re: [PATCH 26/27] sha1_file: allow map_sha1_file to handle arbitrary repositories

2018-02-21 Thread Jonathan Nieder
Stefan Beller wrote:

> Signed-off-by: Stefan Beller 
> Signed-off-by: Jonathan Nieder 
> ---
>  object-store.h | 3 +--
>  sha1_file.c| 5 +++--
>  2 files changed, 4 insertions(+), 4 deletions(-)

Thanks: this is short and simple, making my life as a reviewer very
easy.

Reviewed-by: Jonathan Nieder 


Re: [PATCH 21/27] sha1_file: add repository argument to sha1_loose_object_info

2018-02-21 Thread Jonathan Nieder
Hi,

Stefan Beller wrote:

> Add a repository argument to allow the sha1_loose_object_info caller
> to be more specific about which repository to act on. This is a small
> mechanical change; it doesn't change the implementation to handle
> repositories other than the_repository yet.
>
> As with the previous commits, use a macro to catch callers passing a
> repository other than the_repository at compile time.
>
> Signed-off-by: Stefan Beller 
> Signed-off-by: Jonathan Nieder 
> ---
>  sha1_file.c | 9 +
>  1 file changed, 5 insertions(+), 4 deletions(-)

Simple and obviously correct.
Reviewed-by: Jonathan Nieder 


Re: [PATCH 17/27] sha1_file: add repository argument to stat_sha1_file

2018-02-21 Thread Jonathan Nieder
Hi,

Stefan Beller wrote:

> Add a repository argument to allow the stat_sha1_file caller to be
> more specific about which repository to act on. This is a small
> mechanical change; it doesn't change the implementation to handle
> repositories other than the_repository yet.
>
> As with the previous commits, use a macro to catch callers passing a
> repository other than the_repository at compile time.
>
> Signed-off-by: Stefan Beller 
> Signed-off-by: Jonathan Nieder 
> ---
>  sha1_file.c | 9 +
>  1 file changed, 5 insertions(+), 4 deletions(-)

Simple and obviously correct.
Reviewed-by: Jonathan Nieder 


Re: [PATCH 13/27] sha1_file: add repository argument to prepare_alt_odb

2018-02-21 Thread Jonathan Nieder
Hi,

Stefan Beller wrote:

> See previous patch for explanation.
>
> While at it, move the declaration to object-store.h,
> where it should be easier to find.

Which declaration?  It looks like prepare_alt_odb is already in
object-store.h.

[...]
> --- a/builtin/fsck.c
> +++ b/builtin/fsck.c
> @@ -717,7 +717,7 @@ int cmd_fsck(int argc, const char **argv, const char 
> *prefix)
>   } else {
>   fsck_object_dir(get_object_directory());
>  
> - prepare_alt_odb();
> + prepare_alt_odb(the_repository);

Patch 2 added a #include of "repository.h".  Good.

(I checked because with the definition of prepare_alt_odb as a macro,
the function call would compile correctly even if the_repository
weren't in scope, but we want to include what we use as a matter of
style/maintainability.)

[...]
> --- a/packfile.c
> +++ b/packfile.c
> @@ -890,7 +890,7 @@ void prepare_packed_git(void)
>   if (the_repository->objects.packed_git_initialized)
>   return;
>   prepare_packed_git_one(get_object_directory(), 1);
> - prepare_alt_odb();
> + prepare_alt_odb(the_repository);

Also good, since patch 3 added a #include of "repository.h".

[...]
> --- a/sha1_file.c
> +++ b/sha1_file.c
> @@ -23,6 +23,7 @@
>  #include "sha1-lookup.h"
>  #include "bulk-checkin.h"
>  #include "repository.h"
> +#include "object-store.h"

Should this #include have been added in an earlier patch, since the
file both uses and defines prepare_alt_odb, which is declared there?

The rest looks good.

Thanks,
Jonathan


Re: [PATCH 05/27] object-store: move packed_git and packed_git_mru to object store

2018-02-21 Thread Jonathan Nieder
Hi,

Stefan Beller wrote:

> In a process with multiple repositories open, packfile accessors
> should be associated to a single repository and not shared globally.
> Move packed_git and packed_git_mru into the_repository and adjust
> callers to reflect this.
>
> Patch generated by
>
>  1. Moving the struct packed_git declaration to object-store.h
> and packed_git, packed_git_mru globals to struct object_store.
>
>  2. Applying the semantic patch
> contrib/coccinelle/refactoring/packed_git.cocci to adjust callers.
> This semantic patch is placed in a sub directory of the coccinelle
> contrib dir, as this semantic patch is not expected to be of general
> usefulness; it is only useful during developing this series and
> merging it with other topics in flight. At a later date, just
> delete that semantic patch.

Can the semantic patch go in the commit message instead?  It is very
brief.

Actually, I don't see this semantic patch in the diffstat.  Is the
commit message stale?
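For reference, a semantic patch for this kind of global-to-member migration is typically just a pair of rewrites; this is a guess at its contents, since the file is not shown in the diffstat:

```
@@
@@
- packed_git
+ the_repository->objects.packed_git

@@
@@
- packed_git_mru
+ the_repository->objects.packed_git_mru
```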

>  3. Applying line wrapping fixes from "make style" to break the
> resulting long lines.
>
>  4. Adding missing #includes of repository.h and object-store.h
> where needed.

Is there a way to automate this step?  (I'm asking for my own
reference when writing future patches, not because of any concern
about the correctness of this one.)
>
>  5. As the packfiles are now owned by an objectstore/repository, which
> is ephemeral unlike globals, we introduce memory leaks. So address
> them in raw_object_store_clear().

The compound words are confusing me.  What is an
objectstore/repository?  Are these referring to particular identifiers
or something else?

Would some wording like the following work?

   5. Freeing packed_git and packed_git_mru in raw_object_store_clear
  to avoid a per-repository memory leak.  Previously they were
  global singletons, so code to free them did not exist.

[...]
> --- a/builtin/index-pack.c
> +++ b/builtin/index-pack.c
> @@ -12,6 +12,7 @@
>  #include "exec_cmd.h"
>  #include "streaming.h"
>  #include "thread-utils.h"
> +#include "object-store.h"
>  #include "packfile.h"
>  
>  static const char index_pack_usage[] =

Change from a different patch leaked into this one?

[...]
> +++ b/builtin/pack-objects.c
[...]
> @@ -1044,7 +1045,7 @@ static int want_object_in_pack(const struct object_id 
> *oid,
>   }
>   want = want_found_object(exclude, p);
>   if (!exclude && want > 0)
> - list_move(&p->mru, &packed_git_mru);
> + list_move(&p->mru, 
> &the_repository->objects.packed_git_mru);

Long line.  Can "make style" catch this?

[...]
> +++ b/builtin/receive-pack.c
> @@ -7,6 +7,7 @@
>  #include "sideband.h"
>  #include "run-command.h"
>  #include "exec_cmd.h"
> +#include "object-store.h"
>  #include "commit.h"
>  #include "object.h"
>  #include "remote.h"

Another change leaked in?

[...]
> --- a/cache.h
> +++ b/cache.h
> @@ -1585,35 +1585,6 @@ struct pack_window {
>   unsigned int inuse_cnt;
>  };
>  
> -extern struct packed_git {
[...]
> -} *packed_git;

Move detecting diff confirms that this wasn't modified.  Thanks for
creating it.

[...]
> +++ b/fast-import.c
[...]
> @@ -1110,7 +1112,7 @@ static int store_object(
>   if (e->idx.offset) {
>   duplicate_count_by_type[type]++;
>   return 1;
> - } else if (find_sha1_pack(oid.hash, packed_git)) {
> + } else if (find_sha1_pack(oid.hash, 
> the_repository->objects.packed_git)) {

Long line.  (I'll refrain from commenting about any further ones.)

[...]
> +++ b/http-push.c
> @@ -1,4 +1,5 @@
>  #include "cache.h"
> +#include "object-store.h"
>  #include "commit.h"
>  #include "tag.h"
>  #include "blob.h"

Stray change?

> diff --git a/http-walker.c b/http-walker.c
> index 07c2b1af82..8bb5d991bb 100644
> --- a/http-walker.c
> +++ b/http-walker.c
> @@ -4,6 +4,7 @@
>  #include "http.h"
>  #include "list.h"
>  #include "transport.h"
> +#include "object-store.h"
>  #include "packfile.h"
>  
>  struct alt_base {

Same question.

> diff --git a/http.c b/http.c
> index 31755023a4..a4a9e583c7 100644
> --- a/http.c
> +++ b/http.c
> @@ -1,6 +1,7 @@
>  #include "git-compat-util.h"
>  #include "http.h"
>  #include "config.h"
> +#include "object-store.h"
>  #include "pack.h"
>  #include "sideband.h"
>  #include "run-command.h"

Likewise.

> diff --git a/object-store.h b/object-store.h
> index e78eea1dde..1de9e07102 100644
> --- a/object-store.h
> +++ b/object-store.h
> @@ -52,6 +52,30 @@ void add_to_alternates_memory(const char *dir);
>   */
>  struct strbuf *alt_scratch_buf(struct alternate_object_database *alt);
>  
> +struct packed_git {
> + struct packed_git *next;
> + struct list_head mru;
> + struct pack_window *windows;
> + off_t pack_size;
> + const void *index_data;
> + size_t index_size;
> + uint32_t num_objects;
> + uint32_t num_bad_objects
[...]


Re: [PATCH 01/27] repository: introduce raw object store field

2018-02-21 Thread Jonathan Nieder
Hi,

Stefan Beller wrote:

> The raw object store field will contain any objects needed for
> access to objects in a given repository.
>
> This patch introduces the raw object store and populates it with the
> `objectdir`, which used to be part of the repository struct.
>
> As the struct gains members, we'll also populate the function to clear
> the memory for these members.
>
> In a later patch we'll introduce a struct object_parser that will
> complement the object parsing in a repository struct: the raw object
> parser is the layer that will provide access to raw object content,
> while the higher-level object parser code will parse raw objects and
> keep track of parenthood and other object relationships using
> 'struct object'.
> For now only add the lower level to the repository struct.
>
> Signed-off-by: Stefan Beller 
> Signed-off-by: Jonathan Nieder 
> ---

Heh, I suppose that sign-off makes me not a great candidate for
reviewing this.  It's been long enough since I looked at the patch
that I feel okay trying anyway.

[...]
> --- /dev/null
> +++ b/object-store.h
> @@ -0,0 +1,15 @@
> +#ifndef OBJECT_STORE_H
> +#define OBJECT_STORE_H
> +
> +struct raw_object_store {
> + /*
> +  * Path to the repository's object store.
> +  * Cannot be NULL after initialization.
> +  */
> + char *objectdir;
> +};
> +#define RAW_OBJECT_STORE_INIT { NULL }

Who owns and is responsible for freeing objectdir?

> +
> +void raw_object_store_clear(struct raw_object_store *o);

Oh, that answers that.

It could be worth a note in the doc comment anyway, but I don't mind
not having one.

[...]
> +
> +void raw_object_store_clear(struct raw_object_store *o)
> +{
> + free(o->objectdir);
> +}

Should this use FREE_AND_NULL?

[...]
> --- a/repository.c
> +++ b/repository.c
[...]
> @@ -42,8 +49,8 @@ static void repo_setup_env(struct repository *repo)
>   !repo->ignore_env);
>   free(repo->commondir);
>   repo->commondir = strbuf_detach(&sb, NULL);
> - free(repo->objectdir);
> - repo->objectdir = git_path_from_env(DB_ENVIRONMENT, repo->commondir,
> + free(repo->objects.objectdir);

Should this call raw_object_store_clear instead of calling free
directly?

> + repo->objects.objectdir = git_path_from_env(DB_ENVIRONMENT, 
> repo->commondir,
>   "objects", !repo->ignore_env);

Long line.  One way to break it would be

repo->objects.objectdir =
git_path_from_env(DB_ENVIRONMENT, repo->commondir,
  "objects", !repo->ignore_env);

>   free(repo->graft_file);
>   repo->graft_file = git_path_from_env(GRAFT_ENVIRONMENT, repo->commondir,
> @@ -209,12 +216,14 @@ void repo_clear(struct repository *repo)
>  {
>   FREE_AND_NULL(repo->gitdir);
>   FREE_AND_NULL(repo->commondir);
> - FREE_AND_NULL(repo->objectdir);
>   FREE_AND_NULL(repo->graft_file);
>   FREE_AND_NULL(repo->index_file);
>   FREE_AND_NULL(repo->worktree);
>   FREE_AND_NULL(repo->submodule_prefix);
>  
> + raw_object_store_clear(&repo->objects);
> + memset(&repo->objects, 0, sizeof(repo->objects));

If raw_object_store_clear uses FREE_AND_NULL, this memset won't be necessary.

[...]
> --- a/repository.h
> +++ b/repository.h
> @@ -1,6 +1,8 @@
>  #ifndef REPOSITORY_H
>  #define REPOSITORY_H
>  
> +#include "object-store.h"
> +
>  struct config_set;
>  struct index_state;
>  struct submodule_cache;
> @@ -21,10 +23,9 @@ struct repository {
>   char *commondir;
>  
>   /*
> -  * Path to the repository's object store.
> -  * Cannot be NULL after initialization.
> +  * Holds any information related to the object store.
>*/

This comment doesn't make it clear to me what the field represents.
Can it be replaced with some of the description from the commit
message?

> - char *objectdir;
> + struct raw_object_store objects;
>  

Thanks,
Jonathan


Re: bug in HTTP protocol spec

2018-02-21 Thread Jonathan Nieder
(+cc: bmwill@, who is working on protocol v2)

Hi,

Dorian Taylor wrote:
> On Feb 21, 2018, at 2:15 PM, Jeff King  wrote:

>> Thanks, I agree the document is buggy. Do you want to submit a patch?
>
> Will this do?

Thanks for writing it.

Do you mind if we forge your sign-off?  (See Documentation/SubmittingPatches
item '(5) Certify your work' for details about what this means.)

> Note I am not sure what the story is behind that `version 1`
> element, whether it's supposed to go before or after the null packet
> or if there should be another null packet or what. Perhaps somebody
> more fluent with the smart protocol can advise.

I believe the 'version 1' goes after the flush-packet.
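That is, for the smart HTTP ref advertisement the stream would look roughly like this (the ref line and its pkt-line length are illustrative, not copied from the spec):

```
S: 001e# service=git-upload-pack\n
S: 0000
S: 000eversion 1\n
S: 0048<obj-id> refs/heads/master\0multi_ack side-band-64k\n
S: ...
S: 0000
```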

Thanks,
Jonathan


Re: Bug Report: git status triggers a file change event

2018-02-21 Thread Jonathan Nieder
+git-for-windows

Hi,

Raining Chain wrote:

> On Windows 10, git version 2.16.2.windows.1, running the command
>
> git status
>
> will trigger a file change event to file C:\myPath\.git  "Attributes changed."
>
> This causes problems when using scripts that detect file changes such
> as tsc -w (Typescript compiler) and using softwares that regularly
> call git status such as VisualStudioCode.
>
> Thanks.

Can you say more about how "tsc -w" reacts to the file change?  Is there
a way to tell it to exclude particular files from the files it watches?

Thanks,
Jonathan


Re: [PATCH 04/26] upload-pack: convert to a builtin

2018-02-21 Thread Jonathan Nieder
Brandon Williams wrote:
> On 01/03, Stefan Beller wrote:
> > On Tue, Jan 2, 2018 at 4:18 PM, Brandon Williams  wrote:

>>> In order to allow for code sharing with the server-side of fetch in
>>> protocol-v2 convert upload-pack to be a builtin.
>>
>> What is the security aspect of this patch?
>>
>> By making upload-pack builtin, it gains additional abilities,
>> such as answers to '-h' or '--help' (which would start a pager).
>> Is there an easy way to sooth my concerns? (best put into the
>> commit message)
>
> receive-pack is already a builtin, so there's that.

*nod*

Since v2.4.12~1^2 (shell: disallow repo names beginning with dash,
2017-04-29), git-shell refuses to pass --help to upload-pack, limiting
the security impact in configurations that use git-shell (e.g.
gitolite installations).

If you're not using git-shell, then hopefully you have some other form
of filtering preventing arbitrary options being passed to
git-upload-pack.  If you don't, then you're in trouble, for the
reasons described in that commit.

Since some installations may be allowing access to git-upload-pack
(for read-only access) without allowing access to git-receive-pack,
this does increase the chance of attack.  On the other hand, I suspect
the maintainability benefit is worth it.

For defense in depth, it would be comforting if the git wrapper had
some understanding of "don't support --help in handle_builtin when
invoked as a dashed command".  That is, I don't expect that anyone has
been relying on

git-add --help

acting like

git help add

instead of printing the usage message from

git add -h

It's a little fussy because today we rewrite "git add --help" to
"git-add --help" before rewriting it to "git help add"; we'd have to
skip that middle hop for this to work.

I don't think that has to block this patch or series, though --- it's
just a separate thought about hardening.

Cc-ing Sitaram Chamarty since he tends to be wiser about this kind of
thing than I am.

What do you think?

Thanks,
Jonathan


Re: [PATCHv2 00/16] Moving global state into the repository object (part 1)

2018-02-16 Thread Jonathan Nieder
Hi,

Stefan Beller wrote:

> v2:
> * duplicated the 'ignore_env' bit into the object store as well
> * the #define trick no longer works as we do not have a "the_objectstore" 
> global,
>   which means there is just one patch per function that is converted.
>   As this follows the same structure of the previous series, I am still 
> confident
>   there is no hidden dependencies to globals outside the object store in these
>   converted functions.
> * rebased on top of current master, resolving the merge conflicts.
>   I think I used the list.h APIs right, but please double check.

For what it's worth, I think I prefer v1.  I put some comments on why
on patch 0 of v1 and would be interested in your thoughts on them
(e.g. as a reply to that).  I also think that even if we want to
switch to a style that passes around object_store separately from
repository, it is easier to do the migration in two steps: first get
rid of hidden dependencies on the_repository, then do the (simpler)
automatic migration from

 f(the_repository)

to

 f(the_repository->object_store)

*afterwards*.

Thoughts?

Thanks,
Jonathan


Re: [PATCH 3/8] perl: generalize the Git::LoadCPAN facility

2018-02-16 Thread Jonathan Nieder
Hi,

Todd Zullinger wrote:

> Subject: [PATCH] Makefile: add NO_PERL_CPAN_FALLBACKS to disable fallback 
> module install
>
> As noted in perl/Git/LoadCPAN.pm, operating system packages often don't
> want to ship Git::FromCPAN tree at all, preferring to use their own
> packaging of CPAN modules instead.  Allow such packagers to set
> NO_PERL_CPAN_FALLBACKS to easily avoid installing these fallback perl
> CPAN modules.
>
> Signed-off-by: Todd Zullinger 
> ---
>  Makefile| 6 ++
>  perl/{Git => }/FromCPAN/.gitattributes  | 0
>  perl/{Git => }/FromCPAN/Error.pm| 0
>  perl/{Git => }/FromCPAN/Mail/.gitattributes | 0
>  perl/{Git => }/FromCPAN/Mail/Address.pm | 0
>  5 files changed, 6 insertions(+)
>  rename perl/{Git => }/FromCPAN/.gitattributes (100%)
>  rename perl/{Git => }/FromCPAN/Error.pm (100%)
>  rename perl/{Git => }/FromCPAN/Mail/.gitattributes (100%)
>  rename perl/{Git => }/FromCPAN/Mail/Address.pm (100%)

Nice!  I like it.

[...]
> +++ b/Makefile
> @@ -296,6 +296,9 @@ all::
>  #
>  # Define NO_PERL if you do not want Perl scripts or libraries at all.
>  #
> +# Define NO_PERL_CPAN_FALLBACKS if you do not want to install fallbacks for
> +# perl CPAN modules.

nit: Looking at this as a naive user, it's not obvious what kind of
fallbacks are meant. How about:

Define NO_PERL_CPAN_FALLBACKS if you do not want to install bundled
copies of CPAN modules that serve as a fallback in case the modules are
not available on the system.

Or perhaps:

Define HAVE_CPAN_MODULES if you have Error.pm and Mail::Address
installed and do not want to install the fallback copies from
perl/FromCPAN.

Would this patch need to update the loader to expect the modules in
the new location?

Thanks,
Jonathan


Re: [PATCH 8/8] perl: hard-depend on the File::{Temp,Spec} modules

2018-02-14 Thread Jonathan Nieder
Ævar Arnfjörð Bjarmason wrote:

> --- a/perl/Git.pm
> +++ b/perl/Git.pm
> @@ -1324,8 +1324,9 @@ sub _temp_cache {
>  }
>  
>  sub _verify_require {
> - eval { require File::Temp; require File::Spec; };
> - $@ and throw Error::Simple($@);
> + require File::Temp;
> + require File::Spec;
> + return;

Same question as in the other patches: any reason not to simplify by
using 'use' at the top of the file instead?

Thanks,
Jonathan


Re: [PATCH 7/8] gitweb: hard-depend on the Digest::MD5 5.8 module

2018-02-14 Thread Jonathan Nieder
Ævar Arnfjörð Bjarmason wrote:

> Signed-off-by: Ævar Arnfjörð Bjarmason 
> ---
>  gitweb/INSTALL |  3 +--
>  gitweb/gitweb.perl | 17 +
>  2 files changed, 6 insertions(+), 14 deletions(-)

Makes sense, and I like the diffstat.

[...]
> +++ b/gitweb/gitweb.perl
[...]
> @@ -1166,18 +1165,11 @@ sub configure_gitweb_features {
>   our @snapshot_fmts = gitweb_get_feature('snapshot');
>   @snapshot_fmts = filter_snapshot_fmts(@snapshot_fmts);
>  
> - # check that the avatar feature is set to a known provider name,
> - # and for each provider check if the dependencies are satisfied.
> - # if the provider name is invalid or the dependencies are not met,
> - # reset $git_avatar to the empty string.
> + # check that the avatar feature is set to a known provider
> + # name, if the provider name is invalid, reset $git_avatar to
> + # the empty string.

Comma splice.  Formatting as sentences would make this easier to read,
anyway:

# Check that the avatar feature is set to a known provider name.
# If the provider name is invalid, reset $git_avatar to the empty
# string.

Even better would be to remove the comment altogether.  The code is
clear enough on its own and the comment should not be needed now that
it is a one-liner.

[...]
> @@ -2165,6 +2157,7 @@ sub picon_url {
>  sub gravatar_url {
>   my $email = lc shift;
>   my $size = shift;
> + require Digest::MD5;

Same question as the previous patch: how do we decide whether to 'use'
or to 'require' in cases like this?

Thanks,
Jonathan


Re: [PATCH 6/8] git-send-email: unconditionally use Net::{SMTP,Domain}

2018-02-14 Thread Jonathan Nieder
Ævar Arnfjörð Bjarmason wrote:

> The Net::SMTP and Net::Domain were both first released with perl
> v5.7.3, since my d48b284183 ("perl: bump the required Perl version to
> 5.8 from 5.6.[21]", 2010-09-24) we've depended on 5.8, so there's no
> reason to conditionally require this anymore.
>
> This conditional loading was initially added in
> 87840620fd ("send-email: only 'require' instead of 'use' Net::SMTP",
> 2006-06-01) for Net::SMTP and 134550fe21 ("git-send-email.perl - try
> to give real name of the calling host to HELO/EHLO", 2010-03-14) for
> Net::Domain, both of which predate the hard dependency on 5.8.
> 
> Signed-off-by: Ævar Arnfjörð Bjarmason 
> ---
>  git-send-email.perl | 24 +++-
>  1 file changed, 11 insertions(+), 13 deletions(-)
> 
> diff --git a/git-send-email.perl b/git-send-email.perl
> index 85bb6482f2..69bd443245 100755
> --- a/git-send-email.perl
> +++ b/git-send-email.perl
> @@ -1143,10 +1143,9 @@ sub valid_fqdn {
>  sub maildomain_net {
>   my $maildomain;
>  
> - if (eval { require Net::Domain; 1 }) {
> - my $domain = Net::Domain::domainname();
> - $maildomain = $domain if valid_fqdn($domain);
> - }
> + require Net::Domain;
> + my $domain = Net::Domain::domainname();
> + $maildomain = $domain if valid_fqdn($domain);

Now that we indeed require the module, any reason not to 'use' it?
E.g. is it particularly expensive to load?

I haven't checked the assertions above about minimal perl versions
including these modules, but I assume they're true. :)  So this looks
like a good change.

Thanks,
Jonathan


Re: [PATCH 5/8] perl: update our copy of Mail::Address

2018-02-14 Thread Jonathan Nieder
Ævar Arnfjörð Bjarmason wrote:

> Update our copy of Mail::Address from 2.19 (Aug 22, 2017) to 2.20 (Jan
> 23, 2018). This should be a trivial update[1] but it seems the version
> Matthieu Moy imported in bd869f67b9 ("send-email: add and use a local
> copy of Mail::Address", 2018-01-05) doesn't correspond to any 2.19
> version found on the CPAN. From the comment at the top of the file it
> looks like some OS version with the POD stripped, and with different
> indentation.

Were there changes other than the POD stripping?

> Let's instead use the upstream version as-is, and without copyright
> notices stripped. Like Error.pm this doesn't cleanly pass --check, so
> add a .gitattributes file to ignore the errors.
>
> 1. 
> https://metacpan.org/diff/file?target=MARKOV/MailTools-2.20/lib%2FMail%2FAddress.pod&source=MARKOV%2FMailTools-2.19%2Flib%2FMail%2FAddress.pod
>
> Signed-off-by: Ævar Arnfjörð Bjarmason 
> ---
>  perl/Git/FromCPAN/Mail/.gitattributes |   1 +
>  perl/Git/FromCPAN/Mail/Address.pm | 436 
> +-
>  2 files changed, 163 insertions(+), 274 deletions(-)
>  create mode 100644 perl/Git/FromCPAN/Mail/.gitattributes

Yikes re the stripped POD with license notice. Thanks for fixing it.

Reviewed-by: Jonathan Nieder 

