Re: [PATCH 8/8] perl: hard-depend on the File::{Temp,Spec} modules

2018-02-14 Thread Jonathan Nieder
Ævar Arnfjörð Bjarmason wrote:

> --- a/perl/Git.pm
> +++ b/perl/Git.pm
> @@ -1324,8 +1324,9 @@ sub _temp_cache {
>  }
>  
>  sub _verify_require {
> - eval { require File::Temp; require File::Spec; };
> - $@ and throw Error::Simple($@);
> + require File::Temp;
> + require File::Spec;
> + return;

Same question as in the other patches: any reason not to simplify by
using 'use' at the top of the file instead?

Thanks,
Jonathan


Re: [PATCH 3/8] perl: generalize the Git::LoadCPAN facility

2018-02-16 Thread Jonathan Nieder
Hi,

Todd Zullinger wrote:

> Subject: [PATCH] Makefile: add NO_PERL_CPAN_FALLBACKS to disable fallback 
> module install
>
> As noted in perl/Git/LoadCPAN.pm, operating system packages often don't
> want to ship Git::FromCPAN tree at all, preferring to use their own
> packaging of CPAN modules instead.  Allow such packagers to set
> NO_PERL_CPAN_FALLBACKS to easily avoid installing these fallback perl
> CPAN modules.
>
> Signed-off-by: Todd Zullinger 
> ---
>  Makefile| 6 ++
>  perl/{Git => }/FromCPAN/.gitattributes  | 0
>  perl/{Git => }/FromCPAN/Error.pm| 0
>  perl/{Git => }/FromCPAN/Mail/.gitattributes | 0
>  perl/{Git => }/FromCPAN/Mail/Address.pm | 0
>  5 files changed, 6 insertions(+)
>  rename perl/{Git => }/FromCPAN/.gitattributes (100%)
>  rename perl/{Git => }/FromCPAN/Error.pm (100%)
>  rename perl/{Git => }/FromCPAN/Mail/.gitattributes (100%)
>  rename perl/{Git => }/FromCPAN/Mail/Address.pm (100%)

Nice!  I like it.

[...]
> +++ b/Makefile
> @@ -296,6 +296,9 @@ all::
>  #
>  # Define NO_PERL if you do not want Perl scripts or libraries at all.
>  #
> +# Define NO_PERL_CPAN_FALLBACKS if you do not want to install fallbacks for
> +# perl CPAN modules.

nit: Looking at this as a naive user, it's not obvious what kind of
fallbacks are meant. How about:

Define NO_PERL_CPAN_FALLBACKS if you do not want to install bundled
copies of CPAN modules that serve as a fallback in case the modules are
not available on the system.

Or perhaps:

Define HAVE_CPAN_MODULES if you have Error.pm and Mail::Address
installed and do not want to install the fallback copies from
perl/FromCPAN.

Would this patch need to update the loader to expect the modules in
the new location?

Thanks,
Jonathan


Re: [PATCHv2 00/16] Moving global state into the repository object (part 1)

2018-02-16 Thread Jonathan Nieder
Hi,

Stefan Beller wrote:

> v2:
> * duplicated the 'ignore_env' bit into the object store as well
> * the #define trick no longer works as we do not have a "the_objectstore" 
> global,
>   which means there is just one patch per function that is converted.
>   As this follows the same structure of the previous series, I am still 
> confident
> there are no hidden dependencies to globals outside the object store in these
>   converted functions.
> * rebased on top of current master, resolving the merge conflicts.
>   I think I used the list.h APIs right, but please double check.

For what it's worth, I think I prefer v1.  I put some comments on why
on patch 0 of v1 and would be interested in your thoughts on them
(e.g. as a reply to that).  I also think that even if we want to
switch to a style that passes around object_store separately from
repository, it is easier to do the migration in two steps: first get
rid of hidden dependencies on the_repository, then do the (simpler)
automatic migration from

 f(the_repository)

to

 f(the_repository->object_store)

*afterwards*.

Thoughts?

Thanks,
Jonathan


Re: [PATCH 04/26] upload-pack: convert to a builtin

2018-02-21 Thread Jonathan Nieder
Brandon Williams wrote:
> On 01/03, Stefan Beller wrote:
> > On Tue, Jan 2, 2018 at 4:18 PM, Brandon Williams  wrote:

>>> In order to allow for code sharing with the server-side of fetch in
>>> protocol-v2 convert upload-pack to be a builtin.
>>
>> What is the security aspect of this patch?
>>
>> By making upload-pack builtin, it gains additional abilities,
>> such as answers to '-h' or '--help' (which would start a pager).
>> Is there an easy way to soothe my concerns? (best put into the
>> commit message)
>
> receive-pack is already a builtin, so there's that.

*nod*

Since v2.4.12~1^2 (shell: disallow repo names beginning with dash,
2017-04-29), git-shell refuses to pass --help to upload-pack, limiting
the security impact in configurations that use git-shell (e.g.
gitolite installations).

If you're not using git-shell, then hopefully you have some other form
of filtering preventing arbitrary options being passed to
git-upload-pack.  If you don't, then you're in trouble, for the
reasons described in that commit.

Since some installations may be allowing access to git-upload-pack
(for read-only access) without allowing access to git-receive-pack,
this does increase the chance of attack.  On the other hand, I suspect
the maintainability benefit is worth it.

For defense in depth, it would be comforting if the git wrapper had
some understanding of "don't support --help in handle_builtin when
invoked as a dashed command".  That is, I don't expect that anyone has
been relying on

git-add --help

acting like

git help add

instead of printing the usage message from

git add -h

It's a little fussy because today we rewrite "git add --help" to
"git-add --help" before rewriting it to "git help add"; we'd have to
skip that middle hop for this to work.

I don't think that has to block this patch or series, though --- it's
just a separate thought about hardening.

Cc-ing Sitaram Chamarty since he tends to be wiser about this kind of
thing than I am.

What do you think?

Thanks,
Jonathan


Re: Bug Report: git status triggers a file change event

2018-02-21 Thread Jonathan Nieder
+git-for-windows
Hi,

Raining Chain wrote:

> On Windows 10, git version 2.16.2.windows.1, running the command
>
> git status
>
> will trigger a file change event to file C:\myPath\.git  "Attributes changed."
>
> This causes problems when using scripts that detect file changes such
> as tsc -w (Typescript compiler) and using software that regularly
> call git status such as VisualStudioCode.
>
> Thanks.

Can you say more about how "tsc -w" reacts to the file change?  Is there
a way to tell it to exclude particular files from the files it watches?

Thanks,
Jonathan


Re: bug in HTTP protocol spec

2018-02-21 Thread Jonathan Nieder
(+cc: bmwill@, who is working on protocol v2)
Hi,

Dorian Taylor wrote:
> On Feb 21, 2018, at 2:15 PM, Jeff King  wrote:

>> Thanks, I agree the document is buggy. Do you want to submit a patch?
>
> Will this do?

Thanks for writing it.

Do you mind if we forge your sign-off?  (See Documentation/SubmittingPatches
item '(5) Certify your work' for details about what this means.)

> Note I am not sure what the story is behind that `version 1`
> element, whether it's supposed to go before or after the null packet
> or if there should be another null packet or what. Perhaps somebody
> more fluent with the smart protocol can advise.

I believe the 'version 1' goes after the flush-packet.

Thanks,
Jonathan


Re: [PATCH 01/27] repository: introduce raw object store field

2018-02-21 Thread Jonathan Nieder
Hi,

Stefan Beller wrote:

> The raw object store field will contain any objects needed for
> access to objects in a given repository.
>
> This patch introduces the raw object store and populates it with the
> `objectdir`, which used to be part of the repository struct.
>
> As the struct gains members, we'll also populate the function to clear
> the memory for these members.
>
> In a later patch we'll introduce a struct object_parser that will complement
> the object parsing in a repository struct: The raw object parser is the
> layer that will provide access to raw object content, while the higher
> level object parser code will parse raw objects and keeps track of
> parenthood and other object relationships using 'struct object'.
> For now only add the lower level to the repository struct.
>
> Signed-off-by: Stefan Beller 
> Signed-off-by: Jonathan Nieder 
> ---

Heh, I suppose that sign-off makes me not a great candidate for
reviewing this.  It's been long enough since I looked at the patch
that I feel okay trying anyway.

[...]
> --- /dev/null
> +++ b/object-store.h
> @@ -0,0 +1,15 @@
> +#ifndef OBJECT_STORE_H
> +#define OBJECT_STORE_H
> +
> +struct raw_object_store {
> + /*
> +  * Path to the repository's object store.
> +  * Cannot be NULL after initialization.
> +  */
> + char *objectdir;
> +};
> +#define RAW_OBJECT_STORE_INIT { NULL }

Who owns and is responsible for freeing objectdir?

> +
> +void raw_object_store_clear(struct raw_object_store *o);

Oh, that answers that.

It could be worth a note in the doc comment anyway, but I don't mind
not having one.

[...]
> +
> +void raw_object_store_clear(struct raw_object_store *o)
> +{
> + free(o->objectdir);
> +}

Should this use FREE_AND_NULL?

[...]
> --- a/repository.c
> +++ b/repository.c
[...]
> @@ -42,8 +49,8 @@ static void repo_setup_env(struct repository *repo)
>   !repo->ignore_env);
>   free(repo->commondir);
>   repo->commondir = strbuf_detach(&sb, NULL);
> - free(repo->objectdir);
> - repo->objectdir = git_path_from_env(DB_ENVIRONMENT, repo->commondir,
> + free(repo->objects.objectdir);

Should this call raw_object_store_clear instead of calling free
directly?

> + repo->objects.objectdir = git_path_from_env(DB_ENVIRONMENT, 
> repo->commondir,
>   "objects", !repo->ignore_env);

Long line.  One way to break it would be

repo->objects.objectdir =
        git_path_from_env(DB_ENVIRONMENT, repo->commondir,
                          "objects", !repo->ignore_env);

>   free(repo->graft_file);
>   repo->graft_file = git_path_from_env(GRAFT_ENVIRONMENT, repo->commondir,
> @@ -209,12 +216,14 @@ void repo_clear(struct repository *repo)
>  {
>   FREE_AND_NULL(repo->gitdir);
>   FREE_AND_NULL(repo->commondir);
> - FREE_AND_NULL(repo->objectdir);
>   FREE_AND_NULL(repo->graft_file);
>   FREE_AND_NULL(repo->index_file);
>   FREE_AND_NULL(repo->worktree);
>   FREE_AND_NULL(repo->submodule_prefix);
>  
> + raw_object_store_clear(&repo->objects);
> + memset(&repo->objects, 0, sizeof(repo->objects));

If raw_object_store_clear uses FREE_AND_NULL, this memset won't be necessary.

[...]
> --- a/repository.h
> +++ b/repository.h
> @@ -1,6 +1,8 @@
>  #ifndef REPOSITORY_H
>  #define REPOSITORY_H
>  
> +#include "object-store.h"
> +
>  struct config_set;
>  struct index_state;
>  struct submodule_cache;
> @@ -21,10 +23,9 @@ struct repository {
>   char *commondir;
>  
>   /*
> -  * Path to the repository's object store.
> -  * Cannot be NULL after initialization.
> +  * Holds any information related to the object store.
>*/

This comment doesn't make it clear to me what the field represents.
Can it be replaced with some of the description from the commit
message?

> - char *objectdir;
> + struct raw_object_store objects;
>  

Thanks,
Jonathan


Re: [PATCH 05/27] object-store: move packed_git and packed_git_mru to object store

2018-02-21 Thread Jonathan Nieder
Hi,

Stefan Beller wrote:

> In a process with multiple repositories open, packfile accessors
> should be associated to a single repository and not shared globally.
> Move packed_git and packed_git_mru into the_repository and adjust
> callers to reflect this.
>
> Patch generated by
>
>  1. Moving the struct packed_git declaration to object-store.h
> and packed_git, packed_git_mru globals to struct object_store.
>
>  2. Applying the semantic patch
> contrib/coccinelle/refactoring/packed_git.cocci to adjust callers.
> This semantic patch is placed in a sub directory of the coccinelle
> contrib dir, as this semantic patch is not expected to be of general
> usefulness; it is only useful during developing this series and
> merging it with other topics in flight. At a later date, just
> delete that semantic patch.

Can the semantic patch go in the commit message instead?  It is very
brief.

Actually, I don't see this semantic patch in the diffstat.  Is the
commit message stale?

>  3. Applying line wrapping fixes from "make style" to break the
> resulting long lines.
>
>  4. Adding missing #includes of repository.h and object-store.h
> where needed.

Is there a way to automate this step?  (I'm asking for my own
reference when writing future patches, not because of any concern
about the correctness of this one.)
>
>  5. As the packfiles are now owned by an objectstore/repository, which
> is ephemeral unlike globals, we introduce memory leaks. So address
> them in raw_object_store_clear().

The compound words are confusing me.  What is an
objectstore/repository?  Are these referring to particular identifiers
or something else?

Would some wording like the following work?

   5. Freeing packed_git and packed_git_mru in raw_object_store_clear
      to avoid a per-repository memory leak.  Previously they were
      global singletons, so code to free them did not exist.

[...]
> --- a/builtin/index-pack.c
> +++ b/builtin/index-pack.c
> @@ -12,6 +12,7 @@
>  #include "exec_cmd.h"
>  #include "streaming.h"
>  #include "thread-utils.h"
> +#include "object-store.h"
>  #include "packfile.h"
>  
>  static const char index_pack_usage[] =

Change from a different patch leaked into this one?

[...]
> +++ b/builtin/pack-objects.c
[...]
> @@ -1044,7 +1045,7 @@ static int want_object_in_pack(const struct object_id 
> *oid,
>   }
>   want = want_found_object(exclude, p);
>   if (!exclude && want > 0)
> - list_move(&p->mru, &packed_git_mru);
> + list_move(&p->mru, 
> &the_repository->objects.packed_git_mru);

Long line.  Can "make style" catch this?

[...]
> +++ b/builtin/receive-pack.c
> @@ -7,6 +7,7 @@
>  #include "sideband.h"
>  #include "run-command.h"
>  #include "exec_cmd.h"
> +#include "object-store.h"
>  #include "commit.h"
>  #include "object.h"
>  #include "remote.h"

Another change leaked in?

[...]
> --- a/cache.h
> +++ b/cache.h
> @@ -1585,35 +1585,6 @@ struct pack_window {
>   unsigned int inuse_cnt;
>  };
>  
> -extern struct packed_git {
[...]
> -} *packed_git;

Move detecting diff confirms that this wasn't modified.  Thanks for
creating it.

[...]
> +++ b/fast-import.c
[...]
> @@ -1110,7 +1112,7 @@ static int store_object(
>   if (e->idx.offset) {
>   duplicate_count_by_type[type]++;
>   return 1;
> - } else if (find_sha1_pack(oid.hash, packed_git)) {
> + } else if (find_sha1_pack(oid.hash, 
> the_repository->objects.packed_git)) {

Long line.  (I'll refrain from commenting about any further ones.)

[...]
> +++ b/http-push.c
> @@ -1,4 +1,5 @@
>  #include "cache.h"
> +#include "object-store.h"
>  #include "commit.h"
>  #include "tag.h"
>  #include "blob.h"

Stray change?

> diff --git a/http-walker.c b/http-walker.c
> index 07c2b1af82..8bb5d991bb 100644
> --- a/http-walker.c
> +++ b/http-walker.c
> @@ -4,6 +4,7 @@
>  #include "http.h"
>  #include "list.h"
>  #include "transport.h"
> +#include "object-store.h"
>  #include "packfile.h"
>  
>  struct alt_base {

Same question.

> diff --git a/http.c b/http.c
> index 31755023a4..a4a9e583c7 100644
> --- a/http.c
> +++ b/http.c
> @@ -1,6 +1,7 @@
>  #include "git-compat-util.h"
>  #include "http.h"
>  #include "config.h"
> +#include "object-store.h"
>  #include "pack.h"
>  #include "sideband.h"
>  #include "run-command.h"

Likewise.

> diff --git a/object-store.h b/object-store.h
> index e78eea1dde..1de9e07102 100644
> --- a/object-store.h
> +++ b/object-store.h
> @@ -52,6 +52,30 @@ void add_to_alternates_memory(const char *dir);
>   */
>  struct strbuf *alt_scratch_buf(struct alternate_object_database *alt);
>  
> +struct packed_git {
> + struct packed_git *next;
> + struct list_head mru;
> + struct pack_window *windows;
> + off_t pack_size;
> + const void *index_data;
> + size_t index_size;
> + uint32_t num_objects;
> + uint32_t num_bad_objects

Re: [PATCH 13/27] sha1_file: add repository argument to prepare_alt_odb

2018-02-21 Thread Jonathan Nieder
Hi,

Stefan Beller wrote:

> See previous patch for explanation.
>
> While at it, move the declaration to object-store.h,
> where it should be easier to find.

Which declaration?  It looks like prepare_alt_odb is already in
object-store.h.

[...]
> --- a/builtin/fsck.c
> +++ b/builtin/fsck.c
> @@ -717,7 +717,7 @@ int cmd_fsck(int argc, const char **argv, const char 
> *prefix)
>   } else {
>   fsck_object_dir(get_object_directory());
>  
> - prepare_alt_odb();
> + prepare_alt_odb(the_repository);

Patch 2 added a #include of "repository.h".  Good.

(I checked because with the definition of prepare_alt_odb as a macro,
the function call would compile correctly even if the_repository
weren't in scope, but we want to include what we use as a matter of
style/maintainability.)

[...]
> --- a/packfile.c
> +++ b/packfile.c
> @@ -890,7 +890,7 @@ void prepare_packed_git(void)
>   if (the_repository->objects.packed_git_initialized)
>   return;
>   prepare_packed_git_one(get_object_directory(), 1);
> - prepare_alt_odb();
> + prepare_alt_odb(the_repository);

Also good, since patch 3 added a #include of "repository.h".

[...]
> --- a/sha1_file.c
> +++ b/sha1_file.c
> @@ -23,6 +23,7 @@
>  #include "sha1-lookup.h"
>  #include "bulk-checkin.h"
>  #include "repository.h"
> +#include "object-store.h"

Should this #include have been added in an earlier patch, since the
file both uses and defines prepare_alt_odb, which is declared there?

The rest looks good.

Thanks,
Jonathan


Re: [PATCH 17/27] sha1_file: add repository argument to stat_sha1_file

2018-02-21 Thread Jonathan Nieder
Hi,

Stefan Beller wrote:

> Add a repository argument to allow the stat_sha1_file caller to be
> more specific about which repository to act on. This is a small
> mechanical change; it doesn't change the implementation to handle
> repositories other than the_repository yet.
>
> As with the previous commits, use a macro to catch callers passing a
> repository other than the_repository at compile time.
>
> Signed-off-by: Stefan Beller 
> Signed-off-by: Jonathan Nieder 
> ---
>  sha1_file.c | 9 +
>  1 file changed, 5 insertions(+), 4 deletions(-)

Simple and obviously correct.
Reviewed-by: Jonathan Nieder 


Re: [PATCH 21/27] sha1_file: add repository argument to sha1_loose_object_info

2018-02-21 Thread Jonathan Nieder
Hi,

Stefan Beller wrote:

> Add a repository argument to allow the sha1_loose_object_info caller
> to be more specific about which repository to act on. This is a small
> mechanical change; it doesn't change the implementation to handle
> repositories other than the_repository yet.
>
> As with the previous commits, use a macro to catch callers passing a
> repository other than the_repository at compile time.
>
> Signed-off-by: Stefan Beller 
> Signed-off-by: Jonathan Nieder 
> ---
>  sha1_file.c | 9 +
>  1 file changed, 5 insertions(+), 4 deletions(-)

Simple and obviously correct.
Reviewed-by: Jonathan Nieder 


Re: [PATCH 26/27] sha1_file: allow map_sha1_file to handle arbitrary repositories

2018-02-21 Thread Jonathan Nieder
Stefan Beller wrote:

> Signed-off-by: Stefan Beller 
> Signed-off-by: Jonathan Nieder 
> ---
>  object-store.h | 3 +--
>  sha1_file.c| 5 +++--
>  2 files changed, 4 insertions(+), 4 deletions(-)

Thanks: this is short and simple, making my life as a reviewer very
easy.

Reviewed-by: Jonathan Nieder 


Re: [PATCH v3 04/35] upload-pack: convert to a builtin

2018-02-22 Thread Jonathan Nieder
Hi,

Jeff King wrote:
> On Thu, Feb 22, 2018 at 10:07:15AM -0800, Brandon Williams wrote:
>> On 02/22, Jeff King wrote:
>>> On Wed, Feb 21, 2018 at 01:44:22PM -0800, Jonathan Tan wrote:
 On Tue,  6 Feb 2018 17:12:41 -0800
 Brandon Williams  wrote:

> In order to allow for code sharing with the server-side of fetch in
> protocol-v2 convert upload-pack to be a builtin.
[...]
 As Stefan mentioned in [1], also mention in the commit message that this
 means that the "git-upload-pack" invocation gains additional
 capabilities (for example, invoking a pager for --help).
>>>
>>> And possibly respecting pager.upload-pack, which would violate our rule
>>> that it is safe to run upload-pack in untrusted repositories.
>>
>> And this isn't an issue with receive-pack because this same guarantee
>> doesn't exist?
>
> Yes, exactly (which is confusing and weird, yes, but that's how it is).

To be clear, which of the following are you (most) worried about?

 1. being invoked with --help and spawning a pager
 2. receiving and acting on options between 'git' and 'upload-pack'
 3. repository discovery
 4. pager config
 5. alias discovery
 6. increased code surface / unknown threats

For (1), "--help" has to be the first argument.  "git daemon" passes
--strict so it doesn't happen there.  "git http-backend" passes
--stateless-rpc so it doesn't happen there.  "git shell" sanitizes
input to avoid it happening there.

A custom setup could provide their own entry point that doesn't do
such sanitization.  I think that in some sense it's out of our hands,
but it would be nice to harden as described upthread.

For (2), I am having trouble imagining a setup where it would happen.

upload-pack doesn't have the RUN_SETUP or RUN_SETUP_GENTLY flag set,
so (3) doesn't apply.

Although in most setups the user does not control the config files on
a server, item (4) looks like a real issue worth solving.  I think we
should introduce a flag to skip looking for pager config.  We could
use it for receive-pack, too.

Builtins are handled before aliases, so (5) doesn't apply.

(6) is a real issue: it is why "git shell" is not a builtin, for
example.  But I would rather that we use appropriate sanitization
before upload-pack is invoked than live in fear.  git upload-pack is
sufficiently complicated that I don't think the git.c wrapper
increases the complexity by a significant amount.

That leaves me with a personal answer of only being worried about (4)
and not the rest.  What do you think?  Is one of the other items I
listed above worrisome, or is there another item I missed?

Thanks,
Jonathan


Re: [PATCH v3 04/35] upload-pack: convert to a builtin

2018-02-22 Thread Jonathan Nieder
Hi,

Jeff King wrote:
> On Thu, Feb 22, 2018 at 11:38:14AM -0800, Jonathan Nieder wrote:

>> To be clear, which of the following are you (most) worried about?
>>
>>  1. being invoked with --help and spawning a pager
>>  2. receiving and acting on options between 'git' and 'upload-pack'
>>  3. repository discovery
>>  4. pager config
>>  5. alias discovery
>>  6. increased code surface / unknown threats
>
> My immediate concern is (4).

Thanks for clarifying.

>  But my greater concern is that people who
> work on git.c should not have to worry about accidentally violating this
> principle when they add a new feature or config option.

That sounds like a combination of (6) and insufficient documentation
or tests.  Ideas for how we can help prevent such accidents?

> In other words, it seems like an accident waiting to happen. I'd be more
> amenable to it if there was some compelling reason for it to be a
> builtin, but I don't see one listed in the commit message. I see only
> "let's make it easier to share the code", which AFAICT is equally served
> by just lib-ifying the code and calling it from the standalone
> upload-pack.c.

If we have so little control of the common code used by git commands
that could be invoked by a remote user, I think we're in trouble
already.  I don't think being a builtin vs not makes that
significantly different, since there are plenty of builtins that can
be triggered by remote users.  Further, if we have so little control
over the security properties of git.c, what hope do we have of making
the rest of libgit.a usable in secure code?

In other words, having to pay more attention to the git wrapper from a
security pov actually feels to me like a *good* thing.  The git
wrapper is the entry point to almost all git commands.  If it is an
accident waiting to happen, then anything that calls git commands is
already an accident waiting to happen.  So how can we make it not an
accident waiting to happen? :)

>> Although in most setups the user does not control the config files on
>> a server, item (4) looks like a real issue worth solving.  I think we
>> should introduce a flag to skip looking for pager config.  We could
>> use it for receive-pack, too.
>
> There's not much point for receive-pack. It respects hooks, so any
> security ship has already sailed there.

Yet there are plenty of cases where people who can push are not
supposed to have root privilege.  I am not worried about hooks
specifically (although the changes described at [1] might help and I
still plan to work on those) but I am worried about e.g. commandline
injection issues.  I don't think we can treat receive-pack as out of
scope.

And to be clear, I don't think you were saying receive-pack *is* out
of scope.  But you seem to be trying to draw some boundary, where I
only see something fuzzier (e.g. if a bug only applies to
receive-pack, then that certainly decreases its impact, but it doesn't
make the impact go away).

Thanks,
Jonathan

[1] https://public-inbox.org/git/20171002234517.gv19...@aiede.mtv.corp.google.com/


Re: [PATCH v3 04/35] upload-pack: convert to a builtin

2018-02-22 Thread Jonathan Nieder
Jeff King wrote:

> The current property is that it's safe to fetch from an
> untrusted repository, even over ssh. If we're keeping that for protocol
> v1, we'd want it to apply to protocol v2, as well.

Ah, this is what I had been missing (the non-ssh case).

I see your point.  I think we need to fix the pager config issue and add
some clarifying documentation to git.c so that people know what to look
out for.

Keep in mind that git upload-archive (a read-only command, just like
git upload-pack) also already has the same issues.

Thanks,
Jonathan


Re: [PATCH v3 04/35] upload-pack: convert to a builtin

2018-02-22 Thread Jonathan Nieder
Jeff King wrote:

> All of that said, I think the current code is quite dangerous already,
> and maybe even broken.  upload-pack may run sub-commands like rev-list
> or pack-objects, which are themselves builtins.

Sounds like more commands to set the IGNORE_PAGER_CONFIG flag for in
git.c.

Thanks for looking this over thoughtfully.

[...]
> I couldn't quite get it to work, but I think it's because I'm doing
> something wrong with the submodules. But I also think this attack would
> _have_ to be done over ssh, because on a local system the submodule
> clone would a hard-link rather than a real fetch.

What happens if the submodule URL starts with file://?

Thanks,
Jonathan


Re: [BUG] [git 2.16.1] yeeek ... my files are gone .. by git pull

2018-02-22 Thread Jonathan Nieder
Hi Marcel,

Marcel 'childNo͡.de' Trautwein wrote:

> I think we have a problem … or at least I had
> and I’m not quite sure if this is „working as designed“
> but I’m sure it „should not work as it did“.
[...]
> I wanted to clone another repository … but yeah … it’s late for me today and 
> I put
> in s.th. `git pull 
> g...@private.gitlab.instance.example.com:aGroup/repository.git`
>
> next … all committed files are zapped and the repository given has
> been checked out in my home directory 🤯👻
>
> what? Shouldn’t this just fail? Why can I pass another remote to pull?

Sorry, this is not the most helpful reply but:

Can you describe a reproduction recipe so that I can experience the
same thing?

That is:

 1. steps to reproduce
 2. expected result
 3. actual result
 4. the difference and why it was unexpected

I suspect that this information is in your message, somewhere, but it
is (understandably) unfocussed and I am having trouble pulling it out.

[...]
> trying to fix this up by doing another pull failed:
> ```
> -bash:$ git remote -v
> origing...@bitbucket.org:childnode/marcel.git (fetch)
> origing...@bitbucket.org:childnode/marcel.git (push)
>
> -bash:$ git pull
> fatal: refusing to merge unrelated histories

Ok, this part is something I might be able to help shed some light on.

Searching for 'unrelated' in "git help pull" finds:

   --allow-unrelated-histories
   By default, git merge command refuses to merge histories that do not
   share a common ancestor. This option can be used to override this
   safety when merging histories of two projects that started their
   lives independently. As that is a very rare occasion, no
   configuration variable to enable this by default exists and will not
   be added.

So that explains the "what" of that error message.

The "why" is a separate question.  Could you share output from

  git log --all --graph --decorate --oneline --simplify-by-decoration

and

  git status

to help us understand your current state?

Also, suggestions for improvements to the 'refusing to merge' message
would be very welcome.

Thanks and hope that helps,
Jonathan


Re: [PATCH v3 13/35] ls-refs: introduce ls-refs server command

2018-02-26 Thread Jonathan Nieder
Ævar Arnfjörð Bjarmason wrote:
> On Sat, Feb 24 2018, Jeff King jotted:

>> I actually wonder if we should just specify that the patterns must
>> _always_ be fully-qualified, but may end with a single "/*" to iterate
>> over wildcards. Or even simpler, that "refs/heads/foo" would find that
>> ref itself, and anything under it.
>
> I agree that this is a very good trade-off for now, but I think having
> an escape hatch makes sense. It looks like the protocol is implicitly
> extendible since another parameter could be added, but maybe having such
> a parameter from the get-go would make sense:

I prefer to rely on the implicit extensibility (following the general
YAGNI principle).

In other words, we can introduce a pattern-type later and make the
current pattern-type the default.

Thanks for looking to the future.

[...]
> E.g. if the refs were stored indexed using the method described at
> https://swtch.com/~rsc/regexp/regexp4.html tail matching becomes no less
> efficient than prefix matching, but a function of how many trigrams in
> your index match the pattern given.

I think the nearest planned change to ref storage is [1], which is
still optimized for prefix matching.  Longer term, maybe some day
we'll want a secondary index that supports infix matching, or maybe
we'll never need it. :)

Sincerely,
Jonathan

[1] https://public-inbox.org/git/CAJo=hjszcam9sipdvr7tmd-fd2v2w6_pvmq791egcdsdkq0...@mail.gmail.com/#t


Re: [PATCH v2 12/27] serve: introduce git-serve

2018-02-26 Thread Jonathan Nieder
Hi Duy,

Duy Nguyen wrote:
> On Fri, Jan 26, 2018 at 6:58 AM, Brandon Williams  wrote:

>> + stateless-rpc
>> +---
>> +
>> +If advertised, the `stateless-rpc` capability indicates that the server
>> +supports running commands in a stateless-rpc mode, which means that a
>> +command lasts for only a single request-response round.
>> +
>> +Normally a command can last for as many rounds as are required to
>> +complete it (multiple for negotiation during fetch or no additional
>> +trips in the case of ls-refs).  If the client sends the `stateless-rpc`
>> +capability with a value of `true` (in the form `stateless-rpc=true`)
>> +then the invoked command must only last a single round.
>
> Speaking of stateless-rpc, I remember last time this topic was brought
> up, there was some discussion to kind of optimize it for http as well,
> to fit the "client sends request, server responds data" model and
> avoid too many round trips (ideally everything happens in one round
> trip). Does it evolve to anything real? All the cool stuff happened
> while I was away, sorry if this was discussed and settled.

We have a few different ideas for improving negotiation.  They were
speculative enough that we didn't want to make them part of the
baseline protocol v2.  Feel free to poke me in a new thread. :)

Some teasers:

- allow both client and server to suggest commits in negotiation,
  instead of just the client?

- send a bloom filter for the peer to filter their suggestions
  against?

- send other basic information like maximum generation number or
  maximum commit date?

- exponential backoff in negotiation instead of linear walking?
  prioritizing ref tips?  Imitating the bitmap selection algorithm?

- at the "end" of negotiation, sending a graph data structure instead
  of a pack, to allow an extra round trip to produce a truly minimal
  pack?

Those are some initial ideas, but it's also likely that someone can
come up with some other experiments to try, too.  (E.g. we've looked
at various papers on set reconciliation, but they don't make enough
use of the graph structure to help much.)
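For instance, the bloom-filter idea could look roughly like this toy sketch (Python, purely illustrative; the `Bloom` class, its parameters, and the use of SHA-1 here are assumptions for the sketch, not anything implemented in git):

```python
import hashlib

class Bloom:
    """Tiny bloom filter: one peer could send `bits` so the other side
    skips suggesting commits that definitely aren't shared."""
    def __init__(self, m=1024, k=3):
        self.m, self.k, self.bits = m, k, 0

    def _indexes(self, oid):
        # derive k bit positions from a hash of the commit id
        h = hashlib.sha1(oid.encode()).digest()
        return (int.from_bytes(h[4*i:4*i+4], "big") % self.m
                for i in range(self.k))

    def add(self, oid):
        for i in self._indexes(oid):
            self.bits |= 1 << i

    def __contains__(self, oid):
        # may yield false positives, never false negatives
        return all(self.bits >> i & 1 for i in self._indexes(oid))
```

A false positive only costs a wasted suggestion, which is why a
probabilistic structure would be acceptable for negotiation.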

Thanks,
Jonathan


Re: [PATCH v3 02/35] pkt-line: introduce struct packet_reader

2018-02-26 Thread Jonathan Nieder
Hi,

Brandon Williams wrote:

> Sometimes it is advantageous to be able to peek the next packet line
> without consuming it (e.g. to be able to determine the protocol version
> a server is speaking).  In order to do that introduce 'struct
> packet_reader' which is an abstraction around the normal packet reading
> logic.  This enables a caller to be able to peek a single line at a time
> using 'packet_reader_peek()' and having a caller consume a line by
> calling 'packet_reader_read()'.
>
> Signed-off-by: Brandon Williams 
> ---
>  pkt-line.c | 59 +++
>  pkt-line.h | 58 ++
>  2 files changed, 117 insertions(+)

I like it!

The questions and nits from
https://public-inbox.org/git/20180213004937.gb42...@aiede.svl.corp.google.com/
still apply.  In particular, the ownership of the buffers inside the
'struct packet_reader' is still unclear; could the packet_reader create
its own (strbuf) buffers so that the contract around them (who is allowed
to write to them; who is responsible for freeing them) is more obvious?
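To make the ownership question concrete, here is a minimal model of the peek/read contract (Python stand-in for the C struct; the names mirror the patch but the internals are assumed):

```python
class PacketReader:
    """Peekable reader over a sequence of packet lines (stand-in for the fd)."""
    def __init__(self, lines):
        self._lines = iter(lines)
        self._peeked = None          # owned by the reader, not the caller

    def peek(self):
        # repeated peeks return the same line without consuming it
        if self._peeked is None:
            self._peeked = next(self._lines, None)
        return self._peeked

    def read(self):
        # consume the peeked line, or read a fresh one
        line, self._peeked = self.peek(), None
        return line
```

If the C struct likewise owned its buffers (say, as strbufs), a caller
could never invalidate the peeked line between peek() and read().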

Thanks,
Jonathan


Re: [PATCH v3 06/35] transport: use get_refs_via_connect to get refs

2018-02-26 Thread Jonathan Nieder
Brandon Williams wrote:

> Remove code duplication and use the existing 'get_refs_via_connect()'
> function to retrieve a remote's heads in 'fetch_refs_via_pack()' and
> 'git_transport_push()'.
> 
> Signed-off-by: Brandon Williams 
> ---
>  transport.c | 18 --
>  1 file changed, 4 insertions(+), 14 deletions(-)

I like the diffstat.

[...]
> +++ b/transport.c
> @@ -230,12 +230,8 @@ static int fetch_refs_via_pack(struct transport 
> *transport,
>   args.cloning = transport->cloning;
>   args.update_shallow = data->options.update_shallow;
>  
> - if (!data->got_remote_heads) {
> - connect_setup(transport, 0);
> - get_remote_heads(data->fd[0], NULL, 0, &refs_tmp, 0,
> -  NULL, &data->shallow);
> - data->got_remote_heads = 1;
> - }
> + if (!data->got_remote_heads)
> + refs_tmp = get_refs_via_connect(transport, 0);

The only difference between the old and new code is that the old code
passes NULL as 'extra_have' and the new code passes &data->extra_have.

That means this populates the data->extra_have oid_array.  Does it
matter?

> @@ -541,14 +537,8 @@ static int git_transport_push(struct transport 
> *transport, struct ref *remote_re
>   struct send_pack_args args;
>   int ret;
>  
> - if (!data->got_remote_heads) {
> - struct ref *tmp_refs;
> - connect_setup(transport, 1);
> -
> - get_remote_heads(data->fd[0], NULL, 0, &tmp_refs, REF_NORMAL,
> -  NULL, &data->shallow);
> - data->got_remote_heads = 1;
> - }
> + if (!data->got_remote_heads)
> + get_refs_via_connect(transport, 1);

not a new problem, just curious: Does this leak tmp_refs?

Same question as the other caller about whether we mind getting
extra_have populated.

Thanks,
Jonathan


Re: [PATCH v3 02/35] pkt-line: introduce struct packet_reader

2018-02-26 Thread Jonathan Nieder
Jonathan Nieder wrote:
> Brandon Williams wrote:

>> Sometimes it is advantageous to be able to peek the next packet line
>> without consuming it (e.g. to be able to determine the protocol version
>> a server is speaking).  In order to do that introduce 'struct
>> packet_reader' which is an abstraction around the normal packet reading
>> logic.  This enables a caller to be able to peek a single line at a time
>> using 'packet_reader_peek()' and having a caller consume a line by
>> calling 'packet_reader_read()'.
>>
>> Signed-off-by: Brandon Williams 
>> ---
>>  pkt-line.c | 59 +++
>>  pkt-line.h | 58 ++
>>  2 files changed, 117 insertions(+)
>
> I like it!
>
> The questions and nits from
> https://public-inbox.org/git/20180213004937.gb42...@aiede.svl.corp.google.com/
> still apply.  In particular, the ownership of the buffers inside the
> 'struct packet_reader' is still unclear; could the packet_reader create
> its own (strbuf) buffers so that the contract around them (who is allowed
> to write to them; who is responsible for freeing them) is more obvious?

Just to be clear: I sent that review after you sent this patch, so
there should not have been any reason for me to expect the q's and
nits to magically not apply. ;-)


Re: [PATCH v3 10/35] protocol: introduce enum protocol_version value protocol_v2

2018-02-26 Thread Jonathan Nieder
Hi,

Brandon Williams wrote:

> Introduce protocol_v2, a new value for 'enum protocol_version'.
> Subsequent patches will fill in the implementation of protocol_v2.
>
> Signed-off-by: Brandon Williams 
> ---

Yay!

[...]
> +++ b/builtin/fetch-pack.c
> @@ -201,6 +201,9 @@ int cmd_fetch_pack(int argc, const char **argv, const 
> char *prefix)
>  PACKET_READ_GENTLE_ON_EOF);
>  
>   switch (discover_version(&reader)) {
> + case protocol_v2:
> + die("support for protocol v2 not implemented yet");
> + break;

This code goes away in a later patch, so no need to do anything about
this, but the 'break' is redundant after the 'die'.

[...]
> --- a/builtin/receive-pack.c
> +++ b/builtin/receive-pack.c
> @@ -1963,6 +1963,12 @@ int cmd_receive_pack(int argc, const char **argv, 
> const char *prefix)
>   unpack_limit = receive_unpack_limit;
>  
>   switch (determine_protocol_version_server()) {
> + case protocol_v2:
> + /*
> +  * push support for protocol v2 has not been implemented yet,
> +  * so ignore the request to use v2 and fallback to using v0.
> +  */
> + break;

As you mentioned in the cover letter, it's probably worth doing the
same fallback on the client side (send-pack), too.

Otherwise when this client talks to a new-enough server, it would
request protocol v2 and then get confused when the server responds
with the protocol v2 it requested.

Thanks,
Jonathan


Re: [PATCH v3 14/35] connect: request remote refs using v2

2018-02-26 Thread Jonathan Nieder
Jonathan Tan wrote:
> On Thu, 22 Feb 2018 13:26:58 -0500
> Jeff King  wrote:

>> I agree that it shouldn't matter much here. But if the name argv_array
>> is standing in the way of using it, I think we should consider giving it
>> a more general name. I picked that not to evoke "this must be arguments"
>> but "this is terminated by a single NULL".
[...]
> This sounds reasonable - I withdraw my comment about using struct
> string_list.

Marking with #leftoverbits as a reminder to think about what such a
more general name would be (or what kind of docs to put in
argv-array.h) and make it so the next time I do a search for that
keyword.

Thanks,
Jonathan


Re: [PATCH v3 14/35] connect: request remote refs using v2

2018-02-26 Thread Jonathan Nieder
Brandon Williams wrote:

> Teach the client to be able to request a remote's refs using protocol
> v2.  This is done by having a client issue a 'ls-refs' request to a v2
> server.

Yay, ls-remote support!

[...]
> --- a/builtin/upload-pack.c
> +++ b/builtin/upload-pack.c
> @@ -5,6 +5,7 @@
>  #include "parse-options.h"
>  #include "protocol.h"
>  #include "upload-pack.h"
> +#include "serve.h"

nit, no change needed in this patch: What is a good logical order for
the #includes here?  Bonus points if there's a tool to make it happen
automatically.

Asking since adding #includes like this at the end tends to result in
a harder-to-read list of #includes, sometimes with duplicates, and
often producing conflicts when multiple patches in flight add a
#include to the same file.

[...]
> @@ -48,11 +50,9 @@ int cmd_upload_pack(int argc, const char **argv, const 
> char *prefix)
>  
>   switch (determine_protocol_version_server()) {
>   case protocol_v2:
> - /*
> -  * fetch support for protocol v2 has not been implemented yet,
> -  * so ignore the request to use v2 and fallback to using v0.
> -  */
> - upload_pack(&opts);
> + serve_opts.advertise_capabilities = opts.advertise_refs;
> + serve_opts.stateless_rpc = opts.stateless_rpc;
> + serve(&serve_opts);

Interesting!  daemon.c has its own (file-local) serve() function;
can one of the two be renamed?

I actually like both names: serve() is a traditional name for the
function a server calls when it's done setting up and is ready to
serve.  But the clash is confusing.

[...]
> +++ b/connect.c
> @@ -12,9 +12,11 @@
>  #include "sha1-array.h"
>  #include "transport.h"
>  #include "strbuf.h"
> +#include "version.h"
>  #include "protocol.h"
>  
>  static char *server_capabilities;
> +static struct argv_array server_capabilities_v2 = ARGV_ARRAY_INIT;

Can a quick doc comment describe these and how they relate?

Is only one of them set, based on which protocol version is in use?
Should server_capabilities be renamed to server_capabilities_v1?

>  static const char *parse_feature_value(const char *, const char *, int *);
>  
>  static int check_ref(const char *name, unsigned int flags)
> @@ -62,6 +64,33 @@ static void die_initial_contact(int unexpected)
> "and the repository exists."));
>  }
>  
> +/* Checks if the server supports the capability 'c' */
> +int server_supports_v2(const char *c, int die_on_error)
> +{
> + int i;
> +
> + for (i = 0; i < server_capabilities_v2.argc; i++) {
> + const char *out;
> + if (skip_prefix(server_capabilities_v2.argv[i], c, &out) &&
> + (!*out || *out == '='))
> + return 1;
> + }
> +
> + if (die_on_error)
> + die("server doesn't support '%s'", c);
> +
> + return 0;
> +}

Nice.
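The `(!*out || *out == '=')` check is what keeps the prefix match honest; a quick model of the same logic (the capability values here are hypothetical):

```python
def server_supports(capabilities, c):
    """True if capability `c` is advertised, bare or with an '=<value>' suffix."""
    for cap in capabilities:
        if cap == c or cap.startswith(c + "="):
            return True
    return False
```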
> +
> +static void process_capabilities_v2(struct packet_reader *reader)
> +{
> + while (packet_reader_read(reader) == PACKET_READ_NORMAL)
> + argv_array_push(&server_capabilities_v2, reader->line);
> +
> + if (reader->status != PACKET_READ_FLUSH)
> + die("protocol error");

Can this say more?  E.g. "expected flush after capabilities, got "?

[...]
> @@ -85,7 +114,7 @@ enum protocol_version discover_version(struct 
> packet_reader *reader)
>   /* Maybe process capabilities here, at least for v2 */

Is this comment out of date now?

>   switch (version) {
>   case protocol_v2:
> - die("support for protocol v2 not implemented yet");
> + process_capabilities_v2(reader);
>   break;
>   case protocol_v1:
>   /* Read the peeked version line */
> @@ -293,6 +322,98 @@ struct ref **get_remote_heads(struct packet_reader 
> *reader,
>   return list;
>  }
>  
> +static int process_ref_v2(const char *line, struct ref ***list)

What does the return value represent?

Could it return the more typical 0 on success, -1 on error?

> +{
> + int ret = 1;
> + int i = 0;
> + struct object_id old_oid;
> + struct ref *ref;
> + struct string_list line_sections = STRING_LIST_INIT_DUP;
> +
> + if (string_list_split(&line_sections, line, ' ', -1) < 2) {

Can there be a comment describing the expected format?

> + ret = 0;
> + goto out;
> + }
> +
> + if (get_oid_hex(line_sections.items[i++].string, &old_oid)) {
> + ret = 0;
> + goto out;
> + }
> +
> + ref = alloc_ref(line_sections.items[i++].string);

Ref names cannot contains a space, so this is safe.  Good.

> +
> + oidcpy(&ref->old_oid, &old_oid);
> + **list = ref;
> + *list = &ref->next;
> +
> + for (; i < line_sections.nr; i++) {
> + const char *arg = line_sections.items[i].string;
> + if (skip_prefix(arg, "symref-target:", &arg))
> + ref->symref = xstrdup(arg);

Using space-delimited fie

Re: [PATCH v3 18/35] fetch: pass ref patterns when fetching

2018-02-26 Thread Jonathan Nieder
Brandon Williams wrote:

> Construct a list of ref patterns to be passed to
> 'transport_get_remote_refs()' from the refspec to be used during the
> fetch.  This list of ref patterns will be used to allow the server to
> filter the ref advertisement when communicating using protocol v2.
>
> Signed-off-by: Brandon Williams 
> ---
>  builtin/fetch.c | 12 +++-
>  1 file changed, 11 insertions(+), 1 deletion(-)

Reviewed-by: Jonathan Nieder 
Nice.

I take it that tests covering this come later in the series?


Re: [PATCH v3 22/35] upload-pack: support shallow requests

2018-02-27 Thread Jonathan Nieder
Hi,

Brandon Williams wrote:

> Add the 'shallow' feature to the protocol version 2 command 'fetch'
> which indicates that the server supports shallow clients and deepen
> requets.
>
> Signed-off-by: Brandon Williams 
> ---
>  Documentation/technical/protocol-v2.txt |  67 +++-
>  serve.c |   2 +-
>  t/t5701-git-serve.sh|   2 +-
>  upload-pack.c   | 138 
> +++-
>  upload-pack.h   |   3 +
>  5 files changed, 173 insertions(+), 39 deletions(-)

Yay!  We've been running with this for a while at Google (for file://
fetches, at least) and it's been working well.

[...]
> --- a/Documentation/technical/protocol-v2.txt
> +++ b/Documentation/technical/protocol-v2.txt
> @@ -201,12 +201,42 @@ packet-lines:
>   to its base by position in pack rather than by an oid.  That is,
>   they can read OBJ_OFS_DELTA (ake type 6) in a packfile.
>  
> +shallow <oid>
> + A client must notify the server of all objects for which it only
> + has shallow copies of (meaning that it doesn't have the parents

Grammar nit: "for which it only has shallow copies of" should be e.g.
"for which it only has shallow copies" or "that it only has shallow
copies of" or "that it only has shallow copies for".

I think s/objects/commits/ would also help clarify.

> + of a commit) by supplying a 'shallow <oid>' line for each such
> + object so that the serve is aware of the limitations of the

s/serve/server/

> + client's history.

Is it worth mentioning that this is about negotiation?  E.g. "so that
the server is aware that the client may not have all objects reachable
from such commits".

> +
> +deepen <depth>
> + Request that the fetch/clone should be shallow having a commit depth of

nit: s/Request/Requests/, for consistency with the others?

> + <depth> relative to the remote side.

What does the value of <depth> mean? E.g. does a depth of 1 mean to
fetch only the commits named in "have", 2 to fetch those commits plus
their parents, etc, or am I off by one?

Is <depth> always a positive number?

What happens if <depth> starts with a 0?  Is that a client error?
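My reading of the depth semantics, as a toy model (assumptions: depth 1 keeps only the tip commits, and each extra level adds one generation of parents):

```python
def shallow_cut(parents, tips, depth):
    """Commits kept when history is cut `depth` levels below the tips.

    `parents` maps a commit name to its list of parent names."""
    keep = set(tips)
    frontier = set(tips)
    for _ in range(depth - 1):
        # walk one generation of parents per extra level of depth
        frontier = {p for c in frontier for p in parents.get(c, [])} - keep
        keep |= frontier
    return keep
```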

> +
> +deepen-relative
> + Requests that the semantics of the "deepen" command be changed
> + to indicate that the depth requested is relative to the clients
> + current shallow boundary, instead of relative to the remote
> + refs.

s/clients/client's/

s/remote refs/requested commits/ or "wants" or something.

> +
> +deepen-since <timestamp>
> + Requests that the shallow clone/fetch should be cut at a
> + specific time, instead of depth.  Internally it's equivalent of
> + doing "rev-list --max-age=<timestamp>". Cannot be used with
> + "deepen".

Nits:
  s/rev-list/git rev-list/
  s/equivalent of/equivalent to/ or 'the equivalent of'.

Since the git-rev-list(1) manpage doesn't tell me: what is the format
of <timestamp>?  And is the requested time interval inclusive or
exclusive?

> +
> +deepen-not <rev>
> + Requests that the shallow clone/fetch should be cut at a
> + specific revision specified by '', instead of a depth.
> + Internally it's equivalent of doing "rev-list --not <rev>".
> + Cannot be used with "deepen", but can be used with
> + "deepen-since".

Interesting.

nit: s/rev-list/git rev-list/

What is the format of <rev>?  E.g. can it be an arbitrary revision
specifier or is it an oid?

[...]
>  output = *section
> -section = (acknowledgments | packfile)
> +section = (acknowledgments | shallow-info | packfile)
> (flush-pkt | delim-pkt)

It looks like sections can go in an arbitrary order.  Are there
tests to make sure the server can cope with reordering?  (I ask
not because I mistrust the server but because I have some vague
hope that other server implementations might be inspired by our
tests.)

[...]
> @@ -215,6 +245,11 @@ header.
>  nak = PKT-LINE("NAK" LF)
>  ack = PKT-LINE("ACK" SP obj-id LF)
>  
> +shallow-info = PKT-LINE("shallow-info" LF)
> +*PKT-LINE((shallow | unshallow) LF)
> +shallow = "shallow" SP obj-id
> +unshallow = "unshallow" SP obj-id

Likewise: it looks like shallows and unshallows can be intermixed; can
this be either (a) tightened or (b) covered by tests to make sure a
later refactoring doesn't accidentally tighten it?

[...]
> @@ -247,6 +282,36 @@ header.
> determined the objects it plans to send to the client and no
> further negotiation is needed.
>  
> +
> +shallow-info section
> + If the client has requested a shallow fetch/clone, a shallow
> + client requests a fetch or the server is shallow then the
> + server's response may include a shallow-info section.

I'm having trouble parsing this sentence.

>  The
> + shallow-info section will be include if (due to one of the above

nit: s/include/included/

> + conditions) the server needs to info

Re: [PATCH v3 02/35] pkt-line: introduce struct packet_reader

2018-02-27 Thread Jonathan Nieder
Brandon Williams wrote:
> On 02/12, Jonathan Nieder wrote:

>>> --- a/pkt-line.h
>>> +++ b/pkt-line.h
>>> @@ -111,6 +111,64 @@ char *packet_read_line_buf(char **src_buf, size_t 
>>> *src_len, int *size);
>>>   */
>>>  ssize_t read_packetized_to_strbuf(int fd_in, struct strbuf *sb_out);
>>>  
>>> +struct packet_reader {
>>> +   /* source file descriptor */
>>> +   int fd;
>>> +
>>> +   /* source buffer and its size */
>>> +   char *src_buffer;
>>> +   size_t src_len;
>>
>> Can or should this be a strbuf?
>>
>>> +
>>> +   /* buffer that pkt-lines are read into and its size */
>>> +   char *buffer;
>>> +   unsigned buffer_size;
>>
>> Likewise.
>
> This struct is setup to be a drop in replacement for the existing
> read_packet() family of functions.  Because of this I tried to make the
> interface as similar as possible to make it easy to convert to using it
> as well as having no need to clean anything up (because the struct is
> really just a wrapper and doesn't own anything).

Sorry, I don't completely follow.  Are you saying some callers play
with the buffer, or are you saying you haven't checked?  (If the
latter, that's perfectly fine; I'm just trying to understand the API.)

Either way, can you add some comments about ownership / who is allowed
to write to it / etc to make it easier to clean up later?

Thanks,
Jonathan


Re: [PATCH v3 06/35] transport: use get_refs_via_connect to get refs

2018-02-27 Thread Jonathan Nieder
Brandon Williams wrote:
> On 02/26, Jonathan Nieder wrote:
>> Brandon Williams wrote:

>>> +++ b/transport.c
>>> @@ -230,12 +230,8 @@ static int fetch_refs_via_pack(struct transport 
>>> *transport,
>>> args.cloning = transport->cloning;
>>> args.update_shallow = data->options.update_shallow;
>>>  
>>> -   if (!data->got_remote_heads) {
>>> -   connect_setup(transport, 0);
>>> -   get_remote_heads(data->fd[0], NULL, 0, &refs_tmp, 0,
>>> -NULL, &data->shallow);
>>> -   data->got_remote_heads = 1;
>>> -   }
>>> +   if (!data->got_remote_heads)
>>> +   refs_tmp = get_refs_via_connect(transport, 0);
>>
>> The only difference between the old and new code is that the old code
>> passes NULL as 'extra_have' and the new code passes &data->extra_have.
>>
>> That means this populates the data->extra_have oid_array.  Does it
>> matter?
[...]
> I don't think its a problem to have extra_have populated, least I
> haven't seen anything to lead me to believe it would be a problem.

Assuming it gets properly freed later, the only effect I can imagine
is some increased memory usage.

I'm inclined to agree with you that the simplicity is worth it.  It
seems worth mentioning in the commit message, though.

[...]
>>> @@ -541,14 +537,8 @@ static int git_transport_push(struct transport 
>>> *transport, struct ref *remote_re
>>> struct send_pack_args args;
>>> int ret;
>>>  
>>> -   if (!data->got_remote_heads) {
>>> -   struct ref *tmp_refs;
>>> -   connect_setup(transport, 1);
>>> -
>>> -   get_remote_heads(data->fd[0], NULL, 0, &tmp_refs, REF_NORMAL,
>>> -NULL, &data->shallow);
>>> -   data->got_remote_heads = 1;
>>> -   }
>>> +   if (!data->got_remote_heads)
>>> +   get_refs_via_connect(transport, 1);
>>
>> not a new problem, just curious: Does this leak tmp_refs?
>
> Maybe, though its removed by this patch.

Sorry for the lack of clarity.  If it was leaked before, then it is
still leaked now, via the discarded return value from
get_refs_via_connect.

Any idea how we can track that down?  E.g. are there ways to tell leak
checkers "just tell me about this particular allocation"?

Thanks,
Jonathan


Re: [PATCH] docs/pretty-formats: fix typo '% <()' -> '%<|()'

2018-02-27 Thread Jonathan Nieder
Mårten Kongstad wrote:

> Remove erroneous space between % and < in '% <()'.
>
> Signed-off-by: Mårten Kongstad 
> ---
>  Documentation/pretty-formats.txt | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)

Looks correct to me.  The space was introduced in v1.8.3-rc0~22^2
(pretty: support %>> that steals trailing spaces, 2013-04-19) and
appears to be a plain typo.  Thanks for fixing it.

Reviewed-by: Jonathan Nieder 

Thanks.

> --- a/Documentation/pretty-formats.txt
> +++ b/Documentation/pretty-formats.txt
> @@ -202,7 +202,7 @@ endif::git-rev-list[]
>  - '%>>()', '%>>|()': similar to '%>()', '%>|()'
>respectively, except that if the next placeholder takes more spaces
>than given and there are spaces on its left, use those spaces
> -- '%><()', '%><|()': similar to '% <()', '%<|()'
> +- '%><()', '%><|()': similar to '%<()', '%<|()'
>respectively, but padding both sides (i.e. the text is centered)
>  - %(trailers[:options]): display the trailers of the body as interpreted
>by linkgit:git-interpret-trailers[1]. The `trailers` string may be


Re: [PATCH v3 26/35] transport-helper: remove name parameter

2018-02-27 Thread Jonathan Nieder
Brandon Williams wrote:

> Commit 266f1fdfa (transport-helper: be quiet on read errors from
> helpers, 2013-06-21) removed a call to 'die()' which printed the name of
> the remote helper passed in to the 'recvline_fh()' function using the
> 'name' parameter.  Once the call to 'die()' was removed the parameter
> was no longer necessary but wasn't removed.  Clean up 'recvline_fh()'
> parameter list by removing the 'name' parameter.
>
> Signed-off-by: Brandon Williams 
> ---
>  transport-helper.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)

Nice.

Reviewed-by: Jonathan Nieder 


Re: [PATCH v3 29/35] pkt-line: add packet_buf_write_len function

2018-02-27 Thread Jonathan Nieder
Brandon Williams wrote:

> Add the 'packet_buf_write_len()' function which allows for writing an
> arbitrary length buffer into a 'struct strbuf' and formatting it in
> packet-line format.

Makes sense.

[...]
> --- a/pkt-line.h
> +++ b/pkt-line.h
> @@ -26,6 +26,7 @@ void packet_buf_flush(struct strbuf *buf);
>  void packet_buf_delim(struct strbuf *buf);
>  void packet_write(int fd_out, const char *buf, size_t size);
>  void packet_buf_write(struct strbuf *buf, const char *fmt, ...) 
> __attribute__((format (printf, 2, 3)));
> +void packet_buf_write_len(struct strbuf *buf, const char *data, size_t len);

I wonder if we should rename packet_buf_write to something like
packet_buf_writef.  Right now there's a kind of confusing collection of
functions without much symmetry.

Alternatively, the _buf_ ones could become strbuf_* functions:

strbuf_add_packet(&buf, data, len);
strbuf_addf_packet(&buf, fmt, ...);

That would make it clearer that these append to buf.

I'm just thinking out loud.  For this series, the API you have here
looks fine, even if it is a bit inconsistent.  (In other words, even
if you agree with me, this would probably be best addressed as a patch
on top.)

[...]
> --- a/pkt-line.c
> +++ b/pkt-line.c
> @@ -215,6 +215,22 @@ void packet_buf_write(struct strbuf *buf, const char 
> *fmt, ...)
>   va_end(args);
>  }
>  
> +void packet_buf_write_len(struct strbuf *buf, const char *data, size_t len)
> +{
> + size_t orig_len, n;
> +
> + orig_len = buf->len;
> + strbuf_addstr(buf, "");
> + strbuf_add(buf, data, len);
> + n = buf->len - orig_len;
> +
> + if (n > LARGE_PACKET_MAX)
> + die("protocol error: impossibly long line");

Could the error message describe the long line (e.g.

...impossibly long line %.*s...", 256, data);

)?

> +
> + set_packet_header(&buf->buf[orig_len], n);
> + packet_trace(buf->buf + orig_len + 4, n - 4, 1);

Could do, more simply:

packet_trace(data, len, 1);

Thanks,
Jonathan


Re: [PATCH v3 31/35] remote-curl: store the protocol version the server responded with

2018-02-27 Thread Jonathan Nieder
Brandon Williams wrote:

> Store the protocol version the server responded with when performing
> discovery.  This will be used in a future patch to either change the
> 'Git-Protocol' header sent in subsequent requests or to determine if a
> client needs to fallback to using a different protocol version.

nit: s/fallback/fall back/ (fallback is the noun/adjective, fall back
the verb)

> Signed-off-by: Brandon Williams 

With or without that tweak,
Reviewed-by: Jonathan Nieder 

Thanks.


Re: [PATCH v3 28/35] transport-helper: introduce stateless-connect

2018-02-27 Thread Jonathan Nieder
Brandon Williams wrote:

> Introduce the transport-helper capability 'stateless-connect'.  This
> capability indicates that the transport-helper can be requested to run
> the 'stateless-connect' command which should attempt to make a
> stateless connection with a remote end.  Once established, the
> connection can be used by the git client to communicate with
> the remote end natively in a stateless-rpc manner as supported by
> protocol v2.  This means that the client must send everything the server
> needs in a single request as the client must not assume any
> state-storing on the part of the server or transport.
>
> If a stateless connection cannot be established then the remote-helper
> will respond in the same manner as the 'connect' command indicating that
> the client should fallback to using the dumb remote-helper commands.
>
> Signed-off-by: Brandon Williams 
> ---
>  transport-helper.c | 8 
>  transport.c| 1 +
>  transport.h| 6 ++
>  3 files changed, 15 insertions(+)

Please add documentation for this command to
Documentation/gitremote-helpers.txt.

That helps reviewers, since it means reviewers can get a sense of what
the interface is meant to be.  It helps remote helper implementers as
well: it tells them what they can rely on and what they can't rely on
in this interface.  For the same reason it helps remote helper callers
as well.

Thanks,
Jonathan


Re: [PATCH v3 34/35] remote-curl: implement stateless-connect command

2018-02-27 Thread Jonathan Nieder
Hi,

Brandon Williams wrote:

> Teach remote-curl the 'stateless-connect' command which is used to
> establish a stateless connection with servers which support protocol
> version 2.  This allows remote-curl to act as a proxy, allowing the git
> client to communicate natively with a remote end, simply using
> remote-curl as a pass through to convert requests to http.

Cool!  I better look at the spec for that first.

*looks at the previous patch*

Oh, there is no documented spec. :/  I'll muddle through this instead,
then.

[...]
> --- a/remote-curl.c
> +++ b/remote-curl.c
> @@ -188,7 +188,10 @@ static struct ref *parse_git_refs(struct discovery 
> *heads, int for_push)
>   heads->version = discover_version(&reader);
>   switch (heads->version) {
>   case protocol_v2:
> - die("support for protocol v2 not implemented yet");
> + /*
> +  * Do nothing.  Client should run 'stateless-connect' and
> +  * request the refs themselves.
> +  */
>   break;

This is the 'list' command, right?  Since we expect the client to run
'stateless-connect' instead, can we make it error out?

[...]
> @@ -1082,6 +1085,184 @@ static void parse_push(struct strbuf *buf)
>   free(specs);
>  }
>  
> +struct proxy_state {
> + char *service_name;
> + char *service_url;
> + struct curl_slist *headers;
> + struct strbuf request_buffer;
> + int in;
> + int out;
> + struct packet_reader reader;
> + size_t pos;
> + int seen_flush;
> +};

Can this have a comment describing what it is/does?  It's not obvious
to me at first glance.

It doesn't have to go in a lot of detail since this is an internal
implementation detail, but something saying e.g. that this represents
a connection to an HTTP server (that's an example; I'm not saying
that's what it represents :)) would help.

> +
> +static void proxy_state_init(struct proxy_state *p, const char *service_name,
> +  enum protocol_version version)
[...]
> +static void proxy_state_clear(struct proxy_state *p)

Looks sensible.

[...]
> +static size_t proxy_in(char *buffer, size_t eltsize,
> +size_t nmemb, void *userdata)

Can this have a comment describing the interface?  (E.g. does it read
a single pkt_line?  How is the caller expected to use it?  Does it
satisfy the interface of some callback?)

libcurl's example https://curl.haxx.se/libcurl/c/ftpupload.html just
calls this read_callback.  Such a name plus a pointer to
CURLOPT_READFUNCTION should do the trick; bonus points if the comment 
says what our implementation of the callback does.

Is this about having peek ability?

> +{
> + size_t max = eltsize * nmemb;

Can this overflow?  st_mult can avoid having to worry about that.

> + struct proxy_state *p = userdata;
> + size_t avail = p->request_buffer.len - p->pos;
> +
> + if (!avail) {
> + if (p->seen_flush) {
> + p->seen_flush = 0;
> + return 0;
> + }
> +
> + strbuf_reset(&p->request_buffer);
> + switch (packet_reader_read(&p->reader)) {
> + case PACKET_READ_EOF:
> + die("unexpected EOF when reading from parent process");
> + case PACKET_READ_NORMAL:
> + packet_buf_write_len(&p->request_buffer, p->reader.line,
> +  p->reader.pktlen);
> + break;
> + case PACKET_READ_DELIM:
> + packet_buf_delim(&p->request_buffer);
> + break;
> + case PACKET_READ_FLUSH:
> + packet_buf_flush(&p->request_buffer);
> + p->seen_flush = 1;
> + break;
> + }
> + p->pos = 0;
> + avail = p->request_buffer.len;
> + }
> +
> + if (max < avail)
> + avail = max;
> + memcpy(buffer, p->request_buffer.buf + p->pos, avail);
> + p->pos += avail;
> + return avail;

This is a number of bytes, but CURLOPT_READFUNCTION expects a number
of items, fread-style.  That is:

if (avail < eltsize)
... handle somehow, maybe by reading in more? ...

avail_memb = avail / eltsize;
memcpy(buffer,
   p->request_buffer.buf + p->pos,
   st_mult(avail_memb, eltsize));
p->pos += st_mult(avail_memb, eltsize);
return avail_memb;

But https://curl.haxx.se/libcurl/c/CURLOPT_READFUNCTION.html says

Your function must then return the actual number of bytes that
it stored in that memory area.

Does this mean eltsize is always 1?  This is super confusing...

... ok, a quick grep for fread_func in libcurl reveals that eltsize is
indeed always 1.  Can we add an assertion so we notice if that
changes?

if (eltsize != 1)
BUG("curl read callback called with size = %zu != 1",

Re: [Problem] test_must_fail makes possibly questionable assumptions about exit_code.

2018-02-27 Thread Jonathan Nieder
Hi,

Randall S. Becker wrote:

> After months of arguing with some platform developers on this subject, the
> perl spec was held over my head repeatedly about a few lines that are
> causing issues. The basic problem is this line (test-lib-functions.sh, line
> 633, still in ffa952497)
>
>>elif test $exit_code -gt 129 && test $exit_code -le 192
>>   then
>>   echo >&2 "test_must_fail: died by signal $(($exit_code - 128)):
>
> According to the perl spec http://perldoc.perl.org/functions/die.html, die
> basically takes whatever errno is, mods it with 256 and there you go. EBADF
> is what is used when perl reads from stdin and calls die - that's standard
> perl. In most systems, you end up with something useful, when EBADF is 9.
> But when it is 4009, you get a garbage answer (4009 mod 256 a.k.a. 169).
> However, only 128-165 are technically reserved for signals, rather than all
> the way up to 192, which may be true in some places but not everywhere.
>
> The advice (I'm putting that nicely) I received was to use exit so that the
> result is predictable - unlikely to be useful in the 15K test suites in git.

The fundamental thing is the actual Git commands, not the tests in the
testsuite, no?

In the rest of git, die() makes a command exit with status 128.  The
trouble here is that our code in Perl is assuming the same meaning for
die() but using perl's die builtin instead.  That suggests a few
options:

 a) We could override the meaning of die() in Git.pm.  This feels
ugly but if it works, it would be a very small patch.

 b) We could forbid use of die() and use some git_die() instead (but
with a better name) for our own error handling.

 c) We could have a special different exit code convention for
commands written in Perl.  And then change expectations whenever a
command is rewritten in C.  As you might expect, I don't like this
option.

 d) We could wrap each command in an eval {...} block to convert the
result from die() to exit 128.

Option (b) feels simplest to me.

Thoughts?

Thanks,
Jonathan
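The truncation Randall describes can be shown in a couple of lines. This is
a hypothetical Python sketch of the underlying arithmetic (the OS keeps only
the low byte of the exit status), not git or perl code:

```python
def observed_exit_status(raw_status):
    """Exit statuses are truncated to one byte, so perl's die() exiting
    with a large errno value (e.g. EBADF = 4009 on the platform in
    question) aliases into the 128..192 range that the test suite
    interprets as death by signal."""
    return raw_status % 256
```

For example, 4009 % 256 == 169, which test_must_fail misreads as "died by
signal 41".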


[PATCH] protocol: treat unrecognized protocol.version setting as 0

2018-02-27 Thread Jonathan Nieder
If I share my .gitconfig or .git/config file between multiple machines
(or between multiple Git versions on a single machine) and set

[protocol]
version = 2

then running "git fetch" with a Git version that does not support
protocol v2 errors out with

fatal: unknown value for config 'protocol.version': 2

In the spirit of v1.7.6-rc0~77^2~1 (Improve error handling when
parsing dirstat parameters, 2011-04-29), it is better to (perhaps
after warning the user) ignore the unrecognized protocol version.
After all, future Git versions might add even more protocol versions,
and using two different Git versions with the same Git repo, machine,
or home directory should not cripple the older Git version just
because of a parameter that is only understood by a more recent Git
version.

So ignore the unrecognized value.  It may be useful for spell checking
(for instance, if I put "version = v1" intending "version = 1") to
warn about such settings, but this patch does not, since at least in
these early days for protocol v2 it is expected for configurations
that want to opportunistically use protocol v2 if available not to be
unusual.

Signed-off-by: Jonathan Nieder 
---
Google has been running with a patch like this internally for a while,
since we have been changing the protocol.version number to a new value
like 20180226 each time a minor tweak to the protocol v2 RFC occurred.

The bit I have doubts about is whether to warn.  What do you think?

Thanks,
Jonathan

 protocol.c |  8 ++--
 t/t5700-protocol-v1.sh | 12 
 2 files changed, 14 insertions(+), 6 deletions(-)

diff --git a/protocol.c b/protocol.c
index 43012b7eb6..ce9c634a3a 100644
--- a/protocol.c
+++ b/protocol.c
@@ -17,12 +17,8 @@ enum protocol_version get_protocol_version_config(void)
const char *value;
if (!git_config_get_string_const("protocol.version", &value)) {
enum protocol_version version = parse_protocol_version(value);
-
-   if (version == protocol_unknown_version)
-   die("unknown value for config 'protocol.version': %s",
-   value);
-
-   return version;
+   if (version != protocol_unknown_version)
+   return version;
}
 
return protocol_v0;
diff --git a/t/t5700-protocol-v1.sh b/t/t5700-protocol-v1.sh
index ba86a44eb1..c35767ab01 100755
--- a/t/t5700-protocol-v1.sh
+++ b/t/t5700-protocol-v1.sh
@@ -31,6 +31,18 @@ test_expect_success 'clone with git:// using protocol v1' '
grep "clone< version 1" log
 '
 
+test_expect_success 'unrecognized protocol versions fall back to v0' '
+   GIT_TRACE_PACKET=1 git -c protocol.version= \
+   clone "$GIT_DAEMON_URL/parent" v 2>log &&
+
+   git -C daemon_child log -1 --format=%s >actual &&
+   git -C "$daemon_parent" log -1 --format=%s >expect &&
+   test_cmp expect actual &&
+
+   # Client requested and server responded using protocol v0
+   ! grep version log
+'
+
 test_expect_success 'fetch with git:// using protocol v1' '
test_commit -C "$daemon_parent" two &&
 
-- 
2.16.2.395.g2e18187dfd
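The fallback behavior the patch implements can be modeled concisely. This is
a hypothetical Python sketch of the C logic in get_protocol_version_config()
(the `KNOWN_VERSIONS` table stands in for parse_protocol_version; it is not
git code):

```python
KNOWN_VERSIONS = {"0": 0, "1": 1}   # versions this hypothetical binary knows

def get_protocol_version_config(value):
    """Return the configured protocol version, silently falling back to
    v0 when the setting is absent or unrecognized instead of dying."""
    if value is not None:
        version = KNOWN_VERSIONS.get(value)
        if version is not None:
            return version
    return 0
```

A setting of "2" on a binary that only knows v0 and v1 now degrades to v0
rather than making every command fatal.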



Re: [PATCH] protocol: treat unrecognized protocol.version setting as 0

2018-02-27 Thread Jonathan Nieder
Duy Nguyen wrote:
> On Wed, Feb 28, 2018 at 8:02 AM, Brandon Williams  wrote:
>> On 02/27, Jonathan Nieder wrote:

>>> If I share my .gitconfig or .git/config file between multiple machines
>>> (or between multiple Git versions on a single machine) and set
>>>
>>>   [protocol]
>>>   version = 2
>>>
>>> then running "git fetch" with a Git version that does not support
>>> protocol v2 errors out with
>>>
>>>   fatal: unknown value for config 'protocol.version': 2
>>>
>>> In the spirit of v1.7.6-rc0~77^2~1 (Improve error handling when
>>> parsing dirstat parameters, 2011-04-29), it is better to (perhaps
>>> after warning the user) ignore the unrecognized protocol version.
>>> After all, future Git versions might add even more protocol versions,
>>> and using two different Git versions with the same Git repo, machine,
>>> or home directory should not cripple the older Git version just
>>> because of a parameter that is only understood by a more recent Git
>>> version.
>
> I wonder if it's better to specify multiple versions. If v2 is not
> recognized by this git but v0 is, then it can pick that up. But if you
> explicitly tell it to choose between v2 and v3 only and it does not
> understand either, then it dies. Not sure if this is a good idea
> though.

Interesting thought.  Something roughly like this (on top of the patch
this is a reply to)?

diff --git i/protocol.c w/protocol.c
index ce9c634a3a..6aa8857a11 100644
--- i/protocol.c
+++ w/protocol.c
@@ -1,4 +1,5 @@
 #include "cache.h"
+#include "string-list.h"
 #include "config.h"
 #include "protocol.h"
 
@@ -14,14 +15,18 @@ static enum protocol_version parse_protocol_version(const char *value)
 
 enum protocol_version get_protocol_version_config(void)
 {
-   const char *value;
-   if (!git_config_get_string_const("protocol.version", &value)) {
-   enum protocol_version version = parse_protocol_version(value);
-   if (version != protocol_unknown_version)
-   return version;
+   const struct string_list *values;
+   const struct string_list_item *value;
+   enum protocol_version result = protocol_v0;
+
+   values = git_config_get_value_multi("protocol.version");
+   for_each_string_list_item(value, values) {
+   enum protocol_version v = parse_protocol_version(value->string);
+   if (v != protocol_unknown_version)
+   result = v;
}
 
-   return protocol_v0;
+   return result;
 }
 
 enum protocol_version determine_protocol_version_server(void)
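Duy's idea — scan every configured value and let the last one this
implementation recognizes win — can be modeled like this (hypothetical
Python sketch, not git code; the `known` table stands in for
parse_protocol_version):

```python
def pick_protocol_version(values, known={"0": 0, "1": 1}):
    """Scan all protocol.version settings in order; the last recognized
    value wins, and unrecognized values are skipped."""
    result = 0                        # protocol_v0 default
    for value in values:
        if value in known:
            result = known[value]
    return result
```

So a user could set "version = 1" in ~/.gitconfig and "version = 2" in
.git/config: a binary that knows v2 uses it, an older one falls back to v1.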


Re: [Problem] test_must_fail makes possibly questionable assumptions about exit_code.

2018-02-28 Thread Jonathan Nieder
Randall S. Becker wrote:

> The problem is actually in git code in its test suite that uses perl
> inline, not in my test code itself. The difficulty I'm having is
> placing this appropriate so that the signal handler gets used
> throughout the test suite including in the perl -e invocations. This
> is more a lack of my own understanding of plumbing of git test
> framework rather than of using or coding perl.

Can you elaborate with an example?  My understanding was that
test_must_fail is only for running git.  If a test is running perl and
wants to check its exit code, the test is supposed to use !, not
test_must_fail.

t/README backs me up:

 - use '! git cmd' when you want to make sure the git command exits
   with failure in a controlled way by calling "die()".  Instead,
   use 'test_must_fail git cmd'.  This will signal a failure if git
   dies in an unexpected way (e.g. segfault).

   On the other hand, don't use test_must_fail for running regular
   platform commands; just use '! cmd'.  We are not in the business
   of verifying that the world given to us sanely works.

So I don't consider the initial issue you raised a test issue at all!
It's a bug in the git commands, and a fix for it should not be
specific to the test suite.

And now it sounds like there is a second issue: the test suite is
overusing test_must_fail in some context and that needs to be fixed as
well.

Thanks,
Jonathan
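For reference, the exit-status ranges that test_must_fail distinguishes can
be sketched roughly as follows. This is a hypothetical Python model of the
shell logic in test-lib-functions.sh; the exact set of special cases is an
assumption here, not a transcription:

```python
def classify_exit(exit_code):
    """Rough model of how test_must_fail interprets git's exit status."""
    if exit_code == 0:
        return "succeeded, so test_must_fail reports an error"
    if exit_code > 192:
        return "exit code out of range"
    if exit_code > 129:
        return "died by signal %d" % (exit_code - 128)
    if exit_code == 127:
        return "command not found"
    return "failed as expected"
```

This is why git's die() convention of exiting with 128 matters: 128 lands in
the "failed as expected" bucket, while a status of 169 looks like a signal.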


Re: [PATCH] protocol: treat unrecognized protocol.version setting as 0

2018-02-28 Thread Jonathan Nieder
Hi,

Junio C Hamano wrote:
> Jonathan Nieder  writes:

>> If I share my .gitconfig or .git/config file between multiple machines
>> (or between multiple Git versions on a single machine) and set
>>
>>  [protocol]
>>  version = 2
>>
>> then running "git fetch" with a Git version that does not support
>> protocol v2 errors out with
>>
>>  fatal: unknown value for config 'protocol.version': 2
>>
>> In the spirit of v1.7.6-rc0~77^2~1 (Improve error handling when
>> parsing dirstat parameters, 2011-04-29), it is better to (perhaps
>> after warning the user) ignore the unrecognized protocol version.
>
> I do not agree with the analogy at all.
>
> Showing dirstat with different tweaks than the user expected to see
> is a local and read-only thing.  Talking to the other side over a
> protocol the user explicitly wanted to avoid (e.g. imagine the case
> where your upstream's protocol version 1 implementation is
> critically buggy and you want to use version 2 if you talk with
> them) by accident is a more grave error, possibly affecting the
> other side that you may not have enough power to recover from
> (e.g. damaging the remote repository to which you only have push
> access and not interactive shell).

Fair enough about the analogy being a poor one.

I disagree with characterizing using protocol v0 as a grave error,
because the scenario seems far-fetched to me.  I can imagine the
opposite scenario, wanting to use protocol v0 because a server's
implementation of v2 is buggy (for example, producing wrong
negotiation results and wasting bandwidth, or erroring out for a
request that should not be an error).  I am having trouble imagining a
broken v0 implementation doing anything worse than being slow or
erroring out.

That said, erroring out to catch spelling errors is nice and helpful,
so I would be okay with continuing to apply this as a local patch and
it not landing upstream.

The following third way would also be possible, but I'm pretty sure I
don't like it:

- As Duy suggested, allowing multiple 'protocol.version' settings.
  The last setting that the Git implementation recognizes wins.

- If there is at least one 'protocol.version' setting but the Git
  implementation doesn't recognize any of them, error out.

The reason I don't like it is that it doesn't make your example case
work significantly better.  If I have

[protocol]
version = 1

in my ~/.gitconfig and

[protocol]
version = v2

in .git/config, then that means I intended to use protocol v2 and
misspelled it, but this heuristic would ignore the v2 and fall back to
version 1.

As a side note, the exact same problem would happen if I make a typo
in the config key:

[protocol]
vesion = 2

Treating unrecognized values from the growing set of values as an
error while not treating unrecognized keys from the growing set of
keys as an error feels odd to me.  I think it would be useful to
document whatever we decide about this subject in
Documentation/CodingGuidelines.

Thanks,
Jonathan


Re: [Problem] test_must_fail makes possibly questionable assumptions about exit_code.

2018-02-28 Thread Jonathan Nieder
Randall S. Becker wrote:
> On February 28, 2018 12:44 PM, Jonathan Nieder wrote:
>> Randall S. Becker wrote:

>>> The problem is actually in git code in its test suite that uses perl
>>> inline, not in my test code itself.
[...]
>> Can you elaborate with an example?  My understanding was that
>> test_must_fail is only for running git.
[...]
> Have a look at a recent t1404 as a sample. Line 615 is the one causing the
> platform grief, because it triggers a 'die'. However, the particular test
> case #54, had no difference on platform with test_must_fail or !, which has
> the same underlying EBADF completion after digging and digging.

Sorry to be dense: what I see on that line is

test_must_fail git update-ref -d $prefix/foo >out 2>err &&

How is perl involved?

Puzzled,
Jonathan


Re: [PATCH 00/11] Moving global state into the repository object (part 2)

2018-02-28 Thread Jonathan Nieder
Hi,

Stefan Beller wrote:
> On Wed, Feb 28, 2018 at 9:57 AM, Junio C Hamano  wrote:

>> Wait a minute.  Is that topic ever shown to work well together with
>> other topics in flight and are now ready to be built upon?  I had an
>> impression that it is just starting to get serious reviews.
>
> And I had the impression the serious reviews were done and fine;
> the only issue would be demonstrating its working fine with other
> series, where I was also worrying about conflicts with
> brians series. And to address that, I'd just send series in small sizes.

Some of the patches looked cooked to me and others still do not look
cooked yet.  I marked the former with Reviewed-by.  In general, a few
things can help to make the process easier for me:

 1. Giving a quick reply to a review to say how the comments were
resolved, sometimes even with a resend of that one patch to
illustrate.  That way the conversation can continue and the
individual patch can get to a reviewed state faster, without
having to chase between different rerolls of the entire series.

This also has an effect of making the review process more
collaborative: perhaps after seeing how you address their
comments, a reviewer may have another idea that they suggest via a
patch to squash in, etc.

 2. In a reroll, summarizing the result of previous reviews by
including acks as appropriate and Reviewed-by if a reviewer
granted it.  This helps with reviewing the reroll since it tells
people where to focus their attention.

[...]
> Is there anything that a contributor can help with that eases
> refactoring series in flight?

For helping reviewers, see above.

For helping Junio, what I've seen people occasionally do is to locally
run a "trial merge" against next and pu and see what semantic or
lexical conflicts arise.  In the cover letter you can describe these
and give Junio advice to make applying the patch easier for him.

[...]
> Sorry for the miscommunication, though,

FWIW, even though part 1 doesn't look done to me yet, it looks *close*
to done, and I was happy to see the sneak peek at part 2.

Thanks,
Jonathan


Re: [Problem] test_must_fail makes possibly questionable assumptions about exit_code.

2018-02-28 Thread Jonathan Nieder
Randall S. Becker wrote:

> The original thread below has details of what the original issue was and
> why. It hit three tests specifically on this platform where die was invoked
> (at least on one of them). Perl was invoked under the covers and the
> completion code of 169 propagated back through git to the test framework.
> https://public-inbox.org/git/499fb29f-ca34-8d28-256d-896107c29...@kdbg.org/T/#m0b30f0857feae2044f38e04dc6b8565b68b7d52b

That is:

test_must_fail git send-email --dump-aliases --to=jan...@example.com -1 refs/heads/accounting

Which matches the case described elsewhere in this thread.  It's
git-send-email.perl.  test_must_fail is doing its job correctly by
diagnosing a real bug in git-send-email.perl, which we're grateful for
you having discovered. :)

Thanks,
Jonathan


Re: The case for two trees in a commit ("How to make rebase less modal")

2018-03-01 Thread Jonathan Nieder
Hi,

Stefan Beller wrote:

> $ git hash-object --stdin -w -t commit <<EOF
> tree c70b4a33a0089f15eb3b38092832388d75293e86
> parent 105d5b91138ced892765a84e771a061ede8d63b8
> author Stefan Beller  1519859216 -0800
> committer Stefan Beller  1519859216 -0800
> tree 5495266479afc9a4bd9560e9feac465ed43fa63a
>
> test commit
> EOF
> 19abfc3bf1c5d782045acf23abdf7eed81e16669
> $ git fsck |grep 19abfc3bf1c5d782045acf23abdf7eed81e16669
> $
>
> So it is technically possible to create a commit with two tree entries
> and fsck is not complaining.

As others mentioned, this is essentially a fancy way to experiment
with adding a new header (with the same name as an existing header) to
a commit.  It is kind of a scary thing to do because anyone trying to
parse commits, including old versions of git, is likely to get
confused by the multiple trees.  It doesn't affect the reachability
calculation in the way that it should so this ends up being something
that should be straightforward to do with a message in the commit body
instead.

To affect reachability, you could use multiple parent lines instead.
You'd need synthetic commits to hang the trees on.  This is similar to
how "git stash" stores the index state.

In other words, I think what you are trying to do is feasible, but not
in the exact way you described.
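For illustration, the canonical commit object layout with multiple `parent`
lines (the mechanism `git stash` relies on for hanging extra trees off
synthetic commits) can be sketched as a hypothetical Python helper, not git
code:

```python
def fmt_commit(tree, parents, author, committer, message):
    """Canonical commit object layout: exactly one 'tree' header, any
    number of 'parent' headers (which drive reachability), then the
    author/committer headers, a blank line, and the message."""
    lines = ["tree %s" % tree]
    lines += ["parent %s" % p for p in parents]
    lines.append("author %s" % author)
    lines.append("committer %s" % committer)
    lines.append("")
    lines.append(message)
    return "\n".join(lines) + "\n"
```

Extra state lives in the trees of the synthetic parents, so old parsers see
a perfectly ordinary merge commit instead of a duplicated `tree` header.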

[...]
> * porcelain to modify the repo "at larger scale", such as rebase,
> cherrypicking, reverting
>   involving more than 1 commit.
>
> These large scale operations involving multiple commits however
> are all modal in its nature. Before doing anything else, you have to
> finish or abort the rebase or you need expert knowledge how to
> go otherwise.
>
> During the rebase there might be a hard to resolve conflict, which
> you may not want to resolve right now, but defer to later.  Deferring a
> conflict is currently impossible, because precisely one tree is recorded.

Junio mentions you'd want to record:
 - stages of the index, to re-create a conflicted index
 - working tree files, with conflict markers

In addition you may also want to record:
 - state (todo list) from .git/rebase-merge, to allow picking up where
   you left off in such a larger operation
 - similar state for other commands --- e.g. MERGE_MSG

Recording this work-in-progress state is in the spirit of "git stash"
does.  People also sometimes like to record their state in progress with
a "wip commit" at the tip of a branch.  Both of those workflows would
benefit from something like this, I'd think.

So I kind of like this.  Maybe a "git save-wip" command that is like
"git stash" but records state to the current branch?  And similarly
improving "git stash" to record the same richer state.

And in the spirit of "git stash" I think it is possible without
even modifying the commit object format.

[...]
> I'd be advocating for having multiple trees in a commit
> possible locally; it might be a bad idea to publish such trees.

I think such "WIP state" may also be useful for publishing, to allow
collaborating on a thorny rebase or merge.

Thanks and hope that helps,
Jonathan


Re: [PATCH 1/2] ref-filter: mark a file-local symbol as static

2018-03-01 Thread Jonathan Nieder
Hi,

Ramsay Jones wrote:

> Commit fcfba37337 ('ref-filter: make "--contains <id>" less chatty if
> <id> is invalid', 2018-02-23) added the add_str_to_commit_list()
> function, which causes sparse to issue a "... not declared. Should it
> be static?" warning for that symbol.

Thanks for catching it!

> In order to suppress the warning, mark that function as static.

Isn't this closer to

Indeed, the function is only used in this one compilation
unit. Mark it static.

?  In other words, sparse's warning is accurate, and this is not about
trying to quiet a false positive but about addressing a true positive.

> Signed-off-by: Ramsay Jones 
> ---
>  ref-filter.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)

Thanks,
Jonathan


Re: [PATCH 2/2] parse-options: remove the unused parse_opt_commits() function

2018-03-01 Thread Jonathan Nieder
Ramsay Jones wrote:

> Commit fcfba37337 ('ref-filter: make "--contains <id>" less chatty if
> <id> is invalid', 2018-02-23), removed the last use of the callback
> function parse_opt_commits(). Remove this function declaration and
> definition, since it is now dead code.
>
> Signed-off-by: Ramsay Jones 
> ---
>  parse-options-cb.c | 16 
>  parse-options.h|  1 -
>  2 files changed, 17 deletions(-)

Reviewed-by: Jonathan Nieder 
Thanks.


Re: [RFC] Contributing to Git (on Windows)

2018-03-01 Thread Jonathan Nieder
Hi,

Derrick Stolee wrote:

> Now, we'd like to make that document publicly available. These steps are
> focused on a Windows user, so we propose putting them in the
> git-for-windows/git repo under CONTRIBUTING.md. I have a pull request open
> for feedback [1]. I'll read comments on that PR or in this thread.

Thanks!  I wonder if we can put this in the standard Documentation/
directory as well.  E.g. the Windows CONTRIBUTING.md could say
"See Documentation/Contributing-On-Windows.md" so that the bulk of the
text would not have to be git-for-windows specific.

[...]
> +++ b/CONTRIBUTING.md
> @@ -0,0 +1,401 @@
> +How to Contribute to Git for Windows
> +

Would it make sense for this to be checked in with LF instead of CRLF
line endings, so that clones use the user's chosen / platform native
line ending?  The .gitattributes file could include

/CONTRIBUTING.md text

so that line ending conversion takes place even if the user hasn't
enabled core.autocrlf.

[...]
> +The SDK uses a different credential manager, so you may still want to 
> use Visual Studio
> +or normal Git Bash for interacting with your remotes.  Alternatively, 
> use SSH rather
> +than HTTPS and avoid credential manager problems.

Where can I read more about the problems in question?

[...]
> +Many new contributors ask: What should I start working on?
> +
> +One way to win big with the maintainer of an open-source project is to look 
> at the
> +[issues page](https://github.com/git-for-windows/git/issues) and see if 
> there are any issues that
> +you can fix quickly, or if anything catches your eye.

You can also look at https://crbug.com/git for non-Windows-specific
issues.  And you can look at recent user questions
on the mailing list: https://public-inbox.org/git

[...]
> +If you are adding new functionality, you may need to create low-level tests 
> by creating
> +helper commands that test a very limited action. These commands are stored 
> in `t/helpers`.
> +When adding a helper, be sure to add a line to `t/Makefile` and to the 
> `.gitignore` for the
> +binary file you add. The Git community prefers functional tests using the 
> full `git`
> +executable, so be sure the test helper is required.

Maybe s/low-level/unit/?

[...]
> +Read [`t/README`](https://github.com/git/git/blob/master/t/README) for more 
> details.

Forgive my ignorance: does github flavored markdown allow relative
links?  (I.e., could this say [`t/README`](t/README)?)

[...]
> +You can also set certain environment variables to help test the performance 
> on different
> +repositories or with more repetitions. The full list is available in
> +[the `t/perf/README` 
> file](https://github.com/git/git/blob/master/t/perf/README),

Likewise.

[...]
> +Test Your Changes on Linux
> +--
> +
> +It can be important to work directly on the [core Git 
> codebase](https://github.com/git/git),
> +such as a recent commit into the `master` or `next` branch that has not been 
> incorporated
> +into Git for Windows. Also, it can help to run functional and performance 
> tests on your
> +code in Linux before submitting patches to the Linux-focused mailing list.

I'm surprised at this advice.  Does it actually come up?  When I was
on Mac I never worried about this, nor when I was on Windows.

I'm personally happy to review patches that haven't been tested on
Linux, though it's of course even nicer if the patch author mentions
what platforms they've tested on.

Maybe this can be reframed to refer specifically to cases where you've
modified some Linux platform-specific code (e.g. to add a new feature
to run-command.c)?

[...]
> +When preparing your patch, it is important to put yourself in the shoes of 
> the maintainer.

... and in the shoes of other users and developers working with Git down
the line who will interact with the patch later!

Actually, the maintainer is one of the least important audiences for a
commit message.  But maybe 'the maintainer' is a stand-in for 'the
project' here?

[...]
> +* Make sure the commits are signed off using `git commit (-s|--signoff)`. 
> This means that you
> +  testify to be allowed to contribute your changes, in particular that if 
> you did this on company
> +  time, said company approved to releasing your work as Open Source under 
> the GNU Public License v2.

Can this either point to or quote the relevant legal text from
Documentation/SubmittingPatches?  It feels dangerous (in the sense of,
potentially leading to misunderstandings and major future pain) to ask
people to sign off without them knowing exactly what that means.

The rest also looks nice.  Thanks for working on this.

Sincerely,
Jonathan


Re: Bug report: "Use of uninitialized value $_ in print"

2018-03-01 Thread Jonathan Nieder
Hi,

Sam Kuper wrote:

> First, background. I encountered a bug on Debian Stretch, using this
> git version:
>
> $ git --version
> git version 2.11.0
>
> The bug is that in the midst of running
>
> git -c interactive.diffFilter="git diff --word-diff --color" add --patch
>
> and after having answered "y" to some patches and "n" to others, git
> suddenly spewed the following output:
>
> Use of uninitialized value $_ in print at
> /usr/lib/git-core/git-add--interactive line 1371, <STDIN> line 74.
> Stage this hunk [y,n,q,a,d,/,K,j,J,g,e,?]? n
> Use of uninitialized value $_ in print at
> /usr/lib/git-core/git-add--interactive line 1371, <STDIN> line 75.
[...]

Strange.  The relevant line, for reference:

$ dpkg-deb --fsys-tarfile git_2.11.0-3+deb9u2_amd64.deb |
  tar Oxf - ./usr/lib/git-core/git-add--interactive |
  sed -n '1370,1372 p'

for (@{$hunk[$ix]{DISPLAY}}) {
print; <-- this one
}

This is a foreach loop, so it's supposed to have set $_ to each value
in the list @{$hunk[$ix]{DISPLAY}}.  So why is Perl considering it
uninitialized?

Is this reproducible for you?  Do you have more details about how I
can reproduce it?

What arch are you on?  What perl version do you use?  Can you report
this using "reportbug git"?

Thanks,
Jonathan


Re: [RFC] Contributing to Git (on Windows)

2018-03-03 Thread Jonathan Nieder
Hi Dscho,

Johannes Schindelin wrote:
>> Jonathan Nieder  writes:
>>> Dereck Stolee wrote:

>>>> +Test Your Changes on Linux
>>>> +--
>>>> +
>>>> +It can be important to work directly on the [core Git
>>>> codebase](https://github.com/git/git), +such as a recent commit into
>>>> the `master` or `next` branch that has not been incorporated +into
>>>> Git for Windows. Also, it can help to run functional and performance
>>>> tests on your +code in Linux before submitting patches to the
>>>> Linux-focused mailing list.
>>>
>>> I'm surprised at this advice.  Does it actually come up?
>
> Yes.
>
> I personally set up the automated builds on Windows, Linux and macOS for
> our team, and as a rule we always open an internal Pull Request on our
> topic branches as we develop them, and you would probably not believe the
> number of issues we caught before sending the patches to this list. Issues
> including
[nice list snipped]

Thanks for explaining.  I still am going to push back on the wording
here, and here is why:

 1. Approximately 1/2 of the differences you describe apply to Mac as
well as Windows.  The advice certainly does not apply on Mac.

You might object: Mac readers would not be reading this text!  But
that is not how humans work: if project documentation (e.g. the
CONTRIBUTING.md on GitHub!) says that the project is Linux-focused
and if you don't test on Linux then you might as well not bother,
then people are going to believe it.

 2. It is not unusual for Linux users to make portability mistakes that
are quickly pointed out on list.  If anything, the advice to test on
other platforms should apply to contributors on Linux even more.
This happens especially often to new contributors, who sometimes use
bashisms, etc that get quickly pointed out.

 3. I do not *want* Git to be a Linux-focused project; I want the code
to perform well on all popular platforms and for people not to be
afraid to make it so.

If the docs say that all we care about is Linux, then people are
likely to be too scared to do the necessary and valuable work of
making it work well on Mac, Windows, etc.  The actual mix of
contributors doesn't bear it out anyway: a large number of
contributors are already on Mac or Windows.

Fortunately this is pretty straightforward to fix in the doc: it could
say something like "to the multi-platform focused mailing list", for
example.

[...]
> To my chagrin, this idea of making most of the boring stuff (and I include
> formatting in that category, but I am probably special in that respect) as
> automatable, and as little of an issue for review as possible, leaving
> most brain cycles to work on the correctness of the patches instead, did
> not really receive a lot of traction on this mailing list.

Huh?  I'm confident that this is a pretty widely supported idea within
the Git project.

I get the feeling you must have misread something or misinterpreted a
different response.

[...]
> No, this advice comes straight from my personal observation that the
> reviewers on the Git mailing list are Linux-centric.

Hopefully the clarifications and suggestions higher in this message
help.  If they don't, then I'm nervous about our ability to understand
each other.

[...]
> Now, how reasonable do I think it is to ask those contributors to purchase
> an Apple machine to test their patches on macOS (you cannot just download
> an .iso nor would it be legal to run a macOS VM on anything but Apple
> hardware)? You probably guessed my answer: not at all.

Agreed, this is something that needs to be automated (and not via a
CONTRIBUTING.md file).  As a stopgap, having a section in the
contribution instructions about testing using Windows's Linux
subsystem is a valuable thing, and thanks for that; I never meant to
imply otherwise.

[...]
> On Fri, 2 Mar 2018, Junio C Hamano wrote:

>> In fact, portability issues in a patch originally written for a platform
>> is rather quickly discovered if the original platform is more minor than
>> the others, so while it is good advice to test your ware before you make
>> it public, singling out the portability issues may not add much value.
>> The fewer number of obvious bugs remaining, the fewer number of
>> iterations it would take for a series to get in a good shape.
[...]
> For you, Junio, however, the task *now* is to put yourself into the shoes
> of a Computer Science student in their 2nd year who wants to become an
> Open Source contributor and is a little afraid to talk directly to "the
> core committers", and quite scared what negative feedback they might
> receive.
>

Re: [RFC] Contributing to Git (on Windows)

2018-03-05 Thread Jonathan Nieder
Hi Dscho,

Johannes Schindelin wrote:
> On Sat, 3 Mar 2018, Jonathan Nieder wrote:

>> Hopefully the clarifications and suggestions higher in this message
>> help.  If they don't, then I'm nervous about our ability to understand
>> each other.
>
> Okay, let me state what I think the goal of this document should be:
>
>   To help Windows-based developers who want to contribute their first
>   patch to the Git project.
>
> Whatever makes it easier to contributors and is easily explained, should
> go in, in my opinion.
>
> Wishful thinking, and philosophical considerations, should probably stay
> out.

I think this conversation has gone way off the rails.

I certainly share some blame for that: in
https://public-inbox.org/git/20180303182723.ga76...@aiede.svl.corp.google.com/
I let my emotions get the better of me and let my bafflement show
instead of focusing on my gratitude for the context and clarifications
you were providing.  And I am grateful for those.

What went wrong is that I somehow turned it into a debate.  That's not
the point of a patch review.  After all, we have the same goals for
this document!  So I am happy to continue to work with Derrick Stolee
(and anyone else interested) on whatever improvements he would like to
pursue, but I am going to bow out of the arguing with you part, if
that's okay.

Jonathan


Re: [RFC] Contributing to Git (on Windows)

2018-03-05 Thread Jonathan Nieder
Derrick Stolee wrote:
> Dereck Stolee wrote:
>
> nit: s/Dereck/Derrick/ Is my outgoing email name misspelled, or do you have
> a misspelled contact info for me?

A manual typo.  Sorry about that.

[... a bunch snipped ...]
> I have a habit of being too loose in language around lawyer-speak. I should
> not have attempted to summarize what "Signed-off-by:" means and will use
> that helpful link for the description instead.

No worries.  I make that kind of mistake all the time but just thought
it worth pointing out.

BTW, thanks again for writing and submitting this document.  It can't
land soon enough. :)

Jonathan


Re: [RFC] Contributing to Git (on Windows)

2018-03-05 Thread Jonathan Nieder
Hi again,

Back on topic, some quick clarifications to tie up loose ends.

Also I want to thank you for continuing to push the project to work
better (especially to work better for newcomers).  We don't seem to
have a habit of saying often enough how important that goal is.  Of
course we'll disagree from time to time in minor ways about particular
aspects of how to change the development workflow, but the progress
you've already made (e.g. via tools like SubmitGit) is a huge deal.

[...]
Johannes Schindelin wrote:
> On Sat, 3 Mar 2018, Jonathan Nieder wrote:

>>  1. Approximately 1/2 of the differences you describe apply to Mac as
>> well as Windows.
>
> No. The executable bit, the native line endings, most of the issues I
> listed do not catch macOS-based developers off guard.

Yeah, my wording was really sloppy.

I meant that one of the differences you described half-applies to Mac
and the rest don't apply to Mac.  I should have proofread.

[...]
> Stolee agreed in the PR to mention alternatives to Hyper-V, such as
> VirtualBox, which would help macOS-based developers here.

I have no opinion about that (maybe it will make the text too long and
overwhelming, or maybe it will help people on balance).

[...]
> The Google gang (you & Junio included) uses Linux. Peff uses Linux. From
> what I can see Duy, Eric and Jake use Linux. That covers already the most
> active reviewers right there.

We are not as uniform as it may seem.  The Google gang all uses Linux
on workstations but some use Mac laptops as well.  We deal with user
reports all the time from all three popular platforms.

IIRC, Junio has a test setup that tests on Linux, FreeBSD, and
some other platforms.  IIRC Microsoft provides a way to run a Windows
VM for development purposes that he could use for testing in the same
way as he tests on FreeBSD if there are clear enough instructions for
doing it (hint, hint). :)

Of course it's even better if there is some public shared build/test
dashboard that we can all work together to at least keep green.

[...]
> So where is that formatter call that fixes your code?

There's "make style", and people still continue to work on improving
its configuration (thanks!).

[...]
> However, VSTS is free for teams up to five, meaning that any developer can
> register a project, and once I manage to get build definitions in shape, I
> can make them public and anybody can test their code on the major three
> platforms, in their personal VSTS account.

Thanks.  That sounds potentially useful.  (Though a shared dashboard
that we all keep green might be even more so.)

[...]
> So everything is a legal text.

Yeah.  In this context I should have instead said something like
"lawyer-reviewed text".

[...]
> Put another way: I indulged in veering off on a tangent. You know, like we
> do for fun here ;-)

Feel free to call me on it when my tangents are hurting, or when
they're helping for that matter.  That way I have training data to
improve my model of how to make a good tangent. :)

Sincerely,
Jonathan


Re: [PATCH v4 13/35] ls-refs: introduce ls-refs server command

2018-03-05 Thread Jonathan Nieder
Hi,

On Mon, Mar 05, 2018 at 10:21:55AM -0800, Brandon Williams wrote:
> On 03/02, Jeff King wrote:

>> It also accepts "refs/h*" to get "refs/heads" and "refs/hello".  I think
>> it's worth going for the most-restrictive thing to start with, since
>> that enables a lot more server operations without worrying about
>> breaking compatibility.
>
> And just to clarify, what do you see as being the most-restrictive case
> of patterns that we should use?

Peff, can you say a little more about the downsides of accepting
refs/h*?

IIRC the "git push" command already accepts such refspecs, so there's a
benefit to accepting them.  Reftable and packed-refs support such
queries about as efficiently as refs/heads/*.  For loose refs, readdir
doesn't provide a way to restrict which files you look at, but loose
refs are always slow anyway. :)

In other words, I see real benefits and I don't see much in the way of
costs, so I'm not seeing why not to support this.
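To make the question concrete, here is a rough sketch (in Python, purely
illustrative -- not git's implementation, and the ref names are made up)
of the difference between restricting advertisement to full prefixes
like "refs/heads/" versus the broader "refs/h*" form discussed above:

```python
# Hypothetical sketch: filtering a ref advertisement by
# client-supplied name prefixes.  "refs/h" matches both
# refs/heads/* and refs/hello; "refs/heads/" is more restrictive.
def filter_refs(refs, prefixes):
    """Return only the refs whose full name starts with any prefix."""
    if not prefixes:
        return dict(refs)  # no restriction: advertise everything
    return {name: oid for name, oid in refs.items()
            if any(name.startswith(p) for p in prefixes)}

refs = {
    "refs/heads/master": "1111",
    "refs/heads/topic":  "2222",
    "refs/hello":        "3333",
    "refs/tags/v1.0":    "4444",
}

print(sorted(filter_refs(refs, ["refs/h"])))       # includes refs/hello
print(sorted(filter_refs(refs, ["refs/heads/"])))  # branches only
```

Either form lets a server with reftable or packed-refs serve the query
from a contiguous range of the sorted ref list.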

Thanks,
Jonathan


Re: Contributor Summit planning

2018-03-05 Thread Jonathan Nieder
Lars Schneider wrote:

> Thanks for starting this. I would like to propose the following topics:

Cool!  Do you mind starting threads for these so people who aren't there
can provide input into the discussion, too?  In other words, I'm
imagining

 1. Thread starts on mailing list

 2. Contributor summit: in-person presentation, discussion, etc lead to
people having better ideas

 3. On-list thread goes really well as a result of aforementioned
in-person discussion

Quick feedback:

> - hooks: Discuss a proposal for multiple local Git hooks of the same type.

I'd be happy to weigh in on a mailing list thread about this.  It's
also related to
https://public-inbox.org/git/20171002234517.gv19...@aiede.mtv.corp.google.com/
which is an interest of mine.

> - error reporting: Git is distributed and therefore lots of errors are only
>   reported locally. That makes it hard for administrators in larger
>   companies to see trouble. Would it make sense to add a config option that
>   would push recent errors along with "git push" to the server?

I'm interested in instrumentation but worried about the privacy
ramifications of this particular proposal.  I'd be happy to see some
built-in instrumentation hooks (or even a standard instrumentation
approach, if the mailing list comes up with a good one that respects
privacy).

> - fuzzing: Would it make sense to register Git to Google's OSS fuzzing
>   program https://github.com/google/oss-fuzz ?

Of course!

Alongside the obvious security benefit, there is money available to
support someone working on this:
https://opensource.googleblog.com/2017/05/oss-fuzz-five-months-later-and.html
https://www.google.com/about/appsecurity/patch-rewards/ clarifies that
the reward goes to the contributor, so you don't even necessarily have
to share your reward with the Git project. ;-)

Thanks,
Jonathan


Re: [PATCH v3 12/35] serve: introduce git-serve

2018-03-05 Thread Jonathan Nieder
Hi,

Jeff King wrote:

> I agree that would be a lot more pleasant for adding protocol features.
> But I just worry that the stateful protocols get a lot less efficient.
> I'm having trouble coming up with an easy reproduction, but my
> recollection is that http has some nasty corner cases, because each
> round of "have" lines sent to the server has to summarize the previous
> conversation. So you can get a case where the client's requests keep
> getting bigger and bigger during the negotiation (and eventually getting
> large enough to cause problems).

That's not so much a corner case as just how negotiation works over
http.
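A toy simulation (not git's wire code; the round sizes are invented)
of the growth you describe: over stateless HTTP each negotiation round
is a fresh request, so it must restate every "have" from the rounds
before it, while a stateful connection only sends the new ones:

```python
# Toy model of negotiation request sizes, in "have" lines per round.
def stateful_rounds(haves_per_round):
    """Each round only sends that round's new 'have' lines."""
    return [len(new) for new in haves_per_round]

def stateless_rounds(haves_per_round):
    """Each round must resend all previous haves plus the new ones."""
    sizes, total = [], 0
    for new in haves_per_round:
        total += len(new)
        sizes.append(total)
    return sizes

# Four rounds of 32 new haves each:
rounds = [["h%d" % i for i in range(32)] for _ in range(4)]
print(stateful_rounds(rounds))   # constant per round
print(stateless_rounds(rounds))  # grows linearly with each round
```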

We want to do better (e.g. see [1]) but that's a bigger change than
the initial protocol v2.

As Brandon explained it to me, we really do want to use stateless-rpc
semantics by default, since that's just better for maintainability.
Instead of having two protocols, one that is sane and one that
struggles to hoist that into stateless-rpc, there would be one
stateless baseline plus capabilities to make use of state.

For example, it would be nice to have a capability to remember
negotiation state between rounds, to get around exactly the problem
you're describing when using a stateful protocol.  Stateless backends
would just not advertise such a capability.  But doing that without [1]
still sort of feels like a cop-out.  If we can get a reasonable
baseline using ideas like [1] and then have a capability to keep
server-side state as icing on the cake instead of having a negotiation
process that only really makes sense when you have server-side state,
then that would be even better.

> If anything, I wish we could push the http protocol in a more stateful
> direction with something like websockets. But I suspect that's an
> unrealistic dream, just because not everybody's http setup (proxies,
> etc) will be able to handle that.

Agreed.  I think we have to continue to deal with stateless-rpc
semantics, at least for the near future.

Jonathan

[1] 
https://public-inbox.org/git/20180227054638.gb65...@aiede.svl.corp.google.com/


Re: Bug report: "Use of uninitialized value $_ in print"

2018-03-05 Thread Jonathan Nieder
Jeff King wrote:
> On Fri, Mar 02, 2018 at 09:15:44AM -0800, Junio C Hamano wrote:
>> Jeff King  writes:

>>> That's probably a reasonable sanity check, but I think we need to abort
>>> and not just have a too-small DISPLAY array. Because later code like the
>>> hunk-splitting is going to assume that there's a 1:1 line
>>> correspondence. We definitely don't want to end up in a situation where
>>> we show one thing but apply another.
>>
>> Yes, agreed completely.
>
> Let's add this sanity check while we're thinking about it. Here's a
> series.
>
>   [1/2]: t3701: add a test for interactive.diffFilter
>   [2/2]: add--interactive: detect bogus diffFilter output
>
>  git-add--interactive.perl  |  8 
>  t/t3701-add-interactive.sh | 20 ++++
>  2 files changed, 28 insertions(+)

With or without the tweak Ævar Arnfjörð Bjarmason suggested,
Reviewed-by: Jonathan Nieder 

Thanks.  It's probably also worth adding Sam's reported-by to patch 2/2:
Reported-by: Sam Kuper 


Re: man gittutorial: patch proposal

2018-03-05 Thread Jonathan Nieder
Hi,

kalle wrote:

> see attachment.

Thanks for writing.  A few questions:

 1. Can you say a little more about the rationale for this change?
E.g. is the current formatting sending a confusing message?  Is the
current formatting intending to use '' as quotes and using italics
instead?  If so, should this use "" to fix it?

 2. May we forge your sign-off? See the section "Certify your work"
in Documentation/SubmittingPatches for more about what this means.

Thanks,
Jonathan


RFC: Another proposed hash function transition plan

2017-03-03 Thread Jonathan Nieder
Hi,

This past week we came up with this idea for what a transition to a new
hash function for Git would look like.  I'd be interested in your
thoughts (especially if you can make them as comments on the document,
which makes it easier to address them and update the document).

This document is still in flux but I thought it best to send it out
early to start getting feedback.

We tried to incorporate some thoughts from the thread
http://public-inbox.org/git/20170223164306.spg2avxzukkgg...@kitenet.net
but it is a little long so it is easy to imagine we've missed
some things already discussed there.

You can use the doc URL

 https://goo.gl/gh2Mzc

to view the latest version and comment.

Thoughts welcome, as always.

Git hash function transition

Status: Draft
Last Updated: 2017-03-03

Objective
-
Migrate Git from SHA-1 to a stronger hash function.

Background
--
The Git version control system can be thought of as a content
addressable filesystem. It uses the SHA-1 hash function to name
content. For example, files, trees, commits are referred to by hash
values unlike in other traditional version control systems where files
or versions are referred to via sequential numbers. The use of a hash
function to address its content delivers a few advantages:

* Integrity checking is easy. Bit flips, for example, are easily
  detected, as the hash of corrupted content does not match its name.
* Lookup of objects is fast.

Using a cryptographically secure hash function brings additional advantages:

* Object names can be signed and third parties can trust the hash to
  address the signed object and all objects it references.
* Communication using Git protocol and out of band communication
  methods have a short reliable string that can be used to reliably
  address stored content.
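As a concrete illustration of this content addressing (using Python's
hashlib rather than git itself): git names a blob by hashing a
"blob <size>\0" header followed by the content, so any bit flip in the
content changes the object's name.

```python
# Illustration of git's blob naming and the integrity check it gives.
import hashlib

def blob_name(content: bytes) -> str:
    """Compute the SHA-1 object name git assigns to blob content."""
    header = b"blob %d\0" % len(content)
    return hashlib.sha1(header + content).hexdigest()

# The well-known name of the empty blob:
print(blob_name(b""))  # e69de29bb2d1d6434b8b29ae775ad8c2e48c5391

# Integrity checking: corrupted content no longer matches its name.
good = blob_name(b"hello\n")
bad = blob_name(b"hellO\n")
assert good != bad
```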

Over time some flaws in SHA-1 have been discovered by security
researchers. https://shattered.io demonstrated a practical SHA-1 hash
collision. As a result, SHA-1 cannot be considered cryptographically
secure any more. This impacts the communication of hash values because
we cannot trust that a given hash value represents the known good
version of content that the speaker intended.

SHA-1 still possesses the other properties such as fast object lookup
and safe error checking, but other hash functions are equally suitable
that are believed to be cryptographically secure.

Goals
-
1. The transition to SHA256 can be done one local repository at a time.
   a. Requiring no action by any other party.
   b. A SHA256 repository can communicate with SHA-1 Git servers and
  clients (push/fetch).
   c. Users can use SHA-1 and SHA256 identifiers for objects
  interchangeably.
   d. New signed objects make use of a stronger hash function than
  SHA-1 for their security guarantees.
2. Allow a complete transition away from SHA-1.
   a. Local metadata for SHA-1 compatibility can be dropped in a
  repository if compatibility with SHA-1 is no longer needed.
3. Maintainability throughout the process.
   a. The object format is kept simple and consistent.
   b. Creation of a generalized repository conversion tool.

Non-Goals
-
1. Add SHA256 support to Git protocol. This is valuable and the
   logical next step but it is out of scope for this initial design.
2. Transparently improving the security of existing SHA-1 signed
   objects.
3. Intermixing objects using multiple hash functions in a single
   repository.
4. Taking the opportunity to fix other bugs in git's formats and
   protocols.
5. Shallow clones and fetches into a SHA256 repository. (This will
   change when we add SHA256 support to Git protocol.)
6. Skip fetching some submodules of a project into a SHA256
   repository. (This also depends on SHA256 support in Git protocol.)

Overview

We introduce a new repository format extension `sha256`. Repositories
with this extension enabled use SHA256 instead of SHA-1 to name their
objects. This affects both object names and object content --- both
the names of objects and all references to other objects within an
object are switched to the new hash function.

sha256 repositories cannot be read by older versions of Git.

Alongside the packfile, a sha256 repository stores a bidirectional
between sha256 and sha1 object names. The mapping is generated locally
and can be verified using "git fsck". Object lookups use this mapping
to allow naming objects using either their sha1 or sha256 names
interchangeably.

"git cat-file" and "git hash-object" gain options to display a sha256
object in its sha1 form and write a sha256 object given its sha1 form.
This requires all objects referenced by that object to be present in
the object database so that they can be named using the appropriate
name (using the bidirectional hash mapping).

Fetches from a SHA-1 based server convert the fetched objects into
sha256 form and record the mapping in the bidirectional mapping table
(see below for details).

Re: RFC: Another proposed hash function transition plan

2017-03-06 Thread Jonathan Nieder
David Lang wrote:

>> Translation table
>> ~
>> A fast bidirectional mapping between sha1-names and sha256-names of
>> all local objects in the repository is kept on disk. The exact format
>> of that mapping is to be determined.
>>
>> All operations that make new objects (e.g., "git commit") add the new
>> objects to the translation table.
>
> This seems like a rather nontrival thing to design. It will need to
> hold millions of mappings, and be quickly searchable from either
> direction (sha1->new and new->sha1) while still be fairly fast to
> insert new records into.

I am currently thinking of using LevelDB, since it has the advantages of
being simple, already existing, and having already been ported to Java
(allowing JGit to read and write the same format).

If that doesn't work, we'd try some other key-value store like Samba's
tdb or Kyoto Cabinet.
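Whatever the backing store, the interface it has to serve is small.  A
minimal in-memory sketch (the real table would be on disk, and the
truncated object names here are made up for readability):

```python
# Minimal sketch of the translation table's two lookup directions.
# A real implementation would back this with an on-disk key-value
# store (LevelDB, tdb, ...) rather than Python dicts.
class TranslationTable:
    def __init__(self):
        self._to_new = {}   # sha1   -> sha256
        self._to_old = {}   # sha256 -> sha1

    def add(self, sha1, sha256):
        """Record a new object's pair of names (e.g. at commit time)."""
        self._to_new[sha1] = sha256
        self._to_old[sha256] = sha1

    def sha256_for(self, sha1):
        return self._to_new[sha1]

    def sha1_for(self, sha256):
        return self._to_old[sha256]

table = TranslationTable()
table.add("d670460b", "4a5eb937")   # hypothetical truncated names
assert table.sha256_for("d670460b") == "4a5eb937"
assert table.sha1_for("4a5eb937") == "d670460b"
```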

Jonathan


RFC v3: Another proposed hash function transition plan

2017-03-06 Thread Jonathan Nieder
Linus Torvalds wrote:
> On Fri, Mar 3, 2017 at 5:12 PM, Jonathan Nieder  wrote:

>> This document is still in flux but I thought it best to send it out
>> early to start getting feedback.
>
> This actually looks very reasonable if you can implement it cleanly
> enough.

Thanks for the kind words on what had quite a few flaws still.  Here's
a new draft.  I think the next version will be a patch against
Documentation/technical/.

As before, comments welcome, both here and inline at

  https://goo.gl/gh2Mzc

Changes since v2:

Use SHA3-256 instead of SHA2 (thanks, Linus and brian m.
carlson).[1][2]

Make sha3-based signatures a separate field, avoiding the need for
"hash" and "nohash" fields (thanks to peff[3]).

Add a sorting phase to fetch (thanks to Junio for noticing the need
for this).

Omit blobs from the topological sort during fetch (thanks to peff).

Discuss alternates, git notes, and git servers in the caveats section
(thanks to Junio Hamano, brian m. carlson[4], and Shawn Pearce).

Clarify language throughout (thanks to various commenters, especially
Junio).

Sincerely,
Jonathan

Git hash function transition

Status: Draft
Last Updated: 2017-03-06

Objective
-
Migrate Git from SHA-1 to a stronger hash function.

Background
--
At its core, the Git version control system is a content addressable
filesystem. It uses the SHA-1 hash function to name content. For
example, files, directories, and revisions are referred to by hash
values unlike in other traditional version control systems where files
or versions are referred to via sequential numbers. The use of a hash
function to address its content delivers a few advantages:

* Integrity checking is easy. Bit flips, for example, are easily
  detected, as the hash of corrupted content does not match its name.
* Lookup of objects is fast.

Using a cryptographically secure hash function brings additional
advantages:

* Object names can be signed and third parties can trust the hash to
  address the signed object and all objects it references.
* Communication using Git protocol and out of band communication
  methods have a short reliable string that can be used to reliably
  address stored content.

Over time some flaws in SHA-1 have been discovered by security
researchers. https://shattered.io demonstrated a practical SHA-1 hash
collision. As a result, SHA-1 cannot be considered cryptographically
secure any more. This impacts the communication of hash values because
we cannot trust that a given hash value represents the known good
version of content that the speaker intended.

SHA-1 still possesses the other properties such as fast object lookup
and safe error checking, but other hash functions are equally suitable
that are believed to be cryptographically secure.

Goals
-
1. The transition to SHA3-256 can be done one local repository at a time.
   a. Requiring no action by any other party.
   b. A SHA3-256 repository can communicate with SHA-1 Git servers
  (push/fetch).
   c. Users can use SHA-1 and SHA3-256 identifiers for objects
  interchangeably.
   d. New signed objects make use of a stronger hash function than
  SHA-1 for their security guarantees.
2. Allow a complete transition away from SHA-1.
   a. Local metadata for SHA-1 compatibility can be removed from a
  repository if compatibility with SHA-1 is no longer needed.
3. Maintainability throughout the process.
   a. The object format is kept simple and consistent.
   b. Creation of a generalized repository conversion tool.

Non-Goals
-
1. Add SHA3-256 support to Git protocol. This is valuable and the
   logical next step but it is out of scope for this initial design.
2. Transparently improving the security of existing SHA-1 signed
   objects.
3. Intermixing objects using multiple hash functions in a single
   repository.
4. Taking the opportunity to fix other bugs in git's formats and
   protocols.
5. Shallow clones and fetches into a SHA3-256 repository. (This will
   change when we add SHA3-256 support to Git protocol.)
6. Skip fetching some submodules of a project into a SHA3-256
   repository. (This also depends on SHA3-256 support in Git
   protocol.)

Overview

We introduce a new repository format extension `sha3`. Repositories
with this extension enabled use SHA3-256 instead of SHA-1 to name
their objects. This affects both object names and object content ---
both the names of objects and all references to other objects within
an object are switched to the new hash function.

sha3 repositories cannot be read by older versions of Git.

Alongside the packfile, a sha3 repository stores a bidirectional
mapping between sha3 and sha1 object names. The mapping is generated
locally and can be verified using "git fsck". Object lookups use this
mapping to allow naming objects using either their sha1 or sha3 names
interchangeably.

"git cat-file

Re: RFC v3: Another proposed hash function transition plan

2017-03-09 Thread Jonathan Nieder
Hi,

Shawn Pearce wrote:
> On Mon, Mar 6, 2017 at 4:17 PM, Jonathan Nieder  wrote:

>> Alongside the packfile, a sha3 repository stores a bidirectional
>> mapping between sha3 and sha1 object names. The mapping is generated
>> locally and can be verified using "git fsck". Object lookups use this
>> mapping to allow naming objects using either their sha1 and sha3 names
>> interchangeably.
>
> I saw some discussion about using LevelDB for this mapping table. I
> think any existing database may be overkill.
>
> For packs, you may be able to simplify by having only one file
> (pack-*.msha1) that maps SHA-1 to pack offset; idx v2. The CRC32 table
> in v2 is unnecessary, but you need the 64 bit offset support.
>
> SHA-1 to SHA-3: lookup SHA-1 in .msha1, reverse .idx, find offset to
> read the SHA-3.
> SHA-3 to SHA-1: lookup SHA-3 in .idx, and reverse the .msha1 file to
> translate offset to SHA-1.

Thanks for this suggestion.  I was initially vaguely nervous about
lookup times in an idx-style file, but as you say, object reads from a
packfile already have to deal with this kind of lookup and work fine.
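For illustration, your two-index scheme might be sketched like this
(hypothetical data and truncated names, not the real .idx/.msha1 file
formats): one table sorted by sha1 and one sorted by sha3, each mapping
a name to the same pack offset, with the "reverse .idx" step resolving
an offset back to a sha3 name.

```python
# Sketch of sha1 -> sha3 translation via shared pack offsets:
# binary-search the sha1 table for the offset, then find the sha3
# entry carrying that offset.
import bisect

sha1_idx = sorted([("1a", 400), ("2b", 100), ("3c", 250)])  # (sha1, offset)
sha3_idx = sorted([("aa", 100), ("bb", 250), ("cc", 400)])  # (sha3, offset)

def lookup(idx, name):
    """Binary-search a sorted (name, offset) table; None if absent."""
    i = bisect.bisect_left(idx, (name,))
    if i < len(idx) and idx[i][0] == name:
        return idx[i][1]
    return None

def sha1_to_sha3(sha1):
    off = lookup(sha1_idx, sha1)
    # The "reverse .idx" step: offset -> sha3.  A real implementation
    # would keep a precomputed reverse index instead of scanning.
    for name, o in sha3_idx:
        if o == off:
            return name
    return None

print(sha1_to_sha3("2b"))  # the sha3 name stored at offset 100
```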

> For loose objects, the loose object directories should have only
> O(4000) entries before auto gc is strongly encouraging
> packing/pruning. With 256 shards, each given directory has O(16) loose
> objects in it. When writing a SHA-3 loose object, Git could also
> append a line "$sha3 $sha1\n" to objects/${first_byte}/sha1, which
> GC/prune rewrites to remove entries. With O(16) objects in a
> directory, these files should only have O(16) entries in them.

Insertion time is what worries me.  When writing a small number of
objects using a command like "git commit", I don't want to have to
regenerate an entire idx file.  I don't want to move the pain to
O(loose objects) work at read time, either --- some people disable
auto gc, and others have a large number of loose objects due to gc
ejecting unreachable objects.

But some kind of simplification along these lines should be possible.
I'll experiment.

Jonathan


Re: [RFC PATCH] help: add optional instructions for reporting bugs

2017-03-10 Thread Jonathan Nieder
Hi,

Stefan Beller wrote:

> When reporting bugs, users will usually look at the output of
> 'git --version' at one point to write a quality bug report.
> So that is a good spot to provide additional information to the user
> about e.g. the organizational quirks of how to report a bug.
>
> As the output of 'git --version' is parsed by scripts as well,
> we only want to present this information to users, which is why
> we only give the output to TTYs.

Interesting thought.  This might also be a good place to point users
to "git version --build-options" to get more detailed build
information.

The existence of that option also suggests you're on the right track
about 'git version' being the command for this.

> Git is distributed in various ways by various organizations. The Git
> community prefers to have bugs reported on the mailing list, whereas
> other organizations may rather want to have filed a bug in a bug tracker
> or such.  The point of contact is different by organization as well.

It's tempting to put the custom information in --build-options --- e.g.

 $ git version
 git version 2.12.0.190.g6e60aba09d.dirty
 hint: use "git version --build-options" for more detail
 hint: and bug reporting instructions
 $
 $ git version --build-options
 git version 2.12.0.190.g6e60aba09d.dirty
 sizeof-long: 8
 reporting-bugs: see REPORTING BUGS section in "git help git"

Using 'hint:' would put this in the advice category. Usually those are
possible to disable using advice.* configuration (e.g.
advice.versionBuildOptions) for less noisy output.

[...]
> --- a/help.c
> +++ b/help.c
> @@ -9,6 +9,12 @@
>  #include "version.h"
>  #include "refs.h"
>  
> +#ifdef GIT_BUG_REPORT_HELP

If doing this for real, there would be a knob in the Makefile for
setting the bug report string.

[...]
> @@ -435,6 +441,8 @@ int cmd_version(int argc, const char **argv, const char 
> *prefix)
>   /* NEEDSWORK: also save and output GIT-BUILD_OPTIONS? */
>   }
>   }
> + if (isatty(1))
> + puts(git_bug_reporting_string);

Should this go to stderr?  E.g. advise() writes to stderr.

Some scripts may run "git version" in output that is written to stdout
and meant to be copy/pasted.  Is there a way for such scripts to
suppress this output?  Should they use -c
advice.versionBuildOptions=0, does 'git version' need an option to
suppress the human-oriented output, should they use 2>/dev/null, or is
this just something that people have to live with?
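One direction that sidesteps the copy/paste problem would be to send the
hint to stderr and only when stderr is a terminal.  A hypothetical
sketch (in Python for brevity; the version string and hint wording are
illustrative, not git's):

```python
# Hypothetical sketch: machine-parsable output on stdout, the
# human-oriented hint on stderr and only when stderr is a terminal,
# so scripts capturing stdout never see it.
import io
import sys

def print_version(stdout=sys.stdout, stderr=sys.stderr):
    print("git version 2.12.0", file=stdout)
    if stderr.isatty():
        print('hint: use "git version --build-options" '
              "for bug reporting instructions", file=stderr)

# A captured stream is not a tty, so the hint is suppressed:
out, err = io.StringIO(), io.StringIO()
print_version(out, err)
assert out.getvalue() == "git version 2.12.0\n"
assert err.getvalue() == ""
```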

I'm still on the fence about whether this is a good idea. At least
having custom bug instructions in --build-options sounds like a very
nice thing, but it's not obvious to me how people would learn about
that option.

Thanks and hope that helps,
Jonathan


Re: RFC v3: Another proposed hash function transition plan

2017-03-10 Thread Jonathan Nieder
Jeff King wrote:
> On Thu, Mar 09, 2017 at 12:24:08PM -0800, Jonathan Nieder wrote:

>>> SHA-1 to SHA-3: lookup SHA-1 in .msha1, reverse .idx, find offset to
>>> read the SHA-3.
>>> SHA-3 to SHA-1: lookup SHA-3 in .idx, and reverse the .msha1 file to
>>> translate offset to SHA-1.
>>
>> Thanks for this suggestion.  I was initially vaguely nervous about
>> lookup times in an idx-style file, but as you say, object reads from a
>> packfile already have to deal with this kind of lookup and work fine.
>
> Not exactly. The "reverse .idx" step has to build the reverse mapping on
> the fly, and it's non-trivial.

Sure.  To be clear, I was handwaving over that since adding an on-disk
reverse .idx is a relatively small change.

[...]
> So I think it's solvable, but I suspect we would want an extension to
> the .idx format to store the mapping array, in order to keep it log-n.

i.e., this.

The loose object side is the more worrying bit, since we currently don't
have any practical bound on the number of loose objects.

One way to deal with that is to disallow loose objects completely.
Use packfiles for new objects, batching the objects produced by a
single process into a single packfile.  Teach "git gc --auto" a
behavior similar to Martin Fick's "git exproll" to combine packfiles
between full gcs to maintain reasonable performance.  For unreachable
objects, instead of using loose objects, use "unreachable garbage"
packs explicitly labeled as such, with similar semantics to what
JGit's DfsRepository backend uses (described in the discussion at
https://git.eclipse.org/r/89455).

That's a direction that I want in the long term anyway.  I was hoping
not to couple such changes with the hash transition but it might be
one of the simpler ways to go.

Jonathan


Re: [GSoC] Move ~/.git-credential-cache to ~/.cache/git

2017-03-10 Thread Jonathan Nieder
(+cc: npostavs)
Hi Devin,

Devin Lehmacher wrote:

> I started working on this microproject and am not quite sure what is
> necessary for backwards compatibility. Since the socket is recreated
> whenever the credential daemon exits backwards compatibility
> shouldn’t really be a concern with regard to where the socket is
> located in the filesystem.
>
> However, contrib/persistent-https depends on the socket being at
> ~/.git-credential-cache/socket, so changing the default location
> would break this. However, if we need to keep the socket at that
> location for cases like this I don’t understand how this change
> would be helpful in any way.

That's a good question.  If I'm reading contrib/persistent-https/
correctly, it uses the same directory but doesn't rely on the socket
there, so it should not be a problem.

However, that reminded me to search for other tools that might rely on
the socket.  Using
https://codesearch.debian.net/search?q=%5C.git-credential-cache, I
find that magit does rely on the socket path.

 $ git clone https://github.com/magit/magit
 $ git log -S.git-credential-cache
 commit 0f30dfbb0075ac2e99b65a2c7fac360197a989c1
 Author: Noam Postavsky 
 Date:   Sat Oct 24 15:57:54 2015 -0400

Start credential daemon on magit-credential-hook

If we let git start the daemon, Emacs will send a SIGHUP when git
finishes and closes the pty, killing the daemon.  Hence the need to have
our own daemon running first.

Cc-ing Noam to figure out what a safe transition will look like.

Thanks for noticing,
Jonathan


Re: RFC: Another proposed hash function transition plan

2017-03-13 Thread Jonathan Nieder
Hi,

The Keccak Team wrote:

> We have read your transition plan to move away from SHA-1 and noticed
> your intent to use SHA3-256 as the new hash function in the new Git
> repository format and protocol. Although this is a valid choice, we
> think that the new SHA-3 standard proposes alternatives that may also be
> interesting for your use cases.  As designers of the Keccak function
> family, we thought we could jump in the mail thread and present these
> alternatives.

I indeed had some reservations about SHA3-256's performance.  The main
hash function we had in mind to compare against is blake2bp-256.  This
overview of other functions to compare against should end up being
very helpful.

Thanks for this.  When I have more questions (which I most likely
will) I'll keep you posted.

Sincerely,
Jonathan


Re: [RFC PATCH] Move SHA-1 implementation selection into a header file

2017-03-14 Thread Jonathan Nieder
Hi,

brian m. carlson wrote:

[...]
> --- a/cache.h
> +++ b/cache.h
> @@ -10,8 +10,8 @@
>  #include "trace.h"
>  #include "string-list.h"
>  #include "pack-revindex.h"
> +#include "hash.h"
>  
> -#include SHA1_HEADER

For what it's worth, the bazel build tool doesn't like this
'#include SHA1_HEADER' either.  Your fix looks like a straightforward
fix and we never encouraged directly customizing SHA1_HEADER.

The other approaches discussed may also work but they don't add
anything for my application (nor yours, I'd think).  Conditional
#includes are a pretty normal thing so I am fine with this more
straightforward change.  So

Reviewed-by: Jonathan Nieder 

That said, if someone is excited about one of the other approaches
then I don't object to them.

Thanks,
Jonathan


Re: [RFC PATCH] Move SHA-1 implementation selection into a header file

2017-03-14 Thread Jonathan Nieder
Junio C Hamano wrote:

> [jc: make BLK_SHA1 the fallback default as discussed on list,
> e.g. <20170314201424.vccij5z2ortq4...@sigill.intra.peff.net>; also
> remove SHA1_HEADER and SHA1_HEADER_SQ that are no longer used].
>
> Signed-off-by: brian m. carlson 
> Signed-off-by: Junio C Hamano 

Reviewed-by: Jonathan Nieder 

Thanks for taking care of it.


Re: [RFC PATCH] Move SHA-1 implementation selection into a header file

2017-03-14 Thread Jonathan Nieder
Jeff King wrote:
> On Tue, Mar 14, 2017 at 11:41:26AM -0700, Jonathan Nieder wrote:
>> brian m. carlson wrote:

>>> --- a/cache.h
>>> +++ b/cache.h
>>> @@ -10,8 +10,8 @@
>>>  #include "trace.h"
>>>  #include "string-list.h"
>>>  #include "pack-revindex.h"
>>> +#include "hash.h"
>>>
>>> -#include SHA1_HEADER
>>
>> For what it's worth, the bazel build tool doesn't like this
>> '#include SHA1_HEADER' either.  Your fix looks like a straightforward
>> fix and we never encouraged directly customizing SHA1_HEADER.
>
> Hmm. I don't know how you're using bazel with git, but if it is doing
> something like generating header dependencies, would that mean that it
> potentially picks up the wrong dependency with brian's patch?

I believe it picks up all options as dependencies, which is good
enough for me.

I have a custom BUILD file to build git with bazel.  I like the
reliable dependencies it generates (unless you do heavy contortions,
files aren't available to the build commands unless the dependency is
declared) and fast, parallel build with simple progress output.  But
keeping it up to date with every patch that changes the Makefile is
not something I would wish on the git project.

One of these days I'd like to try making a tool to automatically
generate the BUILD file, like contrib/buildsystems generates a
Visual C project.

Regards,
Jonathan


Re: [PATCH v2] short status: improve reporting for submodule changes

2017-03-16 Thread Jonathan Nieder
Hi,

Yay, I like the change this makes.  So I'll nitpick in the hope that
that makes the patch more likely to stick.

Stefan Beller wrote:

> While we already have a porcelain2 layer for git-status, that is accurate
> for submodules, users still like the way they are are used to of
> 'status -s'.

I had to read this a few times to understand it.  Maybe explaining it
from the user's point of view would work:

If I add an untracked file to a submodule or modify a tracked file,
currently "git status --short" treats the change in the same way as
changes to the current HEAD of the submodule:

$ git clone --quiet --recurse-submodules https://gerrit.googlesource.com/gerrit
$ echo hello >gerrit/plugins/replication/stray-file
$ sed -i -e 's/.*//' gerrit/plugins/replication/.mailmap
$ git -C gerrit status --short
 M plugins/replication

This is by analogy with ordinary files, where "M" represents a change
that has not been added yet to the index.  But this change cannot be
added to the index without entering the submodule, "git add"-ing it,
and running "git commit", so the analogy is counterproductive.

> As a submodule has more state than a file potentially, we need more letters
> indicating these states, we'll go with lower 'm' and single '?' for now.
>
> When there the recorded commit doesn't differ from the submodule HEAD
> (which we still want to treat as "M", because that can be dealt with
> in the superproject), we can have different types of change in the
> submodule. The lower case 'm' indicates that there are changes to tracked
> files in the submodule. It signals that the --recurse-submodules option
> is needed for e.g. adding or committing these changes. The " ?" also
> differentiates an untracked file in the submodule from a regular
> untracked file.

This could say

Introduce new status letters " ?" and " m" for this.  These are similar
to the existing "??" and " M" but mean that the submodule (not the
parent project) has new untracked files and modified files, respectively.
The user can use "git add" and "git commit" from within the submodule to
add them.

Changes to the submodule's HEAD commit can be recorded in the index with
a plain "git add -u" and are shown with " M", like today.

To avoid excessive clutter, show at most one of " ?", " m", and " M" for
the submodule.  They represent increasing levels of change --- the last
one that applies is shown (e.g., " m" if there are both modified files
and untracked files in the submodule, or " M" if the submodule's HEAD
has been modified and it has untracked files).

> While making these changes, we need to make sure to not break porcelain
> level 1, which uses the same code as the short printing function.

While making these changes, we need to make sure to not break porcelain
level 1, which shares code with "status --short".  We only change
"git status --short".

Non-short "git status" and "git status --porcelain=2" already handle
these cases by showing more detail:

$ git -C gerrit status --porcelain=2
1 .M S.MU 16 16 16 305c864db28eb0c77c8499bc04c87de3f849cf3c 305c864db28eb0c77c8499bc04c87de3f849cf3c plugins/replication
$ git -C gerrit status
[...]
modified:   plugins/replication (modified content, untracked content)

Scripts caring about these distinctions should use --porcelain=2.

> Signed-off-by: Stefan Beller 

Having written that, a few questions:

Is it important to avoid clutter by showing the submodule only once?
What would you think of showing whatever subset of those three
statuses apply to a given submodule as separate lines instead, to
match the information that long-form "git status" shows?

How should a new untracked file in a submodule of a submodule be
shown?

[...]
> --- a/Documentation/git-status.txt
> +++ b/Documentation/git-status.txt
> @@ -181,6 +181,12 @@ in which case `XY` are `!!`.
>  !   !ignored
>  -
>  
> +Submodules have more state to it, such that it reports instead:

Language nits: s/ to it//; s/, such that it reports instead/ and instead report/

> + Mthe submodule has a different HEAD than recorded

than recorded in the index?

> + mthe submodule has modified content
> + ?the submodule has untracked files
[...]
> --- a/t/t7506-status-submodule.sh
> +++ b/t/t7506-status-submodule.sh

\o/

> @@ -50,6 +50,15 @@ test_expect_success 'status with modified file in submodule (porcelain)' '
>   EOF
>  '
>  
> +test_expect_success 'status with modified file in submodule (short)' '
> + (cd sub && git reset --hard) &&

not about this pa

Re: [PATCH] wt-status.c: improve readability for wt_shortstatus_print

2017-03-16 Thread Jonathan Nieder
Stefan Beller wrote:

> Subject: wt-status.c: improve readability for wt_shortstatus_print

Maybe:

wt-status: simplify by using for_each_string_list_item

Improve readability by using the for_each_string_list_item helper
instead of manually iterating with an integer counter.

> Signed-off-by: Stefan Beller 
> ---
>  wt-status.c | 20 +---
>  1 file changed, 5 insertions(+), 15 deletions(-)

I like the diffstat. :)

With or without a clarified commit message,
Reviewed-by: Jonathan Nieder 


Re: [PATCH v2] short status: improve reporting for submodule changes

2017-03-16 Thread Jonathan Nieder
Stefan Beller wrote:
> Jonathan Nieder wrote:

>> Is it important to avoid clutter by showing the submodule only once?
>> What would you think of showing whatever subset of those three
>> statuses apply to a given submodule as separate lines instead, to
>> match the information that long-form "git status" shows?
>
> I considered it, but it would break the visual appeal of git status --short ?

I could go either way.  As long as you've thought about it, I'm happy.

>> How should a new untracked file in a submodule of a submodule be
>> shown?
>
> The same way " ?" indicates that (1) there is an untracked file due to
> the question mark and (2) that you need to recurse because it differs from
> "??" for regular untracked files.
>
> The problem here is that we do not know about these nested untracked files,
> because we use --porcelain instead of --short for submodules in
> submodule.c#is_submodule_modified(). I am rewriting that function anyway
> for the "git-describe --dirty" bug, so maybe it's time to switch to 
> porcelain=2
> internally there, which can surface untracked files in nested subs.

Punting to a TODO / separate patch sounds reasonable.  Tests in this
patch describing either the current behavior or the desired behavior
would be helpful, though.

Thanks,
Jonathan


Re: why patch to the gitk no replay?

2017-03-17 Thread Jonathan Nieder
Hi,

yanke131415 wrote:

> I send a patch to gitk
> project(git://ozlabs.org/~paulus/gitk)with the target email address
> pau...@ozlabs.org. But several days later No replay of this patch i
> receive, is the  email address i send patch wrong? Anyone who knows?

Sending to this mailing list (and cc-ing Paul) is more likely to work.

Hope that helps,
Jonathan


Re: Bug with .gitignore and branch switching

2017-03-17 Thread Jonathan Nieder
Hi Nevada,

Nevada Sanchez wrote:

> # Commit a file that will end up in .gitignore
> echo 'original settings' > mine.conf
> git add mine.conf
> git commit -m "Unknowingly committed my settings."
>
> echo '*.conf' > .gitignore
> git add .gitignore
> git commit -m "Users shouldn't commit their settings"

Naming a file in .gitignore tells git that you do not want to track it
and are giving git permission to write over it.  This commonly happens
when people check in build products.  For example:

git rm -f my-build-product
echo /my-build-product >>.gitignore
git commit -m "Remove generated my-build-product file"
make my-build-product

git checkout HEAD^

Without that rule, this 'git checkout' command would fail.

That said, there are some cases (e.g. the .conf file case you mention)
where a person would want git not to track a file but do not want to
give git permission to write over it.  As you've seen, .gitignore does
not work well for this. :/

Ideas for next steps:

 1. The gitignore(5) manpage does not do a good job of emphasizing
that files named there are not precious and can be overwritten by
git.  Do you have ideas for wording that would help with that?
This would be especially welcome if you can phrase them in the
form of a patch against Documentation/gitignore.txt.

 2. Occasionally people have mentioned the idea of a .gitprecious file
listing precious files that git should not track and not overwrite
(e.g., keys and other configuration files, IDE state, or metadata
for another version control system being used in parallel).  Would
you be interested in working on that?

Thanks and hope that helps,
Jonathan


Re: Bug with .gitignore and branch switching

2017-03-17 Thread Jonathan Nieder
Junio C Hamano wrote:

> There is no "untracked but precious" vs "untracked and expendable"
> difference in the current system.  An untracked file that matches
> patterns listed in .gitignore is treated as the latter.
[...]
> We've discussed the lack of "untracked but precious" class a few
> times on the list in the past, but I do not recall the topic came up
> in the recent past.  It perhaps is because nobody found that class
> useful enough so far.

The most recent example I can find is 2010:
http://public-inbox.org/git/4c6a1c5b.4030...@workspacewhiz.com/.

It also came up in 2007:
http://public-inbox.org/git/c0e9f681e68d48eb8989022d11fee...@ntdev.corp.microsoft.com/
Earlier in that year it even made the "What's not in 1.5.2" list.
http://public-inbox.org/git/11793556383977-git-send-email-jun...@cox.net/

Perhaps those references could be a useful starting point for an
interested person's thinking.

Jonathan


Re: [PATCHv4] rev-parse: add --show-superproject-working-tree

2017-03-17 Thread Jonathan Nieder
Hi,

Stefan Beller wrote:

> Add the flag --show-superproject-working-tree to git-rev-parse
> to make it easy to find out if there is a superproject. When no
> superproject exists, the output will be empty.

I'm a little late for this, but I think it's better to comment late
than never.

After this patch, git rev-parse has two options relating to the working
tree:

git rev-parse --is-inside-work-tree
git rev-parse --show-superproject-working-tree

Is this inconsistency necessary?  Now that there is a "git worktree"
command that allows making a second working tree for the same
repository, I would have thought we had standardized on work tree as
the name in UI.

Thoughts?
Jonathan


Re: [PATCH 2/2] grep: fix builds with with no thread support

2017-03-17 Thread Jonathan Nieder
Brandon Williams wrote:

> Commit 0281e487fd91 ("grep: optionally recurse into submodules")
> added functions grep_submodule() and grep_submodule_launch() which
> uses "struct work_item" which is defined only when thread support

nit: which use (s/uses/use/)

> is available.
[...]
> Reported-by: Rahul Bedarkar 
> Signed-off-by: Brandon Williams 
> ---
>  builtin/grep.c | 21 +
>  1 file changed, 9 insertions(+), 12 deletions(-)

With or without that commit message tweak,
Reviewed-by: Jonathan Nieder 

Thanks.


Re: [PATCH] run-command: fix segfault when cleaning forked async process

2017-03-17 Thread Jonathan Nieder
Jeff King wrote:

> Reported-by: Brandon Williams 
> Signed-off-by: Jeff King 

Reviewed-by: Jonathan Nieder 

Thanks.


Re: [PATCH] receive-pack: simplify run_update_post_hook()

2017-03-17 Thread Jonathan Nieder
René Scharfe wrote:

> Instead of counting the arguments to see if there are any and then
> building the full command use a single loop and add the hook command
> just before the first argument.  This reduces duplication and overall
> code size.
>
> Signed-off-by: Rene Scharfe 
> ---
>  builtin/receive-pack.c | 13 +
>  1 file changed, 5 insertions(+), 8 deletions(-)

Reviewed-by: Jonathan Nieder 


Re: [PATCH] Documentation/git-worktree: use working tree for trees on the file system

2017-03-17 Thread Jonathan Nieder
Hi,

Stefan Beller wrote:

> If I recall correctly, "worktree" is the feature/command, and
> "working tree" is an instance in the file system, even when you only
> have one working tree.

I'm not sure I agree with this distinction.  "worktree" is just a
short name for "working tree".

Unfortunately gitglossary(7) doesn't make this clear at all --- it
uses the term worktree a few times (and appears to mean "working tree"
when it does --- e.g.

Pathspecs are used on the command line of [...] and many other
commands to limit the scope of operations to some subset of
the tree or worktree.

) but never defines it.

[...]
> --- a/Documentation/git-worktree.txt
> +++ b/Documentation/git-worktree.txt
> @@ -58,10 +58,10 @@ as if `-b $(basename )` was specified.
>  
>  list::
>  
> -List details of each worktree.  The main worktree is listed first, followed by
> -each of the linked worktrees.  The output details include if the worktree is
> -bare, the revision currently checked out, and the branch currently checked out
> -(or 'detached HEAD' if none).
> +List details of each working tree.  The main working tree is listed first,
> +followed by each of the linked working trees.  The output details include if
> +the working tree is bare, the revision currently checked out, and the branch
> +currently checked out (or 'detached HEAD' if none).

The patch itself looks good.

Thanks,
Jonathan


Re: [PATCH] Documentation/git-worktree: use working tree for trees on the file system

2017-03-20 Thread Jonathan Nieder
Junio C Hamano wrote:
> Stefan Beller  writes:

>> While it may be true that you can have bare worktrees; I would question
>> why anyone wants to do this, as the only thing it provides is an
>> additional HEAD (plus its reflog).
>
> A more plausible situation is you start with a bare one as the
> primary and used to make local clones to do your work in the world
> before "git worktree".  It would be a natural extension to your
> workflow to instead create worktrees off of that bare one as the
> primary worktree with secondaries with working trees.

For what it's worth, this conversation makes me think it was a mistake
to call this construct a worktree.

It's fine for the command to have one name and the documentation to
use a longer, clearer name to explain it.  What should that longer,
clearer name be?

Thanks,
Jonathan


Re: [PATCH 1/2] git.c: introduce --working-tree superseding --work-tree

2017-03-20 Thread Jonathan Nieder
Hi,

Stefan Beller wrote:

> Signed-off-by: Stefan Beller 
> ---
>  Documentation/git.txt | 12 ++--
>  git.c |  5 +++--
>  2 files changed, 9 insertions(+), 8 deletions(-)

I think this is a step in the right direction.  Thanks for writing it.

Nits:

- tests are still using --work-tree --- this patch didn't add any tests
  for --working-tree.  If --working-tree is what we prefer, it may make
  sense to update tests to use --working-tree and add a test or two to
  make sure the existing --work-tree synonym still works.

- this patch updated the argv[i] == "--work-tree" case but forgot to
  update the argv[i].has_prefix("--work-tree=") case

- since this is a feature used for scripting, I don't think we can
  pretend the name change never happened.  We think we need to
  document both option names and say what version introduced the new
  one so script authors can make an informed decision about which to
  use.  Later we can make the --work-tree synonym more obscure, but in
  the short term I suspect it is what most script authors will still
  want to use.

> --- a/Documentation/git.txt
> +++ b/Documentation/git.txt
[...]
> @@ -892,7 +892,7 @@ Git so take care if using a foreign front-end.
>  
>  `GIT_WORK_TREE`::
>   Set the path to the root of the working tree.
> - This can also be controlled by the `--work-tree` command-line
> + This can also be controlled by the `--working-tree` command-line
>   option and the core.worktree configuration variable.

I suspect we don't want to rename GIT_WORK_TREE --- it's not
user-facing in the same way as --work-tree is, scripts make direct use
of it (and they unset it when appropriate!), and dealing with the
permutations of what to do if some subset of environment variables is
set seems very complicated.

For comparison, core.worktree is user-facing.  Is it also in scope for
this change?

Thanks and hope that helps,
Jonathan


Re: [PATCH 2/2] revparse: introduce --is-inside-working-tree

2017-03-20 Thread Jonathan Nieder
Hi,

Stefan Beller wrote:

> This behaves the same as 'is-inside-worktree' and supersedes it.
> See prior patch for discussion of "working tree" vs. "worktree"
>
> Signed-off-by: Stefan Beller 
> ---
>  Documentation/git-rev-parse.txt | 4 ++--
>  builtin/rev-parse.c | 3 ++-
>  2 files changed, 4 insertions(+), 3 deletions(-)

This is less invasive than the previous patch and can probably stand
alone.  Some of the same nits apply:

* tests?

* documentation would need to warn people that this option is new, for
  now. In fact it's even tempting to make --is-inside-working-tree
  the hidden/discouraged one for a while, until script authors can
  count on git having had it available for a while, and only then
  encourage its use.

Thanks and hope that helps,
Jonathan


Re: [PATCH] t3600: rename test to describe its functionality

2017-03-21 Thread Jonathan Nieder
Stefan Beller wrote:

> This was an oversight in 55856a35b2 (rm: absorb a submodules git dir
> before deletion, 2016-12-27), as the body of the test changed without
> adapting the test subject.
>
> Signed-off-by: Stefan Beller 
> ---
>  t/t3600-rm.sh | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/t/t3600-rm.sh b/t/t3600-rm.sh
> index 5aa6db584c..1f87781e94 100755
> --- a/t/t3600-rm.sh
> +++ b/t/t3600-rm.sh
> @@ -658,7 +658,7 @@ test_expect_success 'rm of a populated nested submodule with nested untracked fi
>   test_cmp expect actual
>  '
>  
> -test_expect_success 'rm of a populated nested submodule with a nested .git directory fails even when forced' '
> +test_expect_success 'rm of a populated nested submodule with a nested .git directory absorbs the nested git dir' '

This title is getting long. How about something like the following for
squashing in?

With or without that change,
Reviewed-by: Jonathan Nieder 

diff --git i/t/t3600-rm.sh w/t/t3600-rm.sh
index 1f87781e94..3c63455729 100755
--- i/t/t3600-rm.sh
+++ w/t/t3600-rm.sh
@@ -658,7 +658,7 @@ test_expect_success 'rm of a populated nested submodule with nested untracked fi
 test_cmp expect actual
 '
 
-test_expect_success 'rm of a populated nested submodule with a nested .git directory absorbs the nested git dir' '
+test_expect_success "rm absorbs submodule's nested .git directory" '
git reset --hard &&
git submodule update --recursive &&
(cd submod/subsubmod &&


Re: [PATCH] t3600: rename test to describe its functionality

2017-03-22 Thread Jonathan Nieder
Stefan Beller wrote:

> This was an oversight in 55856a35b2 (rm: absorb a submodules git dir
> before deletion, 2016-12-27), as the body of the test changed without
> adapting the test subject.
>
> Signed-off-by: Stefan Beller 
> Reviewed-by: Jonathan Nieder 

Yup.  Thanks for taking care of it.

Jonathan


Re: [PATCH 1/2] send-pack: send push options correctly in stateless-rpc case

2017-03-22 Thread Jonathan Nieder
Brandon Williams wrote:

> "git send-pack --stateless-rpc" puts each request in a sequence of pkt-lines
> followed by a flush-pkt. The push option code forgot about this and sends push
> options and their terminating delimiter as ordinary pkt-lines that get their
> length header stripped off by remote-curl before being sent to the server.
>
> The result is multiple malformed requests, which the server rejects.
>
> Fortunately send-pack --stateless-rpc already is aware of this "pkt-line 
> within
> pkt-line" framing for the update commands that precede push options. Handle
> push options the same way.
>
> Signed-off-by: Brandon Williams 
> ---
>  send-pack.c | 20 
>  1 file changed, 8 insertions(+), 12 deletions(-)

This is only a hypothetical issue until the next patch though, right?
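
For readers unfamiliar with the framing the commit message describes: a
pkt-line is a 4-hex-digit length header (which counts the header's own 4
bytes) followed by the payload, and "0000" is the flush-pkt.  Below is a
rough standalone sketch of the "pkt-line within pkt-line" idea; the
`pkt_line` helper name is made up, and the outer layer is simplified to a
single pkt-line (git actually splits the inner stream into chunks):

```shell
#!/bin/sh
# pkt_line: prepend the 4-hex-digit length header (payload length + 4).
pkt_line () {
	printf '%04x%s' $(( ${#1} + 4 )) "$1"
}

# Inner framing: each push option is its own pkt-line, then a flush-pkt.
inner="$(pkt_line "push-option-1")$(pkt_line "push-option-2")0000"
echo "inner:   $inner"
# prints: inner:   0011push-option-10011push-option-20000

# Stateless-rpc adds an outer pkt-line layer, which remote-curl strips
# before relaying the request to the server.  (Simplified: one outer
# pkt-line here; real send-pack chunks the stream.)
outer="$(pkt_line "$inner")"
echo "wrapped: $outer"
# prints: wrapped: 002a0011push-option-10011push-option-20000
```

The bug being fixed was that push options were written only with the inner
framing, so stripping the outer layer mangled them.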

For what it's worth,

Tested-by: Jonathan Nieder 
Reviewed-by: Jonathan Nieder 

Thanks.


Re: [PATCH 2/2] remote-curl: allow push options

2017-03-22 Thread Jonathan Nieder
Brandon Williams wrote:

> --- a/builtin/send-pack.c
> +++ b/builtin/send-pack.c
> @@ -152,6 +152,7 @@ int cmd_send_pack(int argc, const char **argv, const char *prefix)
>   int progress = -1;
>   int from_stdin = 0;
>   struct push_cas_option cas = {0};
> + struct string_list push_options = STRING_LIST_INIT_DUP;

It's safe for this to be NODUP, since the strings added to it live in
argv.

>  
>   struct option options[] = {
>   OPT__VERBOSITY(&verbose),
> @@ -171,6 +172,9 @@ int cmd_send_pack(int argc, const char **argv, const char *prefix)
>   OPT_BOOL(0, "stateless-rpc", &stateless_rpc, N_("use stateless RPC protocol")),
>   OPT_BOOL(0, "stdin", &from_stdin, N_("read refs from stdin")),
>   OPT_BOOL(0, "helper-status", &helper_status, N_("print status from remote helper")),
> + OPT_STRING_LIST('o', "push-option", &push_options,
> + N_("server-specific"),
> + N_("option to transmit")),

Does this need the short-and-sweet option name 'o'?  For this command,
I think I'd prefer giving it no short name.

Should this option be documented in the manpage, too?

[...]
> --- a/remote-curl.c
> +++ b/remote-curl.c
> @@ -22,6 +22,7 @@ struct options {
>   unsigned long depth;
>   char *deepen_since;
>   struct string_list deepen_not;
> + struct string_list push_options;
>   unsigned progress : 1,
>   check_self_contained_and_connected : 1,
>   cloning : 1,
> @@ -139,6 +140,9 @@ static int set_option(const char *name, const char *value)
>   else
>   return -1;
>   return 0;
> + } else if (!strcmp(name, "push-option")) {
> + string_list_append(&options.push_options, value);
> + return 0;

push_options has strdup_strings enabled so this takes ownership of a
copy of value.  Good.

[...]
> --- a/t/t5545-push-options.sh
> +++ b/t/t5545-push-options.sh
> @@ -102,7 +102,9 @@ test_expect_success 'two push options work' '
>   test_cmp expect upstream/.git/hooks/post-receive.push_options
>  '
>  
> -test_expect_success 'push option denied properly by http remote helper' '\
> +test_expect_success 'push option denied properly by http server' '

Should this test use test_i18ngrep to check that the error message
diagnoses the problem correctly instead of hitting an unrelated error
condition?

[...]
> @@ -113,6 +115,32 @@ test_expect_success 'push option denied properly by http remote helper' '\
>   git -C test_http_clone push origin master
>  '
>  
> +test_expect_success 'push options work properly across http' '

Nice.

Tested-by: Jonathan Nieder 

With whatever subset of the following changes seems sensible to you
squashed in,
Reviewed-by: Jonathan Nieder 

Thanks.

diff --git i/Documentation/git-send-pack.txt w/Documentation/git-send-pack.txt
index a831dd0288..966abb0df8 100644
--- i/Documentation/git-send-pack.txt
+++ w/Documentation/git-send-pack.txt
@@ -81,6 +81,12 @@ be in a separate packet, and the list must end with a flush packet.
will also fail if the actual call to `gpg --sign` fails.  See
linkgit:git-receive-pack[1] for the details on the receiving end.
 
+--push-option=<string>::
+   Pass the specified string as a push option for consumption by
+   hooks on the server side.  If the server doesn't support push
+   options, error out.  See linkgit:git-push[1] and
+   linkgit:githooks[5] for details.
+
 <host>::
A remote host to house the repository.  When this
part is specified, 'git-receive-pack' is invoked via
diff --git i/builtin/send-pack.c w/builtin/send-pack.c
index 6796f33687..832fd7ed0a 100644
--- i/builtin/send-pack.c
+++ w/builtin/send-pack.c
@@ -144,6 +144,7 @@ int cmd_send_pack(int argc, const char **argv, const char *prefix)
unsigned force_update = 0;
unsigned quiet = 0;
int push_cert = 0;
+   struct string_list push_options = STRING_LIST_INIT_NODUP;
unsigned use_thin_pack = 0;
unsigned atomic = 0;
unsigned stateless_rpc = 0;
@@ -152,7 +153,6 @@ int cmd_send_pack(int argc, const char **argv, const char *prefix)
int progress = -1;
int from_stdin = 0;
struct push_cas_option cas = {0};
-   struct string_list push_options = STRING_LIST_INIT_DUP;
 
struct option options[] = {
OPT__VERBOSITY(&verbose),
@@ -166,15 +166,15 @@ int cmd_send_pack(int argc, const char **argv, const char *prefix)
 

Re: USE_SHA1DC is broken in pu

2017-03-22 Thread Jonathan Nieder
Hi,

Johannes Schindelin wrote:

> As to the default of seriously slowing down all SHA-1 computations: since
> you made that the default, at compile time, with no way to turn on the
> faster computation, this will have a major, negative impact. Are you
> really, really sure you want to do that?
>
> I thought that it was obvious that we would have at least a runtime option
> to lessen the load.

It's not obvious to me.  I agree that the DC_SHA1 case can be sped up,
e.g. by turning off the collision detection for sha1 calculations that
are not part of fetching, receiving a push, or running fsck.

To be clear, are you saying that this is a bad compile-time default
because distributors are going to leave it and end-users will end up
with a bad experience?  Or are you saying distributors have no good
alternative to choose at compile time?  Or something else?

Is there some particular downstream that this breaks?

Thanks and hope that helps,
Jonathan


Re: [PATCH] t7406: correct test case for submodule-update initial population

2017-03-22 Thread Jonathan Nieder
Stefan Beller wrote:
> On Wed, Mar 22, 2017 at 2:59 PM, Jeff King  wrote:

>>   sq=\'
>>   test_expect_success '...' '
>> cat >expect <<-EOF
>> Execution of ${sq}false $submodulesha1${sq} ...
>>   '
>>
>> but I'm not sure if that is any more readable.
>
> If I recall correctly, I made a big fuss about single quotes used correctly 
> when
> writing that patch (which is why I may have lost track of the actual work 
> there)
> to be told the one and only blessed way to use single quotes in our test 
> suite.
>
> Your proposal to use ${sq} sounds good to me, though we did not
> follow through with it for some other reason. I can reroll with that, though.

I don't know what you're referring to by "the one and only blessed way",
but it might be the following.  If you want to use single-quotes on a command
line using $sq, you would have to use eval:

eval "some_command ${sq}single-quoted argument${sq}"

which is generally more complicated than doing the '\'' dance:

some_command '\''single-quoted argument'\''

But the example in this thread is different. In the here-document you
are in an interpolated context so the ${sq} method or '\'' would work
equally well.
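
To make the two techniques concrete, here is a small standalone sketch
(plain shell, not part of git's test suite; "false abc123" stands in for
the ${sq}false $submodulesha1${sq} example quoted above):

```shell
#!/bin/sh
sq=\'

# Technique 1: the '\'' dance -- close the single-quoted string, emit a
# literal quote with \', and re-open single quotes.
printf '%s\n' 'Execution of '\''false abc123'\'' failed'
# prints: Execution of 'false abc123' failed

# Technique 2: ${sq} works anywhere the shell interpolates, e.g. in an
# interpolated here-document.
cat <<EOF
Execution of ${sq}false abc123${sq} failed
EOF
# prints: Execution of 'false abc123' failed

# Technique 3: eval with ${sq}, needed only when building a command line
# that must contain literal single quotes.
eval "printf '%s\n' ${sq}single-quoted argument${sq}"
# prints: single-quoted argument
```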

Hope that helps,
Jonathan


Re: [PATCH v2 0/2] push options across http

2017-03-22 Thread Jonathan Nieder
Brandon Williams wrote:

> v2 addresses Jonathan's comments from v1.
>   * Fix a test
>   * Add some documentation
>   * remove short option from --push-option in git send-pack
> 
> Brandon Williams (2):
>   send-pack: send push options correctly in stateless-rpc case
>   remote-curl: allow push options
> 
>  Documentation/git-send-pack.txt |  6 ++
>  builtin/send-pack.c |  5 +
>  remote-curl.c   |  8 
>  send-pack.c | 20 
>  t/t5545-push-options.sh | 33 +++--
>  5 files changed, 58 insertions(+), 14 deletions(-)

Reviewed-by: Jonathan Nieder 
Tested-by: Jonathan Nieder 

Thanks.


Re: Warn user about known Git Rebase bug?!

2017-03-22 Thread Jonathan Nieder
Lars Schneider wrote:

> we rebased a branch using "--preserve-merges --interactive" and were 
> surprised that the operation reported success although it silently 
> omitted commits (Git 2.12 on Windows). A little search revealed that 
> these are likely known bugs [1]. Would it make sense to print an 
> appropriate warning message if a user runs rebase with 
> "--preserve-merges --interactive"?

Sounds like a good idea.  Care to suggest a patch?

Thanks,
Jonathan

> [1] https://git-scm.com/docs/git-rebase#_bugs


Re: [PATCH 1/3] submodule.c: port is_submodule_modified to use porcelain 2

2017-03-22 Thread Jonathan Nieder
Stefan Beller wrote:

> Migrate 'is_submodule_modified' to the new porcelain format of
> git-status.
>
> As the old porcelain only reported ' M' for submodules, no
> matter what happened inside the submodule (untracked files,
> changes to tracked files or move of HEAD), the new API
> properly reports the different scenarios.
[...]
>  submodule.c | 53 -
>  1 file changed, 24 insertions(+), 29 deletions(-)

Neat.  Is this something that could be covered in tests, or should I
be patient and rely on patch 3/3 for that?

I think this would be easier to understand if it were two patches: one
that switched to --porcelain=2 with no change in behavior, and another
that took advantage of --porcelain=2 to return richer information.  As
is, I had trouble verifying that this isn't going to break anything
--- there's not enough local information here and in submodule.h to
tell what callers may rely on and I didn't audit callers.

Thanks and hope that helps,
Jonathan


Re: [PATCH 2/3] submodule.c, is_submodule_modified: stricter checking for submodules

2017-03-22 Thread Jonathan Nieder
Stefan Beller wrote:

> By having a stricter check in the superproject we catch errors earlier,
> instead of spawning a child process to tell us.

Plus the error message is clearer.

> ---
>  submodule.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)

Reviewed-by: Jonathan Nieder 


Re: [PATCH 3/3] short status: improve reporting for submodule changes

2017-03-22 Thread Jonathan Nieder
Stefan Beller wrote:

>  Documentation/git-status.txt |  9 +
>  t/t3600-rm.sh| 18 +-
>  t/t7506-status-submodule.sh  | 24 
>  wt-status.c  | 13 +++--
>  4 files changed, 57 insertions(+), 7 deletions(-)

Very nice.

[...]
> --- a/Documentation/git-status.txt
> +++ b/Documentation/git-status.txt
> @@ -181,6 +181,13 @@ in which case `XY` are `!!`.

The documentation is clear.

[...]
> --- a/t/t3600-rm.sh
> +++ b/t/t3600-rm.sh

The updated behavior shown in this test makes sense.

[...]
> --- a/t/t7506-status-submodule.sh
> +++ b/t/t7506-status-submodule.sh

This test has good coverage and describes a good behavior.

[...]
> --- a/wt-status.c
> +++ b/wt-status.c
> @@ -431,10 +431,19 @@ static void wt_status_collect_changed_cb(struct diff_queue_struct *q,
>   }
>   if (!d->worktree_status)
>   d->worktree_status = p->status;
> - d->dirty_submodule = p->two->dirty_submodule;
> - if (S_ISGITLINK(p->two->mode))
> + if (S_ISGITLINK(p->two->mode)) {
> + d->dirty_submodule = p->two->dirty_submodule;
>   d->new_submodule_commits = !!oidcmp(&p->one->oid,
>   &p->two->oid);
> + if (s->status_format == STATUS_FORMAT_SHORT) {
> + if (d->new_submodule_commits)
> + d->worktree_status = 'M';

Can these be new DIFF_STATUS_* letters in diff.h?

I hadn't realized before that the worktree_status usually is taken
verbatim from diff_filepair.  Now I understand better what you were
talking about offline earlier today (sorry for the confusion).

We probably don't want the diff machinery to have to be aware of the
various status_format modes, so doing this here seems sensible to me.
Unfortunately it's also a reminder that "git diff --name-status" and
--diff-filter could benefit from a similar change.  (Unfortunate
because that's a harder change to make without breaking scripts.)

> + else if (d->dirty_submodule & DIRTY_SUBMODULE_MODIFIED)
> + d->worktree_status = 'm';
> + else if (d->dirty_submodule & DIRTY_SUBMODULE_UNTRACKED)
> + d->worktree_status = '?';
> + }
> + }
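
For reference, the precedence that hunk establishes can be sketched in a
standalone way (the flag values and function name here are illustrative
stand-ins, not git's actual definitions, which live in submodule.h):

```c
#include <assert.h>

/* Illustrative stand-ins for git's dirty_submodule flag bits. */
#define DIRTY_SUBMODULE_UNTRACKED (1 << 0)
#define DIRTY_SUBMODULE_MODIFIED  (1 << 1)

/*
 * Worktree-column letter for a submodule in short status: new commits
 * win over modified tracked content, which wins over untracked files.
 */
static char submodule_wt_status(int new_commits, unsigned dirty)
{
	if (new_commits)
		return 'M';
	if (dirty & DIRTY_SUBMODULE_MODIFIED)
		return 'm';
	if (dirty & DIRTY_SUBMODULE_UNTRACKED)
		return '?';
	return ' ';
}
```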

Given the above, this implementation looks good. So for what it's worth,
this patch is
Reviewed-by: Jonathan Nieder 

Patch 1/3 is the one that gives me pause, since I hadn't been able to
chase down all callers.  Would it be feasible to split that into two:
one patch to switch to --porcelain=2 but not change behavior, and a
separate patch to improve what is stored in the dirty_submodule flags?

Thanks,
Jonathan


Re: [PATCH] sequencer: fix missing newline

2017-03-23 Thread Jonathan Nieder
Brandon Williams wrote:

> When using rebase --interactive where one of the lines is marked as
> 'edit' this is the resulting output:
>
> Stopped at ec3b9c4...  stuffYou can amend the commit now, with
[...]
> Signed-off-by: Brandon Williams 
> ---
>  sequencer.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)

Thanks for fixing it.  I don't see any other fprintf in the file with
missing \r or \n at the end, so this fix looks complete.

Reviewed-by: Jonathan Nieder 


Re: USE_SHA1DC is broken in pu

2017-03-23 Thread Jonathan Nieder
Johannes Schindelin wrote:
> On Wed, 22 Mar 2017, Jonathan Nieder wrote:
> > Johannes Schindelin wrote:

>>> As to the default of seriously slowing down all SHA-1 computations:
>>> since you made that the default, at compile time, with no way to turn
>>> on the faster computation, this will have a major, negative impact.
>>> Are you really, really sure you want to do that?
>>>
>>> I thought that it was obvious that we would have at least a runtime
>>> option to lessen the load.
>>
>> It's not obvious to me.  I agree that the DC_SHA1 case can be sped up,
>> e.g. by turning off the collision detection for sha1 calculations that
>> are not part of fetching, receiving a push, or running fsck.
>
> And in those cases, using OpenSSL instead is *even* faster.
[...]
> The index is 300MB. If you have to experience a sudden drop in performance
> of `git add`, even by "only" 30%, relative to OpenSSL, it is very
> noticeable. It is painful.
[...]
> It gets even worse when it comes to fetching, let alone cloning.
[...]
> And by "switching collision detection off", I of course refer to *not*
> using SHA1DC's routines at all, but what would have been used originally,
> in Git for Windows' case: (hardware-accelerated) OpenSSL.
>
> Did I manage to clarify the problem?

Yes.  Thank you for explaining.

Sincerely,
Jonathan


Re: [PATCH 1/7] submodule.c: use argv_array in is_submodule_modified

2017-03-23 Thread Jonathan Nieder
Stefan Beller wrote:

> struct argv_array is easier to use and maintain

Yes!

[...]
>  submodule.c | 10 ++
>  1 file changed, 2 insertions(+), 8 deletions(-)

I also like the diffstat. :)

[...]
> --- a/submodule.c
> +++ b/submodule.c
> @@ -1043,12 +1043,6 @@ unsigned is_submodule_modified(const char *path, int ignore_untracked)
>  {
>   ssize_t len;
>   struct child_process cp = CHILD_PROCESS_INIT;
> - const char *argv[] = {
> - "status",
> - "--porcelain",
> - NULL,
> - NULL,
> - };

and the avoidance of this kind of fixed-size magic.
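
For anyone unfamiliar with the pattern, here is a standalone analogue of
what argv_array provides (git's real API lives in argv-array.h; the
struct and function below are only an illustrative stand-in):

```c
#include <stdio.h>
#include <stdlib.h>

/*
 * Growable, always-NULL-terminated argument list, so callers never have
 * to count slots by hand the way the fixed-size array above did.
 */
struct argv_list {
	const char **argv;
	size_t nr, alloc;
};

static void argv_list_push(struct argv_list *a, const char *arg)
{
	if (a->nr + 2 > a->alloc) {
		a->alloc = a->alloc ? 2 * a->alloc : 4;
		a->argv = realloc(a->argv, a->alloc * sizeof(*a->argv));
		if (!a->argv) {
			perror("realloc");
			exit(1);
		}
	}
	a->argv[a->nr++] = arg;
	a->argv[a->nr] = NULL;	/* keep the list NULL-terminated */
}
```

Appending an optional flag then becomes one more push, with no risk of
overrunning a pre-counted NULL slot.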

Reviewed-by: Jonathan Nieder 

Thank you.


Re: [PATCH 2/7] submodule.c: convert is_submodule_modified to use strbuf_getwholeline_fd

2017-03-23 Thread Jonathan Nieder
Stefan Beller wrote:

> Instead of implementing line reading yet again, make use of our beautiful
> library functions.
>
> Signed-off-by: Stefan Beller 
> ---
>  submodule.c | 14 ++
>  1 file changed, 2 insertions(+), 12 deletions(-)

This changes buffering behavior in two ways:

- by using strbuf_getwholeline_fd instead of strbuf_read, we avoid
  having to allocate memory for the entire child process output at
  once.  That is, we limit maximum memory usage (good).

- by using strbuf_getwholeline_fd instead of strbuf_read, we xread
  one byte at a time instead of larger chunks.  That means more
  overhead due to context switches (bad).

Some callers of getwholeline_fd need the one-byte-at-a-time thing to
avoid waiting too long for input, and in some cases the alternative is
deadlock.  We know this caller doesn't fall into that category because
it was doing fine slurping the entire file at once.  As the
getwholeline_fd API doc comment explains:

 * It reads one character at a time, so it is very slow.  Do not
 * use it unless you need the correct position in the file
 * descriptor.

Can this caller use xfdopen and strbuf_getwholeline instead to get
back the benefit of buffering (i.e., something like the below)?

Another consequence of switching to streaming is that we may close
before the child finishes.  Do we have to worry about handling SIGPIPE
in the child?  I haven't checked how this handles that --- a test
might be useful.

Thanks and hope that helps,
Jonathan

diff --git i/submodule.c w/submodule.c
index c1b7b78260..184d5739fc 100644
--- i/submodule.c
+++ w/submodule.c
@@ -1043,6 +1043,7 @@ unsigned is_submodule_modified(const char *path, int ignore_untracked)
 {
struct child_process cp = CHILD_PROCESS_INIT;
struct strbuf buf = STRBUF_INIT;
+   FILE *fp;
unsigned dirty_submodule = 0;
const char *git_dir;
 
@@ -1070,7 +1071,8 @@ unsigned is_submodule_modified(const char *path, int ignore_untracked)
if (start_command(&cp))
die("Could not run 'git status --porcelain' in submodule %s", path);
 
-   while (strbuf_getwholeline_fd(&buf, cp.out, '\n') != EOF) {
+   fp = xfdopen(cp.out, "r");
+   while (strbuf_getwholeline(&buf, fp, '\n') != EOF) {
if ((buf.buf[0] == '?') && (buf.buf[1] == '?')) {
dirty_submodule |= DIRTY_SUBMODULE_UNTRACKED;
if (dirty_submodule & DIRTY_SUBMODULE_MODIFIED)
@@ -1082,7 +1084,7 @@ unsigned is_submodule_modified(const char *path, int ignore_untracked)
break;
}
}
-   close(cp.out);
+   fclose(fp);
 
if (finish_command(&cp))
die("'git status --porcelain' failed in submodule %s", path);


Re: [PATCH] branch doc: Change `git branch ` to use ``

2017-03-23 Thread Jonathan Nieder
Junio C Hamano wrote:

> I can go without "Do not forget ..." and everything that follows,
> though, and if we are going to do so, then
>
>   --list::
>   List branches.  With optional <pattern>...,
>   e.g. `git branch --list 'maint-*'`, list only the
>   branches that match the pattern(s).
>
> would be fine.  I am not opposed to having an visually distinctive
> example--I just do not want to have one that is wrong without
> clearly marking it as such.

I like this one.  I haven't witnessed "git branch maint-*" being a
common mistake.

For comparison, "git branch -l maint-*" does seem to be a common
mistake.  It's a shame that the short-and-sweet "-l" was taken for
that purpose.  Perhaps it's worth calling that out here?

--list::
--list <pattern>::
List branches.  If <pattern> is specified (e.g.,
"git branch --list 'maint-*'"), list only the
branches that match the pattern.
+
This should not be confused with `git branch -l <branchname>`,
which creates a branch named `<branchname>` with a reflog.
See `--create-reflog` above for details.
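
A quick illustration of the pattern form in a hypothetical throwaway
repository (shown with `--list` only, since the short `-l` option has
meant different things across git versions):

```shell
git init -q demo
cd demo
git -c user.name=example -c user.email=example@example.com \
	commit -q --allow-empty -m init
git branch maint-1
git branch maint-2
git branch topic
git branch --list 'maint-*'	# matches maint-1 and maint-2, not topic
```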

Thanks,
Jonathan


Re: How do I copy a branch & its config to a new name?

2017-03-23 Thread Jonathan Nieder
Hi,

Ævar Arnfjörð Bjarmason wrote:

> I couldn't find any previous list discussion about this, but if not I
> think something like:
>
> git [checkout|branch] --copy src dest
>
> Would make sense as an interface for this.

Sounds good to me. :)

Thanks,
Jonathan


Re: [PATCH 3/3] t/README: clarify the test_have_prereq documentation

2017-03-23 Thread Jonathan Nieder
Hi,

Ævar Arnfjörð Bjarmason wrote:

> --- a/t/README
> +++ b/t/README
> @@ -612,8 +612,10 @@ library for your script to use.
>   - test_have_prereq <prereq>
>  
> Check if we have a prerequisite previously set with
> -   test_set_prereq. The most common use of this directly is to skip
> -   all the tests if we don't have some essential prerequisite:
> +   test_set_prereq. The most common use-case for using this directly,
> +   as opposed to as an argument to test_expect_*, is to skip all the
> +   tests at the start of the test script if we don't have some
> +   essential prerequisite:

Nit: the hyphenated word "use-case" feels jargon-ish.  I've seen it
more often as two separate words.  Better yet to clarify that we're
talking about idioms and not just goals:

   The most common way to use this explicitly (as opposed
  to the implicit use when an argument is passed to test_expect_*) is to
  skip all the tests at the start of a test script if we don't have some
  essential prerequisite:

With or without such a change,
Reviewed-by: Jonathan Nieder 

Thanks.

