Re: email as a bona fide git transport

2019-10-16 Thread Jonathan Nieder
Hi,

A few small points.

Vegard Nossum wrote:

> * git am (or an alternative command) needs to recreate the commit
>   perfectly when applied, including applying it to the correct parent

Interesting.  "git format-patch" has a --base option to do some of
what you're looking for, for the sake of snowpatch
.  Though it's not exactly the
same thing you mean.
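
For concreteness, a sketch (assuming the branch has an upstream
configured so that --base=auto can find the base):

    $ git format-patch --base=auto origin/master..HEAD

This appends a "base-commit:" trailer (plus any prerequisite-patch-id
lines) to the first patch, telling the receiver where to apply the
series.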

We also discussed sending merge commits by mail recently in the
virtual git committer summit[1].

Of course, the devil is in the details.  It's straightforward to use
"git bundle" to use mail as a Git transport today, but presumably you
also want the ability to perform reviews along the way and that's not
so easy with a binary format.  Do you have more details on what you'd
want the format to look like, particularly for merge commits?
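
(For reference, the bundle route today looks roughly like this, with
illustrative branch names:

    $ git bundle create outgoing.bundle origin/master..topic
    # mail outgoing.bundle as an attachment; the recipient runs:
    $ git bundle verify outgoing.bundle
    $ git fetch outgoing.bundle topic:refs/heads/topic

It round-trips commits, including merges, but offers nothing readable
to quote in a review.)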

[...]
> there
> is no need for "changeset IDs" or whatever, since you can just use the
> git SHA1 which is unique, unambiguous, and stable.

In [2] the hope was for some identifier that is preserved by "git
rebase" and "git commit --amend" (so that you can track the evolution
of a change as the author improves it in response to reviews).  Is
that the conversation you're alluding to?

[...]
> Disadvantages:
>
> - requires patching git

That's not a disadvantage.  It means you get to work with the Git
project (a welcoming bunch of people), work on userspace (seeing how
the other half lives), and improve the lives of everyone using Git.

[...]
> Date: Sat, 5 Oct 2019 16:15:59 +0200
> Subject: [PATCH 1/3] format-patch: add --complete
>
> Include the raw commit data between the changelog and the diffstat.

Oh!  I had missed this on first reading because it was in an
attachment.

I have mixed feelings.  Can you say a bit more about the advantages
and disadvantages relative to sending a git bundle?  What happens if a
mail client or a box along the way mangles whitespace in the commit
message?

Happy hacking,
Jonathan

[1] 
https://public-inbox.org/git/nycvar.qro.7.76.6.1909261253400.15...@tvgsbejvaqbjf.bet/
[2] 
https://lore.kernel.org/ksummit-discuss/CAD=FV=upjppuyftpjf-ogzj_6ljle4ptxmhcocedmh1lxss...@mail.gmail.com/


Re: RFE: use patchwork to submit a patch

2019-10-10 Thread Jonathan Nieder
Eric Wong wrote:
> Konstantin Ryabitsev  wrote:

>> This is actually really fast if you already have a local copy of the
>> repository with most objects. Try this yourself if you have
>> torvalds/linux.git locally:
>>
>> git clone --bare -s torvalds/linux.git test
>
> Yep, -s (--shared) makes cloning really cheap.  One of my goals is to get
>
>   git clone -s https://example.com/torvalds/linux.git
>
> and
>
>   git clone -s https://example.com/torvalds/linux.git/clone.bundle
>
> working.  That would make it easier for new contributors to
> setup lightweight clones and pull in history on an as-needed
> basis w/o hacks like shallow cloning.

Does "git clone --filter=blob:none" do what you're looking for?

Thanks,
Jonathan


Re: RFE: use patchwork to submit a patch

2019-10-10 Thread Jonathan Nieder
(+cc: git)
Hi,

Konstantin Ryabitsev wrote[1]:

> How I envision this would work:
>
> - user creates an account (which requires a mail confirmation)
> - they choose a "submit patch" option from the menu
> - the patch submission screen has a succession of screens:

Interesting!  This reminds me a bit of https://gitgitgadget.github.io
(except using patchwork instead of github pull requests as substrate).

That leads me to wonder: should these kinds of tools share some code?
Are there any subtleties that web-to-mail patch submission interfaces
have in common or can learn from each other?

Thanks,
Jonathan

[1] 
https://lore.kernel.org/workflows/20191010144150.hqiosvwolm3lmzp5@chatter.i7.local/


Re: [PATCH] parser: Unmangle From: headers that have been mangled for DMARC purposes

2019-10-10 Thread Jonathan Nieder
Hi,

Andrew Donnellan wrote:

> To avoid triggering spam filters due to failed signature validation, many
> mailing lists mangle the From header to change the From address to be the
> address of the list, typically where the sender's domain has a strict DMARC
> policy enabled.
>
> In this case, we should try to unmangle the From header.
>
> Add support for using the X-Original-Sender or Reply-To headers, as used by
> Google Groups and Mailman respectively, to unmangle the From header when
> necessary.
>
> Closes: #64 ("Incorrect submitter when using googlegroups")
> Reported-by: Alexandre Belloni 
> Reported-by: Stephen Rothwell 
> Signed-off-by: Andrew Donnellan 
> ---
>  patchwork/parser.py| 75 ++
>  patchwork/tests/test_parser.py | 68 --
>  2 files changed, 130 insertions(+), 13 deletions(-)

Interesting!  I'm cc-ing the Git mailing list in case "git am" might
want to learn the same support.
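
(For context, a mangled message typically carries headers something
like this; the addresses are made up:

    From: Jane Doe via Some List <some-list@example.org>
    X-Original-Sender: jane.doe@example.com

with Mailman putting the original address in Reply-To or Cc instead,
as the patch below describes.)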

Thanks,
Jonathan

(patch left unsnipped for reference)
> diff --git a/patchwork/parser.py b/patchwork/parser.py
> index 7dc66bc05a5b..79beac5617b1 100644
> --- a/patchwork/parser.py
> +++ b/patchwork/parser.py
> @@ -321,12 +321,7 @@ def find_series(project, mail, author):
>  return _find_series_by_markers(project, mail, author)
>  
>  
> -def get_or_create_author(mail):
> -from_header = clean_header(mail.get('From'))
> -
> -if not from_header:
> -raise ValueError("Invalid 'From' header")
> -
> +def split_from_header(from_header):
>  name, email = (None, None)
>  
>  # tuple of (regex, fn)
> @@ -355,6 +350,65 @@ def get_or_create_author(mail):
>  (name, email) = fn(match.groups())
>  break
>  
> +return (name, email)
> +
> +
> +# Unmangle From addresses that have been mangled for DMARC purposes.
> +#
> +# To avoid triggering spam filters due to failed signature validation, many
> +# mailing lists mangle the From header to change the From address to be the
> +# address of the list, typically where the sender's domain has a strict
> +# DMARC policy enabled.
> +#
> +# Unfortunately, there's no standardised way of preserving the original
> +# From address.
> +#
> +# Google Groups adds an X-Original-Sender header. If present, we use that.
> +#
> +# Mailman preserves the original address by adding a Reply-To, except in the
> +# case where the list is set to either reply to list, or reply to a specific
> +# address, in which case the original From is added to Cc instead. These corner
> +# cases are dumb, but we try and handle things as sensibly as possible by
> +# looking for a name in Reply-To/Cc that matches From. It's inexact but should
> +# be good enough for our purposes.
> +def get_original_sender(mail, name, email):
> +if name and ' via ' in name:
> +# Mailman uses the format " via "
> +# Google Groups uses "'' via "
> +stripped_name = name[:name.rfind(' via ')].strip().strip("'")
> +
> +original_sender = clean_header(mail.get('X-Original-Sender', ''))
> +if original_sender:
> +new_email = split_from_header(original_sender)[1].strip()[:255]
> +return (stripped_name, new_email)
> +
> +addrs = []
> +reply_to_headers = mail.get_all('Reply-To') or []
> +cc_headers = mail.get_all('Cc') or []
> +for header in reply_to_headers + cc_headers:
> +header = clean_header(header)
> +addrs = header.split(",")
> +for addr in addrs:
> +new_name, new_email = split_from_header(addr)
> +if new_name:
> +new_name = new_name.strip()[:255]
> +if new_email:
> +new_email = new_email.strip()[:255]
> +if new_name == stripped_name:
> +return (stripped_name, new_email)
> +
> +# If we can't figure out the original sender, just keep it as is
> +return (name, email)
> +
> +
> +def get_or_create_author(mail, project=None):
> +from_header = clean_header(mail.get('From'))
> +
> +if not from_header:
> +raise ValueError("Invalid 'From' header")
> +
> +name, email = split_from_header(from_header)
> +
>  if not email:
>  raise ValueError("Invalid 'From' header")
>  
> @@ -362,6 +416,9 @@ def get_or_create_author(mail):
>  if name is not None:
>  name = name.strip()[:255]
>  
> +if project and email.lower() == project.listemail.lower():
> +name, email = get_original_sender(mail, name, email)
> +
>  # this correctly handles the case where we lose the race to create
>  # the person and another process beats us to it. (If the record
>  # does not exist, g_o_c invokes _create_object_from_params which
> @@ -1004,7 +1061,7 @@ def parse_mail(mail, list_id=None):
>  
>  if not is_comment and (diff or pull_url):  # patches or pull requests
>  # we delay the saving until we know we have a patch.
> -author = get_or_create_author(mail)
> +author = ge

Re: Fetch origin/HEAD during git fetch

2019-10-08 Thread Jonathan Nieder
Hi!

Dominik Salvet wrote:

> 1) `git fetch origin && git remote set-head origin -a`
> 2) `git fetch origin +refs/heads/*:refs/remotes/origin/*
> HEAD:refs/remotes/origin/HEAD`
> 3) instead of git init and remote, use directly `git clone --no-checkout`
>
> The first solution is not suitable due its delay caused by remote
> access (2 separate invocations). For smaller repositories, delays of
> these individual commands are almost comparable. However, this
> solution is semantically exactly what I want.

Interesting.

For the specific case, it sounds like some kind of "git fetch
--also-set-head" would do the trick for you.  As you can see, I'm awful
at naming command line options --- aside from the need to give it a
better name, any other thoughts on that?  Would you be interested in
taking a stab at implementing it?

For the more general case, there's been some discussion of fetching
and pushing symrefs on-list before.  [1] discusses one possible UI.

[...]
> The third solution has several problems. The first one is the created
> default local branch. So delete it.

Hm, I don't follow.  Does "git checkout --orphan" do what you're
looking for?

Thanks and hope that helps,
Jonathan

[1] https://public-inbox.org/git/20180814214723.ga...@sigill.intra.peff.net/


Re: Raise your hand to Ack jk/code-of-conduct if your Ack fell thru cracks

2019-10-08 Thread Jonathan Nieder
Junio C Hamano wrote:

> For reference, here is the CoC the patch wants to add (there is no
> such topic yet locally, nor a single patch that can be made into
> such a topic, so there isn't exactly something people can Ack on
> yet. So here is a "preview" of what we would see once such a series
> lands).

Acked-by: Jonathan Nieder 

Thanks for your thoughtful work.


Re: merge-recursive thinks symlink's child dirs are "real"

2019-09-16 Thread Jonathan Nieder
Jonathan Tan wrote:

> This was raised by a coworker at $DAYJOB. I run the following script:
[reproduction recipe from (*) snipped]
> The cherry-pick must be manually resolved, when I would expect it to
> happen without needing user intervention.
>
> You can see that at the point of the cherry-pick, in the working
> directory, ./foo is a symlink and ./foo/bar is a directory. I traced the
> code that ran during the cherry-pick to process_entry() in
> merge-recursive.c. When processing "foo/bar", control flow correctly
> reaches "Case B: Added in one", but the dir_in_way() invocation returns
> true, since lstat() indeed reveals that "foo/bar" is a directory.

Gábor covered the "what happened", so let me say a little more about
the motivation.

The "foo" symlink is being replaced by a "foo" directory containing a
"bar" file.  We're pretty far along now: we want to write actual files
to disk.  Using the index we know where we were going from and to, but
not everything in the world is tracked in the index: there could be
build outputs under "foo/bar" blocking the operation from moving
forward.

So we check whether there's a directory there.  Once we are writing
things out, there won't be, but the symlink confuses us.  Nice find.

[...]
> Is this use case something that Git should be able to handle, and if
> yes, is the correct solution to teach dir_in_way() that dirs reachable
> from symlinks are not really in the way (presumably the implementation
> would climb directories until we reach the root or we reach a filesystem
> boundary, similar to what we do when we search for the .git directory)?

The crucial detail here is that "foo" is going to be removed before we
write "foo/bar".  We should be able to notice that and skip the
dir_in_way check.

Just my uninformed guess --- I haven't tried it. :)

Thoughts?

Thanks,
Jonathan

(*) 
https://public-inbox.org/git/20190916214707.190171-1-jonathanta...@google.com/


Re: Git in Outreachy December 2019?

2019-09-16 Thread Jonathan Nieder
SZEDER Gábor wrote:
> On Mon, Sep 16, 2019 at 11:42:08AM -0700, Emily Shaffer wrote:

>>  - try and make progress towards running many tests from a single test
>>file in parallel - maybe this is too big, I'm not sure if we know how
>>many of our tests are order-dependent within a file for now...
>
> Forget it, too many (most?) of them are order-dependent.

Hm, I remember a conversation about this with Thomas Rast a while ago.
It seemed possible at the time.

Most tests use "setup" or "set up" in the names of test assertions
that are required by later tests.  It's very helpful for debugging and
maintenance to be able to skip or reorder some tests, so I've been
able to rely on this a bit.  Of course there's no automated checking
in place for that, so there are plenty of test scripts that are
exceptions to it.

If we introduce a test_setup helper, then we would not have to rely on
convention any more.  A test_setup test assertion would represent a
"barrier" that all later tests in the file can rely on.  We could
introduce some automated checking that these semantics are respected,
and then we get a maintainability improvement in every test script
that uses test_setup.  (In scripts without any test_setup, treat all
test assertions as barriers since they haven't been vetted.)
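
To illustrate, usage might look like this (hypothetical, since
test_setup does not exist yet):

    test_setup 'set up repository' '
        git init repo &&
        echo content >repo/file &&
        git -C repo add file &&
        git -C repo commit -m first
    '

Everything after that assertion could rely on its effects, but not on
those of an ordinary test_expect_success.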

With such automated tests in place, we can then try updating all tests
that say "setup" or "set up" to use test_setup and see what fails.

Some other tests cannot run in parallel for other reasons (e.g. HTTP
tests).  These can be declared as such, and then we have the ability
to run arbitrary individual tests in parallel.

Most of the time in a test run involves multiple test scripts running
in parallel already, so this isn't a huge win for the time to complete
a normal test run.  It helps more with expensive runs like --valgrind.

Thanks,
Jonathan


Re: Git standup on IRC

2019-09-16 Thread Jonathan Nieder
(+cc: Thomas Gummerer)
Hi Johannes,

Johannes Schindelin wrote:
> On Fri, 13 Jul 2018, Jonathan Nieder wrote:

>> Some of us have been informally doing a "git standup" on #git-devel on
>> irc.freenode.net on IRC every two weeks at 17:00-17:30 UTC.  It's a
>> way for people to say what they're working on and ask questions.
[...]
> As discussed during the Virtual Git Contributors' Summit, I would like
> to propose a new day (Fridays at 17:00UTC, I want to be well into my
> weekends...): Monday.
>
> So how about meeting on IRC every other Monday at 17:00 UTC, starting
> this coming Monday? I'll try to be there.

Sounds good to me.  Let's do that.

> (In case it was not clear: those standups are meant to offer a really
> informal venue to talk about patches you are working (or planning to
> work) on, _especially_ for people who feel intimidated by this here
> mailing list...)

Thanks to Thomas for setting up the calendar
https://calendar.google.com/calendar/embed?src=nk8ph2kh4p5tgfcctb8i7dm6d4%40group.calendar.google.com

Sincerely,
Jonathan


[PATCH bc/hash-independent-tests-part-4] t: decrease nesting in test_oid_to_path

2019-08-07 Thread Jonathan Nieder
t1410.3 ("corrupt and checks") fails when run using dash versions
before 0.5.8, with a cryptic message:

mv: cannot stat '.git/objects//e84adb2704cbd49549e52169b4043871e13432': No such file or directory

The function generating that path:

test_oid_to_path () {
	echo "${1%${1#??}}/${1#??}"
}

which is supposed to produce a result like

12/3456789

But a dash bug[*] causes it to instead expand to

/3456789...

The stream of symbols that makes up this function is hard for humans
to follow, too.  The complexity mostly comes from the repeated use of
the expression ${1#??} for the basename of the loose object.  Use a
variable instead --- nowadays, the dialect of shell used by Git
permits local variables, so this is cheap.

An alternative way to work around [*] is to remove the double-quotes
around test_oid_to_path's return value.  That makes the expression
easier for dash to read, but harder for humans.  Let's prefer the
rephrasing that's helpful for humans, too.

Noticed by building on Ubuntu trusty, which uses dash 0.5.7.

[*] Fixed by v0.5.8~13 ("[EXPAND] Propagate EXP_QPAT in subevalvar", 2013-08-23).

Signed-off-by: Jonathan Nieder 
---
 t/test-lib-functions.sh | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/t/test-lib-functions.sh b/t/test-lib-functions.sh
index 7860491660..de58e8b502 100644
--- a/t/test-lib-functions.sh
+++ b/t/test-lib-functions.sh
@@ -1337,7 +1337,8 @@ test_oid () {
 # Insert a slash into an object ID so it can be used to reference a location
 # under ".git/objects".  For example, "deadbeef..." becomes "de/adbeef..".
 test_oid_to_path () {
-   echo "${1%${1#??}}/${1#??}"
+   local basename=${1#??}
+   echo "${1%$basename}/$basename"
 }
 
 # Choose a port number based on the test script's number and store it in
-- 
2.23.0.rc1.153.gdeed80330f



Re: amend warnings with no changes staged

2019-08-05 Thread Jonathan Nieder
Junio C Hamano wrote:
> Jonathan Nieder  writes:
>> Lukas Gross wrote:

>>> I had intended to stage commits but forgot to do so. Git responded
>>> with a normal commit creation message, so I pushed to the remote to
>>> begin a CI build. When the build failed for the same reason, I
>>> realized
[...]
>>  $ git commit --amend --no-edit
>>  [detached HEAD 33a3db8805] Git 2.23-rc1
>>   Author: Junio C Hamano 
>>   Date: Fri Aug 2 13:12:24 2019 -0700
>>   2 files changed, 2 insertions(+), 1 deletion(-)
>>  $
>>
>> Some non-judgemental descriptive output like
>>
>>  $ git commit --amend --no-edit
>>  No changes.
>>  $
>>
>> would address this case, without bothering people who are doing it
>> intentionally.  So I think there's room for a simple improvement here.
>
> I do that to refresh the committer timestamp.

I do, too.  The proposal is, paraphrasing,

$ git commit --amend --no-edit
Ah, I see that you want me to refresh the committer timestamp.
Done, as requested.
$

In other words:

[...]
>   I am not
> yet convinced that "--amend --no-edit will become a no-op" is the
> final solution we want.

Not this.

Hoping that clarifies,
Jonathan


Re: amend warnings with no changes staged

2019-08-05 Thread Jonathan Nieder
Lukas Gross wrote:

> I had intended to stage commits but forgot to do so. Git responded
> with a normal commit creation message, so I pushed to the remote to
> begin a CI build. When the build failed for the same reason, I
> realized I had forgotten to stage the changes. An additional line in
> the response to the effect of “Warning: did you mean to amend with no
> changes?” would be very helpful to shorten this feedback loop.

On second thought:

$ git commit --amend --no-edit
[detached HEAD 33a3db8805] Git 2.23-rc1
 Author: Junio C Hamano 
 Date: Fri Aug 2 13:12:24 2019 -0700
 2 files changed, 2 insertions(+), 1 deletion(-)
$

Some non-judgemental descriptive output like

$ git commit --amend --no-edit
No changes.
$

would address this case, without bothering people who are doing it
intentionally.  So I think there's room for a simple improvement here.

Care to take a stab at it?  builtin/commit.c would be the place to
start.

Thanks and sorry for the roller-coaster,
Jonathan


Re: amend warnings with no changes staged

2019-08-05 Thread Jonathan Nieder
(administrivia: please don't top-post)
Lukas Gross wrote:

> I had intended to stage commits but forgot to do so. Git responded
> with a normal commit creation message, so I pushed to the remote to
> begin a CI build. When the build failed for the same reason, I
> realized I had forgotten to stage the changes. An additional line in
> the response to the effect of “Warning: did you mean to amend with no
> changes?” would be very helpful to shorten this feedback loop.

Ah, I see.  You passed --no-edit so you didn't see the usual --dry-run
style output that "git commit" shows.  You forgot to run "git add"
before amending, and this is what you'd like commit to assist you
with.

That said, this is sometimes an operation people perform
intentionally.

Ideas:

* Can the documentation do a better job of steering people away
  from --no-edit?  The hints shown when "git commit --amend" (and
  --no-amend, for that matter) open a text editor tend to be very
  helpful for understanding what is going to happen.  If any
  documentation is leading people to forgo that tool, we should fix
  it.

* Should "git commit --no-edit" say a little more about what it did,
  since it knows that the user didn't get to see the text editor?

* Should we have a special option (e.g. "git commit --amend
  --refresh") for a no-op amend?  That ways, when a user doesn't pass
  that option, it would be more clear that it is a mistake, allowing
  a warning or even an error.

Of these three, the first seems most compelling to me.  Others may
have other ideas.

Thoughts?

Thanks,
Jonathan


Re: amend warnings with no changes staged

2019-08-05 Thread Jonathan Nieder
Hi,

Lukas Gross wrote:

> I have occasionally used git commit --amend without staging any
> changes or modifying the commit message (--no-edit). Since this is
> often done unintentionally, could amend warn when it is being used in
> this way?

Can you say more about the context?  What were you trying to do when
you performed this operation?  What happened instead?

Thanks,
Jonathan


Re: [PATCH 0/6] Port git to Plan 9

2019-08-04 Thread Jonathan Nieder
Hi,

Kyohei Kadota wrote:

> I think it is possible to replace rc with ape/sh, ape/sh is POSIX
> shell in Plan 9.
>
> However Plan 9 don't have recent versions of Unix tools,
> such as gcc, g++, autotools, gmake or perl,
> so it is VERY hard to use Makefile instead of mkfile.

The default Git build doesn't use autotools.  See INSTALL for more
details.

What version of gmake is available for Plan 9?  I wouldn't expect
Git's build system to be super demanding as far as recent "make"
features go.

So I wonder whether it would make sense to do something like the
following:

- add entries to config.mak.uname to set the compiler e.g. to 6c
  when appropriate

- make appropriate compatibility fixes in git-compat-util.h

- add any necessary helpers to the compat/ directory

- use gmake to build

Would that work?

Thanks,
Jonathan


Re: [PATCH v3 0/3] Add a JSON Schema for trace2 events

2019-08-02 Thread Jonathan Nieder
SZEDER Gábor wrote:
> On Fri, Aug 02, 2019 at 09:59:13AM -0700, Jonathan Nieder wrote:

>> In the short term, we can run tests internally to check that Git keeps
>> following the schema.  Let's not block patches 1 and 2 by this ---
>
> To my understanding patch 2 is only a proof of concept: it starts
> using a programming language that has not been used before in this
> project, to implement functionality that is readily available in
> several existing tools, without even arguing (let alone convincingly
> arguing) in the commit message why this approach is a good idea.

Well, Golang has been used in contrib/ before. ;-)

If I understand [1] and [2] correctly, we haven't found an existing
standalone tool that is able to efficiently validate a high volume of
traces.  But for the purpose of sanity-checking that running a few
typical commands generates a trace that fits the schema, it's not
important that the validator be super fast.  So we can use a tool like
jq, picking one using the criteria that it

- allows performing the validation without too much fuss
- is likely to already be present on some developers' machines

Thanks,
Jonathan

[1] https://public-inbox.org/git/20190726220348.gf43...@google.com/
[2] https://public-inbox.org/git/cover.1564009259.git.stead...@google.com/


Re: [PATCH v3 0/3] Add a JSON Schema for trace2 events

2019-08-02 Thread Jonathan Nieder
SZEDER Gábor wrote:
> On Thu, Aug 01, 2019 at 06:52:47PM -0700, Jonathan Nieder wrote:

>> Gábor, if we introduce such a parameter, do you think it would make
>> sense for us to set up a worker that passes it?
>
> That would be even worse than the current approach of the third patch,
> because the additional worker would have to install dependencies,
> build Git and run the test suite, in addition to the enormous overhead
> of redundantly validating the trace output of every git command
> executed during 'make test'.  So instead of adding "only" 10 minutes
> to every build, it would add over 20.

Thanks, that's helpful to know.

It sounds like if we want to run this kind of expensive test in CI, we
would want to set it up differently: e.g. daily runs against "pu"
instead of running on every push.

Jonathan


Re: [PATCH v3 0/3] Add a JSON Schema for trace2 events

2019-08-02 Thread Jonathan Nieder
Hi,

Johannes Schindelin wrote:
> On Thu, 1 Aug 2019, Jonathan Nieder wrote:

>> What do you think of making the validation disabled by default and
>> using a parameter (see "Running tests with special setups" in
>> t/README) to turn it on?  That way, it should be okay for it to take
>> 10 minutes because this would only affect such specialized workers as
>> choose to set that parameter, instead of all of them.
[...]
> So how about adding a separate test script that activates Trace2, then
> runs a _few_ selected, very specific commands, starting with a set that
> Josh thinks representative, validate the generated JSON, and then only
> add new invocations if we *ever* see violations of the schema, to verify
> that we don't regress on that front again?
>
> Such a script can't take more than a couple of seconds to run, I am
> sure. And it would totally strike a sensible balance between benefit and
> time spent.

I think this suggestion has been covered a little upthread, but
apologies for not paying it the fuller attention that it deserves.

Such a test would check that

* the validation code still works, the schema is well formed, etc
* traces produced by common commands fit the schema

That makes it very likely the right thing to do.  This is a test that's
cheap to run, so we can make it happen automatically as part of a normal
test suite run for any developer that has ajv installed (including
various CI jobs).  And it unblocks getting patches 1 and 2 from this
series in.  So, basically, you're right. :)

That said, it still leaves an unmet need that is important to us.

We cannot trust the JSON output and rely on it without a more
exhaustive test that the schema is accurate.  It is very easy to add a
new field or event type that only shows up in a specialized code path,
without remembering to update the schema or docs, because of the
nature of how the trace2 API works.

In this thread, you can already see some of the fruit of this more
exhaustive approach:

- it identifies the tests that write invalid UTF-8 to the traces,
  illustrating an issue with the format (i.e., a real bug) that we
  can now be more aware of

- it identified a field that had been renamed in code where we had
  forgotten to update the documentation

The exhaustive approach really helps.  Arguing against it kind of
feels like saying "leak checkers are great, but why run one on the
full test suite instead of a dedicated tool that exercises the code
paths where you expect to find leaks?"

In the short term, we can run tests internally to check that Git keeps
following the schema.  Let's not block patches 1 and 2 by this ---
instead, the simple basic functionality test you're describing seems
like a good way to make progress for the moment.

We can upstream it later.

Thanks,
Jonathan


Re: cannot clone --single-commit instead of --single-branch

2019-08-01 Thread Jonathan Nieder
(cc: Duy, who might enjoy this walk through history)
Hi,

Alexander Mills wrote:

> git clone --single-branch=
>
> doesn't seem to work?

I've occasionally wanted something like this (actually, I've wanted to
pass a refname like "refs/meta/config").  builtin/clone.c contains

static struct ref *find_remote_branch(const struct ref *refs, const char *branch)
{
	struct ref *ref;
	struct strbuf head = STRBUF_INIT;
	strbuf_addstr(&head, "refs/heads/");
	strbuf_addstr(&head, branch);
	ref = find_ref_by_name(refs, head.buf);
	strbuf_release(&head);

	if (ref)
		return ref;

So far, what one would expect.

strbuf_addstr(&head, "refs/tags/");
strbuf_addstr(&head, branch);
ref = find_ref_by_name(refs, head.buf);
strbuf_release(&head);

Wait a second: a tag isn't a branch, so why do we accept it as a
value for --branch?  So this is stretching definitions a little
already.

"git log -L:find_remote_branch:builtin/clone.c" tells me this is from
v1.7.10-rc0~125^2~3 (clone: allow --branch to take a tag, 2012-01-16):

Because a tag ref cannot be put to HEAD, HEAD will become detached.
This is consistent with "git checkout ".

I think ideally we could first check for a sha1 and then try the full
set of patterns from ref_rev_parse_rules.  What do you think?  Would
you be interested in taking a stab at it?
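
(To spell out the current behavior, with an illustrative URL:

    $ git clone --single-branch --branch v1.0 https://example.com/repo.git    # branch or tag: found
    $ git clone --single-branch --branch refs/meta/config https://example.com/repo.git    # not found

since only the refs/heads/ and refs/tags/ prefixes are tried.)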

Thanks,
Jonathan


Re: cannot clone --single-commit instead of --single-branch

2019-08-01 Thread Jonathan Nieder
Hi,

Bryan Turner wrote:

> Promisor remotes and other in-flight changes might help provide some
> of what you're looking for, but I'm not aware of any already-available
> solution.

You can do

git clone --no-checkout --filter=blob:none $url repo
cd repo
git checkout $commit

That gives you commits and trees from other commits, but the only blobs
it downloads are from the named commit.

Using the existing tree filtering features to limit the trees fetched
and/or writing a patch for commit filtering are left as exercises to
the reader.
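
(As a head start on the first exercise, something like

    $ git clone --no-checkout --filter=tree:0 https://example.com/repo.git repo

should limit the trees transferred as well, provided the server
supports that filter.)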

Thanks and hope that helps,
Jonathan


Re: Git for Windows v2.23.0-rc0, was Re: [ANNOUNCE] Git v2.23.0-rc0

2019-08-01 Thread Jonathan Nieder
Junio C Hamano wrote:

> I suspect that you may have misread the "is interactive" bit in the
> original; that was used only to decide if we are going to warn.

Ah.  That was indeed confusing.  Anyway, it's nice to see the complexity
go away.

[...]
> +++ b/builtin/log.c
[...]
> @@ -214,12 +204,8 @@ static void cmd_log_init_finish(int argc, const char **argv, const char *prefix,
>   memset(&w, 0, sizeof(w));
>   userformat_find_requirements(NULL, &w);
>  
> - if (mailmap < 0) {
> - if (session_is_interactive() && !rev->pretty_given)
> - warning("%s\n", _(warn_unspecified_mailmap_msg));
> -
> + if (mailmap < 0)
>   mailmap = 0;

As Peff noticed, this should say "mailmap = 1" (which I see you've done
in "pu").  We can simplify further by removing the "-1" case --- we do
not need to distinguish between "on" and "unspecified" any more.

We'll also want to update the docs.  And as Todd suggests, we should
cover how to disable mailmap in tests.

Signed-off-by: Jonathan Nieder 
---

diff --git i/Documentation/config/log.txt w/Documentation/config/log.txt
index 7798e10cb0..e9e1e397f3 100644
--- i/Documentation/config/log.txt
+++ w/Documentation/config/log.txt
@@ -41,4 +41,4 @@ log.showSignature::
 log.mailmap::
If true, makes linkgit:git-log[1], linkgit:git-show[1], and
linkgit:git-whatchanged[1] assume `--use-mailmap`, otherwise
-   assume `--no-use-mailmap`. False by default.
+   assume `--no-use-mailmap`. True by default.
diff --git i/builtin/log.c w/builtin/log.c
index 02fa179077..44b10b3415 100644
--- i/builtin/log.c
+++ w/builtin/log.c
@@ -47,7 +47,7 @@ static int default_follow;
 static int default_show_signature;
 static int decoration_style;
 static int decoration_given;
-static int use_mailmap_config = -1;
+static int use_mailmap_config = 1;
 static const char *fmt_patch_subject_prefix = "PATCH";
 static const char *fmt_pretty;
 
@@ -160,7 +160,7 @@ static void cmd_log_init_finish(int argc, const char **argv, const char *prefix,
 struct rev_info *rev, struct setup_revision_opt *opt)
 {
	struct userformat_want w;
-	int quiet = 0, source = 0, mailmap = 0;
+	int quiet = 0, source = 0, mailmap;
	static struct line_opt_callback_data line_cb = {NULL, NULL, STRING_LIST_INIT_DUP};
	static struct string_list decorate_refs_exclude = STRING_LIST_INIT_NODUP;
	static struct string_list decorate_refs_include = STRING_LIST_INIT_NODUP;
@@ -204,9 +204,6 @@ static void cmd_log_init_finish(int argc, const char **argv, const char *prefix,
memset(&w, 0, sizeof(w));
userformat_find_requirements(NULL, &w);
 
-   if (mailmap < 0)
-   mailmap = 1;
-
if (!rev->show_notes_given && (!rev->pretty_given || w.notes))
rev->show_notes = 1;
if (rev->show_notes)
diff --git i/t/t4203-mailmap.sh w/t/t4203-mailmap.sh
index ede0c292af..918ada69eb 100755
--- i/t/t4203-mailmap.sh
+++ w/t/t4203-mailmap.sh
@@ -442,6 +442,34 @@ test_expect_success 'Log output with log.mailmap' '
test_cmp expect actual
 '
 
+test_expect_success 'log.mailmap=false disables mailmap' '
+   cat >expect <<-\EOF &&
+   Author: CTO 
+   Author: claus 
+   Author: santa 
+   Author: nick2 
+   Author: nick2 
+   Author: nick1 
+   Author: A U Thor 
+   EOF
+   git -c log.mailmap=False log | grep Author > actual &&
+   test_cmp expect actual
+'
+
+test_expect_success '--no-use-mailmap disables mailmap' '
+   cat >expect <<-\EOF &&
+   Author: CTO 
+   Author: claus 
+   Author: santa 
+   Author: nick2 
+   Author: nick2 
+   Author: nick1 
+   Author: A U Thor 
+   EOF
+   git log --no-use-mailmap | grep Author > actual &&
+   test_cmp expect actual
+'
+
 cat >expect <<\EOF
 Author: Santa Claus 
 Author: Santa Claus 


Re: [PATCH v3 0/3] Add a JSON Schema for trace2 events

2019-08-01 Thread Jonathan Nieder
Josh Steadmon wrote:
> On 2019.07.26 15:03, Josh Steadmon wrote:

>> [ajv-cli] can validate the full 1.7M line trace output in just over a
>> minute. Moreover, it has helpful output when validation fails. So I
>> would be happy to re-implement this using ajv-cli.
>
> Unfortunately, ajv on Travis is much slower than on my work machine. It
> still takes over 10 minutes to complete, and is killed by Travis since
> it doesn't provide any progress indicator while it's running.

Alas.

What do you think of making the validation disabled by default and
using a parameter (see "Running tests with special setups" in
t/README) to turn it on?  That way, it should be okay for it to take
10 minutes because this would only affect such specialized workers as
choose to set that parameter, instead of all of them.
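
In other words, something along these lines, with a hypothetical
parameter name:

    $ GIT_TEST_TRACE2_SCHEMA_VALIDATE=true make test

so only workers that opt in pay the cost.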

Gábor, if we introduce such a parameter, do you think it would make
sense for us to set up a worker that passes it?

> How would people feel about validating a sample of the "make test"
> output? In the short term we could just use command-line tools to sample
> the trace file; for the long-term, we could add a sampling config option
> for trace2 (I've been considering adding this for other reasons anyway).

I kind of like the idea of a "make smoke" that runs some fast
probabilistic tests for interactive use, but for CI runs I prefer the
thoroughness of running the full testsuite.  So for the problem at
hand, I prefer making the test optional and reliable instead of fast
and nondeterministic.

> Ideally we would want the sample to be deterministic for any given
> commit, so that we don't end up with flaky tests if changes are made to
> trace2 while neglecting to update the schema.

That would make it hard to bisect to track down bugs, so I don't like
it as much.

> Since there have been some suggestions to build a standalone test and
> verify its trace output, let me reiterate why I feel it's useful to use
> "make test" instead: I do not feel that I can create a standalone test
> that exercises a wide enough selection of code paths to get sufficient
> coverage of all potential trace2 output. Trying to make a standalone
> test that also anticipates future development is practically impossible.
> Using "make test" means that I can rely on the whole community to
> identify important code paths, both now and in the future.

I agree.

> As always, I am open to other approaches to make sure the schema stays
> up-to-date.

It sounds like the codegen approach would be a nice way to do that,
but that doesn't need to block this step in the right direction.

Thanks for your thoughtfulness,
Jonathan


Re: Git for Windows v2.23.0-rc0, was Re: [ANNOUNCE] Git v2.23.0-rc0

2019-07-31 Thread Jonathan Nieder
Jeff King wrote:

> This seems OK to me, though I kind of wonder if anybody really wants
> "auto".

Sure.  It's just the usual way of handling the lack of support for an
"unset" directive in git's config syntax (for example, if a script
author wants to test the default behavior).

Thanks,
Jonathan


Re: Git for Windows v2.23.0-rc0, was Re: [ANNOUNCE] Git v2.23.0-rc0

2019-07-31 Thread Jonathan Nieder
Jeff King wrote:

> I think part of what my annoyance is responding to (and your willingness
> to just squelch this for everybody) is that switching this particular
> default isn't really that big a deal, that it requires annoying people
> on every single "git log" invocation. Perhaps we would be better
> squelching it entirely and just putting a mention in the release notes.

Yes.

Although as Dscho mentions, it's particularly irritating because it is
not part of the paginated output.

I wonder if the ideal might not be to trigger it more selectively, when
the output actually changed due to a reflog entry.  I mean something
like

commit 393a9dd0f9762c69f753a8fa0bc89c203c6b4e9e (HEAD, origin/foo, other/pu)
Merge: 18598e40e6 1eba6eb1c2
Author: A U Thor  (see "git help mailmap")
Date:   Tue Jul 30 15:05:41 2019 -0700

Merge branch 'jt/fetch-cdn-offload' into foo

The message

warning: log.mailmap is not set; its implicit value will change in an
upcoming release. To squelch this message and preserve current
behaviour, set the log.mailmap configuration value to false.

To squelch this message and adopt the new behaviour now, set the
log.mailmap configuration value to true.

is *particularly* unactionable in the current state where we're not
rewriting authors.  I think we should bite the bullet and just flip
the default to "true", with the config as an escape hatch to allow
going back to the old behavior.

Is it too late in the release cycle to do that?  If not, we can do

-- >8 --
Subject: log: use mailmap by default in interactive use

In f0596ecc8de9 (log: add warning for unspecified log.mailmap setting,
2019-07-15), we prepared for a future where "git log" uses the mailmap
by default, using the following conditions:

 1. If log.mailmap or --[no-]use-mailmap is set explicitly,
respect that.  Otherwise:

 2. If output is not going to a terminal or pager, we might be in a
script.  Match historical behavior by not using the mailmap.
Otherwise:

 3. If the output format was specified using --pretty, we might be in
a script that produces output intended for copy/paste.  Match
historical behavior by not using the mailmap.  Otherwise:

 4. This is an interactive use, where we will be able to change the
default behavior.  Print a warning about the upcoming change
but don't use the mailmap yet.

In practice, the case (4) turns out to be irritating.  It prints
before pager setup, so it just flashes on the screen.  It appears on
every "git log" invocation, even when the mailmap would not result in
the output changing.  The change is not meaningful to most people, and
the new default of --use-mailmap is likely to be preferable for all of
them.

Let's bite the bullet and jump straight to --use-mailmap in case (4).

While at it, add a new log.mailmap setting "auto" that can be used to
explicitly request the new automatic behavior (so that e.g. if
log.mailmap is set to "true" system-side, I can set it to "auto" in my
per-user configuration).

Reported-by: Johannes Schindelin 
Signed-off-by: Jonathan Nieder 
---
 Documentation/RelNotes/2.23.0.txt |  6 ++
 Documentation/config/log.txt  |  4 +++-
 builtin/log.c | 23 ++-
 t/t4203-mailmap.sh| 20 
 t/t7006-pager.sh  |  2 --
 5 files changed, 31 insertions(+), 24 deletions(-)

diff --git a/Documentation/RelNotes/2.23.0.txt b/Documentation/RelNotes/2.23.0.txt
index 9896599b2f..b4765a5472 100644
--- a/Documentation/RelNotes/2.23.0.txt
+++ b/Documentation/RelNotes/2.23.0.txt
@@ -91,10 +91,8 @@ UI, Workflows & Features
commit-graph files now, which allows the commit-graph files to be
updated incrementally.
 
- * The "git log" command learns to issue a warning when log.mailmap
-   configuration is not set and --[no-]mailmap option is not used, to
-   prepare users for future versions of Git that uses the mailmap by
-   default.
+ * The "git log" command learned to use --mailmap by default when
+   used interactively without a --pretty format option.
 
  * "git range-diff" output has been tweaked for easier identification
of which part of what file the patch shown is about.
diff --git a/Documentation/config/log.txt b/Documentation/config/log.txt
index 7798e10cb0..739ea298aa 100644
--- a/Documentation/config/log.txt
+++ b/Documentation/config/log.txt
@@ -41,4 +41,6 @@ log.showSignature::
 log.mailmap::
If true, makes linkgit:git-log[1], linkgit:git-show[1], and
linkgit:git-whatchanged[1] assume `--use-mailmap`, otherwise
-   assume `--no-use-mailmap`. False by default.
+   assume `--no-use-mailmap`. The default is `auto`, which means
+   to use mailmap if the 

Re: [PATCH] archive: Store checksum correctly

2019-07-23 Thread Jonathan Nieder
(cc: René Scharfe, "git archive" expert)
Matt Turner wrote:

> tar2sqfs (part of https://github.com/topics/tar2sqfs) rejects tarballs
> made with git archive with the message
>
> invalid tar header checksum!
>
> tar2sqfs recomputes the tarball's checksum to verify it. Its checksum
> implementation agrees with GNU tar, which contains a comment that states
>
> Fill in the checksum field.  It's formatted differently from the
> other fields: it has [6] digits, a null, then a space ...
>
> Correcting this allows tar2sqfs to correctly process the tarballs made
> by git archive.
>
> Signed-off-by: Matt Turner 
> ---
>  archive-tar.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)

Nice.  Is this something that can be covered in tests as well?  (See
t500* for existing "git archive" tests, and see test_lazy_prereq in case
you'd like the test to use an external tool like tar2sqfs that not all
users may have.)
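
A rough, untested sketch of what that could look like, assuming
tar2sqfs reads the tarball from stdin:

    test_lazy_prereq TAR2SQFS 'tar2sqfs --version'

    test_expect_success TAR2SQFS 'tar2sqfs accepts git archive output' '
        git archive HEAD >checksum.tar &&
        tar2sqfs checksum.sqfs <checksum.tar
    '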

Thanks,
Jonathan

(patch left unsnipped for reference)

> diff --git a/archive-tar.c b/archive-tar.c
> index 3e53aac1e6..f9a157bfd1 100644
> --- a/archive-tar.c
> +++ b/archive-tar.c
> @@ -215,7 +215,9 @@ static void prepare_header(struct archiver_args *args,
>   memcpy(header->magic, "ustar", 6);
>   memcpy(header->version, "00", 2);
>  
> - xsnprintf(header->chksum, sizeof(header->chksum), "%07o", ustar_header_chksum(header));
> + xsnprintf(header->chksum, sizeof(header->chksum), "%06o", ustar_header_chksum(header));
> + header->chksum[6] = '\0';
> + header->chksum[7] = ' ';
>  }
>  
>  static void write_extended_header(struct archiver_args *args,


Re: [PATCH v2] transport-helper: enforce atomic in push_refs_with_push

2019-07-18 Thread Jonathan Nieder
SZEDER Gábor wrote:

> I expected that this will eventually happen after Travis CI's default
> Linux image recently changed from Ubuntu 14.04 to 16.04; explanation
> in the commit message below.
>
> With that patch issues like this could be caught earlier, while they
> are only in 'pu' but not yet in 'next'.  But do we really want to do
> that, is that the right tradeoff?  Dunno...  Adding a dedicated CI job
> just to check that there are no 'for' loop initial declarations seems
> kind of excessive, even if it only builds but doesn't run the test
> suite.  And I don't know whether there are any other undesired ("too
> new") constructs that GCC 4.8 would catch but later compilers quietly
> accept.

This makes sense to me.  Not really for the 'for' loop declaration
aspect: for that, I'd want some more specialized tool that allows
turning on such a check specifically.  But more because Ubuntu trusty
is still a platform that some people use (though hopefully not for
long), so it's helpful as a representative old platform to see if we
break the build on it.

[...]
> [2] On Travis CI 'make test' alone would take about 9 minutes in this
> new job (without running httpd, Subversion, and P4 tests).  For
> comparison, starting the job and building Git with GCC 4.8 takes
> only about 2 minutes.

Nice.  In an ideal world there would be some kind of "make fasttest"
that runs some fast subset of tests.  But this seems pretty safe.

> Signed-off-by: SZEDER Gábor 
> ---
>  .travis.yml   |  4 
>  ci/run-build-and-tests.sh | 17 +
>  2 files changed, 17 insertions(+), 4 deletions(-)

Thanks.

[...]
> --- a/ci/run-build-and-tests.sh
> +++ b/ci/run-build-and-tests.sh
> windows*) cmd //c mklink //j t\\.prove "$(cygpath -aw "$cache_dir/.prove")";;
>  esac
>  
>  make
> -make test
> -if test "$jobname" = "linux-gcc"
> -then
> +case "$jobname" in
> +linux-gcc)
> + make test
>   export GIT_TEST_SPLIT_INDEX=yes
>   export GIT_TEST_FULL_IN_PACK_ARRAY=true
>   export GIT_TEST_OE_SIZE=10
> @@ -21,7 +21,16 @@ then
>   export GIT_TEST_COMMIT_GRAPH=1
>   export GIT_TEST_MULTI_PACK_INDEX=1
>   make test
> -fi
> + ;;
> +linux-gcc-4.8)
> + # Don't run the tests; we only care about whether Git can be
> + # built with GCC 4.8, as it errors out on some undesired (C99)
> + # constructs that newer compilers seem to quietly accept.
> + ;;
> +*)
> + make test
> + ;;
> +esac

Does what it says on the tin.
Reviewed-by: Jonathan Nieder 


Re: [RFC/PATCH] CodingGuidelines: spell out post-C89 rules

2019-07-18 Thread Jonathan Nieder
Junio C Hamano wrote:

> -- >8 --
> Even though we have been sticking to C89, there are a few handy
> features we borrow from more recent C language in our codebase after
> trying them in weather balloons and saw that nobody screamed.
>
> Spell them out.
>
> While at it, extend the existing variable declaration rule a bit to
> read better with the newly spelled out rule for the for loop.
>
> Signed-off-by: Junio C Hamano 
> ---
>  Documentation/CodingGuidelines | 25 ++---
>  1 file changed, 22 insertions(+), 3 deletions(-)

Reviewed-by: Jonathan Nieder 
Thanks.


Re: [RFC/PATCH] CodingGuidelines: spell out post-C89 rules

2019-07-16 Thread Jonathan Nieder
Junio C Hamano wrote:

> Even though we have been sticking to C89, there are a few handy
> features we borrow from more recent C language in our codebase after
> trying them in weather balloons and saw that nobody screamed.
>
> Spell them out.

Thanks for this.  It gives a place to advertise future weather balloons,
too.

[...]
> --- a/Documentation/CodingGuidelines
> +++ b/Documentation/CodingGuidelines
> @@ -195,10 +195,24 @@ For C programs:
> by e.g. "echo DEVELOPER=1 >>config.mak".
>  
>   - We try to support a wide range of C compilers to compile Git with,
> -   including old ones. That means that you should not use C99
> -   initializers, even if a lot of compilers grok it.
> +   including old ones. That means that you should not use certain C99
> +   features, even if your compiler groks it.  There are a few
> +   exceptions:
> +
> +   . since early 2012 with e1327023ea, we have been using an enum
> + definition whose last element is followed by a comma.

This is an interesting one: it's super convenient, but we have received
patches every 10 years or so to remove the trailing comma --- e.g.
https://public-inbox.org/git/20100311163235.gc7...@thor.il.thewrittenword.com/

I *think* these were motivated by wanting to be able to build Git with
old compilers with pedantic warnings on, and certainly the last seven
years of silence on the subject suggests it's okay.  Should we be even
more prescriptive and say that the last element should always be
followed by a comma, for ease of later patching?

> +
> +   . since mid 2017 with cbc0f81d and 512f41cf, we have been using
> + designated initializers for struct and array.

Can this include an example for the benefit of readers that don't know
what a designated initializer is?  E.g.

  . since mid 2017 with cb0f81d and 512f41cf, we have been using
designated initializers for struct members ("{ .alloc = 1 }")
and array members ("[5] = 0").

> +
> +   These used to be forbidden, but we have not heard breakage report,
> +   so they are assumed to be safe.

nit: missing article "any" before "breakage reports".

>  
> - - Variables have to be declared at the beginning of the block.
> + - Variables have to be declared at the beginning of the block, before
> +   the first statement (i.e. -Wdeclaration-after-statement).
> +
> + - Declaring a variable in the for loop "for (int i = 0; i < 10; i++)"
> +   is still not allowed in this codebase.

Nice.

Reviewed-by: Jonathan Nieder 
Thanks.


Re: [PATCH] transport-helper: avoid var decl in for () loop control

2019-07-16 Thread Jonathan Nieder
Junio C Hamano wrote:

> We do allow a few selected C99 constructs in our codebase these
> days, but this is not among them (yet).
>
> Reported-by: Carlo Arenas 
> Signed-off-by: Junio C Hamano 
> ---
>  transport.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)

Yes, gcc 4.8 fails to build without this:

 transport.c:1234:4: error: ‘for’ loop initial declarations are only allowed in C99 mode
 for (struct ref *it = remote_refs; it; it = it->next)
 ^
 transport.c:1234:4: note: use option -std=c99 or -std=gnu99 to compile your code

Arguably it would be nice to use -std=gnu99 for better consistency
between gcc versions, but it's moot here: avoiding the declaration in
for loop initializer is more consistent with
-Wdeclaration-after-statement anyway.

Reviewed-by: Jonathan Nieder 
Thanks.


Re: [QUESTION] Git fails to detect merge conflict?

2019-07-01 Thread Jonathan Nieder
Anton Ermolenko wrote:

> My understanding is that recursive merge here won't consider that situation to
> be a merge conflict as the changes have been introduced in different spots in
> the file.

Yes, that seems right to me.

Do you have more details about the context?  What do these files look
like?  Are there other cues that we could use to discover that the
customer intended the changes to conflict?

Thanks,
Jonathan


Re: [RFC PATCH] ref-filter: sort detached HEAD lines firstly

2019-06-10 Thread Jonathan Nieder
Hi,

Matthew DeVore wrote:
> On Sun, Jun 09, 2019 at 10:17:19AM +0200, Johannes Schindelin wrote:

>> I find that it makes sense in general to suppress one's urges regarding
>> introducing `{ ... }` around one-liners when the patch does not actually
>> require it.
>>
>> For example, I found this patch harder than necessary to read because of
>> it.
>
> I understand the desire to make the patch itself clean, and I sometimes try to
> do that to a fault, but the style as I understand it is to put { } around all
> if branches if only one branch requires it. Since I'm already modifying the
> "else if (cmp_type == FIELD_STR)" line, I decided to put the } at the start of
> the line and modify the if (s->version) line as well. So only one line was
> modified "in excess." I think the temporary cost of the verbose patch is
> justified to keep the style consistent in narrow code fragments.

Git seems to be inconsistent about this.  Documentation/CodingGuidelines
says

- When there are multiple arms to a conditional and some of them
  require braces, enclose even a single line block in braces for
  consistency. E.g.:

so you have some cover from there (and it matches what I'm used to,
too). :)

[...]
>>> +   LC_ALL=$zh_CN_locale LC_MESSAGES=$zh_CN_locale \
>>> +   git -C r1 branch >actual &&
>>> +   git -C r1 checkout - &&
>>
>> Why call `checkout` after `branch`? That's unnecessary, we do not verify
>> anything after that call.
>
> It's to get the repo into a neutral state in case an additional testcase is
> added in the future.

For this kind of thing, we tend to use test_when_finished so that the
test ends up in a clean state even if it fails.
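
Something like

    test_when_finished "git -C r1 checkout -" &&
    git -C r1 checkout HEAD^{} -- &&

at the start of the test body would do it.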

[...]
> test_expect_success GETTEXT_ZH_LOCALE 'detached head sorts before branches' '
>   # Ref sorting logic should put detached heads before the other
>   # branches, but this is not automatic when a branch name sorts
>   # lexically before "(" or the full-width "(" (Unicode codepoint FF08).
>   # The latter case is nearly guaranteed for the Chinese locale.
>
>   git -C r1 checkout HEAD^{} -- &&
>   LC_ALL=$zh_CN_locale LC_MESSAGES=$zh_CN_locale \
>   git -C r1 branch >actual &&
>   git -C r1 checkout - &&
>
>   head -n 1 actual >first &&
>   # The first line should be enclosed by full-width parenthesis.
>   grep '$'\xef\xbc\x88.*\xef\xbc\x89'' first &&

nit: older shells do not know how to do $'\x01' interpolation.
Probably best to use the raw UTF-8 directly here (it will be more
readable anyway).

Thanks,
Jonathan


Re: [PATCH 2/2] index-pack: prefetch missing REF_DELTA bases

2019-06-03 Thread Jonathan Nieder
Jonathan Tan wrote:

> When fetching, the client sends "have" commit IDs indicating that the
> server does not need to send any object referenced by those commits,
> reducing network I/O. When the client is a partial clone, the client
> still sends "have"s in this way, even if it does not have every object
> referenced by a commit it sent as "have".
>
> If a server omits such an object, it is fine: the client could lazily
> fetch that object before this fetch, and it can still do so after.
>
> The issue is when the server sends a thin pack containing an object that
> is a REF_DELTA against such a missing object: index-pack fails to fix
> the thin pack. When support for lazily fetching missing objects was
> added in 8b4c0103a9 ("sha1_file: support lazily fetching missing
> objects", 2017-12-08), support in index-pack was turned off in the
> belief that it accesses the repo only to do hash collision checks.
> However, this is not true: it also needs to access the repo to resolve
> REF_DELTA bases.
[...]
> Signed-off-by: Jonathan Tan 
> ---
>  builtin/index-pack.c | 26 +++--
>  t/t5616-partial-clone.sh | 61 
>  2 files changed, 85 insertions(+), 2 deletions(-)

Thanks much.

This bugfix has been working well at $DAYJOB:
Tested-by: Jonathan Nieder 

Is it something that could be in 2.22.0 or 2.22.1?


Re: [PATCH] fetch-pack: send server options after command

2019-05-24 Thread Jonathan Nieder
Jonathan Tan wrote:

> Currently, if any server options are specified during a protocol v2
> fetch, server options will be sent before "command=fetch". Write server
> options to the request buffer in send_fetch_request() so that the
> components of the request are sent in the correct order.
>
> The protocol documentation states that the command must come first. The
> Git server implementation in serve.c (see process_request() in that
> file) tolerates any order of command and capability, which is perhaps
> why we haven't noticed this. This was noticed when testing against a
> JGit server implementation, which follows the documentation in this
> regard.
>
> Signed-off-by: Jonathan Tan 
> ---
>  fetch-pack.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)

Oh, dear.  Thanks for fixing it.

Reviewed-by: Jonathan Nieder 

6e98305985 (clone: send server options when using protocol v2,
2019-04-12) is part of release candidates, but it looks like we caught
this in time to get the fix in before the release.

Should we add an interop test for this to t/interop/?

Thanks,
Jonathan


Re: [PATCH] Use user supplied origin name for extension.partialcone instead of hardcoded value.

2019-05-23 Thread Jonathan Nieder
Xin Li wrote:

> Subject: Use user supplied origin name for extension.partialcone instead of hardcoded value.

Good catch!

Nits:

- s/partialcone/partialclone/ (missing "l")

- subject lines in git.git tend to be of the form
  "subsystem: lowercase summary" with no trailing period.  So maybe
  something like:

clone: respect user supplied origin name when setting up partial clone

> Signed-off-by: Xin Li 
> ---
>  builtin/clone.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)

The change looks obviously good.  With or without a commit message
change like suggested above,

Reviewed-by: Jonathan Nieder 

Thanks.


Re: RFC: Separate commit identification from Merkle hashing

2019-05-23 Thread Jonathan Nieder
Eric S. Raymond wrote:
> Jakub Narebski :

>> Currently Git makes use of the fact that SHA-1 and SHA-256 identifiers
>> are of different lengths to distinguish them (see section "Meaning of
>> signatures") in Documentation/technical/hash-function-transition.txt
>
> That's the obvious hack.  As a future-proofing issue, though, I think
> it would be unwise to count on all future hashes being of distinguishable
> lengths.

We're not counting on that.  As discussed in that section, future
hashes can change the format.

[...]
>> All right.  Looks sensible on first glance.
>
> I am very relieved to hear that. My view of git is outside-in; I was quite
> worried I might have missed some crucial issue.

Honestly, I do think you have missed some fundamental issues.
https://public-inbox.org/git/ab3222ab-9121-9534-1472-fac790bf0...@gmail.com/
discusses this further.

Regards,
Jonathan


Re: [PATCH v4] grep: fail if call could output and name is null

2019-05-23 Thread Jonathan Nieder
Emily Shaffer wrote:

> Signed-off-by: Emily Shaffer 
> Reviewed-by: Jonathan Nieder 
> ---
> Since v3, only the commit message has changed. Reworked based on
> Jonathan Nieder's suggestions (with some modifications for readability).
>
>  grep.c | 4 ++++
>  1 file changed, 4 insertions(+)

This is indeed
Reviewed-by: Jonathan Nieder 

Thanks for your patient work.


Re: RFC: Separate commit identification from Merkle hashing

2019-05-23 Thread Jonathan Nieder
Hi,

Jakub Narebski wrote:

> I think Documentation/technical/hash-function-transition.txt misses
> considerations for fast-import format (it talks about problem with
> submodules, shallow clones, and currently not solved problem of
> translating notes; it does not talk about git-replace, either).

Hm, can you say more?  I think fast-import is not significantly
different from other tools that want to pick an appropriate object
format for input and an appropriate object format for output.

Do you mean that the fast-import file should have a field for
explicitly specifying the input object format, and that that doc
ought to call it out?

[...]
> For security, all references in Merkle-tree of hashes must use strong
> verification hash.  This means that you need to be able to refer to any
> object, including commit, by its verification hash name of its
> verification hash form (where all references inside object, like
> "parent" and "tree" headers in commit objects, use verification hashes).

This kind of crypto agility weakens any guarantees that rely on
strength of a hash function.  The security level would be that of the
weakest of the supported hash functions.

In other words, usually the benefit of supporting multiple hash
functions as a reader is that you want the strength of the strongest
of those hash functions and you need a migration path to get there.
If you don't have a way to eventually drop support for the weaker
hashes, then what benefit do you get from supporting multiple hash
functions?

Jonathan


Re: [PATCH v3] grep: provide sane default to grep_source struct

2019-05-21 Thread Jonathan Nieder
Emily Shaffer wrote:

> Subject: grep: provide sane default to grep_source struct

This subject line doesn't describe what the patch does any more.  How
about something like

grep: fail early if call could print output and name is not set

?

> grep_buffer creates a struct grep_source gs and calls grep_source()
> with it. However, gs.name is null, which eventually produces a
> segmentation fault in grep_source()->grep_source_1()->show_line()
> when grep_opt.status_only is not set.

This is the place to describe the motivation behind the patch:

- what is the current behavior?
- what is the motivation behind the current behavior?
- in what scenario does it fail?
- how does it fail, and what is undesirable about that failure?

Perhaps something like

grep_source, the workhorse of Git's grep library, allows passing in an
arbitrary grep_source representing the haystack in which to search for
a needle matching a pattern.  In most callers, the grep_source::name
field is set to an appropriate prefix to print before a colon when a
result matches:

README:Git is an Open Source project covered by the GNU General Public

One caller, grep_buffer, leaves the 'name' field set to NULL because
there is not enough context to determine an appropriate name to put in
such output lines.  In practice, this has been fine because the only
caller of grep_buffer is "git log --grep" and that caller sets
status_only to disable output (and only check whether a match exists).
But this is brittle: a future caller can call grep_buffer without
status_only set, and as soon as it hits a match, grep_source will
segfault in the process of trying to print

(null):Git is an Open Source project covered by the GNU General Public

> This seems to be unreachable from existing commands but is reachable in
> the event that someone rolls their own revision walk and neglects to set
> rev_info->grep_filter->status_only. Conceivably, someone might want to
> print all changed lines of all commits which match some regex.

Focusing on the use case instead of the details of how it is implemented,
we can simplify this to

For example, such a future caller might want to print all matching
lines from commits which match a regex.

> To futureproof, give developers a heads-up if grep_source() is called
> and a valid name field is expected but not given. This path should be
> unreachable now, but in the future may become reachable if developers
> want to add the ability to use grep_buffer() in prepared in-core data
> and provide hints to the user about where the data came from.

Here we can concisely describe the improved behavior with the fix:

Futureproof by diagnosing a use of the API that could trigger that
condition early, before we know whether the pattern matches:

BUG: git.c:832: grep call which could print a name requires grep_source.name be non-NULL
Aborted

And then what it makes possible:

This way, the caller's author gets a quick indication of how to fix
it, by ensuring that either grep_source.name is set or that
status_only is set to prevent printing output.

And how it came up:

Noticed while adding such a call in a tutorial on revision walks.

> Signed-off-by: Emily Shaffer 
> ---
[...]
> I also moved the BUG() to grep_source_1() to keep it close to the error
> itself and reflowed the commit message.

Makes sense.  Thanks for these summaries of what changed between each
revision of the patch --- they're very helpful.

With whatever subset of the above suggested commit message changes
make sense,

Reviewed-by: Jonathan Nieder 

Thanks.

[...]
> --- a/grep.c
> +++ b/grep.c
> @@ -1780,6 +1780,10 @@ static int grep_source_1(struct grep_opt *opt, struct grep_source *gs, int collect_hits)
>   enum grep_context ctx = GREP_CONTEXT_HEAD;
>   xdemitconf_t xecfg;
>  
> +	if (!opt->status_only && gs->name == NULL)

optional: can use !gs->name here.

> +		BUG("grep call which could print a name requires "
> +		    "grep_source.name be non-NULL");


Re: [PATCH v2] grep: provide sane default to grep_source struct

2019-05-21 Thread Jonathan Nieder
Hi,

Emily Shaffer wrote:
> On Thu, May 16, 2019 at 06:02:54PM -0400, Jeff King wrote:
>> On Thu, May 16, 2019 at 02:44:44PM -0700, Emily Shaffer wrote:

>>> +   /* TODO: In the future it may become desirable to pass in the name as
>>> +* an argument to grep_buffer(). At that time, "(in core)" should be
>>> +* replaced.
>>> +*/

(micronit, likely moot: Git's multi-line comments start with "/*" on
its own line:

/*
 * NEEDSWORK: Passing the name in as an argument would allow
 * "(in core)" to be replaced.
 */

.)

>>> +   grep_source_init(&gs, GREP_SOURCE_BUF, _("(in core)"), NULL, NULL);
>>
>> Hmm. I don't see much point in this one, as it would just avoid
>> triggering our BUG(). If somebody is adding new grep_buffer() callers
>> that don't use status_only, wouldn't we want them to see the BUG() to
>> know that they need to refactor grep_buffer() to provide a name?
[...]
> Can we think of a reason anybody would want to be able to use it this
> way with the placeholder string?

I agree with Peff here: using NULL puts this in a better place with
respect to Rusty's API design manifesto[1].

With the "(in core)" default, I may end up triggering the "(in core)"
behavior in production, because there is not a clear enough signal
that my code path is making a mistake.  That's problematic because it
gives the end user a confusing experience: the end user cares where
the line comes from, not that it spent a second or two in core.

With the NULL default, *especially* after this patch, such usage would
instead trigger a BUG: line in output, meaning

- if it gets exercised in tests, the test will fail, prompting the
  patch author to pass in a more appropriate label

- if it gets missed in tests and gets triggered in production, the error
  message makes it clear that this is a mistake so the user is likely
  to report a bug instead of assuming this is deliberate but confusing
  behavior

In that vein, this patch is very helpful, since the BUG would trip
*consistently*, not only when the grep pattern matches.  Failing
consistently like this is a huge improvement in API usability.

It would be even better if we could catch the problem at compile time,
but one thing at a time.

Thanks,
Jonathan

[1] https://ozlabs.org/~rusty/index.cgi/tech/2008-03-30.html,
https://ozlabs.org/~rusty/index.cgi/tech/2008-04-01.html


Re: RFC: Separate commit identification from Merkle hashing

2019-05-20 Thread Jonathan Nieder
Hi,

Eric S. Raymond wrote:
> Jonathan Nieder :
>> Eric S. Raymond wrote:

>>> One reason I am sure of this is the SHA-1 to whatever transition.
>>> We can't count on the successor hash to survive attack forever.
[...]
>> Have you read through Documentation/technical/hash-function-transition?  It
>> takes the case where the new hash function is found to be weak into account.
>>
>> Hope that helps,
>> Jonathan
>
> Reading now...

Take your time. :)

[...]
> I think it's a weakness, though, that most of it is written as though it
> assumes only one hash transition will be necessary.  (This is me thinking
> on long timescales again.)

Hm, can you point to what part of the doc suggested that?  Best to make
the text clearer, to avoid confusing the next person.

On the contrary, the design is very careful to be able to support the
next transition.

[...]
> The same technique (probably the
> same code!) could be used to map the otherwise uninterpreted
> commit-IDs I'm proposing to lookup keys.

No, since Git relies on commit IDs for integrity checking.  The hash
function transition described in that document relies on
round-tripping ability for the duration of the transition.

Jonathan


Re: Revision walking, commit dates, slop

2019-05-20 Thread Jonathan Nieder
SZEDER Gábor wrote:
> On Sat, May 18, 2019 at 09:54:12AM +0900, Mike Hommey wrote:

>> There are established corner cases, where in a repo where commit dates
>> are not monotonically increasing,
[...]
> All the above is without commit-graph, I presume?  If so, then you
> should give it a try, as it might bring immediate help in your
> pathological repo.  With 5k commit in the same second (enforced via
> 'export GIT_COMMITTER_DATE=$(date); for i in {1..5000} ...') I get:

Just to emphasize this point: one field in the commit-graph file is a
generation number (or more generally, a "reachability index" in the
standard jargon).  The reason I'm excited about having this field is
that it will allow us to stop playing games with slop.
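
Trying it out is cheap, by the way:

    git commit-graph write --reachable   # build .git/objects/info/commit-graph
    git config core.commitGraph true     # let readers use it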

So please join forces with Stolee and help us get to that future
sooner. :)

Thanks,
Jonathan


Re: RFC: Separate commit identification from Merkle hashing

2019-05-20 Thread Jonathan Nieder
Hi!

Eric S. Raymond wrote:

> One reason I am sure of this is the SHA-1 to whatever transition.
> We can't count on the successor hash to survive attack forever.
> Accordingly, git's design needs to be stable against the possibility
> of having to accommodate multiple future hash algorithms in the
> future.

Have you read through Documentation/technical/hash-function-transition?  It
takes the case where the new hash function is found to be weak into account.

Hope that helps,
Jonathan


Re: [PATCH] gitsubmodules: align html and nroff lists

2019-05-20 Thread Jonathan Nieder
Jeff King wrote:

> The patch itself looks good to me.

For what it's worth,

Reviewed-by: Jonathan Nieder 

as well.  It's a straightforward patch and solves the reader-facing
problem.  Thanks.

For the curious, the upstream discussion in docbook-xsl is
https://lists.oasis-open.org/archives/docbook-apps/201905/msg1.html.
I'm not sure where their bug tracker lives and what bug number this has
there.

Thanks,
Jonathan


Re: [RFC PATCH] grep: provide sane default to grep_source struct

2019-05-15 Thread Jonathan Nieder
Junio C Hamano wrote:

> Currently the only use of the function is to see if the log message
> matches with the given pattern (yes/no), but it is conceivable that
> new callers may want to prepare in-core data and use it to see if
> that matches the pattern, or even to _show_ the lines that match the
> pattern (possibly with its own .output callback).

Oh, good point about .output.  That answers my question about getting
the output in the right place.

[...]
>> Thanks in advance for everyone's thoughts.
>> Emily

Thanks, both.

Jonathan


Re: [RFC PATCH] grep: provide sane default to grep_source struct

2019-05-15 Thread Jonathan Nieder
Hi,

Emily Shaffer wrote:

> grep_buffer creates a struct grep_source gs and calls grep_source()
> with it. However, gs.name is null, which eventually produces a
> segmentation fault in
> grep_source()->grep_source_1()->show_line() when grep_opt.status_only is
> not set.

Thanks for catching it.  Taking a step back, I think the problem is in
the definition of "struct grep_source":

struct grep_source {
char *name;

enum grep_source_type {
GREP_SOURCE_OID,
GREP_SOURCE_FILE,
GREP_SOURCE_BUF,
} type;
void *identifier;

...
};

What is the difference between a 'name' and an 'identifier'?  Who is
responsible for free()ing them?  Can they be NULL?  This is pretty
underdocumented for a public type.

If we take the point of view that 'name' should always be non-NULL,
then I wonder:

- can we document that?
- can grep_source_init enforce that?
- can we take advantage of that in grep_source as well, as a sanity
  check that the grep_source has been initialized?
- while we're here, can we describe what the field is used for
  (prefixing output with context before a ":", I believe)?

[...]
> This seems to be unreachable from existing commands but is reachable in
> the event that someone rolls their own revision walk and neglects to set
> rev_info->grep_filter->status_only.

init_revisions sets that field, so a caller would have to replace
grep_filter or unset status_only to trigger this.

> Conceivably, someone might want to
> print all changed lines of all commits which match some regex.

Hm, does that work well?  The caller of grep_buffer in revision.c
doesn't seem to be careful about where in the output hits would be
printed, but I might be missing something.

[...]
> I ran into this issue while I was working on a tutorial on rolling your
> own revision walk. I didn't want to print all the lines associated with
> a matching change, but it didn't seem good that a seemingly-sane
> grep_filter config was segfaulting.

Good find, and thanks for taking the time to make it easier to debug for
the next person.

[...]
> Jonathan Nieder proposed alternatively adding some check to grep_source()
> to ensure that if opt->status_only is unset, gs->name must be non-NULL
> (and yell about it if not), as well as some extra comments indicating
> what assumptions are made about the data coming into functions like
> grep_source(). I'm fine with that as well (although I'm not sure it
> makes sense semantically to require a name which the user probably can't
> easily set, or else ban the user from printing LOC during grep). Mostly
> I'm happy with any solution besides a segfault with no error logging :)

Let's compare the two possibilities.

The advantage of "(in memory)" is that it Just Works, which should
make a nicer development experience with getting a new caller mostly
working on the way to getting them working just the way you want.

The disadvantage is that if we start outputting that in production, we
and static analyzers are less likely to notice.  In other words,
printing "(in memory)" is leaking details to the end user that do not
match what the end user asked for.  NULL would instead produce a
crash, prompting the author of the caller to fix it.

What was particularly pernicious about this example is that the NULL
dereference only occurs if the grep has a match.  So I suppose I'm
leaning toward (in addition to adding some comments to document the
struct) adding a check like

	if (!gs->name && !opt->status_only)
		BUG("grep calls that could print name require name");

to grep_source.

That would also sidestep the question of whether this debugging aid
should be translated. :)

Sensible?

Thanks,
Jonathan


Re: [PATCH v2 0/7] Multiple hook support

2019-05-15 Thread Jonathan Nieder
brian m. carlson wrote:

> Also, this is the way that most other programs on Unix do this behavior,
> and I think that is a compelling argument for this design in and of
> itself. I think Unix has generally made the best decisions in operating
> system design, and I aim to emulate it as much as possible.

Do a lot of other programs run commands from a specially named
subdirectory of the current directory?

If you were talking about a hooks dir in /etc, I would completely
agree.  But we are talking about .git/hooks/, which has been a
constant source of real compromises.

Anyway, I think we've gone back and forth enough times to discover
we're not going to agree on this.

[...]
>> This hasn't been a problem for remote helpers, merge drivers, etc in
>> the past.  Why are hooks different?
[...]
> Git LFS, for example, installs hooks that warn the user if the command
> isn't installed, since it's a common misconfiguration and not pushing
> the LFS objects (via the pre-push hook) along with the Git objects is a
> common source of data loss. Uninstalling the tool (or not installing it
> if it's a shared repository) doesn't mean the hook still shouldn't be
> run.

I don't understand this example.  If the repository is configured to
use the Git LFS hooks, why wouldn't it print a friendly message?

[...]
> I'm not opposed to extending the config system to implement multiple
> hooks directories or add support for inheriting hooks, because that's a
> common thing that people want. I just don't think our config system is
> the right tool for specifying what commands to run, for the reasons I've
> specified.
>
> I can't prevent you from writing a series that implements a config-based
> option, and if it's the solution the list wants, I'll go along with it,
> but it's not the solution I want to see personally or as a tooling
> implementer.

I think you're answering a different question than I asked.

One thing I've been talking about is having a path to eventually
getting rid of support for .git/hooks/, on a user controlled timeline,
with a smooth migration.  I proposed one possible way to do that, and
I was asking whether it would work okay for your use case, or whether
there are problems with it (which would give me something to work with
on iterating toward something that would work for you).

Your answer is "I can't prevent you", which means you don't like the
proposal, but doesn't tell me what about it would not work.

Thanks,
Jonathan


Re: [PATCH] test-lib: try harder to ensure a working jgit

2019-05-13 Thread Jonathan Nieder
Todd Zullinger wrote:

> The JGIT prereq uses 'type jgit' to determine whether jgit is present.
> While this should be sufficient, if the jgit found is broken we'll waste
> time running tests which fail due to no fault of our own.
>
> Use 'jgit --version' instead, to catch some badly broken jgit
> installations.
>
> Signed-off-by: Todd Zullinger 
> ---
> I ran into such a broken jgit on Fedora >= 30¹.  This is clearly a
> problem in the Fedora jgit package which will hopefully be resolved
> soon.  But it may be good to avoid wasting time debugging tests which
> fail due to a broken tool outside of our control.
>
> ¹ https://bugzilla.redhat.com/1709624

Reviewed-by: Jonathan Nieder 

It would be nice to describe that bug in the commit message, to save
readers some head scratching.

Thanks,
Jonathan


Re: [PATCH v2 0/7] Multiple hook support

2019-05-13 Thread Jonathan Nieder
Hi,

brian m. carlson wrote:
> On Mon, May 13, 2019 at 05:51:01PM -0700, Jonathan Nieder wrote:
>> brian m. carlson wrote:

>>>  the fact that inheritance in the configuration
>>> is in-order and can't be easily modified means that it's not likely to
>>> be very useful, but it is likely to be quite surprising for the average
>>> user.
>>
>> Can we discuss this some more?  What would it take to make it likely
>> to be useful in your view?
>
> There are two aspects here which I think are worth discussing. Let's
> discuss the inheritance issue first.
>
> Order with multiple hooks matters. The best hook as an example for this
> is prepare-commit-msg. If I have a hook which I want to run on every
> repository, such as a hook that inserts some sort of ID (bug tracker,
> Gerrit, etc.), that hook, due to inheritance, *has* to be first, before
> any other prepare-commit-msg hooks. If I want a hook that runs before
> it, perhaps because a particular repository needs a different
> configuration, I have to wipe the list and insert both hooks. I'm now
> maintaining two separate locations for the command lines instead of just
> inserting a symlink to the global hook and dropping a new hook before
> it.
>
> I don't think there's a good way to make it easier unless we radically
> customize the way configuration is done.

Wouldn't a separate config item e.g. to reverse order (or to perform
whatever other customization seems appropriate) cover this?

In other words, use the standard config convention for the set of
hooks, and treat the order in which they are invoked as a separate
question.  You could even use the hooks.d style alphabetical order
convention.
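
Concretely, I'm imagining something like this (both keys are
hypothetical, just to illustrate the shape, not a worked-out design):

    [hook "prepare-commit-msg"]
        command = insert-change-id
        command = check-signed-off-by
        # hypothetical knob: run the configured commands in reverse order
        order = reverse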

[...]
> The second issue here is that it's surprising. Users don't know how to
> reset a configuration option because we don't have a consistent way to
> do that.

I agree that it's underdocumented and underimplemented.  But I'm not
aware of any other method that Git provides to reset a configuration
item.  What is it inconsistent with?

>   Users will not expect for there to be multiple ways to set
> hooks. Users will also not expect that their hooks in their
> configuration aren't run if there are hooks in .git/hooks. Tooling that
> has so far used .git/hooks will compete with users' global configuration
> options, which I guarantee you will be a surprise for users using older
> versions of tools.

Indeed, in the long term I think we should remove the .git/hooks/
mechanism entirely.

In the shorter term, I think the kind of inconsistency you're referring
to applies to hooks.d as well.

> The new behavior, which puts everything in the same directory
> (.git/hooks) is much easier to reason about.

That's a good point: a .git/hooks/README sounds like it would be
helpful here.

[...]
> It also provides a convenient place for hooks to live, which a
> config-based option doesn't. We'll need to invoke things using /bin/sh,
> so will they all have to live in PATH? What about one-offs that don't
> really belong in PATH?

This hasn't been a problem for remote helpers, merge drivers, etc in
the past.  Why are hooks different?

To be clear, I think it's a reasonable problem to solve, and I've
actually been surprised that it hasn't been a problem for people.

[...]
> I agree this is an advantage if they don't hit the ordering issue.

Wonderful.  Sounds like if I do some work on the ordering issue, then
we have a path forward.

> I
> think a lot of the common use cases where this approach has benefits can
> be handled well with core.hooksPath and hooks that can turn themselves
> on or off depending on the repository config.

I think core.hooksPath attempted to solve this problem, but it has
several deficiencies:

1. It assumes a single, centrally managed hooks directory, and there's
   no standard for where that directory lives.  This means that it
   can't be counted on by tools like "git secrets" --- instead, each
   particular installation has to set up a custom hooks directory for
   themselves.

2. Since it assumes a single, centrally managed hooks directory,
   customizations in a single repository (e.g. to enable or disable a
   single hook) require duplicating the entire directory.

3. It had no migration path defined to becoming the default, so it
   doesn't end up being discoverable.  core.hooksPath is designed as
   a special case, making it hard to debug, instead of being a
   mainstream setting that can be recommended as a future default.

> What might be an interesting approach that would address these concerns
> is a core.globalHooks[0] option that 

Re: [PATCH v2 0/7] Multiple hook support

2019-05-13 Thread Jonathan Nieder
Hi,

brian m. carlson wrote:

> I've thought a lot about the discussion over whether this series should
> use the configuration as the source for multiple hooks. Ultimately, I've
> come to the decision that it's not a good idea. Even adopting the empty
> entry as a reset marker, the fact that inheritance in the configuration
> is in-order and can't be easily modified means that it's not likely to
> be very useful, but it is likely to be quite surprising for the average
> user.

Can we discuss this some more?  What would it take to make it likely
to be useful in your view?

> I think a solution that sticks with the existing model and builds
> off a design used by other systems people are familiar with, like cron
> and run-parts, is going to be a better choice. Moreover, this is the
> design that people have already built with outside tooling, which is a
> further argument in favor of it.

To be clear, the main advantage I see for config versus the .git/hooks
model is that with the config model, a user doesn't have to search
throughout the filesystem for .git/hooks directories to update when a
hook is broken.

There are other models that similarly fix that --- e.g. putting hooks
in /etc.  But as long as they're in .git/hooks, I think we're digging
ourselves deeper into the same hole.

Thanks,
Jonathan


Re: Proposal: object negotiation for partial clones

2019-05-13 Thread Jonathan Nieder
Hi,

Matthew DeVore wrote:
> On 2019/05/09, at 11:00, Jonathan Tan  wrote:

>> - Supporting any combination of filter means that we have more to
>>  implement and test, especially if we want to support more filters in
>>  the future. In particular, the different filters (e.g. blob, tree)
>>  have different code paths now in Git. One way to solve it would be to
>>  combine everything into one monolith, but I would like to avoid it if
>>  possible (after having to deal with revision walking a few times...)
>
> I don’t believe there is any need to introduce monolithic code. The
> bulk of the filter implementation is in list-objects-filter.c, and I
> don’t think the file will get much longer with an additional filter
> that “combines” the existing filter. The new filter is likely
> simpler than the sparse filter. Once I add the new filter and send
> out the initial patch set, we can discuss splitting up the file, if
> it appears to be necessary.
>
> My idea - if it is not clear already - is to add another OO-like
> interface to list-objects-filter.c which parallels the 5 that are
> already there.

Sounds good to me.

For what it's worth, my assumption has always been that we would
eventually want the filters to be stackable.  So I'm glad you're
looking into it.

Jonathan's reminder to clean up as you go is a welcome one.

Thanks,
Jonathan


Re: Proposal: object negotiation for partial clones

2019-05-06 Thread Jonathan Nieder
Matthew DeVore wrote:
> On 2019/05/06, at 12:46, Jonathan Nieder  wrote:

>> Ah, interesting.  When this was discussed before, the proposal has been
>> that the client can say "have" anyway.  They don't have the commit and
>> all referenced objects, but they have the commit and a *promise* that
>> they can obtain all referenced objects, which is almost as good.
>> That's what "git fetch" currently implements.
>
> Doesn’t that mean the “have” may indicate that the client has the
> entire repository already, even though it’s only a partial clone? If
> so, then the client intends to ask for some tree plus trees and
> blobs 2-3 levels down deeper, how would the server distinguish
> between those objects the client *really* has and those that were
> just promised to them? Because the whole purpose of this
> hypothetical request is to get a bunch of promises fulfilled of
> which 0-99% are fulfilled already.

For blobs, the answer is simple: the server returns any object
explicitly named in a "want", even if the client already should have
it.

For trees, the current behavior is the same: if you declare that you
"have" everything, then if you "want" a tree with filter tree:2, you
only get that tree.  So here there's already room for improvement.

[...]
> Maybe something like this (conceptually based on original proposal) ?
>
> 1. Client sends request for an object or objects with an extra flag
> which means “I can’t really tell you what I already have since it’s
> a chaotic subset of the object database of the repo”
>
> 2. Server responds back with set of objects, represented by deltas
> if that is how the server has them on disk, along with a list of
> object-IDs needed in order to resolve the content of all the
> objects. These object-IDs can go several layers of deltas back, and
> they go back as far as it takes to get to an object stored in its
> entirety by the server.
>
> 3. Client responds back with another request (this time the extra
> flag sent from step 1 is not necessary) which has “want”s for every
> object the server named which the client already has.
>
> Very hand-wavey, but I think you see my idea.

The only downside I see is that the list of objects may itself be
large, and the server has to check reachability for each one.  But
maybe that's fine.

Perhaps after that initial response, instead of sending the list of
individual objects the client wants, it could send a list of relevant
objects it has (combined with the original set of "want"s).  That
could be a smaller request and it means less work for the server to
check each "want" for reachability.

What do you think?

[...]
> That's a very good point. The data the first request gives us is
> basically the tree objects minus file names and modes. So I think a
> better feature to implement would be combining of multiple filters.
> That way, the client can combine "tree:" and
> "blob:none" and basically get an "enumeration" of available objects.

This might be simpler.

Combining filters would be useful for other uses, too.
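
(To make that concrete: the request would stack both filters, along
the lines of

    git clone --filter=combine:tree:2+blob:none https://example.com/repo.git

with the combining syntax still to be designed, so treat that spelling
as hypothetical.)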

Thanks,
Jonathan


Re: Proposal: object negotiation for partial clones

2019-05-06 Thread Jonathan Nieder
Hi,

Jonathan Tan wrote:
> Matthew DeVore wrote:

>> I'm considering implementing a feature in the Git protocol which would
>> enable efficient and accurate object negotiation when the client is a
>> partial clone. I'd like to refine and get some validation of my
>> approach before I start to write any code, so I've written a proposal
>> for anyone interested to review. Your comments would be appreciated.
>
> Thanks. Let me try to summarize: The issue is that, during a fetch,
> normally the client can say "have" to inform the server that it has a
> commit and all its referenced objects (barring shallow lines), but we
> can't do the same if the client is a partial clone (because having a
> commit doesn't necessarily mean that we have all referenced objects).

Ah, interesting.  When this was discussed before, the proposal has been
that the client can say "have" anyway.  They don't have the commit and
all referenced objects, but they have the commit and a *promise* that
they can obtain all referenced objects, which is almost as good.
That's what "git fetch" currently implements.

But there's a hitch: when doing the fetch-on-demand for an object
access, the client currently does not say "have".  Sure, even there,
they have a *promise* that they can obtain all referenced objects, but
this could get out of hand: the first pack may contain a delta against
an object the client doesn't have, triggering another fetch which
contains a delta against another object they don't have, and so on.
Too many round trips.

> And not doing this means that the server sends a lot of unnecessary
> objects in the sent packfile. The solution is to do the fetch in 2
> parts: one to get the list of objects that would be sent, and after the
> client filters that, one to get the objects themselves.

This helps with object selection but not with delta base selection.

For object selection, I think the current approach already works okay,
at least where tree and blob filters are involved.  For commit
filters, in the current approach the fetch-on-demand sends way too
much because there's no "filter=commit:none" option to pass.  Is that
what this proposal aims to address?
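
For reference, a v2 fetch using the filters that exist today looks
schematically like this (pkt-line framing elided):

    command=fetch
    0001
    want <commit-oid>
    filter blob:none
    done
    0000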

For blob filters, if I ignore the capability advertisements (there's
an optimization that hasn't yet been implemented to allow
single-round-trip fetches), the current behavior takes the same number
of round trips as this proposal.  Where the current approach has been
lacking is in delta base selection during fetch-on-demand.  Ideas for
improving that?

Thanks,
Jonathan


Re: Proposal: object negotiation for partial clones

2019-05-06 Thread Jonathan Nieder
Hi,

Matthew DeVore wrote:

> I'm considering implementing a feature in the Git protocol which would
> enable efficient and accurate object negotiation when the client is a
> partial clone. I'd like to refine and get some validation of my
> approach before I start to write any code, so I've written a proposal
> for anyone interested to review. Your comments would be appreciated.

Yay!  Thanks for looking into this, and sorry I didn't respond sooner.

I know the doc has a "use case" section, but I suppose I am not sure
that I understand the use case yet.  Is this about improving the
filter syntax to handle features like directory listing?  Or is this
about being able to make better use of deltas in a partial clone, to
decrease bandwidth consumption and overhead that is proportional to
size?

Thanks,
Jonathan


Re: install: gitweb.cgi was not found anywhere

2019-05-03 Thread Jonathan Nieder
Jeffrey Walton wrote:
> On Wed, May 1, 2019 at 6:30 PM Jonathan Nieder  wrote:

>> Sounds like it's using "install" when it should be using "ginstall".
>> config.mak.uname contains, under the SunOS category:
>>
>> INSTALL = /usr/ucb/install
>
> Thanks again Jonathan.
>
> /usr/ucb/install no longer exists in Solaris 11.3 i86pc:
>
> solaris3:~$ ls -Al /usr/ucb/install
> /usr/ucb/install: No such file or directory
> solaris3:~$ uname -a
> SunOS solaris3. 5.11 11.3 i86pc i386 i86pc
>
> The config files need to be patched:
[...]
> Related to /usr/ucb, also see
> https://blogs.oracle.com/solaris/preparing-for-the-upcoming-removal-of-ucb-utilities-from-the-next-version-of-solaris-v2

Hm.  How about this, in combination with the previous one?

If it looks good, I can send it out as a series for real.

[...]
> I also removed a bunch of old patches and hacks that don't seem to be
> needed for Git 2.21.0. Between both of them I am building Git on
> Solaris again.

To be clear, does that mean you are using unpatched source now, or
that you still needed some patches?  In the latter case, can you point
me to them so we can get something sufficient upstream?

Thanks,
Jonathan

diff --git i/config.mak.uname w/config.mak.uname
index d916d1dc7a..41ad90c76a 100644
--- i/config.mak.uname
+++ w/config.mak.uname
@@ -162,7 +162,7 @@ ifeq ($(uname_S),SunOS)
 		NO_STRTOUMAX = YesPlease
 		GIT_TEST_CMP = cmp
 	endif
-	INSTALL = /usr/ucb/install
+	INSTALL = ginstall
 	TAR = gtar
 	BASIC_CFLAGS += -D__EXTENSIONS__ -D__sun__
 endif


Re: install: gitweb.cgi was not found anywhere

2019-05-01 Thread Jonathan Nieder
Hi,

Jeffrey Walton wrote:

> I'm attempting to install Git 2.21.0 on Solaris 11.3 x86_64.
> /usr/gnu/bin is on-path.
[...]
> gmake -C gitweb install
> gmake[1]: Entering directory `/export/home/build/git-2.21.0/gitweb'
> gmake[2]: Entering directory `/export/home/build/git-2.21.0'
> gmake[2]: `GIT-VERSION-FILE' is up to date.
> gmake[2]: Leaving directory `/export/home/build/git-2.21.0'
> GEN gitweb.cgi
> GEN static/gitweb.js
> install -d -m 755 '/usr/local/share/gitweb'
> directory /usr/local/share/gitweb created
> install -m 755 gitweb.cgi '/usr/local/share/gitweb'
> find: cycle detected for /lib/secure/32/
[...]
> install: gitweb.cgi was not found anywhere!

Sounds like it's using "install" when it should be using "ginstall".
config.mak.uname contains, under the SunOS category:

INSTALL = /usr/ucb/install

But gitweb/Makefile seems to forget to include ../config.mak.uname.
How about this patch?

-- >8 --
Subject: gitweb: use system-appropriate defaults for commands

Attempting to install gitweb on Solaris 11 produces

 $ gmake install
...
 gmake -C gitweb install
 gmake[1]: Entering directory `/export/home/build/git-2.21.0/gitweb'
 gmake[2]: Entering directory `/export/home/build/git-2.21.0'
 gmake[2]: `GIT-VERSION-FILE' is up to date.
 gmake[2]: Leaving directory `/export/home/build/git-2.21.0'
 GEN gitweb.cgi
 GEN static/gitweb.js
 install -d -m 755 '/usr/local/share/gitweb'
 directory /usr/local/share/gitweb created
 install -m 755 gitweb.cgi '/usr/local/share/gitweb'
 find: cycle detected for /lib/secure/32/
 find: cycle detected for /lib/32/
 find: cycle detected for /lib/crypto/32/
...
 find: cycle detected for /usr/lib/gss/32/
 install: gitweb.cgi was not found anywhere!
 gmake[1]: *** [install] Error 2
 gmake[1]: Leaving directory `/export/home/build/git-2.21.0/gitweb'

This is because the default "install" tool on SunOS doesn't follow the
convention we require.  Use the /usr/ucb/install command specified in
config.mak.uname instead to fix it.

This should also help on other platforms where the default "install"
command is not functional enough.

Reported-by: Jeffrey Walton 
Signed-off-by: Jonathan Nieder 
---
Completely untested.  Junio, please don't apply this without Jeffrey's
tested-by.

Thanks,
Jonathan

 gitweb/Makefile | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gitweb/Makefile b/gitweb/Makefile
index cd194d057f..333aa58be0 100644
--- a/gitweb/Makefile
+++ b/gitweb/Makefile
@@ -39,7 +39,7 @@ GITWEB_SITE_HEADER =
 GITWEB_SITE_FOOTER =
 HIGHLIGHT_BIN = highlight
 
-# include user config
+include ../config.mak.uname
 -include ../config.mak.autogen
 -include ../config.mak
 -include config.mak
-- 
2.21.0.1020.gf2820cf01a



Re: [PATCH 2/2] mingw: enable DEP and ASLR

2019-05-01 Thread Jonathan Nieder
Hi,

Jeff King wrote:

> I wonder if this points to this patch touching the wrong level. These
> compiler flags are a thing that _some_ builds want (i.e., production
> builds where people care most about security and not about debugging),
> but not necessarily all.
>
> I'd have expected this to be tweakable by a Makefile knob (either a
> specific knob, or just the caller setting the right CFLAGS etc), and
> then for the builds of Git for Windows to turn those knobs when making a
> package to distribute.
>
> Our internal package builds at GitHub all have this in their config.mak
> (for Linux, of course):
>
>   CFLAGS += -U_FORTIFY_SOURCE -D_FORTIFY_SOURCE=1
>   CFLAGS += -fstack-protector-strong
>
>   CFLAGS += -fpie
>   LDFLAGS += -z relro -z now
>   LDFLAGS += -pie
>
> and I wouldn't be surprised if other binary distributors (like the
> Debian package) do something similar.

Yes, the Debian package uses

CFLAGS := -Wall \
        $(shell dpkg-buildflags --get CFLAGS) \
        $(shell dpkg-buildflags --get CPPFLAGS)

and then passes CFLAGS='$(CFLAGS)' to "make".

That means we're using

-g -O2 -fstack-protector-strong -Wformat -Werror=format-security
-Wdate-time -D_FORTIFY_SOURCE=2

Dscho's suggestion for the Windows build sounds fine to me (if
checking for -Og, too).  Maybe it would make sense to factor out a
makefile variable for this, that could be used for builds on other
platforms, too.  That way, the autodetection can be in one place, and
there is a standard way to override it when the user wants something
else.

Thanks,
Jonathan


Re: How to undo previously set configuration? (again)

2019-04-25 Thread Jonathan Nieder
Hi,

Ævar Arnfjörð Bjarmason wrote:

> Because we don't have some general config facility for this it keeps
> coming up, and various existing/proposed options have their own little
> custom hacks for doing it, e.g. this for Barret Rhoden's proposed "blame
> skip commits" feature
> https://public-inbox.org/git/878swhfzxb@evledraar.gmail.com/
> (b.t.w. I *meant* /dev/null in that E-Mail, but due to PEBCAK wrote
> /dev/zero).

I'm confused.  Isn't that bog-standard Git usage, not a custom hack?
That is, I thought the intended behavior is always

 1. For single-valued options, last value wins.
 2. For multi-valued options, empty clears the list.
 3. When there is a special behavior triggered by not supplying the
option at all, offer an explicit value like "default" that triggers
the same behavior, too.

and that any instance of a command that isn't following that is a bug.
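
For instance, with a real multi-valued option (this is documented
behavior for http.extraHeader):

    git config --add http.extraHeader "X-From-Global: 1"
    git config --add http.extraHeader ""                  # empty value clears the list
    git config --add http.extraHeader "X-Only-This: 2"    # only this one survives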

Which of course leaves room for improvement in documentation and how
we organize the implementation (as Peff discussed in [1]), but isn't
it nice to already have something in place that doesn't require
inventing a new syntax?

Thanks,
Jonathan

[1] https://public-inbox.org/git/20180406175044.ga32...@sigill.intra.peff.net/


Re: [PATCH 0/5] Multiple hook support

2019-04-24 Thread Jonathan Nieder
Hi,

brian m. carlson wrote:
> On Tue, Apr 23, 2019 at 07:34:38PM -0700, Jonathan Nieder wrote:
>> brian m. carlson wrote:

>>> I've talked with some people about this approach, and they've indicated
>>> they would prefer a configuration-based approach.
>>
>> I would, too, mostly because that reduces the problem of securing
>> hooks to securing configuration.  See
>> https://public-inbox.org/git/20171002234517.gv19...@aiede.mtv.corp.google.com/
>> for more on this subject.
>
> I know this is a common issue, but fixing it is a non-goal for this
> series. Anything we do here is going to have to be backwards compatible,
> so we can't make any changes to the security model.

I think it's worth bringing up because we should have some idea of where
we want to head.

I think the backward compatibility part is actually one of the easier
aspects of this one.  We don't have to change the security model right
away because there are similar places to exploit like core.pager.  To
address that, we need a notion of configuration that individual repos
and worktrees can't override, and using such a configuration item, we
can provide a way to opt in to the new security model.  That provides
a smooth path forward.

[...]
>> Solving (1) without (2) feels like a bit of a missed opportunity to
>> me.  Ideally, what I would like is
>>
>>i. A central registry of trustworthy Git hooks that can be upgraded
>>   using the system package manager to address (2).  Perhaps just
>>   git-hook-* commands on the $PATH.
>>
>>   ii. Instead of putting hooks in .git/hooks, put a list of hooks to
>>   run for each event in .git/config.
>
> The problem I had with this when discussing it was that our
> configuration system lacks a good way to control inheritance from outer
> files.

The standard approach for lists defined in Git configuration is to
assign an empty item to clear / restart the list.  See
http.extraHeader for an example.

Thanks and hope that helps,
Jonathan


Re: [PATCH 0/5] Multiple hook support

2019-04-23 Thread Jonathan Nieder
Hi,

brian m. carlson wrote:

> I've talked with some people about this approach, and they've indicated
> they would prefer a configuration-based approach.

I would, too, mostly because that reduces the problem of securing
hooks to securing configuration.  See
https://public-inbox.org/git/20171002234517.gv19...@aiede.mtv.corp.google.com/
for more on this subject.

More precisely, a few problems with the current hooks system:

 1. There's not a standard way to have multiple hooks for a single event.
That's what this series is about.

(The recommended workaround has been to use a trampoline script as
your hook, and to make that trampoline script implement whatever
policy for the order of invocation and accumulation of results is
appropriate, but that's a bit of a cop-out.)

 2. Because they are stored in the Git repository, they do not have a
way to be automatically updated.

(The recommended workaround is to use a trampoline script as your
hook and put the actual hook somewhere standard like $PATH where
it can be upgraded system-wide.  But that's a bit of a cop-out.)

 3. Because they are part of the Git repository, it is very easy to
compromise a user's account by tricking them into running an
attacker-authored hook.  Attacks include "hey admin, can you tell
me why 'git commit' is failing in this repo?" and "here's a zip file
containing a Git repository with our fancy software.  Feel free
to look around, run 'git pull', etc".

Similar attacks, probably even worse, apply to shell prompt scripts
using commands from attacker-controlled .git/config.

(The recommended workaround is to inspect .git/config and
.git/hooks whenever you're looking at an untrusted repository, and
to write your shell prompt script defensively.)

Solving (1) without (2) feels like a bit of a missed opportunity to
me.  Ideally, what I would like is

   i. A central registry of trustworthy Git hooks that can be upgraded
  using the system package manager to address (2).  Perhaps just
  git-hook-* commands on the $PATH.

  ii. Instead of putting hooks in .git/hooks, put a list of hooks to
  run for each event in .git/config.

 iii. For backward compatibility, perform a multi-stage migration.
  In the stage I am most interested in:

  When encountering a hook in .git/hooks, don't run it, but print
  a message about how to migrate it to the modern scheme.

  To make migration to the modern scheme painless, stick a
  standard trampoline script in .git/hooks in all converted and
  all newly "git init"ed repositories to allow old versions of Git
  to respect the configuration from (i) and (ii).
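
A minimal version of that trampoline could be as small as this (the
config key is made up for illustration; nothing reads it today):

    #!/bin/sh
    # run each configured pre-commit hook in order, stopping at the
    # first failure; the loop's exit status becomes the script's
    git config --get-all hook.pre-commit.command |
    while IFS= read -r cmd
    do
        "$cmd" "$@" || exit 1
    done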

That doesn't handle core.pager et al, but those we can handle
separately (for example by, at least optionally, not respecting values
for them in per-repo config at all).

Thanks for tackling this.  What do you think?

Thanks,
Jonathan


Re: [PATCH v2 5/8] Documentation: add Packfile URIs design doc

2019-04-23 Thread Jonathan Nieder
Ævar Arnfjörð Bjarmason wrote:
> On Wed, Apr 24 2019, Jonathan Nieder wrote:

>> Do you mean this for when a pack is self-contained and contains all
>> objects reachable from those "tip" commits?
>>
>> What would you do when a pack is not self-contained in that way?
>
> Indeed, it had been a while since I read the first version of this. I
> was assuming a "base pack" use-case, but it seems it's narrowly isolated
> to just "N packs each containing one big blob", right?

The demo in this patch series covers the single isolated blob case.
The protocol supports the "base pack" use case but many others as
well:

* daily "catch-up fetch" packs
* "base pack without blobs"
* ... etc ...

Thanks,
Jonathan


Re: [PATCH v2 5/8] Documentation: add Packfile URIs design doc

2019-04-23 Thread Jonathan Nieder
Ævar Arnfjörð Bjarmason wrote:
> On Wed, Apr 24 2019, Jonathan Nieder wrote:
>> Jeff King wrote:
>>> On Fri, Mar 08, 2019 at 01:55:17PM -0800, Jonathan Tan wrote:

>>>> +If the 'packfile-uris' feature is advertised, the following argument
>>>> +can be included in the client's request as well as the potential
>>>> +addition of the 'packfile-uris' section in the server's response as
>>>> +explained below.
>>>> +
>>>> +packfile-uris 
>>>> +  Indicates to the server that the client is willing to receive
>>>> +  URIs of any of the given protocols in place of objects in the
>>>> +  sent packfile. Before performing the connectivity check, the
>>>> +  client should download from all given URIs. Currently, the
>>>> +  protocols supported are "http" and "https".
>>>
>>> This negotiation seems backwards to me, because it puts too much power
>>> in the hands of the server.
>>
>> Thanks.  Forgive me if this was covered earlier in the conversation, but
>> why do we need more than one protocol at all here?  Can we restrict this
>> to only-https, all the time?
>
> There was this in an earlier discussion about this:
> https://public-inbox.org/git/877eds5fpl@evledraar.gmail.com/
>
> It seems arbitrary to break it for new features if we support http in
> general, especially with a design as it is now where the checksum of the
> pack is transmitted out-of-band.

Thanks for the pointer.  TLS provides privacy, too, but I can see why
in today's world it might not always be easy to set it up, and given
that we have integrity protection via that checksum, I can see why
some people might have a legitimate need for using plain "http" here.

We may also want to support packfile-uris using SSH protocol in the
future.  Might as well figure out how the protocol negotiation works
now.  So let's delve more into it:

Peff mentioned that it feels backwards for the client to specify what
protocols they support in the request, instead of the server
specifying them upfront in the capability advertisement.  I'm inclined
to agree: it's probably reasonable to put this in server capabilities
instead.  That would even allow the client to do something like

This server only supports HTTP without TLS, which you have
indicated is a condition in which you want to be prompted.
Proceed?

[Use HTTP packfiles]  [Use slower but safer inline packs]

Peff also asked whether protocol scheme is the right granularity:
should the server list what domains they can serve packfiles from
instead?  In other words, once you're doing it for protocol schemes,
why not do it for whole URIs too?  I'm grateful for the question since
it's a way to probe at design assumptions.

- protocol schemes are likely to be low in number because each has its
  own code path to handle it.  By comparison, domains or URIs may be
  too numerous to be something we want to jam into the capability
  advertisement.  (Or the server operator could always use the same
  domain as the Git repo, and then use a 302 to redirect to the CDN.
  I suspect this is likely to be a common setup anyway: it allows the
  Git server to generate a short-lived signed URL that it uses as the
  target of a 302.  But in this case, what is the point of a domain
  whitelist?)

- relatedly, because the list of protocol schemes is small, it is
  feasible to test client behavior with each subset of protocol
  schemes enabled.  Finer-grained filtering would mean more esoteric
  client configurations for server operators to support and debug.

- supported protocol schemes do not vary per request.  The actual
  packfile URI is dynamic and varies per request

- separately from questions of preference or security policy,
  clients may have support for a limited subset of protocol schemes.
  For example, imagine a stripped-down client without SSH support.
  So we need a way to agree about this capability anyway.

So I suspect that, at least to start, protocol scheme negotiation
should be enough and we don't need full URI negotiation.
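
Schematically, that could look like this (spelling illustrative, not
what the current patches implement):

    # server capability advertisement
    packfile-uris=https,http

    # client request: the subset of advertised schemes it will accept
    packfile-uris https

    # server response: a packfile-uris section naming the offloaded packs
    packfile-uris
    <pack-hash> https://cdn.example.com/base-blobs.pack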

There are a few escape valves:

- affected clients can complain to the server operator, who will then
  reconfigure the server to use more appropriate packfile URIs

- if there is a need for different clients to use different packfile
  URIs, clients can pass a flag, using --server-option, to the server
  to help it choose.

- a client can disable support for packfile URIs on a particular
  request and fall back to inline packs.

- if and when an affected client materializes, they can help us
  improve the protocol to handle their needs.

Sensible?

Thanks,
Jonathan


Re: [PATCH v2 5/8] Documentation: add Packfile URIs design doc

2019-04-23 Thread Jonathan Nieder
Hi,

Ævar Arnfjörð Bjarmason wrote:

> This is really orthagonal to this series, but wouldn't a better
> resumption strategy here be to walk the pack we just downloaded, run the
> equivalent of 'commit-graph write' on it to figure out likely "tip"
> commits, and use those in "have" lines to negotiate with the server the
> next time around?

Do you mean this for when a pack is self-contained and contains all
objects reachable from those "tip" commits?

What would you do when a pack is not self-contained in that way?

Thanks,
Jonathan


Re: [PATCH v2 5/8] Documentation: add Packfile URIs design doc

2019-04-23 Thread Jonathan Nieder
Hi,

Jeff King wrote:
> On Fri, Mar 08, 2019 at 01:55:17PM -0800, Jonathan Tan wrote:

>> +If the 'packfile-uris' feature is advertised, the following argument
>> +can be included in the client's request as well as the potential
>> +addition of the 'packfile-uris' section in the server's response as
>> +explained below.
>> +
>> +packfile-uris 
>> +Indicates to the server that the client is willing to receive
>> +URIs of any of the given protocols in place of objects in the
>> +sent packfile. Before performing the connectivity check, the
>> +client should download from all given URIs. Currently, the
>> +protocols supported are "http" and "https".
>
> This negotiation seems backwards to me, because it puts too much power
> in the hands of the server.

Thanks.  Forgive me if this was covered earlier in the conversation, but
why do we need more than one protocol at all here?  Can we restrict this
to only-https, all the time?

[...]
> The problem I see is that the client doesn't get to vet the list of
> URIs; it only gets to specify a protocol match. But there are many other
> reasons it might want to reject a URI: we don't like the protocol, the
> domain name is on a blacklist (or not on a whitelist), the domain name
> can't resolve, we can't make a TCP connection to the server, we can't
> successfully fetch the pack.

Christian mentioned this desire to vet URIs before, and I'll admit I
found it hard to imagine a use case.  Why can't it work like e.g.
<img> on the web, where if you don't like that domain, then you
don't get to access the page?  From a server operator's point of view,
if you want to support a second URI that more clients support, why
wouldn't you just always use that second URI instead of making clients
choose?

Thanks and hope that helps,
Jonathan


Re: [PATCH v3 01/10] config: initialize opts structure in repo_read_config()

2019-04-11 Thread Jonathan Nieder
Hi,

Jeff Hostetler wrote:

> Initialize opts structure in repo_read_config().

Good find.  I wonder if there are some flags we can turn on with
DEVELOPER=1 to prevent this kind of issue going undetected in the
future (or maybe this means we need to get the valgrind or ASan
testing modes to be fast enough for people to consistently run them).
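
For reference, the knobs that exist today:

    make DEVELOPER=1                 # stricter compiler warnings
    make SANITIZE=address,undefined  # instrumented build for the test suite
    cd t && ./t1300-repo-config.sh --valgrind   # slow, but memcheck catches uninitialized reads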

[...]
>  config.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)

Reviewed-by: Jonathan Nieder 

Thanks.


Re: [PATCH 1/1] send-email: fix transferencoding config option

2019-04-09 Thread Jonathan Nieder
(thanks for cc-ing bmc!)
Hi,

Heinrich Schuchardt wrote:

> Subject: send-email: fix transferencoding config option

nit: "fix" doesn't tell me what was broken and what you improved about
it.  Here, I think you mean "respect transferencoding config option".

> Since e67a228cd8a ("send-email: automatically determine transfer-encoding")
> the value of sendmail.transferencoding is ignored because when parsing
> the configuration $target_xfer_encoding is not initial anymore.

nit: I was confused when first reading this, since I read "the
configuration $target_xfer_encoding" as a single phrase.  A comma
after "configuration" might help.

> Instead of initializing variable $target_xfer_encoding on definition we
> have to set it to the default value of 'auto' if is initial after parsing
> the configuration files.

run-on sentence.  I'm having trouble parsing this part.

Can you start from the beginning and describe again what this does?
In other words, tell me

- What is the user-facing effect of the change?  What workflow is it
  part of?

- Any risks or complications?

- Any technical details that might be interesting to the later reader?

- What does this allow me to do that I couldn't do before?

The code can speak for itself, so this should primarily focus on the
intention behind the change.

[...]
> --- a/Documentation/git-send-email.txt
> +++ b/Documentation/git-send-email.txt
> @@ -146,7 +146,7 @@ Note that no attempts whatsoever are made to validate the encoding.
>   even more opaque.  auto will use 8bit when possible, and quoted-printable
>   otherwise.
>  +
> -Default is the value of the `sendemail.transferEncoding` configuration
> +Default is the value of the `sendemail.transferencoding` configuration

Unrelated change?

[...]
> --- a/git-send-email.perl
> +++ b/git-send-email.perl
> @@ -239,7 +239,7 @@ sub do_edit {
>  my (@suppress_cc);
>  my ($auto_8bit_encoding);
>  my ($compose_encoding);
> -my $target_xfer_encoding = 'auto';
> +my ($target_xfer_encoding);
> 
>  my ($debug_net_smtp) = 0;# Net::SMTP, see send_message()
> 
> @@ -446,6 +446,8 @@ sub read_config {
>   $smtp_encryption = 'ssl';
>   }
>   }
> +
> + $target_xfer_encoding = 'auto' unless (defined $target_xfer_encoding);

Makes sense.

Is there a way to cover this in tests (t/t9001-send-email.sh) so we
can avoid regressing again?

The rest looks good.

Thanks for noticing, and hope that helps,
Jonathan


Re: [PATCH v2 2/2] clone: send server options when using protocol v2

2019-04-09 Thread Jonathan Nieder
Jonathan Tan wrote:

> Commit 5e3548ef16 ("fetch: send server options when using protocol v2",
> 2018-04-24) taught "fetch" the ability to send server options when using
> protocol v2, but not "clone". This ability is triggered by "-o" or
> "--server-option".
>
> Teach "clone" the same ability, except that because "clone" already
> has "-o" for another parameter, teach "clone" only to receive
> "--server-option".
>
> Explain in the documentation, both for clone and for fetch, that server
> handling of server options are server-specific.
>
> Signed-off-by: Jonathan Tan 
> Reviewed-by: Jonathan Nieder 
> Signed-off-by: Junio C Hamano 
> ---
>  Documentation/fetch-options.txt |  3 ++-
>  Documentation/git-clone.txt |  8 
>  builtin/clone.c     |  6 ++
>  t/t5702-protocol-v2.sh  | 21 +
>  4 files changed, 37 insertions(+), 1 deletion(-)

This is indeed
Reviewed-by: Jonathan Nieder 

Thanks.


Re: [PATCH v2 1/2] transport: warn if server options are unsupported

2019-04-09 Thread Jonathan Nieder
Hi,

Jonathan Tan wrote:

> Server options were added in commit 5e3548ef16 ("fetch: send server
> options when using protocol v2", 2018-04-24), supported only for
> protocol version 2. Add a warning if server options are specified for
> the user if a legacy protocol is used instead.
>
> An effort is made to avoid printing the same warning more than once by
> clearing transport->server_options after the warning, but this does not
> fully avoid double-printing (for example, when backfilling tags using
> another fetch with a non-reusable transport).
>
> Signed-off-by: Jonathan Tan 
> ---
>  t/t5702-protocol-v2.sh | 17 +
>  transport.c|  8 
>  2 files changed, 25 insertions(+)

Thanks for writing this.  I'd be in favor of making this die().
Compare what we already do in push:

if (args->push_options && !push_options_supported)
	die(_("the receiving end does not support push options"));

What happens in the case of push with protocol v0, where server options
are supported?

[...]
> --- a/transport.c
> +++ b/transport.c
> @@ -252,6 +252,12 @@ static int connect_setup(struct transport *transport, int for_push)
>   return 0;
>  }
>  
> +static void warn_server_options_unsupported(struct transport *transport)
> +{
> + warning(_("Ignoring server options because protocol version does not support it"));

nits:
- error and warning messages tend to use lowercase
- the user running into this may want to know how to switch to a better
  protocol version.  Is there e.g. a manpage we can point them to?

For example:

fatal: server options require protocol version 2 or later
hint: see protocol.version in "git help config" for more details

> + transport->server_options = NULL;

Should this use a static to also warn only once in the tag-catchup case
you mentioned?
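
Something like the following, perhaps (a rough sketch, with git's
warning() helper replaced by plain stderr output for illustration):

    #include <stdio.h>

    static void warn_server_options_unsupported(void)
    {
        static int warned;

        if (warned)
            return;
        warned = 1;
        fprintf(stderr, "warning: server options require protocol "
                "version 2 or later\n");
    }

    int main(void)
    {
        warn_server_options_unsupported();
        warn_server_options_unsupported(); /* silent the second time */
        return 0;
    }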

> +}
> +
>  /*
>   * Obtains the protocol version from the transport and writes it to
>   * transport->data->version, first connecting if not already connected.
> @@ -286,6 +292,7 @@ static struct ref *handshake(struct transport *transport, int for_push,
>   break;
>   case protocol_v1:
>   case protocol_v0:
> + warn_server_options_unsupported(transport);
>   get_remote_heads(&reader, &refs,
>for_push ? REF_NORMAL : 0,
>&data->extra_have,
> @@ -363,6 +370,7 @@ static int fetch_refs_via_pack(struct transport *transport,
>   break;
>   case protocol_v1:
>   case protocol_v0:
> + warn_server_options_unsupported(transport);
>   refs = fetch_pack(&args, data->fd, data->conn,

Looks like this only affects fetch, so the question above about push
is only about the commit message.

With whatever subset of the suggested changes makes sense,
Reviewed-by: Jonathan Nieder 

Thanks.


Re: [PATCH] clone: send server options when using protocol v2

2019-04-06 Thread Jonathan Nieder
Jonathan Tan wrote:

> Commit 5e3548ef16 ("fetch: send server options when using protocol v2",
> 2018-04-24) taught "fetch" the ability to send server options when using
> protocol v2, but not "clone". This ability is triggered by "-o" or
> "--server-option".
>
> Teach "clone" the same ability, except that because "clone" already
> has "-o" for another parameter, teach "clone" only to receive
> "--server-option".

Can you give an example of what this would be used for?  An example I
can think of might be

git clone --server-option=priority=batch <url>

> Signed-off-by: Jonathan Tan 
> ---
>  Documentation/git-clone.txt |  7 +++
>  builtin/clone.c |  6 ++
>  t/t5702-protocol-v2.sh  | 11 +++
>  3 files changed, 24 insertions(+)

Thanks.

[...]
> --- a/builtin/clone.c
> +++ b/builtin/clone.c
> @@ -66,6 +66,7 @@ static int option_dissociate;
>  static int max_jobs = -1;
>  static struct string_list option_recurse_submodules = STRING_LIST_INIT_NODUP;
>  static struct list_objects_filter_options filter_options;
> +static struct string_list server_options = STRING_LIST_INIT_DUP;

The other string-list options in this file all use NODUP.  Is there a
reason this one uses DUP instead?  (Just curious --- I suspect either
would work fine, since nothing here does tricks with modifying argv
entries after option parsing.)

The same question applies to the corresponding option in
builtin/fetch.c, so while it is not likely to matter in practice, it
would be nice for readability to find out.
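
To spell out the difference being asked about, here is a self-contained
sketch (this mimics the idea only; it is not git's actual string_list):

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    /* toy one-slot "list", enough to show DUP vs NODUP semantics */
    struct slot { char *s; };

    static void append_nodup(struct slot *l, char *s) { l->s = s; }
    static void append_dup(struct slot *l, const char *s) { l->s = strdup(s); }

    int main(void)
    {
        char arg[] = "hello";
        struct slot nodup, dup;

        append_nodup(&nodup, arg); /* borrows the caller's buffer */
        append_dup(&dup, arg);     /* takes a private copy */

        arg[0] = 'H';              /* caller reuses the buffer later */
        printf("%s vs %s\n", nodup.s, dup.s); /* "Hello vs hello" */
        free(dup.s);
        return 0;
    }

So NODUP is fine for argv-style strings that outlive the list, while
DUP is the safe choice if the source buffer might change or go away.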

>  
>  static int recurse_submodules_cb(const struct option *opt,
>const char *arg, int unset)
> @@ -137,6 +138,8 @@ static struct option builtin_clone_options[] = {
>  N_("separate git dir from working tree")),
>   OPT_STRING_LIST('c', "config", &option_config, N_("key=value"),
>   N_("set config inside the new repository")),
> + OPT_STRING_LIST(0, "server-option", &server_options, N_("server-specific"),

nit: long line

> + N_("option to transmit")),
[...]
> --- a/t/t5702-protocol-v2.sh
> +++ b/t/t5702-protocol-v2.sh
> @@ -251,6 +251,17 @@ test_expect_success 'server-options are sent when fetching' '
>   grep "server-option=world" log
>  '
>  
> +test_expect_success 'server-options are sent when cloning' '
> + test_when_finished "rm -rf log myclone" &&
> +
> + GIT_TRACE_PACKET="$(pwd)/log" git -c protocol.version=2 \
> + clone --server-option=hello --server-option=world \
> + "file://$(pwd)/file_parent" myclone &&
> +
> + grep "server-option=hello" log &&
> + grep "server-option=world" log
> +'

Nice.  Thanks for including this.

Reviewed-by: Jonathan Nieder 

Thanks for a pleasant read.


Re: [PATCH v2] t5551: mark half-auth no-op fetch test as v0-only

2019-04-06 Thread Jonathan Nieder
Hi,

Jonathan Tan wrote:

> When using protocol v0, upload-pack over HTTP permits a "half-auth"
> configuration in which, at the web server layer, the info/refs path is
> not protected by authentication but the git-upload-pack path is, so that
> a user can perform fetches that do not download any objects without
> authentication, but still needs authentication to download objects.
>
> But protocol v2 does not support this, because both ref and pack are
> obtained from the git-upload-pack path.
>
> Mark the test verifying this behavior as protocol v0-only, with a
> description of what needs to be done to make v2 support this.
>
> Signed-off-by: Jonathan Tan 

Thanks for the analysis.  Makes sense.

> --- a/t/t5551-http-fetch-smart.sh
> +++ b/t/t5551-http-fetch-smart.sh
> @@ -164,7 +164,17 @@ test_expect_success 'clone from auth-only-for-objects repository' '
>  
>  test_expect_success 'no-op half-auth fetch does not require a password' '
>   set_askpass wrong &&
> - git --git-dir=half-auth fetch &&
> +
> + # NEEDSWORK: When using HTTP(S), protocol v0 supports a "half-auth"
> + # configuration with authentication required only when downloading
> + # objects and not refs, by having the HTTP server only require
> + # authentication for the "git-upload-pack" path and not "info/refs".
> + # This is not possible with protocol v2, since both objects and refs
> + # are obtained from the "git-upload-pack" path. A solution to this is
> + # to teach the server and client to be able to inline ls-refs requests
> + # as an Extra Parameter (see pack-protocol.txt), so that "info/refs"
> + # can serve refs, just like it does in protocol v0.
> + GIT_TEST_PROTOCOL_VERSION=0 git --git-dir=half-auth fetch &&
>   expect_askpass none

I suspect it's fine if protocol v2 never supports this.  Can we change
the NEEDSWORK comment to say that the protocol v2 spec should document
the lack of support for half-auth?

With or without such a change,
Reviewed-by: Jonathan Nieder 

Thanks.


Re: [PATCH v2 4/7] trace2: use system config for default trace2 settings

2019-04-02 Thread Jonathan Nieder
Josh Steadmon wrote:

> A question for the list: should these new config vars also be documented
> in the git-config manpage, or is it better to keep these separate since
> they are only read from the system config?

Yes, they should be in the git-config manpage.

*If* they are read only from the system config, the documentation
there should say so.

Thanks,
Jonathan


Re: [PATCH v2 4/7] trace2: use system config for default trace2 settings

2019-04-02 Thread Jonathan Nieder
Hi,

Jeff Hostetler via GitGitGadget wrote:

> Teach git to read the system config (usually "/etc/gitconfig") for
> default Trace2 settings.  This allows system-wide Trace2 settings to
> be installed and inherited to make it easier to manage a collection of
> systems.

Yay!  Thanks for writing this, and sorry for the slow review.

[...]
> Only the system config file is used.  Trace2 config values are ignored
> in local, global, and other config files.  Likewise, the "-c" command
> line arguments are ignored for Trace2 values.  These limits are for
> performance reasons.

Can you say a bit more about this?  If I'm willing to pay the
performance cost, is there a way for me to turn on respecting other
config files?  What is that performance cost?

[...]
> --- a/Documentation/technical/api-trace2.txt
> +++ b/Documentation/technical/api-trace2.txt
> @@ -117,6 +117,37 @@ values are recognized.
>  Socket type can be either `stream` or `dgram`.  If the socket type is
>  omitted, Git will try both.
>  
> +== Trace2 Settings in System Config
> +
> +Trace2 also reads configuration information from the system config.
> +This is intended to help adminstrators to gather system-wide Git
> +performance data.
> +
> +Trace2 only reads the system configuration, it does not read global,
> +local, worktree, or `-c` config settings.

An additional limitation is that this doesn't appear to support
include.* directives.  Intended?

[...]
> --- a/t/t0210-trace2-normal.sh
> +++ b/t/t0210-trace2-normal.sh
[...]
> +MOCK=./mock_system_config
> +
> +test_expect_success 'setup mocked /etc/gitconfig' '
> + git config --file $MOCK trace2.normalTarget "$(pwd)/trace.normal" &&
> + git config --file $MOCK trace2.normalBrief 1
> +'
> +
> +test_expect_success 'using mock, normal stream, return code 0' '
> + test_when_finished "rm trace.normal actual expect" &&
> + GIT_TEST_TR2_SYSTEM_CONFIG="$MOCK" test-tool trace2 001return 0 &&

Tests run with GIT_CONFIG_NOSYSTEM=1 to protect themselves from any
system config the user has.

So this would be easier to test if we can use user-level config for
it.

[...]
> --- /dev/null
> +++ b/trace2/tr2_sysenv.c
> @@ -0,0 +1,128 @@
> +#include "cache.h"
> +#include "config.h"
> +#include "dir.h"
> +#include "tr2_sysenv.h"
> +
> +/*
> + * Each entry represents a trace2 setting.
> + * See Documentation/technical/api-trace2.txt
> + */
> +struct tr2_sysenv_entry {
> + const char *env_var_name;
> + const char *git_config_name;
> +
> + char *value;
> + unsigned int getenv_called : 1;
> +};
> +
> +/*
> + * This table must match "enum tr2_sysenv_variable" in tr2_sysenv.h.

Can we deduplicate to avoid the need to match?

Perhaps using C99 array initializers would help:

[TR2_SYSENV_CFG_PARAM] = { ... },
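
Spelled out a little more, the idea is something like this sketch (the
second enum value and all the strings here are made up for
illustration; only TR2_SYSENV_CFG_PARAM is from the patch):

    enum tr2_sysenv_variable {
        TR2_SYSENV_CFG_PARAM,
        TR2_SYSENV_NORMAL_TARGET,
        TR2_SYSENV_MUST_BE_LAST
    };

    struct tr2_sysenv_entry {
        const char *env_var_name;
        const char *git_config_name;
    };

    /*
     * Each slot is keyed by its enum value, so reordering the enum can
     * no longer silently desynchronize the two declarations.
     */
    static struct tr2_sysenv_entry tr2_sysenv_settings[] = {
        [TR2_SYSENV_CFG_PARAM]     = { "GIT_TR2_CONFIG_PARAMS",
                                       "trace2.configparams" },
        [TR2_SYSENV_NORMAL_TARGET] = { "GIT_TR2_NORMAL",
                                       "trace2.normaltarget" },
    };

    int main(void)
    {
        return !tr2_sysenv_settings[TR2_SYSENV_CFG_PARAM].env_var_name;
    }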

[...]
> +/*
> + * Load Trace2 settings from the system config (usually "/etc/gitconfig"
> + * unless we were built with a runtime-prefix).  These are intended to
> + * define the default values for Trace2 as requested by the administrator.
> + */
> +void tr2_sysenv_load(void)
> +{
> + const char *system_config_pathname;
> + const char *test_pathname;
> +
> + system_config_pathname = git_etc_gitconfig();
> +
> + test_pathname = getenv("GIT_TEST_TR2_SYSTEM_CONFIG");
> + if (test_pathname) {
> + if (!*test_pathname || !strcmp(test_pathname, "0"))
> + return; /* disable use of system config */
> +
> + /* mock it with given test file */
> + system_config_pathname = test_pathname;
> + }
> +
> + if (file_exists(system_config_pathname))
> + git_config_from_file(tr2_sysenv_cb, system_config_pathname,
> +  NULL);

This duplicates functionality from config.c and misses some features
along the way (e.g. support for include.*).

Would read_early_config work?  If we want a variant that doesn't
discover_git_directory, we can change that function to handle it.

For our needs at Google, it would be very helpful to have support for
include.* and reading settings from at least $HOME/.gitconfig in
addition to /etc/gitconfig (even better if it supports per-repo config
as well).  I believe it would simplify the code and tests, too.  If
there's anything I can do to help, please don't hesitate to ask.

Thanks,
Jonathan


Re: [PATCH 4/4] gc docs: downplay the usefulness of --aggressive

2019-03-18 Thread Jonathan Nieder
Hi,

Ævar Arnfjörð Bjarmason wrote:

> --- a/Documentation/git-gc.txt
> +++ b/Documentation/git-gc.txt
> @@ -41,10 +41,20 @@ OPTIONS
>  --aggressive::
>   Usually 'git gc' runs very quickly while providing good disk
>   space utilization and performance.  This option will cause
> - 'git gc' to more aggressively optimize the repository at the expense
> - of taking much more time.  The effects of this optimization are
> - persistent, so this option only needs to be used occasionally; every
> - few hundred changesets or so.
> + 'git gc' to more aggressively optimize the repository to save storage space
> + at the expense of taking much more time.

This part looks good.

> ++
> +Using this option may optimize for disk space at the expense of
> +runtime performance. See the `--depth` and `--window` documentation in
> +linkgit:git-repack[1]. It is not recommended that this option be used
> +to improve performance for a given repository without running tailored
> +performance benchmarks on it. It may make things better, or worse. Not
> +using this at all is the right trade-off for most users and their
> +repositories.

This part kind of feels like giving up.  Can we make --aggressive have
good runtime read performance so we don't have to hedge this way?
E.g. is this patch papering over a poor choice of --depth setting in
--aggressive?

> ++
> +The effects of this option are persistent to the extent that
> +`gc.autoPackLimit` and friends don't cause a consolidation of existing
> +pack(s) generated with this option.

Thanks and hope that helps,
Jonathan


Re: Shorthand for "$(git merge-base A B)"

2019-03-15 Thread Jonathan Nieder
Hi,

John Passaro wrote:

> I find myself fairly frequently doing something like "git log $(git
> merge-base A B)..C".

Hm.  Can you tell me more about the workflow / higher-level operation
where you do this?

Curious,
Jonathan


Re: Can't build first git commit

2019-03-04 Thread Jonathan Nieder
Hi,

Fabio Aiuto wrote:

> What a pity, it would have been very useful for me, to debug around
> that simple version, to understand how everything works.

Jeff King suggested how you can update the build command to get it
working.  In general, I think people sometimes overthink what is
involved in a build: something like

cc *.c

plus appropriate warning, optimization, debug, -D, and library flags
should work just fine.

If you haven't already read it, I also recommend reading
https://www.kernel.org/pub/software/scm/git/docs/user-manual.html#hacking-git.
If you have ideas for improving the text there (perhaps with suggested
build instructions?), patches to Documentation/user-manual.txt are
very welcome.

Thanks and happy hacking,
Jonathan


Re: [WIP 0/7] CDN offloading of fetch response

2019-02-28 Thread Jonathan Nieder
Hi,

Sorry for the slow followup.  Thanks for probing into the design ---
this should be useful for getting the docs to be clear.

Christian Couder wrote:

> So it's likely that users will want a way to host on such sites
> incomplete repos using CDN offloading to a CDN on another site. And
> then if the CDN is not accessible for some reason, things will
> completely break when users will clone.

I think this would be a broken setup --- we can make it clear in the
protocol and server docs that you should only point to a CDN for which
you control the contents, to avoid breaking clients.

That doesn't prevent adding additional features in the future e.g. for
"server suggested alternates" --- it's just that I consider that a
separate feature.

Using CDN offloading requires cooperation of the hosting provider.
It's a way to optimize how fetches work, not a way to have a partial
repository on the server side.

[...]
> On Tue, Feb 26, 2019 at 12:45 AM Jonathan Nieder  wrote:

>> This doesn't stop a hosting provider from using e.g. server options to
>> allow the client more control over how their response is served, just
>> like can be done for other features of how the transfer works (how
>> often to send progress updates, whether to prioritize latency or
>> throughput, etc).
>
> Could you give a more concrete example of what could be done?

What I mean is passing server options using "git fetch --server-option".
For example:

git fetch -o priority=BATCH origin master

or

git fetch -o avoid-cdn=badcdn.example.com origin master

The interpretation of server options is up to the server.

>> What the client *can* do is turn off support for packfile URLs in a
>> request completely.  This is required for backward compatibility and
>> allows working around a host that has configured the feature
>> incorrectly.
>
> If the full content of a repo is really large, the size of a full pack
> file sent by an initial clone could be really big and many client
> machines could not have enough memory to deal with that. And this
> suppose that repo hosting providers would be ok to host very large
> repos in the first place.

Do we require the packfile to fit in memory?  If so, we should fix
that (to use streaming instead).

Thanks,
Jonathan


Re: [FIXUP] Fixup on tip of jt/http-auth-proto-v2-fix

2019-02-25 Thread Jonathan Nieder
Hi,

Jonathan Tan wrote:

> Thanks, Peff, for noticing this. It's because the client sometimes sends
> "" as a single request (that is, it flushes, and then before it
> sends any data, it flushes again). And post_rpc() assumes that it can
> always read something - which is usually correct, but not in this case;
> we read in stateless_connect() first, and if we read "", we need to
> tell post_rpc() to not read at all.
>
> This is a fixup on the tip of jt/http-auth-proto-v2-fix that fixes that.
>
> As for why the client sends a lone "", I'm not sure, but that's
> outside the scope of this patch set, I think.
>
> Signed-off-by: Jonathan Tan 
> ---
>  remote-curl.c | 32 +++-
>  1 file changed, 19 insertions(+), 13 deletions(-)

Tested-by: Jonathan Nieder 

Thanks for fixing it.

Is there a particular patch this should be squashed into, or does it
stand alone?  If the latter, mind writing a commit message for it?

Thanks,
Jonathan


Re: [WIP 7/7] upload-pack: send part of packfile response as uri

2019-02-25 Thread Jonathan Nieder
Hi,

Christian Couder wrote:
> On Sun, Feb 24, 2019 at 12:39 AM Jonathan Tan wrote:

>> Teach upload-pack to send part of its packfile response as URIs.
>>
>> An administrator may configure a repository with one or more
>> "uploadpack.blobpackfileuri" lines, each line containing an OID and a
>> URI. A client may configure fetch.uriprotocols to be a comma-separated
>> list of protocols that it is willing to use to fetch additional
>> packfiles - this list will be sent to the server.
>
> So only packfiles will be fetched by the client using those protocols.

Yes.

>> Whenever an object
>> with one of those OIDs would appear in the packfile transmitted by
>> upload-pack, the server may exclude that object, and instead send the
>> URI.
>
> Ok, so each URI sent in the packfile corresponds to exactly one
> object. And when the client fetches one such URI it gets a packfile
> that contains only the corresponding object. Or is there something I
> misunderstood?

I think it's worth separating the protocol and the server
implementation:

The protocol allows arbitrary packfiles --- they do not have to be
single-object packfiles.

This patch for a server implementation only uses it to serve
single-object packfiles.

Thanks,
Jonathan


Re: [WIP 0/7] CDN offloading of fetch response

2019-02-25 Thread Jonathan Nieder
Hi,

Christian Couder wrote:
> On Sun, Feb 24, 2019 at 12:39 AM Jonathan Tan wrote:

>> There are probably some more design discussions to be had:
>
> [...]
>
>>  - Client-side whitelist of protocol and hostnames. I've implemented
>>whitelist of protocol, but not hostnames.
>
> I would appreciate a more complete answer to my comments in:
>
> https://public-inbox.org/git/CAP8UFD16fvtu_dg3S_J9BjGpxAYvgp8SXscdh=tjb5jvabz...@mail.gmail.com/
>
> Especially I'd like to know what should the client do if they find out
> that for example a repo that contains a lot of large files is
> configured so that the large files should be fetched from a CDN that
> the client cannot use? Is the client forced to find or setup another
> repo configured differently if the client still wants to use CDN
> offloading?

The example from that message:

  For example I think the Great Firewall of China lets people in China
  use GitHub.com but not Google.com. So if people start configuring
  their repos on GitHub so that they send packs that contain Google.com
  CDN URLs (or actually anything that the Firewall blocks), it might
  create many problems for users in China if they don't have a way to
  opt out of receiving packs with those kind of URLs.

But the same thing can happen with redirects, with embedded assets in
web pages, and so on.  I think in this situation the user would likely
(and rightly) blame the host (github.com) for requiring access to a
separate inaccessible site, and the problem could be resolved with
them.

The beauty of this is that it's transparent to the client: the fact
that packfile transfer was offloaded to a CDN is an implementation
detail, and the server takes full responsibility for it.

This doesn't stop a hosting provider from using e.g. server options to
allow the client more control over how their response is served, just
like can be done for other features of how the transfer works (how
often to send progress updates, whether to prioritize latency or
throughput, etc).

What the client *can* do is turn off support for packfile URLs in a
request completely.  This is required for backward compatibility and
allows working around a host that has configured the feature
incorrectly.

Thanks for an interesting example,
Jonathan


Re: [PATCH 1/1] mingw: drop MakeMaker reference

2019-02-25 Thread Jonathan Nieder
Johannes Schindelin via GitGitGadget wrote:

> From: Johannes Schindelin 
>
> In 20d2a30f8ffe (Makefile: replace perl/Makefile.PL with simple make
> rules, 2017-12-10), Git stopped using MakeMaker. Therefore, that
> definition in the MINGW-specific section became useless.
>
> Signed-off-by: Johannes Schindelin 
> ---
>  config.mak.uname | 1 -
>  1 file changed, 1 deletion(-)

Yay!

Reviewed-by: Jonathan Nieder 

Is there a way to automate checking for make variables that are set
but never used?

Thanks,
Jonathan


Re: [Suggestion] Add Skip for t9020

2019-02-21 Thread Jonathan Nieder
Hi,

Randall S. Becker wrote:

> While this is a bit of a hack, it might be useful for skipping t9020 in
> environments where the svn.remote package is not installed. I can make this
> into a patch if this style is reasonable - guessing probably not and that
> the REMOTE_SVN test should go elsewhere if it is called that.
>
> diff --git a/t/t9020-remote-svn.sh b/t/t9020-remote-svn.sh
> index 6fca08e5e3..31edf99371 100755
> --- a/t/t9020-remote-svn.sh
> +++ b/t/t9020-remote-svn.sh
> @@ -12,6 +12,12 @@ then
> test_done
>  fi
>
> +python -c "import svn.remote" 2>/dev/null >/dev/null
> +if [ $? -eq 0 ]; then
> +   test_set_prereq REMOTE_SVN
> +fi
> +if ! test_have_prereq REMOTE_SVN
> +then
> +   skip_all='skipping remote-svn tests, python svn.remote not available'
> +   test_done
> +fi

Interesting.  Where do we use the svn.remote package?  I did a quick
grep and didn't find any instances.

Do you have output from running "./t9020-remote-svn.sh -v -i"?

Thanks,
Jonathan


Re: [PATCH] protocol-capabilities.txt: document symref

2019-02-11 Thread Jonathan Nieder
Josh Steadmon wrote:

> In 7171d8c15f ("upload-pack: send symbolic ref information as
> capability"), we added a symref capability to the pack protocol, but it
> was never documented. Adapt the patch notes from that commit and add
> them to the capabilities documentation.
>
> Signed-off-by: Josh Steadmon 
> ---
>  Documentation/technical/protocol-capabilities.txt | 14 ++
>  1 file changed, 14 insertions(+)

Reviewed-by: Jonathan Nieder 

Thanks.


Re: [PATCH 7/9] submodule: migrate get_next_submodule to use repository structs

2019-02-01 Thread Jonathan Nieder
Hi,

Stefan Beller wrote:

> This patch tightens the check upfront, such that we do not need
> to spawn a child process to find out if the submodule is broken.

Sounds sensible.

[...]
> --- a/submodule.c
> +++ b/submodule.c
[...]
> @@ -1319,10 +1338,23 @@ static int get_next_submodule(struct child_process *cp,
>   argv_array_push(&cp->args, default_argv);
>   argv_array_push(&cp->args, "--submodule-prefix");
>   argv_array_push(&cp->args, submodule_prefix.buf);
> +
> + repo_clear(repo);
> + free(repo);
>   ret = 1;
> + } else {
> + /*
> +  * An empty directory is normal,
> +  * the submodule is not initialized
> +  */
> + if (S_ISGITLINK(ce->ce_mode) &&
> + !is_empty_dir(ce->name)) {

What if the directory is nonempty (e.g. contains build artifacts)?

> + spf->result = 1;
> + strbuf_addf(err,
> + _("Could not access submodule 
> '%s'"),
> + ce->name);
> + }

Should this exit the loop?  Otherwise, multiple "Could not access"
messages can go into the same err string as one big concatenated line.

Thanks,
Jonathan


Re: [PATCH (Apple Git) 12/13] Enable support for Xcode.app-bundled gitconfig

2019-01-30 Thread Jonathan Nieder
Hi,

Jeremy Huddleston Sequoia wrote:

> Unfortunately, I was quick to celebrate.  This picks up the bundled
> file instead of a system-wide file.  I'd love it if we could still
> honor system-wide config/attributes in addition to
> RUNTIME_PREFIX-relative ones (eg: user overrides system which
> overrides distribution).  I worry that as is, we'd stop referencing
> the system-wide configs which might confuse users.
> 
> Is that something you'd be interested in, or should we just continue
> to maintain our separate patches?

For the internal deployment at Google, what we've done is to put an
[include] path directive in the global gitconfig:

[include]
path = /usr/share/git-core/config

Users can edit the global git config in etc, but the distributed
config at /usr/share/git-core/config is read-only as part of the
distributed package.

We considered making an upstream change to bake in the distributed
config in the git binary but decided that this way is a little
nicer since it lets people comment out the include.path setting if
they want to e.g. for experimentation.  It's also more explicit
(hence easier to understand).

Would a similar approach work for your setup?  Can you say a little
more about how you'd like things to work from an end-user pov?

Thanks,
Jonathan


Re: [PATCH v3 4/4] tests: define GIT_TEST_SIDEBAND_ALL

2019-01-29 Thread Jonathan Nieder
Hi,

Jonathan Tan wrote:

> --- a/t/lib-httpd/apache.conf
> +++ b/t/lib-httpd/apache.conf
> @@ -78,6 +78,7 @@ PassEnv GNUPGHOME
>  PassEnv ASAN_OPTIONS
>  PassEnv GIT_TRACE
>  PassEnv GIT_CONFIG_NOSYSTEM
> +PassEnv GIT_TEST_SIDEBAND_ALL

This is producing

 [env:warn] [pid 248632] AH01506: PassEnv variable GIT_TEST_SIDEBAND_ALL was undefined

when tests are run with "prove".

Should we set the envvar unconditionally in lib-httpd.sh?

Thanks,
Jonathan


Re: sparse job, was Re: [PATCH] test-xml-encode: fix sparse NULL pointer warnings

2019-01-28 Thread Jonathan Nieder
(+cc: Uwe Kleine-König, maintainer of the sparse package in Debian)
Hi,

Ramsay Jones wrote[1]:

> Hmm, I've never built an Ubuntu package before, so I don't know
> exactly what would be required (spec file etc.) to create a PPA.
> But I suspect you are not talking about doing that, right?

Ubuntu uses the sparse packaging as is from Debian, so no PPA should
be needed.  I've sent https://bugs.debian.org/920776 to get the
package updated in Debian.  If all is well, it should go smoothly.

Thanks,
Jonathan

[1] https://public-inbox.org/git/ef5a8623-e5e6-5cc8-5178-0afce7b54...@ramsayjones.plus.com/


Re: [PATCH v3 7/8] checkout: introduce --{,no-}overlay option

2019-01-24 Thread Jonathan Nieder
Junio C Hamano wrote:
> Jonathan Nieder  writes:

>> I find --ignore-removal fairly easy to understand, and I had no idea
>> what --overlay would mean.
>>
>> I realize this is just one user's experience.
>
> Exactly.  My impression was the exact opposite from yours.
>
> The phrase "removal" in the context of checkout does not click for
> me at all, and neither it does in the context of add, especially
> given that Git tracks states (i.e. snapshots), not changes.

Thanks.  What do you think of --skip-removals (or --skip-deletions)?
The idea is "among the changes that you would be making to the
worktree, skip any unlink() steps".

If that seems sensible, we can use it for "git checkout" and,
optionally, add it as a synonym for --ignore-removal to "git add" as
well.

Jonathan


Re: [PATCH v3 7/8] checkout: introduce --{,no-}overlay option

2019-01-23 Thread Jonathan Nieder
Hi,

Junio C Hamano wrote:
> Thomas Gummerer  writes:
>> Jonathan Nieder wrote:

>>> Is this analogous to "git add --ignore-removal"?  If so, can we just
>>> call it --ignore-removal?
>>
>> Yes, it seems like they are very similar.
>
> Hmm, I am not sure if the word "removal" makes sense in the context
> of "checkout", as "removal" is an _action_ just like "checking out"
> itself is, and not a _state_.  You'd check out a state out of a tree
> to the index and the working tree, so "checking out absence of a
> path" may make sense, though, as "absence of a path" is a state
> recorded in that source tree object.

I find --ignore-removal fairly easy to understand, and I had no idea
what --overlay would mean.

I realize this is just one user's experience.  I'd be happy to do a
little informal survey (e.g. taking the description from the manpage
and asking people to name the option) if that's useful.

See also https://dl.acm.org/citation.cfm?id=32212 on this subject.

> The word "removal" makes little sense in "git add --ignore-removal",
> but it and "git add --no-all" outlived their usefulness already, so
> it may not be worth _fixing_ it.  But I am mildly opposed to spread
> the earlier mistake to a new option.

I think that's a good place to end up: once we flip the default for
checkout, then --ignore-removal would be an obscure option in that
command as well.  The consistency with "git add" is just a bonus.

Thanks,
Jonathan


Re: [PATCH v3 7/8] checkout: introduce --{,no-}overlay option

2019-01-23 Thread Jonathan Nieder
Thomas Gummerer wrote:
> On 01/22, Jonathan Nieder wrote:

>> I had no idea what --overlay would mean and am still not clear on it.
>> Is this analogous to "git add --ignore-removal"?  If so, can we just
>> call it --ignore-removal?
>
> Yes, it seems like they are very similar.  I'm happy to rename the
> option.  The topic seems to have made it to 'next' already, so I'll
> submit the patches on top, unless reverting the topic out of next and
> replacing it is preferred?

A patch on top sounds good.

[...]
>> I'm nervous about the config with no associated warning or plan for
>> phasing it out.  It means that scripts using "git checkout" don't
>> get a consistent behavior unless they explicitly pass this option,
>> which didn't exist in older versions of Git --- in other words,
>> scripts have no real good option.  Can we plan a transition to
>> making --no-ignore-removal the default, in multiple steps?  For
>> example:
>
> As Junio mentioned, the plan was to just have this mode default when
> we introduce the new checkout-paths command.
>
> As checkout is a porcelain command, I had hoped it would be okay to
> also have this as a configuration option, for the time before
> 'checkout-paths' exists and while I'm getting used to actually typing
> 'checkout-paths' instead of 'checkout'.  However I get that there may
> be scripts that are using git checkout, and expect the previous
> behaviour, so I'm also okay with dropping the config option for now.

Yes, if we have no plan for flipping the default later, then I would
prefer to eliminate the config option.  Scripts very frequently use
human-facing commands like "git checkout" when they want the command
to produce (unparsable) friendly output to show to humans, and I don't
think we've provided a good alternative for that use case.

> If we still want to make this the default even after 'checkout-paths'
> exists, the plan you outline below sounds good to me, though maybe we
> can make the "flip the default" step once we decide to release git
> 3.0.

I would really like this, so I might write a series for it.  Please
don't wait for me, though --- feel free to send any patches you're
thinking about and we can work together or I can just appreciate your
work. ;-)

Sincerely,
Jonathan


Re: [PATCH v3 7/8] checkout: introduce --{,no-}overlay option

2019-01-22 Thread Jonathan Nieder
Hi,

Thomas Gummerer wrote:

> Currently 'git checkout' is defined as an overlay operation, which
> means that if in 'git checkout <tree-ish> -- [<pathspec>]' we have an
> entry in the index that matches <pathspec>, but that doesn't exist in
> <tree-ish>, that entry will not be removed from the index or the
> working tree.
>
> Introduce a new --{,no-}overlay option, which allows using 'git
> checkout' in non-overlay mode, thus removing files from the working
> tree if they do not exist in <tree-ish> but match <pathspec>.

This patch just hit my workstation.  Some initial thoughts:

I had no idea what --overlay would mean and am still not clear on it.
Is this analogous to "git add --ignore-removal"?  If so, can we just
call it --ignore-removal?

Thank you thank you thank you for working on this.  I run into this
all the time and am super excited about the "default to
--no-ignore-removal" future.

I'm nervous about the config with no associated warning or plan for
phasing it out.  It means that scripts using "git checkout" don't
get a consistent behavior unless they explicitly pass this option,
which didn't exist in older versions of Git --- in other words,
scripts have no real good option.  Can we plan a transition to
making --no-ignore-removal the default, in multiple steps?  For
example:

 1. First introduce the commandline option, as in this series

 2. Next, change the default to warn whenever the difference would
matter, printing a hint about how to configure to explicitly
request the old or new behavior.

 3. After a release or two has passed so people get a chance
to update their scripts, flip the default.

 4. Finally, remove the warning.

 5. Warn whenever the difference would matter when a user has
requested the old behavior through config, in preparation
for removing the config.

 6. Remove the config.

Steps 5 and 6 are optional but might be nice.
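
To make step 2 concrete, the check could look something like this
sketch (the config name checkout.ignoreRemoval and everything else
here is invented for illustration):

    #include <stdio.h>

    enum pref { PREF_UNSET = -1, PREF_OFF = 0, PREF_ON = 1 };

    /*
     * Step 2: warn only when the user has not configured an explicit
     * choice and the old and new defaults would actually differ.
     */
    static void maybe_warn(enum pref configured, int outcomes_differ)
    {
        if (configured != PREF_UNSET || !outcomes_differ)
            return;
        fprintf(stderr, "warning: the default for 'git checkout "
                "<tree-ish> -- <pathspec>' will change\n"
                "hint: set checkout.ignoreRemoval to true or false "
                "to silence this warning\n");
    }

    int main(void)
    {
        maybe_warn(PREF_UNSET, 1); /* warns */
        maybe_warn(PREF_ON, 1);    /* user made a choice: silent */
        return 0;
    }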

What do you think?

Thanks,
Jonathan


Re: Git bomb still present (at least in SUSE)?

2019-01-15 Thread Jonathan Nieder
(moving conversation to the main Git list.  I hope that's okay.)
Hi,

Jeff King wrote:
> On Tue, Jan 15, 2019 at 02:35:29PM +0100, Marketa Calabkova wrote:

>> meggy@irbis:/tmp/test> /usr/bin/time git clone
>> https://github.com/Katee/git-bomb.git
>> Cloning into 'git-bomb'...
>> remote: Enumerating objects: 18, done.
>> remote: Total 18 (delta 0), reused 0 (delta 0), pack-reused 18
>> Unpacking objects: 100% (18/18), done.
>> ^Cwarning: Clone succeeded, but checkout failed.
[...]
>   git clone --bare https://github.com/Katee/git-bomb.git
>   cd git-bomb.git
>   git read-tree HEAD ;# yikes!
>
> So I don't think there's a bug per se. It is possible that Git could
> have better protections against maliciously gigantic repositories, but I
> don't think anybody is actively working on such a feature (and it would
> involve much more than this case; it's pretty easy to generate trees
> with pessimal diffs, etc).

One thing I think interested people could do is introduce some kind of
"limits" subsystem into Git, so that a person could configure Git to
refuse to even try when it notices that an operation is going to be
sufficiently expensive.  I.e. something similar to what rlimits (or
other limits e.g. enforced in cgroups) provide in an operating system.
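
Purely as illustration, such a limit check might look like the
following sketch (the config name limit.maxTreeEntries and the rest
are invented; nothing like this exists in git today):

    #include <stdio.h>
    #include <stdlib.h>

    static long max_tree_entries = -1; /* -1: unlimited; set from config */

    /* refuse up front instead of grinding through a gigantic tree */
    static void check_tree_entry_limit(long n)
    {
        if (max_tree_entries < 0 || n <= max_tree_entries)
            return;
        fprintf(stderr, "fatal: tree has %ld entries but "
                "limit.maxTreeEntries is %ld\n", n, max_tree_entries);
        exit(128);
    }

    int main(void)
    {
        max_tree_entries = 1000000;
        check_tree_entry_limit(18);          /* fine */
        check_tree_entry_limit(2000000000L); /* the git-bomb case: dies */
        return 0;
    }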

That said, as alluded to in the last paragraph, there's also some
protection possible at the operating system level.

So my feeling is that there's some real potential for improvement
here, and I'm happy to help mentor anyone working on it if it is their
itch (because of the "can handle at another level" thing, it is not
mine).

Thanks,
Jonathan


Re: [Bug report] Git incorrectly selects language in macos

2019-01-15 Thread Jonathan Nieder
Nate Weaver wrote:

> Upon further digging, this is an issue in gettext's code on macOS:
> The function _nl_language_preferences_default (in langprefs.c) specifically
> breaks early when it sees the literal string "en" in the list (from the
> "AppleLanguages" defaults key), but not when it gets "en-US", etc.

Sorry I missed your other followup.

As you mentioned, the fix is at https://savannah.gnu.org/bugs/?49560.
There isn't a gettext release yet with that fix, so perhaps
https://github.com/Homebrew/homebrew-core/blob/master/Formula/gettext.rb
could be updated to apply that fix as a patch
(https://docs.brew.sh/Formula-Cookbook#patches).

Thanks again and sorry for the noise,
Jonathan


Re: [Bug report] Git incorrectly selects language in macos

2019-01-15 Thread Jonathan Nieder
Nate Weaver wrote:
>>> On Fri, Sep 14, 2018 at 10:20 PM Niko Dzhus  wrote:

 Looks like the issue appeared after updating git from brew.
[...]
 A quick search revealed that brew changed how it builds git recently.
 I think, it just didn't include i18n by default before, so I never
 noticed this.
[...]
> Upon further digging, this is an issue in gettext's code on macOS:
> The function _nl_language_preferences_default (in langprefs.c) specifically
> breaks early when it sees the literal string "en" in the list (from the
> "AppleLanguages" defaults key), but not when it gets "en-US", etc.
>
> I.e., it doesn't check for a prefix.
>
> This can be fixed (though there might be a better way) simply by replacing
> both instances of
>
> if (strcmp (buf, "en") == 0)
>
> with
>
> if (strncmp (buf, "en", 2) == 0)
>
> in gettext's langprefs.c.

Nice sleuthing.  Do you have a bug tracking this in homebrew's issue
tracker, either for the git formula or the gettext formula?

Thanks,
Jonathan


Re: [PATCH 1/5] compat/obstack: fix -Wcast-function-type warnings

2019-01-15 Thread Jonathan Nieder
Hi,

SZEDER Gábor wrote:
> On Fri, Jan 11, 2019 at 07:51:18PM +0100, SZEDER Gábor wrote:

>> As to re-importing obstack.{c,h} from upstream, we've made some
>> portability fixes to these files, and neither of the commit messages
>> of those fixes mention that they are backports from upstream.  OTOH,
>> one of those commits mentions platforms like
>> "i686-apple-darwin10-gcc-4.2.1 (GCC) 4.2.1, SunOS 5.10", which makes
>> me suspect that the re-import will be susceptible to those portability
>> issues again.  Therefore, I think re-importing these files from
>> upstream is beyond the scope of this patch series (and might not be
>> the right thing at all).
>
> gnulib's obstack.{c,h} doesn't fix the issues that we've fixed in
> 3254310863 (obstack.c: Fix some sparse warnings, 2011-09-11) and
> d190a0875f (obstack: Fix portability issues, 2011-08-28).  So if we
> were to re-import from gnulib, then these two patches would have to be
> applied on top yet again.

Thanks for looking into it.  The former looks applicable to upstream,
while the latter appears to do some Git-specific things (e.g. relying
on git-compat-util.h).

Mind if I send the former upstream?  I believe gnulib upstream relies
on copyright assignment, so it would help if you have a copyright
assignment for the project on file, but if not, they may consider it a
small enough change to take without.

Thanks,
Jonathan


Re: [PATCH v2 0/2] Accept error packets in any context

2019-01-14 Thread Jonathan Nieder
Jonathan Nieder wrote:
> Junio C Hamano wrote:

>> In short, are you shooting js/smart-http-detect-remote-error topic
>> down and replacing it with this one?
>>
>> As that topic is not yet in 'next', I am perfectly fine doing that.
>> I just want to make sure that is what you meant, as my reading of
>> [4] was a bit fuzzy.
>
> Josh, looking at that branch, I see:
>
>  remote-curl: die on server-side errors
>
>   Includes a test illustrating error handling in the ref
>   advertisement.  Should that be revived as a standalone patch,
>   without the remote-curl.c change?

In fact, it appears I have that locally:

 commit 7fe6abcd775dbffd919891d5838f8d125e41f160
 Author: Josh Steadmon 
 Date:   Tue Dec 11 16:25:18 2018 -0800

lib-httpd, t5551: check server-side HTTP errors

Add an HTTP path to the test server config that returns an ERR pkt-line
unconditionally. Verify in t5551 that remote-curl properly detects this
ERR message and aborts.

Signed-off-by: Josh Steadmon 
Signed-off-by: Jeff King 
Signed-off-by: Junio C Hamano 

It's handled fine by the merge 7be333a6362882e8ffceef3de830dbbfafe5
(Merge branch 'js/smart-http-detect-remote-error' into pu, 2019-01-11),
though.  So I think what is in "pu" is okay, without shooting any series
down.  (Alternatively we can combine them into a single five-patch
series, if the maintainer prefers.)

Thanks,
Jonathan


Re: [PATCH v2 0/2] Accept error packets in any context

2019-01-14 Thread Jonathan Nieder
Hi,

Junio C Hamano wrote:
> Masaya Suzuki  writes:

>> This makes it possible for servers to send an error message back to clients
>> in an arbitrary situation.

Yay!  Yes, this should simplify server implementations and user support.

[...]
> In short, are you shooting js/smart-http-detect-remote-error topic
> down and replacing it with this one?
>
> As that topic is not yet in 'next', I am perfectly fine doing that.
> I just want to make sure that is what you meant, as my reading of
> [4] was a bit fuzzy.

Josh, looking at that branch, I see:

 remote-curl: die on server-side errors

Includes a test illustrating error handling in the ref
advertisement.  Should that be revived as a standalone patch,
without the remote-curl.c change?

 remote-curl: tighten "version 2" check for smart-http

A bug fix.  We had an analogous bug in the .googlesource.com
servers and it was problematic when experimenting with
protocol changes using placeholder version numbers yyyymmdd
(since that starts with a "2").

 remote-curl: refactor smart-http discovery

A nice cleanup.

Thanks,
Jonathan


Re: [PATCHv2 0/9] Resending sb/submodule-recursive-fetch-gets-the-tip

2019-01-14 Thread Jonathan Nieder
Josh Steadmon wrote:

> I am not very familiar with most of the submodule code, but for what
> it's worth, this entire series looks good to me. I'll note that most of
> the commits caused some style complaints, but I'll leave it up to your
> judgement as to whether they're valid or not.
>
> Reviewed-by: Josh Steadmon 

For what it's worth, we've been running with this at Google (using the
sb/submodule-recursive-fetch-gets-the-tip branch from "pu") for more
than a month, with no user complaints.

Tested-by: Jonathan Nieder 


Re: [RFC] submodule: munge paths to submodule git directories

2019-01-14 Thread Jonathan Nieder
Hi,

In August, 2018, Brandon Williams wrote:

> Commit 0383bbb901 (submodule-config: verify submodule names as paths,
> 2018-04-30) introduced some checks to ensure that submodule names don't
> include directory traversal components (e.g. "../").
>
> This addresses the vulnerability identified in 0383bbb901 but the root
> cause is that we use submodule names to construct paths to the
> submodule's git directory.  What we really should do is munge the
> submodule name before using it to construct a path.

Thanks again for this.  I liked the proposal enough to run Git with
patches implementing it for a while.  That said, there were some
unaddressed comments in the review.

I've put a summary in https://crbug.com/git/28 to make this easier to
pick up where we left off.  Summary from there of the upstream review:

1. Using urlencoding to escape the slashes is fine, but what if we
   want to escape some other character (for example to handle
   case-insensitive filesystems)?

   Proposal: Store the escaping mapping in config[1] so it can be
   modified in the future (a sketch of the escaping rule follows this
   list):

[submodule "plugin/hooks"]
gitdirname = plugins%2fhooks

2. The urlencoded name could conflict with a submodule that has % in
   its name in an existing clone created by an older version of Git.

   Proposal: Put submodules in a new .git/submodules/ directory
   instead of .git/modules/.

3. These gitdirname settings can clutter up .git/config.

   Proposal: For the "easy" cases (e.g. submodule name consisting of
   [a-z]*), allow omitting the gitdirname setting.
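
To make point 1 concrete, the escaping rule itself could be as simple
as this sketch (an illustration of the idea, not the proposed patch;
the exact allowlist would be part of the design):

    #include <ctype.h>
    #include <stdio.h>

    /*
     * Escape everything outside a conservative allowlist so that any
     * submodule name maps to a single, safe directory entry.
     */
    static void print_gitdir_name(const char *name)
    {
        for (; *name; name++) {
            if (isalnum((unsigned char)*name) || *name == '-')
                putchar(*name);
            else
                printf("%%%02x", (unsigned char)*name);
        }
        putchar('\n');
    }

    int main(void)
    {
        print_gitdir_name("plugins/hooks"); /* prints plugins%2fhooks */
        return 0;
    }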

Is that a fair summary?  Are there concerns from the review that I
forgot, or would a new version of the series that addresses those
three problems put us in good shape?

Thanks,
Jonathan

[1] https://public-inbox.org/git/20180816181940.46114-1-bmw...@google.com/


Re: [PATCH 0/8] WIP: trace2: a new trace facility

2019-01-14 Thread Jonathan Nieder
Hi,

Jeff Hostetler wrote:

> This patch series contains a new trace2 facility that hopefully addresses
> the recent trace- and structured-logging-related discussions. The intent is
> to eventually replace the existing trace_ routines (or to route them to the
> new trace2_ routines) as time permits.

I've been running with these patches since last October.  A few
thoughts:

I like the API.

The logs are a bit noisy and especially wide.  For my use, the
function name is not too important since we can get that from the file
and line number.  Should we have a way to omit some fields, or is that
for post-processing?

We don't find the JSON easy to parse and would prefer a binary format.

When I apply the patches, Git complains about whitespace problems
(trailing whitespace, etc).

Aside from that kind of easily correctible issue (trailing
whitespace), I'd be in favor of taking these patches pretty much as-is
and making improvements in tree.  Any objections to that, or do you
have other thoughts on where this should go?

If that sounds reasonable to you, I can send a clean version of these
based against current "master".  If I understand correctly, then

 https://github.com/jeffhostetler/git

branch

 gvfs-trace2-v4

contains some improvements, so as a next step I'd try to extract those
as incremental patches on top.  What do you think?

Thanks,
Jonathan


[PATCH/RFC] fsck: complain when .gitignore and .gitattributes are symlinks

2019-01-14 Thread Jonathan Nieder
From: Jeff King 
Date: Sun, 13 May 2018 14:14:34 -0400

This case is already forbidden by verify_path(), so let's
check it in fsck. It's easier to handle than .gitmodules,
because we don't care about checking the blob content. This
is really just about whether the name and mode for the tree
entry are valid.

Signed-off-by: Jeff King 
Signed-off-by: Jonathan Nieder 
---
Hi,

This patch is from the 2.20.0 era, from the same series as

 fsck: detect submodule urls starting with dash

It was omitted from that series because it does not address any known
exploit, but to me it seems worthwhile anyway:

- if a client enables transfer.fsckObjects, this helps them protect
  themselves against weird input that does *not* have a known exploit
  attached, too

- it generally feels more simple and robust.  Git-related tools can
  benefit from this kind of check as an indication of input they can
  bail out on instead of trying to support.

Peff checked it against repos in the wild and found this to be very
rare but existent (e.g. https://github.com/acquia/blt has a
.gitattributes symlink).  Linus suggested that we may want it to be
INFO instead of ERROR, so that people can at least notice that their
.gitattributes symlink is likely to have no effect.  This patch still
uses ERROR because I suspect that this is rare enough in the wild that
people will be able to cope.

Thoughts?

Thanks,
Jonathan

 fsck.c | 15 +++
 1 file changed, 15 insertions(+)

diff --git a/fsck.c b/fsck.c
index 68502ce85b..850363fc8e 100644
--- a/fsck.c
+++ b/fsck.c
@@ -68,6 +68,8 @@ static struct oidset gitmodules_done = OIDSET_INIT;
 	FUNC(GITMODULES_SYMLINK, ERROR) \
 	FUNC(GITMODULES_URL, ERROR) \
 	FUNC(GITMODULES_PATH, ERROR) \
+	FUNC(GITIGNORE_SYMLINK, ERROR) \
+	FUNC(GITATTRIBUTES_SYMLINK, ERROR) \
 	/* warnings */ \
 	FUNC(BAD_FILEMODE, WARN) \
 	FUNC(EMPTY_NAME, WARN) \
@@ -627,6 +629,19 @@ static int fsck_tree(struct tree *item, struct fsck_options *options)
 					 ".gitmodules is a symbolic link");
 		}
 
+		if (S_ISLNK(mode)) {
+			if (is_hfs_dotgitignore(name) ||
+			    is_ntfs_dotgitignore(name))
+				retval += report(options, &item->object,
+						 FSCK_MSG_GITIGNORE_SYMLINK,
+						 ".gitignore is a symlink");
+			if (is_hfs_dotgitattributes(name) ||
+			    is_ntfs_dotgitattributes(name))
+				retval += report(options, &item->object,
+						 FSCK_MSG_GITATTRIBUTES_SYMLINK,
+						 ".gitattributes is a symlink");
+		}
+
 		if (update_tree_entry_gently(&desc)) {
 			retval += report(options, &item->object, FSCK_MSG_BAD_TREE, "cannot be parsed as a tree");
 			break;
-- 
2.20.1.97.g81188d93c3



Re: [PATCH] FYI / RFC: submodules: introduce repo-like workflow

2019-01-14 Thread Jonathan Nieder
Hi,

Stefan Beller wrote:

> Internally we have rolled out this as an experiment for
> "submodules replacing the repo tool[1]".

Thanks again for writing and explaining it.  Here's an updated
version, basically identical (just rebased).  "git range-diff" shows
mostly context lines affected, so this really is a minor update.

I suspect the *next* time this hits the list will be more interesting,
since we can start making use of the repository API.

Thoughts of all kinds welcome, as always.

-- >8 --
From: Stefan Beller 
Date: Mon, 4 Dec 2017 15:47:40 -0800

Internally we have rolled out this as an experiment for
"submodules replacing the repo tool[1]". The repo tool is described as:

Repo unifies Git repositories when necessary, performs uploads to the
Gerrit revision control system, and automates parts of the Android
development workflow. Repo is not meant to replace Git, only to make
it easier to work with Git in the context of Android. The repo command
is an executable Python script that you can put anywhere in your path.

In working with the Android source files, you use Repo for
across-network operations. For example, with a single Repo command you
can download files from multiple repositories into your local working
directory.

In most situations, you can use Git instead of Repo, or mix Repo and
Git commands to form complex commands.

[1] https://source.android.com/setup/develop/

Submodules can also be understood as unifying Git repositories; even more,
in the superproject the submodules have relationships between their
versions, which the repo tool only provides in releases.

The repo tool does not provide these relationships between versions, but
all the repositories (in case of Android think of ~1000 git repositories)
are put in their place without depending on each other.

This comes with a couple of advantages and disadvantages:

* Many users are familiar with Git, but not submodules. Each repository
  can be used independently with Git and there is no need to update the
  superproject or the repo manifest for a change in a repository.
* It is easy to work with repositories with no version-control-dependencies
  if there are dependencies in the code. In the case of Android the
  repositories are bound at natural boundaries. For example, the Linux
  kernel is one repository, as that keeps upstream work easy for this
  repository. So it is desirable to keep an easy-as-repo workflow.
* Fetching changes ("repo sync") needs to fetch all repositories, as there
  is no central place that tracks what has changed. In a superproject
  git fetch can determine which submodules need fetching.  In Androids
  case the daily change is only in a few repositories (think 10s), so
  migrating to a superproject would save an order of magnitude in fetch
  traffic for daily updates of developers.
* Sometimes when the dependencies are not on a clear repository boundary
  one would like to have git-bisect available across the different
  repositories, which repo cannot provide due to its design.

Internally we have Gerrit as a central point, where the source of
truth is found for a given repository.

This patch adds a new mode to submodule handling, where the superproject
controls the existence of the submodule (just as current submodule
handling), but the submodule HEAD is not detached; instead it follows the
same branch name as the superproject.

Current situation visualized:

  superproject
  HEAD -> "<branch>" -> OID
                         |
  submodule              v
  HEAD ----------------> OID

The OID in the submodule is controlled via the HEAD in the submodule, which
is set according to the gitlink in the superproject. Confusion can arise
when the (detached) HEAD in the submodule doesn't match the superproject's
gitlink.

This patch visualized:

  superproject
  HEAD -> "<branch>" -> OID
             |
  submodule  v
  HEAD -> "<branch>" -> OID

As there is a central point of truth in our setup (our Gerrit installation)
which keeps the superproject and the submodule branches in sync, this
ought to look the same for the user; removing the "detached HEAD" in the
submodule. git-status will still notice if there is an OID mismatch between
the gitlink and the submodule's branch, but that is a race condition and
should be caught by Gerrit.

This changes the following commands in the superproject:

  checkout -B/-b: create branches in subs, too
  checkout (-f): update branch in submodule (create if needed) and check
                 it out; pass the argument down literally if it is a branch
                 name (e.g. "checkout -f master" will run a
                 "checkout -f master" in the submodule as well)
  clone: see checkout
  reset --hard: see checkout

Signed-off-by: Stefan Beller 
Signed-off-by:

  1   2   3   4   5   6   7   8   9   10   >