[PATCH v2 0/7] read/write_in_full leftovers

2017-09-26 Thread Jeff King
On Mon, Sep 25, 2017 at 04:26:47PM -0400, Jeff King wrote:

> This series addresses the bits leftover from the discussion two weeks
> ago in:
> 
>   
> https://public-inbox.org/git/20170913170807.cyx7rrpoyhaau...@sigill.intra.peff.net/
> 
> and its subthread. I don't think any of these is a real problem in
> practice, so this can be considered as a cleanup. I'm on the fence over
> whether the final one is a good idea. See the commit message for
> details.
> 
>   [1/7]: files-backend: prefer "0" for write_in_full() error check
>   [2/7]: notes-merge: drop dead zero-write code
>   [3/7]: read_in_full: reset errno before reading
>   [4/7]: get-tar-commit-id: prefer "!=" for read_in_full() error check
>   [5/7]: worktree: use xsize_t to access file size
>   [6/7]: worktree: check the result of read_in_full()
>   [7/7]: add xread_in_full() helper

Here's a followup based on the discussion on v1. There are actually a
few changes here.

I dropped the "read_in_full() should set errno on short reads" idea (3/7
in the earlier series). It really is the caller's fault for looking at
errno when they know there hasn't been an error in the first place. We
should just bite the bullet and have the callers do the right thing.

I also dropped the "xread_in_full" helper (7/7 earlier). The lego
sentences it created just weren't worth the hassle. Instead, I've fixed
all of the relevant callers to provide good error messages for both
cases. It's a few more lines of code, and it's probably rare for users
to see these in the first place. But it doesn't hurt too much to be
thorough, and I think it's good to model correct error handling. This is
in patches 4 and 5 below.

There are a handful of other changes:

  [1/7]: files-backend: prefer "0" for write_in_full() error check
  [2/7]: notes-merge: drop dead zero-write code

These two are the same as before.

  [3/7]: prefer "!=" when checking read_in_full() result

Along with the get-tar-commit-id case, I found two other cases which
aren't bugs, but which should be updated for style reasons.

  [4/7]: avoid looking at errno for short read_in_full() returns
  [5/7]: distinguish error versus short_read from read_in_full()

These two are new, and add in the messages. The ones in patch 4 are
actively wrong (they show a bogus errno). The ones in 5 are just
less descriptive than they could be. So if we find the extra lines
of code too much, we could drop 5 (or even convert the ones in 4 to
just stop looking at errno).

  [6/7]: worktree: use xsize_t to access file size

Same as before.

  [7/7]: worktree: check the result of read_in_full()

Similar, but now handles errors and short_reads separately. It also
avoids leaking descriptors and memory in the error path (noticed by
Coverity).

 builtin/get-tar-commit-id.c |  6 --
 builtin/worktree.c  | 24 +---
 bulk-checkin.c  |  5 -
 csum-file.c |  2 +-
 notes-merge.c   |  2 --
 pack-write.c|  7 ++-
 packfile.c  | 11 +--
 pkt-line.c  |  2 +-
 refs/files-backend.c|  2 +-
 sha1_file.c | 11 ---
 10 files changed, 55 insertions(+), 17 deletions(-)

-Peff


Re: [PATCH v2 6/9] connect: teach client to recognize v1 server response

2017-09-26 Thread Junio C Hamano
Brandon Williams  writes:

> +/* Returns 1 if packet_buffer is a protocol version pkt-line, 0 otherwise. */
> +static int process_protocol_version(void)
> +{
> + switch (determine_protocol_version_client(packet_buffer)) {
> + case protocol_v1:
> + return 1;
> + case protocol_v0:
> + return 0;
> + default:
> + die("server is speaking an unknown protocol");
> + }
> +}

For the purpose of "technology demonstration" v1 protocol, it is OK
to discard the result of "determine_pvc()" like the above code, but
in a real application, we would do a bit more than just ignoring an
extra "version #" packet that appears at the beginning, no?

It would be sensible to design how the result of determien_pvc()
call is propagated to the remainder of the program in this patch and
implement it.  Perhaps add a new global (like server_capabilities
already is) and store the value there, or something?  Or pass a
pointer to enum protocol_version as a return-location parameter to
this helper function so that the process_capabilities() can pass a
pointer to its local variable?

>  static void process_capabilities(int *len)
>  {
> @@ -224,12 +239,19 @@ struct ref **get_remote_heads(int in, char *src_buf, 
> size_t src_len,
>*/
>   int responded = 0;
>   int len;
> - int state = EXPECTING_FIRST_REF;
> + int state = EXPECTING_PROTOCOL_VERSION;
>  
>   *list = NULL;
>  
>   while ((len = read_remote_ref(in, _buf, _len, ))) {
>   switch (state) {
> + case EXPECTING_PROTOCOL_VERSION:
> + if (process_protocol_version()) {
> + state = EXPECTING_FIRST_REF;
> + break;
> + }
> + state = EXPECTING_FIRST_REF;
> + /* fallthrough */
>   case EXPECTING_FIRST_REF:
>   process_capabilities();
>   if (process_dummy_ref()) {


Re: [PATCH v2 5/9] upload-pack, receive-pack: introduce protocol version 1

2017-09-26 Thread Junio C Hamano
Brandon Williams  writes:

> @@ -1963,6 +1964,19 @@ int cmd_receive_pack(int argc, const char **argv, 
> const char *prefix)
>   else if (0 <= receive_unpack_limit)
>   unpack_limit = receive_unpack_limit;
>  
> + switch (determine_protocol_version_server()) {
> + case protocol_v1:
> + if (advertise_refs || !stateless_rpc)
> + packet_write_fmt(1, "version 1\n");
> + /*
> +  * v1 is just the original protocol with a version string,
> +  * so just fall through after writing the version string.
> +  */
> + case protocol_v0:
> + default:
> + break;

When protocol_v2 is introduced in the other part of the codebase
(i.e. in protocol.[ch]), until these lines are updated accordingly
to take care of the new protocol, we'd pretend that client asked
(and the server accepted) v0, even though the client and the daemon
agreed to talk v2.

Shouldn't the "default:" die instead?  The same for upload-pack.c


Re: [PATCH v2 4/9] daemon: recognize hidden request arguments

2017-09-26 Thread Junio C Hamano
Brandon Williams  writes:

> A normal request to git-daemon is structured as
> "command path/to/repo\0host=..\0" and due to a bug in an old version of
> git-daemon 73bb33a94 (daemon: Strictly parse the "extra arg" part of the
> command, 2009-06-04) we aren't able to place any extra args (separated
> by NULs) besides the host.

It's a bit unclear if that commit _introduced_ a bug, or just
noticed an old bug and documented it in its log message.  How does
that commit impact the versons of Git that the updated code is
capable of interracting with?

> +static void parse_extra_args(struct hostinfo *hi, struct argv_array *env,
> +  char *extra_args, int buflen)
> +{
> + const char *end = extra_args + buflen;
> + struct strbuf git_protocol = STRBUF_INIT;
> +
> + /* First look for the host argument */
> + extra_args = parse_host_arg(hi, extra_args, buflen);
> +
> + /* Look for additional arguments places after a second NUL byte */
> + for (; extra_args < end; extra_args += strlen(extra_args) + 1) {
> + const char *arg = extra_args;
> +
> + /*
> +  * Parse the extra arguments, adding most to 'git_protocol'
> +  * which will be used to set the 'GIT_PROTOCOL' envvar in the
> +  * service that will be run.
> +  *
> +  * If there ends up being a particular arg in the future that
> +  * git-daemon needs to parse specificly (like the 'host' arg)
> +  * then it can be parsed here and not added to 'git_protocol'.
> +  */
> + if (*arg) {
> + if (git_protocol.len > 0)
> + strbuf_addch(_protocol, ':');
> + strbuf_addstr(_protocol, arg);
> + }
> + }
> +
> + if (git_protocol.len > 0)
> + argv_array_pushf(env, GIT_PROTOCOL_ENVIRONMENT "=%s",
> +  git_protocol.buf);
> + strbuf_release(_protocol);
>  }


Re: -s theirs use-case(s) Was: BUG: merge -s theirs is not in effect

2017-09-26 Thread Yaroslav Halchenko

On Wed, 27 Sep 2017, Junio C Hamano wrote:

> > and that is where the gotcha comes -- what if "my" changes were already
> > published?  then I would like to avoid the rebase, and would -s theirs
> > to choose "their" solution in favor of mine and be able to push so
> > others could still "fast-forward" to the new state.

> > So -- as to me it remains 'symmetric' ;)

> I do not necessarily agree.  Once you decide that their history is
> the mainline, you'd rather want to treat your line of development as
> a side branch and make a merge in that direction, i.e. the first
> parent of the resulting merge is a commit on their history and the
> second parent is the last bad one of your history.  So you would end
> up using "checkout their-history && merge -s ours your-history" to
> keep the first-parenthood sensible.

> And at that point, use of "-s ours" is no longer a workaround for
> lack of "-s theirs".  It is a proper part of the desired semantics,
> i.e. from the point of view of the surviving canonical history line,
> you want to preserve what it did, nullifying what the other line of
> history did.

> So I still do not think the above scenario justifies "-s theirs".

ok, when you describe it like this (in my case I rarely cared about the
side of the merge), then indeed I might better do the entire dance with
git reset --hard theirstate; git merge -s ours HEAD@{1}
and live happily with the left side being the one always correct and
hide "my" mistakes ;)  will keep it in mind

-- 
Yaroslav O. Halchenko
Center for Open Neuroscience http://centerforopenneuroscience.org
Dartmouth College, 419 Moore Hall, Hinman Box 6207, Hanover, NH 03755
Phone: +1 (603) 646-9834   Fax: +1 (603) 646-1419
WWW:   http://www.linkedin.com/in/yarik


Re: [PATCH v2 3/9] protocol: introduce protocol extention mechanisms

2017-09-26 Thread Junio C Hamano
Brandon Williams  writes:

> +`GIT_PROTOCOL`::
> + For internal use only.  Used in handshaking the wire protocol.
> + Contains a colon ':' separated list of keys with optional values
> + 'key[=value]'.  Presence of unknown keys must be tolerated.

Is this meant to be used only on the "server" end?  Am I correct to
interpret "handshaking" to mean the initial connection acceptor
(e.g. "git daemon") uses it to pass what it decided to the programs
that implement the service (e.g. "git receive-pack")?

> +/*
> + * Environment variable used in handshaking the wire protocol.
> + * Contains a colon ':' separated list of keys with optional values
> + * 'key[=value]'.  Presence of unknown keys must be tolerated.
> + */
> +#define GIT_PROTOCOL_ENVIRONMENT "GIT_PROTOCOL"

"Must be tolerated" feels a bit strange.  When somebody asks you to
use "version 3" or "version 1 variant 2", when you only know
"version 0" or "version 1" and you are not yet even aware of the
concept of "variant", we simply ignore "variant=2" as if it wasn't
there, even though "version=3" will be rejected (because we know of
"version"; it's just that we don't know "version=3").

> +enum protocol_version determine_protocol_version_server(void)
> +{
> + const char *git_protocol = getenv(GIT_PROTOCOL_ENVIRONMENT);
> + enum protocol_version version = protocol_v0;
> +
> + if (git_protocol) {
> + struct string_list list = STRING_LIST_INIT_DUP;
> + const struct string_list_item *item;
> + string_list_split(, git_protocol, ':', -1);
> +
> + for_each_string_list_item(item, ) {
> + const char *value;
> + enum protocol_version v;
> +
> + if (skip_prefix(item->string, "version=", )) {
> + v = parse_protocol_version(value);
> + if (v > version)
> + version = v;
> + }
> + }
> +
> + string_list_clear(, 0);
> + }
> +
> + return version;
> +}

This implements "the largest one wins", not "the last one wins".  Is
there a particular reason why the former is chosen?



Re: [PATCH 3/7] read_in_full: reset errno before reading

2017-09-26 Thread Junio C Hamano
Jeff King  writes:

> You actually don't need errno for that. You can write:
>
>   ret = read_in_full(..., size);
>   if (ret < 0)
>   die_errno("oops");
>   else if (ret != size)
>   die("short read");
>
> So I think using errno as a sentinel value to tell between the two cases
> doesn't have much value.
> ...
> One of the reasons I dislike the helper I showed earlier is that it
> involves assembling a lego sentence. But the alternative is requiring
> the caller to write both sentences from scratch (and writing a good
> sentence for the short read really is long: you expected X bytes but got
> Y).

All sensible considerations.  I probably should have read the entire
discussion twice before starting to type X-<.  Sorry about the
noise.

> Sometimes a custom "short read" message can be better, if it's "could
> not read enough bytes for packfile header" or similar. You don't need to
> know the exact number then.

That's true, too.  I dunno, either.


Re: [PATCH 3/7] read_in_full: reset errno before reading

2017-09-26 Thread Jeff King
On Tue, Sep 26, 2017 at 01:11:59PM +0900, Junio C Hamano wrote:

> Jeff King  writes:
> 
> >> #ifndef EUNDERFLOW
> >> # ifdef ENODATA
> >> #  define EUNDERFLOW ENODATA
> >> # else
> >> #  define EUNDERFLOW ESPIPE
> >> # endif
> >> #endif
> >
> > Right, I think our mails just crossed but I'm leaning in this direction.
> 
> Hmph, I may be slow (or may be skimming the exchanges too fast), but
> what exactly is wrong with "0"?  As long as we do not have to tell
> two or more "not exactly an error from syscall in errno" cases, I
> would think "0" is the best value to use.

The main reason to avoid "0" is just that the "read error: Success"
message is a bit funny.

> If the syserror message _is_ the issue, then we'd need to either
> pick an existing errno that is available everywhere (with possibly
> suboptimal message), or pick something and prepare a fallback for
> platforms that lack the errno, so picking "0" as the value and use
> whatever logic we would have used for the "fallback" would not sound
> too bad.  I.e.
> 
>   if (read_in_full(..., size) != size)
>   if (errno)
>   die_errno("oops");
>   else
>   die("short read");
> 
> If the callsite were too numerous,

You actually don't need errno for that. You can write:

  ret = read_in_full(..., size);
  if (ret < 0)
die_errno("oops");
  else if (ret != size)
die("short read");

So I think using errno as a sentinel value to tell between the two cases
doesn't have much value.

> #define die_errno_or(msg1, msg2) if (errno) die_errno(msg1); else die(msg2)
> 
> perhaps?

If you just care about dying, you can wrap the whole read_in_full() in a
helper (which is what my original patch 7 did). But I found there were
cases that didn't want to die. Of course _most_ of those don't bother
looking at errno in the first place. I think the only one that does is
index_core(), which calls error_errno().

So with two helpers (one for die and one for error), I think we could
cover the existing cases. Or we could convert the single error() case to
just handle the logic inline.

One of the reasons I dislike the helper I showed earlier is that it
involves assembling a lego sentence. But the alternative is requiring
the caller to write both sentences from scratch (and writing a good
sentence for the short read really is long: you expected X bytes but got
Y).

I guess just tweaking the errno value does not really give a good "short
read" error either. You get "unable to read foo: No data available",
without the X/Y information.

Sometimes a custom "short read" message can be better, if it's "could
not read enough bytes for packfile header" or similar. You don't need to
know the exact number then.

So I dunno. Maybe I have just talked myself into actually just fixing
all of the read_in_full() call sites manually.

-Peff


Darlehensangebot

2017-09-26 Thread Norton Finance Company



Guten Tag Sir / ma,

Wir Norton Finanzierung Unternehmen hier in Großbritannien gibt 
Darlehen an Kunden mit 2,4% Zinssatz.
und auch Geld für Investitionen und andere finanzielle Zwecke unabhängig von 
ihrer Lage, Geschlecht, Familienstand, Bildung, Job-Status, sondern muss ein 
gesetzliches Mittel zur Rückzahlung .. Unser Darlehen reicht von 5000 Euro, 
nok, chf, - 5.000.000,00 Euro, nok , chf, mit 2,4% Zinssatz.

  
Interessierte Personen, Firmen und Firmen. Bitte kontaktieren Sie uns unter 
unserer E-Mail:

nortonfinancecre...@gmail.com

Wenn Sie sich für ein Darlehen interessieren Füllen Sie aus und senden Sie das 
Darlehen Antragsformular sofort.

INFORMATIONEN BENÖTIGT

Deine Namen:
Adresse: ...
Telefon: ...
Benötigte Menge: ...
Dauer: ...
Beruf: ...
Monatliches Einkommensniveau: ..
Geschlecht: ..
Geburtsdatum: ...
Bundesland: ...
Land: .
Zweck des Darlehens oder Überprüfung des Projekts .

KONTAKTIERE UNS

nortonfinancecre...@gmail.com

Danke für Ihr Verständnis; Wir dienen für Ihren Komfort.


Mit freundlichen Grüßen,
Dr. Helen Mathew


Re: [PATCH 1/1] fast-import: checkpoint: dump branches/tags/marks even if object_count==0

2017-09-26 Thread Junio C Hamano
"Eric Rannaud"  writes:

> Any comments on the testing strategy with a background fast-import?
>
> To summarize:
> - fast-import is started in the background with a command stream that
>   ends with "checkpoint\nprogress checkpoint\n". fast-import keeps
>   running after reaching the last command (we don't want fast-import to
>   terminate).
> - In the meantime, the test is waiting to read "progress checkpoint" in
>   the output of fast-import, then it checks the testing conditions.
> - Finally, the test ensures that fast-import is still running in the
>   background (and thus that it has just observed the effect of
>   checkpoint, and not the side effects of fast-import terminating).
> - After 10 sec, no matter what, the background fast-import is sent
>   "done" and terminates.
>
> I tried to make sure that the test runs quickly and does not (typically) 
> sleep.
> Upon failure, the test may take up to 10 sec to fully terminate.

H, it certainly looks a bit too brittle with many tweakables
like 10, 50, 55, etc. that heavily depend on the load on the system.

Sorry for asking you to go through the hoops.

> diff --git a/t/t9300-fast-import.sh b/t/t9300-fast-import.sh
> index 67b8c50a5ab4..b410bf3a3a7a 100755
> --- a/t/t9300-fast-import.sh
> +++ b/t/t9300-fast-import.sh
> @@ -3120,4 +3120,121 @@ test_expect_success 'U: validate root delete result' '
>   compare_diff_raw expect actual
>  '
>  
> +###
> +### series V (checkpoint)
> +###
> +
> +# To make sure you're observing the side effects of checkpoint *before*
> +# fast-import terminates (and thus writes out its state), check that the
> +# fast-import process is still running using background_import_still_running
> +# *after* evaluating the test conditions.
> +background_import_until_checkpoint () {
> + options=$1
> + input_file=$2
> + ( cat $input_file; sleep 10; echo done ) | git fast-import $options 
> >V.output &
> + echo $! >V.pid
> +
> + # The loop will poll for approximately 5 seconds before giving up.
> + n=0
> + while ! test "$(cat V.output)" = "progress checkpoint"; do

Micronit.  Just like you have if/then on different lines, have
while/do on different lines, i.e.

while test "$(cat V.output)" != "progress checkpoint"
do

> + if test $n -gt 55
> + then
> +...

> +background_import_still_running () {
> + if ! ps --pid "$(cat V.pid)"

"ps --pid" is probably not portable (I think we'd do "kill -0 $pid"
instead in our existing tests---and it should be kosher according to
POSIX [*1*, *2*]).

> +test_expect_success 'V: checkpoint updates refs after reset' '
> + cat >input <<-INPUT_END &&

It is not wrong per-se but we quote INPUT_END when there is no
interpolation necessary in the body of here document to help
readers, like so:

cat >input <<-\INPUT_END &&

> + reset refs/heads/V
> + from refs/heads/U
> +
> + checkpoint
> + progress checkpoint
> + INPUT_END

> +test_expect_success 'V: checkpoint updates refs and marks after commit' '
> + cat >input <<-INPUT_END &&

This we do want interpolation and the above is correct.


[Footnotes]

*1* http://pubs.opengroup.org/onlinepubs/9699919799/utilities/kill.html 
lists '0' as an allowed signal number to be sent, and defers to the
description of the kill() function to explain what happens.

*2* http://pubs.opengroup.org/onlinepubs/9699919799/functions/kill.html
says """If sig is 0 (the null signal), error checking is
performed but no signal is actually sent. The null signal can be
used to check the validity of pid."""


REPLY NOW:

2017-09-26 Thread Mrs Mavis Wanczyk
Did you receive my previous message about my donation to you? for philanthropic 
and humanitarian work for all of mankind who are in great need among your 
friends, family and people around you.

REPLY FOR MORE DETAILS.
Regards

Mrs Mavis Wanczyk.


Re: [PATCH] t7406: submodule..update command must not be run from .gitmodules

2017-09-26 Thread Junio C Hamano
Stefan Beller  writes:

> submodule..update can be assigned an arbitrary command via setting
> it to "!command". When this command is found in the regular config, Git
> ought to just run that command instead of other update mechanisms.
>
> However if that command is just found in the .gitmodules file, it is
> potentially untrusted, which is why we do not run it.  Add a test
> confirming the behavior.
>
> Suggested-by: Jonathan Nieder 
> Signed-off-by: Stefan Beller 
> ---

Earlier, we saw:

Ideally we want this test to be super robust: e.g. if it runs the
command but from a different directory, we still want the test to fail,
and if it runs the command but using exec instead of a shell, we still
want the test to fail.

and this one (i.e. signal that it is a command by prefixing with
'!', and then have a single command that would fail whether it is
run via run_command() with or without shell) would satisfy that
criteria, I would think.

>> This test for a missing file is certainly a remnant from the
>> previous iteration, isn't it?
>
> Yes. This is a good indicator I need some vacation.

Or just take a deep breath before making a knee-jerk reaction public
and instead double-check before sending things out ;-)

Will queue.  Thanks.

>
> Thanks,
> Stefan
>
>  t/t7406-submodule-update.sh | 8 
>  1 file changed, 8 insertions(+)
>
> diff --git a/t/t7406-submodule-update.sh b/t/t7406-submodule-update.sh
> index 034914a14f..6f083c4d68 100755
> --- a/t/t7406-submodule-update.sh
> +++ b/t/t7406-submodule-update.sh
> @@ -406,6 +406,14 @@ test_expect_success 'submodule update - command in 
> .git/config' '
>   )
>  '
>  
> +test_expect_success 'submodule update - command in .gitmodules is ignored' '
> + test_when_finished "git -C super reset --hard HEAD^" &&
> + git -C super config -f .gitmodules submodule.submodule.update "!false" 
> &&
> + git -C super commit -a -m "add command to .gitmodules file" &&
> + git -C super/submodule reset --hard $submodulesha1^ &&
> + git -C super submodule update submodule
> +'
> +
>  cat << EOF >expect
>  Execution of 'false $submodulesha1' failed in submodule path 'submodule'
>  EOF


Re: [PATCH 4/3] branch: fix "copy" to never touch HEAD

2017-09-26 Thread Junio C Hamano
Ævar Arnfjörð Bjarmason  writes:

> I do think however that we also need to update the docs, the relevant
> origin/master...gitster/sd/branch-copy diff is currently:
>
> +The `-c` and `-C` options have the exact same semantics as `-m` and
> +`-M`, except instead of the branch being renamed it along with its
> +config and reflog will be copied to a new name.
>
> Suggestions welcome, but I think that should probably be changed to
> something like the following as part of this patch:
>
> The `-c` and `-C` options have the exact same semantics as `-m` and
> `-M`, except instead of the branch being renamed it along with its
> config and reflog will be copied to a new name. Furthermore, unlike
> its `-m` and `-M` cousins the `-c` and `-C` options will never move
> the HEAD, whereas the move option will do so if the source branch is
> the currently checked-out branch.

I do not think anybody even _imagines_ copy to move HEAD, and do not
think "unlike -m, it doesn't touch HEAD" a worthwhile thing to say.

It is '-m' whose behaviour may look strange wrt HEAD, so _that_ may
be worth mentioning, though.

I suspect that your reaction probably comes from being too married
to the idea that HEAD has a string that is the same as the refname
of the current branch.  But that is a mere implementation detail.
Users would think that HEAD points at the current branch and does
not even care how that pointing is implemented.

Conceptually, you can consider that each branch has its own
identity, separate from various traits it has, like what its
upstream branch is, what commit it points at, what its reflog says,
and (most notably) what its name is.

Then you can think of "branch -m old new" to be (1) finding the
instance of branch that currently has name 'old' and (2) updating
only one of its trait, namely, its name, without changing anything
else.  Creating a new instance of a branch by copying an existing
one, on the other hand, would then be the matter of (1) finding the
instance of branch with name 'old' and (2) creating another instance
of branch with the same traits as the original, modulo its name is
set to 'new'.

A necessary wrinkle for "branch -m" that falls out of the above
world model is that HEAD needs to be adjusted in order to keep
pointing at the same (renamed) instance of branch that now has an
updated name, if HEAD happened to be pointing at the instance of the
branch whose name trait has been updated.

So, from that point of view, I'd prefer to do something like the
attached patch instead.  I notice that "branch -m" does not mention
configuration variables carried over to the new branch, but I do not
necessarily think we want to exhaustively enumerate what traits are
carried over here, so perhaps it is OK as is.

 Documentation/git-branch.txt | 10 ++
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/Documentation/git-branch.txt b/Documentation/git-branch.txt
index fe029ac6fc..d425e3acd4 100644
--- a/Documentation/git-branch.txt
+++ b/Documentation/git-branch.txt
@@ -63,11 +63,13 @@ With a `-m` or `-M` option,  will be renamed to 
.
 If  had a corresponding reflog, it is renamed to match
 , and a reflog entry is created to remember the branch
 renaming. If  exists, -M must be used to force the rename
-to happen.
+to happen.  If you rename a branch that is currently checked out,
+`HEAD` is adjusted so that the branch (whose name is now
+) stays to be the current branch.
 
-The `-c` and `-C` options have the exact same semantics as `-m` and
-`-M`, except instead of the branch being renamed it along with its
-config and reflog will be copied to a new name.
+With a `-c` or`-C` option, a new branch  is created by
+copying the traits like the reflog contents and `branch.*.*`
+configuration from an existing .
 
 With a `-d` or `-D` option, `` will be deleted.  You may
 specify more than one branch for deletion.  If the branch currently


Re: git apply fails silently when on the wrong directory

2017-09-26 Thread Jeff King
On Tue, Sep 26, 2017 at 05:53:55PM -0700, Ernesto Alfonso wrote:

> I recently ran into a similar issue as described here:
> 
> https://stackoverflow.com/questions/24821431/git-apply-patch-fails-silently-no-errors-but-nothing-happens
> 
> I was using the alias:
> 
> alias ganw='git diff -U0 -w --no-color "$@" | git apply --cached 
> --ignore-whitespace --unidiff-zero -'
> 
> to stage non-whitespace changes, but I was not in the root directory and
> the changes were not being applied. I broke down the command to discover
> the 'git apply' part of the pipe was silently failing to apply anything,
> exiting 0 without even a warning.
> 
> The exit status and lack of warning is terribly misleading, I imagine
> this would be the cause of subtle bugs in automated scripts. 
> 
> Is this expected behaviour?

Yes, this is as documented in git-apply(1):

   Reads the supplied diff output (i.e. "a patch") and applies it to
   files. When running from a subdirectory in a repository, patched
   paths outside the directory are ignored.

I agree it's a potential source of confusion. There's some previous
discussion in this thread:

  
https://public-inbox.org/git/ca+dcaeqqqh59lb43y4bi1xktpnoodv11kkubbkng1oz7mdb...@mail.gmail.com/

-Peff


Re: [PATCH v2 6/9] connect: teach client to recognize v1 server response

2017-09-26 Thread Junio C Hamano
Brandon Williams  writes:

> +/* Returns 1 if packet_buffer is a protocol version pkt-line, 0 otherwise. */
> +static int process_protocol_version(void)
> +{
> + switch (determine_protocol_version_client(packet_buffer)) {
> + case protocol_v1:
> + return 1;
> + case protocol_v0:
> + return 0;
> + default:
> + die("server is speaking an unknown protocol");
> + }
> +}

checkpatch.pl yells at me:

ERROR: switch and case should be at the same indent

and we would probably want to teach "make style" the same, if we
already don't.


Re: git apply fails silently when on the wrong directory

2017-09-26 Thread Ernesto Alfonso
Also, has there been a feature request for a '-w' option to 'git add',
analogous to the same option in 'git diff'?

Ernesto Alfonso  writes:

> I recently ran into a similar issue as described here:
>
> https://stackoverflow.com/questions/24821431/git-apply-patch-fails-silently-no-errors-but-nothing-happens
>
> I was using the alias:
>
> alias ganw='git diff -U0 -w --no-color "$@" | git apply --cached 
> --ignore-whitespace --unidiff-zero -'
>
> to stage non-whitespace changes, but I was not in the root directory and
> the changes were not being applied. I broke down the command to discover
> the 'git apply' part of the pipe was silently failing to apply anything,
> exiting 0 without even a warning.
>
> The exit status and lack of warning is terribly misleading, I imagine
> this would be the cause of subtle bugs in automated scripts. 
>
> Is this expected behaviour?
>
> Thanks,
>
> Ernesto


git apply fails silently when on the wrong directory

2017-09-26 Thread Ernesto Alfonso
I recently ran into a similar issue as described here:

https://stackoverflow.com/questions/24821431/git-apply-patch-fails-silently-no-errors-but-nothing-happens

I was using the alias:

alias ganw='git diff -U0 -w --no-color "$@" | git apply --cached 
--ignore-whitespace --unidiff-zero -'

to stage non-whitespace changes, but I was not in the root directory and
the changes were not being applied. I broke down the command to discover
the 'git apply' part of the pipe was silently failing to apply anything,
exiting 0 without even a warning.

The exit status and lack of warning is terribly misleading, I imagine
this would be the cause of subtle bugs in automated scripts. 

Is this expected behaviour?

Thanks,

Ernesto


Re: [PATCH] submodule: correct error message for missing commits.

2017-09-26 Thread Junio C Hamano
Stefan Beller  writes:

> When a submodule diff should be displayed we currently just add the
> submodule objects to the main object store and then e.g. walk the
> revision graph and create a summary for that submodule.
>
> It is possible that we are missing the submodule either completely or
> partially, which we currently differentiate with different error messages
> depending on whether (1) the whole submodule object store is missing or
> (2) just the needed for this particular diff. (1) is reported as
> "not initialized", and (2) is reported as "commits not present".
>
> If a submodule is deinit'ed its repository data is still around inside
> the superproject, such that the diff can still be produced. In that way
> the error message (1) is misleading as we can have a diff despite the
> submodule being not initialized.
>
> Downgrade the error message (1) to be the same as (2) and just say
> the commits are not present, as that is the true reason why the diff
> cannot be shown.
>
> Signed-off-by: Stefan Beller 
> ---

Sounds good.  Thanks for working on it.



Re: [PATCH 0/3] Comment fixes

2017-09-26 Thread Junio C Hamano
Han-Wen Nienhuys  writes:

> follow more commit log conventions; verified it compiled (yay).

;-)

> (should I send patches that are in 'pu' again as well?)

Because these look more or less independent changes that can be
queued on separate topics, I do not think resending is needed at
all, even though mentioning that these do *not* replace what is
queued like you just did is an excellent way to prevent the
maintainer from making stupid mistakes.  Thanks.

> Han-Wen Nienhuys (3):
>   real_path: clarify return value ownership
>   read_gitfile_gently: clarify return value ownership.
>   string-list.h: move documentation from Documentation/api/ into header
>
>  Documentation/technical/api-string-list.txt | 209 
> 
>  abspath.c   |   4 +
>  setup.c |   3 +-
>  string-list.h   | 192 +
>  4 files changed, 168 insertions(+), 240 deletions(-)
>  delete mode 100644 Documentation/technical/api-string-list.txt
>
> --
> 2.14.1.821.g8fa685d3b7-goog


Re: -s theirs use-case(s) Was: BUG: merge -s theirs is not in effect

2017-09-26 Thread Junio C Hamano
Yaroslav Halchenko  writes:

> and that is where the gotcha comes -- what if "my" changes were already
> published?  then I would like to avoid the rebase, and would -s theirs
> to choose "their" solution in favor of mine and be able to push so
> others could still "fast-forward" to the new state.
>
> So -- as to me it remains 'symmetric' ;)

I do not necessarily agree.  Once you decide that their history is
the mainline, you'd rather want to treat your line of development as
a side branch and make a merge in that direction, i.e. the first
parent of the resulting merge is a commit on their history and the
second parent is the last bad one of your history.  So you would end
up using "checkout their-history && merge -s ours your-history" to
keep the first-parenthood sensible.

And at that point, use of "-s ours" is no longer a workaround for
lack of "-s theirs".  It is a proper part of the desired semantics,
i.e. from the point of view of the surviving canonical history line,
you want to preserve what it did, nullifying what the other line of
history did.

So I still do not think the above scenario justifies "-s theirs".


Re: [PATCH v2 0/9] Teach 'run' perf script to read config files

2017-09-26 Thread Junio C Hamano
Roberto Tyley  writes:

> I had a quick look at git-send-email.perl, I see the trick is the `time++` one
> introduced with https://github.com/git/git/commit/a5370b16 - seems reasonable!
>
> SubmitGit makes all emails in-reply-to the initial email, which I
> think is correct behaviour,
> but I can see that offsetting the times would probably give a more
> reliable sorting in
> a newsreader. ...
> ...so the only way SubmitGit can offset the times is to literally
> delay the sending of the emails,
> which is a bit unfortunate for patchbombs more than a few dozen commits long!

As this matters mostly for a series that is longer than several
patches, it indeed is unfortunate.  If SubmitGit needs to send and
sleep for a dozen messages, does it mean the end user has to wait 12
(or is it 11? ;-)) seconds?  If the human is the only thing that
needs waiting, it may be OK (after all, it all happens in the web
browser and the human can switch to other tasks after clicking
"submit"), but that may not be acceptable if this artificial delay
can cause a timeout in a moving part among many.

Thanks for looking into this.  


[PATCH v2 0/9] protocol transition

2017-09-26 Thread Brandon Williams
v2 of this series has the following changes:

 * Included Jonathan Tan's patch as [1/9] with a small tweak to not rely on
   hardcoding the side of sha1 into the capability line check.
 * Reworked some of the logic in daemon.c
 * Added the word 'Experimental' to protocol.version to indicate that user's
   shouldn't rely on its semantics to hold until it has been thoroughly tested.

Brandon Williams (8):
  pkt-line: add packet_write function
  protocol: introduce protocol extention mechanisms
  daemon: recognize hidden request arguments
  upload-pack, receive-pack: introduce protocol version 1
  connect: teach client to recognize v1 server response
  connect: tell server that the client understands v1
  http: tell server that the client understands v1
  i5700: add interop test for protocol transition

Jonathan Tan (1):
  connect: in ref advertisement, shallows are last

 Documentation/config.txt   |  17 ++
 Documentation/git.txt  |   5 +
 Makefile   |   1 +
 builtin/receive-pack.c |  14 ++
 cache.h|   9 +
 connect.c  | 248 
 daemon.c   |  68 +++-
 http.c |  18 ++
 pkt-line.c |   6 +
 pkt-line.h |   1 +
 protocol.c |  72 
 protocol.h |  14 ++
 t/interop/i5700-protocol-transition.sh |  68 
 t/lib-httpd/apache.conf|   7 +
 t/t5700-protocol-v1.sh | 292 +
 upload-pack.c  |  17 +-
 16 files changed, 776 insertions(+), 81 deletions(-)
 create mode 100644 protocol.c
 create mode 100644 protocol.h
 create mode 100755 t/interop/i5700-protocol-transition.sh
 create mode 100755 t/t5700-protocol-v1.sh

-- 
2.14.1.992.g2c7b836f3a-goog



[PATCH v2 5/9] upload-pack, receive-pack: introduce protocol version 1

2017-09-26 Thread Brandon Williams
Teach upload-pack and receive-pack to understand and respond using
protocol version 1, if requested.

Protocol version 1 is simply the original and current protocol (what I'm
calling version 0) with the addition of a single packet line, which
precedes the ref advertisement, indicating the protocol version being
spoken.

Signed-off-by: Brandon Williams 
---
 builtin/receive-pack.c | 14 ++
 upload-pack.c  | 17 -
 2 files changed, 30 insertions(+), 1 deletion(-)

diff --git a/builtin/receive-pack.c b/builtin/receive-pack.c
index dd06b3fb4..cb179367b 100644
--- a/builtin/receive-pack.c
+++ b/builtin/receive-pack.c
@@ -24,6 +24,7 @@
 #include "tmp-objdir.h"
 #include "oidset.h"
 #include "packfile.h"
+#include "protocol.h"
 
 static const char * const receive_pack_usage[] = {
N_("git receive-pack "),
@@ -1963,6 +1964,19 @@ int cmd_receive_pack(int argc, const char **argv, const 
char *prefix)
else if (0 <= receive_unpack_limit)
unpack_limit = receive_unpack_limit;
 
+   switch (determine_protocol_version_server()) {
+   case protocol_v1:
+   if (advertise_refs || !stateless_rpc)
+   packet_write_fmt(1, "version 1\n");
+   /*
+* v1 is just the original protocol with a version string,
+* so just fall through after writing the version string.
+*/
+   case protocol_v0:
+   default:
+   break;
+   }
+
if (advertise_refs || !stateless_rpc) {
write_head_info();
}
diff --git a/upload-pack.c b/upload-pack.c
index 7efff2fbf..5cab39819 100644
--- a/upload-pack.c
+++ b/upload-pack.c
@@ -18,6 +18,7 @@
 #include "parse-options.h"
 #include "argv-array.h"
 #include "prio-queue.h"
+#include "protocol.h"
 
 static const char * const upload_pack_usage[] = {
N_("git upload-pack [] "),
@@ -1067,6 +1068,20 @@ int cmd_main(int argc, const char **argv)
die("'%s' does not appear to be a git repository", dir);
 
git_config(upload_pack_config, NULL);
-   upload_pack();
+
+   switch (determine_protocol_version_server()) {
+   case protocol_v1:
+   if (advertise_refs || !stateless_rpc)
+   packet_write_fmt(1, "version 1\n");
+   /*
+* v1 is just the original protocol with a version string,
+* so just fall through after writing the version string.
+*/
+   case protocol_v0:
+   default:
+   upload_pack();
+   break;
+   }
+
return 0;
 }
-- 
2.14.1.992.g2c7b836f3a-goog



[PATCH v2 8/9] http: tell server that the client understands v1

2017-09-26 Thread Brandon Williams
Tell a server that protocol v1 can be used by sending the http header
'Git-Protocol' indicating this.

Also teach the apache http server to pass through the 'Git-Protocol'
header as an environment variable 'GIT_PROTOCOL'.

Signed-off-by: Brandon Williams 
---
 cache.h |  2 ++
 http.c  | 18 +
 t/lib-httpd/apache.conf |  7 +
 t/t5700-protocol-v1.sh  | 69 +
 4 files changed, 96 insertions(+)

diff --git a/cache.h b/cache.h
index 0c792545f..aaadac1f0 100644
--- a/cache.h
+++ b/cache.h
@@ -451,6 +451,8 @@ static inline enum object_type object_type(unsigned int 
mode)
  * 'key[=value]'.  Presence of unknown keys must be tolerated.
  */
 #define GIT_PROTOCOL_ENVIRONMENT "GIT_PROTOCOL"
+/* HTTP header used to handshake the wire protocol */
+#define GIT_PROTOCOL_HEADER "Git-Protocol"
 
 /*
  * This environment variable is expected to contain a boolean indicating
diff --git a/http.c b/http.c
index 9e40a465f..ffb719216 100644
--- a/http.c
+++ b/http.c
@@ -12,6 +12,7 @@
 #include "gettext.h"
 #include "transport.h"
 #include "packfile.h"
+#include "protocol.h"
 
 static struct trace_key trace_curl = TRACE_KEY_INIT(CURL);
 #if LIBCURL_VERSION_NUM >= 0x070a08
@@ -897,6 +898,21 @@ static void set_from_env(const char **var, const char 
*envname)
*var = val;
 }
 
+static void protocol_http_header(void)
+{
+   if (get_protocol_version_config() > 0) {
+   struct strbuf protocol_header = STRBUF_INIT;
+
+   strbuf_addf(_header, GIT_PROTOCOL_HEADER ": 
version=%d",
+   get_protocol_version_config());
+
+
+   extra_http_headers = curl_slist_append(extra_http_headers,
+  protocol_header.buf);
+   strbuf_release(_header);
+   }
+}
+
 void http_init(struct remote *remote, const char *url, int proactive_auth)
 {
char *low_speed_limit;
@@ -927,6 +943,8 @@ void http_init(struct remote *remote, const char *url, int 
proactive_auth)
if (remote)
var_override(_proxy_authmethod, 
remote->http_proxy_authmethod);
 
+   protocol_http_header();
+
pragma_header = curl_slist_append(http_copy_default_headers(),
"Pragma: no-cache");
no_pragma_header = curl_slist_append(http_copy_default_headers(),
diff --git a/t/lib-httpd/apache.conf b/t/lib-httpd/apache.conf
index 0642ae7e6..df1943631 100644
--- a/t/lib-httpd/apache.conf
+++ b/t/lib-httpd/apache.conf
@@ -67,6 +67,9 @@ LockFile accept.lock
 
LoadModule unixd_module modules/mod_unixd.so
 
+
+   LoadModule setenvif_module modules/mod_setenvif.so
+
 
 
 PassEnv GIT_VALGRIND
@@ -76,6 +79,10 @@ PassEnv ASAN_OPTIONS
 PassEnv GIT_TRACE
 PassEnv GIT_CONFIG_NOSYSTEM
 
+= 2.4>
+   SetEnvIf Git-Protocol ".*" GIT_PROTOCOL=$0
+
+
 Alias /dumb/ www/
 Alias /auth/dumb/ www/auth/dumb/
 
diff --git a/t/t5700-protocol-v1.sh b/t/t5700-protocol-v1.sh
index 1988bbce6..65127 100755
--- a/t/t5700-protocol-v1.sh
+++ b/t/t5700-protocol-v1.sh
@@ -220,4 +220,73 @@ test_expect_success 'push with ssh:// using protocol v1' '
grep "push< version 1" log
 '
 
+# Test protocol v1 with 'http://' transport
+#
+. "$TEST_DIRECTORY"/lib-httpd.sh
+start_httpd
+
+test_expect_success 'create repo to be served by http:// transport' '
+   git init "$HTTPD_DOCUMENT_ROOT_PATH/http_parent" &&
+   git -C "$HTTPD_DOCUMENT_ROOT_PATH/http_parent" config http.receivepack 
true &&
+   test_commit -C "$HTTPD_DOCUMENT_ROOT_PATH/http_parent" one
+'
+
+test_expect_success 'clone with http:// using protocol v1' '
+   GIT_TRACE_PACKET=1 GIT_TRACE_CURL=1 git -c protocol.version=1 \
+   clone "$HTTPD_URL/smart/http_parent" http_child 2>log &&
+
+   git -C http_child log -1 --format=%s >actual &&
+   git -C "$HTTPD_DOCUMENT_ROOT_PATH/http_parent" log -1 --format=%s 
>expect &&
+   test_cmp expect actual &&
+
+   # Client requested to use protocol v1
+   grep "Git-Protocol: version=1" log &&
+   # Server responded using protocol v1
+   grep "git< version 1" log
+'
+
+test_expect_success 'fetch with http:// using protocol v1' '
+   test_commit -C "$HTTPD_DOCUMENT_ROOT_PATH/http_parent" two &&
+
+   GIT_TRACE_PACKET=1 git -C http_child -c protocol.version=1 \
+   fetch 2>log &&
+
+   git -C http_child log -1 --format=%s FETCH_HEAD >actual &&
+   git -C "$HTTPD_DOCUMENT_ROOT_PATH/http_parent" log -1 --format=%s 
>expect &&
+   test_cmp expect actual &&
+
+   # Server responded using protocol v1
+   grep "git< version 1" log
+'
+
+test_expect_success 'pull with http:// using protocol v1' '
+   GIT_TRACE_PACKET=1 git -C http_child -c protocol.version=1 \
+   pull 2>log &&
+
+   git -C http_child log -1 --format=%s >actual &&
+   git -C "$HTTPD_DOCUMENT_ROOT_PATH/http_parent" log -1 --format=%s 

[PATCH v2 3/9] protocol: introduce protocol extention mechanisms

2017-09-26 Thread Brandon Williams
Create protocol.{c,h} and provide functions which future servers and
clients can use to determine which protocol to use or is being used.

Also introduce the 'GIT_PROTOCOL' environment variable which will be
used to communicate a colon separated list of keys with optional values
to a server.  Unknown keys and values must be tolerated.  This mechanism
is used to communicate which version of the wire protocol a client would
like to use with a server.

Signed-off-by: Brandon Williams 
---
 Documentation/config.txt | 17 
 Documentation/git.txt|  5 
 Makefile |  1 +
 cache.h  |  7 +
 protocol.c   | 72 
 protocol.h   | 14 ++
 6 files changed, 116 insertions(+)
 create mode 100644 protocol.c
 create mode 100644 protocol.h

diff --git a/Documentation/config.txt b/Documentation/config.txt
index dc4e3f58a..b78747abc 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -2517,6 +2517,23 @@ The protocol names currently used by git are:
 `hg` to allow the `git-remote-hg` helper)
 --
 
+protocol.version::
+   Experimental. If set, clients will attempt to communicate with a
+   server using the specified protocol version.  If unset, no
+   attempt will be made by the client to communicate using a
+   particular protocol version, this results in protocol version 0
+   being used.
+   Supported versions:
++
+--
+
+* `0` - the original wire protocol.
+
+* `1` - the original wire protocol with the addition of a version string
+  in the initial response from the server.
+
+--
+
 pull.ff::
By default, Git does not create an extra merge commit when merging
a commit that is a descendant of the current commit. Instead, the
diff --git a/Documentation/git.txt b/Documentation/git.txt
index 6e3a6767e..299f75c7b 100644
--- a/Documentation/git.txt
+++ b/Documentation/git.txt
@@ -697,6 +697,11 @@ of clones and fetches.
which feed potentially-untrusted URLS to git commands.  See
linkgit:git-config[1] for more details.
 
+`GIT_PROTOCOL`::
+   For internal use only.  Used in handshaking the wire protocol.
+   Contains a colon ':' separated list of keys with optional values
+   'key[=value]'.  Presence of unknown keys must be tolerated.
+
 Discussion[[Discussion]]
 
 
diff --git a/Makefile b/Makefile
index ed4ca438b..9ce68cded 100644
--- a/Makefile
+++ b/Makefile
@@ -842,6 +842,7 @@ LIB_OBJS += pretty.o
 LIB_OBJS += prio-queue.o
 LIB_OBJS += progress.o
 LIB_OBJS += prompt.o
+LIB_OBJS += protocol.o
 LIB_OBJS += quote.o
 LIB_OBJS += reachable.o
 LIB_OBJS += read-cache.o
diff --git a/cache.h b/cache.h
index 49b083ee0..0c792545f 100644
--- a/cache.h
+++ b/cache.h
@@ -445,6 +445,13 @@ static inline enum object_type object_type(unsigned int 
mode)
 #define GIT_ICASE_PATHSPECS_ENVIRONMENT "GIT_ICASE_PATHSPECS"
 #define GIT_QUARANTINE_ENVIRONMENT "GIT_QUARANTINE_PATH"
 
+/*
+ * Environment variable used in handshaking the wire protocol.
+ * Contains a colon ':' separated list of keys with optional values
+ * 'key[=value]'.  Presence of unknown keys must be tolerated.
+ */
+#define GIT_PROTOCOL_ENVIRONMENT "GIT_PROTOCOL"
+
 /*
  * This environment variable is expected to contain a boolean indicating
  * whether we should or should not treat:
diff --git a/protocol.c b/protocol.c
new file mode 100644
index 0..369503065
--- /dev/null
+++ b/protocol.c
@@ -0,0 +1,72 @@
+#include "cache.h"
+#include "config.h"
+#include "protocol.h"
+
+static enum protocol_version parse_protocol_version(const char *value)
+{
+   if (!strcmp(value, "0"))
+   return protocol_v0;
+   else if (!strcmp(value, "1"))
+   return protocol_v1;
+   else
+   return protocol_unknown_version;
+}
+
+enum protocol_version get_protocol_version_config(void)
+{
+   const char *value;
+   if (!git_config_get_string_const("protocol.version", )) {
+   enum protocol_version version = parse_protocol_version(value);
+
+   if (version == protocol_unknown_version)
+   die("unknown value for config 'protocol.version': %s",
+   value);
+
+   return version;
+   }
+
+   return protocol_v0;
+}
+
+enum protocol_version determine_protocol_version_server(void)
+{
+   const char *git_protocol = getenv(GIT_PROTOCOL_ENVIRONMENT);
+   enum protocol_version version = protocol_v0;
+
+   if (git_protocol) {
+   struct string_list list = STRING_LIST_INIT_DUP;
+   const struct string_list_item *item;
+   string_list_split(, git_protocol, ':', -1);
+
+   for_each_string_list_item(item, ) {
+   const char *value;
+   enum protocol_version v;
+
+   if (skip_prefix(item->string, 

[PATCH v2 2/9] pkt-line: add packet_write function

2017-09-26 Thread Brandon Williams
Add a function which can be used to write the contents of an arbitrary
buffer.  This makes it easy to build up data in a buffer before writing
the packet instead of formatting the entire contents of the packet using
'packet_write_fmt()'.

Signed-off-by: Brandon Williams 
---
 pkt-line.c | 6 ++
 pkt-line.h | 1 +
 2 files changed, 7 insertions(+)

diff --git a/pkt-line.c b/pkt-line.c
index 647bbd3bc..c025d0332 100644
--- a/pkt-line.c
+++ b/pkt-line.c
@@ -188,6 +188,12 @@ static int packet_write_gently(const int fd_out, const 
char *buf, size_t size)
return 0;
 }
 
+void packet_write(const int fd_out, const char *buf, size_t size)
+{
+   if (packet_write_gently(fd_out, buf, size))
+   die_errno("packet write failed");
+}
+
 void packet_buf_write(struct strbuf *buf, const char *fmt, ...)
 {
va_list args;
diff --git a/pkt-line.h b/pkt-line.h
index 66ef610fc..d9e9783b1 100644
--- a/pkt-line.h
+++ b/pkt-line.h
@@ -22,6 +22,7 @@
 void packet_flush(int fd);
 void packet_write_fmt(int fd, const char *fmt, ...) __attribute__((format 
(printf, 2, 3)));
 void packet_buf_flush(struct strbuf *buf);
+void packet_write(const int fd_out, const char *buf, size_t size);
 void packet_buf_write(struct strbuf *buf, const char *fmt, ...) 
__attribute__((format (printf, 2, 3)));
 int packet_flush_gently(int fd);
 int packet_write_fmt_gently(int fd, const char *fmt, ...) 
__attribute__((format (printf, 2, 3)));
-- 
2.14.1.992.g2c7b836f3a-goog



[PATCH v2 1/9] connect: in ref advertisement, shallows are last

2017-09-26 Thread Brandon Williams
From: Jonathan Tan 

Currently, get_remote_heads() parses the ref advertisement in one loop,
allowing refs and shallow lines to intersperse, despite this not being
allowed by the specification. Refactor get_remote_heads() to use two
loops instead, enforcing that refs come first, and then shallows.

This also makes it easier to teach get_remote_heads() to interpret other
lines in the ref advertisement, which will be done in a subsequent
patch.

As part of this change, this patch interprets capabilities only on the
first line in the ref advertisement, printing a warning message when
encountering capabilities on other lines.

Signed-off-by: Jonathan Tan 
Signed-off-by: Brandon Williams 
---
 connect.c | 189 --
 1 file changed, 123 insertions(+), 66 deletions(-)

diff --git a/connect.c b/connect.c
index df56c0cbf..8e2e276b6 100644
--- a/connect.c
+++ b/connect.c
@@ -11,6 +11,7 @@
 #include "string-list.h"
 #include "sha1-array.h"
 #include "transport.h"
+#include "strbuf.h"
 
 static char *server_capabilities;
 static const char *parse_feature_value(const char *, const char *, int *);
@@ -107,6 +108,104 @@ static void annotate_refs_with_symref_info(struct ref 
*ref)
string_list_clear(, 0);
 }
 
+/*
+ * Read one line of a server's ref advertisement into packet_buffer.
+ */
+static int read_remote_ref(int in, char **src_buf, size_t *src_len,
+  int *responded)
+{
+   int len = packet_read(in, src_buf, src_len,
+ packet_buffer, sizeof(packet_buffer),
+ PACKET_READ_GENTLE_ON_EOF |
+ PACKET_READ_CHOMP_NEWLINE);
+   const char *arg;
+   if (len < 0)
+   die_initial_contact(*responded);
+   if (len > 4 && skip_prefix(packet_buffer, "ERR ", ))
+   die("remote error: %s", arg);
+
+   *responded = 1;
+
+   return len;
+}
+
+#define EXPECTING_FIRST_REF 0
+#define EXPECTING_REF 1
+#define EXPECTING_SHALLOW 2
+
+static void process_capabilities(int *len)
+{
+   int nul_location = strlen(packet_buffer);
+   if (nul_location == *len)
+   return;
+   server_capabilities = xstrdup(packet_buffer + nul_location + 1);
+   *len = nul_location;
+}
+
+static int process_dummy_ref(void)
+{
+   struct object_id oid;
+   const char *name;
+
+   if (parse_oid_hex(packet_buffer, , ))
+   return 0;
+   if (*name != ' ')
+   return 0;
+   name++;
+
+   return !oidcmp(_oid, ) && !strcmp(name, "capabilities^{}");
+}
+
+static void check_no_capabilities(int len)
+{
+   if (strlen(packet_buffer) != len)
+   warning("Ignoring capabilities after first line '%s'",
+   packet_buffer + strlen(packet_buffer));
+}
+
+static int process_ref(int len, struct ref ***list, unsigned int flags,
+  struct oid_array *extra_have)
+{
+   struct object_id old_oid;
+   const char *name;
+
+   if (parse_oid_hex(packet_buffer, _oid, ))
+   return 0;
+   if (*name != ' ')
+   return 0;
+   name++;
+
+   if (extra_have && !strcmp(name, ".have")) {
+   oid_array_append(extra_have, _oid);
+   } else if (!strcmp(name, "capabilities^{}")) {
+   die("protocol error: unexpected capabilities^{}");
+   } else if (check_ref(name, flags)) {
+   struct ref *ref = alloc_ref(name);
+   oidcpy(>old_oid, _oid);
+   **list = ref;
+   *list = >next;
+   }
+   check_no_capabilities(len);
+   return 1;
+}
+
+static int process_shallow(int len, struct oid_array *shallow_points)
+{
+   const char *arg;
+   struct object_id old_oid;
+
+   if (!skip_prefix(packet_buffer, "shallow ", ))
+   return 0;
+
+   if (get_oid_hex(arg, _oid))
+   die("protocol error: expected shallow sha-1, got '%s'", arg);
+   if (!shallow_points)
+   die("repository on the other end cannot be shallow");
+   oid_array_append(shallow_points, _oid);
+   check_no_capabilities(len);
+   return 1;
+}
+
 /*
  * Read all the refs from the other end
  */
@@ -123,76 +222,34 @@ struct ref **get_remote_heads(int in, char *src_buf, 
size_t src_len,
 * willing to talk to us.  A hang-up before seeing any
 * response does not necessarily mean an ACL problem, though.
 */
-   int saw_response;
-   int got_dummy_ref_with_capabilities_declaration = 0;
+   int responded = 0;
+   int len;
+   int state = EXPECTING_FIRST_REF;
 
*list = NULL;
-   for (saw_response = 0; ; saw_response = 1) {
-   struct ref *ref;
-   struct object_id old_oid;
-   char *name;
-   int len, name_len;
-   char *buffer = 

[PATCH v2 7/9] connect: tell server that the client understands v1

2017-09-26 Thread Brandon Williams
Teach the connection logic to tell a serve that it understands protocol
v1.  This is done in 2 different ways for the built in protocols.

1. git://
   A normal request is structured as "command path/to/repo\0host=..\0"
   and due to a bug in an old version of git-daemon 73bb33a94 (daemon:
   Strictly parse the "extra arg" part of the command, 2009-06-04) we
   aren't able to place any extra args (separated by NULs) besides the
   host.  In order to get around this limitation put protocol version
   information after a second NUL byte so the request is structured
   like: "command path/to/repo\0host=..\0\0version=1\0".  git-daemon can
   then parse out the version number and set GIT_PROTOCOL.

2. ssh://, file://
   Set GIT_PROTOCOL envvar with the desired protocol version.  The
   envvar can be sent across ssh by using '-o SendEnv=GIT_PROTOCOL' and
   having the server whitelist this envvar.

Signed-off-by: Brandon Williams 
---
 connect.c  |  37 ++--
 t/t5700-protocol-v1.sh | 223 +
 2 files changed, 255 insertions(+), 5 deletions(-)
 create mode 100755 t/t5700-protocol-v1.sh

diff --git a/connect.c b/connect.c
index 1805debf3..12ebab724 100644
--- a/connect.c
+++ b/connect.c
@@ -871,6 +871,7 @@ struct child_process *git_connect(int fd[2], const char 
*url,
printf("Diag: path=%s\n", path ? path : "NULL");
conn = NULL;
} else if (protocol == PROTO_GIT) {
+   struct strbuf request = STRBUF_INIT;
/*
 * Set up virtual host information based on where we will
 * connect, unless the user has overridden us in
@@ -898,13 +899,25 @@ struct child_process *git_connect(int fd[2], const char 
*url,
 * Note: Do not add any other headers here!  Doing so
 * will cause older git-daemon servers to crash.
 */
-   packet_write_fmt(fd[1],
-"%s %s%chost=%s%c",
-prog, path, 0,
-target_host, 0);
+   strbuf_addf(,
+   "%s %s%chost=%s%c",
+   prog, path, 0,
+   target_host, 0);
+
+   /* If using a new version put that stuff here after a second 
null byte */
+   if (get_protocol_version_config() > 0) {
+   strbuf_addch(, '\0');
+   strbuf_addf(, "version=%d%c",
+   get_protocol_version_config(), '\0');
+   }
+
+   packet_write(fd[1], request.buf, request.len);
+
free(target_host);
+   strbuf_release();
} else {
struct strbuf cmd = STRBUF_INIT;
+   const char *const *var;
 
conn = xmalloc(sizeof(*conn));
child_process_init(conn);
@@ -917,7 +930,9 @@ struct child_process *git_connect(int fd[2], const char 
*url,
sq_quote_buf(, path);
 
/* remove repo-local variables from the environment */
-   conn->env = local_repo_env;
+   for (var = local_repo_env; *var; var++)
+   argv_array_push(>env_array, *var);
+
conn->use_shell = 1;
conn->in = conn->out = -1;
if (protocol == PROTO_SSH) {
@@ -971,6 +986,14 @@ struct child_process *git_connect(int fd[2], const char 
*url,
}
 
argv_array_push(>args, ssh);
+
+   if (get_protocol_version_config() > 0) {
+   argv_array_push(>args, "-o");
+   argv_array_push(>args, "SendEnv=" 
GIT_PROTOCOL_ENVIRONMENT);
+   argv_array_pushf(>env_array, 
GIT_PROTOCOL_ENVIRONMENT "=version=%d",
+get_protocol_version_config());
+   }
+
if (flags & CONNECT_IPV4)
argv_array_push(>args, "-4");
else if (flags & CONNECT_IPV6)
@@ -985,6 +1008,10 @@ struct child_process *git_connect(int fd[2], const char 
*url,
argv_array_push(>args, ssh_host);
} else {
transport_check_allowed("file");
+   if (get_protocol_version_config() > 0) {
+   argv_array_pushf(>env_array, 
GIT_PROTOCOL_ENVIRONMENT "=version=%d",
+get_protocol_version_config());
+   }
}
argv_array_push(>args, cmd.buf);
 
diff --git a/t/t5700-protocol-v1.sh b/t/t5700-protocol-v1.sh
new file mode 100755
index 0..1988bbce6
--- /dev/null
+++ b/t/t5700-protocol-v1.sh
@@ -0,0 +1,223 @@
+#!/bin/sh
+
+test_description='test git 

[PATCH v2 9/9] i5700: add interop test for protocol transition

2017-09-26 Thread Brandon Williams
Signed-off-by: Brandon Williams 
---
 t/interop/i5700-protocol-transition.sh | 68 ++
 1 file changed, 68 insertions(+)
 create mode 100755 t/interop/i5700-protocol-transition.sh

diff --git a/t/interop/i5700-protocol-transition.sh 
b/t/interop/i5700-protocol-transition.sh
new file mode 100755
index 0..9e83428a8
--- /dev/null
+++ b/t/interop/i5700-protocol-transition.sh
@@ -0,0 +1,68 @@
+#!/bin/sh
+
+VERSION_A=.
+VERSION_B=v2.0.0
+
+: ${LIB_GIT_DAEMON_PORT:=5600}
+LIB_GIT_DAEMON_COMMAND='git.b daemon'
+
+test_description='clone and fetch by client who is trying to use a new 
protocol'
+. ./interop-lib.sh
+. "$TEST_DIRECTORY"/lib-git-daemon.sh
+
+start_git_daemon --export-all
+
+repo=$GIT_DAEMON_DOCUMENT_ROOT_PATH/repo
+
+test_expect_success "create repo served by $VERSION_B" '
+   git.b init "$repo" &&
+   git.b -C "$repo" commit --allow-empty -m one
+'
+
+test_expect_success "git:// clone with $VERSION_A and protocol v1" '
+   GIT_TRACE_PACKET=1 git.a -c protocol.version=1 clone 
"$GIT_DAEMON_URL/repo" child 2>log &&
+   git.a -C child log -1 --format=%s >actual &&
+   git.b -C "$repo" log -1 --format=%s >expect &&
+   test_cmp expect actual &&
+   grep "version=1" log
+'
+
+test_expect_success "git:// fetch with $VERSION_A and protocol v1" '
+   git.b -C "$repo" commit --allow-empty -m two &&
+   git.b -C "$repo" log -1 --format=%s >expect &&
+
+   GIT_TRACE_PACKET=1 git.a -C child -c protocol.version=1 fetch 2>log &&
+   git.a -C child log -1 --format=%s FETCH_HEAD >actual &&
+
+   test_cmp expect actual &&
+   grep "version=1" log &&
+   ! grep "version 1" log
+'
+
+stop_git_daemon
+
+test_expect_success "create repo served by $VERSION_B" '
+   git.b init parent &&
+   git.b -C parent commit --allow-empty -m one
+'
+
+test_expect_success "file:// clone with $VERSION_A and protocol v1" '
+   GIT_TRACE_PACKET=1 git.a -c protocol.version=1 clone 
--upload-pack="git.b upload-pack" parent child2 2>log &&
+   git.a -C child2 log -1 --format=%s >actual &&
+   git.b -C parent log -1 --format=%s >expect &&
+   test_cmp expect actual &&
+   ! grep "version 1" log
+'
+
+test_expect_success "file:// fetch with $VERSION_A and protocol v1" '
+   git.b -C parent commit --allow-empty -m two &&
+   git.b -C parent log -1 --format=%s >expect &&
+
+   GIT_TRACE_PACKET=1 git.a -C child2 -c protocol.version=1 fetch 
--upload-pack="git.b upload-pack" 2>log &&
+   git.a -C child2 log -1 --format=%s FETCH_HEAD >actual &&
+
+   test_cmp expect actual &&
+   ! grep "version 1" log
+'
+
+test_done
-- 
2.14.1.992.g2c7b836f3a-goog



[PATCH v2 4/9] daemon: recognize hidden request arguments

2017-09-26 Thread Brandon Williams
A normal request to git-daemon is structured as
"command path/to/repo\0host=..\0" and due to a bug in an old version of
git-daemon 73bb33a94 (daemon: Strictly parse the "extra arg" part of the
command, 2009-06-04) we aren't able to place any extra args (separated
by NULs) besides the host.

In order to get around this limitation teach git-daemon to recognize
additional request arguments hidden behind a second NUL byte.  Requests
can then be structured like:
"command path/to/repo\0host=..\0\0version=1\0key=value\0".  git-daemon
can then parse out the extra arguments and set 'GIT_PROTOCOL'
accordingly.

Signed-off-by: Brandon Williams 
---
 daemon.c | 68 +++-
 1 file changed, 59 insertions(+), 9 deletions(-)

diff --git a/daemon.c b/daemon.c
index 30747075f..36cc794c9 100644
--- a/daemon.c
+++ b/daemon.c
@@ -282,7 +282,7 @@ static const char *path_ok(const char *directory, struct 
hostinfo *hi)
return NULL;/* Fallthrough. Deny by default */
 }
 
-typedef int (*daemon_service_fn)(void);
+typedef int (*daemon_service_fn)(const struct argv_array *env);
 struct daemon_service {
const char *name;
const char *config_name;
@@ -363,7 +363,7 @@ static int run_access_hook(struct daemon_service *service, 
const char *dir,
 }
 
 static int run_service(const char *dir, struct daemon_service *service,
-  struct hostinfo *hi)
+  struct hostinfo *hi, const struct argv_array *env)
 {
const char *path;
int enabled = service->enabled;
@@ -422,7 +422,7 @@ static int run_service(const char *dir, struct 
daemon_service *service,
 */
signal(SIGTERM, SIG_IGN);
 
-   return service->fn();
+   return service->fn(env);
 }
 
 static void copy_to_log(int fd)
@@ -462,25 +462,34 @@ static int run_service_command(struct child_process *cld)
return finish_command(cld);
 }
 
-static int upload_pack(void)
+static int upload_pack(const struct argv_array *env)
 {
struct child_process cld = CHILD_PROCESS_INIT;
argv_array_pushl(, "upload-pack", "--strict", NULL);
argv_array_pushf(, "--timeout=%u", timeout);
+
+   argv_array_pushv(_array, env->argv);
+
return run_service_command();
 }
 
-static int upload_archive(void)
+static int upload_archive(const struct argv_array *env)
 {
struct child_process cld = CHILD_PROCESS_INIT;
argv_array_push(, "upload-archive");
+
+   argv_array_pushv(_array, env->argv);
+
return run_service_command();
 }
 
-static int receive_pack(void)
+static int receive_pack(const struct argv_array *env)
 {
struct child_process cld = CHILD_PROCESS_INIT;
argv_array_push(, "receive-pack");
+
+   argv_array_pushv(_array, env->argv);
+
return run_service_command();
 }
 
@@ -574,7 +583,7 @@ static void canonicalize_client(struct strbuf *out, const 
char *in)
 /*
  * Read the host as supplied by the client connection.
  */
-static void parse_host_arg(struct hostinfo *hi, char *extra_args, int buflen)
+static char *parse_host_arg(struct hostinfo *hi, char *extra_args, int buflen)
 {
char *val;
int vallen;
@@ -602,6 +611,43 @@ static void parse_host_arg(struct hostinfo *hi, char 
*extra_args, int buflen)
if (extra_args < end && *extra_args)
die("Invalid request");
}
+
+   return extra_args;
+}
+
+static void parse_extra_args(struct hostinfo *hi, struct argv_array *env,
+char *extra_args, int buflen)
+{
+   const char *end = extra_args + buflen;
+   struct strbuf git_protocol = STRBUF_INIT;
+
+   /* First look for the host argument */
+   extra_args = parse_host_arg(hi, extra_args, buflen);
+
+   /* Look for additional arguments places after a second NUL byte */
+   for (; extra_args < end; extra_args += strlen(extra_args) + 1) {
+   const char *arg = extra_args;
+
+   /*
+* Parse the extra arguments, adding most to 'git_protocol'
+* which will be used to set the 'GIT_PROTOCOL' envvar in the
+* service that will be run.
+*
+* If there ends up being a particular arg in the future that
+* git-daemon needs to parse specificly (like the 'host' arg)
+* then it can be parsed here and not added to 'git_protocol'.
+*/
+   if (*arg) {
+   if (git_protocol.len > 0)
+   strbuf_addch(_protocol, ':');
+   strbuf_addstr(_protocol, arg);
+   }
+   }
+
+   if (git_protocol.len > 0)
+   argv_array_pushf(env, GIT_PROTOCOL_ENVIRONMENT "=%s",
+git_protocol.buf);
+   strbuf_release(_protocol);
 }
 
 /*
@@ -695,6 +741,7 @@ static int execute(void)

[PATCH v2 6/9] connect: teach client to recognize v1 server response

2017-09-26 Thread Brandon Williams
Teach a client to recognize that a server understands protocol v1 by
looking at the first pkt-line the server sends in response.  This is
done by looking for the response "version 1" send by upload-pack or
receive-pack.

Signed-off-by: Brandon Williams 
---
 connect.c | 30 ++
 1 file changed, 26 insertions(+), 4 deletions(-)

diff --git a/connect.c b/connect.c
index 8e2e276b6..1805debf3 100644
--- a/connect.c
+++ b/connect.c
@@ -12,6 +12,7 @@
 #include "sha1-array.h"
 #include "transport.h"
 #include "strbuf.h"
+#include "protocol.h"
 
 static char *server_capabilities;
 static const char *parse_feature_value(const char *, const char *, int *);
@@ -129,9 +130,23 @@ static int read_remote_ref(int in, char **src_buf, size_t 
*src_len,
return len;
 }
 
-#define EXPECTING_FIRST_REF 0
-#define EXPECTING_REF 1
-#define EXPECTING_SHALLOW 2
+#define EXPECTING_PROTOCOL_VERSION 0
+#define EXPECTING_FIRST_REF 1
+#define EXPECTING_REF 2
+#define EXPECTING_SHALLOW 3
+
+/* Returns 1 if packet_buffer is a protocol version pkt-line, 0 otherwise. */
+static int process_protocol_version(void)
+{
+   switch (determine_protocol_version_client(packet_buffer)) {
+   case protocol_v1:
+   return 1;
+   case protocol_v0:
+   return 0;
+   default:
+   die("server is speaking an unknown protocol");
+   }
+}
 
 static void process_capabilities(int *len)
 {
@@ -224,12 +239,19 @@ struct ref **get_remote_heads(int in, char *src_buf, 
size_t src_len,
 */
int responded = 0;
int len;
-   int state = EXPECTING_FIRST_REF;
+   int state = EXPECTING_PROTOCOL_VERSION;
 
*list = NULL;
 
while ((len = read_remote_ref(in, _buf, _len, ))) {
switch (state) {
+   case EXPECTING_PROTOCOL_VERSION:
+   if (process_protocol_version()) {
+   state = EXPECTING_FIRST_REF;
+   break;
+   }
+   state = EXPECTING_FIRST_REF;
+   /* fallthrough */
case EXPECTING_FIRST_REF:
process_capabilities();
if (process_dummy_ref()) {
-- 
2.14.1.992.g2c7b836f3a-goog



Re: RFC v3: Another proposed hash function transition plan

2017-09-26 Thread Jonathan Nieder
Hi,

Johannes Schindelin wrote:

> Sorry, you are asking cryptography experts to spend their time on the Git
> mailing list. I tried to get them to speak out on the Git mailing list.
> They respectfully declined.
>
> I can't fault them, they have real jobs to do, and none of their managers
> would be happy for them to educate the Git mailing list on matters of
> cryptography, not after what happened in 2005.

Fortunately we have had a few public comments from crypto specialists:

https://public-inbox.org/git/91a34c5b-7844-3db2-cf29-411df5bcf...@noekeon.org/
https://public-inbox.org/git/CAL9PXLzhPyE+geUdcLmd=pidt5p8efebbsgx_ds88knz2q_...@mail.gmail.com/
https://public-inbox.org/git/CAL9PXLxMHG1nP5_GQaK_WSJTNKs=_qbaL6V5v2GzVG=9vu2...@mail.gmail.com/
https://public-inbox.org/git/59bfb95d.1030...@st.com/
https://public-inbox.org/git/59c149a3.6080...@st.com/

[...]
> Let's be realistic. Git is pretty important to us, but it is not important
> enough to sway, say, Intel into announcing hardware support for SHA3.

Yes, I agree with this.  (Adoption by Git could lead to adoption by
some other projects, leading to more work on high quality software
implementations in projects like OpenSSL, but I am not convinced that
that would be a good thing for the world anyway.  There are downsides
to a proliferation of too many crypto primitives.  This is the basic
argument described in more detail at [1].)

[...]
> On Tue, 26 Sep 2017, Jason Cooper wrote:

>> For my use cases, as a user of git, I have a plan to maintain provable
>> integrity of existing objects stored in git under sha1 while migrating
>> away from sha1.  The same plan works for migrating away from SHA2 or
>> SHA3 when the time comes.
>
> Please do not make the mistake of taking your use case to be a template
> for everybody's use case.

That said, I'm curious at what plan you are alluding to.  Is it
something that could benefit others on the list?

Thanks,
Jonathan

[1] https://www.imperialviolet.org/2017/05/31/skipsha3.html


Re: [PATCH] technical doc: add a design doc for hash function transition

2017-09-26 Thread Jonathan Nieder
Hi,

Stefan Beller wrote:

> From: Jonathan Nieder 

I go by jrnie...@gmail.com upstream. :)

> This is "RFC v3: Another proposed hash function transition plan" from
> the git mailing list.
>
> Signed-off-by: Jonathan Nieder 
> Signed-off-by: Jonathan Tan 
> Signed-off-by: Brandon Williams 
> Signed-off-by: Stefan Beller 

I hadn't signed-off on this version, but it's not a big deal.

[...]
> ---
>
>  This takes the original Google Doc[1] and adds it to our history,
>  such that the discussion can be on on list and in the commit messages.
>
>  * replaced SHA3-256 with NEWHASH, sha3 with newhash
>  * added section 'Implementation plan'
>  * added section 'Future work'
>  * added section 'Agreed-upon criteria for selecting NewHash'

Thanks for sending this out.  I had let it stall too long.

As a tiny nit, I think NewHash is easier to read than NEWHASH.  Not a
big deal.  More importantly, we need some text describing it and
saying it's a placeholder.

The implementation plan included here is out of date.  It comes from
an email where I was answering a question about what people can do to
make progress, before this design had been agreed on.  In the context
of this design there are other steps we'd want to describe (having to
do with implementing the translation table, etc).

I also planned to add a description of the translation table based on
what was discussed previously in this thread.

Jonathan


Re: [PATCH] submodule: correct error message for missing commits.

2017-09-26 Thread Jacob Keller
On Tue, Sep 26, 2017 at 11:27 AM, Stefan Beller  wrote:
> When a submodule diff should be displayed we currently just add the
> submodule objects to the main object store and then e.g. walk the
> revision graph and create a summary for that submodule.
>
> It is possible that we are missing the submodule either completely or
> partially, which we currently differentiate with different error messages
> depending on whether (1) the whole submodule object store is missing or
> (2) just the needed for this particular diff. (1) is reported as
> "not initialized", and (2) is reported as "commits not present".
>
> If a submodule is deinit'ed its repository data is still around inside
> the superproject, such that the diff can still be produced. In that way
> the error message (1) is misleading as we can have a diff despite the
> submodule being not initialized.
>
> Downgrade the error message (1) to be the same as (2) and just say
> the commits are not present, as that is the true reason why the diff
> cannot be shown.
>
> Signed-off-by: Stefan Beller 

This makes sense to me.

Thanks,
Jake

> ---
>  submodule.c   | 2 +-
>  t/t4059-diff-submodule-not-initialized.sh | 2 +-
>  2 files changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/submodule.c b/submodule.c
> index 6531c5d609..280c246477 100644
> --- a/submodule.c
> +++ b/submodule.c
> @@ -567,7 +567,7 @@ static void show_submodule_header(FILE *f, const char 
> *path,
>
> if (add_submodule_odb(path)) {
> if (!message)
> -   message = "(not initialized)";
> +   message = "(commits not present)";
> goto output_header;
> }
>
> diff --git a/t/t4059-diff-submodule-not-initialized.sh 
> b/t/t4059-diff-submodule-not-initialized.sh
> index cd70fd5192..49bca7b48d 100755
> --- a/t/t4059-diff-submodule-not-initialized.sh
> +++ b/t/t4059-diff-submodule-not-initialized.sh
> @@ -95,7 +95,7 @@ test_expect_success 'submodule not initialized in new 
> clone' '
> git clone . sm3 &&
> git -C sm3 diff-tree -p --no-commit-id --submodule=log HEAD >actual &&
> cat >expected <<-EOF &&
> -   Submodule sm1 $smhead1...$smhead2 (not initialized)
> +   Submodule sm1 $smhead1...$smhead2 (commits not present)
> EOF
> test_cmp expected actual
>  '
> --
> 2.14.0.rc0.3.g6c2e499285
>


Google indexing https://public-inbox.org/git (was: BUG in git diff-index)

2017-09-26 Thread Marc Herbert
[Reduced Cc: and change Subject:]

On 26/09/17 13:11, Eric Wong wrote:
> There's no blocks on public-inbox.org and I'm completely against
> any sort of blocking/throttling.  Maybe there's too many pages
> to index?  Or the Message-IDs in URLs are too ugly/scary?  Not
> sure what to do about that...
> 
> Anyways, I just put up a robots.txt with Crawl-Delay: 1, since I
> seem to recall crawlers use a more conservative delay by default:

Not sure what made the difference: Google can now easily find this
thread on both https://public-inbox.org/git and on spinics too.

Same change observed for another, unrelated thread on this list.

Nevermind.

"Unsearchable are his judgments, and his ways past finding out"


Re: [PATCH 09/13] rev-list: add object filtering support

2017-09-26 Thread Jonathan Tan
On Fri, 22 Sep 2017 20:30:13 +
Jeff Hostetler  wrote:

> + if (filter_options.relax) {

Add some documentation about how this differs from ignore_missing_links
in struct rev_info.


Re: [PATCH 07/13] object-filter: common declarations for object filtering

2017-09-26 Thread Jonathan Tan
On Fri, 22 Sep 2017 20:30:11 +
Jeff Hostetler  wrote:

>  Makefile|   1 +
>  object-filter.c | 269 
> 
>  object-filter.h | 173 
>  3 files changed, 443 insertions(+)
>  create mode 100644 object-filter.c
>  create mode 100644 object-filter.h

I think these and list-objects-filter-* are multiple levels of
indirection too many. Would a single file with a few implementations of
filter_object_fn be sufficient?


Re: [PATCH 03/13] list-objects: filter objects in traverse_commit_list

2017-09-26 Thread Jonathan Tan
On Fri, 22 Sep 2017 20:26:22 +
Jeff Hostetler  wrote:

> From: Jeff Hostetler 
> 
> Create traverse_commit_list_filtered() and add filtering

You mention _filtered() here, but this patch contains _worker().

>  list-objects.h | 30 ++
>  2 files changed, 80 insertions(+), 16 deletions(-)
> 
> diff --git a/list-objects.c b/list-objects.c
> index b3931fa..3e86008 100644
> --- a/list-objects.c
> +++ b/list-objects.c
> @@ -13,10 +13,13 @@ static void process_blob(struct rev_info *revs,
>show_object_fn show,
>struct strbuf *path,
>const char *name,
> -  void *cb_data)
> +  void *cb_data,
> +  filter_object_fn filter,
> +  void *filter_data)

I had some thoughts about modifying "show" (instead of adding "filter",
as you have done in this patch) to indicate to the caller that the
corresponding object should not be marked "seen". That does have the
advantage that we don't have so many callbacks flying around, but it
also makes things more complicated, so I don't know which is better.

> + if (filter) {
> + r = filter(LOFT_END_TREE, obj, base->buf, >buf[baselen], 
> filter_data);
> + if (r & LOFR_MARK_SEEN)
> + obj->flags |= SEEN;
> + if (r & LOFR_SHOW)
> + show(obj, base->buf, cb_data);
> + }

This LOFT_END_TREE was added to support the code in
list-objects-filter-sparse - would it be OK to completely remove the
optimization involving the "provisional" omission of blobs? (I don't
think the exact same tree object will occur multiple times often.) If
yes, then I think this block can be removed too and will simplify the
code.

> +/* See object.h and revision.h */
> +#define FILTER_REVISIT (1<<25)

If you do add this, also update object.h. But I don't think this is the
right place to do it - it's probably better in
list-objects-filter-sparse.

> +typedef enum list_objects_filter_result list_objects_filter_result;
> +typedef enum list_objects_filter_type list_objects_filter_type;

I don't think Git style is to typedef enums.

> +void traverse_commit_list_worker(
> + struct rev_info *,
> + show_commit_fn, show_object_fn, void *show_data,
> + filter_object_fn filter, void *filter_data);

Here (and throughout), as far as I can tell, the style is to indent to
the character after the opening parenthesis.


[PATCH] technical doc: add a design doc for hash function transition

2017-09-26 Thread Stefan Beller
From: Jonathan Nieder 

This is "RFC v3: Another proposed hash function transition plan" from
the git mailing list.

Signed-off-by: Jonathan Nieder 
Signed-off-by: Jonathan Tan 
Signed-off-by: Brandon Williams 
Signed-off-by: Stefan Beller 
---

 This takes the original Google Doc[1] and adds it to our history,
 such that the discussion can be on on list and in the commit messages.
 
 * replaced SHA3-256 with NEWHASH, sha3 with newhash
 * added section 'Implementation plan'
 * added section 'Future work'
 * added section 'Agreed-upon criteria for selecting NewHash'
 
 As the discussion restarts again, here is our attempt
 to add value to the discussion, we planned to polish it more, but as the
 discussion is restarting, we might just post it as-is.
  
 Thanks.

[1] 
https://docs.google.com/document/d/18hYAQCTsDgaFUo-VJGhT0UqyetL2LbAzkWNK1fYS8R0/edit

 Documentation/Makefile |   1 +
 .../technical/hash-function-transition.txt | 571 +
 2 files changed, 572 insertions(+)
 create mode 100644 Documentation/technical/hash-function-transition.txt

diff --git a/Documentation/Makefile b/Documentation/Makefile
index 2415e0d657..471bb29725 100644
--- a/Documentation/Makefile
+++ b/Documentation/Makefile
@@ -67,6 +67,7 @@ SP_ARTICLES += howto/maintain-git
 API_DOCS = $(patsubst %.txt,%,$(filter-out technical/api-index-skel.txt 
technical/api-index.txt, $(wildcard technical/api-*.txt)))
 SP_ARTICLES += $(API_DOCS)
 
+TECH_DOCS += technical/hash-function-transition
 TECH_DOCS += technical/http-protocol
 TECH_DOCS += technical/index-format
 TECH_DOCS += technical/pack-format
diff --git a/Documentation/technical/hash-function-transition.txt 
b/Documentation/technical/hash-function-transition.txt
new file mode 100644
index 00..0ac751d600
--- /dev/null
+++ b/Documentation/technical/hash-function-transition.txt
@@ -0,0 +1,571 @@
+Git hash function transition
+
+
+Objective
+-
+Migrate Git from SHA-1 to a stronger hash function.
+
+Background
+--
+At its core, the Git version control system is a content addressable
+filesystem. It uses the SHA-1 hash function to name content. For
+example, files, directories, and revisions are referred to by hash
+values unlike in other traditional version control systems where files
+or versions are referred to via sequential numbers. The use of a hash
+function to address its content delivers a few advantages:
+
+* Integrity checking is easy. Bit flips, for example, are easily
+  detected, as the hash of corrupted content does not match its name.
+* Lookup of objects is fast.
+
+Using a cryptographically secure hash function brings additional
+advantages:
+
+* Object names can be signed and third parties can trust the hash to
+  address the signed object and all objects it references.
+* Communication using Git protocol and out of band communication
+  methods have a short reliable string that can be used to reliably
+  address stored content.
+
+Over time some flaws in SHA-1 have been discovered by security
+researchers. https://shattered.io demonstrated a practical SHA-1 hash
+collision. As a result, SHA-1 cannot be considered cryptographically
+secure any more. This impacts the communication of hash values because
+we cannot trust that a given hash value represents the known good
+version of content that the speaker intended.
+
+SHA-1 still possesses the other properties such as fast object lookup
+and safe error checking, but other hash functions are equally suitable
+that are believed to be cryptographically secure.
+
+Goals
+-
+1. The transition to NEWHASH can be done one local repository at a time.
+   a. Requiring no action by any other party.
+   b. A NEWHASH repository can communicate with SHA-1 Git servers
+  (push/fetch).
+   c. Users can use SHA-1 and NEWHASH identifiers for objects
+  interchangeably.
+   d. New signed objects make use of a stronger hash function than
+  SHA-1 for their security guarantees.
+2. Allow a complete transition away from SHA-1.
+   a. Local metadata for SHA-1 compatibility can be removed from a
+  repository if compatibility with SHA-1 is no longer needed.
+3. Maintainability throughout the process.
+   a. The object format is kept simple and consistent.
+   b. Creation of a generalized repository conversion tool.
+
+Non-Goals
+-
+1. Add NEWHASH support to Git protocol. This is valuable and the
+   logical next step but it is out of scope for this initial design.
+2. Transparently improving the security of existing SHA-1 signed
+   objects.
+3. Intermixing objects using multiple hash functions in a single
+   repository.
+4. Taking the opportunity to fix other bugs in git's formats and
+   protocols.
+5. Shallow clones and fetches into a NEWHASH repository. (This will
+   change when we add NEWHASH support to Git 

Re: [PATCH 02/13] oidset2: create oidset subclass with object length and pathname

2017-09-26 Thread Jonathan Tan
On Fri, 22 Sep 2017 20:26:21 +
Jeff Hostetler  wrote:

> From: Jeff Hostetler 
> 
> Create subclass of oidset where each entry has a
> field to store the length of the object's content
> and an optional pathname.
> 
> This will be used in a future commit to build a
> manifest of omitted objects in a partial/narrow
> clone/fetch.

As Brandon mentioned, I think "oidmap" should be the new data structure
of choice (with "oidset" modified to use it).

> +struct oidset2_entry {
> + struct hashmap_entry hash;
> + struct object_id oid;
> +
> + enum object_type type;
> + int64_t object_length;  /* This is SIGNED. Use -1 when unknown. */
> + char *pathname;
> +};

object_length is defined to be "unsigned long" in Git code, I think.
When is object_length not known, and in those cases, would it be better
to use a separate data structure to store what we need?


Re: RFC v3: Another proposed hash function transition plan

2017-09-26 Thread Johannes Schindelin
Hi Jason,

On Tue, 26 Sep 2017, Jason Cooper wrote:

> On Thu, Sep 14, 2017 at 08:45:35PM +0200, Johannes Schindelin wrote:
> > On Wed, 13 Sep 2017, Linus Torvalds wrote:
> > > On Wed, Sep 13, 2017 at 6:43 AM, demerphq  wrote:
> > > > SHA3 however uses a completely different design where it mixes a 1088
> > > > bit block into a 1600 bit state, for a leverage of 2:3, and the excess
> > > > is *preserved between each block*.
> > > 
> > > Yes. And considering that the SHA1 attack was actually predicated on
> > > the fact that each block was independent (no extra state between), I
> > > do think SHA3 is a better model.
> > > 
> > > So I'd rather see SHA3-256 than SHA256.
> 
> Well, for what it's worth, we need to be aware that SHA3 is *different*.
> In crypto, "different" = "bugs haven't been found yet".  :-P
> 
> And SHA2 is *known*.  So we have a pretty good handle on how it'll
> weaken over time.

Here, you seem to agree with me.

> > SHA-256 got much more cryptanalysis than SHA3-256, and apart from the
> > length-extension problem that does not affect Git's usage, there are no
> > known weaknesses so far.
> 
> While I think that statement is true on it's face (particularly when
> including post-competition analysis), I don't think it's sufficient
> justification to chose one over the other.

And here you don't.

I find that very confusing.

> > It would seem that the experts I talked to were much more concerned about
> > that amount of attention than the particulars of the algorithm. My
> > impression was that the new features of SHA3 were less studied than the
> > well-known features of SHA2, and that the new-ness of SHA3 is not
> > necessarily a good thing.
> 
> The only thing I really object to here is the abstract "experts".  We're
> talking about cryptography and integrity here.  It's no longer
> sufficient to cite anonymous experts.  Either they can put their
> thoughts, opinions and analysis on record here, or it shouldn't be
> considered.  Sorry.

Sorry, you are asking cryptography experts to spend their time on the Git
mailing list. I tried to get them to speak out on the Git mailing list.
They respectfully declined.

I can't fault them, they have real jobs to do, and none of their managers
would be happy for them to educate the Git mailing list on matters of
cryptography, not after what happened in 2005.

> Other than their anonymity, though, I do agree with your experts
> assessments.

I know what our in-house cryptography experts have to prove to start
working at Microsoft. Forgive me, but you are not a known entity to me.

> However, whether we chose SHA2 or SHA3 doesn't matter.

To you, it does not matter.

To me, it matters. To the several thousand developers working on Windows,
probably the largest Git repository in active use, it matters. It matters
because the speed difference that has little impact on you has a lot more
impact on us.

> Moving away from SHA1 does.  Once the object_id code is in place to
> facilitate that transition, the problem is solved from git's
> perspective.

Uh oh. You forgot the mapping. And the protocol. And pretty much
everything except the oid.

> If SHA3 is chosen as the successor, it's going to get a *lot* more
> adoption, and thus, a lot more analysis.  If cracks start to show, the
> hard work of making git flexible is already done.  We can migrate to
> SHA4/5/whatever in an orderly fashion with far less effort than the
> transition away from SHA1.

Sure. And if XYZ789 is chosen, it's going to get a *lot* more adoption,
too.

We think.

Let's be realistic. Git is pretty important to us, but it is not important
enough to sway, say, Intel into announcing hardware support for SHA3.

And if you try to force through *any* hash function only so that it gets
more adoption and hence more support, in the short run you will make life
harder for developers on more obscure platforms, who may not easily get
high-quality, high-speed implementations of anything but the very
mainstream (which is, let's face it, MD5, SHA-1 and SHA-256). I know I
would have cursed you for such a decision back when I had to work on AIX
and IRIX.

> For my use cases, as a user of git, I have a plan to maintain provable
> integrity of existing objects stored in git under sha1 while migrating
> away from sha1.  The same plan works for migrating away from SHA2 or
> SHA3 when the time comes.

Please do not make the mistake of taking your use case to be a template
for everybody's use case.

Migrating a large team away from any hash function to another one *will*
be painful, and costly.

Migrating will be very costly for hosting companies like GitHub, Microsoft
and BitBucket, too.

Ciao,
Johannes


Re: [PATCH 4/3] branch: fix "copy" to never touch HEAD

2017-09-26 Thread Ævar Arnfjörð Bjarmason

On Fri, Sep 22 2017, Junio C. Hamano jotted:

> When creating a new branch B by copying the branch A that happens to
> be the current branch, it also updates HEAD to point at the new
> branch.  It probably was made this way because "git branch -c A B"
> piggybacked its implementation on "git branch -m A B",
>
> This does not match the usual expectation.  If I were sitting on a
> blue chair, and somebody comes and repaints it to red, I would
> accept ending up sitting on a chair that is now red (I am also OK to
> stand, instead, as there no longer is my favourite blue chair).  But
> if somebody creates a new red chair, modelling it after the blue
> chair I am sitting on, I do not expect to be booted off of the blue
> chair and ending up on sitting on the new red one.
>
> Let's fix this before it hits 'next'.  Those who want to create a
> new branch and switch to it can do "git checkout B" after doing a
> "git branch -c B", and if that operation is so useful and deserves a
> short-hand way to do so, perhaps extend "git checkout -b B" to copy
> configurations while creating the new branch B.
>
> Signed-off-by: Junio C Hamano 
> ---
>
>  * Let's send an updated one as a follow-up to the discussion thread
>that it is a follow-up to.  The patch in this message is the same
>as the one I previously sent; the proposed log message has been
>updated somewhat.
>
>  builtin/branch.c  |  9 +++--
>  t/t3200-branch.sh | 10 +-
>  2 files changed, 8 insertions(+), 11 deletions(-)

This mostly looks good to me.

I've already spent several screenfuls here on-list arguing for not
applying the equivalent of this patch, which I won't repeat here.

But just a brief comment: I think this doing this is good, and it's
something I agree with given that the consensus is going this way. I
thought "do what -m does" was less confusing, but since everyone (you,
Brad, even Sahil) seems to disagree I'm happy to be shown to be wrong &
to have this go in so thoroughly reviewed & thought about.

I do think however that we also need to update the docs, the relevant
origin/master...gitster/sd/branch-copy diff is currently:

+The `-c` and `-C` options have the exact same semantics as `-m` and
+`-M`, except instead of the branch being renamed it along with its
+config and reflog will be copied to a new name.

Suggestions welcome, but I think that should probably be changed to
something like the following as part of this patch:

The `-c` and `-C` options have the exact same semantics as `-m` and
`-M`, except instead of the branch being renamed it along with its
config and reflog will be copied to a new name. Furthermore, unlike
its `-m` and `-M` cousins the `-c` and `-C` options will never move
the HEAD, whereas the move option will do so if the source branch is
the currently checked-out branch.

> diff --git a/builtin/branch.c b/builtin/branch.c
> index 89f64f4123..e2e3692838 100644
> --- a/builtin/branch.c
> +++ b/builtin/branch.c
> @@ -506,12 +506,9 @@ static void copy_or_rename_branch(const char *oldname, 
> const char *newname, int
>   oldref.buf + 11);
>   }
>
> - if (replace_each_worktree_head_symref(oldref.buf, newref.buf, 
> logmsg.buf)) {
> - if (copy)
> - die(_("Branch copied to %s, but HEAD is not updated!"), 
> newname);
> - else
> - die(_("Branch renamed to %s, but HEAD is not 
> updated!"), newname);
> - }
> + if (!copy &&
> + replace_each_worktree_head_symref(oldref.buf, newref.buf, 
> logmsg.buf))
> + die(_("Branch renamed to %s, but HEAD is not updated!"), 
> newname);
>
>   strbuf_release();
>
> diff --git a/t/t3200-branch.sh b/t/t3200-branch.sh
> index 5d03ad16f6..be9b3784c6 100755
> --- a/t/t3200-branch.sh
> +++ b/t/t3200-branch.sh
> @@ -422,7 +422,7 @@ test_expect_success 'git branch --copy is a synonym for 
> -c' '
>   test_cmp expect actual
>  '
>
> -test_expect_success 'git branch -c ee ef should copy and checkout branch ef' 
> '
> +test_expect_success 'git branch -c ee ef should copy to create branch ef' '
>   git checkout -b ee &&
>   git reflog exists refs/heads/ee &&
>   git config branch.ee.dummy Hello &&
> @@ -431,7 +431,7 @@ test_expect_success 'git branch -c ee ef should copy and 
> checkout branch ef' '
>   git reflog exists refs/heads/ef &&
>   test $(git config branch.ee.dummy) = Hello &&
>   test $(git config branch.ef.dummy) = Hello &&
> - test $(git rev-parse --abbrev-ref HEAD) = ef
> + test $(git rev-parse --abbrev-ref HEAD) = ee
>  '
>
>  test_expect_success 'git branch -c f/f g/g should work' '
> @@ -494,12 +494,12 @@ test_expect_success 'git branch -C c1 c2 should succeed 
> when c1 is checked out'
>   git checkout -b c1 &&
>   git branch c2 &&
>   git branch -C c1 c2 &&
> - test $(git rev-parse --abbrev-ref HEAD) = c2
> + test $(git 

Re: BUG in git diff-index

2017-09-26 Thread Eric Wong
Marc Herbert  wrote:
> PS: I used NNTP and http://dir.gmane.org/gmane.comp.version-control.git
> to quickly find this old thread (what could we do without NNTP?). Then
> I googled for a web archive of this thread and Google could only find
> this one: 
> http://git.661346.n2.nabble.com/BUG-in-git-diff-index-tt7652105.html#none
> Is there a robots.txt to block indexing on
> https://public-inbox.org/git/1459432667.2124.2.ca...@dwim.me ?

There's no blocks on public-inbox.org and I'm completely against
any sort of blocking/throttling.  Maybe there's too many pages
to index?  Or the Message-IDs in URLs are too ugly/scary?  Not
sure what to do about that...

Anyways, I just put up a robots.txt with Crawl-Delay: 1, since I
seem to recall crawlers use a more conservative delay by default:

==> https://public-inbox.org/robots.txt <==
User-Agent: *
Crawl-Delay: 1


I don't know much about SEO other than keeping a site up and
responsive; so perhaps there's more to be done about getting
things indexed...


[PATCH] t7406: submodule..update command must not be run from .gitmodules

2017-09-26 Thread Stefan Beller
submodule..update can be assigned an arbitrary command via setting
it to "!command". When this command is found in the regular config, Git
ought to just run that command instead of other update mechanisms.

However if that command is just found in the .gitmodules file, it is
potentially untrusted, which is why we do not run it.  Add a test
confirming the behavior.

Suggested-by: Jonathan Nieder 
Signed-off-by: Stefan Beller 
---

> This test for a missing file is certainly a remnant from the
> previous iteration, isn't it?

Yes. This is a good indicator I need some vacation.

Thanks,
Stefan

 t/t7406-submodule-update.sh | 8 
 1 file changed, 8 insertions(+)

diff --git a/t/t7406-submodule-update.sh b/t/t7406-submodule-update.sh
index 034914a14f..6f083c4d68 100755
--- a/t/t7406-submodule-update.sh
+++ b/t/t7406-submodule-update.sh
@@ -406,6 +406,14 @@ test_expect_success 'submodule update - command in 
.git/config' '
)
 '
 
+test_expect_success 'submodule update - command in .gitmodules is ignored' '
+   test_when_finished "git -C super reset --hard HEAD^" &&
+   git -C super config -f .gitmodules submodule.submodule.update "!false" 
&&
+   git -C super commit -a -m "add command to .gitmodules file" &&
+   git -C super/submodule reset --hard $submodulesha1^ &&
+   git -C super submodule update submodule
+'
+
 cat << EOF >expect
 Execution of 'false $submodulesha1' failed in submodule path 'submodule'
 EOF
-- 
2.14.0.rc0.3.g6c2e499285



Re: [PATCH] t7406: submodule..update command must not be run from .gitmodules

2017-09-26 Thread Johannes Sixt

Am 26.09.2017 um 20:54 schrieb Stefan Beller:

+test_expect_success 'submodule update - command in .gitmodules is ignored' '
+   test_when_finished "git -C super reset --hard HEAD^" &&
+
+   git -C super config -f .gitmodules submodule.submodule.update "!false" 
&&
+   git -C super commit -a -m "add command to .gitmodules file" &&
+   git -C super/submodule reset --hard $submodulesha1^ &&
+   git -C super submodule update submodule &&
+   test_path_is_missing bad


This test for a missing file is certainly a remnant from the previous 
iteration, isn't it?


-- Hannes


Re: BUG in git diff-index

2017-09-26 Thread Marc Herbert
On 31/03/16 13:39, Junio C Hamano wrote:
> Andy Lowry  writes:
> 
>> So I think now that the script should do "update-index --refresh"
>> followed by "diff-index --quiet HEAD". Sound correct?
> 
> Yes.  That has always been one of the kosher ways for any script to
> make sure that the files in the working tree that are tracked have
> not been modified relative to HEAD (assuming that the index matches
> HEAD).  

Too bad kernel/scripts/setlocalversion didn't get the memo:

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=cdf2bc632ebc9ef51
> scripts/setlocalversion on write-protected source tree (2013)
> I don't see how removing "git update-index" could do any harm.

This causes a spurious "-dirty" suffix when building from a directory copy
(as Mike learned the hard way)

  

PS: I used NNTP and http://dir.gmane.org/gmane.comp.version-control.git
to quickly find this old thread (what could we do without NNTP?). Then
I googled for a web archive of this thread and Google could only find
this one: 
http://git.661346.n2.nabble.com/BUG-in-git-diff-index-tt7652105.html#none
Is there a robots.txt to block indexing on
https://public-inbox.org/git/1459432667.2124.2.ca...@dwim.me ?


Re: [PATCH v2 0/9] Teach 'run' perf script to read config files

2017-09-26 Thread Stefan Beller
>   "Note: Amazon SES overrides any Date header you provide with the
> time that Amazon
>   SES accepts the message."
>
> http://docs.aws.amazon.com/ses/latest/DeveloperGuide/header-fields.html
>
> ...so the only way SubmitGit can offset the times is to literally
> delay the sending of the emails,
> which is a bit unfortunate for patchbombs more than a few dozen commits long!

How many series of more than "a few dozen" patches were sent to the list lately?
I'd argue this is a non issue for the typical use case of submitGit.

> I'll take a further look at this when I get a bit more free time.

Thanks!
Stefan


Re: [PATCH v2 0/9] Teach 'run' perf script to read config files

2017-09-26 Thread Roberto Tyley
On 26 September 2017 at 16:40, Christian Couder
 wrote:
> On Sun, Sep 24, 2017 at 9:59 AM, Junio C Hamano  wrote:
>> Christian Couder  writes:
>>
>>> (It looks like smtp.gmail.com isn't working anymore for me, so I am
>>> trying to send this using Gmail for the cover letter and Submitgit for
>>> the patches.)
>>
>> SubmitGit may want to learn the "change the timestamps of the
>> individual patches by 1 second" trick from "git send-email" to help
>> threading (you can view inbox/comp.version-control.git/ group over
>> nntp and tell your newsreader to sort-by-date).
>
> Roberto is now in CC. I will let him answer about that.

I had a quick look at git-send-email.perl, I see the trick is the `time++` one
introduced with https://github.com/git/git/commit/a5370b16 - seems reasonable!

SubmitGit makes all emails in-reply-to the initial email, which I
think is correct behaviour,
but I can see that offsetting the times would probably give a more
reliable sorting in
a newsreader. Unfortunately the documentation for AWS Simple Email Service (SES)
says:

  "Note: Amazon SES overrides any Date header you provide with the
time that Amazon
  SES accepts the message."

http://docs.aws.amazon.com/ses/latest/DeveloperGuide/header-fields.html

...so the only way SubmitGit can offset the times is to literally
delay the sending of the emails,
which is a bit unfortunate for patchbombs more than a few dozen commits long!

I'll take a further look at this when I get a bit more free time.

Roberto


Re: [PATCH 00/13] RFC object filtering for parital clone

2017-09-26 Thread Jeff Hostetler



On 9/26/2017 10:55 AM, Jeff Hostetler wrote:



On 9/22/2017 8:39 PM, Jonathan Tan wrote:

On Fri, 22 Sep 2017 20:26:19 +
Jeff Hostetler  wrote:
...

I tried applying your patches and it doesn't apply cleanly on master.
Could you try rebasing? In particular, the code references
get_sha1_with_context(), which no longer exists (it was replaced with
get_oid_with_context()).



I usually build and submit patches relative to the latest
available tag (v2.14.1) rather than a moving target, but
yeah I'll look thru the other comments and rebase it forward.


https://github.com/jeffhostetler/git/tree/core/rev_list_filtering

I rebased this branch onto a current master and fixed
the get_sha1_with_context() problems.

Let me know if you have problems with it (or would prefer that I
send it to the mailing list).

Jeff


[PATCH] t7406: submodule..update command must not be run from .gitmodules

2017-09-26 Thread Stefan Beller
submodule..update can be assigned an arbitrary command via setting
it to "!command". When this command is found in the regular config, Git
ought to just run that command instead of other update mechanisms.

However if that command is just found in the .gitmodules file, it is
potentially untrusted, which is why we do not run it.  Add a test
confirming the behavior.

Suggested-by: Jonathan Nieder 
Signed-off-by: Stefan Beller 
---

Johannes wrote:
> I am pretty confident that this does not test what you intend to
> test. Notice that $TEST_DIRECTORY is expanded when the script is
> written. But that path contains a blank, and we have something like
> this in the test script:
>
>   #!/bin/sh
>   >/the/build/directory/t/trash directory.t7406/bad

I can confirm that.

Instead of mucking around with writing a script, "to make it robust",
I decided to got the simplest route and just have the command "!false",
which would make "git submodule update submodule" return an error.

Thanks,
Stefan


 t/t7406-submodule-update.sh | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/t/t7406-submodule-update.sh b/t/t7406-submodule-update.sh
index 034914a14f..a9ea098e55 100755
--- a/t/t7406-submodule-update.sh
+++ b/t/t7406-submodule-update.sh
@@ -406,6 +406,16 @@ test_expect_success 'submodule update - command in 
.git/config' '
)
 '
 
+test_expect_success 'submodule update - command in .gitmodules is ignored' '
+   test_when_finished "git -C super reset --hard HEAD^" &&
+
+   git -C super config -f .gitmodules submodule.submodule.update "!false" 
&&
+   git -C super commit -a -m "add command to .gitmodules file" &&
+   git -C super/submodule reset --hard $submodulesha1^ &&
+   git -C super submodule update submodule &&
+   test_path_is_missing bad
+'
+
 cat << EOF >expect
 Execution of 'false $submodulesha1' failed in submodule path 'submodule'
 EOF
-- 
2.14.0.rc0.3.g6c2e499285



Re: [PATCH v5] connect: in ref advertisement, shallows are last

2017-09-26 Thread Brandon Williams
On 09/26, Jonathan Tan wrote:
> Currently, get_remote_heads() parses the ref advertisement in one loop,
> allowing refs and shallow lines to intersperse, despite this not being
> allowed by the specification. Refactor get_remote_heads() to use two
> loops instead, enforcing that refs come first, and then shallows.
> 
> This also makes it easier to teach get_remote_heads() to interpret other
> lines in the ref advertisement, which will be done in a subsequent
> patch.
> 
> As part of this change, this patch interprets capabilities only on the
> first line in the ref advertisement, printing a warning message when
> encountering capabilities on other lines.
> 
> Signed-off-by: Jonathan Tan 
> ---
> Changes in v5:
>  - print warning when encountering capabilities on other lines instead
>of ignoring them (also updated commit message)
>  - explicitly disallow refs of name "capabilities^{}" (except when it is
>the only ref)
> ---
>  connect.c | 183 
> +++---
>  1 file changed, 117 insertions(+), 66 deletions(-)
> 
> diff --git a/connect.c b/connect.c
> index df56c0cbf..df65a3fc4 100644
> --- a/connect.c
> +++ b/connect.c
> @@ -11,6 +11,7 @@
>  #include "string-list.h"
>  #include "sha1-array.h"
>  #include "transport.h"
> +#include "strbuf.h"
>  
>  static char *server_capabilities;
>  static const char *parse_feature_value(const char *, const char *, int *);
> @@ -107,6 +108,98 @@ static void annotate_refs_with_symref_info(struct ref 
> *ref)
>   string_list_clear(, 0);
>  }
>  
> +/*
> + * Read one line of a server's ref advertisement into packet_buffer.
> + */
> +static int read_remote_ref(int in, char **src_buf, size_t *src_len,
> +int *responded)
> +{
> + int len = packet_read(in, src_buf, src_len,
> +   packet_buffer, sizeof(packet_buffer),
> +   PACKET_READ_GENTLE_ON_EOF |
> +   PACKET_READ_CHOMP_NEWLINE);
> + const char *arg;
> + if (len < 0)
> + die_initial_contact(*responded);
> + if (len > 4 && skip_prefix(packet_buffer, "ERR ", ))
> + die("remote error: %s", arg);
> +
> + *responded = 1;
> +
> + return len;
> +}
> +
> +#define EXPECTING_FIRST_REF 0
> +#define EXPECTING_REF 1
> +#define EXPECTING_SHALLOW 2
> +
> +static void process_capabilities(int *len)
> +{
> + int nul_location = strlen(packet_buffer);
> + if (nul_location == *len)
> + return;
> + server_capabilities = xstrdup(packet_buffer + nul_location + 1);
> + *len = nul_location;
> +}
> +
> +static int process_dummy_ref(void)
> +{
> + static char *template;
> + if (!template)
> + template = xstrfmt("%040d capabilities^{}", 0);

My only complaint is still here, I don't like the notion of hardcoding
the 0's.  Its much more future proof and less error prone to call
parse_oid_hex and require that it matches null_oid.

> + return !strcmp(packet_buffer, template);
> +}
> +
> +static void check_no_capabilities(int len)
> +{
> + if (strlen(packet_buffer) != len)
> + warning("Ignoring capabilities after first line '%s'",
> + packet_buffer + strlen(packet_buffer));
> +}
> +
> +static int process_ref(int len, struct ref ***list, unsigned int flags,
> +struct oid_array *extra_have)
> +{
> + struct object_id old_oid;
> + const char *name;
> +
> + if (parse_oid_hex(packet_buffer, _oid, ))
> + return 0;
> + if (*name != ' ')
> + return 0;
> + name++;
> +
> + if (extra_have && !strcmp(name, ".have")) {
> + oid_array_append(extra_have, _oid);
> + } else if (!strcmp(name, "capabilities^{}")) {
> + die("protocol error: unexpected capabilities^{}");
> + } else if (check_ref(name, flags)) {
> + struct ref *ref = alloc_ref(name);
> + oidcpy(>old_oid, _oid);
> + **list = ref;
> + *list = >next;
> + }
> + check_no_capabilities(len);
> + return 1;
> +}
> +
> +static int process_shallow(int len, struct oid_array *shallow_points)
> +{
> + const char *arg;
> + struct object_id old_oid;
> +
> + if (!skip_prefix(packet_buffer, "shallow ", ))
> + return 0;
> +
> + if (get_oid_hex(arg, _oid))
> + die("protocol error: expected shallow sha-1, got '%s'", arg);
> + if (!shallow_points)
> + die("repository on the other end cannot be shallow");
> + oid_array_append(shallow_points, _oid);
> + check_no_capabilities(len);
> + return 1;
> +}
> +
>  /*
>   * Read all the refs from the other end
>   */
> @@ -123,76 +216,34 @@ struct ref **get_remote_heads(int in, char *src_buf, 
> size_t src_len,
>* willing to talk to us.  A hang-up before seeing any
>* response does not necessarily mean an ACL problem, though.
>*/
> - 

[PATCH] submodule: correct error message for missing commits.

2017-09-26 Thread Stefan Beller
When a submodule diff should be displayed we currently just add the
submodule objects to the main object store and then e.g. walk the
revision graph and create a summary for that submodule.

It is possible that we are missing the submodule either completely or
partially, which we currently differentiate with different error messages
depending on whether (1) the whole submodule object store is missing or
(2) just the needed for this particular diff. (1) is reported as
"not initialized", and (2) is reported as "commits not present".

If a submodule is deinit'ed its repository data is still around inside
the superproject, such that the diff can still be produced. In that way
the error message (1) is misleading as we can have a diff despite the
submodule being not initialized.

Downgrade the error message (1) to be the same as (2) and just say
the commits are not present, as that is the true reason why the diff
cannot be shown.

Signed-off-by: Stefan Beller 
---
 submodule.c   | 2 +-
 t/t4059-diff-submodule-not-initialized.sh | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/submodule.c b/submodule.c
index 6531c5d609..280c246477 100644
--- a/submodule.c
+++ b/submodule.c
@@ -567,7 +567,7 @@ static void show_submodule_header(FILE *f, const char *path,
 
if (add_submodule_odb(path)) {
if (!message)
-   message = "(not initialized)";
+   message = "(commits not present)";
goto output_header;
}
 
diff --git a/t/t4059-diff-submodule-not-initialized.sh 
b/t/t4059-diff-submodule-not-initialized.sh
index cd70fd5192..49bca7b48d 100755
--- a/t/t4059-diff-submodule-not-initialized.sh
+++ b/t/t4059-diff-submodule-not-initialized.sh
@@ -95,7 +95,7 @@ test_expect_success 'submodule not initialized in new clone' '
git clone . sm3 &&
git -C sm3 diff-tree -p --no-commit-id --submodule=log HEAD >actual &&
cat >expected <<-EOF &&
-   Submodule sm1 $smhead1...$smhead2 (not initialized)
+   Submodule sm1 $smhead1...$smhead2 (commits not present)
EOF
test_cmp expected actual
 '
-- 
2.14.0.rc0.3.g6c2e499285



[PATCH v5] connect: in ref advertisement, shallows are last

2017-09-26 Thread Jonathan Tan
Currently, get_remote_heads() parses the ref advertisement in one loop,
allowing refs and shallow lines to intersperse, despite this not being
allowed by the specification. Refactor get_remote_heads() to use two
loops instead, enforcing that refs come first, and then shallows.

This also makes it easier to teach get_remote_heads() to interpret other
lines in the ref advertisement, which will be done in a subsequent
patch.

As part of this change, this patch interprets capabilities only on the
first line in the ref advertisement, printing a warning message when
encountering capabilities on other lines.

Signed-off-by: Jonathan Tan 
---
Changes in v5:
 - print warning when encountering capabilities on other lines instead
   of ignoring them (also updated commit message)
 - explicitly disallow refs of name "capabilities^{}" (except when it is
   the only ref)
---
 connect.c | 183 +++---
 1 file changed, 117 insertions(+), 66 deletions(-)

diff --git a/connect.c b/connect.c
index df56c0cbf..df65a3fc4 100644
--- a/connect.c
+++ b/connect.c
@@ -11,6 +11,7 @@
 #include "string-list.h"
 #include "sha1-array.h"
 #include "transport.h"
+#include "strbuf.h"
 
 static char *server_capabilities;
 static const char *parse_feature_value(const char *, const char *, int *);
@@ -107,6 +108,98 @@ static void annotate_refs_with_symref_info(struct ref *ref)
string_list_clear(, 0);
 }
 
+/*
+ * Read one line of a server's ref advertisement into packet_buffer.
+ */
+static int read_remote_ref(int in, char **src_buf, size_t *src_len,
+  int *responded)
+{
+   int len = packet_read(in, src_buf, src_len,
+ packet_buffer, sizeof(packet_buffer),
+ PACKET_READ_GENTLE_ON_EOF |
+ PACKET_READ_CHOMP_NEWLINE);
+   const char *arg;
+   if (len < 0)
+   die_initial_contact(*responded);
+   if (len > 4 && skip_prefix(packet_buffer, "ERR ", ))
+   die("remote error: %s", arg);
+
+   *responded = 1;
+
+   return len;
+}
+
+#define EXPECTING_FIRST_REF 0
+#define EXPECTING_REF 1
+#define EXPECTING_SHALLOW 2
+
+static void process_capabilities(int *len)
+{
+   int nul_location = strlen(packet_buffer);
+   if (nul_location == *len)
+   return;
+   server_capabilities = xstrdup(packet_buffer + nul_location + 1);
+   *len = nul_location;
+}
+
+static int process_dummy_ref(void)
+{
+   static char *template;
+   if (!template)
+   template = xstrfmt("%040d capabilities^{}", 0);
+   return !strcmp(packet_buffer, template);
+}
+
+static void check_no_capabilities(int len)
+{
+   if (strlen(packet_buffer) != len)
+   warning("Ignoring capabilities after first line '%s'",
+   packet_buffer + strlen(packet_buffer));
+}
+
+static int process_ref(int len, struct ref ***list, unsigned int flags,
+  struct oid_array *extra_have)
+{
+   struct object_id old_oid;
+   const char *name;
+
+   if (parse_oid_hex(packet_buffer, _oid, ))
+   return 0;
+   if (*name != ' ')
+   return 0;
+   name++;
+
+   if (extra_have && !strcmp(name, ".have")) {
+   oid_array_append(extra_have, _oid);
+   } else if (!strcmp(name, "capabilities^{}")) {
+   die("protocol error: unexpected capabilities^{}");
+   } else if (check_ref(name, flags)) {
+   struct ref *ref = alloc_ref(name);
+   oidcpy(>old_oid, _oid);
+   **list = ref;
+   *list = >next;
+   }
+   check_no_capabilities(len);
+   return 1;
+}
+
+static int process_shallow(int len, struct oid_array *shallow_points)
+{
+   const char *arg;
+   struct object_id old_oid;
+
+   if (!skip_prefix(packet_buffer, "shallow ", ))
+   return 0;
+
+   if (get_oid_hex(arg, _oid))
+   die("protocol error: expected shallow sha-1, got '%s'", arg);
+   if (!shallow_points)
+   die("repository on the other end cannot be shallow");
+   oid_array_append(shallow_points, _oid);
+   check_no_capabilities(len);
+   return 1;
+}
+
 /*
  * Read all the refs from the other end
  */
@@ -123,76 +216,34 @@ struct ref **get_remote_heads(int in, char *src_buf, 
size_t src_len,
 * willing to talk to us.  A hang-up before seeing any
 * response does not necessarily mean an ACL problem, though.
 */
-   int saw_response;
-   int got_dummy_ref_with_capabilities_declaration = 0;
+   int responded = 0;
+   int len;
+   int state = EXPECTING_FIRST_REF;
 
*list = NULL;
-   for (saw_response = 0; ; saw_response = 1) {
-   struct ref *ref;
-   struct object_id old_oid;
-   char *name;
-   int len, name_len;
- 

[ANNOUNCE] Git for Windows 2.14.2

2017-09-26 Thread Johannes Schindelin
Dear Git users,

It is my pleasure to announce that Git for Windows 2.14.2 is available from:

https://git-for-windows.github.io/

Changes since Git for Windows v2.14.1 (August 10th 2017)

New Features

  * Comes with Git v2.14.2.
  * Comes with cURL v7.55.1.
  * The XP-compatibility layer emulating pthreads (which is no longer
needed) was dropped in favor of modern Windows threading APIs; This
should make threaded operations slightly faster and more robust.
  * On Windows, UNC paths can now be accessed via file://host/share/
repo.git-style paths.
  * Comes with a new custom Git command git update to help keeping Git
up-to-date on your machine.
  * The Git installer now offers an option to keep Git up-to-date by
calling git update regularly.
  * Comes with BusyBox v1.28.0pre.16353.2739df917.
  * As is common elsewhere, Ctrl+Left and Ctrl+Right now move word-wise
in Git Bash, too.
  * Comes with patch level 2 of the MSYS2 runtime (Git for Windows
flavor) based on Cygwin 2.9.0.
  * Comes with Git LFS v2.3.0.
  * The vs/master branch can now be built in Visual Studio 2017, too
  * As requested by the same user who implemented the change, Git for
Windows now comes with tig, a text-mode interface for Git.

Bug Fixes

  * It is now possible to override http.sslBackend on the command-line.
  * The installer now detects correctly whether symbolic links can be
created by regular users.
  * Git Bash now renders non-ASCII directories nicely.
  * A regression that caused the fetch operation with lots of refs to
be a lot slower than before was fixed.
  * The git-gui.exe and gitk.exe wrappers intended to be used in Git
CMD now handle command-line parameters correctly.
  * The core.longPaths setting is now heeded when packing refs, and
other previously forgotten Git commands.
  * Pressing Ctrl+Z in Git Bash no longer kills Win32 processes (e.g.
git.exe) anymore, because POSIX job control is only available with
MSYS2 processes.
  * Git for Windows now sets core.fsyncObjectFiles = true by default
which makes it a lot more fault-tolerant, say, when power is lost.
  * A bug has been fixed where Git for Windows could run into an
infinite loop trying to rename a file.
  * Before installing Git for Windows, we already verified that no Git
Bash instance is active (which would prevent files from being
overwritten). We now also verify that no git.exe processes are
active, either.

Filename | SHA-256
 | ---
Git-2.14.2-64-bit.exe | 
ef144a5dbb74518b56727f6c228993b9e7f5aedec7bbf7d680dcffb5d36ce354
Git-2.14.2-32-bit.exe | 
0c7cf5f1bd6532c3720920b953da6b5c563acab35e3caee0b6f337975e284f35
PortableGit-2.14.2-64-bit.7z.exe | 
3d451383519379dfe2f160c4c3a49e4b9e39a98f3eea2534593cc3fb5a4e
PortableGit-2.14.2-32-bit.7z.exe | 
6acd886bb1cebee7b5b607599c8327453110c28ddd8c4f976d14050ab0a17e11
MinGit-2.14.2-64-bit.zip | 
9638733b8d749c43d59c34a714d582b2352356ee7d13c4acf919c18f307387f5
MinGit-2.14.2-32-bit.zip | 
1c8ed7f54a8d50c84fbc767d97cf35992de42463d9852c56a1f3a3c1ac80c965
MinGit-2.14.2-busybox-64-bit.zip | 
e6a75e75378b89c4d2e9c89a7cd0b0995ea0b8c2bcd3a93c42d0ffadd17ccd75
MinGit-2.14.2-busybox-32-bit.zip | 
b0b4d2135ad63347dfd787aa8046a625272764d0a6ea73f4ebd793105d6d9545
Git-2.14.2-64-bit.tar.bz2 | 
cf2d4ccce3dbdd42755e2e89830a406b9cf5cef6305dc7bf176a4ef5d55cb47e
Git-2.14.2-32-bit.tar.bz2 | 
8901421656b6e5215cf4a2f3872de3e6d89aa1177ea3c3ffbed54af292957023

Ciao,
Johannes


Re: RFC: Design and code of partial clones (now, missing commits and trees OK) (part 3)

2017-09-26 Thread Jonathan Tan
On Tue, 26 Sep 2017 10:25:16 -0400
Jeff Hostetler  wrote:

> >> Perhaps you could augment the OID lookup to remember where the object
> >> was found (essentially a .promisor bit set).  Then you wouldn't need
> >> to touch them all.
> > 
> > Sorry - I don't understand this. Are you saying that missing promisor
> > objects should go into the global object hashtable, so that we can set a
> > flag on them?
> 
> I just meant could we add a bit to "struct object_info" to indicate
> that the object was found in a .promisor packfile ?  This could
> be set in sha1_object_info_extended().
> 
> Then the is_promised() calls in fsck and gc would just test that bit.
> 
> Given that that bit will be set on promisOR objects (and we won't
> have object_info for missing objects), you may need to adjust the
> iterator in the fsck/gc code slightly.
> 
> This is a bit of a handwave, but could something like that eliminate
> the need to build this oidset?

This oidset is meant to contain the missing objects, and is needed as
the final check (attempt to read the object, then check this oidset).
Admittedly, right now I add objects to it even if they are present in
the DB, but that's because I think that it's better for the set to be
bigger than to incur the repeated existence checks. But even if we only
include truly missing objects in this oidset, we still need the oidset,
or store information about missing objects in some equivalent data
structure.

The bit that you mention being set on promisOR objects is already being
set. See the invocation of mark_uninteresting() in rev-list.c.


Re: RFC v3: Another proposed hash function transition plan

2017-09-26 Thread Jason Cooper
Hi all,

Sorry for late commentary...

On Thu, Sep 14, 2017 at 08:45:35PM +0200, Johannes Schindelin wrote:
> On Wed, 13 Sep 2017, Linus Torvalds wrote:
> > On Wed, Sep 13, 2017 at 6:43 AM, demerphq  wrote:
> > > SHA3 however uses a completely different design where it mixes a 1088
> > > bit block into a 1600 bit state, for a leverage of 2:3, and the excess
> > > is *preserved between each block*.
> > 
> > Yes. And considering that the SHA1 attack was actually predicated on
> > the fact that each block was independent (no extra state between), I
> > do think SHA3 is a better model.
> > 
> > So I'd rather see SHA3-256 than SHA256.

Well, for what it's worth, we need to be aware that SHA3 is *different*.
In crypto, "different" = "bugs haven't been found yet".  :-P

And SHA2 is *known*.  So we have a pretty good handle on how it'll
weaken over time.

> SHA-256 got much more cryptanalysis than SHA3-256, and apart from the
> length-extension problem that does not affect Git's usage, there are no
> known weaknesses so far.

While I think that statement is true on it's face (particularly when
including post-competition analysis), I don't think it's sufficient
justification to chose one over the other.

> It would seem that the experts I talked to were much more concerned about
> that amount of attention than the particulars of the algorithm. My
> impression was that the new features of SHA3 were less studied than the
> well-known features of SHA2, and that the new-ness of SHA3 is not
> necessarily a good thing.

The only thing I really object to here is the abstract "experts".  We're
talking about cryptography and integrity here.  It's no longer
sufficient to cite anonymous experts.  Either they can put their
thoughts, opinions and analysis on record here, or it shouldn't be
considered.  Sorry.

Other than their anonymity, though, I do agree with your experts
assessments.

However, whether we chose SHA2 or SHA3 doesn't matter.  Moving away from
SHA1 does.  Once the object_id code is in place to facilitate that
transition, the problem is solved from git's perspective.

If SHA3 is chosen as the successor, it's going to get a *lot* more
adoption, and thus, a lot more analysis.  If cracks start to show, the
hard work of making git flexible is already done.  We can migrate to
SHA4/5/whatever in an orderly fashion with far less effort than the
transition away from SHA1.

For my use cases, as a user of git, I have a plan to maintain provable
integrity of existing objects stored in git under sha1 while migrating
away from sha1.  The same plan works for migrating away from SHA2 or
SHA3 when the time comes.


thx,

Jason.


Re: [PATCH v2 0/9] Teach 'run' perf script to read config files

2017-09-26 Thread Christian Couder
On Sun, Sep 24, 2017 at 9:59 AM, Junio C Hamano  wrote:
> Christian Couder  writes:
>
>> (It looks like smtp.gmail.com isn't working anymore for me, so I am
>> trying to send this using Gmail for the cover letter and Submitgit for
>> the patches.)
>
> SubmitGit may want to learn the "change the timestamps of the
> individual patches by 1 second" trick from "git send-email" to help
> threading (you can view inbox/comp.version-control.git/ group over
> nntp and tell your newsreader to sort-by-date).

Roberto is now in CC. I will let him answer about that.

>> Highlevel view of the patches in the series
>> ~~~
>>
>>   - Patches 1/9 to 4/9 were already in v1 and haven't changed.
>
> It isn't quite clear what _did_ change in the series and what
> lessons were learned form the previous round's discussion here.

In the previous round of discussion, I think the reviewers (you and
Peff) basically agreed that it was worth improving the test framework
to make it possible to easily run many tests, though reviewers were
not sure if what I had planned to do would in the end be a good
solution.

(See 
https://public-inbox.org/git/20170713065050.19215-1-chrisc...@tuxfamily.org/)

So what did change is that I implemented what was in the "Future work"
section in v1.

> The
> sample configuration in the description above (snipped) seems to
> have been extended and it shows that one of the use cases of the
> feature is to allow comparing runs against two versions, which
> looked more or less sensible way to express it.

[...]

> Thanks.  Let me see how well it works ;-)

Thanks for testing ;-)


Re: RFC: Design and code of partial clones (now, missing commits and trees OK)

2017-09-26 Thread Michael Haggerty
On 09/22/2017 12:42 AM, Jonathan Tan wrote:
> On Thu, 21 Sep 2017 13:57:30 Jeff Hostetler  wrote:
> [...]
>> I struggled with the terms here a little when looking at the source.
>> () A remote responding to a partial-clone is termed a
>> "promisor-remote". () Packfiles received from a promisor-remote are
>> marked with ".promisor" like ".keep" names.
>> () An object actually contained in such packfiles is called a
>> "promisor-object". () An object not-present but referenced by one of
>> the above promisor-objects is called a "promised-object" (aka a
>> "missing-object").
>>
>> I think the similarity of the words "promisOR" and "promisED" threw
>> me here and as I was looking at the code.  The code in is_promised()
>> [1] looked like it was adding all promisor- and promised-objects to
>> the "promised" OIDSET, but I could be mistaken.
>>
>> [1]
>> https://github.com/jonathantanmy/git/commit/7a9c2d9b6e2fce293817b595dee29a7eede0#diff-5d5d5dc185ef37dc30bb7d9a7ae0c4e8R1960
> 
> I was struggling a bit with the terminology, true.
> 
> Right now I'm thinking of:
>  - promisor remote (as you defined)
>  - promisor packfile (as you defined)
>  - promisor object is an object known to belong to the promisor (whether
>because we have it in a promisor packfile or because it is referenced
>by an object in a promisor packfile)

Maybe naming has been discussed at length before, and I am jumping into
a long-settled topic. And admittedly this is bikeshedding.

But I find these names obscure, even as a developer. And terms like this
will undoubtedly bleed into the UI and documentation, so it would be
good to put some effort into choosing the best names possible.

I suppose that the term "promisor" comes from the computer science term
"promise" [1]. In that sense it is apt, because, say, a promisor object
is something that is known to be obtainable, but we don't have it yet.

But from the user's point of view, I think this isn't a very
illuminating term. I think the user's mental model will be that there is
a distinguished remote repository that holds the project's entire
published history, and she has to remain connected to it for certain Git
operations to work [2]. Another interesting aspect of this remote is
that it has to be trusted never (well, almost never) to discard any
objects [3].

Let me brainstorm about other names or concepts that seem closer to the
user's mental model:

* "backing remote", "backing repository"

* "lazy remote", "live remote", "cloud remote", "shared remote",
  "on-demand remote"

* "full remote", "deep remote", "permanent remote"

* "attached remote", "bound remote", "linked remote"

* "trusted remote", "authoritative remote", "official remote"
  (these terms probably imply more than we want)

* "upstream", "upstream remote" (probably too confusing with how
  the term "upstream" is currently used, even if in practice they
  will often be the same remote)

* "object depot", "depot remote", "depot repository", "remote
  object depot" (I don't like the term "object" that much, because
  it is rather remote from the Git user's daily life)

* "repository of record", "remote of record" (too wordy)

* "history depot", "history warehouse" (meh)

* (dare I say it?) "central repository"

* "object archive", "archival remote" (not so good because we already
  use "archive" for other concepts)

* depository (sounds too much like "repository")

* The thesaurus suggests nouns like: store, bank, warehouse, library,
  chronicle, annals, registry, locker, strongbox, attic, bunker

Personally I think "lazy remote" and "backing remote" are not too bad.

Michael

[1] https://en.wikipedia.org/wiki/Futures_and_promises

[2] I haven't checked whether the current proposal allows for
multiple "promisor remotes". It's certainly thinkable, if not
now then in the future. But I suppose that even then, 99% of
users will configure a single "promisor remote" for each
repository.

[3] For those rare occasions where the server has to discard objects,
it might make sense for the server to remember the names of the
objects that were deleted, so that it can tell clients "no, you're
not insane. I used to have that object but it has intentionally
been obliterated", and possibly even a reason: "it is now taboo"
vs. "I got tired of carrying it around".


Re: git config credential.helper with absolute path on windows not working properly

2017-09-26 Thread bcampolo
My situation was a little different, but I was able to get this to work with
some interesting escaping.

helper = !"\"C:\\Path with spaces\\executable\"" --option1 value1
credential-helper $@

Notice the exclamation, quoted path of executable and extra escaped quotes
inside of that, plus escaped backslashes.






--
Sent from: http://git.661346.n2.nabble.com/


Re: [PATCH 00/13] RFC object filtering for parital clone

2017-09-26 Thread Jeff Hostetler



On 9/22/2017 8:39 PM, Jonathan Tan wrote:

On Fri, 22 Sep 2017 20:26:19 +
Jeff Hostetler  wrote:


This draft contains filters to:
() omit all blobs
() omit blobs larger than some size
() omit blobs using a sparse-checkout specification

In addition to specifying the filter criteria, the rev-list command
was updated to include options to:
() print a list of the omitted objects (due to the current filtering
criteria)
() print a list of missing objects (probably from a prior partial
clone/fetch).


Thanks, this last point seems useful.

I tried applying your patches and it doesn't apply cleanly on master.
Could you try rebasing? In particular, the code references
get_sha1_with_context(), which no longer exists (it was replaced with
get_oid_with_context()).



I usually build and submit patches relative to the latest
available tag (v2.14.1) rather than a moving target, but
yeah I'll look thru the other comments and rebase it forward.

Thanks
Jeff


Re: [PATCH] git: add --no-optional-locks option

2017-09-26 Thread Jeff Hostetler



On 9/25/2017 12:17 PM, Johannes Schindelin wrote:

Hi Kaartic,

On Sun, 24 Sep 2017, Kaartic Sivaraam wrote:


On Thursday 21 September 2017 10:02 AM, Jeff King wrote:

Some tools like IDEs or fancy editors may periodically run commands
like "git status" in the background to keep track of the state of the
repository.


I might be missing something, shouldn't the IDEs be encouraged to use
libgit2 instead? I thought it was meant for these use cases.


There are pros and cons. Visual Studio moved away from libgit2 e.g. to
support SSH (back then, libgit2 did not support spawning processes, I have
no idea whether that changed in the meantime).


There were other issues besides feature parity.  The big one for VS
was that it moved the git computations into a separate process and
address space.  You can't easily read/modify/write a 500MB .git/index
file into memory (with however many copies of the data that that
creates) in a 32-bit process.



Re: RFC: Design and code of partial clones (now, missing commits and trees OK) (part 3)

2017-09-26 Thread Jeff Hostetler



On 9/22/2017 6:58 PM, Jonathan Tan wrote:

On Fri, 22 Sep 2017 17:32:00 -0400
Jeff Hostetler  wrote:


I guess I'm afraid that the first call to is_promised() is going
cause a very long pause as it loads up a very large hash of objects.


Yes, the first call will cause a long pause. (I think fsck and gc can
tolerate this, but a better solution is appreciated.)


Perhaps you could augment the OID lookup to remember where the object
was found (essentially a .promisor bit set).  Then you wouldn't need
to touch them all.


Sorry - I don't understand this. Are you saying that missing promisor
objects should go into the global object hashtable, so that we can set a
flag on them?


I just meant could we add a bit to "struct object_info" to indicate
that the object was found in a .promisor packfile ?  This could
be set in sha1_object_info_extended().

Then the is_promised() calls in fsck and gc would just test that bit.

Given that that bit will be set on promisOR objects (and we won't
have object_info for missing objects), you may need to adjust the
iterator in the fsck/gc code slightly.

This is a bit of a handwave, but could something like that eliminate
the need to build this oidset?





The oidset will deduplicate OIDs.


Right, but you still have an entry for each object.  For a repo the
size of Windows, you may have 25M+ objects your copy of the ODB.


We have entries only for the "frontier" objects (the objects directly
referenced by any promisor object). For the Windows repo, for example, I
foresee that many of the blobs, trees, and commits will be "hiding"
behind objects that the repository user did not download into their
repo.



Re: RFC: Design and code of partial clones (now, missing commits and trees OK) (part 2/3)

2017-09-26 Thread Jeff Hostetler



On 9/22/2017 6:52 PM, Jonathan Tan wrote:

On Fri, 22 Sep 2017 17:19:50 -0400
Jeff Hostetler  wrote:


In your specific example, how would rev-list know, on the client, to
include (or exclude) a large blob in its output if it does not have it,
and thus does not know its size?



The client doesn't have the size. It just knows it is missing and it
needs it. It doesn't matter why it is missing.  (But I guess the client
could assume it is because it is large.)


Ah, OK.


So rev-list on the client could filter the objects it has by size.


My issue is that if the purpose of this feature in rev-list is to do
prefetching, the only criterion we need to check for is absence from the
local repo right? (Or is filtering by size on the client useful for
other reasons?)


The prefetch before a checkout may want all missing blobs, or it
want to work with the sparse-checkout specification and only get
the required missing blobs in the subset of the tree.  By putting
the same filter logic in rev-list, we can do that.

It also sets the stage for later filtering trees.  (My current patch
only filters blobs.  It would be nice to have a second version of the
sparse filter that also omits trees, but that may require a recursive
option in the fetch-objects protocol.)





FYI I just posted my RFC this afternoon.
https://public-inbox.org/git/20170922204211.ga24...@google.com/T/


Thanks, I'll take a look.



Re: -X theirs does not resolve symlink conflict Was: BUG: merge -s theirs is not in effect

2017-09-26 Thread Yaroslav Halchenko

On Tue, 26 Sep 2017, Junio C Hamano wrote:
> >> I do not recall people talking about symbolic links but the case of
> >> binary files has been on the wishlist for a long time, and I do not
> >> know of anybody who is working on (or is planning to work on) it.

> > Ah, I misremembered.

> > We've addressed the "binary files" case back in 2012 with a944af1d
> > ("merge: teach -Xours/-Xtheirs to binary ll-merge driver",
> > 2012-09-08).  I do not know offhand if it is just as easy to plumb
> > the MERGE_FAVOR_{OURS,THEIRS} bits thru the symbolic link codepath,
> > like that patch did to the binary file codepath.

> Perhaps the attached (totally untested) patch might be a good
> starting point.  I do not know if you are interested in hacking on
> Git, and I do not feel offended if you are not, but perhaps somebody

I would have felt honored to "hack on Git" but neither my C-foo is up to
par, neither there would be more time I could adequately allocate for
such endeavor.   So meanwhile I am trying to contribute in hopefully
constructive "whining" while exploiting git.

> else might get interested in seeing if this #leftoverbits is a good
> direction to go in, and finishing it with docs and tests if it is
> ;-)


>  merge-recursive.c | 17 +
>  1 file changed, 13 insertions(+), 4 deletions(-)

> >...<

This patch worked beautifully in my usecase!:

$> rm -rf /tmp/repo1; mkdir /tmp/repo1; cd /tmp/repo1; git init .; ln -s sym1 
link; echo 1 > file; git add file link; git commit -m 'common'; git co -b b1 ; 
ln -sf b1link link; echo "b1 file" >| file; git commit -m 'b2 changes' -a; git 
co master; ln -sf masterlink link; echo "master file" >| file; git commit -m 
'also modified in master' -a; git merge -X theirs --no-edit b1; ls -l link; cat 
file   
warning: templates not found /home/yoh/share/git-core/templates 
   
Initialized empty Git repository in /tmp/repo1/.git/
[master (root-commit) d2e9010] common
 2 files changed, 2 insertions(+)
 create mode 100644 file
 create mode 12 link
Switched to a new branch 'b1'
[b1 a2b1321] b2 changes
 2 files changed, 2 insertions(+), 2 deletions(-)
Switched to branch 'master'
[master fbb4ba7] also modified in master
 2 files changed, 2 insertions(+), 2 deletions(-)
Auto-merging link
Auto-merging file
Merge made by the 'recursive' strategy.
 file | 2 +-
 link | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)
lrwxrwxrwx 1 yoh yoh 6 Sep 26 09:32 link -> b1link
b1 file

I also tried -s ours and no explicit -s, and they did as prescribed as
well

PS thanks for the CCs
-- 
Yaroslav O. Halchenko
Center for Open Neuroscience http://centerforopenneuroscience.org
Dartmouth College, 419 Moore Hall, Hinman Box 6207, Hanover, NH 03755
Phone: +1 (603) 646-9834   Fax: +1 (603) 646-1419
WWW:   http://www.linkedin.com/in/yarik


Re: -s theirs use-case(s) Was: BUG: merge -s theirs is not in effect

2017-09-26 Thread Yaroslav Halchenko

On Tue, 26 Sep 2017, Junio C Hamano wrote:

> Yaroslav Halchenko  writes:

> > 1. As a workaround for absence of -m theirs I using mtheirs git alias:
> > (I believe provided to me awhile back here on the list):

> > mtheirs = !sh -c 'git merge -s ours --no-commit $1 && git read-tree -m 
> > -u $1' -

> > and it worked fine for my usecases

> > 2. I think that if there is a reason for -s ours to exist, so there for -s 
> > theirs
> > since it is just the directionality of merges which changes between the two

> Just on this point.  They are not exactly symmetric.

> Imagine there are some undesirable changes you want to vanquish from
> the world, but they have already built on useful changes on top of
> the undesirable changes.  A hypothetical history might look like
> this:

>  B---C
> /
>X---X---A
>   /
>   ---o---o your mainline

> where 'X' denotes those unwanted changes.

> >...<

> The symmetiric case where _you_ have wrong changes do not need "-s
> theirs".  These mistakes X are yours, so are the changes depend on
> them:

>  B---C
> /
>X---X---A
>   /
>   ---o---o their mainline

> and you can just rebase A, B and C on top of their mainline while
> getting rid of Xs yourself before publishing.

and that is where the gotcha comes -- what if "my" changes were already
published?  then I would like to avoid the rebase, and would -s theirs
to choose "their" solution in favor of mine and be able to push so
others could still "fast-forward" to the new state.

So -- as to me it remains 'symmetric' ;)

> There may be valid workflows that benefit from "-s theirs", and I
> would not be surprised at all if we found more of them in the past 9
> years since we had the "why -s theirs does not exist" discussion in
> 2008.  But "because -s ours can be used in reverse to emulate" is
> not a valid excuse to add "-s theirs".  It can be used a rationale
> against adding it (e.g. "-s theirs generally is discouraged because
> it forsters a bad workflow, but in a very rare case where it might
> be useful, you can always check out their branch and merge yours
> using '-s ours' to emulate it, so we do not lose any functionality
> even if we did not add it"), though.

sure, git is flexible, so workarounds could always be found, but often
many options are just a matter of convenience.  And here -s theirs would
be one of them besides my other use case where it is a somewhat "by
design" workflow, and -s theirs is use to take their exact state I would
improve upon (before committing the "merge")

-- 
Yaroslav O. Halchenko
Center for Open Neuroscience http://centerforopenneuroscience.org
Dartmouth College, 419 Moore Hall, Hinman Box 6207, Hanover, NH 03755
Phone: +1 (603) 646-9834   Fax: +1 (603) 646-1419
WWW:   http://www.linkedin.com/in/yarik


Re: [PATCH 4/4] Move documentation of string_list into string-list.h

2017-09-26 Thread Han-Wen Nienhuys
On Tue, Sep 26, 2017 at 7:22 AM, Junio C Hamano  wrote:
> Junio C Hamano  writes:
>
>>
>> Thanks.  I am not sure if you can safely reorder the contents of the
>> header files in general, but I trust that you made sure that this
>> does not introduce problems (e.g. referrals before definition).
>
> Alas, this time it seems my trust was grossly misplaced.  Discarding
> this patch and redoing the integration cycle for the day.

Oops, my bad.

I resent the remaining patches. They do compile now :)

--

Google Germany GmbH, Erika-Mann-Strasse 33, 80636 Munich

Registergericht und -nummer: Hamburg, HRB 86891

Sitz der Gesellschaft: Hamburg

Geschäftsführer: Paul Manicle, Halimah DeLaine Prado


[PATCH 3/3] string-list.h: move documentation from Documentation/api/ into header

2017-09-26 Thread Han-Wen Nienhuys
This mirrors commit 'bdfdaa497 ("strbuf.h: integrate api-strbuf.txt
documentation, 2015-01-16") which did the same for strbuf.h:

* API documentation uses /** */ to set it apart from other comments.

* Function names were stripped from the comments.

* Ordering of the header was adjusted to follow the one from the text
  file.

* Edited some existing comments from string-list.h for consistency.

Signed-off-by: Han-Wen Nienhuys 
---
 Documentation/technical/api-string-list.txt | 209 
 string-list.h   | 192 +
 2 files changed, 162 insertions(+), 239 deletions(-)
 delete mode 100644 Documentation/technical/api-string-list.txt

diff --git a/Documentation/technical/api-string-list.txt 
b/Documentation/technical/api-string-list.txt
deleted file mode 100644
index c08402b12..0
--- a/Documentation/technical/api-string-list.txt
+++ /dev/null
@@ -1,209 +0,0 @@
-string-list API
-===
-
-The string_list API offers a data structure and functions to handle
-sorted and unsorted string lists.  A "sorted" list is one whose
-entries are sorted by string value in `strcmp()` order.
-
-The 'string_list' struct used to be called 'path_list', but was renamed
-because it is not specific to paths.
-
-The caller:
-
-. Allocates and clears a `struct string_list` variable.
-
-. Initializes the members. You might want to set the flag `strdup_strings`
-  if the strings should be strdup()ed. For example, this is necessary
-  when you add something like git_path("..."), since that function returns
-  a static buffer that will change with the next call to git_path().
-+
-If you need something advanced, you can manually malloc() the `items`
-member (you need this if you add things later) and you should set the
-`nr` and `alloc` members in that case, too.
-
-. Adds new items to the list, using `string_list_append`,
-  `string_list_append_nodup`, `string_list_insert`,
-  `string_list_split`, and/or `string_list_split_in_place`.
-
-. Can check if a string is in the list using `string_list_has_string` or
-  `unsorted_string_list_has_string` and get it from the list using
-  `string_list_lookup` for sorted lists.
-
-. Can sort an unsorted list using `string_list_sort`.
-
-. Can remove duplicate items from a sorted list using
-  `string_list_remove_duplicates`.
-
-. Can remove individual items of an unsorted list using
-  `unsorted_string_list_delete_item`.
-
-. Can remove items not matching a criterion from a sorted or unsorted
-  list using `filter_string_list`, or remove empty strings using
-  `string_list_remove_empty_items`.
-
-. Finally it should free the list using `string_list_clear`.
-
-Example:
-
-
-struct string_list list = STRING_LIST_INIT_NODUP;
-int i;
-
-string_list_append(, "foo");
-string_list_append(, "bar");
-for (i = 0; i < list.nr; i++)
-   printf("%s\n", list.items[i].string)
-
-
-NOTE: It is more efficient to build an unsorted list and sort it
-afterwards, instead of building a sorted list (`O(n log n)` instead of
-`O(n^2)`).
-+
-However, if you use the list to check if a certain string was added
-already, you should not do that (using unsorted_string_list_has_string()),
-because the complexity would be quadratic again (but with a worse factor).
-
-Functions
--
-
-* General ones (works with sorted and unsorted lists as well)
-
-`string_list_init`::
-
-   Initialize the members of the string_list, set `strdup_strings`
-   member according to the value of the second parameter.
-
-`filter_string_list`::
-
-   Apply a function to each item in a list, retaining only the
-   items for which the function returns true.  If free_util is
-   true, call free() on the util members of any items that have
-   to be deleted.  Preserve the order of the items that are
-   retained.
-
-`string_list_remove_empty_items`::
-
-   Remove any empty strings from the list.  If free_util is true,
-   call free() on the util members of any items that have to be
-   deleted.  Preserve the order of the items that are retained.
-
-`print_string_list`::
-
-   Dump a string_list to stdout, useful mainly for debugging purposes. It
-   can take an optional header argument and it writes out the
-   string-pointer pairs of the string_list, each one in its own line.
-
-`string_list_clear`::
-
-   Free a string_list. The `string` pointer of the items will be freed in
-   case the `strdup_strings` member of the string_list is set. The second
-   parameter controls if the `util` pointer of the items should be freed
-   or not.
-
-* Functions for sorted lists only
-
-`string_list_has_string`::
-
-   Determine if the string_list has a given string or not.
-
-`string_list_insert`::
-
-   Insert a new element to the string_list. The returned pointer can be
-   handy if you want to write something to the `util` pointer of the
-   string_list_item 

[PATCH 2/3] read_gitfile_gently: clarify return value ownership.

2017-09-26 Thread Han-Wen Nienhuys
Signed-off-by: Han-Wen Nienhuys 
---
 setup.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/setup.c b/setup.c
index 42400fcc5..8c95841d5 100644
--- a/setup.c
+++ b/setup.c
@@ -541,7 +541,8 @@ void read_gitfile_error_die(int error_code, const char 
*path, const char *dir)
 
 /*
  * Try to read the location of the git directory from the .git file,
- * return path to git directory if found.
+ * return path to git directory if found. The return value comes from
+ * a shared buffer.
  *
  * On failure, if return_error_code is not NULL, return_error_code
  * will be set to an error code and NULL will be returned. If
-- 
2.14.1.821.g8fa685d3b7-goog



[PATCH 1/3] real_path: clarify return value ownership

2017-09-26 Thread Han-Wen Nienhuys
Signed-off-by: Han-Wen Nienhuys 
---
 abspath.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/abspath.c b/abspath.c
index 708aff8d4..985798532 100644
--- a/abspath.c
+++ b/abspath.c
@@ -202,6 +202,10 @@ char *strbuf_realpath(struct strbuf *resolved, const char 
*path,
return retval;
 }
 
+/*
+ * Resolve `path` into an absolute, cleaned-up path. The return value
+ * comes from a shared buffer.
+ */
 const char *real_path(const char *path)
 {
static struct strbuf realpath = STRBUF_INIT;
-- 
2.14.1.821.g8fa685d3b7-goog



[PATCH 0/3] Comment fixes

2017-09-26 Thread Han-Wen Nienhuys
follow more commit log conventions; verified it compiled (yay).

(should I send patches that are in 'pu' again as well?)

Han-Wen Nienhuys (3):
  real_path: clarify return value ownership
  read_gitfile_gently: clarify return value ownership.
  string-list.h: move documentation from Documentation/api/ into header

 Documentation/technical/api-string-list.txt | 209 
 abspath.c   |   4 +
 setup.c |   3 +-
 string-list.h   | 192 +
 4 files changed, 168 insertions(+), 240 deletions(-)
 delete mode 100644 Documentation/technical/api-string-list.txt

--
2.14.1.821.g8fa685d3b7-goog


[PATCH 1/1] fast-import: checkpoint: dump branches/tags/marks even if object_count==0

2017-09-26 Thread Eric Rannaud
The checkpoint command cycles packfiles if object_count != 0, a sensible
test or there would be no pack files to write. Since 820b931012, the
command also dumps branches, tags and marks, but still conditionally.
However, it is possible for a command stream to modify refs or create
marks without creating any new objects.

For example, reset a branch (and keep fast-import running):

$ git fast-import
reset refs/heads/master
from refs/heads/master^

checkpoint

but refs/heads/master remains unchanged.

Other example: a commit command that re-creates an object that already
exists in the object database.

The man page also states that checkpoint "updates the refs" and that
"placing a progress command immediately after a checkpoint will inform
the reader when the checkpoint has been completed and it can safely
access the refs that fast-import updated". This wasn't always true
without this patch.

This fix unconditionally calls dump_{branches,tags,marks}() for all
checkpoint commands. dump_branches() and dump_tags() are cheap to call
in the case of a no-op.

Add tests to t9300 that observe the (non-packfiles) effects of
checkpoint.

Signed-off-by: Eric Rannaud 
---
 fast-import.c  |   6 +--
 t/t9300-fast-import.sh | 117 +
 2 files changed, 120 insertions(+), 3 deletions(-)


Any comments on the testing strategy with a background fast-import?

To summarize:
- fast-import is started in the background with a command stream that
  ends with "checkpoint\nprogress checkpoint\n". fast-import keeps
  running after reaching the last command (we don't want fast-import to
  terminate).
- In the meantime, the test is waiting to read "progress checkpoint" in
  the output of fast-import, then it checks the testing conditions.
- Finally, the test ensures that fast-import is still running in the
  background (and thus that it has just observed the effect of
  checkpoint, and not the side effects of fast-import terminating).
- After 10 sec, no matter what, the background fast-import is sent
  "done" and terminates.

I tried to make sure that the test runs quickly and does not (typically) sleep.
Upon failure, the test may take up to 10 sec to fully terminate.

However, the test could break under (very heavy) load if fast-import is unable
to make progress in a reasonable amount of time (either "progress checkpoint"
is not read within 5 sec, or fast-import receives "done" before the testing
conditions are checked). Let me know if this is not acceptable.

Should these test cases be at least separated to a new t9304 to
circumscribe any such nuisance? Alternatively, a FIFO-based approach
could be considered.

(Note: added cases 2 and 4 pass without this patch, but 1 and 3 do not.)


diff --git a/fast-import.c b/fast-import.c
index 35bf671f12c4..d5e4cf0bad41 100644
--- a/fast-import.c
+++ b/fast-import.c
@@ -3189,10 +3189,10 @@ static void checkpoint(void)
checkpoint_requested = 0;
if (object_count) {
cycle_packfile();
-   dump_branches();
-   dump_tags();
-   dump_marks();
}
+   dump_branches();
+   dump_tags();
+   dump_marks();
 }
 
 static void parse_checkpoint(void)
diff --git a/t/t9300-fast-import.sh b/t/t9300-fast-import.sh
index 67b8c50a5ab4..b410bf3a3a7a 100755
--- a/t/t9300-fast-import.sh
+++ b/t/t9300-fast-import.sh
@@ -3120,4 +3120,121 @@ test_expect_success 'U: validate root delete result' '
compare_diff_raw expect actual
 '
 
+###
+### series V (checkpoint)
+###
+
+# To make sure you're observing the side effects of checkpoint *before*
+# fast-import terminates (and thus writes out its state), check that the
+# fast-import process is still running using background_import_still_running
+# *after* evaluating the test conditions.
+background_import_until_checkpoint () {
+   options=$1
+   input_file=$2
+   ( cat $input_file; sleep 10; echo done ) | git fast-import $options 
>V.output &
+   echo $! >V.pid
+
+   # The loop will poll for approximately 5 seconds before giving up.
+   n=0
+   while ! test "$(cat V.output)" = "progress checkpoint"; do
+   if test $n -gt 55
+   then
+   echo >&2 "no progress checkpoint received"
+   exit 1
+   fi
+
+   # Try to avoid sleeping in the first iterations and poll
+   # aggressively.
+   if test $n -ge 50
+   then
+   sleep 1
+   fi
+
+   n=$(($n + 1))
+   done
+}
+
+background_import_still_running () {
+   if ! ps --pid "$(cat V.pid)"
+   then
+   echo >&2 "background fast-import terminated too early"
+   exit 1
+   fi
+   kill $(cat V.pid)
+}
+
+test_expect_success 'V: checkpoint updates refs after reset' '
+   cat >input <<-INPUT_END &&
+ 

Re: [PATCH v2 2/5] p0008-abbrev.sh: Test find_unique_abbrev() perf

2017-09-26 Thread Junio C Hamano
Derrick Stolee  writes:

> diff --git a/t/helper/test-abbrev.c b/t/helper/test-abbrev.c
> new file mode 100644
> index 0..6866896eb
> --- /dev/null
> +++ b/t/helper/test-abbrev.c
> @@ -0,0 +1,19 @@
> +#include "cache.h"
> +#include 

Same comment on  as [1/5] applies.

> +
> +int cmd_main(int ac, const char **av)
> +{
> + struct object_id oid;
> + char hex[GIT_MAX_HEXSZ + 2];

Why +2 (as opposed to +1)?

> + const char *end;
> +
> + setup_git_directory();
> +
> + while (fgets(hex, GIT_MAX_HEXSZ + 2, stdin)) {
> + hex[GIT_MAX_HEXSZ] = 0;
> + if (!parse_oid_hex(hex, , ))
> + find_unique_abbrev(oid.hash, MINIMUM_ABBREV);
> + }
> +
> + exit(0);
> +}
> diff --git a/t/perf/p0008-abbrev.sh b/t/perf/p0008-abbrev.sh
> new file mode 100755
> index 0..ba25e7824
> --- /dev/null
> +++ b/t/perf/p0008-abbrev.sh
> @@ -0,0 +1,22 @@
> +#!/bin/bash
> +
> +test_description='Test object disambiguation through abbreviations'
> +. ./perf-lib.sh
> +
> +test_perf_large_repo
> +
> +test-list-objects 10 | sort -R > objs.txt

I thought "sort randomly" was a GNUism.  Does it work across
platforms?  I think not.

> +
> +test_perf 'find_unique_abbrev() for existing objects' '
> + test-abbrev < objs.txt
> +'
> +
> +test-list-objects 10 --missing | sort -R > objs.txt
> +
> +test_perf 'find_unique_abbrev() for missing objects' '
> + test-abbrev < objs.txt
> +'
> +
> +rm objs.txt
> +
> +test_done


Re: [PATCH v2 1/5] test-list-objects: List a subset of object ids

2017-09-26 Thread Junio C Hamano
Derrick Stolee  writes:

> diff --git a/t/helper/test-list-objects.c b/t/helper/test-list-objects.c
> new file mode 100644
> index 0..83b1250fe
> --- /dev/null
> +++ b/t/helper/test-list-objects.c
> @@ -0,0 +1,85 @@
> +#include "cache.h"
> +#include "packfile.h"
> +#include 

If you include "cache.h", like the ordinary "git" program, you
should not have to include any system headers like stdio.h
yourself.  The whole point of "git-compat-util.h must be the first
to be included (indirectly including it via cache.h and friends is
also OK)" rule is to make sure that system headers are included at
the right place in the include sequence (e.g. defining feature
macros before including certain headers, etc.).

> +int cmd_main(int ac, const char **av)
> +{
> + unsigned int hash_delt = 0xFDB97531;
> + unsigned int hash_base = 0x01020304;
> +...
> + const int u_per_oid = sizeof(struct object_id) / sizeof(unsigned int);

Because the above will not work as you expect, unless your uint is
32-bit, you would probably want to make hash_delt and hash_base
explicitly uint32_t, I think.

Alternatively, because you assume that uint is at least 8-hex-digit
wide in the output loop, you can keep "unsigned int" above, but
change the divisor from sizeof(unsigned int) to a hardcoded 4.

Either would fix the issue.

> + c.total = 0;
> + if (ac > 1)
> + n = atoi(av[1]);

Otherwise n will stay 0 as initialized.  Not checking the result of
atoi() is probably permissible, as this is merely a test helper and
we can assume that the caller will not do anything silly, like
passing "-3" in av[1].  But it certainly would be a good discipline
to make sure av[1] is a reasonable number.

I am afraid that I may be misreading the code, but the following
code seems to require that the caller must know roughly how many
objects there are (so that a large-enough value can be passed to
av[1] to force c.select_mod to 1) when it wants to show everything,
as the "missing" loop will run 0 times showing nothing, and
"listing" case will divide by zero.

"Not passing anything will show everything" the proposed log message
promises is a better design than what the code actually seems to be
doing.

> +
> + if (ac > 2 && !strcmp(av[2], "--missing")) {
> + while (c.total++ < n) {

When n==0, this does nothing.  I am not sure what the desired
behaviour should be for "When no count is given, we show everything"
in this codepath, though.

Also, doesn't "test-list-objects 5 --missing" look very wrong?
Shouldn't it be more like "test-list-objects --missing 5"?

> + for (i = 0; i < u_per_oid; i++) {
> + printf("%08x", hash_base);
> + hash_base += hash_delt;
> + }
> + printf("\n");
> + }
> + } else {
> + setup_git_directory();
> +
> + for_each_loose_object(count_loose, , 0);
> + for_each_packed_object(count_packed, , 0);
> +
> + c.select_mod = 1 + c.total / n;

When n==0, this would divide by zero.
Perhaps

c.select_mod = n ? (1 + c.total / n) : 1;

or something to keep the promise of showing everything?

> + for_each_loose_object(output_loose, , 0);
> + for_each_packed_object(output_packed, , 0);
> + }
> +
> + exit(0);
> +}

Thanks.


Re: [PATCH] t7406: submodule..update command must not be run from .gitmodules

2017-09-26 Thread Junio C Hamano
Johannes Sixt  writes:

>> +test_when_finished "git -C super reset --hard HEAD^" &&
>> +
>> +write_script must_not_run.sh <<-EOF &&
>> +>$TEST_DIRECTORY/bad
>> +EOF
>
> I am pretty confident that this does not test what you intend to
> test. Notice that $TEST_DIRECTORY is expanded when the script is
> written. But that path contains a blank, and we have something like
> this in the test script:
>
>   #!/bin/sh
>   >/the/build/directory/t/trash directory.t7406/bad

Nicely explained.  Thanks.



[ANNOUNCE] Git v2.14.2

2017-09-26 Thread Junio C Hamano
The latest maintenance release Git v2.14.2 is now available at the
usual places.  In addition to the cvsserver related fixes that
appear in 2.10.5, 2.11.4, 2.12.5 and 2.13.6 announced separately,
this also includes various fixes that were merged already to the
'master' branch.

The tarballs are found at:

https://www.kernel.org/pub/software/scm/git/

The following public repositories all have a copy of the 'v2.14.2'
tag and the 'maint' branch that the tag points at:

  url = https://kernel.googlesource.com/pub/scm/git/git
  url = git://repo.or.cz/alt-git.git
  url = https://github.com/gitster/git



Git v2.14.2 Release Notes
=

Fixes since v2.14.1
---

 * Because recent Git for Windows do come with a real msgfmt, the
   build procedure for git-gui has been updated to use it instead of a
   hand-rolled substitute.

 * "%C(color name)" in the pretty print format always produced ANSI
   color escape codes, which was an early design mistake.  They now
   honor the configuration (e.g. "color.ui = never") and also tty-ness
   of the output medium.

 * The http.{sslkey,sslCert} configuration variables are to be
   interpreted as a pathname that honors "~[username]/" prefix, but
   weren't, which has been fixed.

 * Numerous bugs in walking of reflogs via "log -g" and friends have
   been fixed.

 * "git commit" when seeing an totally empty message said "you did not
   edit the message", which is clearly wrong.  The message has been
   corrected.

 * When a directory is not readable, "gitweb" fails to build the
   project list.  Work this around by skipping such a directory.

 * A recently added test for the "credential-cache" helper revealed
   that EOF detection done around the time the connection to the cache
   daemon is torn down were flaky.  This was fixed by reacting to
   ECONNRESET and behaving as if we got an EOF.

 * Some versions of GnuPG fail to kill gpg-agent it auto-spawned
   and such a left-over agent can interfere with a test.  Work it
   around by attempting to kill one before starting a new test.

 * "git log --tag=no-such-tag" showed log starting from HEAD, which
   has been fixed---it now shows nothing.

 * The "tag.pager" configuration variable was useless for those who
   actually create tag objects, as it interfered with the use of an
   editor.  A new mechanism has been introduced for commands to enable
   pager depending on what operation is being carried out to fix this,
   and then "git tag -l" is made to run pager by default.

 * "git push --recurse-submodules $there HEAD:$target" was not
   propagated down to the submodules, but now it is.

 * Commands like "git rebase" accepted the --rerere-autoupdate option
   from the command line, but did not always use it.  This has been
   fixed.

 * "git clone --recurse-submodules --quiet" did not pass the quiet
   option down to submodules.

 * "git am -s" has been taught that some input may end with a trailer
   block that is not Signed-off-by: and it should refrain from adding
   an extra blank line before adding a new sign-off in such a case.

 * "git svn" used with "--localtime" option did not compute the tz
   offset for the timestamp in question and instead always used the
   current time, which has been corrected.

 * Memory leaks in a few error codepaths have been plugged.

 * bash 4.4 or newer gave a warning on NUL byte in command
   substitution done in "git stash"; this has been squelched.

 * "git grep -L" and "git grep --quiet -L" reported different exit
   codes; this has been corrected.

 * When handshake with a subprocess filter notices that the process
   asked for an unknown capability, Git did not report what program
   the offending subprocess was running.  This has been corrected.

 * "git apply" that is used as a better "patch -p1" failed to apply a
   taken from a file with CRLF line endings to a file with CRLF line
   endings.  The root cause was because it misused convert_to_git()
   that tried to do "safe-crlf" processing by looking at the index
   entry at the same path, which is a nonsense---in that mode, "apply"
   is not working on the data in (or derived from) the index at all.
   This has been fixed.

 * Killing "git merge --edit" before the editor returns control left
   the repository in a state with MERGE_MSG but without MERGE_HEAD,
   which incorrectly tells the subsequent "git commit" that there was
   a squash merge in progress.  This has been fixed.

 * "git archive" did not work well with pathspecs and the
   export-ignore attribute.

 * "git cvsserver" no longer is invoked by "git shell" by default,
   as it is old and largely unmaintained.

 * Various Perl scripts did not use safe_pipe_capture() instead of
   backticks, leaving them susceptible to end-user input.  They have
   been corrected.

Also contains various documentation updates and code clean-ups.

Credits go to joernchen 

[ANNOUNCE] Git v2.13.6 and others

2017-09-26 Thread Junio C Hamano
Maintenance releases Git v2.10.5, v2.11.4, v2.12.5 and v2.13.6 are
now available at the usual places.  These are solely about hardening
"git shell" that is used on servers against an unsafe user input,
which "git cvsserver" copes with poorly.  A copy of the release notes
for v2.10.5 is attached at the end of the message, but the notes for
other releases listed above essentially say the same thing.

The tarballs are found at:

https://www.kernel.org/pub/software/scm/git/

The following public repositories all have a copy of the 'v2.10.5',
'v2.11.4', 'v2.12.5' and 'v2.13.6' tags and some of them have the
'maint-2.10', 'maint-2.11', 'maint-2.12' and 'maint-2.13' branches
that the tags point at:

  url = https://kernel.googlesource.com/pub/scm/git/git
  url = git://repo.or.cz/alt-git.git
  url = https://github.com/gitster/git

Note that the committed version of the release notes for these
versions all incorrectly mention "git daemon", where they should
have said "git shell".  It has been corrected in the attached copy,
but because the release engineering was done several days in advance
and the tags have already been shared with binary packagers and
others at the git-secur...@googlegroups.com mailing list, these
release tarballs are issued _with_ the known typo to avoid confusion
of having two release tags with different contents.  Sorry about that.



Git v2.10.5 Release Notes
=

Fixes since v2.10.4
---

 * "git cvsserver" no longer is invoked by "git shell" by default,
   as it is old and largely unmaintained.

 * Various Perl scripts did not use safe_pipe_capture() instead of
   backticks, leaving them susceptible to end-user input.  They have
   been corrected.

Credits go to joernchen  for finding the
unsafe constructs in "git cvsserver", and to Jeff King at GitHub for
finding and fixing instances of the same issue in other scripts.



Changes since v2.10.4 are as follows:

Jeff King (3):
  shell: drop git-cvsserver support by default
  archimport: use safe_pipe_capture for user input
  cvsimport: shell-quote variable used in backticks

Junio C Hamano (3):
  cvsserver: move safe_pipe_capture() to the main package
  cvsserver: use safe_pipe_capture for `constant commands` as well
  Git 2.10.5

joernchen (1):
  cvsserver: use safe_pipe_capture instead of backticks