date:20170907

Re: Strange behavior of git rev-list

2017-09-07 Thread Jeff King

On Thu, Sep 07, 2017 at 11:20:15AM +0200, Paweł Marczewski wrote:

> I have an interesting case. In my repository, there are two commits,
> 'one' and 'two'. 'one' is reachable from 'two' (as evidenced by 'git
> rev-list two | grep $(giv rev-parse one)'). However, the output of
> 'git rev-list two..one' is not empty, as is 'git rev-list ^two one'.
> 
> Here is the repository: https://github.com/pwmarcz/git-wtf/
> 
> It seems that the commit dates influence this behavior, because when I
> edit all the dates to be the same, the output of 'git rev-list
> two..one' is empty. Pruning seemingly irrelevant parents also makes it
> empty.
> 
> I verified the behavior on git versions 2.14.1, 2.11.0, and on the
> 'next' branch (2.14.1.586.g1a2e63c10).

Yes, this is known. The commit dates are used as a proxy for graph
height (or generation numbers, if you prefer) so that we avoid having to
walk all the way down to a merge base before producing any output. But
it can give the wrong answer in the face of clock skew.

We walk back five extra commits more than we need to in order to avoid
small runs of skewed commits, but obviously you can have arbitrary-sized
runs.

-Peff

Re: [RFC PATCH 0/2] Add named reference to latest push cert

2017-09-07 Thread Shikher Verma

On Wed, Sep 06, 2017 at 02:31:49PM -0700, Stefan Beller wrote: 
> On Wed, Sep 6, 2017 at 2:39 AM, Shikher Verma  wrote: 
> > Currently, git only stores push certificates if there is a receive hook 
> > present. This may violate the principle of least surprise (e.g., I 
> > pushed with --signed, and I don't see anything in upstream). 
> > Additionally, push certificates could be more versatile if they are not 
> > tightly bound to git hooks. Finally, it would be useful to verify the 
> > signed pushes at later points of time with ease. 
> > 
> > A named ref is added for ease of access/tooling around push 
> > certificates. If the last push was signed, ref/PUSH_CERT stores the 
> > ref of the latest push cert otherwise it is empty. 
> > 
> > Sending patches as RFC since the documentation would have to be 
> > updated and git gc might have to be patched to not garbage collect 
> > the latest push certificate. 
> > 
> > This patch applies on master (3ec7d702a) 
> 
> What are performance implications for busy repositories at busy hosts? 
> (think kernel.org / github) They may want to disable this new feature 
> for performance reasons or because they don't want to clutter the 
> object store. So at least a config option to turn it off would be useful. 

Any typical git push would write several objects to disk, this patch 
would only add one more object per push so I think the performance 
penalty is not that high. But I agree that we can have a config to turn 
it off. 
> On the ref to store the push certs: 
> (a) Currently the ref points at the blob, I wonder if we'd rather want to 
> point at a commit? (Then we can build up a history of 
> push certs, instead of relying on the reflog to show all 
> push certs. Also the gc issue might be easier to solve using this) 

I am not sure how that would work. The ref points at the blob of push 
certificate. Since each push can update multiple refs, each push 
certificate can point to mutiple commits (tip of the updated refs). 
Also if the named ref points at the commit then how will we get the 
corresponding push certificate? 

I did think about keeping a history of push certificates but the problem 
is new pushes can delete refs and commits which were pointed to by 
previous push certificates. This makes it really difficult to decide 
which push certificates to keep and which to gc. Also this history would 
be different for different clones of the same repo. Since push 
certificate are only meta data of the git workflow I think its best to 
just keep the latest push certificate and gc the old ones. People can 
use the recieve hook if they want to do advance things like logging a 
history of push certificates. I think git should provide a builtin 
solution for the simple case. 

Another motivation to decouple push certificates from hooks was that 
later we could store a map of refs to the latest push cert which 
updated the ref. And serve the corresponding push cert whenever someone 
does `git pull --signed important-ref`. Effectively removing trust from 
the server by preventing tampering with refs. This could really help 
the Github generation developers like me.
> (b) When going with (a), we might want to change the name. Most 
> refs are 3 directories deep: 
> refs/heads/ 
> refs/pr/ # at github IIUC 
> refs/changes/ # Gerrit 
> refs/meta/config # Gerrit to e.g. configure ACLs of the repo 
> "refs" indicates it is a ref, whereas the second part can be seen 
> as a "namespace". Currently Git only uses the "heads" and "tags" 
> namespace, "meta" is used by more than just Gerrit, so maybe it is 
> not wise to use "refs/meta/push_cert", but go with refs/gitmeta/pushcert 
> instead?

Re: Strange behavior of git rev-list

2017-09-07 Thread Jeff King

On Thu, Sep 07, 2017 at 11:50:25AM +0200, Paweł Marczewski wrote:

> Thanks. Any plans to fix that? Or is there a way to turn off this heuristic?

I don't think there's a way to turn it off for `rev-list`. Merge-base
computations are more careful, so you could determine the correct
endpoints that way. But when you fed them to `rev-list`, it would hit
the same run of skewed commits.

We've discussed storing true generation numbers in the past, which would
make this optimization more robust, as well as allow us to speed up many
other traversals (e.g., the "tag --contains"). It's something I'd like
to revisit, but it's not at the top of the pile.

In the meantime, it would probably be possible to write a patch that
conditionally disables the optimization. But it would mean all of your
rev-lists run a _lot_ slower.

-Peff

Re: Git in Outreachy round 15?

2017-09-07 Thread Jeff King

On Tue, Sep 05, 2017 at 11:06:34PM +0200, Christian Couder wrote:

> On Sat, Sep 2, 2017 at 12:30 AM, Jeff King  wrote:
> >
> > The big questions on whether we can make this happen are:
> >
> >   1. Do we have mentor interest?
> >
> >  I'm willing to mentor, but I'd like to hear whether other people
> >  are interested, as well. Either way I can take responsibility for
> >  the administrative bits.
> 
> I am willing to co-mentor, but I prefer not to take care of administrative 
> bits.

Thanks. I've listed you as a mentor on the landing page at:

  https://git.github.io/Outreachy-15/

The instructions for mentors can be found at:

  https://www.outreachy.org/mentor/

I've put a few projects on the landing page, but feel free to modify it
as you see fit.

> Is there something like the GSoC Mentor Summit? I think this is an
> interesting part of the GSoC and it could help motivate mentors if
> there was something like that.

No, I don't think that there is (though I'm pretty sure that there are
some sessions about Outreachy at the Mentor Summit).

-Peff

Strange behavior of git rev-list

2017-09-07 Thread Paweł Marczewski

Hi,
I have an interesting case. In my repository, there are two commits,
'one' and 'two'. 'one' is reachable from 'two' (as evidenced by 'git
rev-list two | grep $(giv rev-parse one)'). However, the output of
'git rev-list two..one' is not empty, as is 'git rev-list ^two one'.

Here is the repository: https://github.com/pwmarcz/git-wtf/

It seems that the commit dates influence this behavior, because when I
edit all the dates to be the same, the output of 'git rev-list
two..one' is empty. Pruning seemingly irrelevant parents also makes it
empty.

I verified the behavior on git versions 2.14.1, 2.11.0, and on the
'next' branch (2.14.1.586.g1a2e63c10).

Paweł Marczewski

Re: [RFC PATCH 0/2] Add named reference to latest push cert

2017-09-07 Thread Shikher Verma

On Thu, Sep 07, 2017 at 09:55:25AM +0900, Junio C Hamano wrote:
> Stefan Beller  writes:
> 
> > On the ref to store the push certs:
> > (a) Currently the ref points at the blob, I wonder if we'd rather want to
> > point at a commit? (Then we can build up a history of
> > push certs, instead of relying on the reflog to show all
> > push certs. Also the gc issue might be easier to solve using this)
> > (b) When going with (a), we might want to change the name. Most
> > refs are 3 directories deep:
> >   refs/heads/
> >   refs/pr/ # at github IIUC
> >   refs/changes/ # Gerrit
> >   refs/meta/config # Gerrit to e.g. configure ACLs of the repo
> > "refs" indicates it is a ref, whereas the second part can be seen
> > as a "namespace". Currently Git only uses the "heads" and "tags"
> > namespace, "meta" is used by more than just Gerrit, so maybe it is
> > not wise to use "refs/meta/push_cert", but go with refs/gitmeta/pushcert
> > instead?
> 
> You also need to worry about concurrent pushes.  The resulting
> "history" may not have to be sequencial, but two pushes that affect
> the same ref must be serialized in the resulting push-cert store.

Oh I see. I guess concurrency would be an issue. How does recieve-pack
handle concurrent pushes? Is there a lock that I could use to decide 
if named push cert ref has to be updated or not?

> The original design deliberately punts all the complexity to hook
> exactly because we do not want to have a half-baked "built-in"
> implementation that would only get in the way of those who wants to
> do high-performance servers.  It is very likely that they want to
> have a long-running daemon that listens to a port or a named pipe,
> where the only thing the hook would do is to write the value of
> GIT_PUSH_CERT to that daemon process, which acts as a serialization
> point, can read from the object store that is used as a short-term
> temporary area, and write the push cert to a more permanent store.
> 
> Having said all that, I am sympathetic to a wish to have an
> easy-to-enable-without-thinking example that is not so involved
> (e.g. no portability concern, does not have to perform very well but
> must be correct).  If Shikher wants to add one, I think the right
> approach to do so would be to add and ship a sample hook.
> 

This patch would only add one more object to write per push so I 
think the performance penalty is not that high. We can have a config
to turn it off so that it does not get in the way of those who want 
to do high-performance servers.

People can use the recieve hook for advance use cases but I think git
should provide a builtin solution for the simple case. The reason I
favour a named ref over a sample hook is because decouping push
certificate from hook opens up more possibilities like we could store
a map of refs to the latest push cert which updated the ref. And 
serve the corresponding push cert whenever someone does 
`git pull --signed important-ref`. Effectively removing trust from 
the server by preventing tampering with refs. This could really help 
the Github generation developers like me. What do you think?

> Thanks.

Re: Strange behavior of git rev-list

2017-09-07 Thread Paweł Marczewski

Thanks. Any plans to fix that? Or is there a way to turn off this heuristic?

On 7 September 2017 at 11:47, Jeff King  wrote:
> On Thu, Sep 07, 2017 at 11:20:15AM +0200, Paweł Marczewski wrote:
>
>> I have an interesting case. In my repository, there are two commits,
>> 'one' and 'two'. 'one' is reachable from 'two' (as evidenced by 'git
>> rev-list two | grep $(giv rev-parse one)'). However, the output of
>> 'git rev-list two..one' is not empty, as is 'git rev-list ^two one'.
>>
>> Here is the repository: https://github.com/pwmarcz/git-wtf/
>>
>> It seems that the commit dates influence this behavior, because when I
>> edit all the dates to be the same, the output of 'git rev-list
>> two..one' is empty. Pruning seemingly irrelevant parents also makes it
>> empty.
>>
>> I verified the behavior on git versions 2.14.1, 2.11.0, and on the
>> 'next' branch (2.14.1.586.g1a2e63c10).
>
> Yes, this is known. The commit dates are used as a proxy for graph
> height (or generation numbers, if you prefer) so that we avoid having to
> walk all the way down to a merge base before producing any output. But
> it can give the wrong answer in the face of clock skew.
>
> We walk back five extra commits more than we need to in order to avoid
> small runs of skewed commits, but obviously you can have arbitrary-sized
> runs.
>
> -Peff

Re: [PATCH 10/10] add UNLEAK annotation for reducing leak false positives

2017-09-07 Thread Jeff King

On Tue, Sep 05, 2017 at 03:05:12PM -0700, Stefan Beller wrote:

> On Tue, Sep 5, 2017 at 6:05 AM, Jeff King  wrote:
> 
> >   int main(void)
> 
> nit of the day:
>   s/void/int argc, char *argv/ or in case we do not
>   want to emphasize the argument list s/void//
>   as that adds no uninteresting things.

That really is a nit. I chose not to provide argv because it's longer
than "void" and I wasn't going to use the arguments. And I chose not to
use an empty argument list because it violates our style (as well as
arguably the C standard, though it leaves room for implementations to
take other forms of main).

> > In other words, you can do:
> >
> >   int main(void)
> >   {
> > char *p = some_function();
> > printf("%s", p);
> > UNLEAK(p);
> > return 0;
> >   }
> >
> > to annotate "p" and suppress the leak report.
> 
> This sounds really cool so far.
> 
> After having a sneak peak at the implementation
> it is O(1) in runtime for each added element, and the
> space complexity is O(well).

I'm not sure if your "well" is "this does well" or "well, it could be
quite a lot". :)

It certainly has the potential to grow the heap without bound (since
after all, it's whole point is to make a giant list of variables that
are going out of scope). But in practice we'd sprinkle this over a
handful of variables just before program exit (and remember that it's
copying only what's on the stack already; so pointers get copied, not
whole heap-allocated blocks).

Plus it does nothing at all when not compiled with leak-checking. So I'm
not too worried about the extra memory usage or performance.

> >   1. It can be compiled conditionally. There's no need in
> >  normal runs to do this free(), and it just wastes time.
> >  By using a macro, we can get the benefit for leak-check
> >  builds with zero cost for normal builds (this patch
> >  uses a compile-time check, though we could clearly also
> >  make it a run-time check at very low cost).
> >
> >  Of course one could also hide free() behind a macro, so
> >  this is really just arguing for having UNLEAK(), not
> >  for its particular implementation.
> 
> This is only a real argument in combination with (2), or in other
> words you seem to hint at situations like these:

Well, the numbered list was meant to be a set of arguments, each of
which contributes to the overall conclusion. :) I agree that (1) is the
weakest. Since both you and Martin seemed to get hung up on it, I'll
re-organize it a bit for the re-roll.

>   5. It's not just about worrying if we can call UNLEAK
>   once (in 4), but we also do not have to worry about
>   calling it twice, or recursively. (This argument can be bad
>   for cargo cult programmers, but we don't have these ;-)

True. I didn't come across that case in any of the ones I converted. As
a more general rule, UNLEAK() doesn't access any pointed-to memory at
all. So it's fine with already-freed or even uninitialized memory (which
of course is technically wrong according to the standard, but in
practice would be fine, as we'd copy garbage that does not match a heap
block).

> > +#ifdef SUPPRESS_ANNOTATED_LEAKS
> > +extern void unleak_memory(const void *ptr, size_t len);
> > +#define UNLEAK(var) unleak_memory(&(var), sizeof(var));
> 
> As always with macros we have to be careful about its arguments.
> 
>   UNLEAK(a++)
>   UNLEAK(baz())
> 
> won't work as intended.

Yes, I intended this to be used only for actual variables. I couldn't
think of a way to enforce that at compile time with some kind of
BUILD_ASSERT (even requiring an lvalue isn't quite strict enough).

-Peff

Re: [PATCH 10/10] add UNLEAK annotation for reducing leak false positives

2017-09-07 Thread Jeff King

On Wed, Sep 06, 2017 at 07:16:00PM +0200, Martin Ågren wrote:

> > diff --git a/builtin/commit.c b/builtin/commit.c
> > index b3b04f5dd3..de775d906c 100644
> > --- a/builtin/commit.c
> > +++ b/builtin/commit.c
> > @@ -1819,5 +1819,6 @@ int cmd_commit(int argc, const char **argv, const 
> > char *prefix)
> > print_summary(prefix, , !current_head);
> >
> > strbuf_release();
> > +   UNLEAK(sb);
> > return 0;
> >  }
> 
> These are both strbufs, so this ends up being a bit inconsistent. What
> would be the ideal end state for these two and all other such
> structures? My guess is "always UNLEAK", as opposed to carefully judging
> whether foo_release() would/could add any significant overhead.
> 
> In other words, it would be ok/wanted with changes such as "let's UNLEAK
> bar, because ..., and while at it, convert the existing foo_release to
> UNLEAK for consistency" (or per policy, for smaller binary, whatever).
> Or "if it ain't broken, don't fix it"? Did you think about this, or was
> it more a random choice?

To be honest, I didn't really think that deeply about it. I had a hammer
in my hand, and LSAN kept showing me nails to pound.

I agree that these two strbufs should probably be treated the same.

In general, I think I prefer using UNLEAK() because it's hard to get it
wrong (i.e., you don't have to care about double-frees or uninitialized
pointers). For strbufs, though, that's less of an issue because they are
always maintained in a consistent state.

As an aside, I'm pretty sure that "err" can never have been allocated
here, and this release is always a noop. It's filled in only when we get
an error from the ref update, which also causes us to die(). But in
general I'd prefer the code that causes readers to think the least
(i.e., just calling free or UNLEAK here rather than forcing the reader
to figure out whether it's possible to leak).

-Peff

Re: [PATCH] name-rev: change ULONG_MAX to TIME_MAX

2017-09-07 Thread Michael J Gruber

Jeff King venit, vidit, dixit 06.09.2017 15:35:
> On Wed, Sep 06, 2017 at 01:59:31PM +0200, Michael J Gruber wrote:
> 
>> BTW, there's more fallout from those name-rev changes: In connection
>> with that other thread about surprising describe results for emacs.git I
>> noticed that I can easily get "git name-rev --stdin" to segfault there.
>> As easy as
>>
>> echo bc5d96a0b2a1dccf7c459e40d21b54c977f4 | git name-rev --stdin
>>
>> for example.
>>
>> That's unfortunate for the use-case of name-rev to amend git log output.
>>
>> The reason seems to be that with "--stdin" or "--all", "name-rev" walks
>> and names all commits before beginning to use that those names for even
>> a single commit as above.
>>
>> That segfault bisects to the logic changing commit in
>> jc/name-rev-lw-tag, but I think the changed logic simply leads to more
>> xmallocs() the segfault sooner now. Or something that I dind't spot even
>> after a few hours.
> 
> The segfault seems to be due to running out of stack space. The problem
> is that name_rev() is recursive over the history graph.  That topic
> added a parameter to the function, which increased the memory used for
> each level of the recursion. But the fundamental problem has always been
> there. The right solution is to switch to iteration (with our own stack
> structure if necessary).
> 
> We had similar problems with the recursive --contains traversal in tag,
> and ended up with cbc60b6720 (git tag --contains: avoid stack overflow,
> 2014-04-24).

Cool, thanks for the pointer. ulimit -s is a great way to test this.

Michael

Re: [PATCH] match_name_as_path: Pass WM_CASEFOLD to wildmatch

2017-09-07 Thread Jeff King

On Thu, Sep 07, 2017 at 04:51:36PM +0300, Aleksandr Makarov wrote:

> ---

Your commit message leaves me with a lot of questions. Let me see if I
can answer them. :)

Can this code path be triggered? We need to set filter->ignore_case,
but also filter->match_as_patch. So "git branch --ignore-case" works,
but "git for-each-ref --ignore-case" is totally broken.

When did this break? It looks like ignore-case never worked with
for-each-ref. This should have been part of Duy's (+cc) 3bb16a8bf2 (tag,
branch, for-each-ref: add --ignore-case for sorting and filtering,
2016-12-04). It got the match_pattern() one right, but missed this tweak
in match_as_path.

How come we didn't notice? Because 3bb16a8bf2 has tests for branch and
tag, but not for-each-ref. We should probably add one as part of this
fix.

And finally, your patch needs a sign-off to be included in the project
(see Documentation/SubmittingPatches).

> diff --git a/ref-filter.c b/ref-filter.c
> index bc591f4..3746628 100644
> --- a/ref-filter.c
> +++ b/ref-filter.c
> @@ -1663,7 +1663,7 @@ static int match_name_as_path(const struct ref_filter 
> *filter, const char *refna
>refname[plen] == '/' ||
>p[plen-1] == '/'))
>   return 1;
> - if (!wildmatch(p, refname, WM_PATHNAME))
> + if (!wildmatch(p, refname, flags))

The patch itself looks correct to me. Thanks.

-Peff

[PATCH 4/4] t6120: test describe and name-rev with deep repos

2017-09-07 Thread Michael J Gruber

Depending on the implementation of walks, limitted stack size may lead
to problems (for recursion).

Test name-rev and describe with deep repos and limitted stack size and
mark the former with known failure.

We add these tests (which add gazillions of commits) last so as to keep
the runtime of other subtests the same.

Signed-off-by: Michael J Gruber 
---
 t/t6120-describe.sh | 31 +++
 1 file changed, 31 insertions(+)

diff --git a/t/t6120-describe.sh b/t/t6120-describe.sh
index 1997ccde56..dd6dd9df9b 100755
--- a/t/t6120-describe.sh
+++ b/t/t6120-describe.sh
@@ -279,4 +279,35 @@ test_expect_success 'describe ignoring a borken submodule' 
'
grep broken out
 '
 
+# we require ulimit, this excludes Windows
+test_expect_failure ULIMIT_STACK_SIZE 'name-rev works in a deep repo' '
+   i=1 &&
+   while test $i -lt 8000
+   do
+   echo "commit refs/heads/master
+committer A U Thor  $((10 + $i * 100)) +0200
+data actual &&
+   test_cmp expect actual &&
+   run_with_limited_stack git name-rev HEAD~4000 >actual &&
+   test_cmp expect actual
+'
+
+test_expect_success ULIMIT_STACK_SIZE 'describe works in a deep repo' '
+   git tag -f far-far-away HEAD~7999 &&
+   echo "far-far-away" >expect &&
+   git describe --tags --abbrev=0 HEAD~4000 >actual &&
+   test_cmp expect actual &&
+   run_with_limited_stack git describe --tags --abbrev=0 HEAD~4000 >actual 
&&
+   test_cmp expect actual
+'
+
 test_done
-- 
2.14.1.603.gf58147c36e

[PATCH 1/4] t7004: move limited stack prereq to test-lib

2017-09-07 Thread Michael J Gruber

The lazy prerequisite  ULIMIT_STACK_SIZE is used only in t7004 so far.

Move it to test-lib.sh so that it can be used in other tests (which it will
be in a follow-up commit).

Signed-off-by: Michael J Gruber 
---
 t/t7004-tag.sh | 6 --
 t/test-lib.sh  | 6 ++
 2 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/t/t7004-tag.sh b/t/t7004-tag.sh
index dbcd6f623c..5bf5ace56b 100755
--- a/t/t7004-tag.sh
+++ b/t/t7004-tag.sh
@@ -1863,12 +1863,6 @@ test_expect_success 'version sort with very long 
prerelease suffix' '
git tag -l --sort=version:refname
 '
 
-run_with_limited_stack () {
-   (ulimit -s 128 && "$@")
-}
-
-test_lazy_prereq ULIMIT_STACK_SIZE 'run_with_limited_stack true'
-
 # we require ulimit, this excludes Windows
 test_expect_success ULIMIT_STACK_SIZE '--contains and --no-contains work in a 
deep repo' '
>expect &&
diff --git a/t/test-lib.sh b/t/test-lib.sh
index 5fbd8d4a90..f22c1b260a 100644
--- a/t/test-lib.sh
+++ b/t/test-lib.sh
@@ -1167,6 +1167,12 @@ run_with_limited_cmdline () {
 
 test_lazy_prereq CMDLINE_LIMIT 'run_with_limited_cmdline true'
 
+run_with_limited_stack () {
+   (ulimit -s 128 && "$@")
+}
+
+test_lazy_prereq ULIMIT_STACK_SIZE 'run_with_limited_stack true'
+
 build_option () {
git version --build-options |
sed -ne "s/^$1: //p"
-- 
2.14.1.603.gf58147c36e

[PATCH 2/4] t6120: test name-rev --all and --stdin

2017-09-07 Thread Michael J Gruber

name-rev is used in a few tests, but tested only in t6120 along with
describe so far.

Add tests for name-rev with --all and --stdin.

Signed-off-by: Michael J Gruber 
---
 t/t6120-describe.sh | 25 +
 1 file changed, 25 insertions(+)

diff --git a/t/t6120-describe.sh b/t/t6120-describe.sh
index aa74eb8f0d..7c5728ebd5 100755
--- a/t/t6120-describe.sh
+++ b/t/t6120-describe.sh
@@ -198,6 +198,31 @@ test_expect_success 'name-rev with exact tags' '
test_cmp expect actual
 '
 
+test_expect_success 'name-rev --all' '
+   >expect.unsorted &&
+   for rev in $(git rev-list --all)
+   do
+   git name-rev $rev >>expect.unsorted
+   done &&
+   sort expect &&
+   git name-rev --all >actual.unsorted &&
+   sort actual &&
+   test_cmp expect actual
+'
+
+test_expect_success 'name-rev --stdin' '
+   >expect.unsorted &&
+   for rev in $(git rev-list --all)
+   do
+   name=$(git name-rev --name-only $rev) &&
+   echo "$rev ($name)" >>expect.unsorted
+   done &&
+   sort expect &&
+   git rev-list --all | git name-rev --stdin >actual.unsorted &&
+   sort actual &&
+   test_cmp expect actual
+'
+
 test_expect_success 'describe --contains with the exact tags' '
echo "A^0" >expect &&
tag_object=$(git rev-parse refs/tags/A) &&
-- 
2.14.1.603.gf58147c36e

[PATCH 3/4] t6120: clean up state after breaking repo

2017-09-07 Thread Michael J Gruber

t6120 breaks the repo state intentionally in the last tests.

Clean up the breakage afterwards (and before adding more tests).

Signed-off-by: Michael J Gruber 
---
 t/t6120-describe.sh | 1 +
 1 file changed, 1 insertion(+)

diff --git a/t/t6120-describe.sh b/t/t6120-describe.sh
index 7c5728ebd5..1997ccde56 100755
--- a/t/t6120-describe.sh
+++ b/t/t6120-describe.sh
@@ -275,6 +275,7 @@ test_expect_success 'describe chokes on severely broken 
submodules' '
 '
 test_expect_success 'describe ignoring a borken submodule' '
git describe --broken >out &&
+   test_when_finished "mv .git/modules/sub_moved .git/modules/sub1" &&
grep broken out
 '
 
-- 
2.14.1.603.gf58147c36e

Git diff --no-index and file descriptor inputs

2017-09-07 Thread Jason Pyeron


I was hoping to leverage --word-diff-regex, but the --no-index option does
not seem to work with <(...) notation.

I am I doing something wrong or is this a bug?

-Jason (reply to list please)

root@blackfat /projects
$ git diff --no-index -- <(xmllint --format
HRUniversalServices/pairs/src/esb/wsdl/pairs-american-dependents.wsdl)
<(xmllint --format
/projects/HRUniversalServices.368879ef/modules/pairs/src/esb/wsdl/pairs-amer
ican-employees.wsdl)

root@blackfat /projects
$ echo git diff --no-index -- <(xmllint --format
HRUniversalServices/pairs/src/esb/wsdl/pairs-american-dependents.wsdl)
<(xmllint --format
/projects/HRUniversalServices.368879ef/modules/pairs/src/esb/wsdl/pairs-amer
ican-employees.wsdl)
git diff --no-index -- /dev/fd/63 /dev/fd/62

root@blackfat /projects
$ git diff --no-index --
HRUniversalServices/pairs/src/esb/wsdl/pairs-american-dependents.wsdl
/projects/HRUniversalServices.368879ef/modules/pairs/src/esb/wsdl/pairs-amer
ican-employees.wsdl | wc -l
92

root@blackfat /projects
$ diff -u <(xmllint --format
HRUniversalServices/pairs/src/esb/wsdl/pairs-american-dependents.wsdl)
<(xmllint --format
/projects/HRUniversalServices.368879ef/modules/pairs/src/esb/wsdl/pairs-amer
ican-employees.wsdl) | wc -l
82

root@blackfat /projects
$ git --version
git version 2.13.2

root@blackfat /projects
$ uname -a
CYGWIN_NT-10.0 blackfat 2.8.1(0.312/5/3) 2017-07-03 14:11 x86_64 Cygwin

--
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
-   -
- Jason Pyeron  PD Inc. http://www.pdinc.us -
- Principal Consultant  10 West 24th Street #100-
- +1 (443) 269-1555 x333Baltimore, Maryland 21218   -
-   -
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-

[PATCH] commit-tree: don't append a newline with -F

2017-09-07 Thread Ross Kabus

This change makes it such that commit-tree -F never appends a newline
character to the supplied commit message (either from file or stdin).

Previously, commit-tree -F would always append a newline character to
the text brought in from file or stdin. This has caused confusion in a
number of ways:
  - This is a plumbing command and it is generally expected not to do
text cleanup or other niceties.
  - stdin piping with "-F -" appends a newline but stdin piping without
-F does not append a newline (inconsistent).
  - git-commit does not specifically append a newline to the "-F"
input. The issue is somewhat muddled by the fact that git-commit
does pass the message through its --cleanup option, which may add
such a newline. But for commit-tree to match "commit --cleanup=verbatim",
we should not do so here.

Signed-off-by: Ross Kabus 
---
 builtin/commit-tree.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/builtin/commit-tree.c b/builtin/commit-tree.c
index 19e898fa4..2177251e2 100644
--- a/builtin/commit-tree.c
+++ b/builtin/commit-tree.c
@@ -102,7 +102,6 @@ int cmd_commit_tree(int argc, const char **argv, const char 
*prefix)
if (fd && close(fd))
die_errno("git commit-tree: failed to close 
'%s'",
  argv[i]);
-   strbuf_complete_line();
continue;
}
 
-- 
2.13.1.windows.2

[PATCH 0/4] Test name-rev with small stack

2017-09-07 Thread Michael J Gruber

name-rev segfaults for me in emacs.git with the typical 8102 stack size.
The reason is the recursive walk that name-rev uses.

This series adds a test to mark this as known failure, after some
clean-ups.

Michael J Gruber (4):
  t7004: move limited stack prereq to test-lib
  t6120: test name-rev --all and --stdin
  t6120: clean up state after breaking repo
  t6120: test describe and name-rev with deep repos

 t/t6120-describe.sh | 57 +
 t/t7004-tag.sh  |  6 --
 t/test-lib.sh   |  6 ++
 3 files changed, 63 insertions(+), 6 deletions(-)

-- 
2.14.1.603.gf58147c36e

Re: "git shortlog -sn --follow -- " counts all commits to entire repo

2017-09-07 Thread Stefan Beller

On Thu, Sep 7, 2017 at 11:13 AM, Валентин  wrote:
> Hi,
>
> I'll be short as shortlog :)
>
> "git shortlog -sn -- "
> counts all commits to the specified path, as expected.
>
> "git shortlog -sn --follow -- "
> counts all commits to the entire repo, which looks like a bug.
>
> "--follow" switch is not listed on
> https://git-scm.com/docs/git-shortlog so maybe it's not supported. In
> this case I would expect error message.
>
> Tried the following versions:
> "git version 2.14.1.windows.1" on Windows 7
> "git version 2.7.4" on Ubuntu 16.04

The shortlog takes most (all?) options that git-log
does, e.g. in git.git:

$ git shortlog -sne --author=Peter

74  Peter Krefting 
43  H. Peter Anvin 
23  Peter Eriksen 
 7  Peter Hagervall 
 6  Peter Collingbourne 
 4  Peter Baumann 
 3  Peter Oberndorfer 
 3  Peter Valdemar Mørch 
 2  Peter Colberg 
 2  Peter Eisentraut 
 2  Peter Harris 
 2  Peter van der Does 
 1  Peter Hutterer 
 1  Peter Law 
 1  Peter Stuge 
 1  Peter Wu 
 1  Peter van Zetten 

Maybe we'd to state in the man page explicitly that
shortlog is part of the log family hence taking all
log related options.

Thanks,
Stefan

RE: Git diff --no-index and file descriptor inputs

2017-09-07 Thread Jason Pyeron

> -Original Message-
> From: Martin Ågren
> Sent: Thursday, September 7, 2017 1:48 PM
> 
> On 7 September 2017 at 18:00, Jason Pyeron  wrote:
> >
> > I was hoping to leverage --word-diff-regex, but the 
> --no-index option 
> > does not seem to work with <(...) notation.
> >
> > I am I doing something wrong or is this a bug?
> 
> There were some patches floating around half a year ago, I 
> don't know what happened to them.
> 

| From: Dennis Kaarsemaker
| Sent: Friday, January 13, 2017 5:20 AM
| Subject: [PATCH v2 0/2] diff --no-index: support symlinks and pipes

> https://public-inbox.org/git/20170324213110.4331-1-den...@kaarsemaker.net/

I see, it goes back to 2016...

| From: Dennis Kaarsemaker
| Subject: [RFC/PATCH 0/2] git diff <(command1) <(command2)
| Date: Fri, 11 Nov 2016 21:19:56 +0100

https://public-inbox.org/git/2016201958.2175-1-den...@kaarsemaker.net/

I will read up on those threads weekend, then try to apply the patches and
see what happens.

Thanks for the google fu help.

-Jason 

--
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-
-   -
- Jason Pyeron  PD Inc. http://www.pdinc.us -
- Principal Consultant  10 West 24th Street #100-
- +1 (443) 269-1555 x333Baltimore, Maryland 21218   -
-   -
-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-

[PATCH v3 3/3] merge-recursive: change current file dir string_lists to hashmap

2017-09-07 Thread Kevin Willford

The code was using two string_lists, one for the directories and
one for the files.  The code never checks the lists independently
so we should be able to only use one list.  The string_list also
is a O(log n) for lookup and insertion.  Switching this to use a
hashmap will give O(1) which will save some time when there are
millions of paths that will be checked.

Signed-off-by: Kevin Willford 
---
 merge-recursive.c | 56 ---
 merge-recursive.h |  3 +--
 2 files changed, 46 insertions(+), 13 deletions(-)

diff --git a/merge-recursive.c b/merge-recursive.c
index d47f40ea81..35af3761ba 100644
--- a/merge-recursive.c
+++ b/merge-recursive.c
@@ -24,6 +24,31 @@
 #include "dir.h"
 #include "submodule.h"
 
+struct path_hashmap_entry {
+   struct hashmap_entry e;
+   char path[FLEX_ARRAY];
+};
+
+static int path_hashmap_cmp(const void *cmp_data,
+   const void *entry,
+   const void *entry_or_key,
+   const void *keydata)
+{
+   const struct path_hashmap_entry *a = entry;
+   const struct path_hashmap_entry *b = entry_or_key;
+   const char *key = keydata;
+
+   if (ignore_case)
+   return strcasecmp(a->path, key ? key : b->path);
+   else
+   return strcmp(a->path, key ? key : b->path);
+}
+
+static unsigned int path_hash(const char *path)
+{
+   return ignore_case ? strihash(path) : strhash(path);
+}
+
 static void flush_output(struct merge_options *o)
 {
if (o->buffer_output < 2 && o->obuf.len) {
@@ -314,15 +339,15 @@ static int save_files_dirs(const unsigned char *sha1,
struct strbuf *base, const char *path,
unsigned int mode, int stage, void *context)
 {
+   struct path_hashmap_entry *entry;
int baselen = base->len;
struct merge_options *o = context;
 
strbuf_addstr(base, path);
 
-   if (S_ISDIR(mode))
-   string_list_insert(>current_directory_set, base->buf);
-   else
-   string_list_insert(>current_file_set, base->buf);
+   FLEX_ALLOC_MEM(entry, path, base->buf, base->len);
+   hashmap_entry_init(entry, path_hash(entry->path));
+   hashmap_add(>current_file_dir_set, entry);
 
strbuf_setlen(base, baselen);
return (S_ISDIR(mode) ? READ_TREE_RECURSIVE : 0);
@@ -642,6 +667,7 @@ static void add_flattened_path(struct strbuf *out, const 
char *s)
 
 static char *unique_path(struct merge_options *o, const char *path, const char 
*branch)
 {
+   struct path_hashmap_entry *entry;
struct strbuf newpath = STRBUF_INIT;
int suffix = 0;
size_t base_len;
@@ -650,14 +676,16 @@ static char *unique_path(struct merge_options *o, const 
char *path, const char *
add_flattened_path(, branch);
 
base_len = newpath.len;
-   while (string_list_has_string(>current_file_set, newpath.buf) ||
-  string_list_has_string(>current_directory_set, newpath.buf) ||
+   while (hashmap_get_from_hash(>current_file_dir_set,
+path_hash(newpath.buf), newpath.buf) ||
   (!o->call_depth && file_exists(newpath.buf))) {
strbuf_setlen(, base_len);
strbuf_addf(, "_%d", suffix++);
}
 
-   string_list_insert(>current_file_set, newpath.buf);
+   FLEX_ALLOC_MEM(entry, path, newpath.buf, newpath.len);
+   hashmap_entry_init(entry, path_hash(entry->path));
+   hashmap_add(>current_file_dir_set, entry);
return strbuf_detach(, NULL);
 }
 
@@ -1941,8 +1969,14 @@ int merge_trees(struct merge_options *o,
if (unmerged_cache()) {
struct string_list *entries, *re_head, *re_merge;
int i;
-   string_list_clear(>current_file_set, 1);
-   string_list_clear(>current_directory_set, 1);
+   /*
+* Only need the hashmap while processing entries, so
+* initialize it here and free it when we are done running
+* through the entries. Keeping it in the merge_options as
+* opposed to decaring a local hashmap is for convenience
+* so that we don't have to pass it to around.
+*/
+   hashmap_init(>current_file_dir_set, path_hashmap_cmp, NULL, 
512);
get_files_dirs(o, head);
get_files_dirs(o, merge);
 
@@ -1978,6 +2012,8 @@ int merge_trees(struct merge_options *o,
string_list_clear(re_head, 0);
string_list_clear(entries, 1);
 
+   hashmap_free(>current_file_dir_set, 1);
+
free(re_merge);
free(re_head);
free(entries);
@@ -2179,8 +2215,6 @@ void init_merge_options(struct merge_options *o)
if (o->verbosity >= 5)
o->buffer_output = 0;
strbuf_init(>obuf, 0);
-

[PATCH v3 1/3] merge-recursive: fix memory leak

2017-09-07 Thread Kevin Willford

In merge_trees if process_renames or process_entry returns less
than zero, the method will just return and not free re_merge,
re_head, or entries.

This change cleans up the allocated variables before returning
to the caller.

Signed-off-by: Kevin Willford 
---
 merge-recursive.c | 12 +---
 1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/merge-recursive.c b/merge-recursive.c
index 1494ffdb82..033d7cd406 100644
--- a/merge-recursive.c
+++ b/merge-recursive.c
@@ -1956,7 +1956,7 @@ int merge_trees(struct merge_options *o,
re_merge = get_renames(o, merge, common, head, merge, entries);
clean = process_renames(o, re_head, re_merge);
if (clean < 0)
-   return clean;
+   goto cleanup;
for (i = entries->nr-1; 0 <= i; i--) {
const char *path = entries->items[i].string;
struct stage_data *e = entries->items[i].util;
@@ -1964,8 +1964,10 @@ int merge_trees(struct merge_options *o,
int ret = process_entry(o, path, e);
if (!ret)
clean = 0;
-   else if (ret < 0)
-   return ret;
+   else if (ret < 0) {
+   clean = ret;
+   goto cleanup;
+   }
}
}
for (i = 0; i < entries->nr; i++) {
@@ -1975,6 +1977,7 @@ int merge_trees(struct merge_options *o,
entries->items[i].string);
}
 
+cleanup:
string_list_clear(re_merge, 0);
string_list_clear(re_head, 0);
string_list_clear(entries, 1);
@@ -1982,6 +1985,9 @@ int merge_trees(struct merge_options *o,
free(re_merge);
free(re_head);
free(entries);
+
+   if (clean < 0)
+   return clean;
}
else
clean = 1;
-- 
2.14.1.329.gcdd497e120.dirty

[PATCH v3 2/3] merge-recursive: remove return value from get_files_dirs

2017-09-07 Thread Kevin Willford

The return value of the get_files_dirs call is never being used.
Looking at the history of the file and it was originally only
being used for debug output statements.  Also when
read_tree_recursive return value is non zero it is changed to
zero.  This leads me to believe that it doesn't matter if
read_tree_recursive gets an error.

Since the debug output has been removed and the caller isn't
checking the return value there is no reason to keep calculating
and returning a value.

Signed-off-by: Kevin Willford 
---
 merge-recursive.c | 8 ++--
 1 file changed, 2 insertions(+), 6 deletions(-)

diff --git a/merge-recursive.c b/merge-recursive.c
index 033d7cd406..d47f40ea81 100644
--- a/merge-recursive.c
+++ b/merge-recursive.c
@@ -328,15 +328,11 @@ static int save_files_dirs(const unsigned char *sha1,
return (S_ISDIR(mode) ? READ_TREE_RECURSIVE : 0);
 }
 
-static int get_files_dirs(struct merge_options *o, struct tree *tree)
+static void get_files_dirs(struct merge_options *o, struct tree *tree)
 {
-   int n;
struct pathspec match_all;
memset(_all, 0, sizeof(match_all));
-   if (read_tree_recursive(tree, "", 0, 0, _all, save_files_dirs, o))
-   return 0;
-   n = o->current_file_set.nr + o->current_directory_set.nr;
-   return n;
+   read_tree_recursive(tree, "", 0, 0, _all, save_files_dirs, o);
 }
 
 /*
-- 
2.14.1.329.gcdd497e120.dirty

[PATCH v3 0/3] merge-recursive: replace string_list with hashmap

2017-09-07 Thread Kevin Willford

Code was using two string_lists, one for the directories and
one for the files.  The code never checks the lists independently
so we should be able to only use one list.  The string_list also
is a O(log n) for lookup and insertion.  Switching this to use a
hashmap will give O(1) which will save some time when there are
millions of paths that will be checked.

Also cleaned up a memory leak and method where the return was not
being used.

Changes since last version:
1. Removed the function pointers and just check the ignore_case in the
compare and hash methods.
2. Added a comment about the hashmap and why it is getting initialized
and freed but not a local.
3. Use hashmap_get_from_hash and remove the dummy entry

Kevin Willford (3):
  merge-recursive: fix memory leak
  merge-recursive: remove return value from get_files_dirs
  merge-recursive: change current file dir string_lists to hashmap

 merge-recursive.c | 76 ---
 merge-recursive.h |  3 +--
 2 files changed, 57 insertions(+), 22 deletions(-)

-- 
2.14.1.329.gcdd497e120.dirty

Re: Git diff --no-index and file descriptor inputs

2017-09-07 Thread Martin Ågren

On 7 September 2017 at 18:00, Jason Pyeron  wrote:
>
> I was hoping to leverage --word-diff-regex, but the --no-index option does
> not seem to work with <(...) notation.
>
> I am I doing something wrong or is this a bug?

There were some patches floating around half a year ago, I don't know
what happened to them.

https://public-inbox.org/git/20170324213110.4331-1-den...@kaarsemaker.net/

> -Jason (reply to list please)

Martin (not CC-ing the OP, though I'm not sure why they'd want that --
it's usually the other way round)

Re: [PATCH 09/10] set_git_dir: handle feeding gitdir to itself

2017-09-07 Thread Brandon Williams

On 09/05, Jeff King wrote:
> Ideally we'd free the existing gitdir field before assigning
> the new one, to avoid a memory leak. But we can't do so
> safely because some callers do the equivalent of:
> 
>   set_git_dir(get_git_dir());
> 
> We can detect that case as a noop, but there are even more
> complicated cases like:
> 
>   set_git_dir(remove_leading_path(worktree, get_git_dir());
> 
> where we really do need to do some work, but the original
> string must remain valid.
> 
> Rather than put the burden on callers to make a copy of the
> string (only to free it later, since we'll make a copy of it
> ourselves), let's solve the problem inside set_git_dir(). We
> can make a copy of the pointer for the old gitdir, and then
> avoid freeing it until after we've made our new copy.
> 

This patch and the one before it look good to me.  Thanks for fixing
this issue!

> Signed-off-by: Jeff King 
> ---
>  repository.c | 10 +++---
>  setup.c  |  5 -
>  2 files changed, 3 insertions(+), 12 deletions(-)
> 
> diff --git a/repository.c b/repository.c
> index 52f1821c6b..97c732bd48 100644
> --- a/repository.c
> +++ b/repository.c
> @@ -56,16 +56,12 @@ static void repo_setup_env(struct repository *repo)
>  void repo_set_gitdir(struct repository *repo, const char *path)
>  {
>   const char *gitfile = read_gitfile(path);
> + char *old_gitdir = repo->gitdir;
>  
> - /*
> -  * NEEDSWORK: Eventually we want to be able to free gitdir and the rest
> -  * of the environment before reinitializing it again, but we have some
> -  * crazy code paths where we try to set gitdir with the current gitdir
> -  * and we don't want to free gitdir before copying the passed in value.
> -  */
>   repo->gitdir = xstrdup(gitfile ? gitfile : path);
> -
>   repo_setup_env(repo);
> +
> + free(old_gitdir);
>  }
>  
>  /*
> diff --git a/setup.c b/setup.c
> index 23950173fc..6d8380acd2 100644
> --- a/setup.c
> +++ b/setup.c
> @@ -399,11 +399,6 @@ void setup_work_tree(void)
>   if (getenv(GIT_WORK_TREE_ENVIRONMENT))
>   setenv(GIT_WORK_TREE_ENVIRONMENT, ".", 1);
>  
> - /*
> -  * NEEDSWORK: this call can essentially be set_git_dir(get_git_dir())
> -  * which can cause some problems when trying to free the old value of
> -  * gitdir.
> -  */
>   set_git_dir(remove_leading_path(git_dir, work_tree));
>   initialized = 1;
>  }
> -- 
> 2.14.1.721.gc5bc1565f1
> 

-- 
Brandon Williams

[PATCH v2] read-cache: fix index corruption with index v4

2017-09-07 Thread Thomas Gummerer

ce012deb98 ("read-cache: avoid allocating every ondisk entry when
writing", 2017-08-21) changed the way cache entries are written to the
index file.  While previously it wrote the name to an struct that was
allocated using xcalloc(), it now uses ce_write() directly.  Previously
ce_namelen - common bytes were written to the cache entry, which would
automatically make it nul terminated, as it was allocated using calloc.

Now we are writing ce_namelen - common + 1 bytes directly from the
ce->name to the index.  If CE_STRIP_NAME however gets set in the split
index case ce->ce_namelen is set to 0 without changing the actual
ce->name buffer.  When index-v4, this results in the first character of
ce->name being written out instead of just a terminating nul charcter.

As index-v4 requires the terminating nul character as terminator of
the name when reading it back, this results in a corrupted index.

Fix that by only writing ce_namelen - common bytes directly from
ce->name to the index, and adding the nul terminator in an extra call to
ce_write.

This bug was turned up by setting TEST_GIT_INDEX_VERSION = 4 in
config.mak and running the test suite (t1700 specifically broke).

Signed-off-by: Thomas Gummerer 
---

> Will send an updated patch in a bit.

In a bit was a lie, I didn't get to it anymore yesterday, but here it is :)

 read-cache.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/read-cache.c b/read-cache.c
index 40da87ea71..c6c69cf027 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -2103,7 +2103,9 @@ static int ce_write_entry(git_SHA_CTX *c, int fd, struct 
cache_entry *ce,
if (!result)
result = ce_write(c, fd, to_remove_vi, prefix_size);
if (!result)
-   result = ce_write(c, fd, ce->name + common, 
ce_namelen(ce) - common + 1);
+   result = ce_write(c, fd, ce->name + common, 
ce_namelen(ce) - common);
+   if (!result)
+   result = ce_write(c, fd, padding, 1);
 
strbuf_splice(previous_name, common, to_remove,
  ce->name + common, ce_namelen(ce) - common);
-- 
2.14.1.480.gb18f417b89

Re: Strange behavior of git rev-list

2017-09-07 Thread Stefan Beller

On Thu, Sep 7, 2017 at 3:11 AM, Jeff King  wrote:
> On Thu, Sep 07, 2017 at 11:50:25AM +0200, Paweł Marczewski wrote:
>
>> Thanks. Any plans to fix that? Or is there a way to turn off this heuristic?
>
> I don't think there's a way to turn it off for `rev-list`. Merge-base
> computations are more careful, so you could determine the correct
> endpoints that way. But when you fed them to `rev-list`, it would hit
> the same run of skewed commits.
>
> We've discussed storing true generation numbers in the past, which would
> make this optimization more robust, as well as allow us to speed up many
> other traversals (e.g., the "tag --contains"). It's something I'd like
> to revisit, but it's not at the top of the pile.

(We just had an office discussion if this is part of the hash transition plan,
i.e. have a field in the commit object to contain its generation number.
and as I think the generation numbers would lead to fast and correct
behavior unlike the current heuristic which is only fast, I would try
to make a strong point actual generation numbers inside commit objects)

>
> In the meantime, it would probably be possible to write a patch that
> conditionally disables the optimization. But it would mean all of your
> rev-lists run a _lot_ slower.
>
> -Peff

"git shortlog -sn --follow -- " counts all commits to entire repo

2017-09-07 Thread Валентин

Hi,

I'll be short as shortlog :)

"git shortlog -sn -- "
counts all commits to the specified path, as expected.

"git shortlog -sn --follow -- "
counts all commits to the entire repo, which looks like a bug.

"--follow" switch is not listed on
https://git-scm.com/docs/git-shortlog so maybe it's not supported. In
this case I would expect error message.

Tried the following versions:
"git version 2.14.1.windows.1" on Windows 7
"git version 2.7.4" on Ubuntu 16.04

Thanks,
Valentine

Re: clone repo & history to disconnected server

2017-09-07 Thread Stefan Beller

On Thu, Sep 7, 2017 at 9:50 AM, Kermit Short  wrote:
> Greetings!
> I have to set up a workflow where I'll be developing and testing on
> one network, and then deploying to production on a disconnected
> network. I'd like to be able to have the full commit history for all
> branches, tags, etc. on the disconnected side, in case I need to
> perform a roll-back, create a hot-fix, etc.
>
>
> Most of the solutions I see involve creating a mirror clone that is
> then merged with a different repository. My problem is that my servers
> are on two separate networks and I am unable to talk to both repos
> from the same computer. I'd need to be able to transport the files
> using sneaker net prior to importing on the second repo. I need to
> keep the production repo in full working sanity...i.e. import only new
> changes or completely overwrite the repository such that the full
> project history is intact and working.
>
>
> Does anyone have any suggestions? I've been looking at the archive
> command but I'm not sure it would bring out the full project history.
> Thanks in advance for your help!!

Checkout https://git-scm.com/docs/git-bundle

Re: [RFC PATCH 0/2] Add named reference to latest push cert

2017-09-07 Thread Stefan Beller

On Thu, Sep 7, 2017 at 2:11 AM, Shikher Verma  wrote:
> On Wed, Sep 06, 2017 at 02:31:49PM -0700, Stefan Beller wrote:
>> On Wed, Sep 6, 2017 at 2:39 AM, Shikher Verma  wrote:
>> > Currently, git only stores push certificates if there is a receive hook
>> > present. This may violate the principle of least surprise (e.g., I
>> > pushed with --signed, and I don't see anything in upstream).
>> > Additionally, push certificates could be more versatile if they are not
>> > tightly bound to git hooks. Finally, it would be useful to verify the
>> > signed pushes at later points of time with ease.
>> >
>> > A named ref is added for ease of access/tooling around push
>> > certificates. If the last push was signed, ref/PUSH_CERT stores the
>> > ref of the latest push cert otherwise it is empty.
>> >
>> > Sending patches as RFC since the documentation would have to be
>> > updated and git gc might have to be patched to not garbage collect
>> > the latest push certificate.
>> >
>> > This patch applies on master (3ec7d702a)
>>
>> What are performance implications for busy repositories at busy hosts?
>> (think kernel.org / github) They may want to disable this new feature
>> for performance reasons or because they don't want to clutter the
>> object store. So at least a config option to turn it off would be useful.
>
> Any typical git push would write several objects to disk,

(or just one pack file, [which may be renamed eventually, see 722ff7f8])

> this patch
> would only add one more object per push so I think the performance
> penalty is not that high. But I agree that we can have a config to turn
> it off.

I personally do not run a high performance server, so I do not terribly mind,
but thought it would be nice for them to have at least an option ready made
instead of a potential performance regression.

>> On the ref to store the push certs:
>> (a) Currently the ref points at the blob, I wonder if we'd rather want to
>> point at a commit? (Then we can build up a history of
>> push certs, instead of relying on the reflog to show all
>> push certs. Also the gc issue might be easier to solve using this)
>
> I am not sure how that would work. The ref points at the blob of push
> certificate. Since each push can update multiple refs, each push
> certificate can point to mutiple commits (tip of the updated refs).
> Also if the named ref points at the commit then how will we get the
> corresponding push certificate?
>
> I did think about keeping a history of push certificates but the problem
> is new pushes can delete refs and commits which were pointed to by
> previous push certificates. This makes it really difficult to decide
> which push certificates to keep and which to gc. Also this history would
> be different for different clones of the same repo. Since push
> certificate are only meta data of the git workflow I think its best to
> just keep the latest push certificate and gc the old ones. People can
> use the recieve hook if they want to do advance things like logging a
> history of push certificates. I think git should provide a builtin
> solution for the simple case.

What I had in mind was what would be achieved with a
hook like this (untested):

#!/bin/sh
if test -z GIT_PUSH_CERT ; then
exit 0
fi

# add a new worktree 'tmp', checking out the magic ref:
git worktree add tmp refs/PUSH_CERT

cp $GIT_PUSH_CERT tmp/cert
git -C tmp add cert
git -C tmp commit -m "new push cert"
# maybe include GIT_PUSH_CERT_[NONCE_]STATUS
# in commit message?

# clean up, command doesn't exist yet:
git worktree delete tmp

This would not deal with concurrency as it re-uses the
same worktree, but illustrates what I had in mind
for the git history of that special ref.

[PATCH] match_name_as_path: Pass WM_CASEFOLD to wildmatch

2017-09-07 Thread Aleksandr Makarov


---
 ref-filter.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/ref-filter.c b/ref-filter.c
index bc591f4..3746628 100644
--- a/ref-filter.c
+++ b/ref-filter.c
@@ -1663,7 +1663,7 @@ static int match_name_as_path(const struct ref_filter 
*filter, const char *refna
 refname[plen] == '/' ||
 p[plen-1] == '/'))
return 1;
-   if (!wildmatch(p, refname, WM_PATHNAME))
+   if (!wildmatch(p, refname, flags))
return 1;
}
return 0;
--
2.7.4

Re: [PATCH 0/4] Test name-rev with small stack

2017-09-07 Thread Jeff King

On Thu, Sep 07, 2017 at 04:02:19PM +0200, Michael J Gruber wrote:

> name-rev segfaults for me in emacs.git with the typical 8102 stack size.
> The reason is the recursive walk that name-rev uses.
> 
> This series adds a test to mark this as known failure, after some
> clean-ups.

These all look reasonable to me. The size of the test case in the final
one is presumably arbitrary and just copied from t7004. I don't know if
it's worth trying to shrink it. It could shorten a rather expensive
test. OTOH, if we shorten it too much then we might get a false pass
(e.g., if the algorithm remains recursive but has a smaller stack
footprint).

> Michael J Gruber (4):
>   t7004: move limited stack prereq to test-lib
>   t6120: test name-rev --all and --stdin
>   t6120: clean up state after breaking repo
>   t6120: test describe and name-rev with deep repos

Now comes the hard part: rewriting the C code. :)

-Peff

Re: Donation

2017-09-07 Thread Mavis Wanczyk Foundation

Greetings To You,

My Name is Mavis wanczyk , the winner of the Power ball jackpot of $ $758.7 
million  in the AUGUST 24, 2017, My jackpot was a gift from God to me hence my 
Entire family/foundation has AGREED to do this. My foundation is donating 
$500,000.00USD to you. please contac maviswanczyk...@gmail.com for full details 
and please accept this token as a gift from me and my family.

Read more: 
http://money.cnn.com/2017/08/23/news/powerball-700-million-jackpot/index.html

 Best Regards,
 Mavis Wanczyk

clone repo & history to disconnected server

2017-09-07 Thread Kermit Short

Greetings!
I have to set up a workflow where I'll be developing and testing on
one network, and then deploying to production on a disconnected
network. I'd like to be able to have the full commit history for all
branches, tags, etc. on the disconnected side, in case I need to
perform a roll-back, create a hot-fix, etc.


Most of the solutions I see involve creating a mirror clone that is
then merged with a different repository. My problem is that my servers
are on two separate networks and I am unable to talk to both repos
from the same computer. I'd need to be able to transport the files
using sneaker net prior to importing on the second repo. I need to
keep the production repo in full working sanity...i.e. import only new
changes or completely overwrite the repository such that the full
project history is intact and working.


Does anyone have any suggestions? I've been looking at the archive
command but I'm not sure it would bring out the full project history.
Thanks in advance for your help!!

git diff doesn't quite work as documented?

2017-09-07 Thread Olaf Klischat

oklischat@oklischat:/tmp$ mkdir gittest
oklischat@oklischat:/tmp$ cd gittest/
oklischat@oklischat:/tmp/gittest$ git init
Initialized empty Git repository in /private/tmp/gittest/.git/
oklischat@oklischat:/tmp/gittest$ echo foo > foo.txt
oklischat@oklischat:/tmp/gittest$ git add foo.txt
oklischat@oklischat:/tmp/gittest$ git commit -m foo
[master (root-commit) 54d55f2] foo
 1 file changed, 1 insertion(+)
 create mode 100644 foo.txt
oklischat@oklischat:/tmp/gittest$ echo bar > bar.txt
oklischat@oklischat:/tmp/gittest$ git add bar.txt
oklischat@oklischat:/tmp/gittest$ git commit -m bar
[master 83b2e74] bar
 1 file changed, 1 insertion(+)
 create mode 100644 bar.txt
oklischat@oklischat:/tmp/gittest$ git tag bar-added
oklischat@oklischat:/tmp/gittest$ git rm bar.txt
rm 'bar.txt'
oklischat@oklischat:/tmp/gittest$ git commit -m 'rm bar'
[master 3ca4ff9] rm bar
 1 file changed, 1 deletion(-)
 delete mode 100644 bar.txt
oklischat@oklischat:/tmp/gittest$
oklischat@oklischat:/tmp/gittest$ git checkout bar-added -- bar.txt
oklischat@oklischat:/tmp/gittest$ git reset HEAD
oklischat@oklischat:/tmp/gittest$ git status
On branch master
Untracked files:
  (use "git add ..." to include in what will be committed)

bar.txt

nothing added to commit but untracked files present (use "git add" to track)
oklischat@oklischat:/tmp/gittest$ git diff bar-added   # would expect this to 
show no differences
diff --git a/bar.txt b/bar.txt
deleted file mode 100644
index 5716ca5..000
--- a/bar.txt
+++ /dev/null
@@ -1 +0,0 @@
-bar
oklischat@oklischat:/tmp/gittest$ 
oklischat@oklischat:/tmp/gittest$ git diff bar-added  -- bar.txt   # dito
diff --git a/bar.txt b/bar.txt
deleted file mode 100644
index 5716ca5..000
--- a/bar.txt
+++ /dev/null
@@ -1 +0,0 @@
-bar
oklischat@oklischat:/tmp/gittest$ 


`git diff --help' says:

git diff [--options]  [--] [...]
   This form is to view the changes you have in your working tree 
relative to the named .

If that were entirely true, the last two commands shouldn't have shown
any differences, right?

On closer inspection, it seems that what `git diff ' really
does is take only those paths in the working directory that are also
in  and compare the resulting tree against .

We should add some option to that git diff form to make it really do
what the docs claim it does.

Or am I missing something?

Re: [RFC PATCH 0/2] Add named reference to latest push cert

2017-09-07 Thread Stefan Beller

On Thu, Sep 7, 2017 at 12:08 AM, Shikher Verma  wrote:
> Hey everyone,
>
> I felt like I should introduce myself since this is my first patch on
> the git mailing list (or any mailing list actually) :D

Welcome to the list. :)

> I am Shikher[1], currently in my 4th year undergrad at IIT Kanpur.
> This summer I was lucky enough to intern at NYU Secure Systems Lab[2]
> mentored by Santiago. We looked into how signed pushes work and how
> we can use them to increase the security of git. We encountered a
> strange error in tests which resulted in a patch[3] and I wrote a
> python script to verify push certificates[4]. I was pretty surprised
> to not find any push certificate on the remote repo after I did a
> signed push, hence this RFC.
>
> Anyway this is my first time trying to contribute to a large OSS so
> forgive me if I make any noob mistakes.

Patches are very welcome. Everyone was a noob once, but that
changes quickly.

Thanks,
Stefan

[PATCH] usage: conditionally compile unleak_memory() definition

2017-09-07 Thread Ramsay Jones


Signed-off-by: Ramsay Jones 
---

Hi Jeff,

If you need to re-roll your 'jk/leak-checkers' branch, could you
please squash this into the relevant patch (commit c57586d954,
"add UNLEAK annotation for reducing leak false positives", 05-09-2017).

This was noticed by sparse, since it is not declared when the
SUPPRESS_ANNOTATED_LEAKS pre-processor macro is not defined.
(You could move the declaration out of the #ifdef in the header
file, I suppose, but it seems pointless to compile the function
if it isn't being used).

Thanks!

ATB,
Ramsay Jones

 usage.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/usage.c b/usage.c
index 780ed73be..cdd534c9d 100644
--- a/usage.c
+++ b/usage.c
@@ -242,6 +242,7 @@ NORETURN void BUG(const char *fmt, ...)
 }
 #endif
 
+#ifdef SUPPRESS_ANNOTATED_LEAKS
 void unleak_memory(const void *ptr, size_t len)
 {
static struct suppressed_leak_root {
@@ -254,3 +255,4 @@ void unleak_memory(const void *ptr, size_t len)
root->next = suppressed_leaks;
suppressed_leaks = root;
 }
+#endif
-- 
2.14.0

Re: [PATCH] refs: make sure we never pass NULL to hashcpy

2017-09-07 Thread Thomas Gummerer

On 09/07, Michael Haggerty wrote:
> On Wed, Sep 6, 2017 at 3:26 AM, Junio C Hamano  wrote:
> > Thomas Gummerer  writes:
> >
> >> gcc on arch linux (version 7.1.1) warns that a NULL argument is passed
> >> as the second parameter of memcpy.
> >> [...]
> >
> > It is hugely annoying to see a halfway-intelligent compiler forces
> > you to add such pointless asserts.
> >
> > The only way the compiler could error on this is by inferring the
> > fact that new_sha1/old_sha1 could be NULL by looking at the callsite
> > in ref_transaction_update() where these are used as conditionals to
> > set HAVE_NEW/HAVE_OLD that are passed.  Even if the compiler were
> > doing the whole-program analysis, the other two callsites of the
> > function pass the address of oid.hash[] in an oid structure so it
> > should know these won't be NULL.
> >
> > [...]
> >
> > I wonder if REF_HAVE_NEW/REF_HAVE_OLD are really needed in these
> > codepaths, though.  Perhaps we can instead declare !!new_sha1 means
> > we have the new side and rewrite the above part to
> >
> > if (new_sha1)
> > hashcpy(update->new_oid.hash, new_sha1);
> >
> > without an extra and totally pointless assert()?
> 
> The ultimate reason for those flags is that `struct ref_update` embeds
> `new_oid` and `old_oid` directly in the struct, so there is no way to
> set it to "NULL". (The `is_null_sha1` value is used for a different
> purpose.) So those flags keep track of whether the corresponding value
> is specified or absent.
> 
> Four of the five callers of `ref_transaction_add_update()` are
> constructing a new `ref_update` from an old one. They currently don't
> have to look into `flags`; they just pass it on (possibly changing a
> bit or two). Implementing your proposal would oblige those callers to
> change from something like

Thanks for the explanation!

> > new_update = ref_transaction_add_update(
> > transaction, "HEAD",
> > update->flags | REF_LOG_ONLY | REF_NODEREF,
> > update->new_oid.hash, update->old_oid.hash,
> > update->msg);
> 
> to
> 
> > new_update = ref_transaction_add_update(
> > transaction, "HEAD",
> > update->flags | REF_LOG_ONLY | REF_NODEREF,
> > (update->flags & REF_HAVE_NEW) ? update->new_oid.hash : NULL,
> > (update->flags & REF_HAVE_OLD) ? update->old_oid.hash : NULL,
> > update->msg);
> 
> It's not the end of the world, but it's annoying.
> `ref_transaction_add_update()` was meant to be a low-level,
> low-overhead way of allocating a `struct ref_update` and add it to a
> transaction.
> 
> Another solution (also annoying, but maybe a tad less so) would be to
> change the one iffy caller, `ref_transaction_update()`, to pass in a
> pointer to the null OID for `new_sha1` and `old_sha1` when the
> corresponding flags are turned off. That value would never be looked
> at, but it would hopefully reassure gcc.
> 
> I did just realize one thing: `ref_transaction_update()` takes `flags`
> as an argument and alters it using
> 
> > flags |= (new_sha1 ? REF_HAVE_NEW : 0) | (old_sha1 ? REF_HAVE_OLD : 
> > 0);
> 
> Perhaps gcc is *more* intelligent than we give it credit for, and is
> actually worried that the `flags` argument passed in by the caller
> might *already* have one of these bits set. In that case
> `ref_transaction_add_update()` would indeed be called incorrectly.
> Does the warning go away if you change that line to
> 
> > if (new_sha1)
> > flags |=REF_HAVE_NEW;
> > else
> > flags &= ~REF_HAVE_NEW;
> > if (old_sha1)
> > flags |=REF_HAVE_OLD;
> > else
> > flags &= ~REF_HAVE_OLD;
> 
> ?

Indeed that fixes it, great catch!  gcc is indeed smarter than we gave
it credit for, this makes a lot of sense.

Interestingly stripping away the flags fixes the compiler warning:

diff --git a/refs.c b/refs.c
index ba22f4acef..2e6871beac 100644
--- a/refs.c
+++ b/refs.c
@@ -921,6 +921,9 @@ int ref_transaction_update(struct ref_transaction 
*transaction,
return -1;
}
 
+   flags &= ~REF_HAVE_NEW;
+   flags &= ~REF_HAVE_OLD;
+
flags |= (new_sha1 ? REF_HAVE_NEW : 0) | (old_sha1 ? REF_HAVE_OLD : 0);
 
ref_transaction_add_update(transaction, refname, flags,

while checking that the flags are not there in the first place still
leaves the compiler warning (whether I use "die()" or just "return -1"
doesn't matter in this case):

diff --git a/refs.c b/refs.c
index ba22f4acef..62ff283755 100644
--- a/refs.c
+++ b/refs.c
@@ -921,6 +921,9 @@ int ref_transaction_update(struct ref_transaction 
*transaction,
return -1;
}
 
+   if ((flags & REF_HAVE_NEW) != 0 || (flags & REF_HAVE_OLD) != 0)
+   die("BUG: passed invalid flag to ref_transaction_update");
+
flags |= (new_sha1 ? REF_HAVE_NEW : 0) | (old_sha1 ? REF_HAVE_OLD : 0);

Re: [PATCH 10/10] add UNLEAK annotation for reducing leak false positives

2017-09-07 Thread Stefan Beller

On Thu, Sep 7, 2017 at 2:17 AM, Jeff King  wrote:
>> After having a sneak peak at the implementation
>> it is O(1) in runtime for each added element, and the
>> space complexity is O(well).
>
> I'm not sure if your "well" is "this does well" or "well, it could be
> quite a lot". :)

Both actually. When I wrote it I thought the phonetic interpretation
was way too funny, but nobody can hear subtle humor on mailing
lists. :)

If UNLEAK is used correctly, then it sounds more like
"this does well (and we cannot do better anyway)".

> It certainly has the potential to grow the heap without bound (since
> after all, it's whole point is to make a giant list of variables that
> are going out of scope). But in practice we'd sprinkle this over a
> handful of variables just before program exit (and remember that it's
> copying only what's on the stack already; so pointers get copied, not
> whole heap-allocated blocks).
>
> Plus it does nothing at all when not compiled with leak-checking. So I'm
> not too worried about the extra memory usage or performance.

me neither.

Thanks for starting this series (I am really happy about this solution)!
Stefan

Re: clone repo & history to disconnected server

2017-09-07 Thread Kermit Short

OK apologies for not reading the docs thoroughly enough, and thank
you!!  This looks to be exactly what I need.

On Thu, Sep 7, 2017 at 1:35 PM, Stefan Beller  wrote:
> On Thu, Sep 7, 2017 at 9:50 AM, Kermit Short  wrote:
>> Greetings!
>> I have to set up a workflow where I'll be developing and testing on
>> one network, and then deploying to production on a disconnected
>> network. I'd like to be able to have the full commit history for all
>> branches, tags, etc. on the disconnected side, in case I need to
>> perform a roll-back, create a hot-fix, etc.
>>
>>
>> Most of the solutions I see involve creating a mirror clone that is
>> then merged with a different repository. My problem is that my servers
>> are on two separate networks and I am unable to talk to both repos
>> from the same computer. I'd need to be able to transport the files
>> using sneaker net prior to importing on the second repo. I need to
>> keep the production repo in full working sanity...i.e. import only new
>> changes or completely overwrite the repository such that the full
>> project history is intact and working.
>>
>>
>> Does anyone have any suggestions? I've been looking at the archive
>> command but I'm not sure it would bring out the full project history.
>> Thanks in advance for your help!!
>
> Checkout https://git-scm.com/docs/git-bundle

[PATCHv3] builtin/merge: honor commit-msg hook for merges

2017-09-07 Thread Stefan Beller

Similar to 65969d43d1 (merge: honor prepare-commit-msg hook, 2011-02-14)
merge should also honor the commit-msg hook: When a merge is stopped due
to conflicts or --no-commit, the subsequent commit calls the commit-msg
hook.  However, it is not called after a clean merge. Fix this
inconsistency by invoking the hook after clean merges as well.

This change is motivated by Gerrit's commit-msg hook to install a ChangeId
trailer into the commit message. Without such a ChangeId, Gerrit refuses
to accept any commit by default, such that the inconsistency of (not)
running the commit-msg hook between commit and merge leads to confusion
and might block people from getting their work done.

As the githooks man page is very vocal about the possibility of skipping
the commit-msg hook via the --no-verify option, implement the option
in merge, too.

'git merge --continue' is currently implemented as calling cmd_commit
with no further arguments. This works for most other merge related options,
such as demonstrated via the --allow-unrelated-histories flag in the
test. The --no-verify option however is not remembered across invocations
of git-merge. Originally the author assumed an alternative in which the
'git merge --continue' command accepts the --no-verify flag, but that
opens up the discussion which flags are allows to the continued merge
command and which must be given in the first invocation.

Signed-off-by: Stefan Beller 
---

> I didn't check how "merge --continue" is implemented, but we need to
> behave exactly the same way over there, too.  Making sure that it is
> a case in t7504 may be a good idea, in addition to the test you
> added.

First I understood this as if we'd want to support
'git merge --continue --no-verify' eventually, but by now I think we want
to 'carry over' the meaning from the first invocation of git-merge.

For that I added a test.

Thanks,
Stefan

 builtin/merge.c|  8 ++
 t/t7504-commit-msg-hook.sh | 64 +++---
 2 files changed, 68 insertions(+), 4 deletions(-)

diff --git a/builtin/merge.c b/builtin/merge.c
index 7df3fe3927..780435d7a1 100644
--- a/builtin/merge.c
+++ b/builtin/merge.c
@@ -73,6 +73,7 @@ static int show_progress = -1;
 static int default_to_upstream = 1;
 static int signoff;
 static const char *sign_commit;
+static int verify_msg = 1;
 
 static struct strategy all_strategy[] = {
{ "recursive",  DEFAULT_TWOHEAD | NO_TRIVIAL },
@@ -236,6 +237,7 @@ static struct option builtin_merge_options[] = {
  N_("GPG sign commit"), PARSE_OPT_OPTARG, NULL, (intptr_t) "" },
OPT_BOOL(0, "overwrite-ignore", _ignore, N_("update ignored 
files (default)")),
OPT_BOOL(0, "signoff", , N_("add Signed-off-by:")),
+   OPT_BOOL(0, "verify", _msg, N_("verify commit-msg hook")),
OPT_END()
 };
 
@@ -780,6 +782,12 @@ static void prepare_to_commit(struct commit_list 
*remoteheads)
if (launch_editor(git_path_merge_msg(), NULL, NULL))
abort_commit(remoteheads, NULL);
}
+
+   if (verify_msg && run_commit_hook(0 < option_edit, get_index_file(),
+ "commit-msg",
+ git_path_merge_msg(), NULL))
+   abort_commit(remoteheads, NULL);
+
read_merge_msg();
strbuf_stripspace(, 0 < option_edit);
if (!msg.len)
diff --git a/t/t7504-commit-msg-hook.sh b/t/t7504-commit-msg-hook.sh
index 88d4cda299..302a3a2082 100755
--- a/t/t7504-commit-msg-hook.sh
+++ b/t/t7504-commit-msg-hook.sh
@@ -101,6 +101,10 @@ cat > "$HOOK" <> file &&
@@ -135,6 +139,32 @@ test_expect_success '--no-verify with failing hook 
(editor)' '
 
 '
 
+test_expect_success 'merge fails with failing hook' '
+
+   test_when_finished "git branch -D newbranch" &&
+   test_when_finished "git checkout -f master" &&
+   git checkout --orphan newbranch &&
+   : >file2 &&
+   git add file2 &&
+   git commit --no-verify file2 -m in-side-branch &&
+   test_must_fail git merge --allow-unrelated-histories master &&
+   commit_msg_is "in-side-branch" # HEAD before merge
+
+'
+
+test_expect_success 'merge bypasses failing hook with --no-verify' '
+
+   test_when_finished "git branch -D newbranch" &&
+   test_when_finished "git checkout -f master" &&
+   git checkout --orphan newbranch &&
+   : >file2 &&
+   git add file2 &&
+   git commit --no-verify file2 -m in-side-branch &&
+   git merge --no-verify --allow-unrelated-histories master &&
+   commit_msg_is "Merge branch '\''master'\'' into newbranch"
+'
+
+
 chmod -x "$HOOK"
 test_expect_success POSIXPERM 'with non-executable hook' '
 
@@ -178,10 +208,6 @@ exit 0
 EOF
 chmod +x "$HOOK"
 
-commit_msg_is () {
-   test "$(git log --pretty=format:%s%b -1)" = "$1"
-}
-
 test_expect_success 'hook edits commit message' '
 
echo "additional" >> file &&
@@ -217,7 +243,36 @@

Re: [PATCH 27/34] shortlog: release strbuf after use in insert_one_record()

2017-09-07 Thread Junio C Hamano

Jeff King  writes:

> On Thu, Sep 07, 2017 at 04:51:12AM +0900, Junio C Hamano wrote:
>
>> > diff --git a/builtin/shortlog.c b/builtin/shortlog.c
>> > index 43c4799ea9..48af16c681 100644
>> > --- a/builtin/shortlog.c
>> > +++ b/builtin/shortlog.c
>> > @@ -50,66 +50,67 @@ static int compare_by_list(const void *a1, const void 
>> > *a2)
>> >  static void insert_one_record(struct shortlog *log,
>> >  const char *author,
>> >  const char *oneline)
>> >  {
>> > ...
>> >item = string_list_insert(>list, namemailbuf.buf);
>> > +  strbuf_release();
>> 
>> As log->list.strdup_strings is set to true in shortlog_init(),
>> namemailbuf.buf will leak without this.
>> 
>> An alterative, as this is the only place we add to log->list, could
>> be to make log->list take ownership of the string by not adding a
>> _release() here but instead _detach(), I guess.
>
> I agree that would be better, but I think it's a little tricky. The
> string_list_insert() call may make a new entry, or it may return an
> existing one. We'd still need to free in the latter case. I'm not sure
> the string_list interface makes it easy to tell the difference.

True; I do not think string_list API does.  But for this particular
application, I suspect that we can by looking at the util field of
the item returned.  A newly created one has NULL, but we always make
it non-NULL before leaving this function.

>> It is also curious that at the end of shortlog_output(), we set the
>> log->list.strdup_strings again to true immediately before calling
>> string_list_clear() on it.  I suspect that is unnecessary?
>
> I think so. I was curious whether I might have introduced this weirdness
> when I refactored this code a year or so ago. But no, it looks like it
> dates all the way back to the very first version of shortlog.c.

Hmph.

Re: [PATCH] refs: make sure we never pass NULL to hashcpy

2017-09-07 Thread Junio C Hamano

Michael Haggerty  writes:

> I did just realize one thing: `ref_transaction_update()` takes `flags`
> as an argument and alters it using
>
>> flags |= (new_sha1 ? REF_HAVE_NEW : 0) | (old_sha1 ? REF_HAVE_OLD : 
>> 0);
>
> Perhaps gcc is *more* intelligent than we give it credit for, and is
> actually worried that the `flags` argument passed in by the caller
> might *already* have one of these bits set. In that case
> `ref_transaction_add_update()` would indeed be called incorrectly.
> Does the warning go away if you change that line to
>
>> if (new_sha1)
>> flags |=REF_HAVE_NEW;
>> else
>> flags &= ~REF_HAVE_NEW;
>> if (old_sha1)
>> flags |=REF_HAVE_OLD;
>> else
>> flags &= ~REF_HAVE_OLD;
>
> ? This might be a nice change to have anyway, to isolate
> `ref_transaction_update()` from mistakes by its callers.

I understand "drop HAVE_NEW bit if new_sha1 is NULL" part, but not
the other side "add HAVE_NEW if new_SHA1 is not NULL"---doesn't the
NEW/OLD flag exist exactly because some callers pass the address of
an embedded oid.hash[] or null_sha1, instead of NULL, when one side 
does not exist?  So new|old being NULL is a definite signal that we
need to drop HAVE_NEW|OLD, but the reverse may not be true, no?  Is
it OK to overwrite null_sha1[] that is passed from some codepaths?

ref_transaction_create and _delete pass null_sha1 on the missing
side, while ref_transaction_verify passes NULL, while calling
_update().  Should this distinction affect how _add_update() gets
called?

Re: [PATCHv3] builtin/merge: honor commit-msg hook for merges

2017-09-07 Thread Junio C Hamano

Stefan Beller  writes:

>  The --no-verify option however is not remembered across invocations
> of git-merge. Originally the author assumed an alternative in which the
> 'git merge --continue' command accepts the --no-verify flag, but that
> opens up the discussion which flags are allows to the continued merge
> command and which must be given in the first invocation.

This leaves a reader (me) wondering what the final conclusion was,
after the author assumed something and thought about alternatives.
I am guessing that your final decision was not to remember
"--no-verify" so a user who started "merge --no-verify" that stopped
in the middle must say "merge --continue --no-verify" or "commit
--no-verify" to conclude the merge?  Or you added some mechanism to
remember the fact that no-verify was given so that "merge --continue"
will read from there, ignoring "merge --continue --verify" from the
command line?  Not just the above part of the log message confusing,
but there is no update to the documentation, and we shouldn't expect
end-users to find out what ought to happen by reading t7504 X-<.

The new test in t7504 tells me that you remember --[no-]verify from
the initial invocation and use the same when --continue is given; it
is unclear how that remembered one interacts with --[no-]verify that
is given when --continue is given.  It is not documented, tested and
explained in the log message.  I would expect that the command line 
trumps what was given in the initial invocation.

> +static int verify_msg = 1;
>  
>  static struct strategy all_strategy[] = {
>   { "recursive",  DEFAULT_TWOHEAD | NO_TRIVIAL },
> @@ -236,6 +237,7 @@ static struct option builtin_merge_options[] = {
> N_("GPG sign commit"), PARSE_OPT_OPTARG, NULL, (intptr_t) "" },
>   OPT_BOOL(0, "overwrite-ignore", _ignore, N_("update ignored 
> files (default)")),
>   OPT_BOOL(0, "signoff", , N_("add Signed-off-by:")),
> + OPT_BOOL(0, "verify", _msg, N_("verify commit-msg hook")),
>   OPT_END()
>  };

I suspect that the previous iteration gives a much better end-user
experience when "git merge -h" is used.  This will give the
impression that the user MUST say "merge --verify" if the user wants
to verify commit-msg hook (whatever that means), but because the
option defaults to true, that is not what happens.  The user instead
must say "merge --no-verify" if the verification is unwanted.

"git commit -h" explains 

--no-verifybypass pre-commit and commit-msg hooks

and I think that is the way how we want to explain this option in
"git merge" too.  Normally it is not bypassed, and the user can ask
with "--no-verify".  Thanks to René's change in 2012, the option
definition you had in the previous one will make --[no-]verify
accepted just fine.

> +test_expect_success 'merge fails with failing hook' '
> + ...
> +'
> +
> +test_expect_success 'merge bypasses failing hook with --no-verify' '
> + ...
> +'

Both look sensible.

> +test_expect_failure 'merge --continue remembers --no-verify' '
> + test_when_finished "git branch -D newbranch" &&
> + test_when_finished "git checkout -f master" &&
> + git checkout master &&
> + echo a >file2 &&
> + git add file2 &&
> + git commit --no-verify -m "add file2 to master" &&
> + git checkout -b newbranch master^ &&
> + echo b >file2 &&
> + git add file2 &&
> + git commit --no-verify file2 -m in-side-branch &&
> + git merge --no-verify -m not-rewritten-by-hook master &&
> + # resolve conflict:
> + echo c >file2 &&
> + git add file2 &&
> + git merge --continue &&
> + commit_msg_is not-rewritten-by-hook
>  '

OK.  What should happen when the last "merge --continue" was given
"--verify" at the same time?  A similar test whose title is
"--no-verify remembered by merge --continue can be overriden" may be
a good thing to follow this one, perhaps?

Thanks.

Re: Branch name support & character

2017-09-07 Thread Junio C Hamano

Andrew Ardill  writes:

> Using git checkout -b 'my' works for me (single quotes around
> the branch name).
>
> Pushing to a local remote also works: git push --set-upstream
> ../git-test-2 'my'

Thanks for sanity-checking.  I was wondering if we have codepaths
that botch quoting (we shouldn't, and if there is, we should fix
it).

>
> That being said, I would advise not using & in branch names,
> specifically because it is a special character in shells.

Good advice.

Re: git diff doesn't quite work as documented?

2017-09-07 Thread Junio C Hamano

Olaf Klischat  writes:

> `git diff --help' says:
>
> git diff [--options]  [--] [...]
>This form is to view the changes you have in your
>working tree relative to the named .

That help text is poorly phrased.  

When "git diff" talks about files in your working tree, it always
looks them _through_ the index.  As far as the command is concerned,
a cruft left in your working tree that is not in the index does
*not* exist.

So when your index does not have bar.txt, even if you have an
untracked bar.txt in your directory, i.e.

> oklischat@oklischat:/tmp/gittest$ git status
> On branch master
> Untracked files:
>   (use "git add ..." to include in what will be committed)
>
>   bar.txt

and you have a commit that _has_ that file, then the command thinks
 has the path, and your working tree does *not*.  IOW, this
is...

> oklischat@oklischat:/tmp/gittest$ git diff bar-added
> diff --git a/bar.txt b/bar.txt
> deleted file mode 100644

... totally expected and intended output.

Hope the above explanation clarifies.  A documentation update might
be helpful to new users.

Thanks.

Re: [PATCH] commit-tree: don't append a newline with -F

2017-09-07 Thread Junio C Hamano

Ross Kabus  writes:

> This change makes it such that commit-tree -F never appends a newline
> character to the supplied commit message (either from file or stdin).
>
> Previously, commit-tree -F would always append a newline character to
> the text brought in from file or stdin.

This is not correct.  The command adds a missing newline only when
the input ended in an incompete line.

I'd rephrase the whole thing like this:

commit-tree: do not complete line in -F input

"git commit-tree -F ", unlike "cat  | git
commit-tree" (i.e. feeding the same contents from the standard
input), added a missing final newline when the input ended in an
incomplete line.

Correct this inconsistency by leaving the incomplete line as-is,
as erring on the side of not touching the input is preferrable
and expected for a plumbing command like "commit-tree".

if I were doing this patch.

Thanks.

> Signed-off-by: Ross Kabus 
> ---
>  builtin/commit-tree.c | 1 -
>  1 file changed, 1 deletion(-)
>
> diff --git a/builtin/commit-tree.c b/builtin/commit-tree.c
> index 19e898fa4..2177251e2 100644
> --- a/builtin/commit-tree.c
> +++ b/builtin/commit-tree.c
> @@ -102,7 +102,6 @@ int cmd_commit_tree(int argc, const char **argv, const 
> char *prefix)
>   if (fd && close(fd))
>   die_errno("git commit-tree: failed to close 
> '%s'",
> argv[i]);
> - strbuf_complete_line();
>   continue;
>   }

Re: RFC v3: Another proposed hash function transition plan

2017-09-07 Thread Junio C Hamano

Junio C Hamano  writes:

> One thing I still do not know how I feel about after re-reading the
> thread, and I didn't find the above doc, is Linus's suggestion to
> use the objects themselves as NewHash-to-SHA-1 mapper [*1*].  
> ...
> [Reference]
>
> *1* 

I think this falls into the same category as the often-talked-about
addition of the "generation number" field.  It is very tempting to
add these "mechanically derivable but expensive to compute" pieces
of information to the sha3-content while converting from
sha1-content and creating anew.  

Because the "sha1-name" or the "generation number" can mechanically
be computed, as long as everybody agrees to _always_ place them in
the sha3-content, the same sha1-content will be converted into
exactly the same sha3-content without ambiguity, and converting them
back to sha1-content while pushing to an older repository will
correctly produce the original sha1-content, as it would just be the
matter of simply stripping these extra pieces of information.

The reason why I still feel a bit uneasy about adding these things
(aside from the fact that sha1-name thing will be a baggage we would
need to carry forever even after we completely wean ourselves off of
the old hash) is because I am not sure what we should do when we
encounter sha3-content in the wild that has these things _wrong_.
An object that exists today in the SHA-1 world is fetched into the
new repository and converted to SHA-3 contents, and Linus's extra
"original SHA-1 name" field is added to the object's header while
recording the SHA-3 content.  But for whatever reason, the original
SHA-1 name is recorded incorrectly in the resulting SHA-3 object.

The same thing could happen if we decide to bake "generation number"
in the SHA-3 commit objects.  One possible definition would be that
a root commit will have gen #0; a commit with 1 or more parents will
get max(parents' gen numbers) + 1 as its gen number.  But somebody
may botch the counting and records sum(parents' gen numbers) as its
gen number.

In these cases, not just the SHA3-content but also the resulting
SHA-3 object name would be different from the name of the object
that would have recorded the same contents correctly.  So converting
back to SHA-1 world from these botched SHA-3 contents may produce
the original contents, but we may end up with multiple "plausibly
looking" set of SHA-3 objects that (clain to) correspond to a single
SHA-1 object, only one of which is a valid one.

Our "git fsck" already treats certain brokenness (like a tree whose
entry has mode that is 0-padded to the left) as broken but still
tolerate them.  I am not sure if it is sufficient to diagnose and
declare broken and invalid when we see sha3-content that records
these "mechanically derivable but expensive to compute" pieces of
information incorrectly.

I am leaning towards saying "yes, catching in fsck is enough" and
suggesting to add generation number to sha3-content of the commit
objects, and to add even the "original sha1 name" thing if we find
good use of it.  But I cannot shake this nagging feeling off that I
am missing some huge problems that adding these fields and opening
ourselves to more classes of broken objects.

Thoughts?

Re: Strange behavior of git rev-list

2017-09-07 Thread Jeff King

On Fri, Sep 08, 2017 at 12:37:28PM +0900, Junio C Hamano wrote:

> > I'm still moderately against storing generation numbers inside the
> > objects. They're redundant with the existing parent pointers, which
> > means it's an opportunity for the two sets of data to disagree. And as
> > we've seen, once errors are cemented in history it's very hard to fix
> > them, because you break any history built on top.
> >
> > I'm much more in favor of building a local cache of generation numbers
> > (either on the fly or during repacks, where we can piggy-back on the
> > existing pack .idx for indexing).
> 
> I guess our mails crossed.  Yes, objects that are needlessly broken
> only because they botch computation of derivable values are real
> problem, as we need to accomodate them forever because histories can
> and will be built on top of them.
> 
> On the other hand, seeing that the world did not stop even with some
> projects have trees with entries whose mode are written with 0-padding
> on the left in the ancient part of their histories, it might not be
> such a big deal.  I dunno.

True, but in counter-point:

  1. Tree problems generally only affect operations on that tree itself.
 Parent (or generation number) problems hit us any time we walk
 across that part of history, which may be much more frequent.

  2. There's an open question of what to do with existing commits
 without generation numbers.

 Per (1), "git tag --contains" is _always_ going to want to know the
 generation number of v1.0. Some problems are "local" to their block
 of history and as the project history marches forward, the bugs are
 there but you are less likely to make queries that hit them. But
 considering old tags for reachability will happen forever (and is
 the _most_ important use of generation numbers, because it lets us
 throw out that old history immediately).

 So if we assume we can't rewrite those objects, then we end up with
 some kind of local cache anyway.

  3. I think we should be moving more in the direction of keeping
 repo-local caches for optimizations. Reachability bitmaps have been
 a big performance win. I think we should be doing the same with our
 properties of commits. Not just generation numbers, but making it
 cheap to access the graph structure without zlib-inflating whole
 commit objects (i.e., packv4 or something like the "metapacks" I
 proposed a few years ago).

-Peff

Re: RFC v3: Another proposed hash function transition plan

2017-09-07 Thread Jeff King

On Fri, Sep 08, 2017 at 11:40:21AM +0900, Junio C Hamano wrote:

> Our "git fsck" already treats certain brokenness (like a tree whose
> entry has mode that is 0-padded to the left) as broken but still
> tolerate them.  I am not sure if it is sufficient to diagnose and
> declare broken and invalid when we see sha3-content that records
> these "mechanically derivable but expensive to compute" pieces of
> information incorrectly.
> 
> I am leaning towards saying "yes, catching in fsck is enough" and
> suggesting to add generation number to sha3-content of the commit
> objects, and to add even the "original sha1 name" thing if we find
> good use of it.  But I cannot shake this nagging feeling off that I
> am missing some huge problems that adding these fields and opening
> ourselves to more classes of broken objects.

I share your nagging feeling.

I have two thoughts on the "fsck can catch it" line of reasoning.

  1. It's harder to fsck generation numbers than other syntactic
 elements of an object, because it inherently depends on the links.
 So I can't fsck a commit object in isolation. I have to open its
 parents and check _their_ generation numbers.

 In some sense that isn't a big deal. A real fsck wants to know that
 we _have_ the parents in the first place. But traditionally we've
 separated "is this syntactically valid" from "do we have full
 connectivity". And features like shallow clones rely on us fudging
 the latter but not the former. A shallow history could never
 properly fsck the generation numbers.

 A multiple-hash field doesn't have this problem. It's purely a
 function of the bytes in the object.

  2. I wouldn't classify the current fsck checks as a wild success in
 containing breakages. If a buggy implementation produces invalid
 objects, the same buggy implementation generally lets people (and
 their colleagues) unwittingly build on top of those objects. It's
 only later (sometimes much later) that they interact with a
 non-buggy implementation whose fsck complains.

 And what happens then? If they're lucky, the invalid objects
 haven't spread far, and the worst thing is that they have to learn
 to use filter-branch (which itself is punishment enough). But
 sometimes a significant bit of history has been built on top, and
 it's awkward or impossible to rewrite it.

 That puts the burden on whoever is running the non-buggy
 implementation that wants to reject the objects. Do they accept
 these broken objects? If so, what do they do to mitigate the wrong
 answers that Git will return?

I'm much more in favor of keeping that data outside the object-hash
computation, and caching the pre-computed results as necessary. Those
cache can disagree with the objects, of course, but the cost to dropping
and re-building them is much lower than a history rewrite.

I'm speaking primarily to the generation-number thing, where I really
don't think there's any benefit to embedding it in the object beyond the
obvious "well, it has to go _somewhere_, and this saves us implementing
a local cache layer".  I haven't thought hard enough on the
multiple-hash thing to know if there's some other benefit to having it
inside the objects.

-Peff

Re: Strange behavior of git rev-list

2017-09-07 Thread Jeff King

On Thu, Sep 07, 2017 at 12:24:02PM -0700, Stefan Beller wrote:

> > We've discussed storing true generation numbers in the past, which would
> > make this optimization more robust, as well as allow us to speed up many
> > other traversals (e.g., the "tag --contains"). It's something I'd like
> > to revisit, but it's not at the top of the pile.
> 
> (We just had an office discussion if this is part of the hash transition plan,
> i.e. have a field in the commit object to contain its generation number.
> and as I think the generation numbers would lead to fast and correct
> behavior unlike the current heuristic which is only fast, I would try
> to make a strong point actual generation numbers inside commit objects)

I'm still moderately against storing generation numbers inside the
objects. They're redundant with the existing parent pointers, which
means it's an opportunity for the two sets of data to disagree. And as
we've seen, once errors are cemented in history it's very hard to fix
them, because you break any history built on top.

I'm much more in favor of building a local cache of generation numbers
(either on the fly or during repacks, where we can piggy-back on the
existing pack .idx for indexing).

-Peff

Re: Strange behavior of git rev-list

2017-09-07 Thread Junio C Hamano

Jeff King  writes:

> On Thu, Sep 07, 2017 at 12:24:02PM -0700, Stefan Beller wrote:
>
>> > We've discussed storing true generation numbers in the past, which would
>> > make this optimization more robust, as well as allow us to speed up many
>> > other traversals (e.g., the "tag --contains"). It's something I'd like
>> > to revisit, but it's not at the top of the pile.
>> 
>> (We just had an office discussion if this is part of the hash transition 
>> plan,
>> i.e. have a field in the commit object to contain its generation number.
>> and as I think the generation numbers would lead to fast and correct
>> behavior unlike the current heuristic which is only fast, I would try
>> to make a strong point actual generation numbers inside commit objects)
>
> I'm still moderately against storing generation numbers inside the
> objects. They're redundant with the existing parent pointers, which
> means it's an opportunity for the two sets of data to disagree. And as
> we've seen, once errors are cemented in history it's very hard to fix
> them, because you break any history built on top.
>
> I'm much more in favor of building a local cache of generation numbers
> (either on the fly or during repacks, where we can piggy-back on the
> existing pack .idx for indexing).

I guess our mails crossed.  Yes, objects that are needlessly broken
only because they botch computation of derivable values are real
problem, as we need to accomodate them forever because histories can
and will be built on top of them.

On the other hand, seeing that the world did not stop even with some
projects have trees with entries whose mode are written with 0-padding
on the left in the ancient part of their histories, it might not be
such a big deal.  I dunno.

Re: [PATCH] usage: conditionally compile unleak_memory() definition

2017-09-07 Thread Jeff King

On Thu, Sep 07, 2017 at 05:08:23PM +0100, Ramsay Jones wrote:

> Hi Jeff,
> 
> If you need to re-roll your 'jk/leak-checkers' branch, could you
> please squash this into the relevant patch (commit c57586d954,
> "add UNLEAK annotation for reducing leak false positives", 05-09-2017).
> 
> This was noticed by sparse, since it is not declared when the
> SUPPRESS_ANNOTATED_LEAKS pre-processor macro is not defined.
> (You could move the declaration out of the #ifdef in the header
> file, I suppose, but it seems pointless to compile the function
> if it isn't being used).

Thanks, that makes sense. There are a handful of other bits to modify in
that final commit, so I will end up with at least one re-roll.

-Peff

Re: [PATCH 27/34] shortlog: release strbuf after use in insert_one_record()

2017-09-07 Thread Jeff King

On Fri, Sep 08, 2017 at 09:33:38AM +0900, Junio C Hamano wrote:

> >> An alterative, as this is the only place we add to log->list, could
> >> be to make log->list take ownership of the string by not adding a
> >> _release() here but instead _detach(), I guess.
> >
> > I agree that would be better, but I think it's a little tricky. The
> > string_list_insert() call may make a new entry, or it may return an
> > existing one. We'd still need to free in the latter case. I'm not sure
> > the string_list interface makes it easy to tell the difference.
> 
> True; I do not think string_list API does.  But for this particular
> application, I suspect that we can by looking at the util field of
> the item returned.  A newly created one has NULL, but we always make
> it non-NULL before leaving this function.

Yeah, I agree that would work here.

I also wondered if we could get away with avoiding the malloc entirely
here. Especially in the "shortlog -n" case, it is identical to the name
field we already have in ident.name. So ideally we'd do a lookup to see
if we have the entry before allocating anything (since we do one lookup
per commit, but only insert once per unique author).

But that doesn't quite work, because ident.name doesn't put to a
NUL-terminated string, and string_list only handles strings.

We _can_ reuse the same buffer over and over:

diff --git a/builtin/shortlog.c b/builtin/shortlog.c
index 43c4799ea9..7328abf4a1 100644
--- a/builtin/shortlog.c
+++ b/builtin/shortlog.c
@@ -54,7 +54,7 @@ static void insert_one_record(struct shortlog *log,
struct string_list_item *item;
const char *mailbuf, *namebuf;
size_t namelen, maillen;
-   struct strbuf namemailbuf = STRBUF_INIT;
+   static struct strbuf namemailbuf = STRBUF_INIT;
struct ident_split ident;

if (split_ident_line(, author, strlen(author)))
@@ -66,6 +66,7 @@ static void insert_one_record(struct shortlog *log,
maillen = ident.mail_end - ident.mail_begin;

map_user(>mailmap, , , , );
+   strbuf_reset();
strbuf_add(, namebuf, namelen);

if (log->email)

That saves the malloc, if not the extra copying. It shows about a 1%
speed up on "git shortlog -ns" on linux.git. Probably that's not worth
caring too much about, but it also "solves" the leaking problem (I'm not
sure if the speedup is from calling malloc less frequently, or from
lowering our peak heap usage due to fixing the leak).

-Peff

Re: [PATCH 07/10] t1404: demonstrate two problems with reference transactions

2017-09-07 Thread Junio C Hamano

Michael Haggerty  writes:

> + git for-each-ref $prefix >actual &&
> + grep "Unable to create $Q.*packed-refs.lock$Q: File exists" err &&

I added a squash for doing s/grep/test_i18n&/ here; are there other
issues in the series, or is the series more or less ready to be
polished in 'next'?

Thanks.

Re: "git shortlog -sn --follow -- " counts all commits to entire repo

2017-09-07 Thread Jeff King

On Thu, Sep 07, 2017 at 12:30:01PM -0700, Stefan Beller wrote:

> > "--follow" switch is not listed on
> > https://git-scm.com/docs/git-shortlog so maybe it's not supported. In
> > this case I would expect error message.
> >
> > Tried the following versions:
> > "git version 2.14.1.windows.1" on Windows 7
> > "git version 2.7.4" on Ubuntu 16.04
> 
> The shortlog takes most (all?) options that git-log
> does, e.g. in git.git:

That's definitely the intent, but I think --follow is just buggy here.
We don't seem to trigger a diff, which is where "git log" does the
follow check (because of the way it's bolted onto the diff, and not the
actual pathspec-pruning mechanism).

Something like the patch below seems to work, but there are a lot of
open questions. And it would probably want to do something to prevent
nonsense like "shortlog -p" from showing actual diffs.

I suspect a better solution might involve actually building on
log-tree.c to do the traversal (since this internal traversal is
supposed to be equivalent to "git log | git shortlog").

diff --git a/builtin/shortlog.c b/builtin/shortlog.c
index 43c4799ea9..31274a92f5 100644
--- a/builtin/shortlog.c
+++ b/builtin/shortlog.c
@@ -175,8 +175,31 @@ static void get_from_rev(struct rev_info *rev, struct 
shortlog *log)
 
if (prepare_revision_walk(rev))
die(_("revision walk setup failed"));
-   while ((commit = get_revision(rev)) != NULL)
-   shortlog_add_commit(log, commit);
+   while ((commit = get_revision(rev)) != NULL) {
+   int show_commit;
+
+   /* trigger a diff to give --follow a chance to kick in */
+   if (!commit->parents) {
+   /* should diff against empty tree or respect --root? */
+   show_commit = 1;
+   } else if (commit->parents->next) {
+   /* how to handle merge commits? */
+   show_commit = 1;
+   } else {
+   struct commit *parent = commit->parents->item;
+
+   parse_commit_or_die(parent);
+   parse_commit_or_die(commit);
+   diff_tree_oid(>tree->object.oid,
+ >tree->object.oid,
+ "", >diffopt);
+   show_commit = !diff_queue_is_empty();
+   diff_flush(>diffopt);
+   }
+
+   if (show_commit)
+   shortlog_add_commit(log, commit);
+   }
 }
 
 static int parse_uint(char const **arg, int comma, int defval)

Re: [PATCH 27/34] shortlog: release strbuf after use in insert_one_record()

2017-09-07 Thread Jeff King

On Thu, Sep 07, 2017 at 11:56:48PM -0400, Jeff King wrote:

> > True; I do not think string_list API does.  But for this particular
> > application, I suspect that we can by looking at the util field of
> > the item returned.  A newly created one has NULL, but we always make
> > it non-NULL before leaving this function.
> 
> Yeah, I agree that would work here.
> 
> I also wondered if we could get away with avoiding the malloc entirely
> here. Especially in the "shortlog -n" case, it is identical to the name
> field we already have in ident.name. So ideally we'd do a lookup to see
> if we have the entry before allocating anything (since we do one lookup
> per commit, but only insert once per unique author).
> 
> But that doesn't quite work, because ident.name doesn't put to a
> NUL-terminated string, and string_list only handles strings.

I happened to look at this more while digging on an unrelated shortlog
bug. I think the whole thing could actually be reorganized a bit.

We call insert_one_record() from shortlog_add_commit(). The latter
formats "%an <%ae>", only to have the former parse it back to its
constituent parts. That seems rather silly.

This is an artifact of shortlog's original mode, which was to parse "git
log" output. But for an internal traversal, we can just format the
correct item right off the bat. That part of insert_one_record() is also
where we handle the mailmap mapping. But again, the internal traversal
can just "%aE" to format that correctly in the first place.

IOW, something like the patch below, which pushes the re-parsing out to
the stdin code-path, and lets the internal traversal format directly
into the final buffer. It seems to be about 3% faster than the existing
code, and fixes the leak (by dropping that variable entirely).

-Peff

---
diff --git a/builtin/shortlog.c b/builtin/shortlog.c
index 43c4799ea9..e29875b843 100644
--- a/builtin/shortlog.c
+++ b/builtin/shortlog.c
@@ -52,26 +52,8 @@ static void insert_one_record(struct shortlog *log,
  const char *oneline)
 {
struct string_list_item *item;
-   const char *mailbuf, *namebuf;
-   size_t namelen, maillen;
-   struct strbuf namemailbuf = STRBUF_INIT;
-   struct ident_split ident;
 
-   if (split_ident_line(, author, strlen(author)))
-   return;
-
-   namebuf = ident.name_begin;
-   mailbuf = ident.mail_begin;
-   namelen = ident.name_end - ident.name_begin;
-   maillen = ident.mail_end - ident.mail_begin;
-
-   map_user(>mailmap, , , , );
-   strbuf_add(, namebuf, namelen);
-
-   if (log->email)
-   strbuf_addf(, " <%.*s>", (int)maillen, mailbuf);
-
-   item = string_list_insert(>list, namemailbuf.buf);
+   item = string_list_insert(>list, author);
 
if (log->summary)
item->util = (void *)(UTIL_TO_INT(item) + 1);
@@ -114,9 +96,33 @@ static void insert_one_record(struct shortlog *log,
}
 }
 
+static int parse_stdin_author(struct shortlog *log,
+  struct strbuf *out, const char *in)
+{
+   const char *mailbuf, *namebuf;
+   size_t namelen, maillen;
+   struct ident_split ident;
+
+   if (split_ident_line(, in, strlen(in)))
+   return -1;
+
+   namebuf = ident.name_begin;
+   mailbuf = ident.mail_begin;
+   namelen = ident.name_end - ident.name_begin;
+   maillen = ident.mail_end - ident.mail_begin;
+
+   map_user(>mailmap, , , , );
+   strbuf_add(out, namebuf, namelen);
+   if (log->email)
+   strbuf_addf(out, " <%.*s>", (int)maillen, mailbuf);
+
+   return 0;
+}
+
 static void read_from_stdin(struct shortlog *log)
 {
struct strbuf author = STRBUF_INIT;
+   struct strbuf mapped_author = STRBUF_INIT;
struct strbuf oneline = STRBUF_INIT;
static const char *author_match[2] = { "Author: ", "author " };
static const char *committer_match[2] = { "Commit: ", "committer " };
@@ -134,9 +140,15 @@ static void read_from_stdin(struct shortlog *log)
while (strbuf_getline_lf(, stdin) != EOF &&
   !oneline.len)
; /* discard blanks */
-   insert_one_record(log, v, oneline.buf);
+
+   strbuf_reset(_author);
+   if (parse_stdin_author(log, _author, v) < 0)
+   continue;
+
+   insert_one_record(log, mapped_author.buf, oneline.buf);
}
strbuf_release();
+   strbuf_release(_author);
strbuf_release();
 }
 
@@ -153,7 +165,9 @@ void shortlog_add_commit(struct shortlog *log, struct 
commit *commit)
ctx.date_mode.type = DATE_NORMAL;
ctx.output_encoding = get_log_output_encoding();
 
-   fmt = log->committer ? "%cn <%ce>" : "%an <%ae>";
+   fmt = log->committer ?
+   (log->email ? "%cN <%cE>" : "%cN") :
+   (log->email ? "%aN <%aE>" : "%aN");

Re: Donation

2017-09-07 Thread Mavis Wanczyk Foundation

Greetings To You,

My Name is Mavis wanczyk , the winner of the Power ball jackpot of $ $758.7 
million  in the AUGUST 24, 2017, My jackpot was a gift from God to me hence my 
Entire family/foundation has AGREED to do this. My foundation is donating 
$500,000.00USD to you. please contac maviswanczyk...@gmail.com for full details 
and please accept this token as a gift from me and my family.

Read more: 
http://money.cnn.com/2017/08/23/news/powerball-700-million-jackpot/index.html

 Best Regards,
 Mavis Wanczyk

Re: [RFC PATCH 0/2] Add named reference to latest push cert

2017-09-07 Thread Shikher Verma

Hey everyone,

I felt like I should introduce myself since this is my first patch on
the git mailing list (or any mailing list actually) :D

I am Shikher[1], currently in my 4th year undergrad at IIT Kanpur.
This summer I was lucky enough to intern at NYU Secure Systems Lab[2]
mentored by Santiago. We looked into how signed pushes work and how 
we can use them to increase the security of git. We encountered a
strange error in tests which resulted in a patch[3] and I wrote a
python script to verify push certificates[4]. I was pretty surprised 
to not find any push certificate on the remote repo after I did a 
signed push, hence this RFC.

Anyway this is my first time trying to contribute to a large OSS so 
forgive me if I make any noob mistakes.

Thanks
Shikher Verma

[1]http://shikherverma.com/
[2]https://ssl.engineering.nyu.edu/
[3]https://public-inbox.org/git/20170707220159.12752-1-santi...@nyu.edu/
[4]https://gist.github.com/ShikherVerma/9204060b545c00597e7ad9b694cfeb9c

On Wed, Sep 06, 2017 at 03:09:11PM +0530, Shikher Verma wrote:
> Currently, git only stores push certificates if there is a receive hook 
> present. This may violate the principle of least surprise (e.g., I 
> pushed with --signed, and I don't see anything in upstream). 
> Additionally, push certificates could be more versatile if they are not 
> tightly bound to git hooks. Finally, it would be useful to verify the 
> signed pushes at later points of time with ease.
> 
> A named ref is added for ease of access/tooling around push 
> certificates. If the last push was signed, ref/PUSH_CERT stores the 
> ref of the latest push cert otherwise it is empty.
>  
> Sending patches as RFC since the documentation would have to be 
> updated and git gc might have to be patched to not garbage collect 
> the latest push certificate.
> 
> This patch applies on master (3ec7d702a) 
> 
> Shikher Verma (2):
>   Always write push cert to disk
>   Store latest push cert ref in PUSH_CERT
> 
>  builtin/receive-pack.c | 25 -
>  path.c |  1 +
>  path.h |  1 +
>  3 files changed, 22 insertions(+), 5 deletions(-)
> 
> -- 
> 2.14.1
>

How to include references to subfolders

2017-09-07 Thread Norike


Hello,

I've been searching what's the best way to include a reference to a 
subfolder/subdirectory from a (secondary) remote repo into a subfolder 
of my (primary) repo.


Here I found some approaches:
https://stackoverflow.com/a/30386041/509369

Where the first 3 options don't seem to support sending changes back to 
the mainstream; and the 4rth is replicating the whole repo.


Ideally:
- The reference is stored somewhere in the (primary) repo so when 
someone clones it, the referenced folder (from the secondary repo) is 
also cloned.

- You can pull changes from the remote repo into the subfolder of your repo.
- You can make changes to the subfolder directly from your (primary) 
repo, and push them to the remote repo (of course depending on the 
privileges of the remote repo).

- Allow multiple remote subfolders.

Extra: it could be more complex, but what about single remote files?

I'm not sure if this is possible with the current set of git tools, if 
it can be added in a future or if it's just a no-sense question.


Thanks in advance!

--
Salut,
Yeray

Re: [PATCH] refs: make sure we never pass NULL to hashcpy

2017-09-07 Thread Michael Haggerty

On Wed, Sep 6, 2017 at 3:26 AM, Junio C Hamano  wrote:
> Thomas Gummerer  writes:
>
>> gcc on arch linux (version 7.1.1) warns that a NULL argument is passed
>> as the second parameter of memcpy.
>> [...]
>
> It is hugely annoying to see a halfway-intelligent compiler forces
> you to add such pointless asserts.
>
> The only way the compiler could error on this is by inferring the
> fact that new_sha1/old_sha1 could be NULL by looking at the callsite
> in ref_transaction_update() where these are used as conditionals to
> set HAVE_NEW/HAVE_OLD that are passed.  Even if the compiler were
> doing the whole-program analysis, the other two callsites of the
> function pass the address of oid.hash[] in an oid structure so it
> should know these won't be NULL.
>
> [...]
>
> I wonder if REF_HAVE_NEW/REF_HAVE_OLD are really needed in these
> codepaths, though.  Perhaps we can instead declare !!new_sha1 means
> we have the new side and rewrite the above part to
>
> if (new_sha1)
> hashcpy(update->new_oid.hash, new_sha1);
>
> without an extra and totally pointless assert()?

The ultimate reason for those flags is that `struct ref_update` embeds
`new_oid` and `old_oid` directly in the struct, so there is no way to
set it to "NULL". (The `is_null_sha1` value is used for a different
purpose.) So those flags keep track of whether the corresponding value
is specified or absent.

Four of the five callers of `ref_transaction_add_update()` are
constructing a new `ref_update` from an old one. They currently don't
have to look into `flags`; they just pass it on (possibly changing a
bit or two). Implementing your proposal would oblige those callers to
change from something like

> new_update = ref_transaction_add_update(
> transaction, "HEAD",
> update->flags | REF_LOG_ONLY | REF_NODEREF,
> update->new_oid.hash, update->old_oid.hash,
> update->msg);

to

> new_update = ref_transaction_add_update(
> transaction, "HEAD",
> update->flags | REF_LOG_ONLY | REF_NODEREF,
> (update->flags & REF_HAVE_NEW) ? update->new_oid.hash : NULL,
> (update->flags & REF_HAVE_OLD) ? update->old_oid.hash : NULL,
> update->msg);

It's not the end of the world, but it's annoying.
`ref_transaction_add_update()` was meant to be a low-level,
low-overhead way of allocating a `struct ref_update` and add it to a
transaction.

Another solution (also annoying, but maybe a tad less so) would be to
change the one iffy caller, `ref_transaction_update()`, to pass in a
pointer to the null OID for `new_sha1` and `old_sha1` when the
corresponding flags are turned off. That value would never be looked
at, but it would hopefully reassure gcc.

I did just realize one thing: `ref_transaction_update()` takes `flags`
as an argument and alters it using

> flags |= (new_sha1 ? REF_HAVE_NEW : 0) | (old_sha1 ? REF_HAVE_OLD : 
> 0);

Perhaps gcc is *more* intelligent than we give it credit for, and is
actually worried that the `flags` argument passed in by the caller
might *already* have one of these bits set. In that case
`ref_transaction_add_update()` would indeed be called incorrectly.
Does the warning go away if you change that line to

> if (new_sha1)
> flags |=REF_HAVE_NEW;
> else
> flags &= ~REF_HAVE_NEW;
> if (old_sha1)
> flags |=REF_HAVE_OLD;
> else
> flags &= ~REF_HAVE_OLD;

? This might be a nice change to have anyway, to isolate
`ref_transaction_update()` from mistakes by its callers. For that
matter, one might want to be even more selective about what bits are
allowed in the `flags` argument to `ref_transaction_update()`'s
callers:

> flags &= REF_ALLOWED_FLAGS; /* value would need to be determined */

Michael

Re: Donation

2017-09-07 Thread Mavis Wanczyk Foundation

Greetings To You,

My Name is Mavis wanczyk , the winner of the Power ball jackpot of $ $758.7 
million  in the AUGUST 24, 2017, My jackpot was a gift from God to me hence my 
Entire family/foundation has AGREED to do this. My foundation is donating 
$500,000.00USD to you. please contac maviswanczyk...@gmail.com for full details 
and please accept this token as a gift from me and my family.

Read more: 
http://money.cnn.com/2017/08/23/news/powerball-700-million-jackpot/index.html

 Best Regards,
 Mavis Wanczyk

63 matches

Mail list logo