date:20170905

What's cooking in git.git (Sep 2017, #01; Wed, 6)

2017-09-05 Thread Junio C Hamano

Here are the topics that have been cooking.  Commits prefixed with
'-' are only in 'pu' (proposed updates) while commits prefixed with
'+' are in 'next'.  The ones marked with '.' do not appear in any of
the integration branches, but I am still holding onto them.

We are at week #5 of this cycle.  It seems that people had a
productive week while I was away, which I am reasonably happy about
;-) I counted about a dozen or so topics, among which I managed to
pick up only 4 or 5 so far X-<.

You can find the changes described here in the integration branches
of the repositories listed at

http://git-blame.blogspot.com/p/git-public-repositories.html

--
[Graduated to "master"]

* ks/branch-set-upstream (2017-08-17) 3 commits
  (merged to 'next' on 2017-08-22 at 10fd938390)
 + branch: quote branch/ref names to improve readability
 + builtin/branch: stop supporting the "--set-upstream" option
 + t3200: cleanup cruft of a test

 "branch --set-upstream" that has been deprecated in Git 1.8 has
 finally been retired.


* po/read-graft-line (2017-08-18) 4 commits
  (merged to 'next' on 2017-08-22 at 1e3fe0d3a1)
 + commit: rewrite read_graft_line
 + commit: allocate array using object_id size
 + commit: replace the raw buffer with strbuf in read_graft_line
 + sha1_file: fix definition of null_sha1

 Conversion from uchar[20] to struct object_id continues; this is to
 ensure that we do not assume sizeof(struct object_id) is the same
 as the length of SHA-1 hash (or length of longest hash we support).


* rs/archive-excluded-directory (2017-08-19) 3 commits
  (merged to 'next' on 2017-08-22 at 1853597c35)
 + archive: don't queue excluded directories
 + archive: factor out helper functions for handling attributes
 + t5001: add tests for export-ignore attributes and exclude pathspecs

 "git archive" did not work well with pathspecs and the
 export-ignore attribute.

 We may want to resurrect the "we don't archive an empty directory"
 bonus patch, but I do not mind merging the above early to 'next'
 and leave it as a separate follow-up enhancement.
 cf. <20170820090629.tumvqwzkromcy...@sigill.intra.peff.net>

--
[New Topics]

* dw/diff-highlight-makefile-fix (2017-09-06) 1 commit
 - diff-highlight: add clean target to Makefile

 Build clean-up.

 Will merge to 'next'.


* jk/config-lockfile-leak-fix (2017-09-06) 1 commit
 - config: use a static lock_file struct

 A leakfix.

 Will merge to 'next'.


* kw/merge-recursive-cleanup (2017-09-06) 4 commits
 - SQUASH???
 - merge-recursive: change current file dir string_lists to hashmap
 - merge-recursive: remove return value from get_files_dirs
 - merge-recursive: fix memory leak

 A leakfix and code clean-up.


* ma/pkt-line-leakfix (2017-09-06) 1 commit
 - pkt-line: re-'static'-ify buffer in packet_write_fmt_1()

 A leakfix.

 Will merge to 'next'.


* ma/split-symref-update-fix (2017-09-06) 3 commits
 - refs/files-backend: correct return value in lock_ref_for_update
 - refs/files-backend: fix memory leak in lock_ref_for_update
 - refs/files-backend: add longer-scoped copy of string to list

 A leakfix.

 Will merge to 'next'.


* mg/timestamp-t-fix (2017-09-06) 1 commit
 - name-rev: change ULONG_MAX to TIME_MAX

 A mismerge fix.

 Will merge to 'next'.

--
[Stalled]

* mg/status-in-progress-info (2017-05-10) 2 commits
 - status --short --inprogress: spell it as --in-progress
 - status: show in-progress info for short status

 "git status" learns an option to report various operations
 (e.g. "merging") that the user is in the middle of.

 cf. 


* nd/worktree-move (2017-04-20) 6 commits
 - worktree remove: new command
 - worktree move: refuse to move worktrees with submodules
 - worktree move: accept destination as directory
 - worktree move: new command
 - worktree.c: add update_worktree_location()
 - worktree.c: add validate_worktree()

 "git worktree" learned move and remove subcommands.

 Expecting a reroll.
 cf. <20170420101024.7593-1-pclo...@gmail.com>
 cf. <20170421145916.mknekgqzhxffu...@sigill.intra.peff.net>
 cf. 


* sg/clone-refspec-from-command-line-config (2017-06-16) 2 commits
 - Documentation/clone: document ignored configuration variables
 - clone: respect additional configured fetch refspecs during initial fetch
 (this branch is used by sg/remote-no-string-refspecs.)

 "git clone -c var=val" is a way to set configuration variables in
 the resulting repository, but it is more useful to also make these
 variables take effect while the initial clone is happening,
 e.g. these configuration variables could be fetch refspecs.

 Waiting for a response.
 cf. <20170617112228.vugswym4o4owf...@sigill.intra.peff.net>
 cf. 


* js/rebase-i-final (2017-07-27) 10 commits
 -

Re: [PATCH] config: use a static lock_file struct

2017-09-05 Thread Junio C Hamano

Jeff King  writes:

> In the long run we may want to drop the "tempfiles must remain forever"
> rule. This is certainly not the first time it has caused confusion or
> leaks. And I don't think it's a fundamental issue, just the way the code
> is written. But in the interim, this fix is probably worth doing.

I notice you also have a series to rework the "must remain forever"
on the list, but nevertheless let's take this one first.

Will queue.

Thanks.

Re: [PATCH] hashmap: add API to disable item counting when threaded

2017-09-05 Thread Junio C Hamano

Jeff King  writes:

> On Sat, Sep 02, 2017 at 01:31:19AM +0200, Johannes Schindelin wrote:
>
>> BTW this made me think that we may have a problem in our code since
>> switching from my original hashmap implementation to the bucket one added
>> in 6a364ced497 (add a hashtable implementation that supports O(1) removal,
>> 2013-11-14): while it is not expected that there are many collisions, the
>> "grow_at" logic still essentially assumes the number of buckets to be
>> equal to the number of hashmap entries.
>
> I'm confused about what the problem is. If I am reading the code
> correctly, "size" is always the number of elements and "grow_at" is the
> table size times a load factor. Those are the same numbers you'd use to
> decide to grow in an open-address table.
>
> It's true that this does not take into account the actual number of
> collisions we see (or the average per bucket, or however you want to
> count it). But generally nor do open-address schemes (and certainly our
> other hash tables just use load factor to decide when to grow).

Are we comparing the hashmap.[ch] with the hash.[ch] added in
9027f53c ("Do linear-time/space rename logic for exact renames",
2007-10-25)?  I am a bit confused because Johannes calls it "my"
original.

Unless the real person in this discussion thread sending the
messages under Johannes's name is Linus, that is ;-).  Or maybe the
"original" being compared is something other than the series with
6a364ced497 replaced with its hashmap.[ch]?

In any case, I do think your reading of the code is correct in that
the comparison between size and grow-at/shrink-at is done correctly
with the true load factor of the table, not how many buckets out of
the possible buckets are filled.  

Old one used to grow at 50% full and never shrunk it, but the
current one grows at 80% and shrinks at a bit below 40%; I agree
with Dscho's feeling (in part not quoted above) that 50% vs 80%
doesn't seem to have been backed by any numbers, but optimizing the
load factor is outside the scope of this series, I would think.

6a364ced ("add a hashtable implementation that supports O(1)
removal", 2013-11-14) credits less frequent resizing for gain of
insert performance, but my hunch is that the need for frequent
resizing in the version before it primarily comes from the fact that
the table started empty (as opposed to having an initial size of 64,
which is what the current implementation uses).

Re: [PATCH 3/3] merge-recursive: change current file dir string_lists to hashmap

2017-09-05 Thread Junio C Hamano

I needed this squashed into before I could even compile the
resulting code.  Perhaps my compiler is stale?



merge-recursive.c:28:22: error: declaration does not declare anything [-Werror]
  struct hashmap_entry;
  ^
merge-recursive.c:29:7: error: flexible array member in otherwise empty struct
  char path[FLEX_ARRAY];
   ^
---
 merge-recursive.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/merge-recursive.c b/merge-recursive.c
index ef4fe5f09f..1cd35db3f0 100644
--- a/merge-recursive.c
+++ b/merge-recursive.c
@@ -25,7 +25,7 @@
 #include "submodule.h"
 
 struct path_hashmap_entry {
-   struct hashmap_entry;
+   struct hashmap_entry e;
char path[FLEX_ARRAY];
 };
 
-- 
2.14.1-546-g97ae4c876d

Re: [PATCH] name-rev: change ULONG_MAX to TIME_MAX

2017-09-05 Thread Junio C Hamano

Michael J Gruber  writes:

> Earlier, dddbad728c ("timestamp_t: a new data type for timestamps",
> 2017-04-26) changed several types to timestamp_t.
>
> 5589e87fd8 ("name-rev: change a "long" variable to timestamp_t",
> 2017-05-20) cleaned up a missed variable, but both missed a _MAX
> constant.
>
> Change the remaining constant to the one appropriate for the current
> type
>
> Signed-off-by: Michael J Gruber 
> ---

Thanks.

I think this (and the earlier 5589e8) was caused by an unnoticed
semantic conflict at 78089b71 ("Merge branch 'jc/name-rev-lw-tag'",
2017-05-30).  Merging is sometimes hard ;-)

Will queue.

>  builtin/name-rev.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/builtin/name-rev.c b/builtin/name-rev.c
> index c41ea7c2a6..598da6c8bc 100644
> --- a/builtin/name-rev.c
> +++ b/builtin/name-rev.c
> @@ -253,7 +253,7 @@ static int name_ref(const char *path, const struct 
> object_id *oid, int flags, vo
>   struct commit *commit = (struct commit *)o;
>   int from_tag = starts_with(path, "refs/tags/");
>  
> - if (taggerdate == ULONG_MAX)
> + if (taggerdate == TIME_MAX)
>   taggerdate = ((struct commit *)o)->date;
>   path = name_ref_abbrev(path, can_abbreviate_output);
>   name_rev(commit, xstrdup(path), taggerdate, 0, 0,

Re: [PATCH] format-patch: use raw format for notes

2017-09-05 Thread Junio C Hamano

Sam Bobroff  writes:

> If "--notes=..." is used with "git format-patch", the notes are
> prefixed with the ref's local name and indented, which looks odd and
> exposes the path of the ref.
>
> Extend the test that suppresses this behaviour so that it also catches
> this case, causing the notes to be included without additional
> processing.
>
> Signed-off-by: Sam Bobroff 
> ---
>
> Notes (foo):
> Hi,
> 
> I've noticed what appears to be a small cosmetic bug in git format-patch, 
> as
> I've described in the commit message.
> 
> I'm not sure if this patch is the right way to fix it (or perhaps it's 
> not even
> a bug), but it should at least help to explain what I'm talking about.
> 
> I've used "git format-patch --notes=foo" to prepare this email so that it 
> is an
> example of the issue :-)
> 
> Cheers,
> Sam.

Is the above addition from your 'foo' notes with or without this
patch?  I think the answer is "without", and the above "example"
looks just fine to me.

 - It is very much intended to allow The "(foo)" after the "Notes"
   label to show which notes ref the note comes from, because there
   can be more than one notes refs that annotate the same commit.

 - And the contents are indented, just like the diffstat and other
   stuff we place after "---" but before the first "diff", to ensure
   no matter what text appears there it will not be mistaken as part
   sure that the contents from the notes will not be mistaken as part
   of the patch.

I do not think an unconditional change of the established format,
like your patch does, is acceptable, as existing users have relied
on, and expect to be able to continue relying on, the above two
aspect of the current format.

But I am somewhat curious what your use case that wants to insert
the raw contents there is.  We may be able to construct a valid
argument to add such an output as an optional feature if there is a
good use case for it.

Thanks.

>  log-tree.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/log-tree.c b/log-tree.c
> index 410ab4f02..26bc21ad3 100644
> --- a/log-tree.c
> +++ b/log-tree.c
> @@ -655,7 +655,8 @@ void show_log(struct rev_info *opt)
>   int raw;
>   struct strbuf notebuf = STRBUF_INIT;
>  
> - raw = (opt->commit_format == CMIT_FMT_USERFORMAT);
> + raw = (opt->commit_format == CMIT_FMT_USERFORMAT) ||
> +   (opt->commit_format == CMIT_FMT_EMAIL);
>   format_display_notes(>object.oid, ,
>get_log_output_encoding(), raw);
>   ctx.notes_message = notebuf.len

Re: [PATCH] parse-options: warn developers on negated options

2017-09-05 Thread Junio C Hamano

Junio C Hamano  writes:

> Stefan Beller  writes:
>
>>   This patch disallows all no- options, but we could be more open and allow
>>   --no-options that have the NO_NEG bit set.
>
> "--no-foo" that does not take "--foo" is perhaps OK so should not
> trigger an error.
>
> A ("--no-foo", "--foo") pair is better spelled as ("--foo",
> "--no-foo") pair whose default is "--foo", but making it an error is
> probably a bit too much.
>
> Compared to that, ("--no-foo", "--no-no-foo") pair feels nonsense.

Ahh, I was an idiot (call it vacation-induced-brain-disfunction).  I
forgot about 0f1930c5 ("parse-options: allow positivation of options
starting, with no-", 2012-02-25), which may have already made your
new use of "--no-verify" in builtin/merge.c and existing one in
commit.c OK long time ago.  A quick check to see how your version of

git merge --verify
git merge --no-verify

behaves with respect to the commit-msg hook is veriy much
appreciated, as my tree is in no shape to apply and try a patch
while trying to absorb the patches sent to the list the past week.

Thanks, and sorry for a possible false alarm.

> Having said that, because the existing parse_options_check() is all
> about catching the programming mistake (the end user cannot fix an
> error from it by tweaking the command line option s/he gives to the
> program), I do not think a conditional compilation like you added
> mixes well.  Either make the whole thing, not just your new test,
> conditional to -DDEVELOPER (which would make it possible for you to
> build and ship a binary with broken options[] array to the end-users
> that does not die in this function), which is undesirable, or add a
> new test that catches a definite error unconditionally.

This part still is valid.  If René's work 2 years ago is sufficient
to address "--no-foo" thing, then there is nothing we need to add to
this test, but if we later need to add new sanity check, we should
add it without -DDEVELOPER, or we should make the whole thing inside
it.

Thanks.

Re: [PATCHv2] builtin/merge: honor commit-msg hook for merges

2017-09-05 Thread Junio C Hamano

Stefan Beller  writes:

> Junio writes:
>> I didn't check how "merge --continue" is implemented, but we need to
>> behave exactly the same way over there, too.  Making sure that it is
>> a case in t7504 may be a good idea, in addition to the test you
>> added.
>
> After inspection of the code I do not think it is a good idea, because
> (a) it clutters the test suite with something "obvious" for now,
> the call to cmd_commit will be the same as git-commit on the
> command line and
> (b) piping through --[no-]verify would either introduce irregularities
> ("Why do we pipe through --no-verify, when --sign-off is more important?")
> or miss important options to pipe through: 
>
>   static int continue_current_merge;
> ...
>   OPT_BOOL(0, "continue", _current_merge,
>   N_("continue the current in-progress merge")),
> ...
>   if (continue_current_merge) {
>   int nargc = 1;
>   const char *nargv[] = {"commit", NULL};
>
>   if (orig_argc != 2)
>   usage_msg_opt(_("--continue expects no arguments"),
> builtin_merge_usage, builtin_merge_options);
>
>   /* Invoke 'git commit' */
>   ret = cmd_commit(nargc, nargv, prefix);
>   goto done;
>   }

That line of thought is backwards.  'something "obvious" for now'
talks about the present.  tests are all about future-proofing.

I also thought that we were hunting calls of cmd_foo() from outside
the git.c command dispatcher as grave errors and want to clean up
the codebase to get rid of them.  So the above is the worst example
to use when you are trying to convince why it needs no test---the
above is a good example of the code that would need to change soon
when we have enough volunteers willing to keep the codebase clean
and healthy, and we would benefit from future-proofing tests.

Re: [PATCH] parse-options: warn developers on negated options

2017-09-05 Thread Junio C Hamano

Stefan Beller  writes:

>   This patch disallows all no- options, but we could be more open and allow
>   --no-options that have the NO_NEG bit set.

"--no-foo" that does not take "--foo" is perhaps OK so should not
trigger an error.

A ("--no-foo", "--foo") pair is better spelled as ("--foo",
"--no-foo") pair whose default is "--foo", but making it an error is
probably a bit too much.

Compared to that, ("--no-foo", "--no-no-foo") pair feels nonsense.

Having said that, because the existing parse_options_check() is all
about catching the programming mistake (the end user cannot fix an
error from it by tweaking the command line option s/he gives to the
program), I do not think a conditional compilation like you added
mixes well.  Either make the whole thing, not just your new test,
conditional to -DDEVELOPER (which would make it possible for you to
build and ship a binary with broken options[] array to the end-users
that does not die in this function), which is undesirable, or add a
new test that catches a definite error unconditionally.

Re: git merge algorithm question

2017-09-05 Thread Bryan Turner

On Tue, Sep 5, 2017 at 5:53 PM, Daniel Biran  wrote:
>
>>> I'm trying to better understand one of the merge algorithms as I had some 
>>> triumphs and tribulations with using a set of commands during a merge. 
>>> tldr: can a git merge -s recursive -X patience; // result in a fast-forward 
>>> merge? will --no-ff stop it
>>>
>>> So, the scenario is this:
>>>  - Merging a master branch into a feature branch that is 2+ years old
>>>  - We found this command was more beneficial when merging a large 20k 
>>> line text file:
>>>  - git merge -s recursive -X patience master
>>>  - In a recent merge using this approach the reflog shows that the 
>>> merge was performed using a fast-forward from the feature branch's head
>>>  - 082517-1, feature/branch) HEAD@{23}: merge feature/branch: 
>>> Fast-forward
>>>
>>>
>>> My question is, is it possible for that command to use a fast-forward like 
>>> this? (or did something else go horribly wrong? possibly an atlassian git 
>>> GUI tool corrupting the work):
>>>  - If it is possible for the command to fast-forward the merge when 
>>> making the commit does --no-ff force the command to never use fast-forward 
>>> in this case

Unless you specify --no-ff, git merge is always free to create a
fast-forward "merge", even when you request the recursive strategy
explicitly:

$ git init recursive-merge
Initialized empty Git repository in C:/Temp/recursive-merge/.git/

$ cd recursive-merge/

$ echo "Test" > file.txt

$ git add file.txt

$ git commit -m "Initial commit"
[master (root-commit) ad48617] Initial commit
 1 file changed, 1 insertion(+)
 create mode 100644 file.txt

$ git checkout -b feature-branch
Switched to a new branch 'feature-branch'

$ echo "Edit" >> file.txt

$ git commit -am "Feature branch change"
[feature-branch b226557] Feature branch change
 1 file changed, 1 insertion(+)

$ git checkout master
Switched to branch 'master'

$ git merge -s recursive -X patience feature-branch
Updating ad48617..b226557
Fast-forward
 file.txt | 1 +
 1 file changed, 1 insertion(+)

With --no-ff:

$ git reset --hard ad48617
HEAD is now at ad48617 Initial commit

$ git merge --no-ff -s recursive -X patience feature-branch
Merge made by the 'recursive' strategy.
 file.txt | 1 +
 1 file changed, 1 insertion(+)

With fast-forwarding disabled, you can see the recursive strategy is
used as requested.

>>>
>>> Thanks for the help,
>>> Daniel
>>
>

Re: [PATCH 0/10] towards clean leak-checker output

2017-09-05 Thread Junio C Hamano

Jeff King  writes:

> In fact, the whole extract_argv0_path thing is pointless without
> RUNTIME_PREFIX. So I think one reasonable fix is just:
>
> diff --git a/exec_cmd.c b/exec_cmd.c
> index fb94aeba9c..09f05c3bc3 100644
> --- a/exec_cmd.c
> +++ b/exec_cmd.c
> @@ -5,7 +5,10 @@
>  #define MAX_ARGS 32
>  
>  static const char *argv_exec_path;
> +
> +#ifdef RUNTIME_PREFIX
>  static const char *argv0_path;
> +#endif
>  
>  char *system_path(const char *path)
>  {
> @@ -40,6 +43,7 @@ char *system_path(const char *path)
>  
>  void git_extract_argv0_path(const char *argv0)
>  {
> +#ifdef RUNTIME_PREFIX
>   const char *slash;
>  
>   if (!argv0 || !*argv0)
> @@ -49,6 +53,7 @@ void git_extract_argv0_path(const char *argv0)
>  
>   if (slash)
>   argv0_path = xstrndup(argv0, slash - argv0);
> +#endif
>  }
>  
>  void git_set_argv_exec_path(const char *exec_path)
>
> -Peff

Yeah, I kind of like this (I would have reduced the ifdef noise by
leaving a totally-unused argv0_path in the BSS, though).

Re: [PATCH] refs: make sure we never pass NULL to hashcpy

2017-09-05 Thread Junio C Hamano

Thomas Gummerer  writes:

> gcc on arch linux (version 7.1.1) warns that a NULL argument is passed
> as the second parameter of memcpy.
>
> In file included from refs.c:5:0:
> refs.c: In function ‘ref_transaction_verify’:
> cache.h:948:2: error: argument 2 null where non-null expected 
> [-Werror=nonnull]
>   memcpy(sha_dst, sha_src, GIT_SHA1_RAWSZ);
>   ^~~~
> In file included from git-compat-util.h:165:0,
>  from cache.h:4,
>  from refs.c:5:
> /usr/include/string.h:43:14: note: in a call to function ‘memcpy’ declared 
> here
>  extern void *memcpy (void *__restrict __dest, const void *__restrict __src,
>   ^~
> ...
> diff --git a/refs.c b/refs.c
> index ba22f4acef..d8c12a9c44 100644
> --- a/refs.c
> +++ b/refs.c
> @@ -896,10 +896,14 @@ struct ref_update *ref_transaction_add_update(
>  
>   update->flags = flags;
>  
> - if (flags & REF_HAVE_NEW)
> + if (flags & REF_HAVE_NEW) {
> + assert(new_sha1);
>   hashcpy(update->new_oid.hash, new_sha1);
> - if (flags & REF_HAVE_OLD)
> + }
> + if (flags & REF_HAVE_OLD) {
> + assert(old_sha1);
>   hashcpy(update->old_oid.hash, old_sha1);
> + }

It is hugely annoying to see a halfway-intelligent compiler forces
you to add such pointless asserts.

The only way the compiler could error on this is by inferring the
fact that new_sha1/old_sha1 could be NULL by looking at the callsite
in ref_transaction_update() where these are used as conditionals to
set HAVE_NEW/HAVE_OLD that are passed.  Even if the compiler were
doing the whole-program analysis, the other two callsites of the
function pass the address of oid.hash[] in an oid structure so it
should know these won't be NULL.

Or is the compiler being really dumb and triggering an error for
every use of

memcpy(dst, src, size);

that must now be written as

assert(src);
memcpy(dst, src, size);

???  That would be doubly annoying.

I wonder if REF_HAVE_NEW/REF_HAVE_OLD are really needed in these
codepaths, though.  Perhaps we can instead declare !!new_sha1 means
we have the new side and rewrite the above part to

if (new_sha1)
hashcpy(update->new_oid.hash, new_sha1);

without an extra and totally pointless assert()?

>   update->msg = xstrdup_or_null(msg);
>   return update;
>  }

Re: [PATCH] hashmap: add API to disable item counting when threaded

2017-09-05 Thread Junio C Hamano

Jeff Hostetler  writes:

> From: Jeff Hostetler 
>
> This is to address concerns raised by ThreadSanitizer on the
> mailing list about threaded unprotected R/W access to map.size with my 
> previous
> "disallow rehash" change (0607e10009ee4e37cb49b4cec8d28a9dda1656a4).  See:
> https://public-inbox.org/git/adb37b70139fd1e2bac18bfd22c8b96683ae18eb.1502780344.git.martin.ag...@gmail.com/
>
> Add API to hashmap to disable item counting and to disable automatic 
> rehashing.  
> Also include APIs to re-enable item counting and automatica rehashing.
>
> When item counting is disabled, the map.size field is invalid.  So to
> prevent accidents, the field has been renamed and an accessor function
> hashmap_get_size() has been added.  All direct references to this
> field have been been updated.  And the name of the field changed
> to map.private_size to communicate thie.
>
> Signed-off-by: Jeff Hostetler 
> ---
>  attr.c  | 14 ---
>  builtin/describe.c  |  2 +-
>  hashmap.c   | 31 +++-
>  hashmap.h   | 98 
> ++---
>  name-hash.c |  6 ++-
>  t/helper/test-hashmap.c |  2 +-
>  6 files changed, 113 insertions(+), 40 deletions(-)

I feel somewhat stupid to say this, especially after seeing many
people applaud this patch, but I do not seem to be able to even
build Git with this patch.  I am getting:

common-main.o: In function `hashmap_get_size':
/home/gitster/w/git.git/hashmap.h:260: multiple definition of 
`hashmap_get_size'
fast-import.o:/home/gitster/w/git.git/hashmap.h:260: first defined here
libgit.a(argv-array.o): In function `hashmap_get_size':
/home/gitster/w/git.git/hashmap.h:260: multiple definition of 
`hashmap_get_size'
fast-import.o:/home/gitster/w/git.git/hashmap.h:260: first defined here
libgit.a(attr.o): In function `hashmap_get_size':
...

and wonder if others are building with different options or something..

> diff --git a/hashmap.h b/hashmap.h
> index 7a8fa7f..7b8e6f4 100644
> --- a/hashmap.h
> +++ b/hashmap.h
> @@ -253,6 +253,19 @@ static inline void hashmap_entry_init(void *entry, 
> unsigned int hash)
>  }
>  
>  /*
> + * Return the number of items in the map.
> + */
> +inline unsigned int hashmap_get_size(struct hashmap *map)

I think this must become static inline like everybody else in this
file, at least.

I also wonder if this header is excessively inlining too many
functions without measuring first, but that is a different story.

Re: [PATCHv2] pull: honor submodule.recurse config option

2017-09-05 Thread Junio C Hamano

Nicolas Morey-Chaisemartin  writes:

> "git pull" supports a --recurse-submodules option but does not parse the
> submodule.recurse configuration item to set the default for that option.
> Meanwhile "git fetch" does support submodule.recurse, producing
> confusing behavior: when submodule.recurse is enabled, "git pull"
> recursively fetches submodules but does not update them after fetch.
>
> Handle submodule.recurse in "git pull" to fix this.
>
> Reported-by: Magnus Homann 
> Signed-off-by: Nicolas Morey-Chaisemartin 
> ---
>
> Changes since v1:
>  * Cleanup commit message
>  * Add test
>  * Remove extra var in code and fallthrough to git_default_config
>
>  builtin/pull.c|  4 
>  t/t5572-pull-submodule.sh | 10 ++
>  2 files changed, 14 insertions(+)
>
> diff --git a/builtin/pull.c b/builtin/pull.c
> index 7fe281414..ce8ccb15b 100644
> --- a/builtin/pull.c
> +++ b/builtin/pull.c
> @@ -325,6 +325,10 @@ static int git_pull_config(const char *var, const char 
> *value, void *cb)
>   if (!strcmp(var, "rebase.autostash")) {
>   config_autostash = git_config_bool(var, value);
>   return 0;
> + } else if (!strcmp(var, "submodule.recurse")) {
> + recurse_submodules = git_config_bool(var, value) ?
> + RECURSE_SUBMODULES_ON : RECURSE_SUBMODULES_OFF;
> + return 0;
>   }
>   return git_default_config(var, value, cb);
>  }

If I am reading the existing code correctly, things happen in
cmd_pull() in this order:

 - recurse_submodules is a file-scope static that is initialized to
   RECURSE_SUBMODULES_DEFAULT

 - pull_options[] is given to parse_options() so that
   submodule-config.c::option_fetch_parse_recurse_submodules() can
   read "--recurse-submodules=" from the command line to
   update recurse_submodules.

 - git_pull_config() is given to git_config() so that settings in
   the configuration files are read.

Care must be taken to make sure that values given from the command
line is never overriden by the default value specified in the
configuration system because the order of the second and third items
in the above are backwards from the usual flow.  This patch does not
seem to have any such provision.

Existing handling of "--autostash" vs "rebase.autostash" solves this
issue by having opt_autostash and config_autostash as two separate
variables, so I suspect that something similar to it must be there,
at least, for this new configuration.

If we want to keep the current code structure, that is.  I do not
recall if we did not notice the fact that the order of options and
config parsing is backwards and unknowingly worked it around with
two variables when we added the rebase.autostash thing, or we knew
the order was unusual but there was a good reason to keep that
unusual order (iow, if we simply swapped the order of
parse_options() and git_config() calls, there are things that will
break).  

If it is not the latter, perhaps we may want to flip the order of
config parsing and option parsing around?  That will allow us to fix
the handling of autostash thing to use only one variable, and also
fix your patch to do the right thing.

> diff --git a/t/t5572-pull-submodule.sh b/t/t5572-pull-submodule.sh
> index 077eb07e1..1b3a3f445 100755
> --- a/t/t5572-pull-submodule.sh
> +++ b/t/t5572-pull-submodule.sh
> @@ -65,6 +65,16 @@ test_expect_success 'recursive pull updates working tree' '
>   test_path_is_file super/sub/merge_strategy.t
>  '
>  
> +test_expect_success "submodule.recurse option triggers recursive pull" '
> + test_commit -C child merge_strategy_2 &&
> + git -C parent submodule update --remote &&
> + git -C parent add sub &&
> + git -C parent commit -m "update submodule" &&
> +
> + git -C super -c submodule.recurse pull --no-rebase &&
> + test_path_is_file super/sub/merge_strategy_2.t
> +'

This new test does not test interactions with submodule.recurse
configuration and --recurse-submodules= from the command
line.  It would be necessary to add tests to cover the permutations
in addition to the basic test we see above.

Thanks.

git merge algorithm question

2017-09-05 Thread Daniel Biran


>> I'm trying to better understand one of the merge algorithms as I had some 
>> triumphs and tribulations with using a set of commands during a merge. tldr: 
>> can a git merge -s recursive -X patience; // result in a fast-forward merge? 
>> will --no-ff stop it
>> 
>> So, the scenario is this:
>>  - Merging a master branch into a feature branch that is 2+ years old
>>  - We found this command was more beneficial when merging a large 20k 
>> line text file: 
>>  - git merge -s recursive -X patience master
>>  - In a recent merge using this approach the reflog shows that the merge 
>> was performed using a fast-forward from the feature branch's head
>>  - 082517-1, feature/branch) HEAD@{23}: merge feature/branch: 
>> Fast-forward
>> 
>> 
>> My question is, is it possible for that command to use a fast-forward like 
>> this? (or did something else go horribly wrong? possibly an atlassian git 
>> GUI tool corrupting the work):
>>  - If it is possible for the command to fast-forward the merge when 
>> making the commit does --no-ff force the command to never use fast-forward 
>> in this case
>> 
>> Thanks for the help,
>> Daniel
>

Re: Empty directories in Git: Current guidance for historical commits?

2017-09-05 Thread wafflecode



Quoting Torsten Bögershausen :

Is just dropping a ".gitignore" or "README" file in the empty  
directories during

conversion still the most reasonable approach?


A .gitignore will, but may cost 0.001% CPU time when running "git status".
I have seen systems that create a file called ".empty" in every directory.


If so, is there a way to do this
automatically during the conversion using "git svn" or the like?


Not what I am aware of.
And even if, the .empty files need to be removed, once the directory is
removed.
And we don't have logic for that, as far as I know.



So upon re-reading the git-svn docs, there apparently _are_  
"--preserve-empty-dirs" and "--placeholder-filename=" options.


I'm not sure how I missed these.  :-/

If I understand correctly, it sounds like exactly what I want.  i.e. a  
placeholder file being added (and tracked) to each empty folder.  Are  
there known issues or limitations to be aware of?



Thank you,
David


-

ONLY AT VFEmail! - Use our Metadata Mitigator to keep your email out of the 
NSA's hands!
$24.95 ONETIME Lifetime accounts with Privacy Features!  
15GB disk! No bandwidth quotas!
Commercial and Bulk Mail Options!

[PATCHv2] builtin/merge: honor commit-msg hook for merges

2017-09-05 Thread Stefan Beller

Similar to 65969d43d1 (merge: honor prepare-commit-msg hook, 2011-02-14)
merge should also honor the commit-msg hook: When a merge is stopped due
to conflicts or --no-commit, the subsequent commit calls the commit-msg
hook.  However, it is not called after a clean merge. Fix this
inconsistency by invoking the hook after clean merges as well.

This change is motivated by Gerrit's commit-msg hook to install a ChangeId
trailer into the commit message. Without such a ChangeId, Gerrit refuses
to accept any commit by default, such that the inconsistency of (not)
running the commit-msg hook between commit and merge leads to confusion
and might block people from getting their work done.

As the githooks man page is very vocal about the possibility of skipping
the commit-msg hook via the --no-verify option, implement the option
in merge, too.

Signed-off-by: Stefan Beller 
---

addressed all but one issues.

Junio writes:
> I didn't check how "merge --continue" is implemented, but we need to
> behave exactly the same way over there, too.  Making sure that it is
> a case in t7504 may be a good idea, in addition to the test you
> added.

After inspection of the code I do not think it is a good idea, because
(a) it clutters the test suite with something "obvious" for now,
the call to cmd_commit will be the same as git-commit on the
command line and
(b) piping through --[no-]verify would either introduce irregularities
("Why do we pipe through --no-verify, when --sign-off is more important?")
or miss important options to pipe through: 

static int continue_current_merge;
...
OPT_BOOL(0, "continue", _current_merge,
N_("continue the current in-progress merge")),
...
if (continue_current_merge) {
int nargc = 1;
const char *nargv[] = {"commit", NULL};

if (orig_argc != 2)
usage_msg_opt(_("--continue expects no arguments"),
  builtin_merge_usage, builtin_merge_options);

/* Invoke 'git commit' */
ret = cmd_commit(nargc, nargv, prefix);
goto done;
}

Thanks,
Stefan

 builtin/merge.c|  8 
 t/t7504-commit-msg-hook.sh | 45 +
 2 files changed, 49 insertions(+), 4 deletions(-)

diff --git a/builtin/merge.c b/builtin/merge.c
index 7df3fe3927..780435d7a1 100644
--- a/builtin/merge.c
+++ b/builtin/merge.c
@@ -73,6 +73,7 @@ static int show_progress = -1;
 static int default_to_upstream = 1;
 static int signoff;
 static const char *sign_commit;
+static int verify_msg = 1;
 
 static struct strategy all_strategy[] = {
{ "recursive",  DEFAULT_TWOHEAD | NO_TRIVIAL },
@@ -236,6 +237,7 @@ static struct option builtin_merge_options[] = {
  N_("GPG sign commit"), PARSE_OPT_OPTARG, NULL, (intptr_t) "" },
OPT_BOOL(0, "overwrite-ignore", _ignore, N_("update ignored 
files (default)")),
OPT_BOOL(0, "signoff", , N_("add Signed-off-by:")),
+   OPT_BOOL(0, "verify", _msg, N_("verify commit-msg hook")),
OPT_END()
 };
 
@@ -780,6 +782,12 @@ static void prepare_to_commit(struct commit_list 
*remoteheads)
if (launch_editor(git_path_merge_msg(), NULL, NULL))
abort_commit(remoteheads, NULL);
}
+
+   if (verify_msg && run_commit_hook(0 < option_edit, get_index_file(),
+ "commit-msg",
+ git_path_merge_msg(), NULL))
+   abort_commit(remoteheads, NULL);
+
read_merge_msg();
strbuf_stripspace(, 0 < option_edit);
if (!msg.len)
diff --git a/t/t7504-commit-msg-hook.sh b/t/t7504-commit-msg-hook.sh
index 88d4cda299..1cd54af3cc 100755
--- a/t/t7504-commit-msg-hook.sh
+++ b/t/t7504-commit-msg-hook.sh
@@ -101,6 +101,10 @@ cat > "$HOOK" <> file &&
@@ -135,6 +139,32 @@ test_expect_success '--no-verify with failing hook 
(editor)' '
 
 '
 
+test_expect_success 'merge fails with failing hook' '
+
+   test_when_finished "git branch -D newbranch" &&
+   test_when_finished "git checkout -f master" &&
+   git checkout --orphan newbranch &&
+   : >file2 &&
+   git add file2 &&
+   git commit --no-verify file2 -m in-side-branch &&
+   test_must_fail git merge --allow-unrelated-histories master &&
+   commit_msg_is "in-side-branch" # HEAD before merge
+
+'
+
+test_expect_success 'merge bypasses failing hook with --no-verify' '
+
+   test_when_finished "git branch -D newbranch" &&
+   test_when_finished "git checkout -f master" &&
+   git checkout --orphan newbranch &&
+   : >file2 &&
+   git add file2 &&
+   git commit --no-verify file2 -m in-side-branch &&
+   git merge --no-verify --allow-unrelated-histories master &&
+   commit_msg_is "Merge branch '\''master'\'' into newbranch"
+'
+
+
 chmod -x "$HOOK"

[PATCH] parse-options: warn developers on negated options

2017-09-05 Thread Stefan Beller

When compiling with -DDEVELOPER set (enabled via the Makefile DEVELOPER
switch), inspect options if they are negated and warn about them.

 1. Negated options are usually are problem down the maintenance road:
When a new negated option ("--no-foo") is introduced, then the option
can be negated at run time("--no-no-foo"), which is usually confusing
for boolean options.
 2. The handling of negated values (specifically double negations), usually
require more brain power to get it right. In code that we own, we
should strive to avoid double negatives ("!no_foo").

Signed-off-by: Stefan Beller 
---

  #leftoverbits
  
  I was wondering if such a patch may be of value eventually (after all
  occurrences of "no-" options are fixed).
  This patch disallows all no- options, but we could be more open and allow
  --no-options that have the NO_NEG bit set.

 Makefile| 3 ++-
 parse-options.c | 6 ++
 parse-options.h | 7 +--
 3 files changed, 13 insertions(+), 3 deletions(-)

diff --git a/Makefile b/Makefile
index aba0f5a0f9..9b87093f99 100644
--- a/Makefile
+++ b/Makefile
@@ -433,7 +433,8 @@ DEVELOPER_CFLAGS = -Werror \
-Wpointer-arith \
-Wstrict-prototypes \
-Wunused \
-   -Wvla
+   -Wvla \
+   -DDEVELOPER
 LDFLAGS =
 ALL_CFLAGS = $(CPPFLAGS) $(CFLAGS)
 ALL_LDFLAGS = $(LDFLAGS)
diff --git a/parse-options.c b/parse-options.c
index 0dd9fc6a0d..08db926adf 100644
--- a/parse-options.c
+++ b/parse-options.c
@@ -403,6 +403,12 @@ static void parse_options_check(const struct option *opts)
if (opts->argh &&
strcspn(opts->argh, " _") != strlen(opts->argh))
err |= optbug(opts, "multi-word argh should use dash to 
separate words");
+#if DEVELOPER
+   if ((opts->flags & PARSE_OPT_ERR_NEGATED) &&
+   !strcmp("no-", opts->long_name))
+   BUG("Get %s fixed! double negation possible",
+   opts->long_name);
+#endif
}
if (err)
exit(128);
diff --git a/parse-options.h b/parse-options.h
index af711227ae..9dc07fd9bb 100644
--- a/parse-options.h
+++ b/parse-options.h
@@ -38,7 +38,8 @@ enum parse_opt_option_flags {
PARSE_OPT_LASTARG_DEFAULT = 16,
PARSE_OPT_NODASH = 32,
PARSE_OPT_LITERAL_ARGHELP = 64,
-   PARSE_OPT_SHELL_EVAL = 256
+   PARSE_OPT_SHELL_EVAL = 256,
+   PARSE_OPT_ERR_NEGATED = 512
 };
 
 struct option;
@@ -124,7 +125,9 @@ struct option {
  (h), PARSE_OPT_NOARG }
 #define OPT_SET_INT(s, l, v, h, i)  { OPTION_SET_INT, (s), (l), (v), NULL, \
  (h), PARSE_OPT_NOARG, NULL, (i) }
-#define OPT_BOOL(s, l, v, h)OPT_SET_INT(s, l, v, h, 1)
+#define OPT_BOOL(s, l, v, h){ OPTION_SET_INT, (s), (l), (v), NULL, \
+ (h), 
PARSE_OPT_NOARG|PARSE_OPT_ERR_NEGATED, \
+ NULL, 1 }
 #define OPT_HIDDEN_BOOL(s, l, v, h) { OPTION_SET_INT, (s), (l), (v), NULL, \
  (h), PARSE_OPT_NOARG | PARSE_OPT_HIDDEN, 
NULL, 1}
 #define OPT_CMDMODE(s, l, v, h, i)  { OPTION_CMDMODE, (s), (l), (v), NULL, \
-- 
2.14.0.rc0.3.g6c2e499285

Re: BUG: attempt to trim too many characters

2017-09-05 Thread Linus Torvalds

On Tue, Sep 5, 2017 at 3:03 PM, Jeff King  wrote:
>
> This probably fixes it:

Yup. Thanks.

   Linus

Re: [PATCH 10/10] add UNLEAK annotation for reducing leak false positives

2017-09-05 Thread Stefan Beller

On Tue, Sep 5, 2017 at 6:05 AM, Jeff King  wrote:

>   int main(void)

nit of the day:
  s/void/int argc, char *argv/ or in case we do not
  want to emphasize the argument list s/void//
  as that adds no uninteresting things.

>
> In other words, you can do:
>
>   int main(void)
>   {
> char *p = some_function();
> printf("%s", p);
> UNLEAK(p);
> return 0;
>   }
>
> to annotate "p" and suppress the leak report.

This sounds really cool so far.

After having a sneak peak at the implementation
it is O(1) in runtime for each added element, and the
space complexity is O(well).

> But wait, couldn't we just say "free(p)"? In this toy
> example, yes. But using UNLEAK() has several advantages over
> actually freeing the memory:

This is indeed the big question, that I have had.

>
>   1. It can be compiled conditionally. There's no need in
>  normal runs to do this free(), and it just wastes time.
>  By using a macro, we can get the benefit for leak-check
>  builds with zero cost for normal builds (this patch
>  uses a compile-time check, though we could clearly also
>  make it a run-time check at very low cost).
>
>  Of course one could also hide free() behind a macro, so
>  this is really just arguing for having UNLEAK(), not
>  for its particular implementation.

This is only a real argument in combination with (2), or in other
words you seem to hint at situations like these:

  struct *foo = obtain_new_foo();
  ...
  #if FREE_ANNOTATED_LEAKS
/* special free() */
release_foo(foo);
  #endif

With UNLEAK this situation works out nicely as we just
copy over all memory, ignoring elements allocated inside
foo, but for free() we'd have issues combining the preprocessor
magic with the special free implementation.

So how would we use syntactic sugar to made this
more comfortable? Roughly like

MAYBE(release_foo(foo))

  #if (FREE_ANNOTATED_LEAKS)
  /* we rely on strict text substitution */
  /* as the function signature may change */
  #define MAYBE(fn) fn;
  #else
  #define MAYBE(fn)
  #endif

Me regurgitating this first argument is just a long way of saying
that it put me off even more after reading only the first argument.
Maybe reorder this argument to show up after the current second
argument, so the reader is guided better?

>   2. It's recursive across structures. In many cases our "p"
>  is not just a pointer, but a complex struct whose
>  fields may have been allocated by a sub-function. And
>  in some cases (e.g., dir_struct) we don't even have a
>  function which knows how to free all of the struct
>  members.
>
>  By marking the struct itself as reachable, that confers
>  reachability on any pointers it contains (including those
>  found in embedded structs, or reachable by walking
>  heap blocks recursively.
>
>   3. It works on cases where we're not sure if the value is
>  allocated or not. For example:
>
>char *p = argc > 1 ? argv[1] : some_function();
>
>  It's safe to use UNLEAK(p) here, because it's not
>  freeing any memory. In the case that we're pointing to
>  argv here, the reachability checker will just ignore
>  our bytes.

This argument demonstrates why the MAYBE above is
inferior.

>
>   4. Because it's not actually freeing memory, you can
>  UNLEAK() before we are finished accessing the variable.
>  This is helpful in cases like this:
>
>char *p = some_function();
>return another_function(p);
>
>  Writing this with free() requires:
>
>int ret;
>char *p = some_function();
>ret = another_function(p);
>free(p);
>return ret;
>
>  But with unleak we can just write:
>
>char *p = some_function();
>UNLEAK(p);
>return another_function(p);

  5. It's not just about worrying if we can call UNLEAK
  once (in 4), but we also do not have to worry about
  calling it twice, or recursively. (This argument can be bad
  for cargo cult programmers, but we don't have these ;-)

> +#ifdef SUPPRESS_ANNOTATED_LEAKS
> +extern void unleak_memory(const void *ptr, size_t len);
> +#define UNLEAK(var) unleak_memory(&(var), sizeof(var));

As always with macros we have to be careful about its arguments.

  UNLEAK(a++)
  UNLEAK(baz())

won't work as intended.

Re: BUG: attempt to trim too many characters

2017-09-05 Thread Jeff King

On Tue, Sep 05, 2017 at 02:55:08PM -0700, Linus Torvalds wrote:

> On Tue, Sep 5, 2017 at 2:50 PM, Jeff King  wrote:
> >
> > What version of git are you running? This should be fixed by 03df567fbf
> > (for_each_bisect_ref(): don't trim refnames, 2017-06-18) which is in
> > v2.14.
> 
> I'm way more recent than 2.14.
> 
> I'm at commit 238e487ea ("The fifth batch post 2.14")

Ugh. Bitten again by the fact that rev-parse and revision.c implement
the same things in subtly different ways.

This probably fixes it:

diff --git a/builtin/rev-parse.c b/builtin/rev-parse.c
index 2bd28d3c08..9f24004c0a 100644
--- a/builtin/rev-parse.c
+++ b/builtin/rev-parse.c
@@ -757,8 +757,8 @@ int cmd_rev_parse(int argc, const char **argv, const char 
*prefix)
continue;
}
if (!strcmp(arg, "--bisect")) {
-   for_each_ref_in("refs/bisect/bad", 
show_reference, NULL);
-   for_each_ref_in("refs/bisect/good", 
anti_reference, NULL);
+   for_each_fullref_in("refs/bisect/bad", 
show_reference, NULL, 0);
+   for_each_fullref_in("refs/bisect/good", 
anti_reference, NULL, 0);
continue;
}
if (opt_with_value(arg, "--branches", )) {

Re: [Bug] commit-tree shouldn't append an extra newline to commit messages

2017-09-05 Thread Junio C Hamano

Jeff King  writes:

> On Tue, Sep 05, 2017 at 12:57:21PM -0400, Ross Kabus wrote:
>
>> On Tue, Sep 5, 2017 at 11:36 AM, Jeff King  wrote:
>> 
>> > So I'd argue that "git commit -F" is doing a reasonable
>> > thing, and "commit-tree -F" should probably change to match it (because
>> > it's inconsistent, and because if anything the plumbing commit-tree
>> > should err more on the side of not touching its input than the porcelain
>> > commit command).
>> 
>> I would agree that "commit-tree -F" should change to match the behavior of
>> "git commit -F --cleanup=verbatim". I feel pretty strongly that this type of
>> cleanup logic shouldn't be done in a plumbing command, though I'm not sure
>> it is a big enough deal to change behavior/compatibility for everyone.
>
> OK. Do you want to try your hand at a patch?
>
>> Yup, confusion #2. I was using "-F -" which I see now is a different code
>> path. Reading via stdin without "-F -" _is_ the verbatim option. This
>> difference burned someone else on the mailing list as well. See:
>
> Ah, OK, your confusion makes more sense now.

Yeah, my initial knee-jerk reaction when I started reading the early
part of the thread was that the use of strbuf_complete_line() is a
strong enough sign that this was intended behaviour, but the "we get
different results between reading from standard input and pretending
'-' is a file and reading from it" made me change my mind.  I agree
that dropping strbuf_complete_line() call from that codepath (and
nowhere else, especially the "-m" oneline thing [*1*]) would be a
backward incompatible change that is acceptable.

Here is what I am preparing to add to the What's cooking report when
such a patch materializes ;-)

 * "git commit-tree -F $file" used to add terminating newline if the
   input ends with an incomplete line, but when the command reads
   the input from its standard input, we recorded what was given
   verbatim.  The former has been changed to record the input with
   an incomplete line to make them consistent.

Something like that probably needs to be added to the release notes
but I am way being ahead of myself ;-)

Thanks, both, for sensible discussion.

[Footnote]

*1* The message from Ross you are responding to talks about
"cleanup", but I tend to view the calls to *_complete_line() as
more for "the end-user convenience".  "-m" meant to take a
one-liner from the command line should have it because exactly
as you said, having to pass the terminating newline from the
command line with "-m" is awkward.  On the other hand, if the
user/caller prepared the exact message in a file and wants to
feed it either via "-F" or from the standard input, I agree that
"the end-user convenience" would get in the way and it is
preferrable to avoid it in the plumbing layer.

Re: BUG: attempt to trim too many characters

2017-09-05 Thread Linus Torvalds

On Tue, Sep 5, 2017 at 2:50 PM, Jeff King  wrote:
>
> What version of git are you running? This should be fixed by 03df567fbf
> (for_each_bisect_ref(): don't trim refnames, 2017-06-18) which is in
> v2.14.

I'm way more recent than 2.14.

I'm at commit 238e487ea ("The fifth batch post 2.14")

  Linus

Re: BUG: attempt to trim too many characters

2017-09-05 Thread Jeff King

On Tue, Sep 05, 2017 at 02:39:05PM -0700, Linus Torvalds wrote:

> I just got this with
> 
>gitk --bisect
> 
> while doing some bisection on my current kernel.
> 
> It happens with "git rev-parse --bisect" too, but interestingly, "git
> log --bisect" works fine.
> 
> I have not tried to figure out anything further, except that it was
> introduced by commit b9c8e7f2f ("prefix_ref_iterator: don't trim too
> much") and it happens with
> 
>iter->iter0->refname = "refs/bisect/bad"
>iter->trim = 15
> 
> (where 15 is also the same as the length of that refname).
> 
> Not having time to chase this down any more, since I'm busy with the
> merge window (and that annoying bisect)

What version of git are you running? This should be fixed by 03df567fbf
(for_each_bisect_ref(): don't trim refnames, 2017-06-18) which is in
v2.14.

More discussion at

  https://public-inbox.org/git/cover.1497792827.git.mhag...@alum.mit.edu/

-Peff

BUG: attempt to trim too many characters

2017-09-05 Thread Linus Torvalds

I just got this with

   gitk --bisect

while doing some bisection on my current kernel.

It happens with "git rev-parse --bisect" too, but interestingly, "git
log --bisect" works fine.

I have not tried to figure out anything further, except that it was
introduced by commit b9c8e7f2f ("prefix_ref_iterator: don't trim too
much") and it happens with

   iter->iter0->refname = "refs/bisect/bad"
   iter->trim = 15

(where 15 is also the same as the length of that refname).

Not having time to chase this down any more, since I'm busy with the
merge window (and that annoying bisect)

  Linus

Re: [PATCH] builtin/merge: honor commit-msg hook for merges

2017-09-05 Thread Junio C Hamano

Stefan Beller  writes:

> Similar to 65969d43d1 (merge: honor prepare-commit-msg hook, 2011-02-14)
> merge should also honor the commit-msg hook; the reason is the same as
> in that commit: When a merge is stopped due to conflicts or --no-commit,
> the subsequent commit calls the commit-msg hook.  However, it is not
> called after a clean merge. Fix this inconsistency by invoking the hook
> after clean merges as well.

The above reads better without "; the reason is the same as in that
commit"---"Similar to", combined with the clean and concise
explanation after the colon you have, sufficiently justifies why
this is a good change.  

Excellent job spotting the precedent and making it consistent ;-).

> This change is motivated by Gerrits commit-msg hook to install a change-id

s/Gerrits/Gerrit's/ perhaps?

> trailer into the commit message. Without such a change id, Gerrit refuses

I do not live in Gerrit land and I do not know which one is the more
preferred one, but be consistent between "change-id" and "change
id".

> to accept (merge) commits by default, such that the inconsistency of
> (not) running commit-msg hook between commit and merge leads to confusion
> and might block people from getting their work done.

Yup.  Nicely explained.

I didn't check how "merge --continue" is implemented, but we need to
behave exactly the same way over there, too.  Making sure that it is
a case in t7504 may be a good idea, in addition to the test you
added.

> Signed-off-by: Stefan Beller 
> ---
>  builtin/merge.c|  8 
>  t/t7504-commit-msg-hook.sh | 45 +
>  2 files changed, 49 insertions(+), 4 deletions(-)
>
> diff --git a/builtin/merge.c b/builtin/merge.c
> index 7df3fe3927..087efd560d 100644
> --- a/builtin/merge.c
> +++ b/builtin/merge.c
> @@ -73,6 +73,7 @@ static int show_progress = -1;
>  static int default_to_upstream = 1;
>  static int signoff;
>  static const char *sign_commit;
> +static int no_verify;
>  
>  static struct strategy all_strategy[] = {
>   { "recursive",  DEFAULT_TWOHEAD | NO_TRIVIAL },
> @@ -236,6 +237,7 @@ static struct option builtin_merge_options[] = {
> N_("GPG sign commit"), PARSE_OPT_OPTARG, NULL, (intptr_t) "" },
>   OPT_BOOL(0, "overwrite-ignore", _ignore, N_("update ignored 
> files (default)")),
>   OPT_BOOL(0, "signoff", , N_("add Signed-off-by:")),
> + OPT_BOOL(0, "no-verify", _verify, N_("bypass pre-commit and 
> commit-msg hooks")),

This allows "--no-no-verify", which may want to be eventually
addressed (either by changing the code not to accept, or declaring
that it is an intended behaviour); I do not offhand know for sure but I
strong suspect "commit" shares the same issue, in which case this
patch is perfectly fine and addressing "--no-no-verify" should be
done for both of them in a separate follow-up topic.  #leftoverbits

Thanks.  I'll be online starting today, but please expect slow
responses for a few days as there is significant backlog.

Re: [PATCH v3 1/3] refs/files-backend: add longer-scoped copy of string to list

2017-09-05 Thread Junio C Hamano

Jeff King  writes:

> I noticed the HEAD funniness, too, when looking at this earlier. I agree
> with Junio that it's not quite consistent with the general rule of
> "string list items point to their refnames", but I don't think it
> matters in practice.

Yup, we are on the same page; the "fix" I was alluding to would look
exactly like what you wrote below, but I agree the distinction does
not matter in practice.  IOW, I do not think the code after Martin's
fix is wrong per-se.

Thanks.

> I think the fix, if we wanted to do one, would be similar to what you
> did in split_symref_update(). Like:
>
> diff --git a/refs/files-backend.c b/refs/files-backend.c
> index f3455609d6..3f9deff902 100644
> --- a/refs/files-backend.c
> +++ b/refs/files-backend.c
> @@ -2095,8 +2095,7 @@ static int split_head_update(struct ref_update *update,
>* transaction. This insertion is O(N) in the transaction
>* size, but it happens at most once per transaction.
>*/
> - item = string_list_insert(affected_refnames, "HEAD");
> - if (item->util) {
> + if (string_list_has_string(affected_refnames, "HEAD")) {
>   /* An entry already existed */
>   strbuf_addf(err,
>   "multiple updates for 'HEAD' (including one "
> @@ -2111,6 +2110,7 @@ static int split_head_update(struct ref_update *update,
>   update->new_oid.hash, update->old_oid.hash,
>   update->msg);
>  
> + item = string_list_insert(affected_refnames, new_update->refname);
>   item->util = new_update;
>  
>   return 0;
>
> -Peff

Re: Git in Outreachy round 15?

2017-09-05 Thread Christian Couder

Hi,

On Sat, Sep 2, 2017 at 12:30 AM, Jeff King  wrote:
>
> The big questions on whether we can make this happen are:
>
>   1. Do we have mentor interest?
>
>  I'm willing to mentor, but I'd like to hear whether other people
>  are interested, as well. Either way I can take responsibility for
>  the administrative bits.

I am willing to co-mentor, but I prefer not to take care of administrative bits.

>  We could probably use some effort brushing up our project-ideas
>  page. I've also considered whether it would make sense to do a
>  "mixed bag" project where somebody works on multiple smaller
>  projects, and we try to get them more involved in day-to-day
>  project activities (after all, very few of us started working on
>  Git with a dedicated 3-month project; people typically ease into it
>  by fixing small problems, reviewing code, etc).

I agree that we should try "mixed bag" projects consisting in multiple
smaller projects.
It would be very interesting to see if it provides better results.

>   2. Do we have the money (and want to spend it on this)?
>
>  The Git project does have money to cover at least one intern.
>
>  I'm looking into securing outside funding, so that we don't have to
>  use project funds. If for whatever reason that doesn't happen, is
>  this something we want to spend project money on?
>
>  (My opinion is yes, but I'd like to hear what others in the
>  community think).

My opinion is yes too.

Is there something like the GSoC Mentor Summit? I think this is an
interesting part of the GSoC and it could help motivate mentors if
there was something like that.

Thanks for taking care of this,
Christian.

[PATCH] builtin/merge: honor commit-msg hook for merges

2017-09-05 Thread Stefan Beller

Similar to 65969d43d1 (merge: honor prepare-commit-msg hook, 2011-02-14)
merge should also honor the commit-msg hook; the reason is the same as
in that commit: When a merge is stopped due to conflicts or --no-commit,
the subsequent commit calls the commit-msg hook.  However, it is not
called after a clean merge. Fix this inconsistency by invoking the hook
after clean merges as well.

This change is motivated by Gerrits commit-msg hook to install a change-id
trailer into the commit message. Without such a change id, Gerrit refuses
to accept (merge) commits by default, such that the inconsistency of
(not) running commit-msg hook between commit and merge leads to confusion
and might block people from getting their work done.

Signed-off-by: Stefan Beller 
---
 builtin/merge.c|  8 
 t/t7504-commit-msg-hook.sh | 45 +
 2 files changed, 49 insertions(+), 4 deletions(-)

diff --git a/builtin/merge.c b/builtin/merge.c
index 7df3fe3927..087efd560d 100644
--- a/builtin/merge.c
+++ b/builtin/merge.c
@@ -73,6 +73,7 @@ static int show_progress = -1;
 static int default_to_upstream = 1;
 static int signoff;
 static const char *sign_commit;
+static int no_verify;
 
 static struct strategy all_strategy[] = {
{ "recursive",  DEFAULT_TWOHEAD | NO_TRIVIAL },
@@ -236,6 +237,7 @@ static struct option builtin_merge_options[] = {
  N_("GPG sign commit"), PARSE_OPT_OPTARG, NULL, (intptr_t) "" },
OPT_BOOL(0, "overwrite-ignore", _ignore, N_("update ignored 
files (default)")),
OPT_BOOL(0, "signoff", , N_("add Signed-off-by:")),
+   OPT_BOOL(0, "no-verify", _verify, N_("bypass pre-commit and 
commit-msg hooks")),
OPT_END()
 };
 
@@ -780,6 +782,12 @@ static void prepare_to_commit(struct commit_list 
*remoteheads)
if (launch_editor(git_path_merge_msg(), NULL, NULL))
abort_commit(remoteheads, NULL);
}
+
+   if (!no_verify && run_commit_hook(0 < option_edit, get_index_file(),
+ "commit-msg",
+ git_path_merge_msg(), NULL))
+   abort_commit(remoteheads, NULL);
+
read_merge_msg();
strbuf_stripspace(, 0 < option_edit);
if (!msg.len)
diff --git a/t/t7504-commit-msg-hook.sh b/t/t7504-commit-msg-hook.sh
index 88d4cda299..1cd54af3cc 100755
--- a/t/t7504-commit-msg-hook.sh
+++ b/t/t7504-commit-msg-hook.sh
@@ -101,6 +101,10 @@ cat > "$HOOK" <> file &&
@@ -135,6 +139,32 @@ test_expect_success '--no-verify with failing hook 
(editor)' '
 
 '
 
+test_expect_success 'merge fails with failing hook' '
+
+   test_when_finished "git branch -D newbranch" &&
+   test_when_finished "git checkout -f master" &&
+   git checkout --orphan newbranch &&
+   : >file2 &&
+   git add file2 &&
+   git commit --no-verify file2 -m in-side-branch &&
+   test_must_fail git merge --allow-unrelated-histories master &&
+   commit_msg_is "in-side-branch" # HEAD before merge
+
+'
+
+test_expect_success 'merge bypasses failing hook with --no-verify' '
+
+   test_when_finished "git branch -D newbranch" &&
+   test_when_finished "git checkout -f master" &&
+   git checkout --orphan newbranch &&
+   : >file2 &&
+   git add file2 &&
+   git commit --no-verify file2 -m in-side-branch &&
+   git merge --no-verify --allow-unrelated-histories master &&
+   commit_msg_is "Merge branch '\''master'\'' into newbranch"
+'
+
+
 chmod -x "$HOOK"
 test_expect_success POSIXPERM 'with non-executable hook' '
 
@@ -178,10 +208,6 @@ exit 0
 EOF
 chmod +x "$HOOK"
 
-commit_msg_is () {
-   test "$(git log --pretty=format:%s%b -1)" = "$1"
-}
-
 test_expect_success 'hook edits commit message' '
 
echo "additional" >> file &&
@@ -217,7 +243,17 @@ test_expect_success "hook doesn't edit commit message 
(editor)" '
echo "more plus" > FAKE_MSG &&
GIT_EDITOR="\"\$FAKE_EDITOR\"" git commit --no-verify &&
commit_msg_is "more plus"
+'
 
+test_expect_success 'hook called in git-merge picks up commit message' '
+   test_when_finished "git branch -D newbranch" &&
+   test_when_finished "git checkout -f master" &&
+   git checkout --orphan newbranch &&
+   : >file2 &&
+   git add file2 &&
+   git commit --no-verify file2 -m in-side-branch &&
+   git merge --allow-unrelated-histories master &&
+   commit_msg_is "new message"
 '
 
 # set up fake editor to replace `pick` by `reword`
@@ -237,4 +273,5 @@ test_expect_success 'hook is called for reword during 
`rebase -i`' '
 
 '
 
+
 test_done
-- 
2.14.0.rc0.3.g6c2e499285

Re: [Bug] commit-tree shouldn't append an extra newline to commit messages

2017-09-05 Thread Ross Kabus

Gmail mangled that patch of course...
Ross Kabus
Software Engineer
www.aerotech.com | 412-968-2833


On Tue, Sep 5, 2017 at 4:57 PM, Ross Kabus  wrote:
> From: Ross Kabus 
> Date: Tue, 5 Sep 2017 13:54:52 -0400
> Subject: [PATCH] commit-tree: don't append a newline with -F
>
> This change makes it such that commit-tree -F never appends a newline
> character to the supplied commit message (either from file or stdin).
>
> Previously, commit-tree -F would always append a newline character to
> the text brought in from file or stdin. This has caused confusion in a
> number of ways:
>   - This is a plumbing command and it is generally expected not to do
> text cleanup or other niceties.
>   - stdin piping with "-F -" appends a newline but stdin piping without
> -F does not append a newline (inconsistent).
>   - git-commit has the --cleanup=verbatim option that prevents appending
> a newline with its -F argument. There is no verbatim counterpart to
> commit-tree -F (inconsistent).
> ---
>  builtin/commit-tree.c | 1 -
>  1 file changed, 1 deletion(-)
>
> diff --git a/builtin/commit-tree.c b/builtin/commit-tree.c
> index 19e898fa4..2177251e2 100644
> --- a/builtin/commit-tree.c
> +++ b/builtin/commit-tree.c
> @@ -102,7 +102,6 @@ int cmd_commit_tree(int argc, const char **argv,
> const char *prefix)
>   if (fd && close(fd))
>   die_errno("git commit-tree: failed to close '%s'",
>argv[i]);
> - strbuf_complete_line();
>   continue;
>   }
>
> --
> 2.13.1.windows.2

Re: [Bug] commit-tree shouldn't append an extra newline to commit messages

2017-09-05 Thread Ross Kabus

From: Ross Kabus 
Date: Tue, 5 Sep 2017 13:54:52 -0400
Subject: [PATCH] commit-tree: don't append a newline with -F

This change makes it such that commit-tree -F never appends a newline
character to the supplied commit message (either from file or stdin).

Previously, commit-tree -F would always append a newline character to
the text brought in from file or stdin. This has caused confusion in a
number of ways:
  - This is a plumbing command and it is generally expected not to do
text cleanup or other niceties.
  - stdin piping with "-F -" appends a newline but stdin piping without
-F does not append a newline (inconsistent).
  - git-commit has the --cleanup=verbatim option that prevents appending
a newline with its -F argument. There is no verbatim counterpart to
commit-tree -F (inconsistent).
---
 builtin/commit-tree.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/builtin/commit-tree.c b/builtin/commit-tree.c
index 19e898fa4..2177251e2 100644
--- a/builtin/commit-tree.c
+++ b/builtin/commit-tree.c
@@ -102,7 +102,6 @@ int cmd_commit_tree(int argc, const char **argv,
const char *prefix)
  if (fd && close(fd))
  die_errno("git commit-tree: failed to close '%s'",
   argv[i]);
- strbuf_complete_line();
  continue;
  }

-- 
2.13.1.windows.2

Re: [PATCH 0/10] towards clean leak-checker output

2017-09-05 Thread Martin Ågren

On 5 September 2017 at 21:02, Jeff King  wrote:
> On Tue, Sep 05, 2017 at 07:50:10PM +0200, Martin Ågren wrote:
>
>> > That line is the setting of argv0_path, which is a global (and thus
>> > shouldn't be marked as leaking). Interestingly, it only happens with
>> > -O2. Compiling with -O0 works fine. I'm not sure if it's a bug or
>> > what.
>> >
>> > I did most of my testing with clang-6.0, which gets this case right.
>>
>> Hmmm, I got the same wrong results (IMHO) from Valgrind, which
>> classified this as "definitely lost". Like you I found that -O0 helped.
>> And yes, that was with gcc. Maybe gcc with optimization somehow manages
>> to hide the pointers from these tools. I know too little about the
>> technical details to have any real ideas, though. My searches did not
>> bring up anything useful. (gcc 5.4.0)
>
> Yeah, I think it is just optimizing out the variable entirely. If
> RUNTIME_PREFIX isn't defined (and it's not for non-Windows platforms)
> then we never look at the variable at all, and it's a dead assignment.
> And the compiler can see that easily because it's got static linkage. So
> it drops the variable completely, but it can't drop the call to
> xstrdup() with the information in exec_cmd.c. It has to call the
> function and throw away the result, resulting in the leak.

I see. Yeah, that makes sense.

FWIW, this series (combined with the other series you mentioned) makes
t and t0001 pass for me with gcc/clang. There are actually some
leaks in t, they're just silently being reported to stderr, since
the exit statuses from git are hidden by pipes. Maybe you're already
aware of it. Depending on your definition of "running clean" it might be
out of scope for this series, which is of course still very interesting
and enlightening as it stands.

One is in cmd_show (you did mention git show) where we leak data in rev.
The other is some use of strdup. I can't immediately figure out how to
get a useful stacktrace (you mentioned this as well) and it's past
bed-time here. I'll try to play more with this tomorrow.

Note for self: getting rid of all pipes would probably also help flush
out a few leaks (or "introduce" them, depending on your viewpoint).

Martin

Re: [PATCH v3 1/3] refs/files-backend: add longer-scoped copy of string to list

2017-09-05 Thread Jeff King

On Tue, Sep 05, 2017 at 07:24:18PM +0200, Martin Ågren wrote:

> > And from that point of view, doesn't split_head_update() wants a
> > similar fix?  It attempts to insert "HEAD", makes sure it hasn't
> > been inserted and then hangs a new update transaction as its util.
> > It is not wrong per-se from purely leak-prevention point of view,
> > as that "HEAD" is a literal string we woudn't even want to free,
> > but from logical/"what each data means" point of view, it still
> > feels wrong.
> 
> There is a "Special hack" comment related to this, and I don't feel
> particularly confident that I could make any meaningful contribution in
> this area. To be honest, I don't immediately see in which direction your
> suggestion/idea/thought is going, which tells me I should not be making
> a mess out of it. :-)

I noticed the HEAD funniness, too, when looking at this earlier. I agree
with Junio that it's not quite consistent with the general rule of
"string list items point to their refnames", but I don't think it
matters in practice.

I think the fix, if we wanted to do one, would be similar to what you
did in split_symref_update(). Like:

diff --git a/refs/files-backend.c b/refs/files-backend.c
index f3455609d6..3f9deff902 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -2095,8 +2095,7 @@ static int split_head_update(struct ref_update *update,
 * transaction. This insertion is O(N) in the transaction
 * size, but it happens at most once per transaction.
 */
-   item = string_list_insert(affected_refnames, "HEAD");
-   if (item->util) {
+   if (string_list_has_string(affected_refnames, "HEAD")) {
/* An entry already existed */
strbuf_addf(err,
"multiple updates for 'HEAD' (including one "
@@ -2111,6 +2110,7 @@ static int split_head_update(struct ref_update *update,
update->new_oid.hash, update->old_oid.hash,
update->msg);
 
+   item = string_list_insert(affected_refnames, new_update->refname);
item->util = new_update;
 
return 0;

-Peff

Re: Empty directories in Git: Current guidance for historical commits?

2017-09-05 Thread Jeff King

On Tue, Sep 05, 2017 at 12:47:43PM +, wafflec...@openmail.cc wrote:

> Is just dropping a ".gitignore" or "README" file in the empty directories
> during conversion still the most reasonable approach?

Yes, I think so.

> If so, is there a way
> to do this automatically during the conversion using "git svn" or the like?

I haven't used git-svn in years, but the manpage says:

  --preserve-empty-dirs
  Create a placeholder file in the local Git repository for each
  empty directory fetched from Subversion. This includes directories
  that become empty by removing all entries in the Subversion
  repository (but not the directory itself). The placeholder files
  are also tracked and removed when no longer necessary.

-Peff

RE: [PATCH] read-cache: fix index corruption with index v4

2017-09-05 Thread Kevin Willford

> From: Thomas Gummerer [mailto:t.gumme...@gmail.com]
> Sent: Monday, September 4, 2017 4:58 PM
> 
> ce012deb98 ("read-cache: avoid allocating every ondisk entry when
> writing", 2017-08-21) changed the way cache entries are written to the
> index file.  While previously it wrote the name to an struct that was
> allocated using xcalloc(), it now uses ce_write() directly.  Previously
> ce_namelen - common bytes were written to the cache entry, which would
> automatically make it nul terminated, as it was allocated using calloc.
> 
> Now we are writing ce_namelen - common + 1 bytes directly from the
> ce->name to the index.  As ce->name is however not nul terminated, and
> index-v4 needs the nul terminator to split between one index entry and
> the next, this would end up in a corrupted index.
> 
> Fix that by only writing ce_namelen - common bytes directly from
> ce->name to the index, and adding the nul terminator in an extra call to
> ce_write.
> 
> This bug was turned up by setting TEST_GIT_INDEX_VERSION = 4 in
> config.mak and running the test suite (t1700 specifically broke).
> 
> Signed-off-by: Thomas Gummerer 
> ---
> 
> I unfortunately didn't have more time to dig so
> 
> > As ce->name is however not nul terminated
> 
> just comes from my memory and from the patch below actually fixing the
> corruption, so it's really the most likely cause.  Would be great if
> someone who can remember more about the index could confirm that this
> is indeed the case.
> 

Digging into this and ce->name IS nul terminated.  The issue comes in when
the CE_STRIP_NAME is set, which is only set when using a split index. 
This sets the ce->ce_namelen = 0 without changing the actual ce->name buffer.
When writing the entry for the split index version 4 it was using the first 
character
in the ce->name buffer because of the + 1, which obviously isn't correct.
Before
it was using a newly allocated name buffer from the ondisk struct which was
allocated based on the ce_namelen of zero.

>  read-cache.c | 4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> diff --git a/read-cache.c b/read-cache.c
> index 40da87ea71..80830ddcfc 100644
> --- a/read-cache.c
> +++ b/read-cache.c
> @@ -2103,7 +2103,9 @@ static int ce_write_entry(git_SHA_CTX *c, int fd, struct
> cache_entry *ce,
>   if (!result)
>   result = ce_write(c, fd, to_remove_vi, prefix_size);
>   if (!result)
> - result = ce_write(c, fd, ce->name + common,
> ce_namelen(ce) - common + 1);
> + result = ce_write(c, fd, ce->name + common,
> ce_namelen(ce) - common);
> + if (!result)
> + result = ce_write(c, fd, "\0", 1);

You could use the padding variable here as well which is used in the < version 4
ce_write.

> 
>   strbuf_splice(previous_name, common, to_remove,
> ce->name + common, ce_namelen(ce) - common);
> --
> 2.14.1.480.gb18f417b89

While looking at the code I was wondering if we could get around the
whole setting ce->ce_namelen to zero bit but that would be much bigger
patch and possibly introduce other bugs so this seems the appropriate
fix for now.

Thanks for finding this!
Kevin

Re: [PATCH 0/10] towards clean leak-checker output

2017-09-05 Thread Jeff King

On Tue, Sep 05, 2017 at 07:50:10PM +0200, Martin Ågren wrote:

> > That line is the setting of argv0_path, which is a global (and thus
> > shouldn't be marked as leaking). Interestingly, it only happens with
> > -O2. Compiling with -O0 works fine. I'm not sure if it's a bug or
> > what.
> >
> > I did most of my testing with clang-6.0, which gets this case right.
> 
> Hmmm, I got the same wrong results (IMHO) from Valgrind, which
> classified this as "definitely lost". Like you I found that -O0 helped.
> And yes, that was with gcc. Maybe gcc with optimization somehow manages
> to hide the pointers from these tools. I know too little about the
> technical details to have any real ideas, though. My searches did not
> bring up anything useful. (gcc 5.4.0)

Yeah, I think it is just optimizing out the variable entirely. If
RUNTIME_PREFIX isn't defined (and it's not for non-Windows platforms)
then we never look at the variable at all, and it's a dead assignment.
And the compiler can see that easily because it's got static linkage. So
it drops the variable completely, but it can't drop the call to
xstrdup() with the information in exec_cmd.c. It has to call the
function and throw away the result, resulting in the leak.

In fact, the whole extract_argv0_path thing is pointless without
RUNTIME_PREFIX. So I think one reasonable fix is just:

diff --git a/exec_cmd.c b/exec_cmd.c
index fb94aeba9c..09f05c3bc3 100644
--- a/exec_cmd.c
+++ b/exec_cmd.c
@@ -5,7 +5,10 @@
 #define MAX_ARGS   32
 
 static const char *argv_exec_path;
+
+#ifdef RUNTIME_PREFIX
 static const char *argv0_path;
+#endif
 
 char *system_path(const char *path)
 {
@@ -40,6 +43,7 @@ char *system_path(const char *path)
 
 void git_extract_argv0_path(const char *argv0)
 {
+#ifdef RUNTIME_PREFIX
const char *slash;
 
if (!argv0 || !*argv0)
@@ -49,6 +53,7 @@ void git_extract_argv0_path(const char *argv0)
 
if (slash)
argv0_path = xstrndup(argv0, slash - argv0);
+#endif
 }
 
 void git_set_argv_exec_path(const char *exec_path)

-Peff

[PATCH] config: remove git_config_maybe_bool

2017-09-05 Thread Martin Ågren

The function was deprecated in commit 89576613 ("treewide: deprecate
git_config_maybe_bool, use git_parse_maybe_bool", 2017-08-07) and has no
users.

Signed-off-by: Martin Ågren 
---
ma/parse-maybe-bool has been in master for some time and I cannot find
any inflight topics which rely on git_config_maybe_bool (neither in pu,
nor on the list). Indeed, it would have been easier to kill this
function immediately instead of deprecating it.

 Documentation/technical/api-config.txt | 4 
 config.h   | 1 -
 config.c   | 5 -
 3 files changed, 10 deletions(-)

diff --git a/Documentation/technical/api-config.txt 
b/Documentation/technical/api-config.txt
index 7a83a3a6e..9a778b0ca 100644
--- a/Documentation/technical/api-config.txt
+++ b/Documentation/technical/api-config.txt
@@ -186,10 +186,6 @@ parsing is successful, the return value is the result.
 Same as `git_config_bool`, except that integers are returned as-is, and
 an `is_bool` flag is unset.
 
-`git_config_maybe_bool`::
-Deprecated. Use `git_parse_maybe_bool` instead. They are exactly the
-same, except this function takes an unused argument `name`.
-
 `git_parse_maybe_bool`::
 Same as `git_config_bool`, except that it returns -1 on error rather
 than dying.
diff --git a/config.h b/config.h
index 97471b887..456b3d134 100644
--- a/config.h
+++ b/config.h
@@ -56,7 +56,6 @@ extern unsigned long git_config_ulong(const char *, const 
char *);
 extern ssize_t git_config_ssize_t(const char *, const char *);
 extern int git_config_bool_or_int(const char *, const char *, int *);
 extern int git_config_bool(const char *, const char *);
-extern int git_config_maybe_bool(const char *, const char *);
 extern int git_config_string(const char **, const char *, const char *);
 extern int git_config_pathname(const char **, const char *, const char *);
 extern int git_config_set_in_file_gently(const char *, const char *, const 
char *);
diff --git a/config.c b/config.c
index d0d8ce823..f627657cf 100644
--- a/config.c
+++ b/config.c
@@ -956,11 +956,6 @@ int git_parse_maybe_bool(const char *value)
return -1;
 }
 
-int git_config_maybe_bool(const char *name, const char *value)
-{
-   return git_parse_maybe_bool(value);
-}
-
 int git_config_bool_or_int(const char *name, const char *value, int *is_bool)
 {
int v = git_parse_maybe_bool_text(value);
-- 
2.14.1.460.g848a19d64

Re: signing commits using gpg2

2017-09-05 Thread shawn wilson

Apparently you need to set the GPG_TTY for git to work (I also set the
gpg.program so I know it shouldn't /need/ that variable set)
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=840687#10

I'm not sure if there's anything that has or can be done upstream to
make this easier (I feel this was a bigger PITA than it needed to be),
I'm on git 2.7.5 from Fedira.

On Tue, Sep 5, 2017 at 9:40 AM, Michael J Gruber  wrote:
> shawn wilson venit, vidit, dixit 02.09.2017 23:11:
>> tl;dr - how do I get git to use gpg2 to sign things?
>>
>> I'm using gpg2 (so no agent options are configured but an agent is
>> running) which is configured w/ a Nitrokey (Pro if it matters):
>>
>>  % git commit -m "Initial."
>>
>>  gits/bash-libs (master ⚡) localhost
>> gpg: detected reader `Nitrokey Nitrokey Pro (3467) 00 00'
>> gpg: pcsc_connect failed: sharing violation (0x801b)
>> gpg: apdu_send_simple(0) failed: locking failed
>> Please insert the card and hit return or enter 'c' to cancel:
>> gpg: pcsc_connect failed: sharing violation (0x801b)
>> gpg: pcsc_connect failed: sharing violation (0x801b)
>> gpg: apdu_send_simple(0) failed: locking failed
>> Please insert the card and hit return or enter 'c' to cancel: c
>> gpg: selecting openpgp failed: general error
>> gpg: signing failed: general error
>> gpg: signing failed: general error
>> error: gpg failed to sign the data
>> fatal: failed to write commit object
>>
>> This works with gpg and ssh:
>
> Not really...
>
>>  % touch foo
>>
>>  ~ localhost
>>  % gpg2 --sign foo
>
> ... because you're using gpg2, not gpg.
>
>>
>>  ~ localhost
>> gpg: using "846FF490" as default secret key for signing
>>  % cat foo*
>>
>>  ~ localhost
>> -BEGIN PGP MESSAGE-
>> Version: GnuPG v2
>>
>> owEBuQFG/pANAwAKAYwdY7SEb/SQAcsJYgNmb29ZqxfviQGcBAABCgAGBQJZqxfv
>> AAoJEIwdY7SEb/SQAcEL/jonw+HymnlmfebtEwlvfx2Gl1Sbuw0xWWPpQ2Dtjljz
>> HtpD+LWczjpOSMTHFNK9xPR2kcs1WNY+mO8M45QI7iDgFkKRzaxEqeNUJkoyF/+I
>> 81VMmXDQMXFs4+8jy00b+UxTdvwdXaHMsOtu+6YCtmCR5Bzohg07ADsnXnGGn3Sd
>> WTjVMzV6Dlh8LRF+coGJ8JuErBsRAI6vdNgJRVHYBULGNXci4uF/4a+58uiTL4/U
>> PvC4ruXCNxCKi89nMERhwlnOvglseX3TDR5ldrc4Hzb+pLsj/l6N4sBW0Zmb8UcE
>> 9BG3WjOs4eZvnLmk5XHrwisD2CXuHvyWMl0yH7LTrg+m4Itj0PJ4Px4H9E5t/zfs
>> C1vcB/okcigeIyXnO06um02a5oZAYOKadB+6NRnBjULz5GvP2yxj/AO1VPmZprpt
>> budMuHZcA0zNE3uBmcnQY5+1tdkyTrlTxsL58lQrn/U3wvgah3AXMEvjRGqbYWHj
>> jDikQVJ7ESoevNqlfLPj8Q==
>> =hV6v
>> -END PGP MESSAGE-
>>
>> However, if I try this w/ the old gpg:
>>
>>  % gpg -ae -o foo.gpg foo
>>
>>  ~ localhost
>>  % gpg -d foo.gpg
>>
>>  ~ localhost
>> gpg: detected reader `Nitrokey Nitrokey Pro (3467) 00 00'
>> gpg: pcsc_connect failed: sharing violation (0x801b)
>> gpg: apdu_send_simple(0) failed: locking failed
>> Please insert the card and hit return or enter 'c' to cancel: c
>> gpg: selecting openpgp failed: general error
>> gpg: encrypted with 3072-bit RSA key, ID 41826CFB, created 2017-03-13
>>   "Shawn Wilson "
>> gpg: public key decryption failed: general error
>> gpg: decryption failed: secret key not available
>>  % gpg2 -d foo.gpg
>>
>>  ~ localhost
>> gpg: encrypted with 3072-bit RSA key, ID E27FA0B841826CFB, created 2017-03-13
>>   "Shawn Wilson "
>> foo
>>
>> (yeah I added data to the file)
>>
>> And just to prove basic competency checking:
>>
>>  % git config --global -l | grep sign
>>
>>  ~ localhost
>> user.signingkey=846FF490
>> filter.gitconfig-rmuser.clean=sed -e "s/^\( *email =\).*/\1 > address>/" -e "s/^\( *name =\).*/\1 /" -e "s/^\(
>> *signingkey =\).*/\1 /"
>> filter.gitconfig-rmuser.smudge=egrep "^ *(email|name|signingkey) = "
>> commit.gpgsign=true
>>
>
> So, gpg2 works and gpg does not. This is typical for the way in which
> the gpg upgrade path is broken, and your distro installs gpg because it
> still relies on it.
>
> git sees two executables gpg and gpg2 and uses the first, so as to not
> migrate your secrete key store inadvertently.
>
> Short answer: Use
>
> git config --global gpg.program gpg2
>
> to make git use gpg2 which apparantly is your working gnupg setup.
>
> Michael

Re: Empty directories in Git: Current guidance for historical commits?

2017-09-05 Thread Torsten Bögershausen

On 2017-09-05 14:47, wafflec...@openmail.cc wrote:
> 
> Hello,
> 
> Per the FAQ it's clear that Git (by current design) does not track empty
> directories: 
> https://git.wiki.kernel.org/index.php/GitFaq#Can_I_add_empty_directories.3F
> 
> That's fine and I can fix that for my code going forward via the build system,
> as needed.
> 
> However I'm looking to import a fairly deep (10k+ commits) svn history in to 
> Git
> and I'm wondering what the current best practices are for dealing with empty
> folders?   Meaning to say, I'd like to be able to check out an older revision 
> of
> the code and have it build correctly using the older build system which 
> expects
> certain folders to exist.
> 
> There are a few different answers floating around the net, so I figured I'd
> confirm via the mailing list.
> 
> Is just dropping a ".gitignore" or "README" file in the empty directories 
> during
> conversion still the most reasonable approach? 

A .gitignore will, but may cost 0.001% CPU time when running "git status".
I have seen systems that create a file called ".empty" in every directory.

> If so, is there a way to do this
> automatically during the conversion using "git svn" or the like?

Not what I am aware of.
And even if, the .empty files need to be removed, once the directory is
removed.
And we don't have logic for that, as far as I know.

That would be by suggestion:
convert the repo "as is", with the tools we have.

If you need to continue to work on an old branch (or the master branch),
add the .empty file, where needed, and continue to work in Git, on
a branch as usual.

> 
> 
> Thank you
> David
>

Re: [PATCH 0/10] towards clean leak-checker output

2017-09-05 Thread Martin Ågren

On 5 September 2017 at 15:01, Jeff King  wrote:
> Using leak-checking tools like valgrind or LSAN is usually not all that
> productive with Git, because the output is far from clean. And _most_ of
> these are just not interesting, as they're either:
>
>   1. Real leaks, but ones that only be triggered a small, set number of
>  times per program.
>
>   2. Intentional leaks of variables as the main cmd_* functions go out
>  of scope (as opposed to manually cleaning).
>
> I focused here on getting t and t0001 to run clean against a
> leak-checking tool. That's a fraction of the total test suite, but my
> goal was have a tractable sample which could let us see if the path
> seems sane.
>
> I feel positive overall about the approach this series takes. The
> suppressions aren't too terrible to write, and I found some real (albeit
> minor) leaks in these few tests. I hope it can serve as a base for an
> ongoing effort to get the whole test suite clean.
>
> A few things to note:
>
>   - I switched from valgrind to ASAN/LSAN early on. It's just way
> faster, which makes the compile-test-fix cycle a lot more pleasant.
> With these patches, you should be able to do:
>
>   make SANITIZE=leak && (cd t && ./t1234-* -v -i)
>
> and get a leak report for a single script dumped to stderr.
>
> If you want to try it with t, you'll need the lock-file series I
> just posted at:
>
>   
> https://public-inbox.org/git/20170905121353.62zg3mtextmq5...@sigill.intra.peff.net/
>
> and the fix from Martin at:
>
>   
> https://public-inbox.org/git/CAN0heSqn59sFF3A-eQ593G+ZDWnO9pKM5F=sgisqk+prur-...@mail.gmail.com/
>
> to get a clean run.
>
>   - I'm using LSAN instead of the full-on ASAN because it's faster. The
> docs warn that it's a bit experimental, and I did notice a few funny
> behaviors (like truncated backtraces), but overall it seems fine.
> You can also do:
>
>   make SANITIZE=address &&
> (cd t && ASAN_OPTIONS=abort_on_error=1 ./t1234-* -v -i)
>
> to do a full ASAN run (the extra environment setting is necessary to
> override test-lib's defaults).
>
>   - gcc's leak-checker doesn't seem to handle reachability correctly (or
> maybe I'm holding it wrong). As a simple case, building with ASAN
> and running git-init complains:
>
>   $ make SANITIZE=address && ./git init foo
>   ...
>   Direct leak of 2 byte(s) in 1 object(s) allocated from:
>   #0 0x7f011dc699e0 in malloc 
> (/usr/lib/x86_64-linux-gnu/libasan.so.4+0xd99e0)
>   #1 0x558eeedbdab5 in do_xmalloc /home/peff/compile/git/wrapper.c:60
>   #2 0x558eeedbdbab in do_xmallocz 
> /home/peff/compile/git/wrapper.c:100
>   #3 0x558eeedbdd0d in xmallocz /home/peff/compile/git/wrapper.c:108
>   #4 0x558eeedbdd0d in xmemdupz /home/peff/compile/git/wrapper.c:124
>   #5 0x558eeedbdd0d in xstrndup /home/peff/compile/git/wrapper.c:130
>   #6 0x558eeea0507a in main /home/peff/compile/git/common-main.c:39
>   #7 0x7f011cf612e0 in __libc_start_main 
> (/lib/x86_64-linux-gnu/libc.so.6+0x202e0)
>
> That line is the setting of argv0_path, which is a global (and thus
> shouldn't be marked as leaking). Interestingly, it only happens with
> -O2. Compiling with -O0 works fine. I'm not sure if it's a bug or
> what.
>
> I did most of my testing with clang-6.0, which gets this case right.

Hmmm, I got the same wrong results (IMHO) from Valgrind, which
classified this as "definitely lost". Like you I found that -O0 helped.
And yes, that was with gcc. Maybe gcc with optimization somehow manages
to hide the pointers from these tools. I know too little about the
technical details to have any real ideas, though. My searches did not
bring up anything useful. (gcc 5.4.0)

I guess clang will be my next attempt. And asan/lsan. Valgrind is slow..

I'll look through this series and get back if I have any input.

>   - I have no idea who close or far this is to covering the whole suite.
> Only 98 test scripts complete with no leaks. But many of those
> failures will be hitting the same leaks over and over. It looks like
> running "git show' hits a few, which is going to affect a lot of
> scripts. But bear in mind when reading the patches that this might
> be the first step on a successful road, or it could be a dead end. :)
>
> Most of this is actual leak fixes. The interesting part, I think, is the
> UNLEAK annotation in patch 10.
>
>   [01/10]: test-lib: --valgrind should not override --verbose-log
>   [02/10]: test-lib: set LSAN_OPTIONS to abort by default
>   [03/10]: add: free leaked pathspec after add_files_to_cache()
>   [04/10]: update-index: fix cache entry leak in add_one_file()
>   [05/10]: config: plug user_config leak
>   [06/10]: reset: make tree counting less confusing
>   [07/10]: reset: free allocated tree buffers
>   [08/10]: repository: free fields before overwriting

Re: [PATCH v3 2/3] refs/files-backend: fix memory leak in lock_ref_for_update

2017-09-05 Thread Martin Ågren

On 5 September 2017 at 10:47, Jeff King  wrote:
> On Tue, Aug 29, 2017 at 07:18:23PM +0200, Martin Ågren wrote:
>
>> After the previous patch, none of the functions we call hold on to
>> `referent.buf`, so we can safely release the string buffer before
>> returning.
>
> I ended up doing this a little differently in my version:
>
> diff --git a/refs/files-backend.c b/refs/files-backend.c
> index 9266f5ab9d..1d16c1b33e 100644
> --- a/refs/files-backend.c
> +++ b/refs/files-backend.c
> @@ -2287,9 +2292,12 @@ static int lock_ref_for_update(struct files_ref_store 
> *refs,
>  * the transaction, so we have to read it here
>  * to record and possibly check old_sha1:
>  */
> -   if (refs_read_ref_full(>base,
> -  referent.buf, 0,
> -  lock->old_oid.hash, NULL)) {
> +   ret = refs_read_ref_full(>base,
> +referent.buf, 0,
> +lock->old_oid.hash, NULL);
> +   strbuf_release();
> +
> +   if (ret) {
> if (update->flags & REF_HAVE_OLD) {
> strbuf_addf(err, "cannot lock ref 
> '%s': "
> "error reading reference",
> @@ -2310,6 +2318,7 @@ static int lock_ref_for_update(struct files_ref_store 
> *refs,
> ret = split_symref_update(refs, update,
>   referent.buf, transaction,
>   affected_refnames, err);
> +   strbuf_release();
> if (ret)
> return ret;
> }
>
> After we look at referent.buf once in each of the branch arms, we don't
> need it at all. Disposing of it there means we don't have to worry about
> it in all of the later early-returns.
>
> I'm assuming that referent will always be empty unless REF_ISSYMREF is
> set. Which seems reasonable, but I didn't double check.

Some time after I posted v3, I had the same thought. I did double-check
and it does hold. But then I thought that even if it holds now, maybe
it's a bit too brittle. On the other hand, my patch is quite noisy and
maybe the connection between the two is obvious enough that it will hold
in the future as well.

> Food for thought. I'd be OK with either version.

So would I...

Martin

Re: [PATCH v3 1/3] refs/files-backend: add longer-scoped copy of string to list

2017-09-05 Thread Martin Ågren

On 5 September 2017 at 12:02, Junio C Hamano  wrote:
> Martin Ågren  writes:
>
>> On 30 August 2017 at 04:52, Michael Haggerty  wrote:
>>> v3 looks good to me. Thanks!
>>>
>>> Reviewed-by: Michael Haggerty 
>>
>> Thank _you_ for very helpful feedback on the earlier versions.
>>
>> Martin
>
> Yes, the earlier attempt was sort-of barking up a wrong tree.
>
> Once we step back and observe other users of affected_refnames and
> realize that the list is meant to to point at a refname field of an
> existing instance of another structure by borrowing, the blame
> shifts from files_transaction_prepare() to the real culprit.
> Michael's review gave us a very good "let's step back a bit" that
> made the huge improvement between v1 and v2/v3.
>
> I wonder if we should be tightening the use of affected_refnames
> even further, though.
>
> It is may be sufficient to make sure that we do not throw anything
> that we would need to free as part of destroying affected_refnames
> into the string list, purely from the "leak prevention" perspective.
>
> But stepping back a bit, the reason why the string list exists in
> the first place is to make sure we do not touch the same ref twice
> in a single transaction, the set of all possible updates in the
> single transaction exists somewhere, and each of these updates know
> what ref it wants to update.
>
> And that is recorded in transaction->updates[]->refname no?
>
> So it seems to me that logically any and all string that is
> registered in affected_refnames list must be the refname field of
> a ref_update structure in the transaction.

I'm with you up to here.

> And from that point of view, doesn't split_head_update() wants a
> similar fix?  It attempts to insert "HEAD", makes sure it hasn't
> been inserted and then hangs a new update transaction as its util.
> It is not wrong per-se from purely leak-prevention point of view,
> as that "HEAD" is a literal string we woudn't even want to free,
> but from logical/"what each data means" point of view, it still
> feels wrong.

There is a "Special hack" comment related to this, and I don't feel
particularly confident that I could make any meaningful contribution in
this area. To be honest, I don't immediately see in which direction your
suggestion/idea/thought is going, which tells me I should not be making
a mess out of it. :-)

Martin

Re: [PATCH] hashmap: add API to disable item counting when threaded

2017-09-05 Thread Martin Ågren

On 5 September 2017 at 18:39, Jeff Hostetler  wrote:
>
>
> On 9/1/2017 7:50 PM, Jonathan Nieder wrote:
>>
>> Hi,
>>
>> Johannes Schindelin wrote:
>>>
>>> On Wed, 30 Aug 2017, Jeff Hostetler wrote:
>>
>>
 This is to address concerns raised by ThreadSanitizer on the mailing
 list about threaded unprotected R/W access to map.size with my previous
 "disallow rehash" change (0607e10009ee4e37cb49b4cec8d28a9dda1656a4).
>>
>>
>> Nice!
>>
>> What does the message from TSan look like?  (The full message doesn't
>> need to go in the commit message, but a snippet can help.)  How can I
>> reproduce it?
>
>
> I'll let Martin common on how to run TSan; I'm just going on
> what he reported in the "tsan: t3008..." message from the URL
> I quoted.  I didn't think to copy that text into the commit
> message because it is just stack traces and too long, but I
> could include a snippet.

I ran the test suite with ThreadSanitizer:

$ make SANITIZE=thread test

Any failures were then inspected:

$ cd t
$ ./t3008-ls-files-lazy-init-name-hash.sh --verbose

That can be done with or without ma/ts-cleanups. That series adds a file
.tsan-suppressions, which can be used by defining the environment
variable

TSAN_OPTIONS="suppressions=/some/absolute/path/.tsan-suppressions"

in order to suppress some findings which are indeed races, but which are
"not a problem in practice".

Martin

Re: [PATCH 17/34] mailinfo: release strbuf on error return in handle_boundary()

2017-09-05 Thread Martin Ågren

On 31 August 2017 at 19:21, René Scharfe  wrote:
> Am 30.08.2017 um 20:23 schrieb Martin Ågren:
>> On 30 August 2017 at 19:49, Rene Scharfe  wrote:
>>> Signed-off-by: Rene Scharfe 
>>> ---
>>>   mailinfo.c | 1 +
>>>   1 file changed, 1 insertion(+)
>>>
>>> diff --git a/mailinfo.c b/mailinfo.c
>>> index b1f5159546..f2387a3267 100644
>>> --- a/mailinfo.c
>>> +++ b/mailinfo.c
>>> @@ -911,48 +911,49 @@ static int find_boundary(struct mailinfo *mi, struct 
>>> strbuf *line)
>>>   static int handle_boundary(struct mailinfo *mi, struct strbuf *line)
>>>   {
>>>  struct strbuf newline = STRBUF_INIT;
>>>
>>>  strbuf_addch(, '\n');
>>>   again:
>>>  if (line->len >= (*(mi->content_top))->len + 2 &&
>>>  !memcmp(line->buf + (*(mi->content_top))->len, "--", 2)) {
>>>  /* we hit an end boundary */
>>>  /* pop the current boundary off the stack */
>>>  strbuf_release(*(mi->content_top));
>>>  FREE_AND_NULL(*(mi->content_top));
>>>
>>>  /* technically won't happen as is_multipart_boundary()
>>> will fail first.  But just in case..
>>>   */
>>>  if (--mi->content_top < mi->content) {
>>>  error("Detected mismatched boundaries, can't 
>>> recover");
>>>  mi->input_error = -1;
>>>  mi->content_top = mi->content;
>>> +   strbuf_release();
>>>  return 0;
>>>  }
>>
>> Since this code path can't be taken (or so it says): How did you find
>> this and the others? Static analysis? Grepping around?
>
> Code inspection: I looked for functions with STRBUF_INIT that return
> without calling strbuf_release() with "git grep -W STRBUF_INIT" and
> searching for return in less(1).

Thanks for sharing.

I read through this series and thought it looked good. One thing I like
is that it doesn't just blindly apply some standard solution in all
patches, but always tries to do what it feels is "right" for each
particular function.

Martin

Re: [PATCH] hashmap: add API to disable item counting when threaded

2017-09-05 Thread Jeff Hostetler




On 9/2/2017 4:05 AM, Jeff King wrote:

On Wed, Aug 30, 2017 at 06:59:22PM +, Jeff Hostetler wrote:


From: Jeff Hostetler 

This is to address concerns raised by ThreadSanitizer on the
mailing list about threaded unprotected R/W access to map.size with my previous
"disallow rehash" change (0607e10009ee4e37cb49b4cec8d28a9dda1656a4).  See:
https://public-inbox.org/git/adb37b70139fd1e2bac18bfd22c8b96683ae18eb.1502780344.git.martin.ag...@gmail.com/

Add API to hashmap to disable item counting and to disable automatic rehashing.
Also include APIs to re-enable item counting and automatica rehashing.


It may be worth summarizing the discussion at that thread here.

At first glance, it looks like this is woefully inadequate for allowing
multi-threaded access. But after digging, the issue is that we're really
trying to accommodate one specific callers which is doing its own
per-bucket locking, and which needs the internals of the hashmap to be
truly read-only.

I suspect the code might be easier to follow if that pattern were pushed
into its own threaded_hashmap that disabled the size and handled the
mod-n locking, but I don't insist on that as a blocker to this fix.


Agreed.  It would be better and easier to understand to produce
a thread-safe version of the hashmap code -- there are other uses
of the code that are currently doing their own locking that might
benefit, but I'm not looking to gratuitously refactor things.

The other issue was that we are multi-threaded during the
lazy-init phase, but the main status-or-whatever code that then
uses the hashmap is not.  So we only pay for the locking during
the multi-threaded usage.

Thanks,
Jeff

Re: [Bug] commit-tree shouldn't append an extra newline to commit messages

2017-09-05 Thread Jeff King

On Tue, Sep 05, 2017 at 12:57:21PM -0400, Ross Kabus wrote:

> On Tue, Sep 5, 2017 at 11:36 AM, Jeff King  wrote:
> 
> > So I'd argue that "git commit -F" is doing a reasonable
> > thing, and "commit-tree -F" should probably change to match it (because
> > it's inconsistent, and because if anything the plumbing commit-tree
> > should err more on the side of not touching its input than the porcelain
> > commit command).
> 
> I would agree that "commit-tree -F" should change to match the behavior of
> "git commit -F --cleanup=verbatim". I feel pretty strongly that this type of
> cleanup logic shouldn't be done in a plumbing command, though I'm not sure
> it is a big enough deal to change behavior/compatibility for everyone.

OK. Do you want to try your hand at a patch?

> Yup, confusion #2. I was using "-F -" which I see now is a different code
> path. Reading via stdin without "-F -" _is_ the verbatim option. This
> difference burned someone else on the mailing list as well. See:

Ah, OK, your confusion makes more sense now.

-Peff

Re: [Bug] commit-tree shouldn't append an extra newline to commit messages

2017-09-05 Thread Ross Kabus

On Tue, Sep 5, 2017 at 11:36 AM, Jeff King  wrote:

> So I'd argue that "git commit -F" is doing a reasonable
> thing, and "commit-tree -F" should probably change to match it (because
> it's inconsistent, and because if anything the plumbing commit-tree
> should err more on the side of not touching its input than the porcelain
> commit command).

I would agree that "commit-tree -F" should change to match the behavior of
"git commit -F --cleanup=verbatim". I feel pretty strongly that this type of
cleanup logic shouldn't be done in a plumbing command, though I'm not sure
it is a big enough deal to change behavior/compatibility for everyone.

>   $ tree=$(git write-tree)
>   $ commit=$(printf end | git commit-tree $tree)
>   $ git cat-file commit $commit | xxd
>   ...
>   0090: 3034 3030 0a0a 656e 64   0400..end

Yup, confusion #2. I was using "-F -" which I see now is a different code
path. Reading via stdin without "-F -" _is_ the verbatim option. This
difference burned someone else on the mailing list as well. See:

https://public-inbox.org/git/cacauox6zt0wxxclxk+sk0e4hgfd_a_ewwkvwntoxk0-hzw7...@mail.gmail.com/

Probably more reason that "-F" should be 'verbatim' by default.

~Ross

Re: [PATCH] hashmap: add API to disable item counting when threaded

2017-09-05 Thread Jeff Hostetler




On 9/2/2017 4:17 AM, Jeff King wrote:

On Sat, Sep 02, 2017 at 01:31:19AM +0200, Johannes Schindelin wrote:

Before anybody can ask for this message to be wrapped in _(...) to be
translateable, let me suggest instead to add the prefix "BUG: ".


Agreed on both (and Jonathan's suggestion to just use BUG()).


will do.  thanks.




+static inline void hashmap_enable_item_counting(struct hashmap *map)
+{
+   void *item;
+   unsigned int n = 0;
+   struct hashmap_iter iter;
+
+   hashmap_iter_init(map, );
+   while ((item = hashmap_iter_next()))
+   n++;
+
+   map->do_count_items = 1;
+   map->private_size = n;
+}


BTW this made me think that we may have a problem in our code since
switching from my original hashmap implementation to the bucket one added
in 6a364ced497 (add a hashtable implementation that supports O(1) removal,
2013-11-14): while it is not expected that there are many collisions, the
"grow_at" logic still essentially assumes the number of buckets to be
equal to the number of hashmap entries.


I'm confused about what the problem is. If I am reading the code
correctly, "size" is always the number of elements and "grow_at" is the
table size times a load factor. Those are the same numbers you'd use to
decide to grow in an open-address table.

It's true that this does not take into account the actual number of
collisions we see (or the average per bucket, or however you want to
count it). But generally nor do open-address schemes (and certainly our
other hash tables just use load factor to decide when to grow).

Am I missing something?

-Peff



Hashmap is not thread-safe by itself.  There are several uses of
it in a threaded context and they all handle their own locking
before accessing the hashmap.  Those usually work by locking the
whole hashmap.

My changes in "lazy-init-name-hash" deviated from that pattern
by locking on individual hash chains.  That is, n locks each
controlling 1/nth of the chains.

https://public-inbox.org/git/1490202865-31325-1-git-send-email-...@jeffhostetler.com/

To do that I had to disable automatic rehashing for the duration
of my threaded computation.  The problem that TSan identified is
that "size" is always incremented during inserts and it doesn't
have any locks protecting it.  So even though auto-rehash was
disabled, we are still counting the number of items in the map.
Not a terrible problem, but still a race.

Jeff

Re: [PATCH] hashmap: add API to disable item counting when threaded

2017-09-05 Thread Jeff Hostetler




On 9/1/2017 7:50 PM, Jonathan Nieder wrote:

Hi,

Johannes Schindelin wrote:

On Wed, 30 Aug 2017, Jeff Hostetler wrote:



This is to address concerns raised by ThreadSanitizer on the mailing
list about threaded unprotected R/W access to map.size with my previous
"disallow rehash" change (0607e10009ee4e37cb49b4cec8d28a9dda1656a4).


Nice!

What does the message from TSan look like?  (The full message doesn't
need to go in the commit message, but a snippet can help.)  How can I
reproduce it?


I'll let Martin common on how to run TSan; I'm just going on
what he reported in the "tsan: t3008..." message from the URL
I quoted.  I didn't think to copy that text into the commit
message because it is just stack traces and too long, but I
could include a snippet.

Thanks,
Jeff

Re: [PATCH] hashmap: add API to disable item counting when threaded

2017-09-05 Thread Jeff Hostetler




On 9/1/2017 7:31 PM, Johannes Schindelin wrote:

Hi Jeff,

On Wed, 30 Aug 2017, Jeff Hostetler wrote:


From: Jeff Hostetler 

This is to address concerns raised by ThreadSanitizer on the mailing
list about threaded unprotected R/W access to map.size with my previous
"disallow rehash" change (0607e10009ee4e37cb49b4cec8d28a9dda1656a4).
See:
https://public-inbox.org/git/adb37b70139fd1e2bac18bfd22c8b96683ae18eb.1502780344.git.martin.ag...@gmail.com/

Add API to hashmap to disable item counting and to disable automatic
rehashing.  Also include APIs to re-enable item counting and automatica
rehashing.

When item counting is disabled, the map.size field is invalid.  So to
prevent accidents, the field has been renamed and an accessor function
hashmap_get_size() has been added.  All direct references to this field
have been been updated.  And the name of the field changed to
map.private_size to communicate thie.

Signed-off-by: Jeff Hostetler 
---


The Git contribution process forces me to point out lines longer than 80
columns. I wish there was already an automated tool to fix that, but we
(as in "the core Git developers") have not yet managed to agree on one. So
I'll have to ask you to identify and fix them manually.


I'm not sure which lines you're talking about, but I'll
give it another scan and double check.

There's not much I can do about the public-inbox.org URL.




@@ -253,6 +253,19 @@ static inline void hashmap_entry_init(void *entry, 
unsigned int hash)
  }
  
  /*

+ * Return the number of items in the map.
+ */
+inline unsigned int hashmap_get_size(struct hashmap *map)
+{
+   if (map->do_count_items)
+   return map->private_size;
+
+   /* TODO Consider counting them and returning that. */


I'd rather not. If counting is disabled, it is disabled.


+   die("hashmap_get_size: size not set");


Before anybody can ask for this message to be wrapped in _(...) to be
translateable, let me suggest instead to add the prefix "BUG: ".


Good point.  Thanks.




+static inline void hashmap_enable_item_counting(struct hashmap *map)
+{
+   void *item;
+   unsigned int n = 0;
+   struct hashmap_iter iter;
+
+   hashmap_iter_init(map, );
+   while ((item = hashmap_iter_next()))
+   n++;
+
+   map->do_count_items = 1;
+   map->private_size = n;
+}


BTW this made me think that we may have a problem in our code since
switching from my original hashmap implementation to the bucket one added
in 6a364ced497 (add a hashtable implementation that supports O(1) removal,
2013-11-14): while it is not expected that there are many collisions, the
"grow_at" logic still essentially assumes the number of buckets to be
equal to the number of hashmap entries.

Your code simply reiterates that assumption, so I do not blame you for
anything here, nor ask you to change your patch.


I'm not sure what you're saying here.  The iterator iterates over
all entries (and handles walking collision chains), so my newly
computed count should be correct and all of this is independent of
the "grow-at" and table-size logic.

I'm not forcing a rehash when counting is enabled.  I'm just
reestablishing the expected state.  The next insert may cause
a rehash, but I'm not forcing it.

However, there is an assumption that the caller pre-allocated sufficient
table-size space to avoid poor performance for the duration of the
non-counting period.



But it does look a bit weird to assume so much about the nature of our
data, without having any real-life numbers. I wish I had more time so that
I could afford to run a couple of tests on this hashmap, such as: what is
the typical difference between bucket count and entry count, or the median
of the bucket sizes when the map is 80% full (i.e. *just* below the grow
threshold).


Personally, I think the 80% threshold is too aggressive (and the
default size is too small), but that's a different question.

The hashmap in question contains directory pathnames, so the
distribution will be completely dependent on the shape of the
data.

FWIW, I created a tool to dump some of this data.  See:
t/helper/test-lazy-init-name-hash.c




diff --git a/name-hash.c b/name-hash.c
index 0e10f3e..829ff59 100644
--- a/name-hash.c
+++ b/name-hash.c
@@ -580,9 +580,11 @@ static void lazy_init_name_hash(struct index_state *istate)
NULL, istate->cache_nr);
  
  	if (lookup_lazy_params(istate)) {

-   hashmap_disallow_rehash(>dir_hash, 1);
+   hashmap_disable_item_counting(>dir_hash);
+   hashmap_disable_auto_rehash(>dir_hash);
threaded_lazy_init_name_hash(istate);
-   hashmap_disallow_rehash(>dir_hash, 0);
+   hashmap_enable_auto_rehash(>dir_hash);
+   hashmap_enable_item_counting(>dir_hash);


By your rationale, it would be enough to simply disable and re-enable
counting...

The rest of the patch looks

Re: [Bug] commit-tree shouldn't append an extra newline to commit messages

2017-09-05 Thread Jeff King

On Tue, Sep 05, 2017 at 11:09:01AM -0400, Ross Kabus wrote:

> On Sat, Sep 2, 2017 at 4:33 AM, Jeff King  wrote:
> 
> > But I am confused by your "inconsistent with git commit porcelain"
> > comment. The porcelain git-commit definitely _does_ add a newline if one
> > isn't present (and in fact runs the whole thing through git-stripspace
> > to clean up whitespace oddities).
> 
> Ok I figured out my confusion. The repository I am working with did
> commits with "git commit --cleanup=verbatim" thus do not have a newline.
> This is why I thought there was an inconsistency.

OK, that makes more sense. Though I think even with verbatim, "-m" will
still complete a line (it happens in opt_parse_m):

  $ git commit -q --allow-empty -m foo
  $ git cat-file commit HEAD | xxd
  ...
  0090: 3034 3030 0a0a 666f 6f0a 0400..foo.
  ^^
  newline

That makes sense, since the idea of "-m" is to take a one-liner, but you
would not typically provide the newline yourself via the shell.

It looks like "-F" does not do an explicit complete-line (so it would
depend on --cleanup to do it normally, but in verbatim mode would not).
That makes sense to me, as well (in a full file, usually you _would_
have the newline, or choose to omit it intentionally, and we should
respect that). So I'd argue that "git commit -F" is doing a reasonable
thing, and "commit-tree -F" should probably change to match it (because
it's inconsistent, and because if anything the plumbing commit-tree
should err more on the side of not touching its input than the porcelain
commit command).

> > So I don't think "inconsistent with git-commit" is a compelling
> > argument, unless I'm missing something.
> 
> Agreed, but now I guess I would argue that it is inconsistent because
> it lacks a "verbatim" option like git-commit has. I would like to see
> an argument like this for commit-tree but a clean way to add this option
> didn't immediately jump out at me.

I think the default behavior of reading via stdin _is_ the verbatim
option. And you can choose whether or not to pipe it through
git-stripspace to do more cleanup (and in fact, when git-commit was
implemented as a shell script long ago, that's exactly how it was
implemented). But...

> > And definitely it does not when taking the message in via stdin.
> 
> I'm not seeing this, I see commit-tree as adding a new line even via
> stdin (and the code seems to corroborate this), unless I missed something.

I'm not sure why you're seeing that. It seems verbatim to me:

  $ tree=$(git write-tree)
  $ commit=$(printf end | git commit-tree $tree)
  $ git cat-file commit $commit | xxd
  ...
  0090: 3034 3030 0a0a 656e 64   0400..end

-Peff

Re: [Bug] commit-tree shouldn't append an extra newline to commit messages

2017-09-05 Thread Ross Kabus

On Sat, Sep 2, 2017 at 4:33 AM, Jeff King  wrote:

> But I am confused by your "inconsistent with git commit porcelain"
> comment. The porcelain git-commit definitely _does_ add a newline if one
> isn't present (and in fact runs the whole thing through git-stripspace
> to clean up whitespace oddities).

Ok I figured out my confusion. The repository I am working with did
commits with "git commit --cleanup=verbatim" thus do not have a newline.
This is why I thought there was an inconsistency.

> So I don't think "inconsistent with git-commit" is a compelling
> argument, unless I'm missing something.

Agreed, but now I guess I would argue that it is inconsistent because
it lacks a "verbatim" option like git-commit has. I would like to see
an argument like this for commit-tree but a clean way to add this option
didn't immediately jump out at me.

> And definitely it does not when taking the message in via stdin.

I'm not seeing this, I see commit-tree as adding a new line even via
stdin (and the code seems to corroborate this), unless I missed something.

~Ross

Empty directories in Git: Current guidance for historical commits?

2017-09-05 Thread wafflecode



Hello,

Per the FAQ it's clear that Git (by current design) does not track  
empty directories:   
https://git.wiki.kernel.org/index.php/GitFaq#Can_I_add_empty_directories.3F


That's fine and I can fix that for my code going forward via the build  
system, as needed.


However I'm looking to import a fairly deep (10k+ commits) svn history  
in to Git and I'm wondering what the current best practices are for  
dealing with empty folders?   Meaning to say, I'd like to be able to  
check out an older revision of the code and have it build correctly  
using the older build system which expects certain folders to exist.


There are a few different answers floating around the net, so I  
figured I'd confirm via the mailing list.


Is just dropping a ".gitignore" or "README" file in the empty  
directories during conversion still the most reasonable approach?  If  
so, is there a way to do this automatically during the conversion  
using "git svn" or the like?



Thank you
David



-

ONLY AT VFEmail! - Use our Metadata Mitigator to keep your email out of the 
NSA's hands!
$24.95 ONETIME Lifetime accounts with Privacy Features!  
15GB disk! No bandwidth quotas!
Commercial and Bulk Mail Options!

Re: signing commits using gpg2

2017-09-05 Thread Michael J Gruber

shawn wilson venit, vidit, dixit 02.09.2017 23:11:
> tl;dr - how do I get git to use gpg2 to sign things?
> 
> I'm using gpg2 (so no agent options are configured but an agent is
> running) which is configured w/ a Nitrokey (Pro if it matters):
> 
>  % git commit -m "Initial."
> 
>  gits/bash-libs (master ⚡) localhost
> gpg: detected reader `Nitrokey Nitrokey Pro (3467) 00 00'
> gpg: pcsc_connect failed: sharing violation (0x801b)
> gpg: apdu_send_simple(0) failed: locking failed
> Please insert the card and hit return or enter 'c' to cancel:
> gpg: pcsc_connect failed: sharing violation (0x801b)
> gpg: pcsc_connect failed: sharing violation (0x801b)
> gpg: apdu_send_simple(0) failed: locking failed
> Please insert the card and hit return or enter 'c' to cancel: c
> gpg: selecting openpgp failed: general error
> gpg: signing failed: general error
> gpg: signing failed: general error
> error: gpg failed to sign the data
> fatal: failed to write commit object
> 
> This works with gpg and ssh:

Not really...

>  % touch foo
> 
>  ~ localhost
>  % gpg2 --sign foo

... because you're using gpg2, not gpg.

> 
>  ~ localhost
> gpg: using "846FF490" as default secret key for signing
>  % cat foo*
> 
>  ~ localhost
> -BEGIN PGP MESSAGE-
> Version: GnuPG v2
> 
> owEBuQFG/pANAwAKAYwdY7SEb/SQAcsJYgNmb29ZqxfviQGcBAABCgAGBQJZqxfv
> AAoJEIwdY7SEb/SQAcEL/jonw+HymnlmfebtEwlvfx2Gl1Sbuw0xWWPpQ2Dtjljz
> HtpD+LWczjpOSMTHFNK9xPR2kcs1WNY+mO8M45QI7iDgFkKRzaxEqeNUJkoyF/+I
> 81VMmXDQMXFs4+8jy00b+UxTdvwdXaHMsOtu+6YCtmCR5Bzohg07ADsnXnGGn3Sd
> WTjVMzV6Dlh8LRF+coGJ8JuErBsRAI6vdNgJRVHYBULGNXci4uF/4a+58uiTL4/U
> PvC4ruXCNxCKi89nMERhwlnOvglseX3TDR5ldrc4Hzb+pLsj/l6N4sBW0Zmb8UcE
> 9BG3WjOs4eZvnLmk5XHrwisD2CXuHvyWMl0yH7LTrg+m4Itj0PJ4Px4H9E5t/zfs
> C1vcB/okcigeIyXnO06um02a5oZAYOKadB+6NRnBjULz5GvP2yxj/AO1VPmZprpt
> budMuHZcA0zNE3uBmcnQY5+1tdkyTrlTxsL58lQrn/U3wvgah3AXMEvjRGqbYWHj
> jDikQVJ7ESoevNqlfLPj8Q==
> =hV6v
> -END PGP MESSAGE-
> 
> However, if I try this w/ the old gpg:
> 
>  % gpg -ae -o foo.gpg foo
> 
>  ~ localhost
>  % gpg -d foo.gpg
> 
>  ~ localhost
> gpg: detected reader `Nitrokey Nitrokey Pro (3467) 00 00'
> gpg: pcsc_connect failed: sharing violation (0x801b)
> gpg: apdu_send_simple(0) failed: locking failed
> Please insert the card and hit return or enter 'c' to cancel: c
> gpg: selecting openpgp failed: general error
> gpg: encrypted with 3072-bit RSA key, ID 41826CFB, created 2017-03-13
>   "Shawn Wilson "
> gpg: public key decryption failed: general error
> gpg: decryption failed: secret key not available
>  % gpg2 -d foo.gpg
> 
>  ~ localhost
> gpg: encrypted with 3072-bit RSA key, ID E27FA0B841826CFB, created 2017-03-13
>   "Shawn Wilson "
> foo
> 
> (yeah I added data to the file)
> 
> And just to prove basic competency checking:
> 
>  % git config --global -l | grep sign
> 
>  ~ localhost
> user.signingkey=846FF490
> filter.gitconfig-rmuser.clean=sed -e "s/^\( *email =\).*/\1  address>/" -e "s/^\( *name =\).*/\1 /" -e "s/^\(
> *signingkey =\).*/\1 /"
> filter.gitconfig-rmuser.smudge=egrep "^ *(email|name|signingkey) = "
> commit.gpgsign=true
> 

So, gpg2 works and gpg does not. This is typical for the way in which
the gpg upgrade path is broken, and your distro installs gpg because it
still relies on it.

git sees two executables gpg and gpg2 and uses the first, so as to not
migrate your secrete key store inadvertently.

Short answer: Use

git config --global gpg.program gpg2

to make git use gpg2 which apparantly is your working gnupg setup.

Michael

Re: Donation

2017-09-05 Thread Mavis Wanczyk Foundation

Greetings To You,

My Name is Mavis wanczyk , the winner of the Power ball jackpot of $ $758.7 
million  in the AUGUST 24, 2017, My jackpot was a gift from God to me hence my 
Entire family/foundation has AGREED to do this. My foundation is donating 
$500,000.00USD to you. please contac maviswanczyk...@gmail.com for full details 
and please accept this token as a gift from me and my family.

Read more: 
http://money.cnn.com/2017/08/23/news/powerball-700-million-jackpot/index.html

 Best Regards,
 Mavis Wanczyk

[PATCH 08/10] repository: free fields before overwriting them

2017-09-05 Thread Jeff King

It's possible that the repository data may be initialized
twice (e.g., after doing a chdir() to the top of the
worktree we may have to adjust a relative git_dir path). We
should free() any existing fields before assigning to them
to avoid leaks.

This should be safe, as the fields are set based on the
environment or on other strings like the gitdir or
commondir. That makes it impossible that we are feeding an
alias to the just-freed string.

Signed-off-by: Jeff King 
---
 environment.c | 4 +++-
 repository.c  | 4 
 2 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/environment.c b/environment.c
index 3fd4b10845..f1f934b6fd 100644
--- a/environment.c
+++ b/environment.c
@@ -97,7 +97,7 @@ int ignore_untracked_cache_config;
 /* This is set by setup_git_dir_gently() and/or git_default_config() */
 char *git_work_tree_cfg;
 
-static const char *namespace;
+static char *namespace;
 
 static const char *super_prefix;
 
@@ -152,8 +152,10 @@ void setup_git_env(void)
if (getenv(NO_REPLACE_OBJECTS_ENVIRONMENT))
check_replace_refs = 0;
replace_ref_base = getenv(GIT_REPLACE_REF_BASE_ENVIRONMENT);
+   free(git_replace_ref_base);
git_replace_ref_base = xstrdup(replace_ref_base ? replace_ref_base
  : "refs/replace/");
+   free(namespace);
namespace = expand_namespace(getenv(GIT_NAMESPACE_ENVIRONMENT));
shallow_file = getenv(GIT_SHALLOW_FILE_ENVIRONMENT);
if (shallow_file)
diff --git a/repository.c b/repository.c
index f107af7d76..52f1821c6b 100644
--- a/repository.c
+++ b/repository.c
@@ -40,11 +40,15 @@ static void repo_setup_env(struct repository *repo)
 
repo->different_commondir = find_common_dir(, repo->gitdir,
!repo->ignore_env);
+   free(repo->commondir);
repo->commondir = strbuf_detach(, NULL);
+   free(repo->objectdir);
repo->objectdir = git_path_from_env(DB_ENVIRONMENT, repo->commondir,
"objects", !repo->ignore_env);
+   free(repo->graft_file);
repo->graft_file = git_path_from_env(GRAFT_ENVIRONMENT, repo->commondir,
 "info/grafts", !repo->ignore_env);
+   free(repo->index_file);
repo->index_file = git_path_from_env(INDEX_ENVIRONMENT, repo->gitdir,
 "index", !repo->ignore_env);
 }
-- 
2.14.1.721.gc5bc1565f1

[PATCH 07/10] reset: free allocated tree buffers

2017-09-05 Thread Jeff King

We read the tree objects with fill_tree_descriptor(), but
never actually free the resulting buffers, causing a memory
leak. This isn't a huge deal because we call this code at
most twice per program invocation. But it does potentially
double our heap usage if you have large root trees. Let's
free the trees before returning.


Signed-off-by: Jeff King 
---
Actually, in the two-call case we repeat one of the trees,
so we _could_ in theory stash it away. But we don't do that
currently, and it's probably not worth the effort. My main
motivation is just shutting up the leak checker so I can
find actual important leaks.

 builtin/reset.c | 20 ++--
 1 file changed, 14 insertions(+), 6 deletions(-)

diff --git a/builtin/reset.c b/builtin/reset.c
index 34839db8ca..9e22b4a1dc 100644
--- a/builtin/reset.c
+++ b/builtin/reset.c
@@ -44,10 +44,11 @@ static inline int is_merge(void)
 
 static int reset_index(const struct object_id *oid, int reset_type, int quiet)
 {
-   int nr = 0;
+   int i, nr = 0;
struct tree_desc desc[2];
struct tree *tree;
struct unpack_trees_options opts;
+   int ret = -1;
 
memset(, 0, sizeof(opts));
opts.head_idx = 1;
@@ -81,19 +82,26 @@ static int reset_index(const struct object_id *oid, int 
reset_type, int quiet)
opts.fn = twoway_merge;
}
 
-   if (!fill_tree_descriptor(desc + nr, oid))
-   return error(_("Failed to find tree of %s."), oid_to_hex(oid));
+   if (!fill_tree_descriptor(desc + nr, oid)) {
+   error(_("Failed to find tree of %s."), oid_to_hex(oid));
+   goto out;
+   }
nr++;
 
-   if (unpack_trees(nr, desc, ))
-   return -1;
+   if (unpack_trees(nr, desc, )) 
+   goto out;
 
if (reset_type == MIXED || reset_type == HARD) {
tree = parse_tree_indirect(oid);
prime_cache_tree(_index, tree);
}
 
-   return 0;
+   ret = 0;
+
+out:
+   for (i = 0; i < nr; i++)
+   free((void *)desc[i].buffer);
+   return ret;
 }
 
 static void print_new_head_line(struct commit *commit)
-- 
2.14.1.721.gc5bc1565f1

[PATCH 09/10] set_git_dir: handle feeding gitdir to itself

2017-09-05 Thread Jeff King

Ideally we'd free the existing gitdir field before assigning
the new one, to avoid a memory leak. But we can't do so
safely because some callers do the equivalent of:

  set_git_dir(get_git_dir());

We can detect that case as a noop, but there are even more
complicated cases like:

  set_git_dir(remove_leading_path(worktree, get_git_dir());

where we really do need to do some work, but the original
string must remain valid.

Rather than put the burden on callers to make a copy of the
string (only to free it later, since we'll make a copy of it
ourselves), let's solve the problem inside set_git_dir(). We
can make a copy of the pointer for the old gitdir, and then
avoid freeing it until after we've made our new copy.

Signed-off-by: Jeff King 
---
 repository.c | 10 +++---
 setup.c  |  5 -
 2 files changed, 3 insertions(+), 12 deletions(-)

diff --git a/repository.c b/repository.c
index 52f1821c6b..97c732bd48 100644
--- a/repository.c
+++ b/repository.c
@@ -56,16 +56,12 @@ static void repo_setup_env(struct repository *repo)
 void repo_set_gitdir(struct repository *repo, const char *path)
 {
const char *gitfile = read_gitfile(path);
+   char *old_gitdir = repo->gitdir;
 
-   /*
-* NEEDSWORK: Eventually we want to be able to free gitdir and the rest
-* of the environment before reinitializing it again, but we have some
-* crazy code paths where we try to set gitdir with the current gitdir
-* and we don't want to free gitdir before copying the passed in value.
-*/
repo->gitdir = xstrdup(gitfile ? gitfile : path);
-
repo_setup_env(repo);
+
+   free(old_gitdir);
 }
 
 /*
diff --git a/setup.c b/setup.c
index 23950173fc..6d8380acd2 100644
--- a/setup.c
+++ b/setup.c
@@ -399,11 +399,6 @@ void setup_work_tree(void)
if (getenv(GIT_WORK_TREE_ENVIRONMENT))
setenv(GIT_WORK_TREE_ENVIRONMENT, ".", 1);
 
-   /*
-* NEEDSWORK: this call can essentially be set_git_dir(get_git_dir())
-* which can cause some problems when trying to free the old value of
-* gitdir.
-*/
set_git_dir(remove_leading_path(git_dir, work_tree));
initialized = 1;
 }
-- 
2.14.1.721.gc5bc1565f1

[PATCH 10/10] add UNLEAK annotation for reducing leak false positives

2017-09-05 Thread Jeff King

It's a common pattern in git commands to allocate some
memory that should last for the lifetime of the program and
then not bother to free it, relying on the OS to throw it
away.

This keeps the code simple, and it's fast (we don't waste
time traversing structures or calling free at the end of the
program). But it also triggers warnings from memory-leak
checkers like valgrind or LSAN. They know that the memory
was still allocated at program exit, but they don't know
_when_ the leaked memory stopped being useful. If it was
early in the program, then it's probably a real and
important leak. But if it was used right up until program
exit, it's not an interesting leak and we'd like to suppress
it so that we can see the real leaks.

This patch introduces an UNLEAK() macro that lets us do so.
To understand its design, let's first look at some of the
alternatives.

Unfortunately the suppression systems offered by
leak-checking tools don't quite do what we want. A
leak-checker basically knows two things:

  1. Which blocks were allocated via malloc, and the
 callstack during the allocation.

  2. Which blocks were left un-freed at the end of the
 program (and which are unreachable, but more on that
 later).

Their suppressions work by mentioning the function or
callstack of a particular allocation, and marking it as OK
to leak.  So imagine you have code like this:

  int main(void)
  {
/* this allocates some memory */
char *p = some_function();
printf("%s", p);
return 0;
  }

You can say "ignore allocations from some_function(),
they're not leaks". But that's not right. That function may
be called elsewhere, too, and we would potentially want to
know about those leaks.

So you can say "ignore the callstack when main calls
some_function".  That works, but your annotations are
brittle. In this case it's only two functions, but you can
imagine that the actual allocation is much deeper. If any of
the intermediate code changes, you have to update the
suppression.

What we _really_ want to say is that "the value assigned to
p at the end of the function is not a real leak". But
leak-checkers can't understand that; they don't know about
"p" in the first place.

However, we can do something a little bit tricky if we make
some assumptions about how leak-checkers work. They
generally don't just report all un-freed blocks. That would
report even globals which are still accessible when the
leak-check is run.  Instead they take some set of memory
(like BSS) as a root and mark it as "reachable". Then they
scan the reachable blocks for anything that looks like a
pointer to a malloc'd block, and consider that block
reachable. And then they scan those blocks, and so on,
transitively marking anything reachable from a global as
"not leaked" (or at least leaked in a different category).

So we can mark the value of "p" as reachable by putting it
into a variable with program lifetime. One way to do that is
to just mark "p" as static. But that actually affects the
run-time behavior if the function is called twice (you
aren't likely to call main() twice, but some of our cmd_*()
functions are called from other commands).

Instead, we can trick the leak-checker by putting the value
into _any_ reachable bytes. This patch keeps a global
linked-list of bytes copied from "unleaked" variables. That
list is reachable even at program exit, which confers
recursive reachability on whatever values we unleak.

In other words, you can do:

  int main(void)
  {
char *p = some_function();
printf("%s", p);
UNLEAK(p);
return 0;
  }

to annotate "p" and suppress the leak report.

But wait, couldn't we just say "free(p)"? In this toy
example, yes. But using UNLEAK() has several advantages over
actually freeing the memory:

  1. It can be compiled conditionally. There's no need in
 normal runs to do this free(), and it just wastes time.
 By using a macro, we can get the benefit for leak-check
 builds with zero cost for normal builds (this patch
 uses a compile-time check, though we could clearly also
 make it a run-time check at very low cost).

 Of course one could also hide free() behind a macro, so
 this is really just arguing for having UNLEAK(), not
 for its particular implementation.

  2. It's recursive across structures. In many cases our "p"
 is not just a pointer, but a complex struct whose
 fields may have been allocated by a sub-function. And
 in some cases (e.g., dir_struct) we don't even have a
 function which knows how to free all of the struct
 members.

 By marking the struct itself as reachable, that confers
 reachability on any pointers it contains (including those
 found in embedded structs, or reachable by walking
 heap blocks recursively.

  3. It works on cases where we're not sure if the value is
 allocated or not. For example:

   char *p = argc > 1 ? argv[1] : some_function();

[PATCH 06/10] reset: make tree counting less confusing

2017-09-05 Thread Jeff King

Depending on whether we're in --keep mode, git-reset may
feed one or two trees to unpack_trees(). We start a counter
at "1" and then increment it to "2" only for the two-tree
case. But that means we must always subtract one to find the
correct array slot to fill with each descriptor.

Instead, let's start at "0" and just increment our counter
after adding each tree. This skips the extra subtraction,
and will make things much easier when we start to actually
free our tree buffers.

While we're at it, let's make the first allocation use the
slot at "desc + nr", too, even though we know "nr" is 0 at
that point. It makes the two fill_tree_descriptor() calls
consistent (always "desc + nr", followed by always
incrementing "nr").

Signed-off-by: Jeff King 
---
 builtin/reset.c | 8 +---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/builtin/reset.c b/builtin/reset.c
index f1af9345e4..34839db8ca 100644
--- a/builtin/reset.c
+++ b/builtin/reset.c
@@ -44,7 +44,7 @@ static inline int is_merge(void)
 
 static int reset_index(const struct object_id *oid, int reset_type, int quiet)
 {
-   int nr = 1;
+   int nr = 0;
struct tree_desc desc[2];
struct tree *tree;
struct unpack_trees_options opts;
@@ -75,14 +75,16 @@ static int reset_index(const struct object_id *oid, int 
reset_type, int quiet)
struct object_id head_oid;
if (get_oid("HEAD", _oid))
return error(_("You do not have a valid HEAD."));
-   if (!fill_tree_descriptor(desc, _oid))
+   if (!fill_tree_descriptor(desc + nr, _oid))
return error(_("Failed to find tree of HEAD."));
nr++;
opts.fn = twoway_merge;
}
 
-   if (!fill_tree_descriptor(desc + nr - 1, oid))
+   if (!fill_tree_descriptor(desc + nr, oid))
return error(_("Failed to find tree of %s."), oid_to_hex(oid));
+   nr++;
+
if (unpack_trees(nr, desc, ))
return -1;
 
-- 
2.14.1.721.gc5bc1565f1

[PATCH 05/10] config: plug user_config leak

2017-09-05 Thread Jeff King

We generate filenames for the user_config ("~/.gitconfig")
and the xdg config ("$XDG_CONFIG_HOME/git/config") and then
decide which to use by looking at the filesystem. But after
selecting one, the unused string is just leaked.

This is a tiny leak, but it creates noise in leak-checker
output.

Signed-off-by: Jeff King 
---
 builtin/config.c | 7 +--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/builtin/config.c b/builtin/config.c
index 70ff231e9c..52a4606243 100644
--- a/builtin/config.c
+++ b/builtin/config.c
@@ -518,10 +518,13 @@ int cmd_config(int argc, const char **argv, const char 
*prefix)
die("$HOME not set");
 
if (access_or_warn(user_config, R_OK, 0) &&
-   xdg_config && !access_or_warn(xdg_config, R_OK, 0))
+   xdg_config && !access_or_warn(xdg_config, R_OK, 0)) {
given_config_source.file = xdg_config;
-   else
+   free(user_config);
+   } else {
given_config_source.file = user_config;
+   free(xdg_config);
+   }
}
else if (use_system_config)
given_config_source.file = git_etc_gitconfig();
-- 
2.14.1.721.gc5bc1565f1

[PATCH 04/10] update-index: fix cache entry leak in add_one_file()

2017-09-05 Thread Jeff King

When we fail to add the cache entry to the index, we end up
just leaking the struct. We should follow the pattern of the
early-return above and free it.

Signed-off-by: Jeff King 
---
 builtin/update-index.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/builtin/update-index.c b/builtin/update-index.c
index 655e6d60d3..bf7420b808 100644
--- a/builtin/update-index.c
+++ b/builtin/update-index.c
@@ -287,8 +287,10 @@ static int add_one_path(const struct cache_entry *old, 
const char *path, int len
}
option = allow_add ? ADD_CACHE_OK_TO_ADD : 0;
option |= allow_replace ? ADD_CACHE_OK_TO_REPLACE : 0;
-   if (add_cache_entry(ce, option))
+   if (add_cache_entry(ce, option)) {
+   free(ce);
return error("%s: cannot add to the index - missing --add 
option?", path);
+   }
return 0;
 }
 
-- 
2.14.1.721.gc5bc1565f1

[PATCH 03/10] add: free leaked pathspec after add_files_to_cache()

2017-09-05 Thread Jeff King

After run_diff_files, we throw away the rev_info struct,
including the pathspec that we copied into it, leaking the
memory. this is probably not a big deal in practice. We
usually only run this once per process, and the leak is
proportional to the pathspec list we're already holding in
memory.

But it's still a leak, and it pollutes leak-checker output,
making it harder to find important leaks.

Signed-off-by: Jeff King 
---
 builtin/add.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/builtin/add.c b/builtin/add.c
index c20548e4f5..ef625e3fb8 100644
--- a/builtin/add.c
+++ b/builtin/add.c
@@ -119,6 +119,7 @@ int add_files_to_cache(const char *prefix,
rev.diffopt.flags |= DIFF_OPT_OVERRIDE_SUBMODULE_CONFIG;
rev.max_count = 0; /* do not compare unmerged paths with stage #2 */
run_diff_files(, DIFF_RACY_IS_MODIFIED);
+   clear_pathspec(_data);
return !!data.add_errors;
 }
 
-- 
2.14.1.721.gc5bc1565f1

[PATCH 02/10] test-lib: set LSAN_OPTIONS to abort by default

2017-09-05 Thread Jeff King

We already set ASAN_OPTIONS to abort if it finds any errors.
As we start to experiment with LSAN, the leak sanitizer,
it's convenient if we give it the same treatment.

Note that ASAN is actually a superset of LSAN and can do the
leak detection itself. So this only has an effect if you
specifically build with "make SANITIZE=leak" (leak detection
but not the rest of ASAN). Building with just LSAN results
in a build that runs much faster. That makes the
build-test-fix cycle more pleasant.

In the long run, once we've fixed or suppressed all the
leaks, it will probably be worth turning leak-detection on
for ASAN and just using that (to check both leaks _and_
memory errors in a single test run). But there's still a lot
of work before we get there.

Signed-off-by: Jeff King 
---
 t/test-lib.sh | 5 +
 1 file changed, 5 insertions(+)

diff --git a/t/test-lib.sh b/t/test-lib.sh
index 62461a6e35..a738540ef2 100644
--- a/t/test-lib.sh
+++ b/t/test-lib.sh
@@ -44,6 +44,11 @@ GIT_BUILD_DIR="$TEST_DIRECTORY"/..
 : ${ASAN_OPTIONS=detect_leaks=0:abort_on_error=1}
 export ASAN_OPTIONS
 
+# If LSAN is in effect we _do_ want leak checking, but we still
+# want to abort so that we notice the problems.
+: ${LSAN_OPTIONS=abort_on_error=1}
+export LSAN_OPTIONS
+
 
 # It appears that people try to run tests without building...
 "$GIT_BUILD_DIR/git" >/dev/null
-- 
2.14.1.721.gc5bc1565f1

[PATCH 01/10] test-lib: --valgrind should not override --verbose-log

2017-09-05 Thread Jeff King

The --verbose test option cannot be used with test harnesses
like "prove". Instead, you must use --verbose-log.

Since the --valgrind option implies --verbose, that means
that it cannot be used with prove. I.e., this does not work:

  prove t-basic.sh :: --valgrind

You'd think it could be fixed by doing:

  prove t-basic.sh :: --valgrind --verbose-log

but that doesn't work either, because the implied --verbose
takes precedence over --verbose-log. If the user has given
us a specific option, we should prefer that.

Signed-off-by: Jeff King 
---
I ended up not using valgrind for most of my tests, but this did bite me
early on when I tried to run "make GIT_TEST_OPTS=--valgrind prove".

 t/test-lib.sh | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/t/test-lib.sh b/t/test-lib.sh
index 5fbd8d4a90..62461a6e35 100644
--- a/t/test-lib.sh
+++ b/t/test-lib.sh
@@ -274,7 +274,7 @@ then
test -z "$verbose" && verbose_only="$valgrind_only"
 elif test -n "$valgrind"
 then
-   verbose=t
+   test -z "$verbose_log" && verbose=t
 fi
 
 if test -n "$color"
-- 
2.14.1.721.gc5bc1565f1

[PATCH 0/10] towards clean leak-checker output

2017-09-05 Thread Jeff King

Using leak-checking tools like valgrind or LSAN is usually not all that
productive with Git, because the output is far from clean. And _most_ of
these are just not interesting, as they're either:

  1. Real leaks, but ones that only be triggered a small, set number of
 times per program.

  2. Intentional leaks of variables as the main cmd_* functions go out
 of scope (as opposed to manually cleaning).

I focused here on getting t and t0001 to run clean against a
leak-checking tool. That's a fraction of the total test suite, but my
goal was have a tractable sample which could let us see if the path
seems sane.

I feel positive overall about the approach this series takes. The
suppressions aren't too terrible to write, and I found some real (albeit
minor) leaks in these few tests. I hope it can serve as a base for an
ongoing effort to get the whole test suite clean.

A few things to note:

  - I switched from valgrind to ASAN/LSAN early on. It's just way
faster, which makes the compile-test-fix cycle a lot more pleasant.
With these patches, you should be able to do:

  make SANITIZE=leak && (cd t && ./t1234-* -v -i)

and get a leak report for a single script dumped to stderr.

If you want to try it with t, you'll need the lock-file series I
just posted at:

  
https://public-inbox.org/git/20170905121353.62zg3mtextmq5...@sigill.intra.peff.net/

and the fix from Martin at:

  
https://public-inbox.org/git/CAN0heSqn59sFF3A-eQ593G+ZDWnO9pKM5F=sgisqk+prur-...@mail.gmail.com/

to get a clean run.

  - I'm using LSAN instead of the full-on ASAN because it's faster. The
docs warn that it's a bit experimental, and I did notice a few funny
behaviors (like truncated backtraces), but overall it seems fine.
You can also do:

  make SANITIZE=address &&
(cd t && ASAN_OPTIONS=abort_on_error=1 ./t1234-* -v -i)

to do a full ASAN run (the extra environment setting is necessary to
override test-lib's defaults).

  - gcc's leak-checker doesn't seem to handle reachability correctly (or
maybe I'm holding it wrong). As a simple case, building with ASAN
and running git-init complains:

  $ make SANITIZE=address && ./git init foo
  ...
  Direct leak of 2 byte(s) in 1 object(s) allocated from:
  #0 0x7f011dc699e0 in malloc 
(/usr/lib/x86_64-linux-gnu/libasan.so.4+0xd99e0)
  #1 0x558eeedbdab5 in do_xmalloc /home/peff/compile/git/wrapper.c:60
  #2 0x558eeedbdbab in do_xmallocz /home/peff/compile/git/wrapper.c:100
  #3 0x558eeedbdd0d in xmallocz /home/peff/compile/git/wrapper.c:108
  #4 0x558eeedbdd0d in xmemdupz /home/peff/compile/git/wrapper.c:124
  #5 0x558eeedbdd0d in xstrndup /home/peff/compile/git/wrapper.c:130
  #6 0x558eeea0507a in main /home/peff/compile/git/common-main.c:39
  #7 0x7f011cf612e0 in __libc_start_main 
(/lib/x86_64-linux-gnu/libc.so.6+0x202e0)

That line is the setting of argv0_path, which is a global (and thus
shouldn't be marked as leaking). Interestingly, it only happens with
-O2. Compiling with -O0 works fine. I'm not sure if it's a bug or
what.

I did most of my testing with clang-6.0, which gets this case right.

  - I have no idea who close or far this is to covering the whole suite.
Only 98 test scripts complete with no leaks. But many of those
failures will be hitting the same leaks over and over. It looks like
running "git show' hits a few, which is going to affect a lot of
scripts. But bear in mind when reading the patches that this might
be the first step on a successful road, or it could be a dead end. :)

Most of this is actual leak fixes. The interesting part, I think, is the
UNLEAK annotation in patch 10.

  [01/10]: test-lib: --valgrind should not override --verbose-log
  [02/10]: test-lib: set LSAN_OPTIONS to abort by default
  [03/10]: add: free leaked pathspec after add_files_to_cache()
  [04/10]: update-index: fix cache entry leak in add_one_file()
  [05/10]: config: plug user_config leak
  [06/10]: reset: make tree counting less confusing
  [07/10]: reset: free allocated tree buffers
  [08/10]: repository: free fields before overwriting them
  [09/10]: set_git_dir: handle feeding gitdir to itself
  [10/10]: add UNLEAK annotation for reducing leak false positives

 Makefile   |  3 +++
 builtin/add.c  |  3 +++
 builtin/commit.c   |  1 +
 builtin/config.c   | 11 +--
 builtin/init-db.c  |  2 ++
 builtin/ls-files.c |  1 +
 builtin/reset.c| 24 +---
 builtin/update-index.c |  4 +++-
 builtin/worktree.c |  2 ++
 environment.c  |  4 +++-
 git-compat-util.h  |  7 +++
 repository.c   | 14 +++---
 setup.c|  5 -
 t/test-lib.sh  |  7 ++-
 usage.c| 13 +
 15 files changed, 77 insertions(+), 24 deletions(-)

-Peff

[PATCH 18/20] lockfile: update lifetime requirements in documentation

2017-09-05 Thread Jeff King

Now that the tempfile system we rely on has loosened the
lifetime requirements for storage, we can adjust our
documentation to match.

Signed-off-by: Jeff King 
---
 lockfile.h | 20 ++--
 1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/lockfile.h b/lockfile.h
index 6ed336e2bc..7c1c484d7c 100644
--- a/lockfile.h
+++ b/lockfile.h
@@ -37,12 +37,12 @@
  *
  * The caller:
  *
- * * Allocates a `struct lock_file` either as a static variable or on
- *   the heap, initialized to zeros. Once you use the structure to
- *   call the `hold_lock_file_for_*()` family of functions, it belongs
- *   to the lockfile subsystem and its storage must remain valid
- *   throughout the life of the program (i.e. you cannot use an
- *   on-stack variable to hold this structure).
+ * * Allocates a `struct lock_file` with whatever storage duration you
+ *   desire. The struct does not have to be initialized before being
+ *   used, but it is good practice to do so using by setting it to
+ *   all-zeros (or using the LOCK_INIT macro). This puts the object in a
+ *   consistent state that allows you to call rollback_lock_file() even
+ *   if the lock was never taken (in which case it is a noop).
  *
  * * Attempts to create a lockfile by calling `hold_lock_file_for_update()`.
  *
@@ -73,10 +73,8 @@
  *   `commit_lock_file()`, `commit_lock_file_to()`,
  *   `rollback_lock_file()`, or `reopen_lock_file()`.
  *
- * Even after the lockfile is committed or rolled back, the
- * `lock_file` object must not be freed or altered by the caller.
- * However, it may be reused; just pass it to another call of
- * `hold_lock_file_for_update()`.
+ * After the lockfile is committed or rolled back, the `lock_file`
+ * object can be discarded or reused.
  *
  * If the program exits before `commit_lock_file()`,
  * `commit_lock_file_to()`, or `rollback_lock_file()` is called, the
@@ -114,6 +112,8 @@ struct lock_file {
struct tempfile *tempfile;
 };
 
+#define LOCK_INIT { NULL }
+
 /* String appended to a filename to derive the lockfile name: */
 #define LOCK_SUFFIX ".lock"
 #define LOCK_SUFFIX_LEN 5
-- 
2.14.1.721.gc5bc1565f1

[PATCH 20/20] stop leaking lock structs in some simple cases

2017-09-05 Thread Jeff King

Now that it's safe to declare a "struct lock_file" on the
stack, we can do so (and avoid an intentional leak). These
leaks were found by running t and t0001 under valgrind
(though certainly other similar leaks exist and just don't
happen to be exercised by those tests).

Initializing the lock_file's inner tempfile with NULL is not
strictly necessary in these cases, but it's a good practice
to model.  It means that if we were to call a function like
rollback_lock_file() on a lock that was never taken in the
first place, it becomes a quiet noop (rather than undefined
behavior).

Likewise, it's always safe to rollback_lock_file() on a file
that has already been committed or deleted, since that
operation is a noop on an inactive lockfile (and that's why
the case in config.c can drop the "if (lock)" check as we
move away from using a pointer).

Signed-off-by: Jeff King 
---
 builtin/reset.c|  6 +++---
 builtin/update-index.c | 11 ---
 cache-tree.c   | 14 --
 config.c   | 24 +++-
 4 files changed, 18 insertions(+), 37 deletions(-)

diff --git a/builtin/reset.c b/builtin/reset.c
index d72c7d1c96..f1af9345e4 100644
--- a/builtin/reset.c
+++ b/builtin/reset.c
@@ -367,8 +367,8 @@ int cmd_reset(int argc, const char **argv, const char 
*prefix)
die_if_unmerged_cache(reset_type);
 
if (reset_type != SOFT) {
-   struct lock_file *lock = xcalloc(1, sizeof(*lock));
-   hold_locked_index(lock, LOCK_DIE_ON_ERROR);
+   struct lock_file lock = LOCK_INIT;
+   hold_locked_index(, LOCK_DIE_ON_ERROR);
if (reset_type == MIXED) {
int flags = quiet ? REFRESH_QUIET : 
REFRESH_IN_PORCELAIN;
if (read_from_tree(, , intent_to_add))
@@ -384,7 +384,7 @@ int cmd_reset(int argc, const char **argv, const char 
*prefix)
die(_("Could not reset index file to revision 
'%s'."), rev);
}
 
-   if (write_locked_index(_index, lock, COMMIT_LOCK))
+   if (write_locked_index(_index, , COMMIT_LOCK))
die(_("Could not write new index file."));
}
 
diff --git a/builtin/update-index.c b/builtin/update-index.c
index d562f2ec69..655e6d60d3 100644
--- a/builtin/update-index.c
+++ b/builtin/update-index.c
@@ -915,7 +915,7 @@ int cmd_update_index(int argc, const char **argv, const 
char *prefix)
struct refresh_params refresh_args = {0, _errors};
int lock_error = 0;
int split_index = -1;
-   struct lock_file *lock_file;
+   struct lock_file lock_file = LOCK_INIT;
struct parse_opt_ctx_t ctx;
strbuf_getline_fn getline_fn;
int parseopt_state = PARSE_OPT_UNKNOWN;
@@ -1014,11 +1014,8 @@ int cmd_update_index(int argc, const char **argv, const 
char *prefix)
 
git_config(git_default_config, NULL);
 
-   /* We can't free this memory, it becomes part of a linked list parsed 
atexit() */
-   lock_file = xcalloc(1, sizeof(struct lock_file));
-
/* we will diagnose later if it turns out that we need to update it */
-   newfd = hold_locked_index(lock_file, 0);
+   newfd = hold_locked_index(_file, 0);
if (newfd < 0)
lock_error = errno;
 
@@ -1153,11 +1150,11 @@ int cmd_update_index(int argc, const char **argv, const 
char *prefix)
exit(128);
unable_to_lock_die(get_index_file(), lock_error);
}
-   if (write_locked_index(_index, lock_file, COMMIT_LOCK))
+   if (write_locked_index(_index, _file, COMMIT_LOCK))
die("Unable to write new index file");
}
 
-   rollback_lock_file(lock_file);
+   rollback_lock_file(_file);
 
return has_errors ? 1 : 0;
 }
diff --git a/cache-tree.c b/cache-tree.c
index 2690113a6a..71d092ed51 100644
--- a/cache-tree.c
+++ b/cache-tree.c
@@ -603,16 +603,10 @@ static struct cache_tree *cache_tree_find(struct 
cache_tree *it, const char *pat
 int write_index_as_tree(unsigned char *sha1, struct index_state *index_state, 
const char *index_path, int flags, const char *prefix)
 {
int entries, was_valid, newfd;
-   struct lock_file *lock_file;
+   struct lock_file lock_file = LOCK_INIT;
int ret = 0;
 
-   /*
-* We can't free this memory, it becomes part of a linked list
-* parsed atexit()
-*/
-   lock_file = xcalloc(1, sizeof(struct lock_file));
-
-   newfd = hold_lock_file_for_update(lock_file, index_path, 
LOCK_DIE_ON_ERROR);
+   newfd = hold_lock_file_for_update(_file, index_path, 
LOCK_DIE_ON_ERROR);
 
entries = read_index_from(index_state, index_path);
if (entries < 0) {
@@ -632,7 +626,7 @@ int write_index_as_tree(unsigned char *sha1, struct 
index_state *index_state, co
goto out;

[PATCH 19/20] ref_lock: stop leaking lock_files

2017-09-05 Thread Jeff King

Since the tempfile code recently relaxed the rule that
tempfile structs (and thus locks) need to hang around
forever, we no longer have to leak our lock_file structs.

In fact, we don't even need to heap-allocate them anymore,
since their lifetime can just match that of the surrounding
ref_lock (and if we forget to delete a lock, the effect is
the same as before: it will eventually go away at program
exit).

Note that there is a check in unlock_ref() to only rollback
a lock file if it has been allocated. We don't need that
check anymore; we zero the ref_lock (and thus the
lock_file), so at worst we pass a NULL pointer to
delete_tempfile(), which considers that a noop.

Signed-off-by: Jeff King 
---
 refs/files-backend.c | 39 ---
 1 file changed, 16 insertions(+), 23 deletions(-)

diff --git a/refs/files-backend.c b/refs/files-backend.c
index 36fe8a986b..f3455609d6 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -12,7 +12,7 @@
 
 struct ref_lock {
char *ref_name;
-   struct lock_file *lk;
+   struct lock_file lk;
struct object_id old_oid;
 };
 
@@ -418,9 +418,7 @@ static int files_read_raw_ref(struct ref_store *ref_store,
 
 static void unlock_ref(struct ref_lock *lock)
 {
-   /* Do not free lock->lk -- atexit() still looks at them */
-   if (lock->lk)
-   rollback_lock_file(lock->lk);
+   rollback_lock_file(>lk);
free(lock->ref_name);
free(lock);
 }
@@ -534,11 +532,8 @@ static int lock_raw_ref(struct files_ref_store *refs,
goto error_return;
}
 
-   if (!lock->lk)
-   lock->lk = xcalloc(1, sizeof(struct lock_file));
-
if (hold_lock_file_for_update_timeout(
-   lock->lk, ref_file.buf, LOCK_NO_DEREF,
+   >lk, ref_file.buf, LOCK_NO_DEREF,
get_files_ref_lock_timeout_ms()) < 0) {
if (errno == ENOENT && --attempts_remaining > 0) {
/*
@@ -949,11 +944,9 @@ static struct ref_lock *lock_ref_sha1_basic(struct 
files_ref_store *refs,
goto error_return;
}
 
-   lock->lk = xcalloc(1, sizeof(struct lock_file));
-
lock->ref_name = xstrdup(refname);
 
-   if (raceproof_create_file(ref_file.buf, create_reflock, lock->lk)) {
+   if (raceproof_create_file(ref_file.buf, create_reflock, >lk)) {
last_errno = errno;
unable_to_lock_message(ref_file.buf, errno, err);
goto error_return;
@@ -1404,14 +1397,14 @@ static int files_rename_ref(struct ref_store *ref_store,
 
 static int close_ref_gently(struct ref_lock *lock)
 {
-   if (close_lock_file_gently(lock->lk))
+   if (close_lock_file_gently(>lk))
return -1;
return 0;
 }
 
 static int commit_ref(struct ref_lock *lock)
 {
-   char *path = get_locked_file_path(lock->lk);
+   char *path = get_locked_file_path(>lk);
struct stat st;
 
if (!lstat(path, ) && S_ISDIR(st.st_mode)) {
@@ -1435,7 +1428,7 @@ static int commit_ref(struct ref_lock *lock)
free(path);
}
 
-   if (commit_lock_file(lock->lk))
+   if (commit_lock_file(>lk))
return -1;
return 0;
 }
@@ -1627,12 +1620,12 @@ static int write_ref_to_lockfile(struct ref_lock *lock,
unlock_ref(lock);
return -1;
}
-   fd = get_lock_file_fd(lock->lk);
+   fd = get_lock_file_fd(>lk);
if (write_in_full(fd, oid_to_hex(oid), GIT_SHA1_HEXSZ) != 
GIT_SHA1_HEXSZ ||
write_in_full(fd, , 1) != 1 ||
close_ref_gently(lock) < 0) {
strbuf_addf(err,
-   "couldn't write '%s'", 
get_lock_file_path(lock->lk));
+   "couldn't write '%s'", 
get_lock_file_path(>lk));
unlock_ref(lock);
return -1;
}
@@ -1709,7 +1702,7 @@ static int create_ref_symlink(struct ref_lock *lock, 
const char *target)
 {
int ret = -1;
 #ifndef NO_SYMLINK_HEAD
-   char *ref_path = get_locked_file_path(lock->lk);
+   char *ref_path = get_locked_file_path(>lk);
unlink(ref_path);
ret = symlink(target, ref_path);
free(ref_path);
@@ -1745,14 +1738,14 @@ static int create_symref_locked(struct files_ref_store 
*refs,
return 0;
}
 
-   if (!fdopen_lock_file(lock->lk, "w"))
+   if (!fdopen_lock_file(>lk, "w"))
return error("unable to fdopen %s: %s",
-lock->lk->tempfile->filename.buf, strerror(errno));
+lock->lk.tempfile->filename.buf, strerror(errno));
 
update_symref_reflog(refs, lock, refname, target, logmsg);
 
/* no error check; commit_ref will check ferror */
-   fprintf(lock->lk->tempfile->fp, "ref: %s\n", target);
+   fprintf(lock->lk.tempfile->fp, "ref:

[PATCH 15/20] tempfile: use list.h for linked list

2017-09-05 Thread Jeff King

The tempfile API keeps to-be-cleaned tempfiles in a
singly-linked list and never removes items from the list.  A
future patch would like to start removing items, but removal
from a singly linked list is O(n), as we have to walk the
list to find the predecessor element. This means that a
process which takes "n" simultaneous lockfiles (for example,
an atomic transaction on "n" refs) may end up quadratic in
"n".

Before we start allowing items to be removed, it would be
nice to have a way to cover this case in linear time.

The simplest solution is to make an assumption about the
order in which tempfiles are added and removed from the
list. If both operations iterate over the tempfiles in the
same order, then by putting new items at the end of the list
our removal search will always find its items at the
beginning of the list. And indeed, that would work for the
case of refs. But it creates a hidden dependency between
unrelated parts of the code. If anybody changes the ref code
(or if we add a new caller that opens multiple simultaneous
tempfiles) they may unknowingly introduce a performance
regression.

Another solution is to use a better data structure. A
doubly-linked list works fine, and we already have an
implementation in list.h. But there's one snag: the elements
of "struct tempfile" are all marked as "volatile", since a
signal handler may interrupt us and iterate over the list at
any moment (even if we were in the middle of adding a new
entry).

We can declare a "volatile struct list_head", but we can't
actually use it with the normal list functions. The compiler
complains about passing a pointer-to-volatile via a regular
pointer argument. And rightfully so, as the sub-function
would potentially need different code to deal with the
volatile case.

That leaves us with a few options:

  1. Drop the "volatile" modifier for the list items.

 This is probably a bad idea. I checked the assembly
 output from "gcc -O2", and the "volatile" really does
 impact the order in which it updates memory.

  2. Use macros instead of inline functions. The irony here
 is that list.h is entirely implemented as trivial
 inline functions. So we basically are already
 generating custom code for each call. But sadly there's no
 way in C to declare the inline function to take a more
 generic type.

 We could do so by switching the inline functions to
 macros, but it does make the end result harder to read.
 And it doesn't fully solve the problem (for instance,
 the declaration of list_head needs to change so that
 its "prev" and "next" pointers point to other volatile
 structs).

  3. Don't use list.h, and just make our own ad-hoc
 doubly-linked list. It's not that much code to
 implement the basics that we need here. But if we're
 going to do so, why not add the few extra lines
 required to model it after the actual list.h interface?
 We can even reuse a few of the macro helpers.

So this patch takes option 3, but actually implements a
parallel "volatile list" interface in list.h, where it could
potentially be reused by other code. This implements just
enough for tempfile.c's use, though we could easily port
other functions later if need be.

Signed-off-by: Jeff King 
---
 list.h | 38 ++
 tempfile.c | 13 +++--
 tempfile.h |  4 +++-
 3 files changed, 48 insertions(+), 7 deletions(-)

diff --git a/list.h b/list.h
index a226a870dc..eb601192f4 100644
--- a/list.h
+++ b/list.h
@@ -163,4 +163,42 @@ static inline void list_replace_init(struct list_head *old,
INIT_LIST_HEAD(old);
 }
 
+/*
+ * This is exactly the same as a normal list_head, except that it can be
+ * declared volatile (e.g., if you have a list that may be accessed from signal
+ * handlers).
+ */
+struct volatile_list_head {
+   volatile struct volatile_list_head *next, *prev;
+};
+
+#define VOLATILE_LIST_HEAD(name) \
+   volatile struct volatile_list_head name = { &(name), &(name) }
+
+static inline void __volatile_list_del(volatile struct volatile_list_head 
*prev,
+  volatile struct volatile_list_head *next)
+{
+   next->prev = prev;
+   prev->next = next;
+}
+
+static inline void volatile_list_del(volatile struct volatile_list_head *elem)
+{
+   __volatile_list_del(elem->prev, elem->next);
+}
+
+static inline int volatile_list_empty(volatile struct volatile_list_head *head)
+{
+   return head == head->next;
+}
+
+static inline void volatile_list_add(volatile struct volatile_list_head *newp,
+volatile struct volatile_list_head *head)
+{
+   head->next->prev = newp;
+   newp->next = head->next;
+   newp->prev = head;
+   head->next = newp;
+}
+
 #endif /* LIST_H */
diff --git a/tempfile.c b/tempfile.c
index e655e28477..11bda824cf 100644
--- a/tempfile.c
+++ b/tempfile.c
@@ -55,14 +55,16 @@
 #include "tempfile.h"

[PATCH 14/20] tempfile: release deactivated strbufs instead of resetting

2017-09-05 Thread Jeff King

When a tempfile is deactivated, we reset its strbuf to the
empty string, which means we hold onto the memory for later
reuse.

Since we'd like to move to a system where tempfile structs
can actually be freed, deactivating one should drop all
resources it is currently using. And thus "release" rather
than "reset" is the appropriate function to call.

In theory the reset may have saved a malloc() when a
tempfile (or a lockfile) is reused multiple times. But in
practice this happened rarely. Most of our tempfiles are
single-use, since in cases where we might actually use many
(like ref locking) we xcalloc() a fresh one for each ref. In
fact, we leak those locks (to appease the rule that tempfile
storage can never be freed). Which means that using reset is
actively hurting us: instead of leaking just the tempfile
struct, we're leaking the extra heap chunk for the filename,
too.

Signed-off-by: Jeff King 
---
 tempfile.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tempfile.c b/tempfile.c
index 3348ad59dd..e655e28477 100644
--- a/tempfile.c
+++ b/tempfile.c
@@ -128,7 +128,7 @@ static void activate_tempfile(struct tempfile *tempfile)
 static void deactivate_tempfile(struct tempfile *tempfile)
 {
tempfile->active = 0;
-   strbuf_reset(>filename);
+   strbuf_release(>filename);
 }
 
 /* Make sure errno contains a meaningful value on error */
-- 
2.14.1.721.gc5bc1565f1

[PATCH 17/20] tempfile: auto-allocate tempfiles on heap

2017-09-05 Thread Jeff King

The previous commit taught the tempfile code to give up
ownership over tempfiles that have been renamed or deleted.
That makes it possible to use a stack variable like this:

  struct tempfile t;

  create_tempfile(, ...);
  ...
  if (!err)
  rename_tempfile(, ...);
  else
  delete_tempfile();

But doing it this way has a high potential for creating
memory errors. The tempfile we pass to create_tempfile()
ends up on a global linked list, and it's not safe for it to
go out of scope until we've called one of those two
deactivation functions.

Imagine that we add an early return from the function that
forgets to call delete_tempfile(). With a static or heap
tempfile variable, the worst case is that the tempfile hangs
around until the program exits (and some functions like
setup_shallow_temporary rely on this intentionally, creating
a tempfile and then leaving it for later cleanup).

But with a stack variable as above, this is a serious memory
error: the variable goes out of scope and may be filled with
garbage by the time the tempfile code looks at it.  Let's
see if we can make it harder to get this wrong.

Since many callers need to allocate arbitrary numbers of
tempfiles, we can't rely on static storage as a general
solution. So we need to turn to the heap. We could just ask
all callers to pass us a heap variable, but that puts the
burden on them to call free() at the right time.

Instead, let's have the tempfile code handle the heap
allocation _and_ the deallocation (when the tempfile is
deactivated and removed from the list).

This changes the return value of all of the creation
functions. For the cleanup functions (delete and rename),
we'll add one extra bit of safety: instead of taking a
tempfile pointer, we'll take a pointer-to-pointer and set it
to NULL after freeing the object. This makes it safe to
double-call functions like delete_tempfile(), as the second
call treats the NULL input as a noop. Several callsites
follow this pattern.

The resulting patch does have a fair bit of noise, as each
caller needs to be converted to handle:

  1. Storing a pointer instead of the struct itself.

  2. Passing the pointer instead of taking the struct
 address.

  3. Handling a "struct tempfile *" return instead of a file
 descriptor.

We could play games to make this less noisy. For example, by
defining the tempfile like this:

  struct tempfile {
struct heap_allocated_part_of_tempfile {
int fd;
...etc
} *actual_data;
  }

Callers would continue to have a "struct tempfile", and it
would be "active" only when the inner pointer was non-NULL.
But that just makes things more awkward in the long run.
There aren't that many callers, so we can simply bite
the bullet and adjust all of them. And the compiler makes it
easy for us to find them all.

Signed-off-by: Jeff King 
---
 builtin/gc.c   |  8 ++---
 credential-cache--daemon.c |  5 ++-
 diff.c | 15 
 gpg-interface.c| 16 -
 lockfile.c |  7 ++--
 lockfile.h | 16 -
 read-cache.c   | 25 +++---
 refs/files-backend.c   |  4 +--
 refs/packed-backend.c  | 11 +++---
 shallow.c  | 13 ---
 tempfile.c | 66 +++
 tempfile.h | 85 +-
 trailer.c  |  6 ++--
 13 files changed, 136 insertions(+), 141 deletions(-)

diff --git a/builtin/gc.c b/builtin/gc.c
index 3c78fcb9b1..c22787ac72 100644
--- a/builtin/gc.c
+++ b/builtin/gc.c
@@ -47,7 +47,7 @@ static struct argv_array prune = ARGV_ARRAY_INIT;
 static struct argv_array prune_worktrees = ARGV_ARRAY_INIT;
 static struct argv_array rerere = ARGV_ARRAY_INIT;
 
-static struct tempfile pidfile;
+static struct tempfile *pidfile;
 static struct lock_file log_lock;
 
 static struct string_list pack_garbage = STRING_LIST_INIT_DUP;
@@ -78,7 +78,7 @@ static void process_log_file(void)
 */
int saved_errno = errno;
fprintf(stderr, _("Failed to fstat %s: %s"),
-   get_tempfile_path(_lock.tempfile),
+   get_tempfile_path(log_lock.tempfile),
strerror(saved_errno));
fflush(stderr);
commit_lock_file(_lock);
@@ -242,7 +242,7 @@ static const char *lock_repo_for_gc(int force, pid_t* 
ret_pid)
int fd;
char *pidfile_path;
 
-   if (is_tempfile_active())
+   if (is_tempfile_active(pidfile))
/* already locked */
return NULL;
 
@@ -293,7 +293,7 @@ static const char *lock_repo_for_gc(int force, pid_t* 
ret_pid)
write_in_full(fd, sb.buf, sb.len);
strbuf_release();
commit_lock_file();
-   register_tempfile(, pidfile_path);
+   pidfile = register_tempfile(pidfile_path);

[PATCH 16/20] tempfile: remove deactivated list entries

2017-09-05 Thread Jeff King

Once a "struct tempfile" is added to the global cleanup
list, it is never removed. This means that its storage must
remain valid for the lifetime of the program. For single-use
tempfiles and locks, this isn't a big deal: we just declare
the struct static. But for library code which may take
multiple simultaneous locks (like the ref code), they're
forced to allocate a struct on the heap and leak it.

This is mostly OK in practice. The size of the leak is
bounded by the number of refs, and most programs exit after
operating on a fixed number of refs (and allocate
simultaneous memory proportional to the number of ref
updates in the first place). But:

  1. It isn't hard to imagine a real leak: a program which
 runs for a long time taking a series of ref update
 instructions and fulfilling them one by one. I don't
 think we have such a program now, but it's certainly
 plausible.

  2. The leaked entries appear as false positives to
 tools like valgrind.

Let's relax this rule by keeping only "active" tempfiles on
the list. We can do this easily by moving the list-add
operation from prepare_tempfile_object to activate_tempfile,
and adding a deletion in deactivate_tempfile.

Existing callers do not need to be updated immediately.
They'll continue to leak any tempfile objects they may have
allocated, but that's no different than the status quo. We
can clean them up individually.

Signed-off-by: Jeff King 
---
 tempfile.c | 46 --
 tempfile.h | 15 +--
 2 files changed, 25 insertions(+), 36 deletions(-)

diff --git a/tempfile.c b/tempfile.c
index 11bda824cf..f82a5f3676 100644
--- a/tempfile.c
+++ b/tempfile.c
@@ -42,8 +42,7 @@
  * states in which this condition doesn't hold). Client code should
  * *not* rely on the filename being empty in this state.
  *   - `fd` is -1 and `fp` is `NULL`
- *   - the object is left registered in the `tempfile_list`, and
- * `on_list` is set.
+ *   - the object is removed from `tempfile_list` (but could be used again)
  *
  * A temporary file is owned by the process that created it. The
  * `tempfile` has an `owner` field that records the owner's PID. This
@@ -92,36 +91,30 @@ static void remove_tempfiles_on_signal(int signo)
raise(signo);
 }
 
-/*
- * Initialize *tempfile if necessary and add it to tempfile_list.
- */
 static void prepare_tempfile_object(struct tempfile *tempfile)
 {
-   if (volatile_list_empty(_list)) {
-   /* One-time initialization */
-   sigchain_push_common(remove_tempfiles_on_signal);
-   atexit(remove_tempfiles_on_exit);
-   }
-
-   if (is_tempfile_active(tempfile))
-   BUG("prepare_tempfile_object called for active object");
-   if (!tempfile->on_list) {
-   /* Initialize *tempfile and add it to tempfile_list: */
-   tempfile->fd = -1;
-   tempfile->fp = NULL;
-   tempfile->active = 0;
-   tempfile->owner = 0;
-   strbuf_init(>filename, 0);
-   volatile_list_add(>list, _list);
-   tempfile->on_list = 1;
-   } else if (tempfile->filename.len) {
-   /* This shouldn't happen, but better safe than sorry. */
-   BUG("prepare_tempfile_object called for improperly-reset 
object");
-   }
+   tempfile->fd = -1;
+   tempfile->fp = NULL;
+   tempfile->active = 0;
+   tempfile->owner = 0;
+   INIT_LIST_HEAD(>list);
+   strbuf_init(>filename, 0);
 }
 
 static void activate_tempfile(struct tempfile *tempfile)
 {
+   static int initialized;
+
+   if (is_tempfile_active(tempfile))
+   BUG("activate_tempfile called for active object");
+
+   if (!initialized) {
+   sigchain_push_common(remove_tempfiles_on_signal);
+   atexit(remove_tempfiles_on_exit);
+   initialized = 1;
+   }
+
+   volatile_list_add(>list, _list);
tempfile->owner = getpid();
tempfile->active = 1;
 }
@@ -130,6 +123,7 @@ static void deactivate_tempfile(struct tempfile *tempfile)
 {
tempfile->active = 0;
strbuf_release(>filename);
+   volatile_list_del(>list);
 }
 
 /* Make sure errno contains a meaningful value on error */
diff --git a/tempfile.h b/tempfile.h
index 2ee24f4380..e32b4df092 100644
--- a/tempfile.h
+++ b/tempfile.h
@@ -17,12 +17,9 @@
  *
  * The caller:
  *
- * * Allocates a `struct tempfile` either as a static variable or on
- *   the heap, initialized to zeros. Once you use the structure to
- *   call `create_tempfile()`, it belongs to the tempfile subsystem
- *   and its storage must remain valid throughout the life of the
- *   program (i.e. you cannot use an on-stack variable to hold this
- *   structure).
+ * * Allocates a `struct tempfile`. Once the structure is passed to
+ *   `create_tempfile()`, its storage must remain valid until
+ *   `delete_tempfile()` or

[PATCH 13/20] tempfile: robustify cleanup handler

2017-09-05 Thread Jeff King

We may call remove_tempfiles() from an atexit handler, or
from a signal handler. In the latter case we must take care
to avoid functions which may deadlock if the process is in
an unknown state, including looking at any stdio handles
(which may be in the middle of doing I/O and locked) or
calling malloc() or free().

The current implementation calls delete_tempfile(). We unset
the tempfile's stdio handle (if any) to avoid deadlocking
there. But delete_tempfile() still calls unlink_or_warn(),
which can deadlock writing to stderr if the unlink fails.

Since delete_tempfile() isn't very long, let's just
open-code our own simple conservative version of the same
thing.  Notably:

  1. The "skip_fclose" flag is now called "in_signal_handler",
 because it should inform more decisions than just the
 fclose handling.

  2. We can replace close_tempfile() with just close(fd).
 That skips the fclose() question altogether. This is
 fine for the atexit() case, too; there's no point
 flushing data to a file which we're about to delete
 anyway.

  3. We can choose between unlink/unlink_or_warn based on
 whether it's safe to use stderr.

  4. We can replace the deactivate_tempfile() call with a
 simple setting of the active flag. There's no need to
 do any further cleanup since we know the program is
 exiting.  And even though the current deactivation code
 is safe in a signal handler, this frees us up in future
 patches to make non-signal deactivation more
 complicated (e.g., by freeing resources).

  5. There's no need to remove items from the tempfile_list.
 The "active" flag is the ultimate answer to whether an
 entry has been handled or not. Manipulating the list
 just introduces more chance of recursive signals
 stomping on each other, and the whole list will go away
 when the program exits anyway. Less is more.

Signed-off-by: Jeff King 
---
 tempfile.c | 24 +++-
 1 file changed, 15 insertions(+), 9 deletions(-)

diff --git a/tempfile.c b/tempfile.c
index 9d7f0a2f2b..3348ad59dd 100644
--- a/tempfile.c
+++ b/tempfile.c
@@ -57,18 +57,24 @@
 
 static struct tempfile *volatile tempfile_list;
 
-static void remove_tempfiles(int skip_fclose)
+static void remove_tempfiles(int in_signal_handler)
 {
pid_t me = getpid();
+   struct tempfile *volatile p;
 
-   while (tempfile_list) {
-   if (tempfile_list->owner == me) {
-   /* fclose() is not safe to call in a signal handler */
-   if (skip_fclose)
-   tempfile_list->fp = NULL;
-   delete_tempfile(tempfile_list);
-   }
-   tempfile_list = tempfile_list->next;
+   for (p = tempfile_list; p; p = p->next) {
+   if (!is_tempfile_active(p) || p->owner != me)
+   continue;
+
+   if (p->fd >= 0)
+   close(p->fd);
+
+   if (in_signal_handler)
+   unlink(p->filename.buf);
+   else
+   unlink_or_warn(p->filename.buf);
+
+   p->active = 0;
}
 }
 
-- 
2.14.1.721.gc5bc1565f1

[PATCH 10/20] tempfile: replace die("BUG") with BUG()

2017-09-05 Thread Jeff King

Compared to die(), using BUG() triggers abort(). That may
give us an actual coredump, which should make it easier to
get a stack trace. And since the programming error for these
assertions is not in the functions themselves but in their
callers, such a stack trace is needed to actually find the
source of the bug.

In addition, abort() raises SIGABRT, which is more likely to
be caught by our test suite.

Signed-off-by: Jeff King 
---
 tempfile.c | 20 ++--
 1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/tempfile.c b/tempfile.c
index 861f817133..813cf6a81c 100644
--- a/tempfile.c
+++ b/tempfile.c
@@ -96,7 +96,7 @@ static void prepare_tempfile_object(struct tempfile *tempfile)
}
 
if (is_tempfile_active(tempfile))
-   die("BUG: prepare_tempfile_object called for active object");
+   BUG("prepare_tempfile_object called for active object");
if (!tempfile->on_list) {
/* Initialize *tempfile and add it to tempfile_list: */
tempfile->fd = -1;
@@ -109,7 +109,7 @@ static void prepare_tempfile_object(struct tempfile 
*tempfile)
tempfile->on_list = 1;
} else if (tempfile->filename.len) {
/* This shouldn't happen, but better safe than sorry. */
-   die("BUG: prepare_tempfile_object called for improperly-reset 
object");
+   BUG("prepare_tempfile_object called for improperly-reset 
object");
}
 }
 
@@ -205,9 +205,9 @@ int xmks_tempfile_m(struct tempfile *tempfile, const char 
*template, int mode)
 FILE *fdopen_tempfile(struct tempfile *tempfile, const char *mode)
 {
if (!is_tempfile_active(tempfile))
-   die("BUG: fdopen_tempfile() called for inactive object");
+   BUG("fdopen_tempfile() called for inactive object");
if (tempfile->fp)
-   die("BUG: fdopen_tempfile() called for open object");
+   BUG("fdopen_tempfile() called for open object");
 
tempfile->fp = fdopen(tempfile->fd, mode);
return tempfile->fp;
@@ -216,21 +216,21 @@ FILE *fdopen_tempfile(struct tempfile *tempfile, const 
char *mode)
 const char *get_tempfile_path(struct tempfile *tempfile)
 {
if (!is_tempfile_active(tempfile))
-   die("BUG: get_tempfile_path() called for inactive object");
+   BUG("get_tempfile_path() called for inactive object");
return tempfile->filename.buf;
 }
 
 int get_tempfile_fd(struct tempfile *tempfile)
 {
if (!is_tempfile_active(tempfile))
-   die("BUG: get_tempfile_fd() called for inactive object");
+   BUG("get_tempfile_fd() called for inactive object");
return tempfile->fd;
 }
 
 FILE *get_tempfile_fp(struct tempfile *tempfile)
 {
if (!is_tempfile_active(tempfile))
-   die("BUG: get_tempfile_fp() called for inactive object");
+   BUG("get_tempfile_fp() called for inactive object");
return tempfile->fp;
 }
 
@@ -265,9 +265,9 @@ int close_tempfile_gently(struct tempfile *tempfile)
 int reopen_tempfile(struct tempfile *tempfile)
 {
if (!is_tempfile_active(tempfile))
-   die("BUG: reopen_tempfile called for an inactive object");
+   BUG("reopen_tempfile called for an inactive object");
if (0 <= tempfile->fd)
-   die("BUG: reopen_tempfile called for an open object");
+   BUG("reopen_tempfile called for an open object");
tempfile->fd = open(tempfile->filename.buf, O_WRONLY);
return tempfile->fd;
 }
@@ -275,7 +275,7 @@ int reopen_tempfile(struct tempfile *tempfile)
 int rename_tempfile(struct tempfile *tempfile, const char *path)
 {
if (!is_tempfile_active(tempfile))
-   die("BUG: rename_tempfile called for inactive object");
+   BUG("rename_tempfile called for inactive object");
 
if (close_tempfile_gently(tempfile)) {
delete_tempfile(tempfile);
-- 
2.14.1.721.gc5bc1565f1

[PATCH 11/20] tempfile: factor out activation

2017-09-05 Thread Jeff King

There are a few steps required to "activate" a tempfile
struct. Let's pull these out into a function. That saves a
few repeated lines now, but more importantly will make it
easier to change the activation scheme later.

Signed-off-by: Jeff King 
---
 tempfile.c | 18 ++
 1 file changed, 10 insertions(+), 8 deletions(-)

diff --git a/tempfile.c b/tempfile.c
index 813cf6a81c..0e6c6b9c18 100644
--- a/tempfile.c
+++ b/tempfile.c
@@ -113,6 +113,12 @@ static void prepare_tempfile_object(struct tempfile 
*tempfile)
}
 }
 
+static void activate_tempfile(struct tempfile *tempfile)
+{
+   tempfile->owner = getpid();
+   tempfile->active = 1;
+}
+
 /* Make sure errno contains a meaningful value on error */
 int create_tempfile(struct tempfile *tempfile, const char *path)
 {
@@ -129,8 +135,7 @@ int create_tempfile(struct tempfile *tempfile, const char 
*path)
strbuf_reset(>filename);
return -1;
}
-   tempfile->owner = getpid();
-   tempfile->active = 1;
+   activate_tempfile(tempfile);
if (adjust_shared_perm(tempfile->filename.buf)) {
int save_errno = errno;
error("cannot fix permission bits on %s", 
tempfile->filename.buf);
@@ -145,8 +150,7 @@ void register_tempfile(struct tempfile *tempfile, const 
char *path)
 {
prepare_tempfile_object(tempfile);
strbuf_add_absolute_path(>filename, path);
-   tempfile->owner = getpid();
-   tempfile->active = 1;
+   activate_tempfile(tempfile);
 }
 
 int mks_tempfile_sm(struct tempfile *tempfile,
@@ -160,8 +164,7 @@ int mks_tempfile_sm(struct tempfile *tempfile,
strbuf_reset(>filename);
return -1;
}
-   tempfile->owner = getpid();
-   tempfile->active = 1;
+   activate_tempfile(tempfile);
return tempfile->fd;
 }
 
@@ -182,8 +185,7 @@ int mks_tempfile_tsm(struct tempfile *tempfile,
strbuf_reset(>filename);
return -1;
}
-   tempfile->owner = getpid();
-   tempfile->active = 1;
+   activate_tempfile(tempfile);
return tempfile->fd;
 }
 
-- 
2.14.1.721.gc5bc1565f1

[PATCH 12/20] tempfile: factor out deactivation

2017-09-05 Thread Jeff King

When we deactivate a tempfile, we also have to clean up the
"filename" strbuf. Let's pull this out into its own function
to keep the logic in one place (which will become more
important when a future patch makes it more complicated).

Note that we can use the same function when deactivating an
object that _isn't_ actually active yet (like when we hit an
error creating a tempfile). These callsites don't currently
reset the "active" flag to 0, but it's OK to do so (it's
just a noop for these cases).

Signed-off-by: Jeff King 
---
 tempfile.c | 18 +++---
 1 file changed, 11 insertions(+), 7 deletions(-)

diff --git a/tempfile.c b/tempfile.c
index 0e6c6b9c18..9d7f0a2f2b 100644
--- a/tempfile.c
+++ b/tempfile.c
@@ -119,6 +119,12 @@ static void activate_tempfile(struct tempfile *tempfile)
tempfile->active = 1;
 }
 
+static void deactivate_tempfile(struct tempfile *tempfile)
+{
+   tempfile->active = 0;
+   strbuf_reset(>filename);
+}
+
 /* Make sure errno contains a meaningful value on error */
 int create_tempfile(struct tempfile *tempfile, const char *path)
 {
@@ -132,7 +138,7 @@ int create_tempfile(struct tempfile *tempfile, const char 
*path)
tempfile->fd = open(tempfile->filename.buf,
O_RDWR | O_CREAT | O_EXCL, 0666);
if (tempfile->fd < 0) {
-   strbuf_reset(>filename);
+   deactivate_tempfile(tempfile);
return -1;
}
activate_tempfile(tempfile);
@@ -161,7 +167,7 @@ int mks_tempfile_sm(struct tempfile *tempfile,
strbuf_add_absolute_path(>filename, template);
tempfile->fd = git_mkstemps_mode(tempfile->filename.buf, suffixlen, 
mode);
if (tempfile->fd < 0) {
-   strbuf_reset(>filename);
+   deactivate_tempfile(tempfile);
return -1;
}
activate_tempfile(tempfile);
@@ -182,7 +188,7 @@ int mks_tempfile_tsm(struct tempfile *tempfile,
strbuf_addf(>filename, "%s/%s", tmpdir, template);
tempfile->fd = git_mkstemps_mode(tempfile->filename.buf, suffixlen, 
mode);
if (tempfile->fd < 0) {
-   strbuf_reset(>filename);
+   deactivate_tempfile(tempfile);
return -1;
}
activate_tempfile(tempfile);
@@ -291,8 +297,7 @@ int rename_tempfile(struct tempfile *tempfile, const char 
*path)
return -1;
}
 
-   tempfile->active = 0;
-   strbuf_reset(>filename);
+   deactivate_tempfile(tempfile);
return 0;
 }
 
@@ -303,6 +308,5 @@ void delete_tempfile(struct tempfile *tempfile)
 
close_tempfile_gently(tempfile);
unlink_or_warn(tempfile->filename.buf);
-   tempfile->active = 0;
-   strbuf_reset(>filename);
+   deactivate_tempfile(tempfile);
 }
-- 
2.14.1.721.gc5bc1565f1

[PATCH 09/20] tempfile: handle NULL tempfile pointers gracefully

2017-09-05 Thread Jeff King

The tempfile functions all take pointers to tempfile
objects, but do not check whether the argument is NULL.
This isn't a big deal in practice, since the lifetime of any
tempfile object is defined to last for the whole program. So
even if we try to call delete_tempfile() on an
already-deleted tempfile, our "active" check will tell us
that it's a noop.

In preparation for transitioning to a new system that
loosens the "tempfile objects can never be freed" rule,
let's tighten up our active checks:

  1. A NULL pointer is now defined as "inactive" (so it will
 BUG for most functions, but works as a silent noop for
 things like delete_tempfile).

  2. Functions should always do the "active" check before
 looking at any of the struct fields.

Signed-off-by: Jeff King 
---
 tempfile.c | 12 +++-
 tempfile.h |  2 +-
 2 files changed, 8 insertions(+), 6 deletions(-)

diff --git a/tempfile.c b/tempfile.c
index 964c66d504..861f817133 100644
--- a/tempfile.c
+++ b/tempfile.c
@@ -236,13 +236,15 @@ FILE *get_tempfile_fp(struct tempfile *tempfile)
 
 int close_tempfile_gently(struct tempfile *tempfile)
 {
-   int fd = tempfile->fd;
-   FILE *fp = tempfile->fp;
+   int fd;
+   FILE *fp;
int err;
 
-   if (fd < 0)
+   if (!is_tempfile_active(tempfile) || tempfile->fd < 0)
return 0;
 
+   fd = tempfile->fd;
+   fp = tempfile->fp;
tempfile->fd = -1;
if (fp) {
tempfile->fp = NULL;
@@ -262,10 +264,10 @@ int close_tempfile_gently(struct tempfile *tempfile)
 
 int reopen_tempfile(struct tempfile *tempfile)
 {
-   if (0 <= tempfile->fd)
-   die("BUG: reopen_tempfile called for an open object");
if (!is_tempfile_active(tempfile))
die("BUG: reopen_tempfile called for an inactive object");
+   if (0 <= tempfile->fd)
+   die("BUG: reopen_tempfile called for an open object");
tempfile->fd = open(tempfile->filename.buf, O_WRONLY);
return tempfile->fd;
 }
diff --git a/tempfile.h b/tempfile.h
index d854dcdd3e..d30663182d 100644
--- a/tempfile.h
+++ b/tempfile.h
@@ -211,7 +211,7 @@ extern FILE *fdopen_tempfile(struct tempfile *tempfile, 
const char *mode);
 
 static inline int is_tempfile_active(struct tempfile *tempfile)
 {
-   return tempfile->active;
+   return tempfile && tempfile->active;
 }
 
 /*
-- 
2.14.1.721.gc5bc1565f1

[PATCH 08/20] tempfile: prefer is_tempfile_active to bare access

2017-09-05 Thread Jeff King

The tempfile code keeps an "active" flag, and we have a
number of assertions to make sure that the objects are being
used in the right order. Most of these directly check
"active" rather than using the is_tempfile_active()
accessor.

Let's prefer using the accessor, in preparation for it
growing more complicated logic (like checking for NULL).

Signed-off-by: Jeff King 
---
 tempfile.c | 16 
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/tempfile.c b/tempfile.c
index c6740e562c..964c66d504 100644
--- a/tempfile.c
+++ b/tempfile.c
@@ -95,7 +95,7 @@ static void prepare_tempfile_object(struct tempfile *tempfile)
atexit(remove_tempfiles_on_exit);
}
 
-   if (tempfile->active)
+   if (is_tempfile_active(tempfile))
die("BUG: prepare_tempfile_object called for active object");
if (!tempfile->on_list) {
/* Initialize *tempfile and add it to tempfile_list: */
@@ -204,7 +204,7 @@ int xmks_tempfile_m(struct tempfile *tempfile, const char 
*template, int mode)
 
 FILE *fdopen_tempfile(struct tempfile *tempfile, const char *mode)
 {
-   if (!tempfile->active)
+   if (!is_tempfile_active(tempfile))
die("BUG: fdopen_tempfile() called for inactive object");
if (tempfile->fp)
die("BUG: fdopen_tempfile() called for open object");
@@ -215,21 +215,21 @@ FILE *fdopen_tempfile(struct tempfile *tempfile, const 
char *mode)
 
 const char *get_tempfile_path(struct tempfile *tempfile)
 {
-   if (!tempfile->active)
+   if (!is_tempfile_active(tempfile))
die("BUG: get_tempfile_path() called for inactive object");
return tempfile->filename.buf;
 }
 
 int get_tempfile_fd(struct tempfile *tempfile)
 {
-   if (!tempfile->active)
+   if (!is_tempfile_active(tempfile))
die("BUG: get_tempfile_fd() called for inactive object");
return tempfile->fd;
 }
 
 FILE *get_tempfile_fp(struct tempfile *tempfile)
 {
-   if (!tempfile->active)
+   if (!is_tempfile_active(tempfile))
die("BUG: get_tempfile_fp() called for inactive object");
return tempfile->fp;
 }
@@ -264,7 +264,7 @@ int reopen_tempfile(struct tempfile *tempfile)
 {
if (0 <= tempfile->fd)
die("BUG: reopen_tempfile called for an open object");
-   if (!tempfile->active)
+   if (!is_tempfile_active(tempfile))
die("BUG: reopen_tempfile called for an inactive object");
tempfile->fd = open(tempfile->filename.buf, O_WRONLY);
return tempfile->fd;
@@ -272,7 +272,7 @@ int reopen_tempfile(struct tempfile *tempfile)
 
 int rename_tempfile(struct tempfile *tempfile, const char *path)
 {
-   if (!tempfile->active)
+   if (!is_tempfile_active(tempfile))
die("BUG: rename_tempfile called for inactive object");
 
if (close_tempfile_gently(tempfile)) {
@@ -294,7 +294,7 @@ int rename_tempfile(struct tempfile *tempfile, const char 
*path)
 
 void delete_tempfile(struct tempfile *tempfile)
 {
-   if (!tempfile->active)
+   if (!is_tempfile_active(tempfile))
return;
 
close_tempfile_gently(tempfile);
-- 
2.14.1.721.gc5bc1565f1

[PATCH 07/20] lockfile: do not rollback lock on failed close

2017-09-05 Thread Jeff King

Since the lockfile code is based on the tempfile code, it
has some of the same problems, including that close_lock_file()
erases the tempfile's filename buf, making it hard for the
caller to write a good error message.

In practice this comes up less for lockfiles than for
straight tempfiles, since we usually just report the
refname. But there is at least one buggy case in
write_ref_to_lockfile(). Besides, given the coupling between
the lockfile and tempfile modules, it's less confusing if
their close() functions have the same semantics.

Just as the previous commit did for close_tempfile(), let's
teach close_lock_file() and its wrapper close_ref() not to
rollback on error. And just as before, we'll give them new
"gently" names to catch any new callers that are added.

Signed-off-by: Jeff King 
---
 lockfile.h| 30 --
 read-cache.c  |  2 +-
 refs/files-backend.c  | 13 +++--
 refs/packed-backend.c |  3 ++-
 4 files changed, 22 insertions(+), 26 deletions(-)

diff --git a/lockfile.h b/lockfile.h
index dd4259bc40..c45d2db324 100644
--- a/lockfile.h
+++ b/lockfile.h
@@ -69,7 +69,7 @@
  *   `rollback_lock_file()`.
  *
  * * Close the file descriptor without removing or renaming the
- *   lockfile by calling `close_lock_file()`, and later call
+ *   lockfile by calling `close_lock_file_gently()`, and later call
  *   `commit_lock_file()`, `commit_lock_file_to()`,
  *   `rollback_lock_file()`, or `reopen_lock_file()`.
  *
@@ -85,7 +85,7 @@
  *
  * If you need to close the file descriptor you obtained from a
  * `hold_lock_file_for_*()` function yourself, do so by calling
- * `close_lock_file()`. See "tempfile.h" for more information.
+ * `close_lock_file_gently()`. See "tempfile.h" for more information.
  *
  *
  * Under the covers, a lockfile is just a tempfile with a few helper
@@ -104,8 +104,8 @@
  *
  * Similarly, `commit_lock_file`, `commit_lock_file_to`, and
  * `close_lock_file` return 0 on success. On failure they set `errno`
- * appropriately, do their best to roll back the lockfile, and return
- * -1.
+ * appropriately and return -1. The `commit` variants (but not `close`)
+ * do their best to delete the temporary file before returning.
  */
 
 #include "tempfile.h"
@@ -202,8 +202,9 @@ extern NORETURN void unable_to_lock_die(const char *path, 
int err);
 /*
  * Associate a stdio stream with the lockfile (which must still be
  * open). Return `NULL` (*without* rolling back the lockfile) on
- * error. The stream is closed automatically when `close_lock_file()`
- * is called or when the file is committed or rolled back.
+ * error. The stream is closed automatically when
+ * `close_lock_file_gently()` is called or when the file is committed or
+ * rolled back.
  */
 static inline FILE *fdopen_lock_file(struct lock_file *lk, const char *mode)
 {
@@ -241,28 +242,21 @@ extern char *get_locked_file_path(struct lock_file *lk);
  * lockfile over the file being locked. Return 0 upon success. On
  * failure to `close(2)`, return a negative value and roll back the
  * lock file. Usually `commit_lock_file()`, `commit_lock_file_to()`,
- * or `rollback_lock_file()` should eventually be called if
- * `close_lock_file()` succeeds.
+ * or `rollback_lock_file()` should eventually be called.
  */
-static inline int close_lock_file(struct lock_file *lk)
+static inline int close_lock_file_gently(struct lock_file *lk)
 {
-   int ret = close_tempfile_gently(>tempfile);
-   if (ret) {
-   int saved_errno = errno;
-   delete_tempfile(>tempfile);
-   errno = saved_errno;
-   }
-   return ret;
+   return close_tempfile_gently(>tempfile);
 }
 
 /*
- * Re-open a lockfile that has been closed using `close_lock_file()`
+ * Re-open a lockfile that has been closed using `close_lock_file_gently()`
  * but not yet committed or rolled back. This can be used to implement
  * a sequence of operations like the following:
  *
  * * Lock file.
  *
- * * Write new contents to lockfile, then `close_lock_file()` to
+ * * Write new contents to lockfile, then `close_lock_file_gently()` to
  *   cause the contents to be written to disk.
  *
  * * Pass the name of the lockfile to another program to allow it (and
diff --git a/read-cache.c b/read-cache.c
index 51686518e0..c3be65f8b0 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -2345,7 +2345,7 @@ static int do_write_locked_index(struct index_state 
*istate, struct lock_file *l
if (flags & COMMIT_LOCK)
return commit_locked_index(lock);
else if (flags & CLOSE_LOCK)
-   return close_lock_file(lock);
+   return close_lock_file_gently(lock);
else
return ret;
 }
diff --git a/refs/files-backend.c b/refs/files-backend.c
index fccbc24ac4..bc1899cd4a 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -1402,9 +1402,9 @@ static int files_rename_ref(struct ref_store *ref_store,
return ret;

[PATCH 04/20] verify_signed_buffer: prefer close_tempfile() to close()

2017-09-05 Thread Jeff King

We do a manual close() on the descriptor provided to us by
mks_tempfile. But this runs contrary to the advice in
tempfile.h, which notes that you should always use
close_tempfile(). Otherwise the descriptor may be reused
without the tempfile object knowing it, and the later call
to delete_tempfile() could close a random descriptor.

Signed-off-by: Jeff King 
---
 gpg-interface.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gpg-interface.c b/gpg-interface.c
index d936f3a32f..455b6c04b4 100644
--- a/gpg-interface.c
+++ b/gpg-interface.c
@@ -215,7 +215,7 @@ int verify_signed_buffer(const char *payload, size_t 
payload_size,
delete_tempfile();
return -1;
}
-   close(fd);
+   close_tempfile();
 
argv_array_pushl(,
 gpg_program,
-- 
2.14.1.721.gc5bc1565f1

[PATCH 06/20] tempfile: do not delete tempfile on failed close

2017-09-05 Thread Jeff King

When close_tempfile() fails, we delete the tempfile and
reset the fields of the tempfile struct. This makes it
easier for callers to return without cleaning up, but it
also makes this common pattern:

  if (close_tempfile(tempfile))
return error_errno("error closing %s", tempfile->filename.buf);

wrong, because the "filename" field has been reset after the
failed close. And it's not easy to fix, as in many cases we
don't have another copy of the filename (e.g., if it was
created via one of the mks_tempfile functions, and we just
have the original template string).

Let's drop the feature that a failed close automatically
deletes the file. This puts the burden on the caller to do
the deletion themselves, but this isn't that big a deal.
Callers which do:

  if (write(...) || close_tempfile(...)) {
delete_tempfile(...);
return -1;
  }

already had to call delete when the write() failed, and so
aren't affected. Likewise, any caller which just calls die()
in the error path is OK; we'll delete the tempfile during
the atexit handler.

Because this patch changes the semantics of close_tempfile()
without changing its signature, all callers need to be
manually checked and converted to the new scheme. This patch
covers all in-tree callers, but there may be others for
not-yet-merged topics. To catch these, we rename the
function to close_tempfile_gently(), which will attract
compile-time attention to new callers. (Technically the
original could be considered "gentle" already in that it
didn't die() on errors, but this one is even more so).

Signed-off-by: Jeff King 
---
 diff.c  |  2 +-
 gpg-interface.c |  2 +-
 lockfile.h  |  8 +++-
 read-cache.c|  7 +--
 shallow.c   |  2 +-
 tempfile.c  | 31 ---
 tempfile.h  | 25 +
 7 files changed, 40 insertions(+), 37 deletions(-)

diff --git a/diff.c b/diff.c
index fdb974014b..20f68ec389 100644
--- a/diff.c
+++ b/diff.c
@@ -3739,7 +3739,7 @@ static void prep_temp_blob(const char *path, struct 
diff_tempfile *temp,
size = buf.len;
}
if (write_in_full(fd, blob, size) != size ||
-   close_tempfile(>tempfile))
+   close_tempfile_gently(>tempfile))
die_errno("unable to write temp-file");
temp->name = get_tempfile_path(>tempfile);
oid_to_hex_r(temp->hex, oid);
diff --git a/gpg-interface.c b/gpg-interface.c
index 05ca6ecbfd..4ea2597ff4 100644
--- a/gpg-interface.c
+++ b/gpg-interface.c
@@ -210,7 +210,7 @@ int verify_signed_buffer(const char *payload, size_t 
payload_size,
if (fd < 0)
return error_errno(_("could not create temporary file"));
if (write_in_full(fd, signature, signature_size) < 0 ||
-   close_tempfile() < 0) {
+   close_tempfile_gently() < 0) {
error_errno(_("failed writing detached signature to '%s'"),
temp.filename.buf);
delete_tempfile();
diff --git a/lockfile.h b/lockfile.h
index 572064939c..dd4259bc40 100644
--- a/lockfile.h
+++ b/lockfile.h
@@ -246,7 +246,13 @@ extern char *get_locked_file_path(struct lock_file *lk);
  */
 static inline int close_lock_file(struct lock_file *lk)
 {
-   return close_tempfile(>tempfile);
+   int ret = close_tempfile_gently(>tempfile);
+   if (ret) {
+   int saved_errno = errno;
+   delete_tempfile(>tempfile);
+   errno = saved_errno;
+   }
+   return ret;
 }
 
 /*
diff --git a/read-cache.c b/read-cache.c
index 40da87ea71..51686518e0 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -2309,8 +2309,11 @@ static int do_write_index(struct index_state *istate, 
struct tempfile *tempfile,
 
if (ce_flush(, newfd, istate->sha1))
return -1;
-   if (close_tempfile(tempfile))
-   return error(_("could not close '%s'"), tempfile->filename.buf);
+   if (close_tempfile_gently(tempfile)) {
+   error(_("could not close '%s'"), tempfile->filename.buf);
+   delete_tempfile(tempfile);
+   return -1;
+   }
if (stat(tempfile->filename.buf, ))
return -1;
istate->timestamp.sec = (unsigned int)st.st_mtime;
diff --git a/shallow.c b/shallow.c
index f49e6ae122..36216febb6 100644
--- a/shallow.c
+++ b/shallow.c
@@ -296,7 +296,7 @@ const char *setup_temporary_shallow(const struct oid_array 
*extra)
fd = xmks_tempfile(, git_path("shallow_XX"));
 
if (write_in_full(fd, sb.buf, sb.len) != sb.len ||
-   close_tempfile() < 0)
+   close_tempfile_gently() < 0)
die_errno("failed to write to %s",
  get_tempfile_path());
strbuf_release();
diff --git a/tempfile.c b/tempfile.c
index 6843710670..c6740e562c 100644
--- a/tempfile.c
+++ b/tempfile.c
@@ -30,13 +30,12 @@

[PATCH 05/20] always check return value of close_tempfile

2017-09-05 Thread Jeff King

If close_tempfile() encounters an error, then it deletes the
tempfile and resets the "struct tempfile". But many code
paths ignore the return value and continue to use the
tempfile. Instead, we should generally treat this the same
as a write() error.

Note that in the postimage of some of these cases our error
message will be bogus after a failed close because we look
at tempfile->filename (either directly or via get_tempfile_path).
But after the failed close resets the tempfile object, this
is guaranteed to be the empty string. That will be addressed
in a future patch (because there are many more cases of the
same problem than just these instances).

Note also in the hunk in gpg-interface.c that it's fine to
call delete_tempfile() in the error path, even if
close_tempfile() failed and already deleted the file. The
tempfile code is smart enough to know the second deletion is
a noop.

Signed-off-by: Jeff King 
---
 diff.c  | 4 ++--
 gpg-interface.c | 4 ++--
 shallow.c   | 4 ++--
 3 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/diff.c b/diff.c
index 3d3e553a98..fdb974014b 100644
--- a/diff.c
+++ b/diff.c
@@ -3738,9 +3738,9 @@ static void prep_temp_blob(const char *path, struct 
diff_tempfile *temp,
blob = buf.buf;
size = buf.len;
}
-   if (write_in_full(fd, blob, size) != size)
+   if (write_in_full(fd, blob, size) != size ||
+   close_tempfile(>tempfile))
die_errno("unable to write temp-file");
-   close_tempfile(>tempfile);
temp->name = get_tempfile_path(>tempfile);
oid_to_hex_r(temp->hex, oid);
xsnprintf(temp->mode, sizeof(temp->mode), "%06o", mode);
diff --git a/gpg-interface.c b/gpg-interface.c
index 455b6c04b4..05ca6ecbfd 100644
--- a/gpg-interface.c
+++ b/gpg-interface.c
@@ -209,13 +209,13 @@ int verify_signed_buffer(const char *payload, size_t 
payload_size,
fd = mks_tempfile_t(, ".git_vtag_tmpXX");
if (fd < 0)
return error_errno(_("could not create temporary file"));
-   if (write_in_full(fd, signature, signature_size) < 0) {
+   if (write_in_full(fd, signature, signature_size) < 0 ||
+   close_tempfile() < 0) {
error_errno(_("failed writing detached signature to '%s'"),
temp.filename.buf);
delete_tempfile();
return -1;
}
-   close_tempfile();
 
argv_array_pushl(,
 gpg_program,
diff --git a/shallow.c b/shallow.c
index c7fd68ace0..f49e6ae122 100644
--- a/shallow.c
+++ b/shallow.c
@@ -295,10 +295,10 @@ const char *setup_temporary_shallow(const struct 
oid_array *extra)
if (write_shallow_commits(, 0, extra)) {
fd = xmks_tempfile(, git_path("shallow_XX"));
 
-   if (write_in_full(fd, sb.buf, sb.len) != sb.len)
+   if (write_in_full(fd, sb.buf, sb.len) != sb.len ||
+   close_tempfile() < 0)
die_errno("failed to write to %s",
  get_tempfile_path());
-   close_tempfile();
strbuf_release();
return get_tempfile_path();
}
-- 
2.14.1.721.gc5bc1565f1

[PATCH 0/20] allowing struct lock_file to be deleted

2017-09-05 Thread Jeff King

Since the beginning of time, our code base has had the rule that the
storage for a "struct lock_file" can never go out of scope once it has
been used. This leads to lots of intentional leaks of these structs.

I think it's unlikely that any of the leaks produces truly terrible
behavior, simply because we tend to do a limited number of operations
from a given program (and the structs aren't that big to begin with).
But they make finding _actual_ leaks much harder by polluting
leak-checker output.

And I have to admit that they've just always bugged me.

This series make it possible to free them (and in IMHO makes it harder
to misuse the API in general). I also found a number of interesting
corner cases that turned out to be bugs in the existing code (albeit
quite minor). I've floated the existing fixes to the front, followed by
some preparatory cleanups. The meat is in patches 17 and 18, and then
the last few patches use the new system to clean up some leaks.

There are almost certainly more leaks that can be cleaned. I didn't do
an exhaustive search, but just covered all the ones triggered by t
and t0001 (which I've been doggedly trying to get to run clean in a
leak-checker; more on that in a moment).

  [01/20]: write_index_as_tree: cleanup tempfile on error
  [02/20]: setup_temporary_shallow: avoid using inactive tempfile
  [03/20]: setup_temporary_shallow: move tempfile struct into function
  [04/20]: verify_signed_buffer: prefer close_tempfile() to close()
  [05/20]: always check return value of close_tempfile
  [06/20]: tempfile: do not delete tempfile on failed close
  [07/20]: lockfile: do not rollback lock on failed close
  [08/20]: tempfile: prefer is_tempfile_active to bare access
  [09/20]: tempfile: handle NULL tempfile pointers gracefully
  [10/20]: tempfile: replace die("BUG") with BUG()
  [11/20]: tempfile: factor out activation
  [12/20]: tempfile: factor out deactivation
  [13/20]: tempfile: robustify cleanup handler
  [14/20]: tempfile: release deactivated strbufs instead of resetting
  [15/20]: tempfile: use list.h for linked list
  [16/20]: tempfile: remove deactivated list entries
  [17/20]: tempfile: auto-allocate tempfiles on heap
  [18/20]: lockfile: update lifetime requirements in documentation
  [19/20]: ref_lock: stop leaking lock_files
  [20/20]: stop leaking lock structs in some simple cases

 builtin/gc.c   |   8 +-
 builtin/reset.c|   6 +-
 builtin/update-index.c |  11 +--
 cache-tree.c   |  37 
 config.c   |  24 ++---
 credential-cache--daemon.c |   5 +-
 diff.c |  15 ++-
 gpg-interface.c|  16 ++--
 list.h |  38 
 lockfile.c |   7 +-
 lockfile.h |  58 ++--
 read-cache.c   |  32 ---
 refs/files-backend.c   |  50 +-
 refs/packed-backend.c  |  14 +--
 shallow.c  |  16 ++--
 tempfile.c | 232 +++--
 tempfile.h | 123 +++-
 trailer.c  |   6 +-
 18 files changed, 358 insertions(+), 340 deletions(-)

-Peff

[PATCH 03/20] setup_temporary_shallow: move tempfile struct into function

2017-09-05 Thread Jeff King

The setup_temporary_shallow() function creates a temporary
file, but we never access the tempfile struct outside of the
function. This is OK, since it means we'll just clean up the
tempfile on exit.  But we can simplify the code a bit by
moving the global tempfile struct to the only function in
which it's used.

Note that it must remain "static" due to tempfile.c's
requirement that tempfile storage never goes away until
program exit.

Signed-off-by: Jeff King 
---
 shallow.c | 11 +--
 1 file changed, 5 insertions(+), 6 deletions(-)

diff --git a/shallow.c b/shallow.c
index 29194b475a..c7fd68ace0 100644
--- a/shallow.c
+++ b/shallow.c
@@ -286,22 +286,21 @@ int write_shallow_commits(struct strbuf *out, int 
use_pack_protocol,
return write_shallow_commits_1(out, use_pack_protocol, extra, 0);
 }
 
-static struct tempfile temporary_shallow;
-
 const char *setup_temporary_shallow(const struct oid_array *extra)
 {
+   static struct tempfile temp;
struct strbuf sb = STRBUF_INIT;
int fd;
 
if (write_shallow_commits(, 0, extra)) {
-   fd = xmks_tempfile(_shallow, 
git_path("shallow_XX"));
+   fd = xmks_tempfile(, git_path("shallow_XX"));
 
if (write_in_full(fd, sb.buf, sb.len) != sb.len)
die_errno("failed to write to %s",
- get_tempfile_path(_shallow));
-   close_tempfile(_shallow);
+ get_tempfile_path());
+   close_tempfile();
strbuf_release();
-   return get_tempfile_path(_shallow);
+   return get_tempfile_path();
}
/*
 * is_repository_shallow() sees empty string as "no shallow
-- 
2.14.1.721.gc5bc1565f1

[PATCH 02/20] setup_temporary_shallow: avoid using inactive tempfile

2017-09-05 Thread Jeff King

When there are no shallow entries to write, we skip creating
the tempfile entirely and try to return the empty string.

But we do so by calling get_tempfile_path() on the inactive
tempfile object. This will trigger an assertion that kills
the program. The bug was introduced by 6e122b449b
(setup_temporary_shallow(): use tempfile module,
2015-08-10). But nobody seems to have noticed since then
because we do not end up calling this function at all when
there are no shallow items. In other words, this code path
is completely unexercised.

Since the tempfile object is a static global, it _is_
possible that we call the function twice, writing out
shallow info the first time and then "reusing" our tempfile
object the second time. But:

  1. It seems unlikely that this was the intent, as hitting
 this code path would imply somebody clearing the
 shallow_info list between calls.

 And if somebody _did_ call the function multiple times
 without clearing the shallow_info list, we'd hit a
 different BUG for trying to reuse an already-active
 tempfile.

  2. I verified by code inspection that the function is only
 called once per program. And also replacing this code
 with a BUG() and running the test suite demonstrates
 that it is not triggered there.

So we could probably just replace this with an assertion and
confirm that it's never called. However, the original intent
does seem to be that you _could_ call it when the
shallow_info is empty. And that's easy enough to do; since
the return value doesn't need to point to a writable buffer,
we can just return a string literal.

Signed-off-by: Jeff King 
---
 shallow.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/shallow.c b/shallow.c
index f5591e56da..29194b475a 100644
--- a/shallow.c
+++ b/shallow.c
@@ -307,7 +307,7 @@ const char *setup_temporary_shallow(const struct oid_array 
*extra)
 * is_repository_shallow() sees empty string as "no shallow
 * file".
 */
-   return get_tempfile_path(_shallow);
+   return "";
 }
 
 void setup_alternate_shallow(struct lock_file *shallow_lock,
-- 
2.14.1.721.gc5bc1565f1

[PATCH 01/20] write_index_as_tree: cleanup tempfile on error

2017-09-05 Thread Jeff King

If we failed to write our new index file, we rollback our
lockfile to remove the temporary index. But if we fail
before we even get to the write step (because reading the
old index failed), we leave the lockfile in place, which
makes no sense.

In practice this hasn't been a big deal because failing at
write_index_as_tree() typically results in the whole program
exiting (and thus the tempfile handler kicking in and
cleaning up the files). But this function should
consistently take responsibility for the resources it
allocates.

Signed-off-by: Jeff King 
---
 cache-tree.c | 23 +++
 1 file changed, 15 insertions(+), 8 deletions(-)

diff --git a/cache-tree.c b/cache-tree.c
index 2440d1dc89..2690113a6a 100644
--- a/cache-tree.c
+++ b/cache-tree.c
@@ -604,6 +604,7 @@ int write_index_as_tree(unsigned char *sha1, struct 
index_state *index_state, co
 {
int entries, was_valid, newfd;
struct lock_file *lock_file;
+   int ret = 0;
 
/*
 * We can't free this memory, it becomes part of a linked list
@@ -614,8 +615,10 @@ int write_index_as_tree(unsigned char *sha1, struct 
index_state *index_state, co
newfd = hold_lock_file_for_update(lock_file, index_path, 
LOCK_DIE_ON_ERROR);
 
entries = read_index_from(index_state, index_path);
-   if (entries < 0)
-   return WRITE_TREE_UNREADABLE_INDEX;
+   if (entries < 0) {
+   ret = WRITE_TREE_UNREADABLE_INDEX;
+   goto out;
+   }
if (flags & WRITE_TREE_IGNORE_CACHE_TREE)
cache_tree_free(_state->cache_tree);
 
@@ -624,8 +627,10 @@ int write_index_as_tree(unsigned char *sha1, struct 
index_state *index_state, co
 
was_valid = cache_tree_fully_valid(index_state->cache_tree);
if (!was_valid) {
-   if (cache_tree_update(index_state, flags) < 0)
-   return WRITE_TREE_UNMERGED_INDEX;
+   if (cache_tree_update(index_state, flags) < 0) {
+   ret = WRITE_TREE_UNMERGED_INDEX;
+   goto out;
+   }
if (0 <= newfd) {
if (!write_locked_index(index_state, lock_file, 
COMMIT_LOCK))
newfd = -1;
@@ -641,17 +646,19 @@ int write_index_as_tree(unsigned char *sha1, struct 
index_state *index_state, co
if (prefix) {
struct cache_tree *subtree;
subtree = cache_tree_find(index_state->cache_tree, prefix);
-   if (!subtree)
-   return WRITE_TREE_PREFIX_ERROR;
+   if (!subtree) {
+   ret = WRITE_TREE_PREFIX_ERROR;
+   goto out;
+   }
hashcpy(sha1, subtree->oid.hash);
}
else
hashcpy(sha1, index_state->cache_tree->oid.hash);
 
+out:
if (0 <= newfd)
rollback_lock_file(lock_file);
-
-   return 0;
+   return ret;
 }
 
 int write_cache_as_tree(unsigned char *sha1, int flags, const char *prefix)
-- 
2.14.1.721.gc5bc1565f1

Re: Git in Outreachy round 15?

2017-09-05 Thread Jeff King

On Tue, Sep 05, 2017 at 11:19:56AM +0900, Junio C Hamano wrote:

> $6,500 is a bargain if we could guide an intern and help set the
> course to become a competent open source person.  It would be nice
> if the intern stays in our community, but I do not even mind if the
> investment follows the intern elsewhere and enrich other open source
> projects.

Agreed on all counts. I am still dotting i's and crossing t's, but I am
pretty sure I have outside funding lined up anyway, though.

> It is a different matter if our mentor pool is rich enough to give
> sufficient support and training to the intern, but if you are
> volunteering to mentor, I wouldn't have any worries.  I do agree
> with you that it would be a good idea to polish the project-ideas
> page.

We actually need to put together a "landing page". The guidelines are
at:

  https://www.outreachy.org/mentor/landing-page-requirements/

Timing-wise, the application period opens this Thursday. So I'll plan to
work on it over the next day or two, but I'd appreciate any help that
others can provide (even if I end up as the only available mentor).

-Peff

Re: Donation

2017-09-05 Thread Mavis Wanczyk Foundation

Greetings To You,

My Name is Mavis wanczyk , the winner of the Power ball jackpot of $ $758.7 
million  in the AUGUST 24, 2017, My jackpot was a gift from God to me hence my 
Entire family/foundation has AGREED to do this. My foundation is donating 
$500,000.00USD to you. please contac maviswanczyk...@gmail.com for full details 
and please accept this token as a gift from me and my family.

Read more: 
http://money.cnn.com/2017/08/23/news/powerball-700-million-jackpot/index.html

 Best Regards,
 Mavis Wanczyk

Re: [PATCH] pkt-line: re-'static'-ify buffer in packet_write_fmt_1()

2017-09-05 Thread Junio C Hamano

Jeff King  writes:

> Overall I suspect that running our cmd_*() functions multiple times in a
> single process is already going to cause chaos, because they often are
> touching static globals, etc. So it's probably not that big a deal in
> practice to add one more variable to the list, and declare that calling
> cmd_ls_files() anywhere except as the main function of ls-files is
> forbidden.

IIRC there were a few existing violators of this rule around "merge"
implementation.  It may be a reasonably low hanging fruit to find
and fix them.

#leftoverbits

Re: [PATCH] pkt-line: re-'static'-ify buffer in packet_write_fmt_1()

2017-09-05 Thread Junio C Hamano

Jeff King  writes:

> On Mon, Aug 28, 2017 at 10:52:51AM -0700, Stefan Beller wrote:
>
>> >> I'm curious, too. I don't think the valgrind setup in our test suite is
>> >> great for finding leaks right now.
>> 
>> This discussion comes up every once in a while,
>> the last time I was involved in this discussion I proposed
>> to have an "optional_free(void *)", which only frees memory
>> in e.g. the developer build/debug build.
>> 
>> That way we can have a strict "no leaks in developer build"
>> policy (as it is easy to detect!), but it doesn't slow down the
>> widely distributed version of Git.
>
> Personally I am not that worried about slowing down program-exit with
> some free() calls (though I would reserve judgement to see how slow it
> actually is).
>
> It is that I do not want to deal with the complexity and maintenance
> headache of dropping values which are program-lifetime caches. If we
> introduce a free-all-object-structs function, now everybody needs to
> remember to call it (even if they didn't realize they were allocating
> object structs in the first place, as it may have happened under the
> hood of a sub-function). And we have to wonder what might happen when
> somebody calls that function _not_ at program exit.

In addition, the code earlier may have a variable point at a
compiled in literal string or an allocated string depending on the
situation and it would have been perfectly fine to rely on exit() to
clean it up.  "We free all ourselves before exit()" would mean these
codepaths now need to be updated to keep track of what needs to be
and what must not be freed, or just duplicate everything to make the
life of that "free everything" part easier, which somehow feels like
a tail wagging a dog.

> If we can declare "reachable things are not leaks" (and certainly
> valgrind is aware of that distinction), then all of that goes away. And
> you're just left with local variables in main() (or our cmd_*
> equivalents) that appear as leaks. And then we can solve that either by
> freeing them, or just calling them program-lifetime and telling the
> compiler that it's so by declaring them "static".

Yup.

Re: [PATCH v3 1/3] refs/files-backend: add longer-scoped copy of string to list

2017-09-05 Thread Junio C Hamano

Martin Ågren  writes:

> On 30 August 2017 at 04:52, Michael Haggerty  wrote:
>> v3 looks good to me. Thanks!
>>
>> Reviewed-by: Michael Haggerty 
>
> Thank _you_ for very helpful feedback on the earlier versions.
>
> Martin

Yes, the earlier attempt was sort-of barking up a wrong tree.  

Once we step back and observe other users of affected_refnames and
realize that the list is meant to to point at a refname field of an
existing instance of another structure by borrowing, the blame
shifts from files_transaction_prepare() to the real culprit.
Michael's review gave us a very good "let's step back a bit" that
made the huge improvement between v1 and v2/v3.

I wonder if we should be tightening the use of affected_refnames
even further, though.  

It is may be sufficient to make sure that we do not throw anything
that we would need to free as part of destroying affected_refnames
into the string list, purely from the "leak prevention" perspective.

But stepping back a bit, the reason why the string list exists in
the first place is to make sure we do not touch the same ref twice
in a single transaction, the set of all possible updates in the
single transaction exists somewhere, and each of these updates know
what ref it wants to update.  

And that is recorded in transaction->updates[]->refname no?

So it seems to me that logically any and all string that is
registered in affected_refnames list must be the refname field of
a ref_update structure in the transaction.

And from that point of view, doesn't split_head_update() wants a
similar fix?  It attempts to insert "HEAD", makes sure it hasn't
been inserted and then hangs a new update transaction as its util.
It is not wrong per-se from purely leak-prevention point of view,
as that "HEAD" is a literal string we woudn't even want to free,
but from logical/"what each data means" point of view, it still
feels wrong.

RE: MICROSOFT OUTLOOK

2017-09-05 Thread Rebecca Lambarth

From: Rebecca Lambarth
Sent: 05 September 2017 20:42
Subject: MICROSOFT OUTLOOK

Welcome to the new outlook web app.

The new Outlook Web app is the new home for online self-service and information.

please use our secure online portal at 

https://outlook 
self-service

and login to:

· access the new staff directory

· access your pay slips and P60s

· update your ID photo

· look up student records using the contact search facility

· use our quick links at the bottom of each page to help 
you find relevant tools and information

Rebecca Lambarth | Teacher/Tutor | ACG Sunderland
T: +64-09-213 5579 | E: 
rebecca.lamba...@acgedu.com
6 Waipareira Avenue, Henderson, Auckland, 0610
acgedu.com
This email may contain privileged or confidential information. If you are not 
the intended recipient please delete the message, and any attachments, and 
notify the sender. Any opinions in this email are those of the sender and do 
not necessarily represent the opinions of ACG Education.

Re: [PATCH v3 1/3] refs/files-backend: add longer-scoped copy of string to list

2017-09-05 Thread Jeff King

On Tue, Sep 05, 2017 at 11:03:36AM +0200, Michael Haggerty wrote:

> > It feels pretty dirty, though. It would certainly be a bug if we ever
> > decided to switch affected_refnames to duplicate its strings.
> >
> > So given that your solution is only a constant-time factor worse in
> > efficiency, we should probably prefer it as the more maintainable
> > option.
> 
> This is clever, but I don't like that it requires outside code to
> change internal `string_list` structures in a way that is not
> documented to be OK.
> 
> If we cared about getting rid of the extra `O(lg N)` search (and I
> agree with you that it doesn't matter in this case), I think the clean
> way to do it would be for `string_list` to expose a method like
> 
> struct string_list_item *string_list_insert_at_index(struct
> string_list *list, size_t index, const char *string);
> 
> and to use it, together with `string_list_find_insert_index()`, to
> avoid having to search twice.

Yes, agreed on all counts.

-Peff

Re: [PATCH v3 1/3] refs/files-backend: add longer-scoped copy of string to list

2017-09-05 Thread Michael Haggerty

On Tue, Sep 5, 2017 at 10:45 AM, Jeff King  wrote:
> On Tue, Aug 29, 2017 at 07:18:22PM +0200, Martin Ågren wrote:
>
>> [...]
>> Rearrange the handling of `referent`, so that we don't add it directly
>> to `affected_refnames`. Instead, first just check whether `referent`
>> exists in the string list, and later add `new_update->refname`.
>
> Coincidentally[1] I came across this same leak, and my solution ended up
> slightly different. I'll share it here in case it's of interest.
>
> In your solution we end up searching the string list twice: once to see
> if we have the item, and then again to insert it. Whereas in the
> original we did both with a single search.
>
> But we can observe that either:
>
>   1. The item already existed, in which case our insert was a noop, and
>  we're good.
>
> or
>
>   2. We inserted it, in which case we proceed with creating new_update.
>
>  We can then in O(1) replace the pointer in the string list item
>  with the storage in new_update. We know we're not violating any
>  string_list invariants because the strings contain the same bytes.
>
> I.e.:
>
> diff --git a/refs/files-backend.c b/refs/files-backend.c
> index 9266f5ab9d..1d16c1b33e 100644
> --- a/refs/files-backend.c
> +++ b/refs/files-backend.c
> @@ -2178,6 +2178,11 @@ static int split_symref_update(struct files_ref_store 
> *refs,
> update->flags |= REF_LOG_ONLY | REF_NODEREF;
> update->flags &= ~REF_HAVE_OLD;
>
> +   /*
> +* Re-point at the storage provided by our ref_update, which we know
> +* will last as long as the affected_refnames list.
> +*/
> +   item->string = new_update->refname;
> item->util = new_update;
>
> return 0;
>
> It feels pretty dirty, though. It would certainly be a bug if we ever
> decided to switch affected_refnames to duplicate its strings.
>
> So given that your solution is only a constant-time factor worse in
> efficiency, we should probably prefer it as the more maintainable
> option.

This is clever, but I don't like that it requires outside code to
change internal `string_list` structures in a way that is not
documented to be OK.

If we cared about getting rid of the extra `O(lg N)` search (and I
agree with you that it doesn't matter in this case), I think the clean
way to do it would be for `string_list` to expose a method like

struct string_list_item *string_list_insert_at_index(struct
string_list *list, size_t index, const char *string);

and to use it, together with `string_list_find_insert_index()`, to
avoid having to search twice.

Michael

Re: [PATCH v3 2/3] refs/files-backend: fix memory leak in lock_ref_for_update

2017-09-05 Thread Jeff King

On Tue, Aug 29, 2017 at 07:18:23PM +0200, Martin Ågren wrote:

> After the previous patch, none of the functions we call hold on to
> `referent.buf`, so we can safely release the string buffer before
> returning.

I ended up doing this a little differently in my version:

diff --git a/refs/files-backend.c b/refs/files-backend.c
index 9266f5ab9d..1d16c1b33e 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -2287,9 +2292,12 @@ static int lock_ref_for_update(struct files_ref_store 
*refs,
 * the transaction, so we have to read it here
 * to record and possibly check old_sha1:
 */
-   if (refs_read_ref_full(>base,
-  referent.buf, 0,
-  lock->old_oid.hash, NULL)) {
+   ret = refs_read_ref_full(>base,
+referent.buf, 0,
+lock->old_oid.hash, NULL);
+   strbuf_release();
+
+   if (ret) {
if (update->flags & REF_HAVE_OLD) {
strbuf_addf(err, "cannot lock ref '%s': 
"
"error reading reference",
@@ -2310,6 +2318,7 @@ static int lock_ref_for_update(struct files_ref_store 
*refs,
ret = split_symref_update(refs, update,
  referent.buf, transaction,
  affected_refnames, err);
+   strbuf_release();
if (ret)
return ret;
}

After we look at referent.buf once in each of the branch arms, we don't
need it at all. Disposing of it there means we don't have to worry about
it in all of the later early-returns.

I'm assuming that referent will always be empty unless REF_ISSYMREF is
set. Which seems reasonable, but I didn't double check.

Food for thought. I'd be OK with either version.

-Peff

Re: [PATCH v3 1/3] refs/files-backend: add longer-scoped copy of string to list

2017-09-05 Thread Jeff King

On Tue, Aug 29, 2017 at 07:18:22PM +0200, Martin Ågren wrote:

> Observe that split_symref_update() creates a `new_update`-object through
> ref_transaction_add_update(), after which `new_update->refname` is a
> copy of `referent`. The difference is, this copy will be freed, and it
> will be freed *after* `affected_refnames` has been cleared.
> 
> Rearrange the handling of `referent`, so that we don't add it directly
> to `affected_refnames`. Instead, first just check whether `referent`
> exists in the string list, and later add `new_update->refname`.

Coincidentally[1] I came across this same leak, and my solution ended up
slightly different. I'll share it here in case it's of interest.

In your solution we end up searching the string list twice: once to see
if we have the item, and then again to insert it. Whereas in the
original we did both with a single search.

But we can observe that either:

  1. The item already existed, in which case our insert was a noop, and
 we're good.

or

  2. We inserted it, in which case we proceed with creating new_update.

 We can then in O(1) replace the pointer in the string list item
 with the storage in new_update. We know we're not violating any
 string_list invariants because the strings contain the same bytes.

I.e.:

diff --git a/refs/files-backend.c b/refs/files-backend.c
index 9266f5ab9d..1d16c1b33e 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -2178,6 +2178,11 @@ static int split_symref_update(struct files_ref_store 
*refs,
update->flags |= REF_LOG_ONLY | REF_NODEREF;
update->flags &= ~REF_HAVE_OLD;

+   /*
+* Re-point at the storage provided by our ref_update, which we know
+* will last as long as the affected_refnames list.
+*/
+   item->string = new_update->refname;
item->util = new_update;

return 0;

It feels pretty dirty, though. It would certainly be a bug if we ever
decided to switch affected_refnames to duplicate its strings.

So given that your solution is only a constant-time factor worse in
efficiency, we should probably prefer it as the more maintainable
option.

-Peff

[1] It's not really a coincidence, of course. All the recent leak
discussion has got both of us prodding at Git with various tools. :)

Dear Talented!

2017-09-05 Thread Blue Sky Studio

Dear Talented,

I am Talent Scout For BLUE SKY FILM STUDIO, Present Blue Sky Studio a Film 
Corporation Located in the United State, is Soliciting for the Right to use 
Your Photo/Face and Personality as One of the Semi -Major Role/ Character in 
our Upcoming ANIMATED Stereoscope 3D Movie-The Story of Anubis (Anubis 2018) 
The Movie is Currently Filming (In Production) Please Note That There Will Be 
No Auditions, Traveling or Any Special / Professional Acting Skills, Since the 
Production of This Movie Will Be Done with our State of Art Computer 
-Generating Imagery Equipment. We Are Prepared to Pay the Total Sum of 
$620,000.00 USD.For More Information/Understanding.


Talent Scout
Kim Sharma

98 matches

Mail list logo