Re: [PATCH v7 03/12] update-index: add a new --force-write-index option

2017-09-19 Thread Junio C Hamano
Ben Peart  writes:

> + OPT_SET_INT(0, "force-write-index", _write,
> + N_("write out the index even if is not flagged as 
> changed"), 1),

Hmph.  The only time this makes difference is when the code forgets
to mark active_cache_changed even when it actually made a change to
the index, no?  I do understand the wish to be able to observe what
_would_ be written if such a bug did not exist in order to debug the
other aspects of the change in this series, but at the same time I
fear that we may end up sweeping the problem under the rug by
running the tests with this option.

>   OPT_END()
>   };
>  
> @@ -1147,7 +1150,7 @@ int cmd_update_index(int argc, const char **argv, const 
> char *prefix)
>   die("BUG: bad untracked_cache value: %d", untracked_cache);
>   }
>  
> - if (active_cache_changed) {
> + if (active_cache_changed || force_write) {
>   if (newfd < 0) {
>   if (refresh_args.flags & REFRESH_QUIET)
>   exit(128);


Re: [PATCH v2] for_each_string_list_item: avoid undefined behavior for empty list

2017-09-19 Thread Junio C Hamano
Jonathan Nieder  writes:

> D. Eliminate for_each_string_list_item and let callers just do
>
>   unsigned int i;
>   for (i = 0; i < list->nr; i++) {
>   struct string_list_item *item = list->items[i];
>   ...
>   }
>
>Having to declare item is unnecessarily verbose, decreasing the
>appeal of this option.  I think I like it anyway, but I wasn't able
>to convince coccinelle to do it.

When using the macro, item still needs to be declared outside by the
user, so it's not all that unpleasant, though.

> E. Use subtraction instead of addition:
>#define for_each_string_list_item(item, list) \
>   for (item = ...; \
>(item == list->items ? 0 : item - list->items) < nr; \
>item++)
>
>I expected the compiler to figure out that this is a long way of writing
>(item - list->items), but at least with gcc 4.8.4 -O2, no such
>luck.  This generates uglier assembly than the NULL check.

Yuck.  You cannot easily unsee such an ugliness X-<.

The patch and explanation above --- looked quite nicely written.
Will queue.

Thanks.

> diff --git a/string-list.h b/string-list.h
> index 29bfb7ae45..79ae567cbc 100644
> --- a/string-list.h
> +++ b/string-list.h
> @@ -32,8 +32,10 @@ void string_list_clear_func(struct string_list *list, 
> string_list_clear_func_t c
>  typedef int (*string_list_each_func_t)(struct string_list_item *, void *);
>  int for_each_string_list(struct string_list *list,
>string_list_each_func_t, void *cb_data);
> -#define for_each_string_list_item(item,list) \
> - for (item = (list)->items; item < (list)->items + (list)->nr; ++item)
> +#define for_each_string_list_item(item,list)\
> + for (item = (list)->items;  \
> +  item && item < (list)->items + (list)->nr; \
> +  ++item)
>  
>  /*
>   * Apply want to each item in list, retaining only the ones for which


Re: Behaviour of 'git stash show' when a stash has untracked files

2017-09-19 Thread Junio C Hamano
Kaartic Sivaraam  writes:

> Some time ago, I stashed a few changes along with untracked files. I
> almost forgot it until recently. Then I wanted to see what I change I
> had in the stash. So I did a 'git stash show '. It worked fine but
> didn't say anything about the untracked files in that stash. That made
> me wonder where the untracked files I added went. I then applied the
> stash to see that they were still there but weren't listed in show.
>
> I understand that they aren't listed because 'git stash show' is
> typically a "diff between the stashed state and its original parent" as
> the documentation says but shouldn't there be at least a message that
> the stash contains untracked files? Those untracked files are "part of
> the stash" and I see no way to get information about their presence
> currently.
>
> So, should this behaviour be changed?

Hmm, crickets tell us that nobody is all that interested in this, it
seems.  I do not think I'd be against a new feature that lets users
ask what untracked paths are in the stash (and even their contents),
but I do think it is a bad idea to change a vanilla "stash show" to
show that information in addition to what is currently shown.

Two things need to be designed carefuly.  One is the UI to _invoke_
the new feature, the other is the output from the new feature.

As to the invocation, an obvious pair of choices are:

 - "git stash show-untracked stash@{0} [ [--] ]"?
 - "git stash show --untracked stash@{0} [ [--] ]"?

I'd personally vote for the former, if only because the latter makes
the design more complicated.  For one thing, tying the feature to
"show" means the output _must_ be in the form of "diff" output in
order to be consistent with the normal output from the subcommand,
but a whole-file creation diff may not be the best way to show the
entire contents of an untracked file.  Also by adding it as an
option to the existing "show" command, it makes it debatable if the
output should show the contents of untracked files in addition to
the stashed changes of tracked paths, or in place of them.  Because
I suspect that viewing contents of the untracked files and the
changes to tracked paths may serve quite different purposes from the
point of view of expected use cases, I am leaning towards saying
that it is a bad idea to show contents of untracked paths in
addition to changes to tracked paths.  There probably are other
reasons why we should prefer the former, i.e. a separate subcommand
to "git stash", independent of the existing "git stash show".

Assuming that we choose to go with a separate command, the output
format from the command does not have to be in the form of patch, so
perhaps "git stash show-untracked --list" may be a way to list the
paths (and we may want -z to show the list NUL-terminated like
"ls-files" does)?  There may be other operations to help those who
want a way to learn about untracked paths in a stash.  But this is
not exactly my itch so I'll let people who do have the itch to work
on designing the details out.





[PATCH v2] for_each_string_list_item: avoid undefined behavior for empty list

2017-09-19 Thread Jonathan Nieder
From: Michael Haggerty 

If you pass a newly initialized or newly cleared `string_list` to
`for_each_string_list_item()`, then the latter does

for (
item = (list)->items; /* NULL */
item < (list)->items + (list)->nr; /* NULL + 0 */
++item)

Even though this probably works almost everywhere, it is undefined
behavior, and it could plausibly cause highly-optimizing compilers to
misbehave.  C99 section 6.5.6 paragraph 8 explains:

If both the pointer operand and the result point to elements
of the same array object, or one past the last element of the
array object, the evaluation shall not produce an overflow;
otherwise, the behavior is undefined.

and (6.3.2.3.3) a null pointer does not point to anything.

Guard the loop with a NULL check to make the intent crystal clear to
even the most pedantic compiler.  A suitably clever compiler could let
the NULL check only run in the first iteration, but regardless, this
overhead is likely to be dwarfed by the work to be done on each item.

This problem was noticed by Coverity.

[jn: using a NULL check instead of a placeholder empty list;
 fleshed out the commit message based on mailing list discussion]

Signed-off-by: Michael Haggerty 
Signed-off-by: Jonathan Nieder 
---
 string-list.h | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

Junio C Hamano wrote:
> Jonathan Nieder  writes:

>> ...  But a quick test with gcc 4.8.4
>> -O2 finds that at least this compiler does not contain such an
>> optimization.  The overhead Michael Haggerty mentioned is real.
>
> Still, I have a feeling that users of string_list wouldn't care
> the overhead of single pointer NULL-ness check.
>
>  - apply.c collects conflicted paths and reports them with fprintf().
>
>  - builtin/clean.c uses the function to walk the list of paths to be
>removed, and either does a human interaction (for "-i" codepath)
>or goes to the filesystem to remove things.
>
>  - builtin/config.c uses it in get_urlmatch() in preparation for
>doing network-y things.
>
>  - builtin/describe.c walks the list of exclude and include patterns
>to run wildmatch on the candidate reference name to filter it out.
>
>  ...
>
> In all of these examples, what happens for each item in the loop
> seems to me far heavier than the overhead this macro adds.

Yes, agreed.  As a small tweak,

   #define for_each_string_list_item(item, list) \
for (item = ...; item && ...; ...)

produces nicer assembly than

   #define for_each_string_list_item(item, list) \
for (item = ...; list->items && ...; ...)

(By the way, the potential optimization I described isn't valid: we
know that when item == NULL and list->items == NULL, list->nr is
always zero, but the compiler has no way to know that.  So it can't
eliminate the NULL test.  For comparison, a suitably smart compiler
should be able to eliminate a 'list->nr != 0 &&' guard if 'list'
doesn't escape in the loop body.)

Recapping the other proposed fixes:

A. Make it an invariant of string_list that items is never NULL and
   update string_list_init et al to use an empty array.  This is
   pretty painless until you notice some other structs that embed
   string_list without using STRING_LIST_INIT.  Updating all those
   would be too painful.

B. #define for_each_string_list_item(item, list) \
if (list->items) \
for (item = ...; ...; ... )

   This breaks a caller like
if (foo)
for_each_string_list_item(item, list)
...
else
...

   making it a non-starter.

C. As Gábor suggested,
   #define for_each_string_list_item(item, list) \
if (!list->items) \
; /* nothing to do */ \
else \
for (item = ...; ...; ...)

   This handles the caller from (B) correctly.  But it produces
   compiler warnings for a caller like

if (foo)
for_each_string_list_item(item, list)
...

   There is only one instance of that construct in git today.  It
   looks nicer anyway with braces, so this approach would also be
   promising.

D. Eliminate for_each_string_list_item and let callers just do

unsigned int i;
for (i = 0; i < list->nr; i++) {
struct string_list_item *item = list->items[i];
...
}

   Having to declare item is unnecessarily verbose, decreasing the
   appeal of this option.  I think I like it anyway, but I wasn't able
   to convince coccinelle to do it.

E. Use subtraction instead of addition:
   #define for_each_string_list_item(item, list) \
for (item = ...; \
 (item == list->items ? 0 : item - list->items) < nr; \
 item++)

   I expected the compiler to figure out that this is a long way of writing
   (item - list->items), but at least with gcc 4.8.4 

Re: [PATCH] for_each_string_list_item(): behave correctly for empty list

2017-09-19 Thread Junio C Hamano
Jonathan Nieder  writes:

> I'll send a patch with a commit message saying so to try to close out
> this discussion.

Thanks.  One less thing we have to worry about ;-)


Re: [PATCH v2 10/10] add UNLEAK annotation for reducing leak false positives

2017-09-19 Thread Junio C Hamano
Jeff King  writes:

> At any rate, I think Jonathan's point is that writing:
>
>   UNLEAK(foo)
>
> will silently pass in a normal build, and only much later will somebody
> run a leak-checking build and see the compile error.

Yeah, I think I understand that concern.

#if SOME_CONDITION
#define X(y) do_x(y)
#else
#define X(y) /* nothing */
#endif

is something we quite often use, and I just found it a bit unusual
that all of a sudden we start to be extra careful only with this
macro.


Re: [PATCH] Win32: simplify loading of DLL functions

2017-09-19 Thread Junio C Hamano
Jonathan Nieder  writes:

> ...
> strerror(ENOSYS) is "Function not implemented".  Cute.
>
> Reviewed-by: Jonathan Nieder 

Thanks, both.  Will queue.


Re: [RFC PATCH 3/5] branch: cleanup branch name validation

2017-09-19 Thread Junio C Hamano
Kaartic Sivaraam  writes:

> -int validate_new_branchname(const char *name, struct strbuf *ref,
> - int force, int attr_only)
> +int validate_branch_update(const char *name, struct strbuf *ref,
> +int could_exist, int clobber_head)

"update" to me means something already exists and the caller is
asking this function if it is OK to update it, but is that what this
function is used for?  I do not find the original name too bad, but
if I were renaming it, I'd call it ok_to_create_branch(), with the
understanding that forcing a recreation of an existing branch falls
into the wider definition of "create".

Also I'd avoid "could", which can be taken as an optimization hint
(i.e. "you usually do not have to worry about this thing to already
exist, but I am telling you that for this one call that is not the
case and you need to be a bit more careful by spending extra cycles
to see if it is and deal with the situation accordingly if it indeed
is"), and use "ok" as part of the name for the parameter (or flip
the meaning of it and say "create_only" or something).


Re: [RFC PATCH 2/5] branch: document the usage of certain parameters

2017-09-19 Thread Junio C Hamano
Kaartic Sivaraam  writes:

> Documentation for a certain function was incomplete as it didn't say
> what certain parameters were used for.
>
> So, document them for the sake of completeness and easy reference.

Thanks.

> @@ -15,6 +15,11 @@
>   *
>   *   - reflog creates a reflog for the branch
>   *
> + *   - if 'force' is true, clobber_head indicates whether the branch could be
> + * the current branch; else it has no effect

Everybody else in this list begins with what it is describing for
easy eyeballing.  Can you make this match that pattern?

Also, what does "could be" mean in that sentence?  Is the caller
telling the function "how, I do not exactly know if that is the
case, but the branch I am asking to create you might be the same
branch as what is currently checked out, so be extra careful"?

Or is the comment telling a potential caller that it can pass true
to signal that create_branch() is allowed to (re)"create" the branch
that is already checked out (hence it already exists)?

I think the confusing statement above arises because an assumption
is unstated there.  If the reader knows "Even with force, calling
create_branch() on the currently checked out branch is normally
forbidden", then the reader can guess your "could" mean the latter.

- clobber_head_ok allows the currently checked out (hence
  existing) branch to be overwritten; without force, it has
  no effect.

perhaps?  As the underlying helper calls it clobber_head_ok, and
that name is more clearly expresses that this is a permission than
the current name, I chose to add _ok to the above example, but if
you are to take the suggestion, you'd need to also update the names
in the declaration, too.

> + *
> + *   - quiet suppresses tracking information
> + *
>   *   - track causes the new branch to be configured to merge the remote 
> branch
>   * that start_name is a tracking branch for (if any).
>   */


Re: [RFC PATCH 1/5] builtin/checkout: avoid usage of '!!'

2017-09-19 Thread Junio C Hamano
Kaartic Sivaraam  writes:

> There was a usage for which there's no compelling reason.So, replace
> such a usage as with something else that expresses the intent more
> clearly.

I actually think this is a good example of the exception-rule.  The
function wants to take true or false in "int", and the caller has a
pointer.  And !!ptr is a shorter and more established way than ptr
!= NULL to turn non-NULL ness into an int boolean, without having to
either repeat or introducing an otherwise unnecessary temporary.


Re: [PATCH] for_each_string_list_item(): behave correctly for empty list

2017-09-19 Thread Junio C Hamano
Jonathan Nieder  writes:

> ...  But a quick test with gcc 4.8.4
> -O2 finds that at least this compiler does not contain such an
> optimization.  The overhead Michael Haggerty mentioned is real.

Still, I have a feeling that users of string_list wouldn't care 
the overhead of single pointer NULL-ness check.

 - apply.c collects conflicted paths and reports them with fprintf().

 - builtin/clean.c uses the function to walk the list of paths to be
   removed, and either does a human interaction (for "-i" codepath)
   or goes to the filesystem to remove things.

 - builtin/config.c uses it in get_urlmatch() in preparation for
   doing network-y things.

 - builtin/describe.c walks the list of exclude and include patterns
   to run wildmatch on the candidate reference name to filter it out.

 ...

In all of these examples, what happens for each item in the loop
seems to me far heavier than the overhead this macro adds.

So...




Re: [PATCH] t4014: strengthen search patterns

2017-09-19 Thread Junio C Hamano
Kaartic Sivaraam  writes:

> The regex patterns for some failing test cases were a bit loose
> giving way for a few incorrect outputs being accepted as correct
> outputs.

If these were part of scripted Porcelain that needs to take any
end-user input, then having '.' that are meant to match only a dot
is a bug, but I personally do not think it is worth the patch noise
to quote them, when we _know_ (after all, we are in control of the
data we use for these tests) there is no other lines that would
match these patterns.



> To avoid such incorrect outputs from being flagged as correct ones
> use fixed string matches when possible and strengthen regex when
> it's not.
>
> Signed-off-by: Kaartic Sivaraam 
> ---
>  t/t4014-format-patch.sh | 10 +-
>  1 file changed, 5 insertions(+), 5 deletions(-)
>
> diff --git a/t/t4014-format-patch.sh b/t/t4014-format-patch.sh
> index 482112ca339f0..7dff7996c9e1f 100755
> --- a/t/t4014-format-patch.sh
> +++ b/t/t4014-format-patch.sh
> @@ -163,7 +163,7 @@ test_expect_failure 'additional command line cc (rfc822)' 
> '
>   git config --replace-all format.headers "Cc: R E Cipient 
> " &&
>   git format-patch --cc="S. E. Cipient " --stdout 
> master..side | sed -e "/^\$/q" >patch5 &&
>   grep "^Cc: R E Cipient ,\$" patch5 &&
> - grep "^ *\"S. E. Cipient\" \$" patch5
> + grep "^ *\"S\. E\. Cipient\" \$" patch5
>  '
>  
>  test_expect_success 'command line headers' '
> @@ -191,13 +191,13 @@ test_expect_success 'command line To: header (ascii)' '
>  test_expect_failure 'command line To: header (rfc822)' '
>  
>   git format-patch --to="R. E. Cipient " --stdout 
> master..side | sed -e "/^\$/q" >patch8 &&
> - grep "^To: \"R. E. Cipient\" \$" patch8
> + grep -F "To: \"R. E. Cipient\" " patch8
>  '
>  
>  test_expect_failure 'command line To: header (rfc2047)' '
>  
>   git format-patch --to="R Ä Cipient " --stdout 
> master..side | sed -e "/^\$/q" >patch8 &&
> - grep "^To: =?UTF-8?q?R=20=C3=84=20Cipient?= \$" 
> patch8
> + grep "^To: =?UTF-8?q?R=20=C3=84=20Cipient?= \$" 
> patch8
>  '
>  
>  test_expect_success 'configuration To: header (ascii)' '
> @@ -211,14 +211,14 @@ test_expect_failure 'configuration To: header (rfc822)' 
> '
>  
>   git config format.to "R. E. Cipient " &&
>   git format-patch --stdout master..side | sed -e "/^\$/q" >patch9 &&
> - grep "^To: \"R. E. Cipient\" \$" patch9
> + grep -F "To: \"R. E. Cipient\" " patch9
>  '
>  
>  test_expect_failure 'configuration To: header (rfc2047)' '
>  
>   git config format.to "R Ä Cipient " &&
>   git format-patch --stdout master..side | sed -e "/^\$/q" >patch9 &&
> - grep "^To: =?UTF-8?q?R=20=C3=84=20Cipient?= \$" 
> patch9
> + grep "^To: =?UTF-8?q?R=20=C3=84=20Cipient?= \$" 
> patch9
>  '
>  
>  # check_patch : Verify that  looks like a half-sane
>
> --
> https://github.com/git/git/pull/406


Re: [PATCH v2 0/2] fix read past end of array in alternates files

2017-09-19 Thread Junio C Hamano
Jeff King  writes:

> On Mon, Sep 18, 2017 at 11:51:00AM -0400, Jeff King wrote:
>
>> This series fixes a regression in v2.11.1 where we might read past the
>> end of an mmap'd buffer. It was introduced in cf3c635210, but I didn't
>> base the patch on there, for a few reasons:
>
> Here's a v2 that fixes a minor leak in the first patch (I carefully
> remembered to free() the path buffer on the error path, but forgot to do
> it in the normal one. Oops).

Thanks.

> I also tried to address Jonathan's "should this be in the commit
> message" comment. The information above _is_ in there, but maybe putting
> it at the top as a sort of tl;dr makes it easier to find?

Probably.

> The second patch is unchanged.
>
> Junio, I see you ended up back-porting it to v2.11. Would you prefer me
> to have done it that way in the first place? I was trying to reduce your
> work by basing it on "maint" (figuring that we wouldn't bother making a
> v2.11.x release anyway, and that skips you having to apply the second
> patch separately after the merge).

Upon seeing that this dated back to 2.11, because I am lazy and do
not assess how much the fix needs to go to older tracks when I am
queuing (remember: my attention span during patch queueing is
measured in minutes, as people send changes to different areas), I
tend to first try to see what's the oldest maintenance track we can
practically apply the patch to.  It turned out that the conflict
resolution to apply on maint-2.11 wasn't that painful, so I took the
lazy route all the way---the real fix on the oldest, even though I
do not know (because I refused to think and decide due to laziness)
if a next v2.11.x release is necessary, followed by a nice-to-have
warning that uses newer features on the maintenance track.  That
way, when we decide that the fix won't be a big deal to require a
new v2.11.x, but it is nice to have in v2.13.x, we could merge the
first one, without having to cherry-pick.

All of the above is part of how the daily maintainer workflow goes,
and there is no strong preference on my side, if the original is on
the theoretically oldest (i.e. maint-2.11) or on the oldest
practical (i.e. maint), as long as the conflicts are not too
painful.



Re: [PATCH] for_each_string_list_item(): behave correctly for empty list

2017-09-19 Thread Jonathan Nieder
Hi,

Kaartic Sivaraam wrote:
> On Saturday 16 September 2017 09:36 AM, Michael Haggerty wrote:
>> Jonathan Nieder wrote:

>>> Does the following alternate fix work?  I think I prefer it because
>>> it doesn't require introducing a new global. [...]
>>>   #define for_each_string_list_item(item,list) \
>>> -   for (item = (list)->items; item < (list)->items + (list)->nr; ++item)
>>> +   for (item = (list)->items; \
>>> +(list)->items && item < (list)->items + (list)->nr; \
>>> +++item)
>>
>> This is the possibility that I was referring to as "add[ing] overhead to
>> each iteration of the loop". I'd rather not add an extra test-and-branch
>> to every iteration of a loop in which `list->items` is *not* NULL, which
>> your solution appears to do. Or are compilers routinely able to optimize
>> the check out?
>
> I t seems at least 'gcc' is able to optimize this out even with a -O1
> and 'clang' optimizes this out with a -O2. Taking a sneak peek at
> the 'Makefile' shows that our default is -O2.
>
> For a proof, see https://godbolt.org/g/CPt73L

>From that link:

for ( ;valid_int && *valid_int < 10; (*valid_int)++) {
printf("Valid instance");
}

Both gcc and clang are able to optimize out the 'valid_int &&' because
it is dereferenced on the RHS of the &&.

For comparison, 'item < (list)->items + (list)->nr' does not
dereference (list)->items.  So that optimization doesn't apply here.

A smart compiler could be able to take advantage of there being no
object pointed to by a null pointer, which means

item < (list)->items + (list)->nr

is always false when (list)->items is NULL, which in turn makes a
'(list)->items &&' test redundant.  But a quick test with gcc 4.8.4
-O2 finds that at least this compiler does not contain such an
optimization.  The overhead Michael Haggerty mentioned is real.

Thanks and hope that helps,
Jonathan


Re: [PATCH v7 04/12] fsmonitor: teach git to optionally utilize a file system monitor to speed up detecting new or changed files.

2017-09-19 Thread Junio C Hamano
Ben Peart  writes:

> +/* do stat comparison even if CE_FSMONITOR_VALID is true */
> +#define CE_MATCH_IGNORE_FSMONITOR 0X20

Hmm, when should a programmer use this flag?

> +int git_config_get_fsmonitor(void)
> +{
> + if (git_config_get_pathname("core.fsmonitor", _fsmonitor))
> + core_fsmonitor = getenv("GIT_FSMONITOR_TEST");

Will the environment be part of the published API, or is it a
remnant of a useful tool for debugging while developing the feature?

If it is the former (and I'd say why not, even though "git -c
core.fsmontor=..." may be easy enough), we might want to rename it
to replace _TEST with _PROGRAM or something and document it.

> diff --git a/diff-lib.c b/diff-lib.c
> index 2a52b07954..23c6d03ca9 100644
> --- a/diff-lib.c
> +++ b/diff-lib.c
> @@ -12,6 +12,7 @@
>  #include "refs.h"
>  #include "submodule.h"
>  #include "dir.h"
> +#include "fsmonitor.h"
>  
>  /*
>   * diff-files
> @@ -228,6 +229,7 @@ int run_diff_files(struct rev_info *revs, unsigned int 
> option)
>  
>   if (!changed && !dirty_submodule) {
>   ce_mark_uptodate(ce);
> + mark_fsmonitor_valid(ce);

I notice all calls to mark_fsmonitor_valid(ce) always follow a call
to ce_mark_uptodate(ce).  Should the call to the former be made as
part of the latter instead?  Are there cases where we want to call
ce_mark_uptodate(ce) to mark the ce up-to-date, yet we do not want
to call mark_fsmonitor_valid(ce)?  How does a programmer tell when
s/he wants to call ce_mark_uptodate(ce) if s/he also should call
mark_fsmonitor_valid(ce)?

Together with "when to pass IGNORE_FSMONITOR" question, is there a
summary that guides new programmers to answer these questions in the
new documentation?

> diff --git a/dir.h b/dir.h
> index e3717055d1..fab8fc1561 100644
> --- a/dir.h
> +++ b/dir.h
> @@ -139,6 +139,8 @@ struct untracked_cache {
>   int gitignore_invalidated;
>   int dir_invalidated;
>   int dir_opened;
> + /* fsmonitor invalidation data */
> + unsigned int use_fsmonitor : 1;

This makes it look like we will add a bit more fields in later
patches that are about "invalidation" around fsmonitor, but it
appears that such an addition never happens til the end of the
series.  And use_fsmonitor boolean does not seem to be what the
comment says---it just tells us if fsmonitor is in use in the
operation of the untracked cache, no?

> diff --git a/entry.c b/entry.c
> index cb291aa88b..5e6794f9fc 100644
> --- a/entry.c
> +++ b/entry.c
> @@ -4,6 +4,7 @@
>  #include "streaming.h"
>  #include "submodule.h"
>  #include "progress.h"
> +#include "fsmonitor.h"
>  
>  static void create_directories(const char *path, int path_len,
>  const struct checkout *state)
> @@ -357,6 +358,7 @@ static int write_entry(struct cache_entry *ce,
>   lstat(ce->name, );
>   fill_stat_cache_info(ce, );
>   ce->ce_flags |= CE_UPDATE_IN_BASE;
> + mark_fsmonitor_invalid(state->istate, ce);
>   state->istate->cache_changed |= CE_ENTRY_CHANGED;

Similar to "how does mark_fsmonitor_valid() relate to
ce_mark_uptodate()?" question earlier, mark_fsmonitor_invalid()
seems to often appear in pairs with the update to cache_changed.
Are there cases where we need CE_ENTRY_CHANGED bit added to the
index state yet we do not want to call mark_fsmonitor_invalid()?  I
am wondering if there should be a single helper function that lets
callers say "I changed this ce in this istate this way.  Please
update CE_VALID, CE_UPDATE_IN_BASE and CE_FSMONITOR_VALID
accordingly".

That "changed _this way_" is deliberately vague in my question
above, because it is not yet clear to me when mark-valid and
mark-invalid should and should not be called from the series.

> + /* a fsmonitor process can return '*' to indicate all entries are 
> invalid */
> + if (query_success && query_result.buf[0] != '/') {
> + /* Mark all entries returned by the monitor as dirty */

The comment talks about '*' and code checks if it is not '/'?  The
else clause of this if/else handles the "invalidate all" case, so
shouldn't we be comparing with '*' instead here?

Ah, fsmonitor-watchman section of the doc tells us to write the hook
in a way to return slash, so the comment above the code is stale and
the comparison with '/' is what we want, I guess.


Re: [PATCH v2 10/10] add UNLEAK annotation for reducing leak false positives

2017-09-19 Thread Jeff King
On Wed, Sep 20, 2017 at 10:45:05AM +0900, Junio C Hamano wrote:

> >> > +#else
> >> > +#define UNLEAK(var)
> >> 
> >> I think this should be defined to be something (for example, "do {}
> >> while (0)"), at least so that we have compiler errors when UNLEAK(var)
> >> is used incorrectly (for example, without the semicolon) when
> >> SUPPRESS_ANNOTATED_LEAKS is not defined.
> >
> > Seems reasonable.
> 
> Hmph, I am not so sure about this one.  But I agree that the
> semicolon must go.

I thought we had run into some issues with a compiler or linter
complaining about null statements before. But they _are_ pretty common,
so maybe I'm mis-remembering (or maybe it was something like the if/else
conditional you showed earlier).

At any rate, I think Jonathan's point is that writing:

  UNLEAK(foo)

will silently pass in a normal build, and only much later will somebody
run a leak-checking build and see the compile error.

Of course people adding UNLEAK()s are likely to be compiling leak-check
builds to confirm that they've silenced the warning, so maybe it's not
that important a case to catch.

-Peff


[no subject]

2017-09-19 Thread mrb132
<>


Re: [PATCH v2 10/10] add UNLEAK annotation for reducing leak false positives

2017-09-19 Thread Junio C Hamano
Jeff King  writes:

> On Tue, Sep 19, 2017 at 01:45:52PM -0700, Jonathan Tan wrote:
>
>> The following comments are assuming that we're going to standardize on
>> UNLEAK(var); (with the semicolon).
>
> Yeah, I assumed we would. We don't have to, since this really is sort-of
> magical, but I think the code looks better with it.
>
>> On Fri, 8 Sep 2017 02:38:41 -0400
>> Jeff King  wrote:
>> 
>> > +#ifdef SUPPRESS_ANNOTATED_LEAKS
>> > +extern void unleak_memory(const void *ptr, size_t len);
>> > +#define UNLEAK(var) unleak_memory(&(var), sizeof(var));
>> 
>> I would feel better if the semicolon was omitted. I don't think it
>> matters in this particular case, though.
>
> You end up with a double semi-colon. Some linters might complain.

Yeah, and it makes

if (some condition)
UNLEAK(var);
else
do_something_else_to(var);

a syntax error.

Should have spotted during the review; sorry.

>
>> > +#else
>> > +#define UNLEAK(var)
>> 
>> I think this should be defined to be something (for example, "do {}
>> while (0)"), at least so that we have compiler errors when UNLEAK(var)
>> is used incorrectly (for example, without the semicolon) when
>> SUPPRESS_ANNOTATED_LEAKS is not defined.
>
> Seems reasonable.

Hmph, I am not so sure about this one.  But I agree that the
semicolon must go.


>
> I think both are worth doing, but the patch is already in next so we
> need to do it on top. Do you want to prepare a patch?
>
> -Peff


Re: [PATCH] for_each_string_list_item(): behave correctly for empty list

2017-09-19 Thread Jonathan Nieder
Hi,

Junio C Hamano wrote:
> Kaartic Sivaraam  writes:
>> On Saturday 16 September 2017 09:36 AM, Michael Haggerty wrote:
>>> Jonathan Nieder wrote:

 Does the following alternate fix work?  I think I prefer it because
 it doesn't require introducing a new global. [...]
   #define for_each_string_list_item(item,list) \
 -  for (item = (list)->items; item < (list)->items + (list)->nr; ++item)
 +  for (item = (list)->items; \
 +   (list)->items && item < (list)->items + (list)->nr; \
 +   ++item)
>>>
>>> This is the possibility that I was referring to as "add[ing] overhead to
>>> each iteration of the loop". I'd rather not add an extra test-and-branch
>>> to every iteration of a loop in which `list->items` is *not* NULL, which
>>> your solution appears to do. Or are compilers routinely able to optimize
>>> the check out?
>>
>> It seems at least 'gcc' is able to optimize this out even with a -O1
>> and 'clang' optimizes this out with a -O2. Taking a sneak peek at
>> the 'Makefile' shows that our default is -O2.
>
> But doesn't the versions of gcc and clang currently available do the
> right thing with the current code without this change anyway?  I've
> been operating under the assumption that this is to future-proof the
> code even when the compilers change to use the "NULL+0 is undefined"
> as an excuse to make demons fly out of your nose, so unfortunately I
> do not think it is not so huge a plus to find that the current
> compilers do the right thing to the code with proposed updates.

I think you and Kaartic are talking about different things.  Kaartic
was checking that this wouldn't introduce a performance regression
(thanks!).  You are concerned about whether the C standard and common
practice treat the resulting code as exhibiting undefined behavior.

Fortunately the C standard is pretty clear about this.  The undefined
behavior here is at run time, not compile time.  As you suggested in
an earlier reply, the 'list->items &&' effectively guards the
'list->items + list->nr' to prevent that undefined behavior.

I'll send a patch with a commit message saying so to try to close out
this discussion.

Thanks,
Jonathan


Re: [PATCH] for_each_string_list_item(): behave correctly for empty list

2017-09-19 Thread Junio C Hamano
Kaartic Sivaraam  writes:

> On Saturday 16 September 2017 09:36 AM, Michael Haggerty wrote:
>>> Does the following alternate fix work?  I think I prefer it because
>>> it doesn't require introducing a new global. [...]
>>>   #define for_each_string_list_item(item,list) \
>>> -   for (item = (list)->items; item < (list)->items + (list)->nr; ++item)
>>> +   for (item = (list)->items; \
>>> +(list)->items && item < (list)->items + (list)->nr; \
>>> +++item)
>> This is the possibility that I was referring to as "add[ing] overhead to
>> each iteration of the loop". I'd rather not add an extra test-and-branch
>> to every iteration of a loop in which `list->items` is *not* NULL, which
>> your solution appears to do. Or are compilers routinely able to optimize
>> the check out?
>
> It seems at least 'gcc' is able to optimize this out even with a -O1
> and 'clang' optimizes this out with a -O2. Taking a sneak peek at
> the 'Makefile' shows that our default is -O2.

But doesn't the versions of gcc and clang currently available do the
right thing with the current code without this change anyway?  I've
been operating under the assumption that this is to future-proof the
code even when the compilers change to use the "NULL+0 is undefined"
as an excuse to make demons fly out of your nose, so unfortunately I
do not think it is not so huge a plus to find that the current
compilers do the right thing to the code with proposed updates.



[PATCH v2] describe: teach --match to handle branches and remotes

2017-09-19 Thread Max Kirillov
When `git describe` uses `--match`, it matches only tags, basically
ignoring the `--all` argument even when it is specified.

Fix it by also matching branch name and $remote_name/$remote_branch_name,
for remote-tracking references, with the specified patterns. Update
documentation accordingly and add tests.

Signed-off-by: Max Kirillov 
---
Changed to use skip_prefix(). Calculate path_to_match only once.

Add case of discarding unknown type with exclude
 Documentation/git-describe.txt | 24 ++--
 builtin/describe.c | 29 +
 t/t6120-describe.sh| 27 +++
 3 files changed, 58 insertions(+), 22 deletions(-)

diff --git a/Documentation/git-describe.txt b/Documentation/git-describe.txt
index 26f19d3b07..c924c945ba 100644
--- a/Documentation/git-describe.txt
+++ b/Documentation/git-describe.txt
@@ -87,19 +87,23 @@ OPTIONS
 
 --match ::
Only consider tags matching the given `glob(7)` pattern,
-   excluding the "refs/tags/" prefix.  This can be used to avoid
-   leaking private tags from the repository. If given multiple times, a
-   list of patterns will be accumulated, and tags matching any of the
-   patterns will be considered. Use `--no-match` to clear and reset the
-   list of patterns.
+   excluding the "refs/tags/" prefix. If used with `--all`, it also
+   considers local branches and remote-tracking references matching the
+   pattern, excluding respectively "refs/heads/" and "refs/remotes/"
+   prefix; references of other types are never considered. If given
+   multiple times, a list of patterns will be accumulated, and tags
+   matching any of the patterns will be considered.  Use `--no-match` to
+   clear and reset the list of patterns.
 
 --exclude ::
Do not consider tags matching the given `glob(7)` pattern, excluding
-   the "refs/tags/" prefix. This can be used to narrow the tag space and
-   find only tags matching some meaningful criteria. If given multiple
-   times, a list of patterns will be accumulated and tags matching any
-   of the patterns will be excluded. When combined with --match a tag will
-   be considered when it matches at least one --match pattern and does not
+   the "refs/tags/" prefix. If used with `--all`, it also does not consider
+   local branches and remote-tracking references matching the pattern,
+   excluding respectively "refs/heads/" and "refs/remotes/" prefix;
+   references of other types are never considered. If given multiple times,
+   a list of patterns will be accumulated and tags matching any of the
+   patterns will be excluded. When combined with --match a tag will be
+   considered when it matches at least one --match pattern and does not
match any of the --exclude patterns. Use `--no-exclude` to clear and
reset the list of patterns.
 
diff --git a/builtin/describe.c b/builtin/describe.c
index 42afa1e244..f15b6e531d 100644
--- a/builtin/describe.c
+++ b/builtin/describe.c
@@ -129,13 +129,24 @@ static void add_to_known_names(const char *path,
 
 static int get_name(const char *path, const struct object_id *oid, int flag, 
void *cb_data)
 {
-   int is_tag = starts_with(path, "refs/tags/");
+   int is_tag = 0;
struct object_id peeled;
int is_annotated, prio;
-
-   /* Reject anything outside refs/tags/ unless --all */
-   if (!all && !is_tag)
+   const char *path_to_match = NULL;
+
+   if (skip_prefix(path, "refs/tags/", _to_match)) {
+   is_tag = 1;
+   } else if (all) {
+   if ((exclude_patterns.nr || patterns.nr) &&
+   !skip_prefix(path, "refs/heads/", _to_match) &&
+   !skip_prefix(path, "refs/remotes/", _to_match)) {
+   /* Only accept reference of known type if there are 
match/exclude patterns */
+   return 0;
+   }
+   } else {
+   /* Reject anything outside refs/tags/ unless --all */
return 0;
+   }
 
/*
 * If we're given exclude patterns, first exclude any tag which match
@@ -144,11 +155,8 @@ static int get_name(const char *path, const struct 
object_id *oid, int flag, voi
if (exclude_patterns.nr) {
struct string_list_item *item;
 
-   if (!is_tag)
-   return 0;
-
for_each_string_list_item(item, _patterns) {
-   if (!wildmatch(item->string, path + 10, 0))
+   if (!wildmatch(item->string, path_to_match, 0))
return 0;
}
}
@@ -161,11 +169,8 @@ static int get_name(const char *path, const struct 
object_id *oid, int flag, voi
int found = 0;
struct string_list_item *item;
 
-   if (!is_tag)
-  

Re: [PATCH] describe: teach --match to handle branches and remotes

2017-09-19 Thread Max Kirillov
On Tue, Sep 19, 2017 at 08:52:24AM +0900, Junio C Hamano wrote:
> I think you can use skip_prefix() to avoid counting the length of
> the prefix yourself, i.e.

Thanks, will use it.

> The hardcoded +10 for "is_tag" case assumes that anything other than
> "refs/tags/something" would ever be used to call into this function
> when is_tag is true, and that may well be true in the current code
> and have been so ever since the original code was written, but it
> still smells like an invitation for future bugs.

is_tag is used later. I'll chance it so that it does not
rely on it to match, but it still has to produce it.

> Was there a reason why A and c are in different cases?  Are we
> worried about case insensitive filesystems or something?

The tags have been there of different case already. I don't
know why. I'll change the branch names but I'm reluctant to
touch existing tests.

-- 
Max


RE: [PATCH v7 06/12] ls-files: Add support in ls-files to display the fsmonitor valid bit

2017-09-19 Thread Ben Peart
> -Original Message-
> From: David Turner [mailto:david.tur...@twosigma.com]
> Sent: Tuesday, September 19, 2017 5:27 PM
> To: 'Ben Peart' ; Ben Peart
> 
> Cc: ava...@gmail.com; christian.cou...@gmail.com; git@vger.kernel.org;
> gits...@pobox.com; johannes.schinde...@gmx.de; pclo...@gmail.com;
> p...@peff.net
> Subject: RE: [PATCH v7 06/12] ls-files: Add support in ls-files to display the
> fsmonitor valid bit
> 
> 
> > -Original Message-
> > From: Ben Peart [mailto:peart...@gmail.com]
> > Sent: Tuesday, September 19, 2017 4:45 PM
> > To: David Turner ; 'Ben Peart'
> > 
> > Cc: ava...@gmail.com; christian.cou...@gmail.com; git@vger.kernel.org;
> > gits...@pobox.com; johannes.schinde...@gmx.de; pclo...@gmail.com;
> > p...@peff.net
> > Subject: Re: [PATCH v7 06/12] ls-files: Add support in ls-files to
> > display the fsmonitor valid bit
> >
> >
> >
> > On 9/19/2017 3:46 PM, David Turner wrote:
> > >> -Original Message-
> > >> From: Ben Peart [mailto:benpe...@microsoft.com]
> > >> Sent: Tuesday, September 19, 2017 3:28 PM
> > >> To: benpe...@microsoft.com
> > >> Cc: David Turner ; ava...@gmail.com;
> > >> christian.cou...@gmail.com; git@vger.kernel.org; gits...@pobox.com;
> > >> johannes.schinde...@gmx.de; pclo...@gmail.com; p...@peff.net
> > >> Subject: [PATCH v7 06/12] ls-files: Add support in ls-files to
> > >> display the fsmonitor valid bit
> > >>
> > >> Add a new command line option (-f) to ls-files to have it use
> > >> lowercase letters for 'fsmonitor valid' files
> > >>
> > >> Signed-off-by: Ben Peart 
> > >> ---
> > >>   builtin/ls-files.c | 8 ++--
> > >>   1 file changed, 6 insertions(+), 2 deletions(-)
> > >
> > > This is still missing the corresponding documentation patch.
> >
> > Sorry for the confusion.
> 
> Thanks for following up.
> 
> > > 10/12 (no reply, haven't checked whether same issue but I assume
> > > same issue since the new case I mentioned isn't added)
> >
> > It wasn't a bug so I didn't "fix" it.  I just sent an explanation and
> > patch demonstrating why. You can find it here:
> 
> I was concerned about the case of an untracked file inside a directory that
> contains no tracked files.  Your patch in this mail treats dir3 just like 
> dir1 and
> dir2.  I think you ought to test the case of a dir with no tracked files.
> 

In the case where there is an untracked file inside a directory that contains 
no tracked files, git will (as shown by the "failing" test) actually find the 
untracked file.  This is the correct/expected behavior.  The test failure is 
just indicating that the optimization of not searching that directory for 
untracked files was not able to occur (because there was no entry in the 
untracked cache for that directory).

> After more careful checking, it looks like this case does work, but it's still
> worth testing.



[PATCH for jk/leak-checkers v2] git-compat-util: make UNLEAK less error-prone

2017-09-19 Thread Jonathan Tan
Commit 0e5bba5 ("add UNLEAK annotation for reducing leak false
positives", 2017-09-08) introduced an UNLEAK macro to be used as
"UNLEAK(var);", but its existing definitions leave semicolons that act
as empty statements, which may cause some tools to complain.

Therefore, modify its definitions to not leave these semicolons.

Signed-off-by: Jonathan Tan 
---
OK, here is the same patch with an updated commit message.
---
 git-compat-util.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/git-compat-util.h b/git-compat-util.h
index 003e444c4..9bc15b036 100644
--- a/git-compat-util.h
+++ b/git-compat-util.h
@@ -1184,9 +1184,9 @@ extern int cmd_main(int, const char **);
  */
 #ifdef SUPPRESS_ANNOTATED_LEAKS
 extern void unleak_memory(const void *ptr, size_t len);
-#define UNLEAK(var) unleak_memory(&(var), sizeof(var));
+#define UNLEAK(var) unleak_memory(&(var), sizeof(var))
 #else
-#define UNLEAK(var)
+#define UNLEAK(var) do {} while (0)
 #endif
 
 #endif
-- 
2.14.1.821.g8fa685d3b7-goog



Re: [PATCH for jk/leak-checkers] git-compat-util: make UNLEAK less error-prone

2017-09-19 Thread Jeff King
On Tue, Sep 19, 2017 at 02:34:56PM -0700, Jonathan Tan wrote:

> Commit 0e5bba5 ("add UNLEAK annotation for reducing leak false
> positives", 2017-09-08) introduced an UNLEAK macro to be used as
> "UNLEAK(var);", but its existing definitions make it possible to be
> invoked as "UNLEAK(var)" (without the trailing semicolon) too.
> 
> Therefore, modify its definitions to cause a compile-time error if
> invoked without the trailing semicolon.

The patch itself looks good, but I think your commit message dances
around the real motivation: some parsers may complain about doubled
semicolons, or semicolons without an associated line of code (which is
really just a doubled semicolon, too, I guess).

-Peff


[PATCH for jk/leak-checkers] git-compat-util: make UNLEAK less error-prone

2017-09-19 Thread Jonathan Tan
Commit 0e5bba5 ("add UNLEAK annotation for reducing leak false
positives", 2017-09-08) introduced an UNLEAK macro to be used as
"UNLEAK(var);", but its existing definitions make it possible to be
invoked as "UNLEAK(var)" (without the trailing semicolon) too.

Therefore, modify its definitions to cause a compile-time error if
invoked without the trailing semicolon.

Signed-off-by: Jonathan Tan 
---
Sure, here is the patch.
---
 git-compat-util.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/git-compat-util.h b/git-compat-util.h
index 003e444c4..9bc15b036 100644
--- a/git-compat-util.h
+++ b/git-compat-util.h
@@ -1184,9 +1184,9 @@ extern int cmd_main(int, const char **);
  */
 #ifdef SUPPRESS_ANNOTATED_LEAKS
 extern void unleak_memory(const void *ptr, size_t len);
-#define UNLEAK(var) unleak_memory(&(var), sizeof(var));
+#define UNLEAK(var) unleak_memory(&(var), sizeof(var))
 #else
-#define UNLEAK(var)
+#define UNLEAK(var) do {} while (0)
 #endif
 
 #endif
-- 
2.14.1.821.g8fa685d3b7-goog



RE: [PATCH v7 06/12] ls-files: Add support in ls-files to display the fsmonitor valid bit

2017-09-19 Thread David Turner

> -Original Message-
> From: Ben Peart [mailto:peart...@gmail.com]
> Sent: Tuesday, September 19, 2017 4:45 PM
> To: David Turner ; 'Ben Peart'
> 
> Cc: ava...@gmail.com; christian.cou...@gmail.com; git@vger.kernel.org;
> gits...@pobox.com; johannes.schinde...@gmx.de; pclo...@gmail.com;
> p...@peff.net
> Subject: Re: [PATCH v7 06/12] ls-files: Add support in ls-files to display the
> fsmonitor valid bit
> 
> 
> 
> On 9/19/2017 3:46 PM, David Turner wrote:
> >> -Original Message-
> >> From: Ben Peart [mailto:benpe...@microsoft.com]
> >> Sent: Tuesday, September 19, 2017 3:28 PM
> >> To: benpe...@microsoft.com
> >> Cc: David Turner ; ava...@gmail.com;
> >> christian.cou...@gmail.com; git@vger.kernel.org; gits...@pobox.com;
> >> johannes.schinde...@gmx.de; pclo...@gmail.com; p...@peff.net
> >> Subject: [PATCH v7 06/12] ls-files: Add support in ls-files to
> >> display the fsmonitor valid bit
> >>
> >> Add a new command line option (-f) to ls-files to have it use
> >> lowercase letters for 'fsmonitor valid' files
> >>
> >> Signed-off-by: Ben Peart 
> >> ---
> >>   builtin/ls-files.c | 8 ++--
> >>   1 file changed, 6 insertions(+), 2 deletions(-)
> >
> > This is still missing the corresponding documentation patch.
> 
> Sorry for the confusion.

Thanks for following up.

> > 10/12 (no reply, haven't checked whether same issue but I assume same
> > issue since the new case I mentioned isn't added)
> 
> It wasn't a bug so I didn't "fix" it.  I just sent an explanation and patch
> demonstrating why. You can find it here:

I was concerned about the case of an untracked file inside a directory 
that contains no tracked files.  Your patch in this mail treats dir3 just 
like dir1 and dir2.  I think you ought to test the case of a dir with no 
tracked files.

After more careful checking, it looks like this case does work, but it's 
still worth testing.



Re: [PATCH v2 10/10] add UNLEAK annotation for reducing leak false positives

2017-09-19 Thread Jeff King
On Tue, Sep 19, 2017 at 01:45:52PM -0700, Jonathan Tan wrote:

> The following comments are assuming that we're going to standardize on
> UNLEAK(var); (with the semicolon).

Yeah, I assumed we would. We don't have to, since this really is sort-of
magical, but I think the code looks better with it.

> On Fri, 8 Sep 2017 02:38:41 -0400
> Jeff King  wrote:
> 
> > +#ifdef SUPPRESS_ANNOTATED_LEAKS
> > +extern void unleak_memory(const void *ptr, size_t len);
> > +#define UNLEAK(var) unleak_memory(&(var), sizeof(var));
> 
> I would feel better if the semicolon was omitted. I don't think it
> matters in this particular case, though.

You end up with a double semi-colon. Some linters might complain.

> > +#else
> > +#define UNLEAK(var)
> 
> I think this should be defined to be something (for example, "do {}
> while (0)"), at least so that we have compiler errors when UNLEAK(var)
> is used incorrectly (for example, without the semicolon) when
> SUPPRESS_ANNOTATED_LEAKS is not defined.

Seems reasonable.

I think both are worth doing, but the patch is already in next so we
need to do it on top. Do you want to prepare a patch?

-Peff


Re: [PATCH v2 10/10] add UNLEAK annotation for reducing leak false positives

2017-09-19 Thread Jonathan Tan
Thanks - this does look like a good thing to have. Sorry for the late
comments.

The following comments are assuming that we're going to standardize on
UNLEAK(var); (with the semicolon).

On Fri, 8 Sep 2017 02:38:41 -0400
Jeff King  wrote:

> +#ifdef SUPPRESS_ANNOTATED_LEAKS
> +extern void unleak_memory(const void *ptr, size_t len);
> +#define UNLEAK(var) unleak_memory(&(var), sizeof(var));

I would feel better if the semicolon was omitted. I don't think it
matters in this particular case, though.

> +#else
> +#define UNLEAK(var)

I think this should be defined to be something (for example, "do {}
while (0)"), at least so that we have compiler errors when UNLEAK(var)
is used incorrectly (for example, without the semicolon) when
SUPPRESS_ANNOTATED_LEAKS is not defined.


Re: [PATCH v7 06/12] ls-files: Add support in ls-files to display the fsmonitor valid bit

2017-09-19 Thread Ben Peart



On 9/19/2017 3:46 PM, David Turner wrote:

-Original Message-
From: Ben Peart [mailto:benpe...@microsoft.com]
Sent: Tuesday, September 19, 2017 3:28 PM
To: benpe...@microsoft.com
Cc: David Turner ; ava...@gmail.com;
christian.cou...@gmail.com; git@vger.kernel.org; gits...@pobox.com;
johannes.schinde...@gmx.de; pclo...@gmail.com; p...@peff.net
Subject: [PATCH v7 06/12] ls-files: Add support in ls-files to display the
fsmonitor valid bit

Add a new command line option (-f) to ls-files to have it use lowercase
letters for 'fsmonitor valid' files

Signed-off-by: Ben Peart 
---
  builtin/ls-files.c | 8 ++--
  1 file changed, 6 insertions(+), 2 deletions(-)


This is still missing the corresponding documentation patch.


Sorry for the confusion.

The documentation is all in a patch together as they all have links to 
each other.  You can find it here:


https://public-inbox.org/git/20170919192744.19224-6-benpe...@microsoft.com/T/#u



I can see from replies that at least some of my messages got through.  In 
total, I sent messages about:
04/12 (I see replies)
05/12 (I see replies)
06/12 (no reply, issue not fixed)


The documentation is all in a patch together as they all have links to 
each other.  You can find it here:


https://public-inbox.org/git/20170919192744.19224-6-benpe...@microsoft.com/T/#u


10/12 (no reply, haven't checked whether same issue but I assume same issue 
since the new case I mentioned isn't added)


It wasn't a bug so I didn't "fix" it.  I just sent an explanation and 
patch demonstrating why. You can find it here:


https://public-inbox.org/git/84981984-02c1-f322-a617-57dfe1d87...@gmail.com/T/#u


12/12 (no reply, typo fixed -- no reply required)



Hope this helps.


Re: [PATCH v6 09/12] split-index: disable the fsmonitor extension when running the split index test

2017-09-19 Thread Jonathan Nieder
Hi,

Ben Peart wrote:

> The split index test t1700-split-index.sh has hard coded SHA values for
> the index.  Currently it supports index V4 and V3 but assumes there are
> no index extensions loaded.
>
> When manually forcing the fsmonitor extension to be turned on when
> running the test suite, the SHA values no longer match which causes the
> test to fail.
>
> The potential matrix of index extensions and index versions can is quite
> large so instead disable the extension before attempting to run the test.

Thanks for finding and diagnosing this problem.

This feels to me like the wrong fix.  Wouldn't it be better for the
test not to depend on the precise object ids?  See the "Tips for
Writing Tests" section in t/README:

 And
such drastic changes to the core GIT that even changes these
otherwise supposedly stable object IDs should be accompanied by
an update to t-basic.sh.

However, other tests that simply rely on basic parts of the core
GIT working properly should not have that level of intimate
knowledge of the core GIT internals.  If all the test scripts
hardcoded the object IDs like t-basic.sh does, that defeats
the purpose of t-basic.sh, which is to isolate that level of
validation in one place.  Your test also ends up needing
updating when such a change to the internal happens, so do _not_
do it and leave the low level of validation to t-basic.sh.

Worse, t1700-split-index.sh doesn't explain where the object ids it
uses comes from so it is not even obvious to a casual reader like me
how to fix it.

See t/diff-lib.sh for some examples of one way to avoid depending on
the object id computation.  Another way that is often preferable is to
come up with commands to compute the expected hash values, like
$(git rev-parse HEAD^{tree}), and use those instead of hard-coded
values.

Thanks and hope that helps,
Jonathan


Re: [PATCH v1 1/1] test-lint: echo -e (or -E) is not portable

2017-09-19 Thread Jonathan Nieder
Torsten Bögershausen  wrote:

> Some implementations of `echo` support the '-e' option to enable
> backslash interpretation of the following string.
> As an addition, they support '-E' to turn it off.

nit: please wrap the commit message to a consistent line width.

> However, none of these are portable, POSIX doesn't even mention them,
> and many implementations don't support them.
>
> A check for '-n' is already done in check-non-portable-shell.pl,
> extend it to cover '-n', '-e' or '-E-'
>
> Signed-off-by: Torsten Bögershausen 
> ---
>  t/check-non-portable-shell.pl | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)

An excellent change.  Thanks for noticing and fixing this.

Reviewed-by: Jonathan Nieder 


Re: [PATCH v6 10/12] fsmonitor: add test cases for fsmonitor extension

2017-09-19 Thread Jonathan Nieder
Ben Peart wrote:

> Some stats on these same coding style errors in the current bash scripts:
>
> 298 instances of "[a-z]\(\).*\{" ie "function_name() {" (no space)
> 140 instances of "if \[ .* \]" (not using the preferred "test")
> 293 instances of "if .*; then"
>
> Wouldn't it be great not to have to write up style feedback for when
> these all get copy/pasted into new scripts?

Agreed.  Care to write patches for these? :)  (I think three patches,
one for each issue, would do the trick.)

Thanks,
Jonathan


Re: [PATCH v6 10/12] fsmonitor: add test cases for fsmonitor extension

2017-09-19 Thread Ben Peart



On 9/19/2017 3:32 PM, David Turner wrote:

I think my comment here might have gotten lost, and I don't want it to because 
it's something I'm really worried about:



You have to update the test completely to ensure it passes.  If you run 
the test with the '-v' option you will see the cause of the failure:


t7519-status-fsmonitor.sh: line 27: dir3/untracked: No such file or 
directory


To fix this, you will also need to update the 'setup' test to create the 
directory for the new untracked file to get created into.  Then you will 
need to drop at least one file in it so that the directory is preserved 
through the 'git reset --hard'  Then you have to update the several 'cat 
>expect' blocks to expect the new file.


In addition, the ability to avoid scanning for untracked files relies on 
the untracked cache.  If you don't have another file that git is aware 
of in that directory then there won't be a cache entry and git will do 
the required scan and detect the new untracked file (by design).


Here is a patch to the test that updates it to meet all these 
requirements.  I hope this helps.



diff --git a/t/t7519-status-fsmonitor.sh b/t/t7519-status-fsmonitor.sh
index c6df85af5e..29ae4e284f 100755
--- a/t/t7519-status-fsmonitor.sh
+++ b/t/t7519-status-fsmonitor.sh
@@ -24,12 +24,14 @@ dirty_repo () {
: >untracked &&
: >dir1/untracked &&
: >dir2/untracked &&
+   : >dir3/untracked &&
echo 1 >modified &&
echo 2 >dir1/modified &&
echo 3 >dir2/modified &&
echo 4 >new &&
echo 5 >dir1/new &&
echo 6 >dir2/new
+   echo 7 >dir3/new
 }

 write_integration_script () {
@@ -47,12 +49,14 @@ write_integration_script () {
printf "untracked\0"
printf "dir1/untracked\0"
printf "dir2/untracked\0"
+   printf "dir3/untracked\0"
printf "modified\0"
printf "dir1/modified\0"
printf "dir2/modified\0"
printf "new\0"
printf "dir1/new\0"
printf "dir2/new\0"
+   printf "dir3/new\0"
EOF
 }

@@ -71,6 +75,8 @@ test_expect_success 'setup' '
mkdir dir2 &&
: >dir2/tracked &&
: >dir2/modified &&
+   mkdir dir3 &&
+   : >dir3/tracked &&
git -c core.fsmonitor= add . &&
git -c core.fsmonitor= commit -m initial &&
git config core.fsmonitor .git/hooks/fsmonitor-test &&
@@ -107,6 +113,7 @@ h dir1/modified
 H dir1/tracked
 h dir2/modified
 H dir2/tracked
+H dir3/tracked
 h modified
 H tracked
 EOF
@@ -126,6 +133,7 @@ H dir1/modified
 H dir1/tracked
 H dir2/modified
 H dir2/tracked
+H dir3/tracked
 H modified
 H tracked
 EOF
@@ -144,6 +152,7 @@ H dir1/modified
 H dir1/tracked
 H dir2/modified
 H dir2/tracked
+H dir3/tracked
 H modified
 H tracked
 EOF
@@ -164,6 +173,8 @@ H dir1/tracked
 H dir2/modified
 h dir2/new
 H dir2/tracked
+h dir3/new
+H dir3/tracked
 H modified
 h new
 H tracked
@@ -174,6 +185,7 @@ test_expect_success 'newly added files are marked 
valid' '

git add new &&
git add dir1/new &&
git add dir2/new &&
+   git add dir3/new &&
git ls-files -f >actual &&
test_cmp expect actual
 '
@@ -185,6 +197,8 @@ h dir1/tracked
 H dir2/modified
 h dir2/new
 h dir2/tracked
+h dir3/new
+h dir3/tracked
 H modified
 h new
 h tracked
@@ -203,6 +217,7 @@ H dir1/modified
 h dir1/tracked
 h dir2/modified
 h dir2/tracked
+h dir3/tracked
 h modified
 h tracked
 EOF
@@ -269,6 +284,7 @@ do
git add new &&
git add dir1/new &&
git add dir2/new &&
+   git add dir3/new &&
git status >actual &&
git -c core.fsmonitor= status >expect &&
test_i18ncmp expect actual




-Original Message-
From: David Turner
Sent: Friday, September 15, 2017 6:00 PM
To: 'Ben Peart' 
Cc: ava...@gmail.com; christian.cou...@gmail.com; git@vger.kernel.org;
gits...@pobox.com; johannes.schinde...@gmx.de; pclo...@gmail.com;
p...@peff.net
Subject: RE: [PATCH v6 10/12] fsmonitor: add test cases for fsmonitor
extension


-Original Message-
+dirty_repo () {
+   : >untracked &&
+   : >dir1/untracked &&
+   : >dir2/untracked &&
+   echo 1 >modified &&
+   echo 2 >dir1/modified &&
+   echo 3 >dir2/modified &&
+   echo 4 >new &&
+   echo 5 >dir1/new &&
+   echo 6 >dir2/new


If I add an untracked file named dir3/untracked to dirty_repo  (and
write_integration_script), then "status doesn't detect unreported
modifications", below, fails.  Did I do something wrong, or does this turn up a
bug?





Re: [PATCH] doc: camelCase the config variables to improve readability

2017-09-19 Thread Jonathan Nieder
Hi,

Kaartic Sivaraam wrote:

> The config variable used weren't readable as they were in the
> crude form of how git stores/uses it's config variables.

nit: Git's commit messages describe the current behavior of Git in the
present tense.  Something like:

These manpages' references to config variables are hard to read because
they use all-lowercase names, without indicating where each word ends
and begins in the configuration variable names.

Improve readability by using camelCase instead.  Git treats these names
case-insensitively so this does not affect functionality.

> Improve it's readability by replacing them with camelCased versions
> of config variables as it doesn't have any impact on it's usage.

nit: s/it's/its/ (in both places)

> Signed-off-by: Kaartic Sivaraam 
> ---
>  Documentation/git-branch.txt | 4 ++--
>  Documentation/git-tag.txt| 2 +-
>  2 files changed, 3 insertions(+), 3 deletions(-)

This also is just more consistent with the rest of the docs.

FWIW, with or without the commit message tweaks mentioned above,
Reviewed-by: Jonathan Nieder 

Thanks for your attention to detail.


Re: [PATCH v2 00/21] Read `packed-refs` using mmap()

2017-09-19 Thread Johannes Schindelin
Hi Michael,

On Tue, 19 Sep 2017, Michael Haggerty wrote:

> This is v2 of a patch series that changes the reading and caching of the
> `packed-refs` file to use `mmap()`. Thanks to Junio, Stefan, and
> Johannes for their comments about v1 [1].

Thank you for the new iteration.

> The main change since v1 is to accommodate Windows, which doesn't let
> you replace a file using `rename()` if the file is currently mmapped.
> This is unfortunate, because it means that Windows will never get the
> O(N) → O(lg N) improvement for reading single references that more
> capable systems can now enjoy.

Triggered by your enquiry, I looked into passing the FILE_SHARE_DELETE
flag which I hoped would let us delete the file even if it still is open
(and mapped). In my tests, this did not work. If anybody wants to have a
look at what I did (and whether they can make it work):
https://github.com/dscho/git/tree/replace-wopen

> The background was discussed on the mailing list [2]. The bottom line
> is that on Windows, keeping the `packed-refs` lock mmapped would be
> tantamount to holding reader lock on that file, preventing anybody
> (even unrelated processes) from changing the `packed-refs` file while
> it is mmapped. This is even worse than the situation for packfiles
> (which is solved using `close_all_packs()`), because a packfile, once
> created, never needs to be replaced—every packfile has a filename that
> is determined from its contents. The worst that can happen if a
> packfile is locked is that another process cannot remove it, but that
> is not critical for correctness. The `packed-refs` file, on the other
> hand, always has the same filename and needs to be overwritten for
> correctness.
> 
> So the approach taken here is that a new compile-time option,
> `MMAP_PREVENTS_DELETE`, is introduced. When this option is set, then
> the `packed-refs` file is read quickly into memory then closed.

Another approach would be to imitate close_all_packs() and rely on the
Windows-specific code that retries renames in a staggered fashion, waiting
a little longer and longer before retrying, and finally telling the user
that some file cannot be overwritten:
https://github.com/git-for-windows/git/blob/v2.14.1.windows.1/compat/mingw.c#L2439-L2441

This is not a new problem, by the way. If a file is in use while you try
to run `git checkout` with a different version of that file, we have the
exact same problem on Windows. And we deal with it using that
retry_ask_yes_no() function.

For this to work, the current process really would need to be able to
release all snapshots in one go (for simplicity, I would not even check
the filename but simply blow them all away when we want to overwrite
packed-refs).

I guess I should set aside some time to implement that on top of your
series (I *really* want our in-house users to benefit from that O(lg n)
improvement). In the meantime, I think this can go forward with the
current design.

Ciao,
Dscho

RE: [PATCH v7 06/12] ls-files: Add support in ls-files to display the fsmonitor valid bit

2017-09-19 Thread David Turner
> -Original Message-
> From: Ben Peart [mailto:benpe...@microsoft.com]
> Sent: Tuesday, September 19, 2017 3:28 PM
> To: benpe...@microsoft.com
> Cc: David Turner ; ava...@gmail.com;
> christian.cou...@gmail.com; git@vger.kernel.org; gits...@pobox.com;
> johannes.schinde...@gmx.de; pclo...@gmail.com; p...@peff.net
> Subject: [PATCH v7 06/12] ls-files: Add support in ls-files to display the
> fsmonitor valid bit
> 
> Add a new command line option (-f) to ls-files to have it use lowercase
> letters for 'fsmonitor valid' files
> 
> Signed-off-by: Ben Peart 
> ---
>  builtin/ls-files.c | 8 ++--
>  1 file changed, 6 insertions(+), 2 deletions(-)

This is still missing the corresponding documentation patch.  

I can see from replies that at least some of my messages got through.  In 
total, I sent messages about:
04/12 (I see replies)
05/12 (I see replies)
06/12 (no reply, issue not fixed)
10/12 (no reply, haven't checked whether same issue but I assume same issue 
since the new case I mentioned isn't added)
12/12 (no reply, typo fixed -- no reply required)



[PATCH v2 2/2] read_info_alternates: warn on non-trivial errors

2017-09-19 Thread Jeff King
When we fail to open $GIT_DIR/info/alternates, we silently
assume there are no alternates. This is the right thing to
do for ENOENT, but not for other errors.

A hard error is probably overkill here. If we fail to read
an alternates file then either we'll complete our operation
anyway, or we'll fail to find some needed object. Either
way, a warning is good idea. And we already have a helper
function to handle this pattern; let's just call
warn_on_fopen_error().

Note that technically the errno from strbuf_read_file()
might be from a read() error, not open(). But since read()
would never return ENOENT or ENOTDIR, and since it produces
a generic "unable to access" error, it's suitable for
handling errors from either.

Signed-off-by: Jeff King 
---
 sha1_file.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/sha1_file.c b/sha1_file.c
index b682b7ec06..1477ea7b50 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -432,6 +432,7 @@ static void read_info_alternates(const char * 
relative_base, int depth)
 
path = xstrfmt("%s/info/alternates", relative_base);
if (strbuf_read_file(, path, 1024) < 0) {
+   warn_on_fopen_errors(path);
free(path);
return;
}
-- 
2.14.1.1014.g252e627ae0


[PATCH v2 1/2] read_info_alternates: read contents into strbuf

2017-09-19 Thread Jeff King
This patch fixes a regression in v2.11.1 where we might read
past the end of an mmap'd buffer. It was introduced in
cf3c635210.

The link_alt_odb_entries() function has always taken a
ptr/len pair as input. Until cf3c635210 (alternates: accept
double-quoted paths, 2016-12-12), we made a copy of those
bytes in a string. But after that commit, we switched to
parsing the input left-to-right, and we ignore "len"
totally, instead reading until we hit a NUL.

This has mostly gone unnoticed for a few reasons:

  1. All but one caller passes a NUL-terminated string, with
 "len" pointing to the NUL.

  2. The remaining caller, read_info_alternates(), passes in
 an mmap'd file. Unless the file is an exact multiple of
 the page size, it will generally be followed by NUL
 padding to the end of the page, which just works.

The easiest way to demonstrate the problem is to build with:

  make SANITIZE=address NO_MMAP=Nope test

Any test which involves $GIT_DIR/info/alternates will fail,
as the mmap emulation (correctly) does not add an extra NUL,
and ASAN complains about reading past the end of the buffer.

One solution would be to teach link_alt_odb_entries() to
respect "len". But it's actually a bit tricky, since we
depend on unquote_c_style() under the hood, and it has no
ptr/len variant.

We could also just make a NUL-terminated copy of the input
bytes and operate on that. But since all but one caller
already is passing a string, instead let's just fix that
caller to provide NUL-terminated input in the first place,
by swapping out mmap for strbuf_read_file().

There's no advantage to using mmap on the alternates file.
It's not expected to be large (and anyway, we're copying its
contents into an in-memory linked list). Nor is using
git_open() buying us anything here, since we don't keep the
descriptor open for a long period of time.

Let's also drop the "len" parameter entirely from
link_alt_odb_entries(), since it's completely ignored. That
will avoid any new callers re-introducing a similar bug.

Reported-by: Michael Haggerty 
Signed-off-by: Jeff King 
---
 sha1_file.c | 30 ++
 1 file changed, 10 insertions(+), 20 deletions(-)

diff --git a/sha1_file.c b/sha1_file.c
index b4a67bb838..b682b7ec06 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -398,7 +398,7 @@ static const char *parse_alt_odb_entry(const char *string,
return end;
 }
 
-static void link_alt_odb_entries(const char *alt, int len, int sep,
+static void link_alt_odb_entries(const char *alt, int sep,
 const char *relative_base, int depth)
 {
struct strbuf objdirbuf = STRBUF_INIT;
@@ -427,28 +427,18 @@ static void link_alt_odb_entries(const char *alt, int 
len, int sep,
 
 static void read_info_alternates(const char * relative_base, int depth)
 {
-   char *map;
-   size_t mapsz;
-   struct stat st;
char *path;
-   int fd;
+   struct strbuf buf = STRBUF_INIT;
 
path = xstrfmt("%s/info/alternates", relative_base);
-   fd = git_open(path);
-   free(path);
-   if (fd < 0)
-   return;
-   if (fstat(fd, ) || (st.st_size == 0)) {
-   close(fd);
+   if (strbuf_read_file(, path, 1024) < 0) {
+   free(path);
return;
}
-   mapsz = xsize_t(st.st_size);
-   map = xmmap(NULL, mapsz, PROT_READ, MAP_PRIVATE, fd, 0);
-   close(fd);
-
-   link_alt_odb_entries(map, mapsz, '\n', relative_base, depth);
 
-   munmap(map, mapsz);
+   link_alt_odb_entries(buf.buf, '\n', relative_base, depth);
+   strbuf_release();
+   free(path);
 }
 
 struct alternate_object_database *alloc_alt_odb(const char *dir)
@@ -503,7 +493,7 @@ void add_to_alternates_file(const char *reference)
if (commit_lock_file(lock))
die_errno("unable to move new alternates file into 
place");
if (alt_odb_tail)
-   link_alt_odb_entries(reference, strlen(reference), 
'\n', NULL, 0);
+   link_alt_odb_entries(reference, '\n', NULL, 0);
}
free(alts);
 }
@@ -516,7 +506,7 @@ void add_to_alternates_memory(const char *reference)
 */
prepare_alt_odb();
 
-   link_alt_odb_entries(reference, strlen(reference), '\n', NULL, 0);
+   link_alt_odb_entries(reference, '\n', NULL, 0);
 }
 
 /*
@@ -619,7 +609,7 @@ void prepare_alt_odb(void)
if (!alt) alt = "";
 
alt_odb_tail = _odb_list;
-   link_alt_odb_entries(alt, strlen(alt), PATH_SEP, NULL, 0);
+   link_alt_odb_entries(alt, PATH_SEP, NULL, 0);
 
read_info_alternates(get_object_directory(), 0);
 }
-- 
2.14.1.1014.g252e627ae0



[PATCH v2 0/2] fix read past end of array in alternates files

2017-09-19 Thread Jeff King
On Mon, Sep 18, 2017 at 11:51:00AM -0400, Jeff King wrote:

> This series fixes a regression in v2.11.1 where we might read past the
> end of an mmap'd buffer. It was introduced in cf3c635210, but I didn't
> base the patch on there, for a few reasons:

Here's a v2 that fixes a minor leak in the first patch (I carefully
remembered to free() the path buffer on the error path, but forgot to do
it in the normal one. Oops).

I also tried to address Jonathan's "should this be in the commit
message" comment. The information above _is_ in there, but maybe putting
it at the top as a sort of tl;dr makes it easier to find?

The second patch is unchanged.

Junio, I see you ended up back-porting it to v2.11. Would you prefer me
to have done it that way in the first place? I was trying to reduce your
work by basing it on "maint" (figuring that we wouldn't bother making a
v2.11.x release anyway, and that skips you having to apply the second
patch separately after the merge).

  [1/2]: read_info_alternates: read contents into strbuf
  [2/2]: read_info_alternates: warn on non-trivial errors

 sha1_file.c | 31 +++
 1 file changed, 11 insertions(+), 20 deletions(-)

-Peff


RE: [PATCH v6 10/12] fsmonitor: add test cases for fsmonitor extension

2017-09-19 Thread David Turner
I think my comment here might have gotten lost, and I don't want it to because 
it's something I'm really worried about:

> -Original Message-
> From: David Turner
> Sent: Friday, September 15, 2017 6:00 PM
> To: 'Ben Peart' 
> Cc: ava...@gmail.com; christian.cou...@gmail.com; git@vger.kernel.org;
> gits...@pobox.com; johannes.schinde...@gmx.de; pclo...@gmail.com;
> p...@peff.net
> Subject: RE: [PATCH v6 10/12] fsmonitor: add test cases for fsmonitor
> extension
> 
> > -Original Message-
> > +dirty_repo () {
> > +   : >untracked &&
> > +   : >dir1/untracked &&
> > +   : >dir2/untracked &&
> > +   echo 1 >modified &&
> > +   echo 2 >dir1/modified &&
> > +   echo 3 >dir2/modified &&
> > +   echo 4 >new &&
> > +   echo 5 >dir1/new &&
> > +   echo 6 >dir2/new
> 
> If I add an untracked file named dir3/untracked to dirty_repo  (and
> write_integration_script), then "status doesn't detect unreported
> modifications", below, fails.  Did I do something wrong, or does this turn up 
> a
> bug?




[PATCH v7 08/12] fsmonitor: add a test tool to dump the index extension

2017-09-19 Thread Ben Peart
Add a test utility (test-dump-fsmonitor) that will dump the fsmonitor
index extension.

Signed-off-by: Ben Peart 
---
 Makefile   |  1 +
 t/helper/test-dump-fsmonitor.c | 21 +
 2 files changed, 22 insertions(+)
 create mode 100644 t/helper/test-dump-fsmonitor.c

diff --git a/Makefile b/Makefile
index 9d6ec9c1e9..d970cd00e9 100644
--- a/Makefile
+++ b/Makefile
@@ -639,6 +639,7 @@ TEST_PROGRAMS_NEED_X += test-config
 TEST_PROGRAMS_NEED_X += test-date
 TEST_PROGRAMS_NEED_X += test-delta
 TEST_PROGRAMS_NEED_X += test-dump-cache-tree
+TEST_PROGRAMS_NEED_X += test-dump-fsmonitor
 TEST_PROGRAMS_NEED_X += test-dump-split-index
 TEST_PROGRAMS_NEED_X += test-dump-untracked-cache
 TEST_PROGRAMS_NEED_X += test-fake-ssh
diff --git a/t/helper/test-dump-fsmonitor.c b/t/helper/test-dump-fsmonitor.c
new file mode 100644
index 00..ad452707e8
--- /dev/null
+++ b/t/helper/test-dump-fsmonitor.c
@@ -0,0 +1,21 @@
+#include "cache.h"
+
+int cmd_main(int ac, const char **av)
+{
+   struct index_state *istate = _index;
+   int i;
+
+   setup_git_directory();
+   if (do_read_index(istate, get_index_file(), 0) < 0)
+   die("unable to read index file");
+   if (!istate->fsmonitor_last_update) {
+   printf("no fsmonitor\n");
+   return 0;
+   }
+   printf("fsmonitor last update %"PRIuMAX"\n", 
(uintmax_t)istate->fsmonitor_last_update);
+
+   for (i = 0; i < istate->cache_nr; i++)
+   printf((istate->cache[i]->ce_flags & CE_FSMONITOR_VALID) ? "+" 
: "-");
+
+   return 0;
+}
-- 
2.14.1.windows.1



[PATCH v7 04/12] fsmonitor: teach git to optionally utilize a file system monitor to speed up detecting new or changed files.

2017-09-19 Thread Ben Peart
When the index is read from disk, the fsmonitor index extension is used
to flag the last known potentially dirty index entries. The registered
core.fsmonitor command is called with the time the index was last
updated and returns the list of files changed since that time. This list
is used to flag any additional dirty cache entries and untracked cache
directories.

We can then use this valid state to speed up preload_index(),
ie_match_stat(), and refresh_cache_ent() as they do not need to lstat()
files to detect potential changes for those entries marked
CE_FSMONITOR_VALID.

In addition, if the untracked cache is turned on valid_cached_dir() can
skip checking directories for new or changed files as fsmonitor will
invalidate the cache only for those directories that have been
identified as having potential changes.

To keep the CE_FSMONITOR_VALID state accurate during git operations;
when git updates a cache entry to match the current state on disk,
it will now set the CE_FSMONITOR_VALID bit.

Inversely, anytime git changes a cache entry, the CE_FSMONITOR_VALID bit
is cleared and the corresponding untracked cache directory is marked
invalid.

Signed-off-by: Ben Peart 
---
 Makefile   |   1 +
 apply.c|   2 +-
 builtin/update-index.c |   2 +
 cache.h|  10 +-
 config.c   |  14 +++
 config.h   |   1 +
 diff-lib.c |   2 +
 dir.c  |  27 --
 dir.h  |   2 +
 entry.c|   4 +-
 environment.c  |   1 +
 fsmonitor.c| 253 +
 fsmonitor.h|  61 
 preload-index.c|   6 +-
 read-cache.c   |  49 --
 submodule.c|   2 +-
 unpack-trees.c |   8 +-
 17 files changed, 419 insertions(+), 26 deletions(-)
 create mode 100644 fsmonitor.c
 create mode 100644 fsmonitor.h

diff --git a/Makefile b/Makefile
index f2bb7f2f63..9d6ec9c1e9 100644
--- a/Makefile
+++ b/Makefile
@@ -786,6 +786,7 @@ LIB_OBJS += ewah/ewah_rlw.o
 LIB_OBJS += exec_cmd.o
 LIB_OBJS += fetch-pack.o
 LIB_OBJS += fsck.o
+LIB_OBJS += fsmonitor.o
 LIB_OBJS += gettext.o
 LIB_OBJS += gpg-interface.o
 LIB_OBJS += graph.o
diff --git a/apply.c b/apply.c
index 71cbbd141c..9061cc5f15 100644
--- a/apply.c
+++ b/apply.c
@@ -3399,7 +3399,7 @@ static int verify_index_match(const struct cache_entry 
*ce, struct stat *st)
return -1;
return 0;
}
-   return ce_match_stat(ce, st, 
CE_MATCH_IGNORE_VALID|CE_MATCH_IGNORE_SKIP_WORKTREE);
+   return ce_match_stat(ce, st, 
CE_MATCH_IGNORE_VALID|CE_MATCH_IGNORE_SKIP_WORKTREE|CE_MATCH_IGNORE_FSMONITOR);
 }
 
 #define SUBMODULE_PATCH_WITHOUT_INDEX 1
diff --git a/builtin/update-index.c b/builtin/update-index.c
index e1ca0759d5..6f39ee9274 100644
--- a/builtin/update-index.c
+++ b/builtin/update-index.c
@@ -16,6 +16,7 @@
 #include "pathspec.h"
 #include "dir.h"
 #include "split-index.h"
+#include "fsmonitor.h"
 
 /*
  * Default to not allowing changes to the list of files. The
@@ -233,6 +234,7 @@ static int mark_ce_flags(const char *path, int flag, int 
mark)
else
active_cache[pos]->ce_flags &= ~flag;
active_cache[pos]->ce_flags |= CE_UPDATE_IN_BASE;
+   mark_fsmonitor_invalid(_index, active_cache[pos]);
cache_tree_invalidate_path(_index, path);
active_cache_changed |= CE_ENTRY_CHANGED;
return 0;
diff --git a/cache.h b/cache.h
index a916bc79e3..eccab968bd 100644
--- a/cache.h
+++ b/cache.h
@@ -203,6 +203,7 @@ struct cache_entry {
 #define CE_ADDED (1 << 19)
 
 #define CE_HASHED(1 << 20)
+#define CE_FSMONITOR_VALID   (1 << 21)
 #define CE_WT_REMOVE (1 << 22) /* remove in work directory */
 #define CE_CONFLICTED(1 << 23)
 
@@ -326,6 +327,7 @@ static inline unsigned int canon_mode(unsigned int mode)
 #define CACHE_TREE_CHANGED (1 << 5)
 #define SPLIT_INDEX_ORDERED(1 << 6)
 #define UNTRACKED_CHANGED  (1 << 7)
+#define FSMONITOR_CHANGED  (1 << 8)
 
 struct split_index;
 struct untracked_cache;
@@ -344,6 +346,7 @@ struct index_state {
struct hashmap dir_hash;
unsigned char sha1[20];
struct untracked_cache *untracked;
+   uint64_t fsmonitor_last_update;
 };
 
 extern struct index_state the_index;
@@ -679,8 +682,10 @@ extern void *read_blob_data_from_index(const struct 
index_state *, const char *,
 #define CE_MATCH_IGNORE_MISSING0x08
 /* enable stat refresh */
 #define CE_MATCH_REFRESH   0x10
-extern int ie_match_stat(const struct index_state *, const struct cache_entry 
*, struct stat *, unsigned int);
-extern int ie_modified(const struct index_state *, const struct cache_entry *, 
struct stat *, unsigned int);
+/* do stat comparison even if CE_FSMONITOR_VALID is true */
+#define 

[PATCH v7 06/12] ls-files: Add support in ls-files to display the fsmonitor valid bit

2017-09-19 Thread Ben Peart
Add a new command line option (-f) to ls-files to have it use lowercase
letters for 'fsmonitor valid' files

Signed-off-by: Ben Peart 
---
 builtin/ls-files.c | 8 ++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/builtin/ls-files.c b/builtin/ls-files.c
index e1339e6d17..313962a0c1 100644
--- a/builtin/ls-files.c
+++ b/builtin/ls-files.c
@@ -31,6 +31,7 @@ static int show_resolve_undo;
 static int show_modified;
 static int show_killed;
 static int show_valid_bit;
+static int show_fsmonitor_bit;
 static int line_terminator = '\n';
 static int debug_mode;
 static int show_eol;
@@ -86,7 +87,8 @@ static const char *get_tag(const struct cache_entry *ce, 
const char *tag)
 {
static char alttag[4];
 
-   if (tag && *tag && show_valid_bit && (ce->ce_flags & CE_VALID)) {
+   if (tag && *tag && ((show_valid_bit && (ce->ce_flags & CE_VALID)) ||
+   (show_fsmonitor_bit && (ce->ce_flags & CE_FSMONITOR_VALID {
memcpy(alttag, tag, 3);
 
if (isalpha(tag[0])) {
@@ -515,6 +517,8 @@ int cmd_ls_files(int argc, const char **argv, const char 
*cmd_prefix)
N_("identify the file status with tags")),
OPT_BOOL('v', NULL, _valid_bit,
N_("use lowercase letters for 'assume unchanged' 
files")),
+   OPT_BOOL('f', NULL, _fsmonitor_bit,
+   N_("use lowercase letters for 'fsmonitor clean' 
files")),
OPT_BOOL('c', "cached", _cached,
N_("show cached files in the output (default)")),
OPT_BOOL('d', "deleted", _deleted,
@@ -584,7 +588,7 @@ int cmd_ls_files(int argc, const char **argv, const char 
*cmd_prefix)
for (i = 0; i < exclude_list.nr; i++) {
add_exclude(exclude_list.items[i].string, "", 0, el, 
--exclude_args);
}
-   if (show_tag || show_valid_bit) {
+   if (show_tag || show_valid_bit || show_fsmonitor_bit) {
tag_cached = "H ";
tag_unmerged = "M ";
tag_removed = "R ";
-- 
2.14.1.windows.1



[PATCH v7 10/12] fsmonitor: add test cases for fsmonitor extension

2017-09-19 Thread Ben Peart
Test the ability to add/remove the fsmonitor index extension via
update-index.

Test that dirty files returned from the integration script are properly
represented in the index extension and verify that ls-files correctly
reports their state.

Test that ensure status results are correct when using the new fsmonitor
extension.  Test untracked, modified, and new files by ensuring the
results are identical to when not using the extension.

Test that if the fsmonitor extension doesn't tell git about a change, it
doesn't discover it on its own.  This ensures git is honoring the
extension and that we get the performance benefits desired.

Three test integration scripts are provided:

fsmonitor-all - marks all files as dirty
fsmonitor-none - marks no files as dirty
fsmonitor-watchman - integrates with Watchman with debug logging

To run tests in the test suite while utilizing fsmonitor:

First copy t/t7519/fsmonitor-all to a location in your path and then set
GIT_FORCE_PRELOAD_TEST=true and GIT_FSMONITOR_TEST=fsmonitor-all and run
your tests.

Note: currently when using the test script fsmonitor-watchman on
Windows, many tests fail due to a reported but not yet fixed bug in
Watchman where it holds on to handles for directories and files which
prevents the test directory from being cleaned up properly.

Signed-off-by: Ben Peart 
---
 t/t7519-status-fsmonitor.sh | 304 
 t/t7519/fsmonitor-all   |  24 
 t/t7519/fsmonitor-none  |  22 
 t/t7519/fsmonitor-watchman  | 140 
 4 files changed, 490 insertions(+)
 create mode 100755 t/t7519-status-fsmonitor.sh
 create mode 100755 t/t7519/fsmonitor-all
 create mode 100755 t/t7519/fsmonitor-none
 create mode 100755 t/t7519/fsmonitor-watchman

diff --git a/t/t7519-status-fsmonitor.sh b/t/t7519-status-fsmonitor.sh
new file mode 100755
index 00..c6df85af5e
--- /dev/null
+++ b/t/t7519-status-fsmonitor.sh
@@ -0,0 +1,304 @@
+#!/bin/sh
+
+test_description='git status with file system watcher'
+
+. ./test-lib.sh
+
+#
+# To run the entire git test suite using fsmonitor:
+#
+# copy t/t7519/fsmonitor-all to a location in your path and then set
+# GIT_FSMONITOR_TEST=fsmonitor-all and run your tests.
+#
+
+# Note, after "git reset --hard HEAD" no extensions exist other than 'TREE'
+# "git update-index --fsmonitor" can be used to get the extension written
+# before testing the results.
+
+clean_repo () {
+   git reset --hard HEAD &&
+   git clean -fd
+}
+
+dirty_repo () {
+   : >untracked &&
+   : >dir1/untracked &&
+   : >dir2/untracked &&
+   echo 1 >modified &&
+   echo 2 >dir1/modified &&
+   echo 3 >dir2/modified &&
+   echo 4 >new &&
+   echo 5 >dir1/new &&
+   echo 6 >dir2/new
+}
+
+write_integration_script () {
+   write_script .git/hooks/fsmonitor-test<<-\EOF
+   if test "$#" -ne 2
+   then
+   echo "$0: exactly 2 arguments expected"
+   exit 2
+   fi
+   if test "$1" != 1
+   then
+   echo "Unsupported core.fsmonitor hook version." >&2
+   exit 1
+   fi
+   printf "untracked\0"
+   printf "dir1/untracked\0"
+   printf "dir2/untracked\0"
+   printf "modified\0"
+   printf "dir1/modified\0"
+   printf "dir2/modified\0"
+   printf "new\0"
+   printf "dir1/new\0"
+   printf "dir2/new\0"
+   EOF
+}
+
+test_lazy_prereq UNTRACKED_CACHE '
+   { git update-index --test-untracked-cache; ret=$?; } &&
+   test $ret -ne 1
+'
+
+test_expect_success 'setup' '
+   mkdir -p .git/hooks &&
+   : >tracked &&
+   : >modified &&
+   mkdir dir1 &&
+   : >dir1/tracked &&
+   : >dir1/modified &&
+   mkdir dir2 &&
+   : >dir2/tracked &&
+   : >dir2/modified &&
+   git -c core.fsmonitor= add . &&
+   git -c core.fsmonitor= commit -m initial &&
+   git config core.fsmonitor .git/hooks/fsmonitor-test &&
+   cat >.gitignore <<-\EOF
+   .gitignore
+   expect*
+   actual*
+   marker*
+   EOF
+'
+
+# test that the fsmonitor extension is off by default
+test_expect_success 'fsmonitor extension is off by default' '
+   test-dump-fsmonitor >actual &&
+   grep "^no fsmonitor" actual
+'
+
+# test that "update-index --fsmonitor" adds the fsmonitor extension
+test_expect_success 'update-index --fsmonitor" adds the fsmonitor extension' '
+   git update-index --fsmonitor &&
+   test-dump-fsmonitor >actual &&
+   grep "^fsmonitor last update" actual
+'
+
+# test that "update-index --no-fsmonitor" removes the fsmonitor extension
+test_expect_success 'update-index --no-fsmonitor" removes the fsmonitor 
extension' '
+   git update-index --no-fsmonitor &&
+   test-dump-fsmonitor >actual &&
+   grep "^no fsmonitor" actual
+'
+
+cat >expect expect 

[PATCH v7 09/12] split-index: disable the fsmonitor extension when running the split index test

2017-09-19 Thread Ben Peart
The split index test t1700-split-index.sh has hard coded SHA values for
the index.  Currently it supports index V4 and V3 but assumes there are
no index extensions loaded.

When manually forcing the fsmonitor extension to be turned on when
running the test suite, the SHA values no longer match which causes the
test to fail.

The potential matrix of index extensions and index versions can is quite
large so instead disable the extension before attempting to run the test.

Signed-off-by: Ben Peart 
---
 t/t1700-split-index.sh | 1 +
 1 file changed, 1 insertion(+)

diff --git a/t/t1700-split-index.sh b/t/t1700-split-index.sh
index 22f69a410b..af9b847761 100755
--- a/t/t1700-split-index.sh
+++ b/t/t1700-split-index.sh
@@ -6,6 +6,7 @@ test_description='split index mode tests'
 
 # We need total control of index splitting here
 sane_unset GIT_TEST_SPLIT_INDEX
+sane_unset GIT_FSMONITOR_TEST
 
 test_expect_success 'enable split index' '
git config splitIndex.maxPercentChange 100 &&
-- 
2.14.1.windows.1



[PATCH v7 01/12] bswap: add 64 bit endianness helper get_be64

2017-09-19 Thread Ben Peart
Add a new get_be64 macro to enable 64 bit endian conversions on memory
that may or may not be aligned.

Signed-off-by: Ben Peart 
---
 compat/bswap.h | 22 ++
 1 file changed, 22 insertions(+)

diff --git a/compat/bswap.h b/compat/bswap.h
index 7d063e9e40..6b22c46214 100644
--- a/compat/bswap.h
+++ b/compat/bswap.h
@@ -158,7 +158,9 @@ static inline uint64_t git_bswap64(uint64_t x)
 
 #define get_be16(p)ntohs(*(unsigned short *)(p))
 #define get_be32(p)ntohl(*(unsigned int *)(p))
+#define get_be64(p)ntohll(*(uint64_t *)(p))
 #define put_be32(p, v) do { *(unsigned int *)(p) = htonl(v); } while (0)
+#define put_be64(p, v) do { *(uint64_t *)(p) = htonll(v); } while (0)
 
 #else
 
@@ -178,6 +180,13 @@ static inline uint32_t get_be32(const void *ptr)
(uint32_t)p[3] <<  0;
 }
 
+static inline uint64_t get_be64(const void *ptr)
+{
+   const unsigned char *p = ptr;
+   return  (uint64_t)get_be32(p[0]) << 32 |
+   (uint64_t)get_be32(p[4]) <<  0;
+}
+
 static inline void put_be32(void *ptr, uint32_t value)
 {
unsigned char *p = ptr;
@@ -187,4 +196,17 @@ static inline void put_be32(void *ptr, uint32_t value)
p[3] = value >>  0;
 }
 
+static inline void put_be64(void *ptr, uint64_t value)
+{
+   unsigned char *p = ptr;
+   p[0] = value >> 56;
+   p[1] = value >> 48;
+   p[2] = value >> 40;
+   p[3] = value >> 32;
+   p[4] = value >> 24;
+   p[5] = value >> 16;
+   p[6] = value >>  8;
+   p[7] = value >>  0;
+}
+
 #endif
-- 
2.14.1.windows.1



[PATCH v7 05/12] fsmonitor: add documentation for the fsmonitor extension.

2017-09-19 Thread Ben Peart
This includes the core.fsmonitor setting, the query-fsmonitor hook,
and the fsmonitor index extension.

Signed-off-by: Ben Peart 
---
 Documentation/config.txt |  7 +
 Documentation/git-ls-files.txt   |  7 -
 Documentation/git-update-index.txt   | 45 
 Documentation/githooks.txt   | 28 
 Documentation/technical/index-format.txt | 19 ++
 5 files changed, 105 insertions(+), 1 deletion(-)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index dc4e3f58a2..db52645cb4 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -413,6 +413,13 @@ core.protectNTFS::
8.3 "short" names.
Defaults to `true` on Windows, and `false` elsewhere.
 
+core.fsmonitor::
+   If set, the value of this variable is used as a command which
+   will identify all files that may have changed since the
+   requested date/time. This information is used to speed up git by
+   avoiding unnecessary processing of files that have not changed.
+   See the "fsmonitor-watchman" section of linkgit:githooks[5].
+
 core.trustctime::
If false, the ctime differences between the index and the
working tree are ignored; useful when the inode change time
diff --git a/Documentation/git-ls-files.txt b/Documentation/git-ls-files.txt
index d153c17e06..3ac3e3a77d 100644
--- a/Documentation/git-ls-files.txt
+++ b/Documentation/git-ls-files.txt
@@ -9,7 +9,7 @@ git-ls-files - Show information about files in the index and 
the working tree
 SYNOPSIS
 
 [verse]
-'git ls-files' [-z] [-t] [-v]
+'git ls-files' [-z] [-t] [-v] [-f]

(--[cached|deleted|others|ignored|stage|unmerged|killed|modified])*
(-[c|d|o|i|s|u|k|m])*
[--eol]
@@ -133,6 +133,11 @@ a space) at the start of each line:
that are marked as 'assume unchanged' (see
linkgit:git-update-index[1]).
 
+-f::
+   Similar to `-t`, but use lowercase letters for files
+   that are marked as 'fsmonitor valid' (see
+   linkgit:git-update-index[1]).
+
 --full-name::
When run from a subdirectory, the command usually
outputs paths relative to the current directory.  This
diff --git a/Documentation/git-update-index.txt 
b/Documentation/git-update-index.txt
index e19eba62cd..95231dbfcb 100644
--- a/Documentation/git-update-index.txt
+++ b/Documentation/git-update-index.txt
@@ -16,9 +16,11 @@ SYNOPSIS
 [--chmod=(+|-)x]
 [--[no-]assume-unchanged]
 [--[no-]skip-worktree]
+[--[no-]fsmonitor-valid]
 [--ignore-submodules]
 [--[no-]split-index]
 [--[no-|test-|force-]untracked-cache]
+[--[no-]fsmonitor]
 [--really-refresh] [--unresolve] [--again | -g]
 [--info-only] [--index-info]
 [-z] [--stdin] [--index-version ]
@@ -111,6 +113,12 @@ you will need to handle the situation manually.
set and unset the "skip-worktree" bit for the paths. See
section "Skip-worktree bit" below for more information.
 
+--[no-]fsmonitor-valid::
+   When one of these flags is specified, the object name recorded
+   for the paths are not updated. Instead, these options
+   set and unset the "fsmonitor valid" bit for the paths. See
+   section "File System Monitor" below for more information.
+
 -g::
 --again::
Runs 'git update-index' itself on the paths whose index
@@ -201,6 +209,15 @@ will remove the intended effect of the option.
`--untracked-cache` used to imply `--test-untracked-cache` but
this option would enable the extension unconditionally.
 
+--fsmonitor::
+--no-fsmonitor::
+   Enable or disable files system monitor feature. These options
+   take effect whatever the value of the `core.fsmonitor`
+   configuration variable (see linkgit:git-config[1]). But a warning
+   is emitted when the change goes against the configured value, as
+   the configured value will take effect next time the index is
+   read and this will remove the intended effect of the option.
+
 \--::
Do not interpret any more arguments as options.
 
@@ -447,6 +464,34 @@ command reads the index; while when 
`--[no-|force-]untracked-cache`
 are used, the untracked cache is immediately added to or removed from
 the index.
 
+File System Monitor
+---
+
+This feature is intended to speed up git operations for repos that have
+large working directories.
+
+It enables git to work together with a file system monitor (see the
+"fsmonitor-watchman" section of linkgit:githooks[5]) that can
+inform it as to what files have been modified. This enables git to avoid
+having to lstat() every file to find modified files.
+
+When used in conjunction with the untracked cache, it can further improve
+performance by avoiding the cost of scaning the entire 

[PATCH v7 11/12] fsmonitor: add a sample integration script for Watchman

2017-09-19 Thread Ben Peart
This script integrates the new fsmonitor capabilities of git with the
cross platform Watchman file watching service. To use the script:

Download and install Watchman from https://facebook.github.io/watchman/.
Rename the sample integration hook from fsmonitor-watchman.sample to
fsmonitor-watchman. Configure git to use the extension:

git config core.fsmonitor .git/hooks/fsmonitor-watchman

Optionally turn on the untracked cache for optimal performance.

Signed-off-by: Ben Peart 
Signed-off-by: Johannes Schindelin 
Signed-off-by: Ævar Arnfjörð Bjarmason 
Signed-off-by: Christian Couder 
---
 templates/hooks--fsmonitor-watchman.sample | 122 +
 1 file changed, 122 insertions(+)
 create mode 100755 templates/hooks--fsmonitor-watchman.sample

diff --git a/templates/hooks--fsmonitor-watchman.sample 
b/templates/hooks--fsmonitor-watchman.sample
new file mode 100755
index 00..870a59d237
--- /dev/null
+++ b/templates/hooks--fsmonitor-watchman.sample
@@ -0,0 +1,122 @@
+#!/usr/bin/perl
+
+use strict;
+use warnings;
+use IPC::Open2;
+
+# An example hook script to integrate Watchman
+# (https://facebook.github.io/watchman/) with git to speed up detecting
+# new and modified files.
+#
+# The hook is passed a version (currently 1) and a time in nanoseconds
+# formatted as a string and outputs to stdout all files that have been
+# modified since the given time. Paths must be relative to the root of
+# the working tree and separated by a single NUL.
+#
+# To enable this hook, rename this file to "query-watchman" and set
+# 'git config core.fsmonitor .git/hooks/query-watchman'
+#
+my ($version, $time) = @ARGV;
+
+# Check the hook interface version
+
+if ($version == 1) {
+   # convert nanoseconds to seconds
+   $time = int $time / 10;
+} else {
+   die "Unsupported query-fsmonitor hook version '$version'.\n" .
+   "Falling back to scanning...\n";
+}
+
+# Convert unix style paths to escaped Windows style paths when running
+# in Windows command prompt
+
+my $system = `uname -s`;
+$system =~ s/[\r\n]+//g;
+my $git_work_tree;
+
+if ($system =~ m/^MSYS_NT/) {
+   $git_work_tree = `cygpath -aw "\$PWD"`;
+   $git_work_tree =~ s/[\r\n]+//g;
+   $git_work_tree =~ s,\\,/,g;
+} else {
+   $git_work_tree = $ENV{'PWD'};
+}
+
+my $retry = 1;
+
+launch_watchman();
+
+sub launch_watchman {
+
+   # Set input record separator
+   local $/ = 0666;
+
+   my $pid = open2(\*CHLD_OUT, \*CHLD_IN, 'watchman -j')
+   or die "open2() failed: $!\n" .
+   "Falling back to scanning...\n";
+
+   # In the query expression below we're asking for names of files that
+   # changed since $time but were not transient (ie created after
+   # $time but no longer exist).
+   #
+   # To accomplish this, we're using the "since" generator to use the
+   # recency index to select candidate nodes and "fields" to limit the
+   # output to file names only. Then we're using the "expression" term to
+   # further constrain the results.
+   #
+   # The category of transient files that we want to ignore will have a
+   # creation clock (cclock) newer than $time_t value and will also not
+   # currently exist.
+
+   my $query = <<" END";
+   ["query", "$git_work_tree", {
+   "since": $time,
+   "fields": ["name"],
+   "expression": ["not", ["allof", ["since", $time, 
"cclock"], ["not", "exists"]]]
+   }]
+   END
+
+   print CHLD_IN $query;
+   my $response = ;
+
+   die "Watchman: command returned no output.\n" .
+   "Falling back to scanning...\n" if $response eq "";
+   die "Watchman: command returned invalid output: $response\n" .
+   "Falling back to scanning...\n" unless $response =~ /^\{/;
+
+   my $json_pkg;
+   eval {
+   require JSON::XS;
+   $json_pkg = "JSON::XS";
+   1;
+   } or do {
+   require JSON::PP;
+   $json_pkg = "JSON::PP";
+   };
+
+   my $o = $json_pkg->new->utf8->decode($response);
+
+   if ($retry > 0 and $o->{error} and $o->{error} =~ m/unable to resolve 
root .* directory (.*) is not watched/) {
+   print STDERR "Adding '$git_work_tree' to watchman's watch 
list.\n";
+   $retry--;
+   qx/watchman watch "$git_work_tree"/;
+   die "Failed to make watchman watch '$git_work_tree'.\n" .
+   "Falling back to scanning...\n" if $? != 0;
+
+   # Watchman will always return all files on the first query so
+   # return the fast "everything is dirty" flag to git and do the
+   # Watchman query just to get it over with now so we won't pay
+   # the cost in git to look up each individual file.
+   

[PATCH v7 02/12] preload-index: add override to enable testing preload-index

2017-09-19 Thread Ben Peart
Preload index doesn't run unless it has a minimum number of 1000 files.
To enable running tests with fewer files, add an environment variable
(GIT_FORCE_PRELOAD_TEST) which will override that minimum and set it to 2.

Signed-off-by: Ben Peart 
---
 preload-index.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/preload-index.c b/preload-index.c
index 70a4c80878..75564c497a 100644
--- a/preload-index.c
+++ b/preload-index.c
@@ -79,6 +79,8 @@ static void preload_index(struct index_state *index,
return;
 
threads = index->cache_nr / THREAD_COST;
+   if ((index->cache_nr > 1) && (threads < 2) && 
getenv("GIT_FORCE_PRELOAD_TEST"))
+   threads = 2;
if (threads < 2)
return;
if (threads > MAX_PARALLEL)
-- 
2.14.1.windows.1



[PATCH v7 03/12] update-index: add a new --force-write-index option

2017-09-19 Thread Ben Peart
At times, it makes sense to avoid the cost of writing out the index
when the only changes can easily be recomputed on demand. This causes
problems when trying to write test cases to verify that state as they
can't guarantee the state has been persisted to disk.

Add a new option (--force-write-index) to update-index that will
ensure the index is written out even if the cache_changed flag is not
set.

Signed-off-by: Ben Peart 
---
 builtin/update-index.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/builtin/update-index.c b/builtin/update-index.c
index d562f2ec69..e1ca0759d5 100644
--- a/builtin/update-index.c
+++ b/builtin/update-index.c
@@ -915,6 +915,7 @@ int cmd_update_index(int argc, const char **argv, const 
char *prefix)
struct refresh_params refresh_args = {0, _errors};
int lock_error = 0;
int split_index = -1;
+   int force_write = 0;
struct lock_file *lock_file;
struct parse_opt_ctx_t ctx;
strbuf_getline_fn getline_fn;
@@ -1006,6 +1007,8 @@ int cmd_update_index(int argc, const char **argv, const 
char *prefix)
N_("test if the filesystem supports untracked 
cache"), UC_TEST),
OPT_SET_INT(0, "force-untracked-cache", _cache,
N_("enable untracked cache without testing the 
filesystem"), UC_FORCE),
+   OPT_SET_INT(0, "force-write-index", _write,
+   N_("write out the index even if is not flagged as 
changed"), 1),
OPT_END()
};
 
@@ -1147,7 +1150,7 @@ int cmd_update_index(int argc, const char **argv, const 
char *prefix)
die("BUG: bad untracked_cache value: %d", untracked_cache);
}
 
-   if (active_cache_changed) {
+   if (active_cache_changed || force_write) {
if (newfd < 0) {
if (refresh_args.flags & REFRESH_QUIET)
exit(128);
-- 
2.14.1.windows.1



[PATCH v7 07/12] update-index: add fsmonitor support to update-index

2017-09-19 Thread Ben Peart
Add support in update-index to manually add/remove the fsmonitor
extension via --[no-]fsmonitor flags.

Add support in update-index to manually set/clear the fsmonitor
valid bit via --[no-]fsmonitor-valid flags.

Signed-off-by: Ben Peart 
---
 builtin/update-index.c | 33 -
 1 file changed, 32 insertions(+), 1 deletion(-)

diff --git a/builtin/update-index.c b/builtin/update-index.c
index 6f39ee9274..41618db098 100644
--- a/builtin/update-index.c
+++ b/builtin/update-index.c
@@ -33,6 +33,7 @@ static int force_remove;
 static int verbose;
 static int mark_valid_only;
 static int mark_skip_worktree_only;
+static int mark_fsmonitor_only;
 #define MARK_FLAG 1
 #define UNMARK_FLAG 2
 static struct strbuf mtime_dir = STRBUF_INIT;
@@ -229,12 +230,12 @@ static int mark_ce_flags(const char *path, int flag, int 
mark)
int namelen = strlen(path);
int pos = cache_name_pos(path, namelen);
if (0 <= pos) {
+   mark_fsmonitor_invalid(_index, active_cache[pos]);
if (mark)
active_cache[pos]->ce_flags |= flag;
else
active_cache[pos]->ce_flags &= ~flag;
active_cache[pos]->ce_flags |= CE_UPDATE_IN_BASE;
-   mark_fsmonitor_invalid(_index, active_cache[pos]);
cache_tree_invalidate_path(_index, path);
active_cache_changed |= CE_ENTRY_CHANGED;
return 0;
@@ -460,6 +461,11 @@ static void update_one(const char *path)
die("Unable to mark file %s", path);
return;
}
+   if (mark_fsmonitor_only) {
+   if (mark_ce_flags(path, CE_FSMONITOR_VALID, mark_fsmonitor_only 
== MARK_FLAG))
+   die("Unable to mark file %s", path);
+   return;
+   }
 
if (force_remove) {
if (remove_file_from_cache(path))
@@ -918,6 +924,7 @@ int cmd_update_index(int argc, const char **argv, const 
char *prefix)
int lock_error = 0;
int split_index = -1;
int force_write = 0;
+   int fsmonitor = -1;
struct lock_file *lock_file;
struct parse_opt_ctx_t ctx;
strbuf_getline_fn getline_fn;
@@ -1011,6 +1018,14 @@ int cmd_update_index(int argc, const char **argv, const 
char *prefix)
N_("enable untracked cache without testing the 
filesystem"), UC_FORCE),
OPT_SET_INT(0, "force-write-index", _write,
N_("write out the index even if is not flagged as 
changed"), 1),
+   OPT_BOOL(0, "fsmonitor", ,
+   N_("enable or disable file system monitor")),
+   {OPTION_SET_INT, 0, "fsmonitor-valid", _fsmonitor_only, 
NULL,
+   N_("mark files as fsmonitor valid"),
+   PARSE_OPT_NOARG | PARSE_OPT_NONEG, NULL, MARK_FLAG},
+   {OPTION_SET_INT, 0, "no-fsmonitor-valid", _fsmonitor_only, 
NULL,
+   N_("clear fsmonitor valid bit"),
+   PARSE_OPT_NOARG | PARSE_OPT_NONEG, NULL, UNMARK_FLAG},
OPT_END()
};
 
@@ -1152,6 +1167,22 @@ int cmd_update_index(int argc, const char **argv, const 
char *prefix)
die("BUG: bad untracked_cache value: %d", untracked_cache);
}
 
+   if (fsmonitor > 0) {
+   if (git_config_get_fsmonitor() == 0)
+   warning(_("core.fsmonitor is unset; "
+   "set it if you really want to "
+   "enable fsmonitor"));
+   add_fsmonitor(_index);
+   report(_("fsmonitor enabled"));
+   } else if (!fsmonitor) {
+   if (git_config_get_fsmonitor() == 1)
+   warning(_("core.fsmonitor is set; "
+   "remove it if you really want to "
+   "disable fsmonitor"));
+   remove_fsmonitor(_index);
+   report(_("fsmonitor disabled"));
+   }
+
if (active_cache_changed || force_write) {
if (newfd < 0) {
if (refresh_args.flags & REFRESH_QUIET)
-- 
2.14.1.windows.1



[PATCH v7 12/12] fsmonitor: add a performance test

2017-09-19 Thread Ben Peart
Add a test utility (test-drop-caches) that flushes all changes to disk
then drops file system cache on Windows, Linux, and OSX.

Add a perf test (p7519-fsmonitor.sh) for fsmonitor.

By default, the performance test will utilize the Watchman file system
monitor if it is installed.  If Watchman is not installed, it will use a
dummy integration script that does not report any new or modified files.
The dummy script has very little overhead which provides optimistic results.

The performance test will also use the untracked cache feature if it is
available as fsmonitor uses it to speed up scanning for untracked files.

There are 3 environment variables that can be used to alter the default
behavior of the performance test:

GIT_PERF_7519_UNTRACKED_CACHE: used to configure core.untrackedCache
GIT_PERF_7519_SPLIT_INDEX: used to configure core.splitIndex
GIT_PERF_7519_FSMONITOR: used to configure core.fsMonitor

The big win for using fsmonitor is the elimination of the need to scan the
working directory looking for changed and untracked files. If the file
information is all cached in RAM, the benefits are reduced.

GIT_PERF_7519_DROP_CACHE: if set, the OS caches are dropped between tests

Signed-off-by: Ben Peart 
Signed-off-by: Ævar Arnfjörð Bjarmason 
---
 Makefile|   1 +
 t/helper/.gitignore |   1 +
 t/helper/test-drop-caches.c | 162 ++
 t/perf/p7519-fsmonitor.sh   | 184 
 4 files changed, 348 insertions(+)
 create mode 100644 t/helper/test-drop-caches.c
 create mode 100755 t/perf/p7519-fsmonitor.sh

diff --git a/Makefile b/Makefile
index d970cd00e9..b2653ee64f 100644
--- a/Makefile
+++ b/Makefile
@@ -638,6 +638,7 @@ TEST_PROGRAMS_NEED_X += test-ctype
 TEST_PROGRAMS_NEED_X += test-config
 TEST_PROGRAMS_NEED_X += test-date
 TEST_PROGRAMS_NEED_X += test-delta
+TEST_PROGRAMS_NEED_X += test-drop-caches
 TEST_PROGRAMS_NEED_X += test-dump-cache-tree
 TEST_PROGRAMS_NEED_X += test-dump-fsmonitor
 TEST_PROGRAMS_NEED_X += test-dump-split-index
diff --git a/t/helper/.gitignore b/t/helper/.gitignore
index 721650256e..f9328eebdd 100644
--- a/t/helper/.gitignore
+++ b/t/helper/.gitignore
@@ -3,6 +3,7 @@
 /test-config
 /test-date
 /test-delta
+/test-drop-caches
 /test-dump-cache-tree
 /test-dump-split-index
 /test-dump-untracked-cache
diff --git a/t/helper/test-drop-caches.c b/t/helper/test-drop-caches.c
new file mode 100644
index 00..4e5ca8f397
--- /dev/null
+++ b/t/helper/test-drop-caches.c
@@ -0,0 +1,162 @@
+#include "git-compat-util.h"
+
+#if defined(GIT_WINDOWS_NATIVE)
+
+static int cmd_sync(void)
+{
+   char Buffer[MAX_PATH];
+   DWORD dwRet;
+   char szVolumeAccessPath[] = ".\\X:";
+   HANDLE hVolWrite;
+   int success = 0;
+
+   dwRet = GetCurrentDirectory(MAX_PATH, Buffer);
+   if ((0 == dwRet) || (dwRet > MAX_PATH))
+   return error("Error getting current directory");
+
+   if ((Buffer[0] < 'A') || (Buffer[0] > 'Z'))
+   return error("Invalid drive letter '%c'", Buffer[0]);
+
+   szVolumeAccessPath[4] = Buffer[0];
+   hVolWrite = CreateFile(szVolumeAccessPath, GENERIC_READ | GENERIC_WRITE,
+   FILE_SHARE_READ | FILE_SHARE_WRITE, NULL, OPEN_EXISTING, 0, 
NULL);
+   if (INVALID_HANDLE_VALUE == hVolWrite)
+   return error("Unable to open volume for writing, need admin 
access");
+
+   success = FlushFileBuffers(hVolWrite);
+   if (!success)
+   error("Unable to flush volume");
+
+   CloseHandle(hVolWrite);
+
+   return !success;
+}
+
+#define STATUS_SUCCESS (0xL)
+#define STATUS_PRIVILEGE_NOT_HELD  (0xC061L)
+
+typedef enum _SYSTEM_INFORMATION_CLASS {
+   SystemMemoryListInformation = 80,
+} SYSTEM_INFORMATION_CLASS;
+
+typedef enum _SYSTEM_MEMORY_LIST_COMMAND {
+   MemoryCaptureAccessedBits,
+   MemoryCaptureAndResetAccessedBits,
+   MemoryEmptyWorkingSets,
+   MemoryFlushModifiedList,
+   MemoryPurgeStandbyList,
+   MemoryPurgeLowPriorityStandbyList,
+   MemoryCommandMax
+} SYSTEM_MEMORY_LIST_COMMAND;
+
+static BOOL GetPrivilege(HANDLE TokenHandle, LPCSTR lpName, int flags)
+{
+   BOOL bResult;
+   DWORD dwBufferLength;
+   LUID luid;
+   TOKEN_PRIVILEGES tpPreviousState;
+   TOKEN_PRIVILEGES tpNewState;
+
+   dwBufferLength = 16;
+   bResult = LookupPrivilegeValueA(0, lpName, );
+   if (bResult) {
+   tpNewState.PrivilegeCount = 1;
+   tpNewState.Privileges[0].Luid = luid;
+   tpNewState.Privileges[0].Attributes = 0;
+   bResult = AdjustTokenPrivileges(TokenHandle, 0, ,
+   (DWORD)((LPBYTE)&(tpNewState.Privileges[1]) - 
(LPBYTE)),
+   , );
+   if (bResult) {
+   tpPreviousState.PrivilegeCount = 1;
+   

[PATCH v7 00/12] Fast git status via a file system watcher

2017-09-19 Thread Ben Peart
Subject: Fast git status via a file system watcher

Thanks to everyone who provided feedback.  There are lots of minor style
changes, documentation updates and a fixed leak.

The only functional change is the addition of support to set/clear the
fsmonitor valid bit via 'git update-index --[no-]fsmonitor-valid' with
the associated documentation and tests.

Interdiff between V6 and V7:

diff --git a/Documentation/config.txt b/Documentation/config.txt
index c196007a27..db52645cb4 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -418,6 +418,7 @@ core.fsmonitor::
will identify all files that may have changed since the
requested date/time. This information is used to speed up git by
avoiding unnecessary processing of files that have not changed.
+   See the "fsmonitor-watchman" section of linkgit:githooks[5].
 
 core.trustctime::
If false, the ctime differences between the index and the
diff --git a/Documentation/git-ls-files.txt b/Documentation/git-ls-files.txt
index d153c17e06..3ac3e3a77d 100644
--- a/Documentation/git-ls-files.txt
+++ b/Documentation/git-ls-files.txt
@@ -9,7 +9,7 @@ git-ls-files - Show information about files in the index and 
the working tree
 SYNOPSIS
 
 [verse]
-'git ls-files' [-z] [-t] [-v]
+'git ls-files' [-z] [-t] [-v] [-f]

(--[cached|deleted|others|ignored|stage|unmerged|killed|modified])*
(-[c|d|o|i|s|u|k|m])*
[--eol]
@@ -133,6 +133,11 @@ a space) at the start of each line:
that are marked as 'assume unchanged' (see
linkgit:git-update-index[1]).
 
+-f::
+   Similar to `-t`, but use lowercase letters for files
+   that are marked as 'fsmonitor valid' (see
+   linkgit:git-update-index[1]).
+
 --full-name::
When run from a subdirectory, the command usually
outputs paths relative to the current directory.  This
diff --git a/Documentation/git-update-index.txt 
b/Documentation/git-update-index.txt
index e19eba62cd..95231dbfcb 100644
--- a/Documentation/git-update-index.txt
+++ b/Documentation/git-update-index.txt
@@ -16,9 +16,11 @@ SYNOPSIS
 [--chmod=(+|-)x]
 [--[no-]assume-unchanged]
 [--[no-]skip-worktree]
+[--[no-]fsmonitor-valid]
 [--ignore-submodules]
 [--[no-]split-index]
 [--[no-|test-|force-]untracked-cache]
+[--[no-]fsmonitor]
 [--really-refresh] [--unresolve] [--again | -g]
 [--info-only] [--index-info]
 [-z] [--stdin] [--index-version ]
@@ -111,6 +113,12 @@ you will need to handle the situation manually.
set and unset the "skip-worktree" bit for the paths. See
section "Skip-worktree bit" below for more information.
 
+--[no-]fsmonitor-valid::
+   When one of these flags is specified, the object name recorded
+   for the paths are not updated. Instead, these options
+   set and unset the "fsmonitor valid" bit for the paths. See
+   section "File System Monitor" below for more information.
+
 -g::
 --again::
Runs 'git update-index' itself on the paths whose index
@@ -201,6 +209,15 @@ will remove the intended effect of the option.
`--untracked-cache` used to imply `--test-untracked-cache` but
this option would enable the extension unconditionally.
 
+--fsmonitor::
+--no-fsmonitor::
+   Enable or disable files system monitor feature. These options
+   take effect whatever the value of the `core.fsmonitor`
+   configuration variable (see linkgit:git-config[1]). But a warning
+   is emitted when the change goes against the configured value, as
+   the configured value will take effect next time the index is
+   read and this will remove the intended effect of the option.
+
 \--::
Do not interpret any more arguments as options.
 
@@ -447,6 +464,34 @@ command reads the index; while when 
`--[no-|force-]untracked-cache`
 are used, the untracked cache is immediately added to or removed from
 the index.
 
+File System Monitor
+---
+
+This feature is intended to speed up git operations for repos that have
+large working directories.
+
+It enables git to work together with a file system monitor (see the
+"fsmonitor-watchman" section of linkgit:githooks[5]) that can
+inform it as to what files have been modified. This enables git to avoid
+having to lstat() every file to find modified files.
+
+When used in conjunction with the untracked cache, it can further improve
+performance by avoiding the cost of scaning the entire working directory
+looking for new files.
+
+If you want to enable (or disable) this feature, it is easier to use
+the `core.fsmonitor` configuration variable (see
+linkgit:git-config[1]) than using the `--fsmonitor` option to
+`git update-index` in each repository, especially if you want to do so
+across all repositories you use, because you can set the configuration
+variable to 

Re: [PATCH] rev-parse: rev-parse: add --is-shallow-repository

2017-09-19 Thread Øystein Walle
> Hm, can you say more about the context?  From a certain point of view,
> it might make sense for that command to succeed instead: if the repo
> is already unshallow, then why should't "fetch --unshallow" complain
> instead of declaring victory?

A fellow in #git on Freenode was writing a script for automation and
encountered this error, and asked how to find out whether a repo was
shallow. My *first instinct* was to check if rev-parse had a flag for
it; I wouldn't have been surprised if it did.

I agree that treating it as a fatal error is a bit much in the first
place, but I also think having a way to check can be useful. I also
wonder if a lot of the stuff rev-parse is used for now should be moved
to some sort of `git misc` command, but that's a different can of worms,
so into rev-parse a new flag went.

> What does git-path mean here?  I wonder if it's a copy/paste error.
> ...
> Reviewed-by: Jonathan Nieder 

Yeah, the titles were copy-pasted without adjusting, thanks for fixing,
Jonathan! ;)

> I agree with the fixes to the test titles suggested, so I'll queue the
> patch with the fixes squashed in.  Hearing "yeah, the titles were
> copy-pasted without adjusting, thanks for fixing, Jonathan!" sent by
> =C3=98ystein would be super nice.

Sounds good. Thanks for queueing my patch. My fourth!

�sse


Re: [PATCH] Win32: simplify loading of DLL functions

2017-09-19 Thread Jonathan Nieder
Hi,

Johannes Schindelin wrote:

> Dynamic loading of DLL functions is duplicated in several places in Git
> for Windows' source code.
>
> This patch adds a pair of macros to simplify the process: the
> DECLARE_PROC_ADDR(, , ,
> ..) macro to be used at the beginning of a
> code block, and the INIT_PROC_ADDR() macro to call before
> using the declared function. The return value of the INIT_PROC_ADDR()
> call has to be checked; If it is NULL, the function was not found in the
> specified DLL.
>
> Example:
>
> DECLARE_PROC_ADDR(kernel32.dll, BOOL, CreateHardLinkW,
>   LPCWSTR, LPCWSTR, LPSECURITY_ATTRIBUTES);
>
> if (!INIT_PROC_ADDR(CreateHardLinkW))
> return error("Could not find CreateHardLinkW() function";
>
>   if (!CreateHardLinkW(source, target, NULL))
>   return error("could not create hardlink from %S to %S",
>source, target);
>   return 0;

nit: whitespace is a bit strange here (mixture of tabs and spaces).

Could this example go near the top of the header instead?  That way,
it's easier for people reading the header to see how to use it.

> Signed-off-by: Karsten Blees 
> Signed-off-by: Johannes Schindelin 
> ---

Just curious: what was Karsten's contribution?  (I ask mostly because
I'm interested in kinds of collaboration git metadata is failing to
capture correctly --- e.g. pair programming.)

> So far, there are no users (except in Git for Windows). Ben
> promised to make use of it in his fsmonitor patch series.
>
>  compat/win32/lazyload.h | 44 
>  1 file changed, 44 insertions(+)
>  create mode 100644 compat/win32/lazyload.h

Are any of the Git for Windows users something that could go upstream
along with this patch?  That would help illustrate what a good caller
looks like, which should help with reviewing future patches that use
this code.

> --- /dev/null
> +++ b/compat/win32/lazyload.h
> @@ -0,0 +1,44 @@
[...]
> +/* Declares a function to be loaded dynamically from a DLL. */
> +#define DECLARE_PROC_ADDR(dll, rettype, function, ...) \
> + static struct proc_addr proc_addr_##function = \
> + { #dll, #function, NULL, 0 }; \
> + static rettype (WINAPI *function)(__VA_ARGS__)
> +
> +/*
> + * Loads a function from a DLL (once-only).
> + * Returns non-NULL function pointer on success.
> + * Returns NULL + errno == ENOSYS on failure.
> + */
> +#define INIT_PROC_ADDR(function) \
> + (function = get_proc_addr(_addr_##function))

Probably worth mentioning in the doc comment that this is not thread
safe, so a caller that wants to lazy-init in a threaded context is
responsible for doing their own locking.

> +
> +static inline void *get_proc_addr(struct proc_addr *proc)
> +{
> + /* only do this once */
> + if (!proc->initialized) {
> + HANDLE hnd;
> + proc->initialized = 1;
> + hnd = LoadLibraryExA(proc->dll, NULL,
> +  LOAD_LIBRARY_SEARCH_SYSTEM32);
> + if (hnd)
> + proc->pfunction = GetProcAddress(hnd, proc->function);
> + }
> + /* set ENOSYS if DLL or function was not found */
> + if (!proc->pfunction)
> + errno = ENOSYS;
> + return proc->pfunction;
> +}

strerror(ENOSYS) is "Function not implemented".  Cute.

Reviewed-by: Jonathan Nieder 

Thanks,
Jonathan


Re: [PATCH v2] Improve performance of git status --ignored

2017-09-19 Thread Brandon Williams
On 09/18, Jameson Miller wrote:
> Improve the performance of the directory listing logic when it wants to list
> non-empty ignored directories. In order to show non-empty ignored directories,
> the existing logic will recursively iterate through all contents of an ignored
> directory. This change introduces the optimization to stop iterating through
> the contents once it finds the first file. This can have a significant
> improvement in 'git status --ignored' performance in repositories with a large
> number of files in ignored directories.
> 
> For an example of the performance difference on an example repository with
> 196,000 files in 400 ignored directories:
> 
> | Command|  Time (s) |
> | -- | - |
> | git status |   1.2 |
> | git status --ignored (old) |   3.9 |
> | git status --ignored (new) |   1.4 |
> 
> Signed-off-by: Jameson Miller 

Everything looks good to me.  My only nit (and no need to change it for
this patch) is that the first line of the commit msg usually has the
form:

  : 

So that its easy to tell which part of the code a commit is changing.

Seriously, great patch.  Thanks!

> ---
>  dir.c | 47 +--
>  1 file changed, 41 insertions(+), 6 deletions(-)
> 
> diff --git a/dir.c b/dir.c
> index 1c55dc3..1d17b80 100644
> --- a/dir.c
> +++ b/dir.c
> @@ -49,7 +49,7 @@ struct cached_dir {
>  static enum path_treatment read_directory_recursive(struct dir_struct *dir,
>   struct index_state *istate, const char *path, int len,
>   struct untracked_cache_dir *untracked,
> - int check_only, const struct pathspec *pathspec);
> + int check_only, int stop_at_first_file, const struct pathspec 
> *pathspec);
>  static int get_dtype(struct dirent *de, struct index_state *istate,
>const char *path, int len);
>  
> @@ -1404,8 +1404,13 @@ static enum path_treatment treat_directory(struct 
> dir_struct *dir,
>  
>   untracked = lookup_untracked(dir->untracked, untracked,
>dirname + baselen, len - baselen);
> +
> + /*
> +  * If this is an excluded directory, then we only need to check if
> +  * the directory contains any files.
> +  */
>   return read_directory_recursive(dir, istate, dirname, len,
> - untracked, 1, pathspec);
> + untracked, 1, exclude, pathspec);
>  }
>  
>  /*
> @@ -1633,7 +1638,7 @@ static enum path_treatment treat_path_fast(struct 
> dir_struct *dir,
>* with check_only set.
>*/
>   return read_directory_recursive(dir, istate, path->buf, 
> path->len,
> - cdir->ucd, 1, pathspec);
> + cdir->ucd, 1, 0, pathspec);
>   /*
>* We get path_recurse in the first run when
>* directory_exists_in_index() returns index_nonexistent. We
> @@ -1793,12 +1798,20 @@ static void close_cached_dir(struct cached_dir *cdir)
>   * Also, we ignore the name ".git" (even if it is not a directory).
>   * That likely will not change.
>   *
> + * If 'stop_at_first_file' is specified, 'path_excluded' is returned
> + * to signal that a file was found. This is the least significant value that
> + * indicates that a file was encountered that does not depend on the order of
> + * whether an untracked or exluded path was encountered first.
> + *
>   * Returns the most significant path_treatment value encountered in the scan.
> + * If 'stop_at_first_file' is specified, `path_excluded` is the most
> + * significant path_treatment value that will be returned.
>   */
> +
>  static enum path_treatment read_directory_recursive(struct dir_struct *dir,
>   struct index_state *istate, const char *base, int baselen,
>   struct untracked_cache_dir *untracked, int check_only,
> - const struct pathspec *pathspec)
> + int stop_at_first_file, const struct pathspec *pathspec)
>  {
>   struct cached_dir cdir;
>   enum path_treatment state, subdir_state, dir_state = path_none;
> @@ -1832,12 +1845,34 @@ static enum path_treatment 
> read_directory_recursive(struct dir_struct *dir,
>   subdir_state =
>   read_directory_recursive(dir, istate, path.buf,
>path.len, ud,
> -  check_only, pathspec);
> +  check_only, 
> stop_at_first_file, pathspec);
>   if (subdir_state > dir_state)
>   dir_state = subdir_state;
>   }
>  
>   if (check_only) {
> + if (stop_at_first_file) {
> + /*
> +  * If stopping at first file, then
> 

Re: [PATCH v6 09/40] Add initial external odb support

2017-09-19 Thread Jonathan Tan
I wonder if it's better to get a change like this (PATCH v6 09/40 and
any of the previous patches that this depends on) in and then build on
it rather than to review the whole patch set at a time.  This would
reduce ripple effects (of needing to change later patches in a patch set
multiple times unnecessarily) and help collaboration (in that multiple
people can write patches, since the foundation would already be laid).

The same concerns about fsck apply, but that shouldn't be a problem,
since this patch provides the same internal API as mine ("get" function
taking in a single hash, and "have" function taking in a single hash) so
it shouldn't be too difficult to adapt my fsck and gc patches [1]. (I
can do that if necessary.)

[1] 
https://public-inbox.org/git/20170915134343.3814d...@twelve2.svl.corp.google.com/

One possible issue (with committing something early) is that later work
(for example, a fast long-running process protocol) will make the
earlier work (for example, here, a simple single-shot protocol)
obsolete, while saddling us with the necessity of maintaining the
earlier one. To that end, if we want to start with the support for a
hook, a better approach might be to only code the fast long-running
process protocol, and put a wrapper script in contrib/ that can wrap a
single-shot process in a long-running process.

And another possible issue is that we design ourselves into a corner.
Thinking about the use cases that I know about (the Android use case and
the Microsoft GVFS use case), I don't think we are doing that - for
Android, this means that large blob metadata needs to be part of the
design (and this patch series does provide for that), and for Microsoft
GVFS, "get" is relatively cheap, so a configuration option to not invoke
"have" first when loading a missing object might be sufficient.

As for the design itself (including fetch and clone), it differs from my
patches (linked above as [1]) in that mine is self-contained (requiring
only an updated Git server and Git client) whereas this, as far as I can
tell, requires an external process and some measure of coordination
between the administrator of the server and the client user (for
example, the client must have the same ODB mechanism as the server, if
not, the server might omit certain blobs that the client does not know
how to fetch).

And I think that my design can be extended to support a use case in
which, for example, blobs corresponding to a certain type of filename
(defined by a glob like in gitattributes) can be excluded during
fetch/clone, much like --blob-max-bytes, and they can be fetched either
through the built-in mechanism or through a custom hook.

For those reasons, I still lean towards my design, but if we do want to
go with this design, here are my comments about this patch...

First of all:
 - You'll probably need to add a repository extension.
 - I get compile errors when I "git am" these onto master. I think
   '#include "config.h"' is needed in some places.

On Sat, 16 Sep 2017 10:07:00 +0200
Christian Couder  wrote:

> The external-odb.{c,h} files contains the functions that are
> called by the rest of Git from "sha1_file.c".
> 
> The odb-helper.{c,h} files contains the functions to
> actually implement communication with the external scripts or
> processes that will manage external git objects.
> 
> For now only script mode is supported, and only the 'have' and
> 'get_git_obj' instructions are supported.

This "have", as I see from this commit, is more like a "list" command in
that it lists all hashes that it knows about, and does not check if a
given hash exists.

> +static struct odb_helper *helpers;
> +static struct odb_helper **helpers_tail = 

This could be done with the helpers in list.h instead.

> +int external_odb_get_object(const unsigned char *sha1)
> +{
> + struct odb_helper *o;
> + const char *path;
> +
> + if (!external_odb_has_object(sha1))
> + return -1;
> +
> + path = sha1_file_name_alt(external_odb_root(), sha1);

If the purpose of making these functions global in the previous patch is
just for temporary names, I don't think it's necessary for them to be
global. Just concatenate the hex SHA1 to external_odb_root()?

>  /* Returns 1 if we have successfully freshened the file, 0 otherwise. */
> @@ -667,7 +684,7 @@ static int check_and_freshen_nonlocal(const unsigned char 
> *sha1, int freshen)
>   if (check_and_freshen_file(path, freshen))
>   return 1;
>   }
> - return 0;
> + return external_odb_has_object(sha1);
>  }
>  
>  static int check_and_freshen(const unsigned char *sha1, int freshen)
> @@ -824,6 +841,9 @@ static int stat_sha1_file(const unsigned char *sha1, 
> struct stat *st,
>   return 0;
>   }
>  
> + if (!external_odb_get_object(sha1) && !lstat(*path, st))
> + return 0;
> +
>   return -1;
>  }
>  
> @@ -859,7 +879,14 @@ static int 

Re: What's cooking in git.git (Aug 2017, #05; Tue, 22)

2017-09-19 Thread Nicolas Morey-Chaisemartin


Le 19/09/2017 à 17:43, Johannes Schindelin a écrit :
>
> C'mon, don't *try* to misunderstand me.
>
> Of course there need to be updates as to the state of patch series.
>
> It's just that mails only go *so* far when you need to connect and
> aggregate information. You need the connection between the original patch
> series, the latest unaddressed reviews, links to the branches, history of
> the patch series' iterations, and ideally links to the repositories of the
> contributors with *their* branch names. And then, of course, your verdict
> as to the state of the patch series and your expectation what happens
> next.
>
> To relate that, you are using a plain text format that is not well defined
> and not structured, and certainly not machine-readable, for information
> that is crucial for project management.
>
> What you need is a tool to aggregate this information, to help working
> with it, to manage the project, and to be updated automatically. And to
> publish this information continuously, without costing you extra effort.
>
> I understand that you started before GitHub existed, and before GitHub was
> an option, the script-generated What's cooking mail was the best you could
> do.
>
> Ciao,
> Dscho
Hi,

Would something like patchwork fix your need ?
They now seems to have a REST API, which means it could probably be pluggeg into
Junio's scripts and work seemlessly for him (or any other happy ML user) while 
other people can browse
through the web interface.

I used to work with this one:
http://patches.opendataplane.org/project/lng-odp/list/
It is not the best  example as the patch status are pretty much never updated 
on this one.

But it would solve most of the points you raised, while keeping fully 
compatible with the way people actually work (including Junio).

Nicolas




Re: RFC v3: Another proposed hash function transition plan

2017-09-19 Thread Gilles Van Assche
Hi Johannes,

Thanks for your feedback.

On 19/09/17 00:16, Johannes Schindelin wrote:
>>> SHA-256 got much more cryptanalysis than SHA3-256 […]. 
>>
>> I do not think this is true. 
>
> Please read what I said again: SHA-256 got much more cryptanalysis
> than SHA3-256.

Indeed. What I meant is that SHA3-256 got at least as much cryptanalysis
as SHA-256. :-)

> I never said that SHA3-256 got little cryptanalysis. Personally, I
> think that SHA3-256 got a ton more cryptanalysis than SHA-1, and that
> SHA-256 *still* got more cryptanalysis. But my opinion does not count,
> really. However, the two experts I pestered with questions over
> questions left me with that strong impression, and their opinion does
> count.

OK, I respect your opinion and that of your two experts. Yet, the "much
more" part of your statement, in particular, is something that may
require a bit more explanations.

Kind regards,
Gilles



Re: What's cooking in git.git (Aug 2017, #05; Tue, 22)

2017-09-19 Thread Jonathan Nieder
Johannes Schindelin wrote:

> What you need is a tool to aggregate this information, to help working
> with it, to manage the project, and to be updated automatically. And to
> publish this information continuously, without costing you extra effort.
>
> I understand that you started before GitHub existed, and before GitHub was
> an option, the script-generated What's cooking mail was the best you could
> do.

On second reading, I think you're talking about GitHub's code review
("pull request") feature, not a bug tracker.

There's been some active work (some that you've been involved in, I
believe) on getting information from a GitHub pull request to the
mailing list.  One possibility would be to get information to also go
in the other direction, so people have information about what happened
to their patch in one place.

I can also see why you are directing your attention to the maintainer
for this one, since if the entire project adopts the GitHub Pull
Request workflow, then the complexity and other flaws of such a
two-way sync could be avoided.

Unfortunately the maintainer is not the only person you'd need to
convince.  You'd also need to convince patch authors and reviewers to
move to the new workflow.  There are likely some potential
contributors that this would bring in, that would like to get involved
in the Git project but had been deterred by e.g. the mailing list's
aggressive filtering of any email with an HTML part.  Meanwhile it
would also be a large adjustment for existing contributors, so it's
not risk free.

I personally like how Bazel[1] handles this.  They have two ways to
contribute:

 A. People used to GitHub can use a pull request like they are
accustomed to.

 B. People preferring a more structured code review can use Gerrit.

Having only two ways to contribute means that the code review
information is still easy to archive and search, just like our mailing
list archive.

(Actually, an example I like even more is Akaros[2], which also
appears to accept patch contributions by email.)

Adding new ways for a contributor to submit a patch would mean that we
can experiment with new workflows without getting in the way of people
using an existing one.

Thoughts?
Jonathan

[1] https://bazel.build/contributing.html
[2] https://groups.google.com/forum/#!forum/akaros


Re: What's cooking in git.git (Aug 2017, #05; Tue, 22)

2017-09-19 Thread Jonathan Nieder
Hi,

Johannes Schindelin wrote:

> To relate that, you are using a plain text format that is not well defined
> and not structured, and certainly not machine-readable, for information
> that is crucial for project management.
>
> What you need is a tool to aggregate this information, to help working
> with it, to manage the project, and to be updated automatically. And to
> publish this information continuously, without costing you extra effort.
>
> I understand that you started before GitHub existed, and before GitHub was
> an option, the script-generated What's cooking mail was the best you could
> do.

I think you are subtly (but not directly, for some reason?) advocating
moving project management for the Git project to GitHub Issues.

For what it's worth:

 1. I would be happy to see Git adopt a bug tracker.  As we've
discussed on the list before, I suspect the only way that this is
going to happen is if some contributors start using a bug tracker
and keep up with bugs there, without requiring everyone to use it.
That is how the Linux Kernel project started using
bugzilla.kernel.org, for example.

 2. GitHub Issues is one of my least favorite bug trackers, for what
it's worth.  If some sub-project of Git chooses to use it, then
that's great and I won't get in their way.  I'm just providing
this single data point that approximately any other tracker
(Bugzilla, JIRA, debbugs, patchwork) is something I'd be more
likely to use.

 3. This advice might feel hopeless, because if the maintainer is not
involved in the initial pilot, then how does the bug tracker get
notified when a patch has been accepted?  But fortunately this is
a problem other people have solved: e.g. most bug trackers have an
API that can be used to automatically notify the bug when a patch
with a certain subject line appears on a certain branch.

Thanks and hope that helps,
Jonathan


Re: What's cooking in git.git (Aug 2017, #05; Tue, 22)

2017-09-19 Thread Johannes Schindelin
Hi Junio,

On Tue, 19 Sep 2017, Junio C Hamano wrote:

> Johannes Schindelin  writes:
> 
> >> Do you have a concrete suggestion to make these individual entries
> >> more helpful for people who may want go back to the original thread
> >> in doing so?  In-reply-to: or References: fields of the "What's
> >> cooking" report would not help.  I often have the message IDs that
> >> made/helped me make these individual comments in the description;
> >> they alone would not react to mouse clicks, though.
> >
> > Oh gawd, not even more stuff piled onto the mail format. Please stop.
> > ...
> > It probably tries to serve too many purposes at the same time, and
> > thereby none.
> 
> Well, this was started as my attempt to give a public service that shows
> a summary of what is happening in the entire integration tree, as there
> was nothing like that before (and going to github.com and looking at
> 'pu' branch would not give you an easy overview).  As many people
> contribute many topics to the project, complaining that it talks about
> too many topics would not get you anywhere.
> 
> If you find "What's cooking" report not serving your needs, and if no
> one finds it not serving his or her needs, then I can stop sending these
> out, of course, but I am not getting the impression that we are at that
> point, at least not yet.

C'mon, don't *try* to misunderstand me.

Of course there need to be updates as to the state of patch series.

It's just that mails only go *so* far when you need to connect and
aggregate information. You need the connection between the original patch
series, the latest unaddressed reviews, links to the branches, history of
the patch series' iterations, and ideally links to the repositories of the
contributors with *their* branch names. And then, of course, your verdict
as to the state of the patch series and your expectation what happens
next.

To relate that, you are using a plain text format that is not well defined
and not structured, and certainly not machine-readable, for information
that is crucial for project management.

What you need is a tool to aggregate this information, to help working
with it, to manage the project, and to be updated automatically. And to
publish this information continuously, without costing you extra effort.

I understand that you started before GitHub existed, and before GitHub was
an option, the script-generated What's cooking mail was the best you could
do.

Ciao,
Dscho


RE: [PATCH v6 08/12] fsmonitor: add a test tool to dump the index extension

2017-09-19 Thread Ben Peart
> -Original Message-
> From: Torsten Bögershausen [mailto:tbo...@web.de]
> Sent: Tuesday, September 19, 2017 10:16 AM
> To: Ben Peart 
> Cc: Ben Peart ; Junio C Hamano
> ; david.tur...@twosigma.com; ava...@gmail.com;
> christian.cou...@gmail.com; git@vger.kernel.org;
> johannes.schinde...@gmx.de; pclo...@gmail.com; p...@peff.net
> Subject: Re: [PATCH v6 08/12] fsmonitor: add a test tool to dump the index
> extension
> 
> >
> > Should I just make the variable type itself uintmax_t and then just
> > skip the cast altogether? I went with uint64_t because that is what
> > getnanotime returned.
> >
> 
> That is a bit of taste question (or answer)
> 
> Typically you declare the variables in the type you need, and this is 
> uint64_t.
> 
> Let's step back a bit:
> To print e.g a variable of type uint32_t, you use  PRIu32 in the format 
> string,
> like this:
> 
> fprintf(stderr, "Total %"PRIu32" (delta %"PRIu32"),",
> 
> In theory (it is in the later specs, and it exists on many platforms), there 
> is a
> PRIu64 as well.
> 
> We don't seem to use it in Git, probably because uintmax_t is (more)
> portable and understood by all platforms which support Git.
> (And beside that, on most platforms uintmax_t is 64 bit).
> 
> So my suggestion would be to keep uint64_t and cast the variable into
> uintmax_t whenever it is printed.
> 

Great!  That is the way I have it.


RE: [PATCH v6 12/12] fsmonitor: add a performance test

2017-09-19 Thread Johannes Schindelin
Hi Ben,

On Mon, 18 Sep 2017, Ben Peart wrote:

> > From: Johannes Schindelin [mailto:johannes.schinde...@gmx.de]
> > On Fri, 15 Sep 2017, Ben Peart wrote:
> > 
> > > + DWORD(WINAPI *NtSetSystemInformation)(INT, PVOID, ULONG) =
> > > + (DWORD(WINAPI *)(INT, PVOID,
> > ULONG))GetProcAddress(ntdll, "NtSetSystemInformation");
> > > + if (!NtSetSystemInformation)
> > > + return error("Can't get function addresses, wrong Windows
> > > +version?");
> > 
> > It turns out that we have seen this plenty of times in Git for Windows'
> > fork, so much so that we came up with a nice helper to make this all a bit
> > more robust and a bit more obvious, too: the DECLARE_PROC_ADDR and
> > INIT_PROC_ADDR helpers in compat/win32/lazyload.h.
> > 
> > Maybe this would be the perfect excuse to integrate this patch into upstream
> > Git? 
> 
> This patch is pretty hefty already.  How about you push this capability
> upstream and I take advantage of it in a later patch. :)

Deal:
https://public-inbox.org/git/f5a3add27206df3e7f39efeac8a3c3b47f2b79f2.1505834586.git.johannes.schinde...@gmx.de

Ciao,
Johannes


[PATCH] Win32: simplify loading of DLL functions

2017-09-19 Thread Johannes Schindelin
Dynamic loading of DLL functions is duplicated in several places in Git
for Windows' source code.

This patch adds a pair of macros to simplify the process: the
DECLARE_PROC_ADDR(, , ,
..) macro to be used at the beginning of a
code block, and the INIT_PROC_ADDR() macro to call before
using the declared function. The return value of the INIT_PROC_ADDR()
call has to be checked; If it is NULL, the function was not found in the
specified DLL.

Example:

DECLARE_PROC_ADDR(kernel32.dll, BOOL, CreateHardLinkW,
  LPCWSTR, LPCWSTR, LPSECURITY_ATTRIBUTES);

if (!INIT_PROC_ADDR(CreateHardLinkW))
return error("Could not find CreateHardLinkW() function";

if (!CreateHardLinkW(source, target, NULL))
return error("could not create hardlink from %S to %S",
 source, target);
return 0;

Signed-off-by: Karsten Blees 
Signed-off-by: Johannes Schindelin 
---

So far, there are no users (except in Git for Windows). Ben
promised to make use of it in his fsmonitor patch series.

Published-As: https://github.com/dscho/git/releases/tag/lazyload-v1
Fetch-It-Via: git fetch https://github.com/dscho/git lazyload-v1
 compat/win32/lazyload.h | 44 
 1 file changed, 44 insertions(+)
 create mode 100644 compat/win32/lazyload.h

diff --git a/compat/win32/lazyload.h b/compat/win32/lazyload.h
new file mode 100644
index 000..91c10dad2fb
--- /dev/null
+++ b/compat/win32/lazyload.h
@@ -0,0 +1,44 @@
+#ifndef LAZYLOAD_H
+#define LAZYLOAD_H
+
+/* simplify loading of DLL functions */
+
+struct proc_addr {
+   const char *const dll;
+   const char *const function;
+   FARPROC pfunction;
+   unsigned initialized : 1;
+};
+
+/* Declares a function to be loaded dynamically from a DLL. */
+#define DECLARE_PROC_ADDR(dll, rettype, function, ...) \
+   static struct proc_addr proc_addr_##function = \
+   { #dll, #function, NULL, 0 }; \
+   static rettype (WINAPI *function)(__VA_ARGS__)
+
+/*
+ * Loads a function from a DLL (once-only).
+ * Returns non-NULL function pointer on success.
+ * Returns NULL + errno == ENOSYS on failure.
+ */
+#define INIT_PROC_ADDR(function) \
+   (function = get_proc_addr(_addr_##function))
+
+static inline void *get_proc_addr(struct proc_addr *proc)
+{
+   /* only do this once */
+   if (!proc->initialized) {
+   HANDLE hnd;
+   proc->initialized = 1;
+   hnd = LoadLibraryExA(proc->dll, NULL,
+LOAD_LIBRARY_SEARCH_SYSTEM32);
+   if (hnd)
+   proc->pfunction = GetProcAddress(hnd, proc->function);
+   }
+   /* set ENOSYS if DLL or function was not found */
+   if (!proc->pfunction)
+   errno = ENOSYS;
+   return proc->pfunction;
+}
+
+#endif

base-commit: 9ddaf86b06a8078420f59aec8cab6daa93cf1a91
-- 
2.14.1.windows.1.1020.g03faabc8bc8.dirty


Re: [PATCH] for_each_string_list_item(): behave correctly for empty list

2017-09-19 Thread Kaartic Sivaraam

On Saturday 16 September 2017 09:36 AM, Michael Haggerty wrote:

Does the following alternate fix work?  I think I prefer it because
it doesn't require introducing a new global. [...]
  #define for_each_string_list_item(item,list) \
-   for (item = (list)->items; item < (list)->items + (list)->nr; ++item)
+   for (item = (list)->items; \
+(list)->items && item < (list)->items + (list)->nr; \
+++item)

This is the possibility that I was referring to as "add[ing] overhead to
each iteration of the loop". I'd rather not add an extra test-and-branch
to every iteration of a loop in which `list->items` is *not* NULL, which
your solution appears to do. Or are compilers routinely able to optimize
the check out?


It seems at least 'gcc' is able to optimize this out even with a -O1
and 'clang' optimizes this out with a -O2. Taking a sneak peek at
the 'Makefile' shows that our default is -O2.

For a proof, see https://godbolt.org/g/CPt73L

---
Kaartic


Re: [PATCH v6 08/12] fsmonitor: add a test tool to dump the index extension

2017-09-19 Thread Torsten Bögershausen
> 
> Should I just make the variable type itself uintmax_t and then just skip
> the cast altogether? I went with uint64_t because that is what
> getnanotime returned.
> 

That is a bit of taste question (or answer)

Typically you declare the variables in the type you need,
and this is uint64_t.

Let's step back a bit:
To print e.g a variable of type uint32_t, you use  PRIu32 in the format
string, like this:

fprintf(stderr, "Total %"PRIu32" (delta %"PRIu32"),",

In theory (it is in the later specs, and it exists on many platforms),
there is a PRIu64 as well.

We don't seem to use it in Git, probably because uintmax_t is (more)
portable and understood by all platforms which support Git.
(And beside that, on most platforms uintmax_t is 64 bit).

So my suggestion would be to keep uint64_t and cast the variable into uintmax_t
whenever it is printed.




Re: [PATCH] for_each_string_list_item(): behave correctly for empty list

2017-09-19 Thread SZEDER Gábor
On Tue, Sep 19, 2017 at 3:38 PM, SZEDER Gábor  wrote:
> A bit "futuristic" option along these lines could be something like
> this, using a scoped loop variable in the outer loop to ensure that
> it's executed at most once:
>
>   #define for_each_string_list_item(item,list) \
>   for (int f_e_s_l_i = 1; (list)->items && f_e_s_l_i; f_e_s_l_i = 0) \
>   for (item = (list)->items; item < (list)->items + (list)->nr; 
> ++item)
>
> The high number of underscores are an attempt to make reasonably sure
> that the macro's loop variable doesn't shadow any variable in its
> callers or isn't being shadowed in the loop body, which might(?)
> trigger warnings in some compilers.

Well, and a poor attempt at that, because, of course, the loop
variable would still be shadowed in nested for_each_string_list_item
loops...  And our codebase has these loops nested in
entry.c:finish_delayed_checkout().

OTOH, we don't seem to care too much about shadowed variables, since
building with -Wshadow gives 91 warnings in current master...


Re: [PATCH] for_each_string_list_item(): behave correctly for empty list

2017-09-19 Thread SZEDER Gábor
On Tue, Sep 19, 2017 at 8:51 AM, Michael Haggerty  wrote:
> On 09/19/2017 02:08 AM, Stefan Beller wrote:
>>> I am hoping that this last one is not allowed and we can use the
>>> "same condition is checked every time we loop" version that hides
>>> the uglyness inside the macro.
>>
>> By which you are referring to Jonathans solution posted.
>> Maybe we can combine the two solutions (checking for thelist
>> to not be NULL once, by Jonathan) and using an outer structure
>> (SZEDERs solution) by replacing the condition by a for loop,
>> roughly (untested):
>>
>> #define for_each_string_list_item(item,list) \
>> -   for (item = (list)->items; item < (list)->items + (list)->nr; ++item)
>> +for (; list; list = NULL)
>> +for (item = (list)->items; item < (list)->items + (list)->nr; 
>> ++item)
>>
>> as that would not mingle with any dangling else clause.
>> It is also just one statement, such that
>>
>> if (bla)
>>   for_each_string_list_item {
>> baz(item);
>>   }
>> else
>>   foo;
>>
>> still works.
>>
>> Are there downsides to this combined approach?
>
> On the plus side, it's pleasantly devious; I wouldn't have thought of
> using a `for` loop for the initial test. But it doesn't work as written,
> because (1) we don't need to guard against `list` being NULL, but rather
> `list->items`; and (2) we don't have the liberty to set `list = NULL`
> (or `list->items = NULL`, because `list` is owned by the caller and we
> shouldn't modify it.
>
> The following is a bit closer:
>
> #define for_each_string_list_item(item,list) \
> for (item = (list)->items; item; item = NULL) \
> for (; item < (list)->items + (list)->nr; ++item)
>
> But I think that also fails, because a callsite that does
>
> for_each_string_list_item(myitem, mylist)
> if (myitem.util)
> break;
>
> would expect that `myitem` is still set after breaking out of the loop,
> whereas the outer `for` loop would reset it to NULL.
>
> If `break` were an expression we could do something like
>
> #define for_each_string_list_item(item,list) \
> for (item = (list)->items; item; break) \
> for (; item < (list)->items + (list)->nr; ++item)

A bit "futuristic" option along these lines could be something like
this, using a scoped loop variable in the outer loop to ensure that
it's executed at most once:

  #define for_each_string_list_item(item,list) \
  for (int f_e_s_l_i = 1; (list)->items && f_e_s_l_i; f_e_s_l_i = 0) \
  for (item = (list)->items; item < (list)->items + (list)->nr; ++item)

The high number of underscores are an attempt to make reasonably sure
that the macro's loop variable doesn't shadow any variable in its
callers or isn't being shadowed in the loop body, which might(?)
trigger warnings in some compilers.

Alas we don't allow scoping the loop variable in for loops, and even a
test balloon patch didn't make it into git.git.

  https://public-inbox.org/git/20170719181956.15845-1-sbel...@google.com/T/#u


> So I think we're still left with the suggestions of Jonathan or Gábor.
> Or the bigger change of initializing `string_list::items` to point at an
> empty sentinal array (similar to `strbuf_slopbuf`) rather than NULL.
> Personally, I think that Jonathan's approach makes the most sense,
> unless somebody wants to jump in an implement a `string_list_slopbuf`.
>
> By the way, I wonder if any open-coded loops over `string_lists` make
> the same mistake as the macro?


[PATCH] doc: camelCase the config variables to improve readability

2017-09-19 Thread Kaartic Sivaraam
The config variable used weren't readable as they were in the
crude form of how git stores/uses it's config variables.

Improve it's readability by replacing them with camelCased versions
of config variables as it doesn't have any impact on it's usage.

Signed-off-by: Kaartic Sivaraam 
---
 Documentation/git-branch.txt | 4 ++--
 Documentation/git-tag.txt| 2 +-
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/Documentation/git-branch.txt b/Documentation/git-branch.txt
index e292737b9c5ab..58f1e5c9c74e1 100644
--- a/Documentation/git-branch.txt
+++ b/Documentation/git-branch.txt
@@ -92,10 +92,10 @@ OPTIONS
all changes made to the branch ref, enabling use of date
based sha1 expressions such as "@\{yesterday}".
Note that in non-bare repositories, reflogs are usually
-   enabled by default by the `core.logallrefupdates` config option.
+   enabled by default by the `core.logAllRefUpdates` config option.
The negated form `--no-create-reflog` only overrides an earlier
`--create-reflog`, but currently does not negate the setting of
-   `core.logallrefupdates`.
+   `core.logAllRefUpdates`.
 
 -f::
 --force::
diff --git a/Documentation/git-tag.txt b/Documentation/git-tag.txt
index 543fb425ee7c1..95e9f391d88fc 100644
--- a/Documentation/git-tag.txt
+++ b/Documentation/git-tag.txt
@@ -174,7 +174,7 @@ This option is only applicable when listing tags without 
annotation lines.
`core.logAllRefUpdates` in linkgit:git-config[1].
The negated form `--no-create-reflog` only overrides an earlier
`--create-reflog`, but currently does not negate the setting of
-   `core.logallrefupdates`.
+   `core.logAllRefUpdates`.
 
 ::
The name of the tag to create, delete, or describe.

--
https://github.com/git/git/pull/407


[PATCH] t4014: strengthen search patterns

2017-09-19 Thread Kaartic Sivaraam
The regex patterns for some failing test cases were a bit loose
giving way for a few incorrect outputs being accepted as correct
outputs.

To avoid such incorrect outputs from being flagged as correct ones
use fixed string matches when possible and strengthen regex when
it's not.

Signed-off-by: Kaartic Sivaraam 
---
 t/t4014-format-patch.sh | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/t/t4014-format-patch.sh b/t/t4014-format-patch.sh
index 482112ca339f0..7dff7996c9e1f 100755
--- a/t/t4014-format-patch.sh
+++ b/t/t4014-format-patch.sh
@@ -163,7 +163,7 @@ test_expect_failure 'additional command line cc (rfc822)' '
git config --replace-all format.headers "Cc: R E Cipient 
" &&
git format-patch --cc="S. E. Cipient " --stdout 
master..side | sed -e "/^\$/q" >patch5 &&
grep "^Cc: R E Cipient ,\$" patch5 &&
-   grep "^ *\"S. E. Cipient\" \$" patch5
+   grep "^ *\"S\. E\. Cipient\" \$" patch5
 '
 
 test_expect_success 'command line headers' '
@@ -191,13 +191,13 @@ test_expect_success 'command line To: header (ascii)' '
 test_expect_failure 'command line To: header (rfc822)' '
 
git format-patch --to="R. E. Cipient " --stdout 
master..side | sed -e "/^\$/q" >patch8 &&
-   grep "^To: \"R. E. Cipient\" \$" patch8
+   grep -F "To: \"R. E. Cipient\" " patch8
 '
 
 test_expect_failure 'command line To: header (rfc2047)' '
 
git format-patch --to="R Ä Cipient " --stdout 
master..side | sed -e "/^\$/q" >patch8 &&
-   grep "^To: =?UTF-8?q?R=20=C3=84=20Cipient?= \$" 
patch8
+   grep "^To: =?UTF-8?q?R=20=C3=84=20Cipient?= \$" 
patch8
 '
 
 test_expect_success 'configuration To: header (ascii)' '
@@ -211,14 +211,14 @@ test_expect_failure 'configuration To: header (rfc822)' '
 
git config format.to "R. E. Cipient " &&
git format-patch --stdout master..side | sed -e "/^\$/q" >patch9 &&
-   grep "^To: \"R. E. Cipient\" \$" patch9
+   grep -F "To: \"R. E. Cipient\" " patch9
 '
 
 test_expect_failure 'configuration To: header (rfc2047)' '
 
git config format.to "R Ä Cipient " &&
git format-patch --stdout master..side | sed -e "/^\$/q" >patch9 &&
-   grep "^To: =?UTF-8?q?R=20=C3=84=20Cipient?= \$" 
patch9
+   grep "^To: =?UTF-8?q?R=20=C3=84=20Cipient?= \$" 
patch9
 '
 
 # check_patch : Verify that  looks like a half-sane

--
https://github.com/git/git/pull/406


Re: [PATCH v2 13/21] packed_ref_cache: keep the `packed-refs` file mmapped if possible

2017-09-19 Thread Michael Haggerty
On 09/19/2017 08:22 AM, Michael Haggerty wrote:
> Keep a copy of the `packed-refs` file contents in memory for as long
> as a `packed_ref_cache` object is in use:
> 
> * If the system allows it, keep the `packed-refs` file mmapped.
> 
> * If not (either because the system doesn't support `mmap()` at all,
>   or because a file that is currently mmapped cannot be replaced via
>   `rename()`), then make a copy of the file's contents in
>   heap-allocated space, and keep that around instead.
> 
> We base the choice of behavior on a new build-time switch,
> `MMAP_PREVENTS_DELETE`. By default, this switch is set for Windows
> variants.
> 
> This whole change is still pointless, because we only read the
> `packed-refs` file contents immediately after instantiating the
> `packed_ref_cache`. But that will soon change.
> 
> Signed-off-by: Michael Haggerty 
> ---
>  Makefile  |  10 +++
>  config.mak.uname  |   3 +
>  refs/packed-backend.c | 184 
> ++
>  3 files changed, 155 insertions(+), 42 deletions(-)
> 
> diff --git a/Makefile b/Makefile
> index f2bb7f2f63..7a49f99c4f 100644
> [...]
> diff --git a/refs/packed-backend.c b/refs/packed-backend.c
> index 0fe41a7203..4981516f1e 100644
> --- a/refs/packed-backend.c
> +++ b/refs/packed-backend.c
> [...]
> @@ -304,6 +371,61 @@ struct ref_iterator *mmapped_ref_iterator_begin(
>   return ref_iterator;
>  }
>  
> +/*
> + * Depending on `mmap_strategy`, either mmap or read the contents of
> + * the `packed-refs` file into the `packed_refs` instance. Return 1 if
> + * the file existed and was read, or 0 if the file was absent. Die on
> + * errors.
> + */
> +static int load_contents(struct packed_ref_cache *packed_refs)
> +{
> + int fd;
> + struct stat st;
> + size_t size, bytes_read;

Coverity helpfully pointed out that `bytes_read` has to be signed:
`ssize_t`. I'll fix that in the next round after waiting for other comments.

> [...]

Michael


I Want To Know Your Opinion In Doing This Project.

2017-09-19 Thread MRS.LOVETH KONNIA
My Dear,

>From Mrs. Loveth Konnia.

I  belief that you can help in setting up a charity foundation for the
benefit of mankind, I wish to establish a charity foundation to help
the poor in your country under your care, Can you help to build this
project in your country? Together We can make the world a better place
when we help one another,

Waiting to read from you today and know your opinion in doing this project,

Remain Blessed,

Mrs. Loveth


Re: [RFC SAMPLE] builtin/branch: give more useful error messages when renaming

2017-09-19 Thread Kaartic Sivaraam

Sorry, the email client seems to have crapped up the formatting. In case
it looks difficult to follow, let me know so that I could send a better 
version.


---
Kaartic


Re: [PATCH] doc: update information about windows build

2017-09-19 Thread Kaartic Sivaraam

On Monday 18 September 2017 12:32 AM, Phillip Wood wrote:

May be the Windows build exit with failure on other repos rather than
saying it passes?


I'm not quite sure what you're asking. If the tests aren't run it 
needs to look like a pass or everyone's branches would be marked as 
failing on github and more importantly it wouldn't be clear when there 
was a real failure on linux/macos. If the tests are run then it will 
pass/fail depending on the test result.

I thought failing wasn't a big issue because the macOS build failed most
of the time due to time-outs, at least for me. Most of the failed builds in
my build history were just the macOS builds timing out. That was some
time ago. They seem to be passing quite often recently.

---
Kaartic


[RFC SAMPLE] builtin/branch: give more useful error messages when renaming

2017-09-19 Thread Kaartic Sivaraam
The patch series results in a change in output as specified below. Only 
few cases
have been shown here to keep it short. The output for other cases are 
similar.


$ git branch
* master
  foo
  bar

Before patch,

$ # Trying to rename non-existent branch
$ git branch -m hypothet no_such_branch
error: refname refs/heads/hypothet not found
fatal: Branch rename failed


$ # Trying to rename non-existent branch into existing one
$ git branch -m hypothet master
error: refname refs/heads/hypothet not found
fatal: Branch rename failed


$ # Trying to force update current branch
$ git branch -M foo master
fatal: Cannot force update the current branch.

$ # Trying to force rename an in-existent branch with an invalid name
$ git branch -M hypothet ?123
fatal: '?123' is not a valid branch name.


After patch,

$ # Trying to rename non-existent branch
$ git branch -m hypothet no_such_branch
fatal: branch 'hypothet' doesn't exist

$ # Trying to rename non-existent branch into existing one
$ git branch -m hypothet master
fatal: branch 'hypothet' doesn't exist, and branch 'master' already exists


$ # Trying to force update current branch
$ git branch -M foo master
fatal: cannot force update the current branch

$ # Trying to force rename an in-existent branch with an invalid name
$ git branch -M hypothet ?123
fatal: branch 'hypothet' doesn't exist, and branch name '?123' is invalid



Re: My first contribution

2017-09-19 Thread Christian Couder
Hi Olga,

On Tue, Sep 19, 2017 at 8:44 AM, Оля Тележная  wrote:
> Hello Jeff,
> I want to try myself into new role of a Git contributor.

Welcome to Git!

> All of the projects sound super interesting for me and I am ready to take
> part in all of them, but the project "Unifying Git's format languages" will
> be super useful for me myself, so my dream is to try solve that task first.

Great that you found a project you like!

> I need help to choose my first task in the project, because first steps are
> usually the most frightening.
> I am familiar enough with C and there's no problem to read any docs, but
> please help me choosing the first task. If you have any guidance like "how
> to start" or "how to choose tasks", please send that links also.

You can try to work first on the project you are interested in or you
can find and work on a small project first.

One way to find a small project is to see what people are
talking about on the mailing list[1]. Quite often there are bugs
that can be fixed, and more experienced people may help sketch out
a solution. You can also find small items good for newcomers marked
with the "leftoverbits" tag, which you can search for in the
mailing list[2].

We don't have a written guide specifically downloading git, getting it
built, running the tests, and so forth, but you might start with:

   git clone https://github.com/git/git

and reading the INSTALL file.

As the mailing list can be a bit intimidating at first, we don't mind
working with you one-on-one a bit during the application period.

About the mailing list, please add [Outreachy] in the subject to make
it clear that you are applying for the Outreachy program.

While at it on the Git mailing list we are used to replying to parts
of message after those parts. We don't usually reply before the
message. In other words we are used to "inline replying", not "top
posting" (see https://en.wikipedia.org/wiki/Posting_style). Please try
to use the same posting style as us, it will help keep discussions
more standard and easier to understand.

Also we feel free to remove parts of the messages we are quoting that
are not relevant anymore.

For getting in touch with us, direct email is our preferred method. We
could also meet on IRC if you like, but it looks like our timezones
might not overlap much. Still, we can probably set up a time.

Let us know if you have any questions at all. This is our first time
mentoring for Outreachy, so we've surely forgotten to mention some
obvious thing you would want to know. :)

Thanks,
Christian.

[1] There are details of the list at https://git-scm.com/community,
but you may want to just browse the archive at:

  https://public-inbox.org/git

[2] https://public-inbox.org/git/?q=leftoverbits


[RFC PATCH 4/5] branch: introduce dont_fail parameter for update validation

2017-09-19 Thread Kaartic Sivaraam
This parameter allows the branch update validation function to
optionally return a flag specifying the reason for failure, when
requested. This allows the caller to know why it was about to die.
This allows more useful error messages to be given to the user when
trying to rename a branch.

The flags are specified in the form of an enum and values for success
flags have been assigned explicitly to clearly express that certain
callers rely those values and they cannot be arbitrary.

Only the logic has been added but no caller has been made to use it, yet.
So, no functional changes.

Signed-off-by: Kaartic Sivaraam 
---
 branch.c   | 34 +++---
 branch.h   | 23 +--
 builtin/branch.c   |  2 +-
 builtin/checkout.c |  2 +-
 4 files changed, 46 insertions(+), 15 deletions(-)

diff --git a/branch.c b/branch.c
index 2020dedf6..9dda336a0 100644
--- a/branch.c
+++ b/branch.c
@@ -178,28 +178,40 @@ int read_branch_desc(struct strbuf *buf, const char 
*branch_name)
return 0;
 }
 
-int validate_branch_update(const char *name, struct strbuf *ref,
-  int could_exist, int clobber_head)
+int validate_branch_update(const char *name, struct strbuf *ref, int 
could_exist,
+  int clobber_head, unsigned dont_fail)
 {
-   if (strbuf_check_branch_ref(ref, name))
-   die(_("'%s' is not a valid branch name."), name);
+   if (strbuf_check_branch_ref(ref, name)) {
+   if (dont_fail)
+   return INVALID_BRANCH_NAME;
+   else
+   die(_("'%s' is not a valid branch name."), name);
+   }
 
if (!ref_exists(ref->buf))
-   return 0;
+   return VALID_BRANCH_NAME;
 
-   if (!could_exist)
-   die(_("A branch named '%s' already exists."), ref->buf + 
strlen("refs/heads/"));
+   if (!could_exist) {
+   if (dont_fail)
+   return BRANCH_EXISTS;
+   else
+   die(_("A branch named '%s' already exists."), ref->buf 
+ strlen("refs/heads/"));
+   }
 
if (!clobber_head) {
const char *head;
struct object_id oid;
 
head = resolve_ref_unsafe("HEAD", 0, oid.hash, NULL);
-   if (!is_bare_repository() && head && !strcmp(head, ref->buf))
-   die(_("Cannot force update the current branch."));
+   if (!is_bare_repository() && head && !strcmp(head, ref->buf)) {
+   if (dont_fail)
+   return CANNOT_FORCE_UPDATE_CURRENT_BRANCH;
+   else
+   die(_("Cannot force update the current 
branch."));
+   }
}
 
-   return 1;
+   return FORCE_UPDATING_BRANCH;
 }
 
 /*
@@ -268,7 +280,7 @@ void create_branch(const char *name, const char *start_name,
validate_existing_branch(name, );
dont_change_ref = 1;
} else {
-   forcing = validate_branch_update(name, , force, 
clobber_head);
+   forcing = validate_branch_update(name, , force, 
clobber_head, 0);
}
 
real_ref = NULL;
diff --git a/branch.h b/branch.h
index 6ada7af59..c6a8a75bb 100644
--- a/branch.h
+++ b/branch.h
@@ -27,6 +27,16 @@ void create_branch(const char *name, const char *start_name,
   int force, int reflog,
   int clobber_head, int quiet, enum branch_track track);
 
+enum branch_validation_result {
+   /* Flags that say it's NOT OK to update */
+   BRANCH_EXISTS = -3,
+   CANNOT_FORCE_UPDATE_CURRENT_BRANCH,
+   INVALID_BRANCH_NAME,
+   /* Flags that say it's OK to update */
+   VALID_BRANCH_NAME = 0,
+   FORCE_UPDATING_BRANCH = 1
+};
+
 /*
  * Validates whether the branch with the given name may be updated (created, 
renamed etc.,)
  * with respect to the given conditions. It returns the interpreted ref in ref.
@@ -36,10 +46,19 @@ void create_branch(const char *name, const char *start_name,
  * if 'could_exist' is true, clobber_head indicates whether the branch could 
be the
  * current branch else it has no effect.
  *
- * A non-zero return value indicates that a branch already exists and can be 
force updated.
+ * The return values have the following meaning,
+ *
+ *   - If dont_fail is 0, the function dies in case of failure and returns 
flags of
+ * 'validate_result' that specify it is OK to update the branch. The 
positive
+ * non-zero flag implies that the branch can be force updated.
+ *
+ *   - If dont_fail is 1, the function doesn't die in case of failure but 
returns flags
+ * of 'validate_result' that specify the reason for failure. The behaviour 
in case of
+ * success is same as above.
  *
  */
-int validate_branch_update(const char *name, struct strbuf *ref, int 
could_exist, int 

[RFC PATCH 2/5] branch: document the usage of certain parameters

2017-09-19 Thread Kaartic Sivaraam
Documentation for a certain function was incomplete as it didn't say
what certain parameters were used for.

So, document them for the sake of completeness and easy reference.

Signed-off-by: Kaartic Sivaraam 
---
 branch.h | 5 +
 1 file changed, 5 insertions(+)

diff --git a/branch.h b/branch.h
index b07788558..33b7f5d88 100644
--- a/branch.h
+++ b/branch.h
@@ -15,6 +15,11 @@
  *
  *   - reflog creates a reflog for the branch
  *
+ *   - if 'force' is true, clobber_head indicates whether the branch could be
+ * the current branch; else it has no effect
+ *
+ *   - quiet suppresses tracking information
+ *
  *   - track causes the new branch to be configured to merge the remote branch
  * that start_name is a tracking branch for (if any).
  */
-- 
2.14.1.1006.g90ad9a07c



[RFC PATCH 0/5] branch: improve error messages of branch renaming

2017-09-19 Thread Kaartic Sivaraam
In builtin/branch, the error messages weren't handled directly by the branch
renaming function and was left to the other function. Though this avoids
redundancy this gave unclear error messages in some cases. So, make 
builtin/branch
give more useful error messages.

The first two patches are preparatory/cleanup patches.

The third patch refactors a function to make it more usable/understandable(?).
This results only in one functional change as noted there. I've tried my best 
not
to screw anything up as a consequence of that refactor[note 1]. In case I missed
something, let me know.

The fourth patch introduces part of the logic needed to improve error messages.
It's kept separate to keep things reviewable.

The fifth patch is the main one which does the improvement of error messages.

These patches apply on top of 'master' and be found in my fork[2].

Note:

[1]: The Travis CI build did succeed but I don't think we can rely on that a lot
because the test aren't exhaustive (I guess).
https://travis-ci.org/sivaraam/git/builds/277146416

[2]: https://github.com/sivaraam/git/tree/work/branch-move-revamp

Kaartic Sivaraam (5):
  builtin/checkout: avoid usage of  '!!' for expressions
  branch: document the usage of certain parameters
  branch: cleanup branch name validation
  branch: introduce dont_fail parameter for update validation
  builtin/branch: give more useful error messages when renaming

 branch.c   | 67 +++---
 branch.h   | 44 +--
 builtin/branch.c   | 48 +-
 builtin/checkout.c |  7 +++---
 t/t3200-branch.sh  |  4 
 5 files changed, 130 insertions(+), 40 deletions(-)

-- 
2.14.1.868.g66c78774b



[RFC PATCH 3/5] branch: cleanup branch name validation

2017-09-19 Thread Kaartic Sivaraam
The function that validates a new branch name was clumsy because,

  1. It did more than just validating the branch name

  2. Despite it's name, it is more often than not used to validate
 existing branch names through the 'force' and 'attr_only'
 parameters (whose names by the way weren't communicative).

  3. The 'attr_only' parameter should have been "true" only when
 callers like to update some attribute of the branch and "false"
 when they were updating where the branch points to. It was misused
 by callers by setting it to "true" (to skip tests) even though they
 were force updating where the current branch was pointing to!

 This is an example of spoiling code clarity by making the caller
 rely on how the function is implemented thus making it hard to
 modify/maintain.

This makes it unclear what the function does at all and would confuse
the people who would ever want to it for the first time.

So, refactor it into a function that validates whether the branch could
be updated. This doesn't bear the uncommunicative 'new'. Further replace
the 'force' parameter with a 'could_exist' parameter which specifies
whether the given branch name could exist or not (it's just a better name
for 'force'). Also replace  the 'attr_only' with 'clobber_head' which is
a more communicative way of seeing "If the branch could exist, it's OK if
it is the current branch".

Separate the validation of an existing branch into another function.
This (at last!) addresses the NEEDSWORK that was added in fa7993767
(branch --set-upstream: regression fix, 2011-09-16)

This refactor has only one functional change. It enforces strictly that
an existing branch should be updated only with the 'force' switch. So,
it's no more possible to do,

$ git branch -m master master

(which doesn't seem that useful). This strongly enforces the following
statement of the 'git branch' documentation,

"If  exists, -M must be used to force the
 rename to happen."

Signed-off-by: Kaartic Sivaraam 
---
 branch.c   | 41 ++---
 branch.h   | 20 
 builtin/branch.c   |  2 +-
 builtin/checkout.c |  4 ++--
 t/t3200-branch.sh  |  4 
 5 files changed, 45 insertions(+), 26 deletions(-)

diff --git a/branch.c b/branch.c
index 703ded69c..2020dedf6 100644
--- a/branch.c
+++ b/branch.c
@@ -178,18 +178,19 @@ int read_branch_desc(struct strbuf *buf, const char 
*branch_name)
return 0;
 }
 
-int validate_new_branchname(const char *name, struct strbuf *ref,
-   int force, int attr_only)
+int validate_branch_update(const char *name, struct strbuf *ref,
+  int could_exist, int clobber_head)
 {
if (strbuf_check_branch_ref(ref, name))
die(_("'%s' is not a valid branch name."), name);
 
if (!ref_exists(ref->buf))
return 0;
-   else if (!force && !attr_only)
+
+   if (!could_exist)
die(_("A branch named '%s' already exists."), ref->buf + 
strlen("refs/heads/"));
 
-   if (!attr_only) {
+   if (!clobber_head) {
const char *head;
struct object_id oid;
 
@@ -197,9 +198,29 @@ int validate_new_branchname(const char *name, struct 
strbuf *ref,
if (!is_bare_repository() && head && !strcmp(head, ref->buf))
die(_("Cannot force update the current branch."));
}
+
return 1;
 }
 
+/*
+ * Validates whether the branch with the given name exists, returning the
+ * interpreted ref in ref.
+ *
+ * This method is invoked if the caller merely wants to know if it is OK
+ * to change some attribute for the named branch (e.g. tracking upstream).
+ *
+ */
+static void validate_existing_branch(const char *name, struct strbuf *ref)
+{
+   if (strbuf_check_branch_ref(ref, name))
+   die(_("'%s' is not a valid branch name."), name);
+
+   if (ref_exists(ref->buf))
+   return;
+   else
+   die(_("branch '%s' doesn't exist"), name);
+}
+
 static int check_tracking_branch(struct remote *remote, void *cb_data)
 {
char *tracking_branch = cb_data;
@@ -243,13 +264,11 @@ void create_branch(const char *name, const char 
*start_name,
if (track == BRANCH_TRACK_EXPLICIT || track == BRANCH_TRACK_OVERRIDE)
explicit_tracking = 1;
 
-   if (validate_new_branchname(name, , force,
-   track == BRANCH_TRACK_OVERRIDE ||
-   clobber_head)) {
-   if (!force)
-   dont_change_ref = 1;
-   else
-   forcing = 1;
+   if (track == BRANCH_TRACK_OVERRIDE) {
+   validate_existing_branch(name, );
+   dont_change_ref = 1;
+   } else {
+   forcing = validate_branch_update(name, , force, 

[RFC PATCH 5/5] builtin/branch: give more useful error messages when renaming

2017-09-19 Thread Kaartic Sivaraam
When trying to rename an inexistent branch to an existing branch
the rename failed specifying the new branch name exists rather than
specifying that the branch trying to be renamed doesn't exist.

$ git branch -m tset master
fatal: A branch named 'master' already exists.

It's conventional to report that 'tset' doesn't exist rather than
reporting that 'master' exists, the same way the 'mv' command does.

$ git branch -m tset master
fatal: branch 'tset' doesn't exist.

That has the problem that the error about an existing branch is shown
only after the user corrects the error about inexistent branch.

$ git branch -m test master
fatal: A branch named 'master' already exists.

This isn't useful either because the user would have corrected this error in
a single go if he had been told this alongside the first error. So, give
more useful error messages by giving errors about old branch name and new
branch name at the same time. This is possible as the branch update validation
function now returns the reason it was about to die, when requested.

$ git branch -m tset master
fatal: branch 'tset' doesn't exist, and branch 'master' already exists

Note: Thanks to the strbuf API that made it possible to easily construct
the composite error message strings!

Signed-off-by: Kaartic Sivaraam 
---
 builtin/branch.c | 48 ++--
 1 file changed, 42 insertions(+), 6 deletions(-)

diff --git a/builtin/branch.c b/builtin/branch.c
index 205c12a11..27d24e83d 100644
--- a/builtin/branch.c
+++ b/builtin/branch.c
@@ -456,25 +456,56 @@ static void reject_rebase_or_bisect_branch(const char 
*target)
free_worktrees(worktrees);
 }
 
+static void get_error_msg(struct strbuf* error_msg, const char* oldname, 
unsigned old_branch_exists,
+ const char* newname, int new_branch_validation_result)
+{
+   const char* connector_string = ", and ";
+   const unsigned connector_length = 6;
+   unsigned connector_added = 0;
+
+   if (!old_branch_exists) {
+   strbuf_addf(error_msg, _("branch '%s' doesn't exist"), oldname);
+
+   /* add the 'connector_string' and remove it later if it's not 
needed */
+   strbuf_addstr(error_msg, connector_string);
+   connector_added = 1;
+   }
+
+   switch (new_branch_validation_result) {
+   case BRANCH_EXISTS:
+   strbuf_addf(error_msg, _("branch '%s' already exists"), 
newname);
+   break;
+   case CANNOT_FORCE_UPDATE_CURRENT_BRANCH:
+   strbuf_addstr(error_msg, _("cannot force update the 
current branch"));
+   break;
+   case INVALID_BRANCH_NAME:
+   strbuf_addf(error_msg, _("branch name '%s' is 
invalid"), newname);
+   break;
+   case VALID_BRANCH_NAME:
+   case FORCE_UPDATING_BRANCH:
+   if(connector_added)
+   strbuf_remove(error_msg, 
error_msg->len-connector_length, connector_length);
+   }
+}
+
 static void rename_branch(const char *oldname, const char *newname, int force)
 {
struct strbuf oldref = STRBUF_INIT, newref = STRBUF_INIT, logmsg = 
STRBUF_INIT;
struct strbuf oldsection = STRBUF_INIT, newsection = STRBUF_INIT;
int recovery = 0;
int clobber_head_ok;
+   struct strbuf error_msg = STRBUF_INIT, empty = STRBUF_INIT;
 
if (!oldname)
die(_("cannot rename the current branch while not on any."));
 
-   if (strbuf_check_branch_ref(, oldname)) {
+   if (strbuf_check_branch_ref(, oldname) && ref_exists(oldref.buf))
+   {
/*
 * Bad name --- this could be an attempt to rename a
 * ref that we used to allow to be created by accident.
 */
-   if (ref_exists(oldref.buf))
-   recovery = 1;
-   else
-   die(_("Invalid branch name: '%s'"), oldname);
+   recovery = 1;
}
 
/*
@@ -483,7 +514,10 @@ static void rename_branch(const char *oldname, const char 
*newname, int force)
 */
clobber_head_ok = !strcmp(oldname, newname);
 
-   validate_branch_update(newname, , force, clobber_head_ok, 0);
+   get_error_msg(_msg, oldname, ref_exists(oldref.buf),
+   newname, validate_branch_update(newname, , 
force, clobber_head_ok, 1));
+   if (strbuf_cmp(_msg, ))
+   die("%s", error_msg.buf);
 
reject_rebase_or_bisect_branch(oldref.buf);
 
@@ -509,6 +543,8 @@ static void rename_branch(const char *oldname, const char 
*newname, int force)
die(_("Branch is renamed, but update of config-file failed"));
strbuf_release();
strbuf_release();
+   strbuf_release(_msg);
+   

[RFC PATCH 1/5] builtin/checkout: avoid usage of '!!'

2017-09-19 Thread Kaartic Sivaraam
Documentation/CodingGuidelines says,

"Some clever tricks, like using the !! operator with arithmetic
 constructs, can be extremely confusing to others.  Avoid them,
 unless there is a compelling reason to use them."

There was a usage for which there's no compelling reason.So, replace
such a usage as with something else that expresses the intent more
clearly.

Signed-off-by: Kaartic Sivaraam 
---
 I think the expression,

!!opts.new_branch_force

 is equivalent to,

opts.new_branch_force != NULL

 in all cases. If it's not, let me know.

 builtin/checkout.c | 7 +++
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/builtin/checkout.c b/builtin/checkout.c
index 5c202b7af..76859da9d 100644
--- a/builtin/checkout.c
+++ b/builtin/checkout.c
@@ -1281,11 +1281,10 @@ int cmd_checkout(int argc, const char **argv, const 
char *prefix)
 
if (opts.new_branch) {
struct strbuf buf = STRBUF_INIT;
+   int force = opts.new_branch_force != NULL;
 
-   opts.branch_exists =
-   validate_new_branchname(opts.new_branch, ,
-   !!opts.new_branch_force,
-   !!opts.new_branch_force);
+   opts.branch_exists = validate_new_branchname(opts.new_branch, 
,
+force, force);
 
strbuf_release();
}
-- 
2.14.1.868.g66c78774b



Re: [PATCH] for_each_string_list_item(): behave correctly for empty list

2017-09-19 Thread Michael Haggerty
On 09/19/2017 02:08 AM, Stefan Beller wrote:
>> I am hoping that this last one is not allowed and we can use the
>> "same condition is checked every time we loop" version that hides
>> the uglyness inside the macro.
> 
> By which you are referring to Jonathans solution posted.
> Maybe we can combine the two solutions (checking for thelist
> to not be NULL once, by Jonathan) and using an outer structure
> (SZEDERs solution) by replacing the condition by a for loop,
> roughly (untested):
> 
> #define for_each_string_list_item(item,list) \
> -   for (item = (list)->items; item < (list)->items + (list)->nr; ++item)
> +for (; list; list = NULL)
> +for (item = (list)->items; item < (list)->items + (list)->nr; ++item)
> 
> as that would not mingle with any dangling else clause.
> It is also just one statement, such that
> 
> if (bla)
>   for_each_string_list_item {
> baz(item);
>   }
> else
>   foo;
> 
> still works.
> 
> Are there downsides to this combined approach?

On the plus side, it's pleasantly devious; I wouldn't have thought of
using a `for` loop for the initial test. But it doesn't work as written,
because (1) we don't need to guard against `list` being NULL, but rather
`list->items`; and (2) we don't have the liberty to set `list = NULL`
(or `list->items = NULL`, because `list` is owned by the caller and we
shouldn't modify it.

The following is a bit closer:

#define for_each_string_list_item(item,list) \
for (item = (list)->items; item; item = NULL) \
for (; item < (list)->items + (list)->nr; ++item)

But I think that also fails, because a callsite that does

for_each_string_list_item(myitem, mylist)
if (myitem.util)
break;

would expect that `myitem` is still set after breaking out of the loop,
whereas the outer `for` loop would reset it to NULL.

If `break` were an expression we could do something like

#define for_each_string_list_item(item,list) \
for (item = (list)->items; item; break) \
for (; item < (list)->items + (list)->nr; ++item)

So I think we're still left with the suggestions of Jonathan or Gábor.
Or the bigger change of initializing `string_list::items` to point at an
empty sentinal array (similar to `strbuf_slopbuf`) rather than NULL.
Personally, I think that Jonathan's approach makes the most sense,
unless somebody wants to jump in an implement a `string_list_slopbuf`.

By the way, I wonder if any open-coded loops over `string_lists` make
the same mistake as the macro?

Michael


Re: [PATCH v2 1/4] mktag: add option which allows the tagger field to be omitted

2017-09-19 Thread Ian Campbell
On Tue, 2017-09-19 at 12:01 +0900, Junio C Hamano wrote:
> 
> Hmph.  I cannot shake this nagging feeling that this is probably a
> solution that is overly narrow to a single problem that won't scale
> into the future.
> 
> [...snip good point...]
> 
> If we drop the "verification" step from the above, that essentially
> becomes an equivaent to "hash-object -t tag -w --stdin".
> 
> So I now have to wonder if it may be sufficient to use "hash-object"
> in filter-branch, without doing this "allow malformed data that we
> would not permit if the tag were being created today, only to help
> replaying an old, already broken data" change to "git mktag".
> 
> Is there something that makes "hash-object" insufficient (like it
> still does some extra checks we would want to disable and cannot
> work as a replacement for your "--allow-missing-tagger")?

I've done a couple of quick tests and it looks like it will work. I'll
run a few more checks and repost.

Ian.


[PATCH v2 21/21] packed-backend.c: rename a bunch of things and update comments

2017-09-19 Thread Michael Haggerty
We've made huge changes to this file, and some of the old names and
comments are no longer very fitting. So rename a bunch of things:

* `struct packed_ref_cache` → `struct snapshot`
* `acquire_packed_ref_cache()` → `acquire_snapshot()`
* `release_packed_ref_buffer()` → `clear_snapshot_buffer()`
* `release_packed_ref_cache()` → `release_snapshot()`
* `clear_packed_ref_cache()` → `clear_snapshot()`
* `struct packed_ref_entry` → `struct snapshot_record`
* `cmp_packed_ref_entries()` → `cmp_packed_ref_records()`
* `cmp_entry_to_refname()` → `cmp_record_to_refname()`
* `sort_packed_refs()` → `sort_snapshot()`
* `read_packed_refs()` → `create_snapshot()`
* `validate_packed_ref_cache()` → `validate_snapshot()`
* `get_packed_ref_cache()` → `get_snapshot()`
* Renamed local variables and struct members accordingly.

Also update a bunch of comments to reflect the renaming and the
accumulated changes that the code has undergone.

Signed-off-by: Michael Haggerty 
---
 refs/packed-backend.c | 422 +++---
 1 file changed, 232 insertions(+), 190 deletions(-)

diff --git a/refs/packed-backend.c b/refs/packed-backend.c
index 92b837a237..b9351da843 100644
--- a/refs/packed-backend.c
+++ b/refs/packed-backend.c
@@ -37,10 +37,30 @@ static enum mmap_strategy mmap_strategy = MMAP_OK;
 
 struct packed_ref_store;
 
-struct packed_ref_cache {
+/*
+ * A `snapshot` represents one snapshot of a `packed-refs` file.
+ *
+ * Normally, this will be a mmapped view of the contents of the
+ * `packed-refs` file at the time the snapshot was created. However,
+ * if the `packed-refs` file was not sorted, this might point at heap
+ * memory holding the contents of the `packed-refs` file with its
+ * records sorted by refname.
+ *
+ * `snapshot` instances are reference counted (via
+ * `acquire_snapshot()` and `release_snapshot()`). This is to prevent
+ * an instance from disappearing while an iterator is still iterating
+ * over it. Instances are garbage collected when their `referrers`
+ * count goes to zero.
+ *
+ * The most recent `snapshot`, if available, is referenced by the
+ * `packed_ref_store`. Its freshness is checked whenever
+ * `get_snapshot()` is called; if the existing snapshot is obsolete, a
+ * new snapshot is taken.
+ */
+struct snapshot {
/*
 * A back-pointer to the packed_ref_store with which this
-* cache is associated:
+* snapshot is associated:
 */
struct packed_ref_store *refs;
 
@@ -61,26 +81,42 @@ struct packed_ref_cache {
size_t header_len;
 
/*
-* What is the peeled state of this cache? (This is usually
-* determined from the header of the "packed-refs" file.)
+* What is the peeled state of the `packed-refs` file that
+* this snapshot represents? (This is usually determined from
+* the file's header.)
 */
enum { PEELED_NONE, PEELED_TAGS, PEELED_FULLY } peeled;
 
/*
-* Count of references to the data structure in this instance,
-* including the pointer from files_ref_store::packed if any.
-* The data will not be freed as long as the reference count
-* is nonzero.
+* Count of references to this instance, including the pointer
+* from `packed_ref_store::snapshot`, if any. The instance
+* will not be freed as long as the reference count is
+* nonzero.
 */
unsigned int referrers;
 
-   /* The metadata from when this packed-refs cache was read */
+   /*
+* The metadata of the `packed-refs` file from which this
+* snapshot was created, used to tell if the file has been
+* replaced since we read it.
+*/
struct stat_validity validity;
 };
 
 /*
- * A container for `packed-refs`-related data. It is not (yet) a
- * `ref_store`.
+ * A `ref_store` representing references stored in a `packed-refs`
+ * file. It implements the `ref_store` interface, though it has some
+ * limitations:
+ *
+ * - It cannot store symbolic references.
+ *
+ * - It cannot store reflogs.
+ *
+ * - It does not support reference renaming (though it could).
+ *
+ * On the other hand, it can be locked outside of a reference
+ * transaction. In that case, it remains locked even after the
+ * transaction is done and the new `packed-refs` file is activated.
  */
 struct packed_ref_store {
struct ref_store base;
@@ -91,10 +127,10 @@ struct packed_ref_store {
char *path;
 
/*
-* A cache of the values read from the `packed-refs` file, if
-* it might still be current; otherwise, NULL.
+* A snapshot of the values read from the `packed-refs` file,
+* if it might still be current; otherwise, NULL.
 */
-   struct packed_ref_cache *cache;
+   struct snapshot *snapshot;
 
/*
 * Lock used for the "packed-refs" file. Note that this (and
@@ -111,43 +147,42 @@ struct 

[PATCH v2 03/21] packed_ref_cache: add a backlink to the associated `packed_ref_store`

2017-09-19 Thread Michael Haggerty
It will prove convenient in upcoming patches.

Signed-off-by: Michael Haggerty 
---
 refs/packed-backend.c | 23 ---
 1 file changed, 16 insertions(+), 7 deletions(-)

diff --git a/refs/packed-backend.c b/refs/packed-backend.c
index e411501871..a3d9210cb0 100644
--- a/refs/packed-backend.c
+++ b/refs/packed-backend.c
@@ -7,7 +7,15 @@
 #include "../iterator.h"
 #include "../lockfile.h"
 
+struct packed_ref_store;
+
 struct packed_ref_cache {
+   /*
+* A back-pointer to the packed_ref_store with which this
+* cache is associated:
+*/
+   struct packed_ref_store *refs;
+
struct ref_cache *cache;
 
/*
@@ -154,7 +162,7 @@ static const char *parse_ref_line(struct strbuf *line, 
struct object_id *oid)
 }
 
 /*
- * Read from `packed_refs_file` into a newly-allocated
+ * Read from the `packed-refs` file into a newly-allocated
  * `packed_ref_cache` and return it. The return value will already
  * have its reference count incremented.
  *
@@ -182,7 +190,7 @@ static const char *parse_ref_line(struct strbuf *line, 
struct object_id *oid)
  *  compatibility with older clients, but we do not require it
  *  (i.e., "peeled" is a no-op if "fully-peeled" is set).
  */
-static struct packed_ref_cache *read_packed_refs(const char *packed_refs_file)
+static struct packed_ref_cache *read_packed_refs(struct packed_ref_store *refs)
 {
FILE *f;
struct packed_ref_cache *packed_refs = xcalloc(1, sizeof(*packed_refs));
@@ -191,11 +199,12 @@ static struct packed_ref_cache *read_packed_refs(const 
char *packed_refs_file)
enum { PEELED_NONE, PEELED_TAGS, PEELED_FULLY } peeled = PEELED_NONE;
struct ref_dir *dir;
 
+   packed_refs->refs = refs;
acquire_packed_ref_cache(packed_refs);
packed_refs->cache = create_ref_cache(NULL, NULL);
packed_refs->cache->root->flag &= ~REF_INCOMPLETE;
 
-   f = fopen(packed_refs_file, "r");
+   f = fopen(refs->path, "r");
if (!f) {
if (errno == ENOENT) {
/*
@@ -205,7 +214,7 @@ static struct packed_ref_cache *read_packed_refs(const char 
*packed_refs_file)
 */
return packed_refs;
} else {
-   die_errno("couldn't read %s", packed_refs_file);
+   die_errno("couldn't read %s", refs->path);
}
}
 
@@ -218,7 +227,7 @@ static struct packed_ref_cache *read_packed_refs(const char 
*packed_refs_file)
const char *traits;
 
if (!line.len || line.buf[line.len - 1] != '\n')
-   die("unterminated line in %s: %s", packed_refs_file, 
line.buf);
+   die("unterminated line in %s: %s", refs->path, 
line.buf);
 
if (skip_prefix(line.buf, "# pack-refs with:", )) {
if (strstr(traits, " fully-peeled "))
@@ -258,7 +267,7 @@ static struct packed_ref_cache *read_packed_refs(const char 
*packed_refs_file)
last->flag |= REF_KNOWS_PEELED;
} else {
strbuf_setlen(, line.len - 1);
-   die("unexpected line in %s: %s", packed_refs_file, 
line.buf);
+   die("unexpected line in %s: %s", refs->path, line.buf);
}
}
 
@@ -293,7 +302,7 @@ static struct packed_ref_cache *get_packed_ref_cache(struct 
packed_ref_store *re
validate_packed_ref_cache(refs);
 
if (!refs->cache)
-   refs->cache = read_packed_refs(refs->path);
+   refs->cache = read_packed_refs(refs);
 
return refs->cache;
 }
-- 
2.14.1



[PATCH v2 19/21] ref_cache: remove support for storing peeled values

2017-09-19 Thread Michael Haggerty
Now that the `packed-refs` backend doesn't use `ref_cache`, there is
nobody left who might want to store peeled values of references in
`ref_cache`. So remove that feature.

Signed-off-by: Michael Haggerty 
---
 refs/packed-backend.c |  9 -
 refs/ref-cache.c  | 42 +-
 refs/ref-cache.h  | 32 ++--
 3 files changed, 11 insertions(+), 72 deletions(-)

diff --git a/refs/packed-backend.c b/refs/packed-backend.c
index b4a01637a6..b1571872fa 100644
--- a/refs/packed-backend.c
+++ b/refs/packed-backend.c
@@ -2,7 +2,6 @@
 #include "../config.h"
 #include "../refs.h"
 #include "refs-internal.h"
-#include "ref-cache.h"
 #include "packed-backend.h"
 #include "../iterator.h"
 #include "../lockfile.h"
@@ -226,6 +225,14 @@ static NORETURN void die_invalid_line(const char *path,
 
 }
 
+/*
+ * This value is set in `base.flags` if the peeled value of the
+ * current reference is known. In that case, `peeled` contains the
+ * correct peeled value for the reference, which might be `null_sha1`
+ * if the reference is not a tag or if it is broken.
+ */
+#define REF_KNOWS_PEELED 0x40
+
 /*
  * An iterator over a packed-refs file that is currently mmapped.
  */
diff --git a/refs/ref-cache.c b/refs/ref-cache.c
index 54dfb5218c..4f850e1b5c 100644
--- a/refs/ref-cache.c
+++ b/refs/ref-cache.c
@@ -38,7 +38,6 @@ struct ref_entry *create_ref_entry(const char *refname,
 
FLEX_ALLOC_STR(ref, name, refname);
oidcpy(>u.value.oid, oid);
-   oidclr(>u.value.peeled);
ref->flag = flag;
return ref;
 }
@@ -491,49 +490,10 @@ static int cache_ref_iterator_advance(struct ref_iterator 
*ref_iterator)
}
 }
 
-enum peel_status peel_entry(struct ref_entry *entry, int repeel)
-{
-   enum peel_status status;
-
-   if (entry->flag & REF_KNOWS_PEELED) {
-   if (repeel) {
-   entry->flag &= ~REF_KNOWS_PEELED;
-   oidclr(>u.value.peeled);
-   } else {
-   return is_null_oid(>u.value.peeled) ?
-   PEEL_NON_TAG : PEEL_PEELED;
-   }
-   }
-   if (entry->flag & REF_ISBROKEN)
-   return PEEL_BROKEN;
-   if (entry->flag & REF_ISSYMREF)
-   return PEEL_IS_SYMREF;
-
-   status = peel_object(entry->u.value.oid.hash, 
entry->u.value.peeled.hash);
-   if (status == PEEL_PEELED || status == PEEL_NON_TAG)
-   entry->flag |= REF_KNOWS_PEELED;
-   return status;
-}
-
 static int cache_ref_iterator_peel(struct ref_iterator *ref_iterator,
   struct object_id *peeled)
 {
-   struct cache_ref_iterator *iter =
-   (struct cache_ref_iterator *)ref_iterator;
-   struct cache_ref_iterator_level *level;
-   struct ref_entry *entry;
-
-   level = >levels[iter->levels_nr - 1];
-
-   if (level->index == -1)
-   die("BUG: peel called before advance for cache iterator");
-
-   entry = level->dir->entries[level->index];
-
-   if (peel_entry(entry, 0))
-   return -1;
-   oidcpy(peeled, >u.value.peeled);
-   return 0;
+   return peel_object(ref_iterator->oid->hash, peeled->hash);
 }
 
 static int cache_ref_iterator_abort(struct ref_iterator *ref_iterator)
diff --git a/refs/ref-cache.h b/refs/ref-cache.h
index a082bfb06c..eda65e73ed 100644
--- a/refs/ref-cache.h
+++ b/refs/ref-cache.h
@@ -38,14 +38,6 @@ struct ref_value {
 * referred to by the last reference in the symlink chain.
 */
struct object_id oid;
-
-   /*
-* If REF_KNOWS_PEELED, then this field holds the peeled value
-* of this reference, or null if the reference is known not to
-* be peelable.  See the documentation for peel_ref() for an
-* exact definition of "peelable".
-*/
-   struct object_id peeled;
 };
 
 /*
@@ -97,21 +89,14 @@ struct ref_dir {
  * public values; see refs.h.
  */
 
-/*
- * The field ref_entry->u.value.peeled of this value entry contains
- * the correct peeled value for the reference, which might be
- * null_sha1 if the reference is not a tag or if it is broken.
- */
-#define REF_KNOWS_PEELED 0x10
-
 /* ref_entry represents a directory of references */
-#define REF_DIR 0x20
+#define REF_DIR 0x10
 
 /*
  * Entry has not yet been read from disk (used only for REF_DIR
  * entries representing loose references)
  */
-#define REF_INCOMPLETE 0x40
+#define REF_INCOMPLETE 0x20
 
 /*
  * A ref_entry represents either a reference or a "subdirectory" of
@@ -252,17 +237,4 @@ struct ref_iterator *cache_ref_iterator_begin(struct 
ref_cache *cache,
  const char *prefix,
  int prime_dir);
 
-/*
- * Peel the entry (if possible) and return its new peel_status.  If
- * repeel is true, re-peel the entry even if there is an old 

[PATCH v2 06/21] read_packed_refs(): only check for a header at the top of the file

2017-09-19 Thread Michael Haggerty
This tightens up the parsing a bit; previously, stray header-looking
lines would have been processed.

Signed-off-by: Michael Haggerty 
---
 refs/packed-backend.c | 35 ---
 1 file changed, 24 insertions(+), 11 deletions(-)

diff --git a/refs/packed-backend.c b/refs/packed-backend.c
index 154abbd83a..141f02b9c8 100644
--- a/refs/packed-backend.c
+++ b/refs/packed-backend.c
@@ -255,11 +255,34 @@ static struct packed_ref_cache *read_packed_refs(struct 
packed_ref_store *refs)
pos = buf;
eof = buf + size;
 
+   /* If the file has a header line, process it: */
+   if (pos < eof && *pos == '#') {
+   const char *traits;
+
+   eol = memchr(pos, '\n', eof - pos);
+   if (!eol)
+   die_unterminated_line(refs->path, pos, eof - pos);
+
+   strbuf_add(, pos, eol + 1 - pos);
+
+   if (!skip_prefix(line.buf, "# pack-refs with:", ))
+   die_invalid_line(refs->path, pos, eof - pos);
+
+   if (strstr(traits, " fully-peeled "))
+   peeled = PEELED_FULLY;
+   else if (strstr(traits, " peeled "))
+   peeled = PEELED_TAGS;
+   /* perhaps other traits later as well */
+
+   /* The "+ 1" is for the LF character. */
+   pos = eol + 1;
+   strbuf_reset();
+   }
+
dir = get_ref_dir(packed_refs->cache->root);
while (pos < eof) {
struct object_id oid;
const char *refname;
-   const char *traits;
 
eol = memchr(pos, '\n', eof - pos);
if (!eol)
@@ -267,15 +290,6 @@ static struct packed_ref_cache *read_packed_refs(struct 
packed_ref_store *refs)
 
strbuf_add(, pos, eol + 1 - pos);
 
-   if (skip_prefix(line.buf, "# pack-refs with:", )) {
-   if (strstr(traits, " fully-peeled "))
-   peeled = PEELED_FULLY;
-   else if (strstr(traits, " peeled "))
-   peeled = PEELED_TAGS;
-   /* perhaps other traits later as well */
-   goto next_line;
-   }
-
refname = parse_ref_line(, );
if (refname) {
int flag = REF_ISPACKED;
@@ -307,7 +321,6 @@ static struct packed_ref_cache *read_packed_refs(struct 
packed_ref_store *refs)
die_invalid_line(refs->path, line.buf, line.len);
}
 
-   next_line:
/* The "+ 1" is for the LF character. */
pos = eol + 1;
strbuf_reset();
-- 
2.14.1



[PATCH v2 17/21] ref_store: implement `refs_peel_ref()` generically

2017-09-19 Thread Michael Haggerty
We're about to stop storing packed refs in a `ref_cache`. That means
that the only way we have left to optimize `peel_ref()` is by checking
whether the reference being peeled is the one currently being iterated
over (in `current_ref_iter`), and if so, using `ref_iterator_peel()`.
But this can be done generically; it doesn't have to be implemented
per-backend.

So implement `refs_peel_ref()` in `refs.c` and remove the `peel_ref()`
method from the refs API.

This removes the last callers of a couple of functions, so delete
them. More cleanup to come...

Signed-off-by: Michael Haggerty 
---
 refs.c| 18 +-
 refs/files-backend.c  | 38 --
 refs/packed-backend.c | 36 
 refs/refs-internal.h  |  3 ---
 4 files changed, 17 insertions(+), 78 deletions(-)

diff --git a/refs.c b/refs.c
index 101c107ee8..c5e6f79c77 100644
--- a/refs.c
+++ b/refs.c
@@ -1735,7 +1735,23 @@ int refs_pack_refs(struct ref_store *refs, unsigned int 
flags)
 int refs_peel_ref(struct ref_store *refs, const char *refname,
  unsigned char *sha1)
 {
-   return refs->be->peel_ref(refs, refname, sha1);
+   int flag;
+   unsigned char base[20];
+
+   if (current_ref_iter && current_ref_iter->refname == refname) {
+   struct object_id peeled;
+
+   if (ref_iterator_peel(current_ref_iter, ))
+   return -1;
+   hashcpy(sha1, peeled.hash);
+   return 0;
+   }
+
+   if (refs_read_ref_full(refs, refname,
+  RESOLVE_REF_READING, base, ))
+   return -1;
+
+   return peel_object(base, sha1);
 }
 
 int peel_ref(const char *refname, unsigned char *sha1)
diff --git a/refs/files-backend.c b/refs/files-backend.c
index 35648c89fc..7d12de88d0 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -655,43 +655,6 @@ static int lock_raw_ref(struct files_ref_store *refs,
return ret;
 }
 
-static int files_peel_ref(struct ref_store *ref_store,
- const char *refname, unsigned char *sha1)
-{
-   struct files_ref_store *refs =
-   files_downcast(ref_store, REF_STORE_READ | REF_STORE_ODB,
-  "peel_ref");
-   int flag;
-   unsigned char base[20];
-
-   if (current_ref_iter && current_ref_iter->refname == refname) {
-   struct object_id peeled;
-
-   if (ref_iterator_peel(current_ref_iter, ))
-   return -1;
-   hashcpy(sha1, peeled.hash);
-   return 0;
-   }
-
-   if (refs_read_ref_full(ref_store, refname,
-  RESOLVE_REF_READING, base, ))
-   return -1;
-
-   /*
-* If the reference is packed, read its ref_entry from the
-* cache in the hope that we already know its peeled value.
-* We only try this optimization on packed references because
-* (a) forcing the filling of the loose reference cache could
-* be expensive and (b) loose references anyway usually do not
-* have REF_KNOWS_PEELED.
-*/
-   if (flag & REF_ISPACKED &&
-   !refs_peel_ref(refs->packed_ref_store, refname, sha1))
-   return 0;
-
-   return peel_object(base, sha1);
-}
-
 struct files_ref_iterator {
struct ref_iterator base;
 
@@ -3012,7 +2975,6 @@ struct ref_storage_be refs_be_files = {
files_initial_transaction_commit,
 
files_pack_refs,
-   files_peel_ref,
files_create_symref,
files_delete_refs,
files_rename_ref,
diff --git a/refs/packed-backend.c b/refs/packed-backend.c
index 2cf2f7f73d..6a930a440c 100644
--- a/refs/packed-backend.c
+++ b/refs/packed-backend.c
@@ -849,26 +849,6 @@ static struct packed_ref_cache 
*get_packed_ref_cache(struct packed_ref_store *re
return refs->cache;
 }
 
-static struct ref_dir *get_packed_ref_dir(struct packed_ref_cache 
*packed_ref_cache)
-{
-   return get_ref_dir(packed_ref_cache->cache->root);
-}
-
-static struct ref_dir *get_packed_refs(struct packed_ref_store *refs)
-{
-   return get_packed_ref_dir(get_packed_ref_cache(refs));
-}
-
-/*
- * Return the ref_entry for the given refname from the packed
- * references.  If it does not exist, return NULL.
- */
-static struct ref_entry *get_packed_ref(struct packed_ref_store *refs,
-   const char *refname)
-{
-   return find_ref_entry(get_packed_refs(refs), refname);
-}
-
 static int packed_read_raw_ref(struct ref_store *ref_store,
   const char *refname, unsigned char *sha1,
   struct strbuf *referent, unsigned int *type)
@@ -895,21 +875,6 @@ static int packed_read_raw_ref(struct ref_store *ref_store,
return 0;
 }
 
-static int packed_peel_ref(struct ref_store *ref_store,
-

[PATCH v2 01/21] ref_iterator: keep track of whether the iterator output is ordered

2017-09-19 Thread Michael Haggerty
References are iterated over in order by refname, but reflogs are not.
Some consumers of reference iteration care about the difference. Teach
each `ref_iterator` to keep track of whether its output is ordered.

`overlay_ref_iterator` is one of the picky consumers. Add a sanity
check in `overlay_ref_iterator_begin()` to verify that its inputs are
ordered.

Signed-off-by: Michael Haggerty 
---
 refs.c|  4 
 refs/files-backend.c  | 16 +---
 refs/iterator.c   | 15 ++-
 refs/packed-backend.c |  2 +-
 refs/ref-cache.c  |  2 +-
 refs/ref-cache.h  |  3 ++-
 refs/refs-internal.h  | 23 +++
 7 files changed, 46 insertions(+), 19 deletions(-)

diff --git a/refs.c b/refs.c
index b0106b8162..101c107ee8 100644
--- a/refs.c
+++ b/refs.c
@@ -1309,6 +1309,10 @@ struct ref_iterator *refs_ref_iterator_begin(
if (trim)
iter = prefix_ref_iterator_begin(iter, "", trim);
 
+   /* Sanity check for subclasses: */
+   if (!iter->ordered)
+   BUG("reference iterator is not ordered");
+
return iter;
 }
 
diff --git a/refs/files-backend.c b/refs/files-backend.c
index 961424a4ea..35648c89fc 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -762,7 +762,7 @@ static struct ref_iterator *files_ref_iterator_begin(
const char *prefix, unsigned int flags)
 {
struct files_ref_store *refs;
-   struct ref_iterator *loose_iter, *packed_iter;
+   struct ref_iterator *loose_iter, *packed_iter, *overlay_iter;
struct files_ref_iterator *iter;
struct ref_iterator *ref_iterator;
unsigned int required_flags = REF_STORE_READ;
@@ -772,10 +772,6 @@ static struct ref_iterator *files_ref_iterator_begin(
 
refs = files_downcast(ref_store, required_flags, "ref_iterator_begin");
 
-   iter = xcalloc(1, sizeof(*iter));
-   ref_iterator = >base;
-   base_ref_iterator_init(ref_iterator, _ref_iterator_vtable);
-
/*
 * We must make sure that all loose refs are read before
 * accessing the packed-refs file; this avoids a race
@@ -811,7 +807,13 @@ static struct ref_iterator *files_ref_iterator_begin(
refs->packed_ref_store, prefix, 0,
DO_FOR_EACH_INCLUDE_BROKEN);
 
-   iter->iter0 = overlay_ref_iterator_begin(loose_iter, packed_iter);
+   overlay_iter = overlay_ref_iterator_begin(loose_iter, packed_iter);
+
+   iter = xcalloc(1, sizeof(*iter));
+   ref_iterator = >base;
+   base_ref_iterator_init(ref_iterator, _ref_iterator_vtable,
+  overlay_iter->ordered);
+   iter->iter0 = overlay_iter;
iter->flags = flags;
 
return ref_iterator;
@@ -2084,7 +2086,7 @@ static struct ref_iterator 
*files_reflog_iterator_begin(struct ref_store *ref_st
struct ref_iterator *ref_iterator = >base;
struct strbuf sb = STRBUF_INIT;
 
-   base_ref_iterator_init(ref_iterator, _reflog_iterator_vtable);
+   base_ref_iterator_init(ref_iterator, _reflog_iterator_vtable, 0);
files_reflog_path(refs, , NULL);
iter->dir_iterator = dir_iterator_begin(sb.buf);
iter->ref_store = ref_store;
diff --git a/refs/iterator.c b/refs/iterator.c
index 4cf449ef66..c475360f0a 100644
--- a/refs/iterator.c
+++ b/refs/iterator.c
@@ -25,9 +25,11 @@ int ref_iterator_abort(struct ref_iterator *ref_iterator)
 }
 
 void base_ref_iterator_init(struct ref_iterator *iter,
-   struct ref_iterator_vtable *vtable)
+   struct ref_iterator_vtable *vtable,
+   int ordered)
 {
iter->vtable = vtable;
+   iter->ordered = !!ordered;
iter->refname = NULL;
iter->oid = NULL;
iter->flags = 0;
@@ -72,7 +74,7 @@ struct ref_iterator *empty_ref_iterator_begin(void)
struct empty_ref_iterator *iter = xcalloc(1, sizeof(*iter));
struct ref_iterator *ref_iterator = >base;
 
-   base_ref_iterator_init(ref_iterator, _ref_iterator_vtable);
+   base_ref_iterator_init(ref_iterator, _ref_iterator_vtable, 1);
return ref_iterator;
 }
 
@@ -205,6 +207,7 @@ static struct ref_iterator_vtable merge_ref_iterator_vtable 
= {
 };
 
 struct ref_iterator *merge_ref_iterator_begin(
+   int ordered,
struct ref_iterator *iter0, struct ref_iterator *iter1,
ref_iterator_select_fn *select, void *cb_data)
 {
@@ -219,7 +222,7 @@ struct ref_iterator *merge_ref_iterator_begin(
 * references through only if they exist in both iterators.
 */
 
-   base_ref_iterator_init(ref_iterator, _ref_iterator_vtable);
+   base_ref_iterator_init(ref_iterator, _ref_iterator_vtable, 
ordered);
iter->iter0 = iter0;
iter->iter1 = iter1;
iter->select = select;
@@ -268,9 +271,11 @@ struct ref_iterator *overlay_ref_iterator_begin(
} else if 

[PATCH v2 07/21] read_packed_refs(): make parsing of the header line more robust

2017-09-19 Thread Michael Haggerty
The old code parsed the traits in the `packed-refs` header by looking
for the string " trait " (i.e., the name of the trait with a space on
either side) in the header line. This is fragile, because if any other
implementation of Git forgets to write the trailing space, the last
trait would silently be ignored (and the error might never be
noticed).

So instead, use `string_list_split_in_place()` to split the traits
into tokens then use `unsorted_string_list_has_string()` to look for
the tokens we are interested in. This means that we can read the
traits correctly even if the header line is missing a trailing
space (or indeed, if it is missing the space after the colon, or if it
has multiple spaces somewhere).

However, older Git clients (and perhaps other Git implementations)
still require the surrounding spaces, so we still have to output the
header with a trailing space.

Signed-off-by: Michael Haggerty 
---
 refs/packed-backend.c | 21 +++--
 1 file changed, 15 insertions(+), 6 deletions(-)

diff --git a/refs/packed-backend.c b/refs/packed-backend.c
index 141f02b9c8..a45e3ff92f 100644
--- a/refs/packed-backend.c
+++ b/refs/packed-backend.c
@@ -257,25 +257,30 @@ static struct packed_ref_cache *read_packed_refs(struct 
packed_ref_store *refs)
 
/* If the file has a header line, process it: */
if (pos < eof && *pos == '#') {
-   const char *traits;
+   char *p;
+   struct string_list traits = STRING_LIST_INIT_NODUP;
 
eol = memchr(pos, '\n', eof - pos);
if (!eol)
die_unterminated_line(refs->path, pos, eof - pos);
 
-   strbuf_add(, pos, eol + 1 - pos);
+   strbuf_add(, pos, eol - pos);
 
-   if (!skip_prefix(line.buf, "# pack-refs with:", ))
+   if (!skip_prefix(line.buf, "# pack-refs with:", (const char 
**)))
die_invalid_line(refs->path, pos, eof - pos);
 
-   if (strstr(traits, " fully-peeled "))
+   string_list_split_in_place(, p, ' ', -1);
+
+   if (unsorted_string_list_has_string(, "fully-peeled"))
peeled = PEELED_FULLY;
-   else if (strstr(traits, " peeled "))
+   else if (unsorted_string_list_has_string(, "peeled"))
peeled = PEELED_TAGS;
/* perhaps other traits later as well */
 
/* The "+ 1" is for the LF character. */
pos = eol + 1;
+
+   string_list_clear(, 0);
strbuf_reset();
}
 
@@ -610,7 +615,11 @@ int packed_refs_is_locked(struct ref_store *ref_store)
 
 /*
  * The packed-refs header line that we write out.  Perhaps other
- * traits will be added later.  The trailing space is required.
+ * traits will be added later.
+ *
+ * Note that earlier versions of Git used to parse these traits by
+ * looking for " trait " in the line. For this reason, the space after
+ * the colon and the trailing space are required.
  */
 static const char PACKED_REFS_HEADER[] =
"# pack-refs with: peeled fully-peeled \n";
-- 
2.14.1



[PATCH v2 12/21] packed-backend.c: reorder some definitions

2017-09-19 Thread Michael Haggerty
No code has been changed. This will make subsequent patches more
self-contained.

Signed-off-by: Michael Haggerty 
---
 refs/packed-backend.c | 48 
 1 file changed, 24 insertions(+), 24 deletions(-)

diff --git a/refs/packed-backend.c b/refs/packed-backend.c
index 724c88631d..0fe41a7203 100644
--- a/refs/packed-backend.c
+++ b/refs/packed-backend.c
@@ -36,30 +36,6 @@ struct packed_ref_cache {
struct stat_validity validity;
 };
 
-/*
- * Increment the reference count of *packed_refs.
- */
-static void acquire_packed_ref_cache(struct packed_ref_cache *packed_refs)
-{
-   packed_refs->referrers++;
-}
-
-/*
- * Decrease the reference count of *packed_refs.  If it goes to zero,
- * free *packed_refs and return true; otherwise return false.
- */
-static int release_packed_ref_cache(struct packed_ref_cache *packed_refs)
-{
-   if (!--packed_refs->referrers) {
-   free_ref_cache(packed_refs->cache);
-   stat_validity_clear(_refs->validity);
-   free(packed_refs);
-   return 1;
-   } else {
-   return 0;
-   }
-}
-
 /*
  * A container for `packed-refs`-related data. It is not (yet) a
  * `ref_store`.
@@ -92,6 +68,30 @@ struct packed_ref_store {
struct tempfile tempfile;
 };
 
+/*
+ * Increment the reference count of *packed_refs.
+ */
+static void acquire_packed_ref_cache(struct packed_ref_cache *packed_refs)
+{
+   packed_refs->referrers++;
+}
+
+/*
+ * Decrease the reference count of *packed_refs.  If it goes to zero,
+ * free *packed_refs and return true; otherwise return false.
+ */
+static int release_packed_ref_cache(struct packed_ref_cache *packed_refs)
+{
+   if (!--packed_refs->referrers) {
+   free_ref_cache(packed_refs->cache);
+   stat_validity_clear(_refs->validity);
+   free(packed_refs);
+   return 1;
+   } else {
+   return 0;
+   }
+}
+
 struct ref_store *packed_ref_store_create(const char *path,
  unsigned int store_flags)
 {
-- 
2.14.1



[PATCH v2 20/21] mmapped_ref_iterator: inline into `packed_ref_iterator`

2017-09-19 Thread Michael Haggerty
Since `packed_ref_iterator` is now delegating to
`mmapped_ref_iterator` rather than `cache_ref_iterator` to do the
heavy lifting, there is no need to keep the two iterators separate. So
"inline" `mmapped_ref_iterator` into `packed_ref_iterator`. This
removes a bunch of boilerplate.

Signed-off-by: Michael Haggerty 
---
 refs/packed-backend.c | 284 --
 1 file changed, 114 insertions(+), 170 deletions(-)

diff --git a/refs/packed-backend.c b/refs/packed-backend.c
index b1571872fa..92b837a237 100644
--- a/refs/packed-backend.c
+++ b/refs/packed-backend.c
@@ -225,157 +225,6 @@ static NORETURN void die_invalid_line(const char *path,
 
 }
 
-/*
- * This value is set in `base.flags` if the peeled value of the
- * current reference is known. In that case, `peeled` contains the
- * correct peeled value for the reference, which might be `null_sha1`
- * if the reference is not a tag or if it is broken.
- */
-#define REF_KNOWS_PEELED 0x40
-
-/*
- * An iterator over a packed-refs file that is currently mmapped.
- */
-struct mmapped_ref_iterator {
-   struct ref_iterator base;
-
-   struct packed_ref_cache *packed_refs;
-
-   /* The current position in the mmapped file: */
-   const char *pos;
-
-   /* The end of the mmapped file: */
-   const char *eof;
-
-   struct object_id oid, peeled;
-
-   struct strbuf refname_buf;
-};
-
-static int mmapped_ref_iterator_advance(struct ref_iterator *ref_iterator)
-{
-   struct mmapped_ref_iterator *iter =
-   (struct mmapped_ref_iterator *)ref_iterator;
-   const char *p = iter->pos, *eol;
-
-   strbuf_reset(>refname_buf);
-
-   if (iter->pos == iter->eof)
-   return ref_iterator_abort(ref_iterator);
-
-   iter->base.flags = REF_ISPACKED;
-
-   if (iter->eof - p < GIT_SHA1_HEXSZ + 2 ||
-   parse_oid_hex(p, >oid, ) ||
-   !isspace(*p++))
-   die_invalid_line(iter->packed_refs->refs->path,
-iter->pos, iter->eof - iter->pos);
-
-   eol = memchr(p, '\n', iter->eof - p);
-   if (!eol)
-   die_unterminated_line(iter->packed_refs->refs->path,
- iter->pos, iter->eof - iter->pos);
-
-   strbuf_add(>refname_buf, p, eol - p);
-   iter->base.refname = iter->refname_buf.buf;
-
-   if (check_refname_format(iter->base.refname, REFNAME_ALLOW_ONELEVEL)) {
-   if (!refname_is_safe(iter->base.refname))
-   die("packed refname is dangerous: %s",
-   iter->base.refname);
-   oidclr(>oid);
-   iter->base.flags |= REF_BAD_NAME | REF_ISBROKEN;
-   }
-   if (iter->packed_refs->peeled == PEELED_FULLY ||
-   (iter->packed_refs->peeled == PEELED_TAGS &&
-starts_with(iter->base.refname, "refs/tags/")))
-   iter->base.flags |= REF_KNOWS_PEELED;
-
-   iter->pos = eol + 1;
-
-   if (iter->pos < iter->eof && *iter->pos == '^') {
-   p = iter->pos + 1;
-   if (iter->eof - p < GIT_SHA1_HEXSZ + 1 ||
-   parse_oid_hex(p, >peeled, ) ||
-   *p++ != '\n')
-   die_invalid_line(iter->packed_refs->refs->path,
-iter->pos, iter->eof - iter->pos);
-   iter->pos = p;
-
-   /*
-* Regardless of what the file header said, we
-* definitely know the value of *this* reference. But
-* we suppress it if the reference is broken:
-*/
-   if ((iter->base.flags & REF_ISBROKEN)) {
-   oidclr(>peeled);
-   iter->base.flags &= ~REF_KNOWS_PEELED;
-   } else {
-   iter->base.flags |= REF_KNOWS_PEELED;
-   }
-   } else {
-   oidclr(>peeled);
-   }
-
-   return ITER_OK;
-}
-
-static int mmapped_ref_iterator_peel(struct ref_iterator *ref_iterator,
-   struct object_id *peeled)
-{
-   struct mmapped_ref_iterator *iter =
-   (struct mmapped_ref_iterator *)ref_iterator;
-
-   if ((iter->base.flags & REF_KNOWS_PEELED)) {
-   oidcpy(peeled, >peeled);
-   return is_null_oid(>peeled) ? -1 : 0;
-   } else if ((iter->base.flags & (REF_ISBROKEN | REF_ISSYMREF))) {
-   return -1;
-   } else {
-   return !!peel_object(iter->oid.hash, peeled->hash);
-   }
-}
-
-static int mmapped_ref_iterator_abort(struct ref_iterator *ref_iterator)
-{
-   struct mmapped_ref_iterator *iter =
-   (struct mmapped_ref_iterator *)ref_iterator;
-
-   release_packed_ref_cache(iter->packed_refs);
-   strbuf_release(>refname_buf);
-   base_ref_iterator_free(ref_iterator);
-   return ITER_DONE;
-}
-
-static struct 

[PATCH v2 09/21] packed_ref_cache: remember the file-wide peeling state

2017-09-19 Thread Michael Haggerty
Rather than store the peeling state (i.e., the one defined by traits
in the `packed-refs` file header line) in a local variable in
`read_packed_refs()`, store it permanently in `packed_ref_cache`. This
will be needed when we stop reading all packed refs at once.

Signed-off-by: Michael Haggerty 
---
 refs/packed-backend.c | 17 -
 1 file changed, 12 insertions(+), 5 deletions(-)

diff --git a/refs/packed-backend.c b/refs/packed-backend.c
index 2b80f244c8..ae276f3445 100644
--- a/refs/packed-backend.c
+++ b/refs/packed-backend.c
@@ -18,6 +18,12 @@ struct packed_ref_cache {
 
struct ref_cache *cache;
 
+   /*
+* What is the peeled state of this cache? (This is usually
+* determined from the header of the "packed-refs" file.)
+*/
+   enum { PEELED_NONE, PEELED_TAGS, PEELED_FULLY } peeled;
+
/*
 * Count of references to the data structure in this instance,
 * including the pointer from files_ref_store::packed if any.
@@ -195,13 +201,13 @@ static struct packed_ref_cache *read_packed_refs(struct 
packed_ref_store *refs)
char *buf;
const char *pos, *eol, *eof;
struct strbuf tmp = STRBUF_INIT;
-   enum { PEELED_NONE, PEELED_TAGS, PEELED_FULLY } peeled = PEELED_NONE;
struct ref_dir *dir;
 
packed_refs->refs = refs;
acquire_packed_ref_cache(packed_refs);
packed_refs->cache = create_ref_cache(NULL, NULL);
packed_refs->cache->root->flag &= ~REF_INCOMPLETE;
+   packed_refs->peeled = PEELED_NONE;
 
fd = open(refs->path, O_RDONLY);
if (fd < 0) {
@@ -244,9 +250,9 @@ static struct packed_ref_cache *read_packed_refs(struct 
packed_ref_store *refs)
string_list_split_in_place(, p, ' ', -1);
 
if (unsorted_string_list_has_string(, "fully-peeled"))
-   peeled = PEELED_FULLY;
+   packed_refs->peeled = PEELED_FULLY;
else if (unsorted_string_list_has_string(, "peeled"))
-   peeled = PEELED_TAGS;
+   packed_refs->peeled = PEELED_TAGS;
/* perhaps other traits later as well */
 
/* The "+ 1" is for the LF character. */
@@ -282,8 +288,9 @@ static struct packed_ref_cache *read_packed_refs(struct 
packed_ref_store *refs)
oidclr();
flag |= REF_BAD_NAME | REF_ISBROKEN;
}
-   if (peeled == PEELED_FULLY ||
-   (peeled == PEELED_TAGS && starts_with(refname, 
"refs/tags/")))
+   if (packed_refs->peeled == PEELED_FULLY ||
+   (packed_refs->peeled == PEELED_TAGS &&
+starts_with(refname, "refs/tags/")))
flag |= REF_KNOWS_PEELED;
entry = create_ref_entry(refname, , flag);
add_ref_entry(dir, entry);
-- 
2.14.1



[PATCH v2 16/21] packed_read_raw_ref(): read the reference from the mmapped buffer

2017-09-19 Thread Michael Haggerty
Instead of reading the reference from the `ref_cache`, read it
directly from the mmapped buffer.

Signed-off-by: Michael Haggerty 
---
 refs/packed-backend.c | 14 +-
 1 file changed, 9 insertions(+), 5 deletions(-)

diff --git a/refs/packed-backend.c b/refs/packed-backend.c
index 53d0b7e3d6..2cf2f7f73d 100644
--- a/refs/packed-backend.c
+++ b/refs/packed-backend.c
@@ -875,18 +875,22 @@ static int packed_read_raw_ref(struct ref_store 
*ref_store,
 {
struct packed_ref_store *refs =
packed_downcast(ref_store, REF_STORE_READ, "read_raw_ref");
-
-   struct ref_entry *entry;
+   struct packed_ref_cache *packed_refs = get_packed_ref_cache(refs);
+   const char *rec;
 
*type = 0;
 
-   entry = get_packed_ref(refs, refname);
-   if (!entry) {
+   rec = find_reference_location(packed_refs, refname, 1);
+
+   if (!rec) {
+   /* refname is not a packed reference. */
errno = ENOENT;
return -1;
}
 
-   hashcpy(sha1, entry->u.value.oid.hash);
+   if (get_sha1_hex(rec, sha1))
+   die_invalid_line(refs->path, rec, packed_refs->eof - rec);
+
*type = REF_ISPACKED;
return 0;
 }
-- 
2.14.1



[PATCH v2 11/21] mmapped_ref_iterator_advance(): no peeled value for broken refs

2017-09-19 Thread Michael Haggerty
If a reference is broken, suppress its peeled value.

Signed-off-by: Michael Haggerty 
---
 refs/packed-backend.c | 10 --
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/refs/packed-backend.c b/refs/packed-backend.c
index 312116a99d..724c88631d 100644
--- a/refs/packed-backend.c
+++ b/refs/packed-backend.c
@@ -234,9 +234,15 @@ static int mmapped_ref_iterator_advance(struct 
ref_iterator *ref_iterator)
 
/*
 * Regardless of what the file header said, we
-* definitely know the value of *this* reference:
+* definitely know the value of *this* reference. But
+* we suppress it if the reference is broken:
 */
-   iter->base.flags |= REF_KNOWS_PEELED;
+   if ((iter->base.flags & REF_ISBROKEN)) {
+   oidclr(>peeled);
+   iter->base.flags &= ~REF_KNOWS_PEELED;
+   } else {
+   iter->base.flags |= REF_KNOWS_PEELED;
+   }
} else {
oidclr(>peeled);
}
-- 
2.14.1



[PATCH v2 00/21] Read `packed-refs` using mmap()

2017-09-19 Thread Michael Haggerty
This is v2 of a patch series that changes the reading and caching of
the `packed-refs` file to use `mmap()`. Thanks to Junio, Stefan, and
Johannes for their comments about v1 [1].

The main change since v1 is to accommodate Windows, which doesn't let
you replace a file using `rename()` if the file is currently mmapped.
This is unfortunate, because it means that Windows will never get the
O(N) → O(lg N) improvement for reading single references that more
capable systems can now enjoy.

The background was discussed on the mailing list [2]. The bottom line
is that on Windows, keeping the `packed-refs` lock mmapped would be
tantamount to holding reader lock on that file, preventing anybody
(even unrelated processes) from changing the `packed-refs` file while
it is mmapped. This is even worse than the situation for packfiles
(which is solved using `close_all_packs()`), because a packfile, once
created, never needs to be replaced—every packfile has a filename that
is determined from its contents. The worst that can happen if a
packfile is locked is that another process cannot remove it, but that
is not critical for correctness. The `packed-refs` file, on the other
hand, always has the same filename and needs to be overwritten for
correctness.

So the approach taken here is that a new compile-time option,
`MMAP_PREVENTS_DELETE`, is introduced. When this option is set, then
the `packed-refs` file is read quickly into memory then closed.

Even in that case, though, this branch brings significant performance
benefits, because instead of parsing the whole file and storing it
into lots of little objects in a `ref_cache` (which also involves a
lot of small memory allocations), we copy the verbatim contents of the
file into memory. Then we use the same binary search techniques to
find any references that we need to read, just as we would do if the
file were memory mapped. This means that we only have to fully parse
the references that we are interested in, and hardly have to allocate
any additional memory.

I did some more careful benchmarks of this code vs. Git 2.14.1 on a
repository that is not quite as pathological. The test repo has 110k
references that are fully packed in a `packed-refs` file that has the
`sorted` trait. The current version is compiled three ways:

* `NO_MMAP=YesPlease`—prevents all use of `mmap()`. This variant is
  O(N) when reading a single reference.

* `MMAP_PREVENTS_DELETE=YesPlease`—uses mmap for the initial read, but
  quickly copies the contents to heap-allocated memory and munmaps
  right away. This variant is also O(N) when reading a single
  reference.

* default (mmap enabled)—the `packed-refs` file is kept mmapped for as
  long as it is in use.

The commands that I timed were as follows:

# for-each-ref, warm cache:
$ git -C lots-of-refs for-each-ref --format="%(objectname) %(refname)" 
>/dev/null

# rev-parse, warm cache (this command was run 10 times then the total
# time divided by 10):
$ git -C lots-of-refs rev-parse --verify refs/remotes/origin/pr/38733

# rev-parse, cold cache (but git binary warm):
$ sync ; sudo sh -c 'echo 3 >/proc/sys/vm/drop_caches'; git rev-parse v1.0.0; 
time git -C lots-of-refs rev-parse --verify refs/remotes/origin/pr/38733

(Note that the `rev-parse` commands involve a handful of reference
lookups as the argument is DWIMmed.)

Results:

   for-each-ref rev-parse rev-parse
 warm cachewarm cachecold cache
   ----
v2.14.1   92 ms   23.7 ms 30 ms
NO_MMAP=YesPlease 83 ms3.4 ms 10 ms
MMAP_PREVENTS_DELETE=YesPlease82 ms3.5 ms 11 ms
default (mmap enabled)81 ms0.8 ms  6 ms

So this branch is a little bit faster at iterating over all
references, but it really has big advantages when looking up single
references. The advantage is smaller if `NO_MMAP` or
`MMAP_PREVENTS_DELETE` is set, but is still quite significant.

This branch is also available from my fork on GitHub as branch
`mmap-packed-refs`.

My main uncertainties are:

1. Does this code actually work on Windows?

2. Did I implement the new compile-time option correctly? (I just
   cargo-culted some existing options.) Is there some existing option
   that I could piggy-back off of instead of adding a new one?

3. Is a compile-time option sufficient, or would the `mmap()` option
   need to be configurable at runtime, or even tested at repository
   create time as is done for some other filesystem properties in
   `create_default_files()`?

Michael

[1] https://public-inbox.org/git/cover.1505319366.git.mhag...@alum.mit.edu/
[2] 
https://public-inbox.org/git/alpine.DEB.2.21.1.1709142101560.4132@virtualbox/

https://public-inbox.org/git/alpine.DEB.2.21.1.1709150012550.219280@virtualbox/
[3] https://github.com/mhagger/git

Jeff King (1):
  

[PATCH v2 02/21] prefix_ref_iterator: break when we leave the prefix

2017-09-19 Thread Michael Haggerty
From: Jeff King 

If the underlying iterator is ordered, then `prefix_ref_iterator` can
stop as soon as it sees a refname that comes after the prefix. This
will rarely make a big difference now, because `ref_cache_iterator`
only iterates over the directory containing the prefix (and usually
the prefix will span a whole directory anyway). But if *hint, hint* a
future reference backend doesn't itself know where to stop the
iteration, then this optimization will be a big win.

Note that there is no guarantee that the underlying iterator doesn't
include output preceding the prefix, so we have to skip over any
unwanted references before we get to the ones that we want.

Signed-off-by: Jeff King 
Signed-off-by: Michael Haggerty 
---
 refs/iterator.c | 32 +++-
 1 file changed, 31 insertions(+), 1 deletion(-)

diff --git a/refs/iterator.c b/refs/iterator.c
index c475360f0a..bd35da4e62 100644
--- a/refs/iterator.c
+++ b/refs/iterator.c
@@ -287,6 +287,20 @@ struct prefix_ref_iterator {
int trim;
 };
 
+/* Return -1, 0, 1 if refname is before, inside, or after the prefix. */
+static int compare_prefix(const char *refname, const char *prefix)
+{
+   while (*prefix) {
+   if (*refname != *prefix)
+   return ((unsigned char)*refname < (unsigned 
char)*prefix) ? -1 : +1;
+
+   refname++;
+   prefix++;
+   }
+
+   return 0;
+}
+
 static int prefix_ref_iterator_advance(struct ref_iterator *ref_iterator)
 {
struct prefix_ref_iterator *iter =
@@ -294,9 +308,25 @@ static int prefix_ref_iterator_advance(struct ref_iterator 
*ref_iterator)
int ok;
 
while ((ok = ref_iterator_advance(iter->iter0)) == ITER_OK) {
-   if (!starts_with(iter->iter0->refname, iter->prefix))
+   int cmp = compare_prefix(iter->iter0->refname, iter->prefix);
+
+   if (cmp < 0)
continue;
 
+   if (cmp > 0) {
+   /*
+* If the source iterator is ordered, then we
+* can stop the iteration as soon as we see a
+* refname that comes after the prefix:
+*/
+   if (iter->iter0->ordered) {
+   ok = ref_iterator_abort(iter->iter0);
+   break;
+   } else {
+   continue;
+   }
+   }
+
if (iter->trim) {
/*
 * It is nonsense to trim off characters that
-- 
2.14.1



  1   2   >