Re: [PATCH v6 00/15] rebase -i: offer to recreate commit topology

2018-04-17 Thread Sergey Organov
Hi Jacob,

Jacob Keller  writes:

> On Wed, Apr 11, 2018 at 10:42 PM, Sergey Organov  wrote:
>> Hi Jacob,
>>
>> Jacob Keller  writes:
>>> On Wed, Apr 11, 2018 at 6:13 AM, Sergey Organov  wrote:
 It was rather --recreate-merges just a few weeks ago, and I've seen
 nobody actually commented either in favor or against the
 --rebase-merges.

 git rebase --rebase-merges

>>>
>>> I'm going to jump in here and say that *I* prefer --rebase-merges, as
>>> it clearly mentions merge commits (which is the thing that changes).
>>
>> OK, thanks, it's fair and the first argument in favor of --rebase-merges
>> I see.
>>
>
> I'd be ok with "--keep-merges" also. I don't like the idea of
> "flatten" as it, to me, means that anyone who wants to understand the
> option without prior knowledge must immediately read the man page or
> they will be confused. Something like "--rebase-merges" at least my
> coworkers got it instantly. The same could be said for "--keep-merges"
> too, but so far no one I asked said the immediately understood
> "--no-flatten".

If they got --rebase-merges instantly, they should already have known
what "rebase" and "merge" mean. If so, they are likely Git users that
are already familiar with "git rebase" and thus at least heard about a
buddy called --preserve-merges. If it's the case indeed, the outcome
you've got was rather predictable, me thinks.

Now, what are the consequences?

When pleasing maximum number of users of --preserve-merges (and probably
--recreate-merges) is number one target of design, while the rest of
issues are secondary, being in favor of --rebase-merges, --keep-merges,
or ---merges is only natural indeed.

However, I don't believe meeting user expectations should be the number
one criteria of a good design. Sound technical design should come first,
and meeting user expectations, provided they don't contradict the
design, only second. That's how Git was born, that's how it should
continue to evolve. Going in reverse direction, from user expectations
to design, will give us Bzr, not Git.

In discussing of these patch series though I rather see care for user
expectations or preferences being used as an excuse for questionable
design all the time. That's what actually bothers me much more than
choosing particular names for particular options.

Narrowing back to the topic, don't you see, honestly, that there is
something wrong with:

git rebase --rebase-merges

that is supposedly easy to understand even without referring to the
manual, yet when you do happen to refer to the manual, you suddenly
realize it's not that easy to understand:

--rebase-merges[=(rebase-cousins|no-rebase-cousins)]
Rebase merge commits instead of flattening the history by replaying
merges. Merge conflict resolutions or manual amendments to merge
commits are not rebased automatically, but have to be applied
manually.

???

Please read the description. Actually read as if you never knew what's
all this about.

Why does it use "flattening the history" that is supposedly hard to
understand to explain "--rebase-merges" that is supposedly easy to
understand? How comes? And if it's actually a good explanation, why
didn't author just call the option --no-flatten-history, to match its
description?

Next, what is "replaying merges", exactly? That's explaining one term
with another that has not being explained and sounds being even more
vague.

Further, "Merge conflict resolutions or manual amendments to merge
commits are not rebased automatically, but have to be applied manually."
is mutually exclusive with "Rebase merge commits", making all this even
more messy. A merge commit is just content with multiple parents, and
`git rebase`, by definition, reapplies the changes the content
introduces. Any "amendments" or "resolutions" that could have been
happening (or not) when that commit was being created are entirely
irrelevant.

Further yet it goes with:

"By default, or when `no-rebase-cousins` was specified, commits which do
not have `` as direct ancestor will keep their original branch
point."

Really? What does it actually mean? What is "commit branch point",
exactly? What "direct ancestor" means in this context, exactly? Provided
even when I do know what the option actually does, the description looks
wrong, how it could explain anything?

Having all this right here in the patch series, you guys try to convince
me that it should not be fixed? That it meets user expectations? You
know what? I, as a user, have somewhat higher expectations.

Below is my final attempt at actually defining a sane alternative. If
you still find this approach inferior, please feel free to ignore it. I
added "history" at the end of original --no-flatten, as a courtesy to
user expectations, as you seem to prefer more verbose names:

--

--flatten-history
Flatten rebased history by 

Re: [PATCH] worktree: accept -f as short for --force for removal

2018-04-17 Thread Eric Sunshine
On Tue, Apr 17, 2018 at 8:17 PM, Junio C Hamano  wrote:
> Eric Sunshine  writes:
>> Makes sense. A possible rewrite (of the entire commit message):
>>
>> worktree: remove: recognize -f as short for --force
>>
>> Many commands support a --force option, frequently abbreviated as
>> -f, however, "git worktree remove"'s hand-rolled OPT_BOOL forgets
>> to recognize the short form, despite git-worktree.txt documenting
>> -f as supported. Replace OPT_BOOL with OPT__FORCE, which provides
>> -f for free, and makes 'remove' consistent with 'add' option
>> parsing (which also specifies the PARSE_OPT_NOCOMPLETE flag).
>>
> Looks better.  I am not sure if s/--force/-f/ in the synopsis
> section is warranted, but '-f' is commonly understood as '--force'
> (and that is the point of this patch after all), so it is probably
> an improvement to be briefer.

I meant to mention the synopsis change in the rewritten commit
message. The s/--force/-f/ for 'remove' makes it consistent with the
how it's shown for 'add' in the synopsis. (Changing it the other way,
so 'add' shows --force would also work.)


Re: [PATCH v1 0/5] Allocate cache entries from memory pool

2018-04-17 Thread Junio C Hamano
Jameson Miller  writes:

> This patch series improves the performance of loading indexes by
> reducing the number of malloc() calls. ...
>
> Jameson Miller (5):
>   read-cache: teach refresh_cache_entry to take istate
>   Add an API creating / discarding cache_entry structs
>   mem-pool: fill out functionality
>   Allocate cache entries from memory pools
>   Add optional memory validations around cache_entry lifecyle
>
>  apply.c|  26 +++---
>  blame.c|   5 +-
>  builtin/checkout.c |   8 +-
>  builtin/difftool.c |   8 +-
>  builtin/reset.c|   6 +-
>  builtin/update-index.c |  26 +++---
>  cache.h|  40 -
>  git.c  |   3 +
>  mem-pool.c | 136 -
>  mem-pool.h |  34 
>  merge-recursive.c  |   4 +-
>  read-cache.c   | 229 
> +++--
>  resolve-undo.c |   6 +-
>  split-index.c  |  31 +--
>  tree.c |   4 +-
>  unpack-trees.c |  27 +++---
>  16 files changed, 476 insertions(+), 117 deletions(-)
>
>
> base-commit: cafaccae98f749ebf33495aec42ea25060de8682

I couldn't quite figure out what these five patches were based on,
even with this line.  Basing on and referring to a commit that is
not part of our published history with "base-commit" is not all that
useful to others.

Offhand without applying these patches and viewing the changes in
wider contexts, one thing that makes me wonder is how the two
allocation schemes can be used with two implementations of free().
Upon add_index_entry() that replaces an index entry for an existing
path, we'd discard an entry that was originally read as part of
read_cache().  If we do that again, the second add_index_entry()
will be discading, in its call to replace_index_entry(), the entry
that was allocated by the caller of the first add_index_entry()
call.  And replace_index_entry() would not have a way to know from
which allocation the entry's memory came from.

Perhaps it is not how you are using the "transient" stuff, and from
the comment in 2/5, it is for "entries that are not meant to go into
the index", but then higher stage index entries in unpack_trees seem
to be allocated via the "transient" stuff, so I am not sure what the
plans are for things like merge_recursive() that uses unpack_trees()
to prepare the index and then later "fix it up" by further futzing
around the index to adjust for renames it finds, etc.

Let me read it fully once we know where these patches are to be
applied, but before that I cannot say much about them X-<.

Thanks.



Re: [PATCH 0/4] doc: cleaning up instances of \--

2018-04-17 Thread Junio C Hamano
Martin Ågren  writes:

> This is a patch series to convert \-- to -- in our documentation. The
> first patch is a reiteration of 1c262bb7b2 (doc: convert \--option to
> --option, 2015-05-13) to fix some instances that have appeared since.
> The other three patches deal with standalone "\--" which we can't
> always turn into "--" since it can be rendered as an em dash.

All looked sensible.  As you mentioned in [2/4], "\--::" that is
part of an enumulation appear in documentation for about a dozen
commands after the series, but I do not think we can avoid it.

One thing that makes me wonder related to these patches is if a
newer AsciiDoc we assume lets us do without {litdd} macro.  This
series and our earlier effort like 1c262bb7 ("doc: convert \--option
to --option", 2015-05-13) mentions that "\--word" is less pleasant
on the eyes than "--word", but the ugliness "two{litdd}words" has
over "two--words" is far worse than that, so...

Thanks, will queue.




Re: [PATCH/RFC] completion: complete all possible -no-

2018-04-17 Thread Junio C Hamano
Nguyễn Thái Ngọc Duy   writes:

> So the other half of this patch, the part in git-completion.bash, is
> to uncomplete --no- options. When you do "git checkout --",
> instead of displaying all --no- options, this patch simply displays
> one item: the --no- prefix. If you do "git checkout --no-" then
> all negative options are displayed. This helps reduce completable
> options quite efficiently.

Clever.

> Of course life is not that simple, we do have --no- options by default
> sometimes (taking priority over the positive form), e.g. "git clone
> --no-checkout". Collapsing all --no-options into --no- would be a
> regression.
>
> To avoid it, the order of options --git-completion-helper returns does
> matter. The first 4 negative options are not collapsed. Only options
> after the 4th are. Extra --no- options are always printed at the end,
> after all the --no- defined in struct option, this kinda works. Not
> pretty but works.

So, the earlier mention of "clone --no-checkout" sounded about not
losing this historical practice, but (desirabilty of magic number 4
aside) this "show first handful of --no-foo" feature is not about
historical practice but is forward looking, in the sense that you do
not mark "important" negated options in the source, which would be a
way to handle the histrical "clone --no-checkout", but let the
machinery mechanically choose among --no-foo (with the stupid choice
criterion "first four are shown").  That allows other commands to
have many --no-foo form without overwhelming the choices, but I am
not sure if it is much better than a possible alternative of only
showing --no-foo for more "important" foo's when show_gitcomp() is
asked to list all of things.  It would certainly be a more involved
solution, that might require an update to the way how the choices
are precomputed (you'd end up having to keep a separate "use this
list when completing '--no-'" in addition to the normal list).

In any case, count this as a vote to support an update in this
direction.  A quite promising work ;-)

Thanks.


Re: [PATCH] docs/git-gc: fix minor rendering issue

2018-04-17 Thread Junio C Hamano
SZEDER Gábor  writes:

> An unwanted single quote character in the paragraph documenting the
> 'gc.aggressiveWindow' config variable prevented the name of that
> config variable from being rendered correctly, ever since that piece
> of docs was added in 0d7566a5ba (Add --aggressive option to 'git gc',
> 2007-05-09).
>
> Remove that single quote.
>
> Signed-off-by: SZEDER Gábor 
> ---
>  Documentation/git-gc.txt | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/Documentation/git-gc.txt b/Documentation/git-gc.txt
> index 3126e0dd00..7c8a2edd48 100644
> --- a/Documentation/git-gc.txt
> +++ b/Documentation/git-gc.txt
> @@ -129,7 +129,7 @@ The optional configuration variable `gc.aggressiveWindow` 
> controls how
>  much time is spent optimizing the delta compression of the objects in
>  the repository when the --aggressive option is specified.  The larger
>  the value, the more time is spent optimizing the delta compression.  See
> -the documentation for the --window' option in linkgit:git-repack[1] for
> +the documentation for the --window option in linkgit:git-repack[1] for
>  more details.  This defaults to 250.
>  
>  Similarly, the optional configuration variable `gc.aggressiveDepth`

Thanks, will queue.  We might want to `monospace` quote these
options that are to be literally typed by the users, but that is
better left to another patch.



Re: [PATCH 01/11] t9902-completion: add tests demonstrating issues with quoted pathnames

2018-04-17 Thread Junio C Hamano
SZEDER Gábor  writes:

>>> +test_expect_failure 'complete files - quoted characters on cmdline' '
>>> + test_when_finished "rm -r \"New(Dir\"" &&
>>
>> This does not use -rf unlike the previous one?
>
> Noted.
>
> The lack of '-f' is leftover from early versions of these tests, when I
> had a hard time getting the quoting-escaping right.  Without the '-f'
> 'rm' errored out when I messed up, and the error message helpfully
> contained the path it wasn't able to delete.

Sounds like we do not want 'f' from both tests, then?  I think it is
OK either way.

>>> +if test_have_prereq !MINGW &&
>>> +   mkdir $'New\034Special\035Dir' 2>/dev/null &&
>>> +   touch $'New\034Special\035Dir/New\036Special\037File' 2>/dev/null
>>
>> The $'...' quote is bash-ism, but this is about testing bash
>> completion, so as long as we make sure non-bash shells won't touch
>> this part of the test, it is OK.
>>
>> Do we want to test a more common case of a filename that is two
>> words with SP in between, i.e.
>>
>> $ >'hello world' && git add hel
>>
>> or is it known to work just fine without quoting/escaping (because
>> the funny we care about is output from ls-files and SP is not special
>> in its one-item-at-a-time-on-a-line output) and not worth checking?
>
> This particular case already works, even without this patch series.

I was more wondering about preventing regressions---"it worked
without this patch series, but now it is broken" is what I was
worried about.

> The problems start when you want to complete the filename after a space,
> e.g. 'hello\ w was the first thing I tried to write a test for, but it didn't work out:
> inside the 'test_completion' helper function the space acts as
> separator, and the completion script then sees 'hello\' and 'w' as two
> separate words.

Hmph.  That is somewhat unfortunate.


Re: What's cooking in git.git (Apr 2018, #02; Tue, 17)

2018-04-17 Thread Junio C Hamano
Stefan Beller  writes:

>>  What's the doneness of this thing?  I didn't recall seeing any
>>  response, especially ones that demonstrated the reviewer carefully
>>  read and thought about the issues surrounding the code.  Not that I
>>  spotted any problems in these patches myself, though.
>
> Stolee and Brandon provided a "quick LGTM" type of review
> https://public-inbox.org/git/20180409232536.gb102...@google.com/
> https://public-inbox.org/git/9ddfee7e-025a-79c9-8d6b-700c65a14...@gmail.com/

Yup.  Giving positive reviews is harder than giving constructive
criticism.  Much harder.  

As readers cannot tell from a "quick LGTM" between "I didn't read it
but it did not smell foul" and "I read it thoroughly, understood how
the solution works, it was presented well, and agree with the design
and implementation---there is nothing to add", the reviewers need to
come up with some way to express that it is the latter case rather
than the former.

I would not claim that I've perfected my technique to do so, but
when responding to such a "good" series, I rephrase the main idea in
the series in my own words to show that I as a reviewer read the
series well enough to be able to do so, perhaps with comparison with
possible alternatives I could think of and dicussion to argue that
the solution presented in the series is better, in an attempt to
demonstrate that I am qualified to say "this one is good" with good
enough understanding of both the issue the series addresses and the
solution in the series.


Re: [PATCH] send-email: avoid duplicate In-Reply-To/References

2018-04-17 Thread Eric Wong
Stefan Agner  wrote:
> This addresses the issue reported here:
> https://public-inbox.org/git/997160314bbafb3088a401f1c09cc...@agner.ch/

Thanks for bringing this up.

> --- a/git-send-email.perl
> +++ b/git-send-email.perl
> @@ -1642,10 +1642,15 @@ foreach my $t (@files) {
>   elsif (/^Content-Transfer-Encoding: (.*)/i) {
>   $xfer_encoding = $1 if not defined 
> $xfer_encoding;
>   }
> + elsif (/^In-Reply-To: (.*)/i) {
> + $in_reply_to = $1;
> + }
> + elsif (/^References: (.*)/i) {
> + $references = $1;
> + }

References: can span multiple lines with --thread=deep in format-patch
(technically any header can be, but References: is common)


Re: [PATCH 2/2] builtin/blame: highlight recently changed lines

2018-04-17 Thread Junio C Hamano
Eric Sunshine  writes:

> On Tue, Apr 17, 2018 at 5:30 PM, Stefan Beller  wrote:
>> Choose a different color for dates and imitate a 'temperature cool down'
>> depending upon age.
>>
>> Originally I had planned to have the temperature cooldown dependent on
>> the age of the project or file for example, as that might scale better,
>> but that can be added on top of this commit, e.g. instead of giving a
>> date, you could imagine giving a percentage that would be the linearly
>> interpolated between now and the beginning of the file.
>>
>> Signed-off-by: Stefan Beller 
>>
>> # Conflicts:
>> #   builtin/blame.c
>
> Meh.

Sloppy.

Will edit it out.

Thanks.


Re: [PATCH 1/2] builtin/blame: dim uninteresting metadata lines

2018-04-17 Thread Junio C Hamano
Stefan Beller  writes:

> @@ -375,6 +378,7 @@ static void emit_other(struct blame_scoreboard *sb, 
> struct blame_entry *ent, int
>   struct commit_info ci;
>   char hex[GIT_MAX_HEXSZ + 1];
>   int show_raw_time = !!(opt & OUTPUT_RAW_TIMESTAMP);
> + const char *color = NULL, *reset = NULL;
>  
>   get_commit_info(suspect->commit, , 1);
>   oid_to_hex_r(hex, >commit->object.oid);
> @@ -384,6 +388,18 @@ static void emit_other(struct blame_scoreboard *sb, 
> struct blame_entry *ent, int
>   char ch;
>   int length = (opt & OUTPUT_LONG_OBJECT_NAME) ? GIT_SHA1_HEXSZ : 
> abbrev;
>  
> + if (opt & OUTPUT_COLOR_LINE) {
> + if (cnt > 0) {
> + color = repeated_meta_color;
> + reset = GIT_COLOR_RESET;
> + } else  {
> + color = "";
> + reset = "";

Shouldn't these be NULL as you do not want to fputs() these when not
in painting mode anyway?  Which would make it consistent with ...

> + }
> + }
> + if (color)
> + fputs(color, stdout);
> +

... this thing which otherwise needs to be "if (color && *color)",
but if you do NULL, can be left as-is ;-)

> @@ -433,6 +449,8 @@ static void emit_other(struct blame_scoreboard *sb, 
> struct blame_entry *ent, int
>   printf(" %*d) ",
>  max_digits, ent->lno + 1 + cnt);
>   }
> + if (reset)
> + fputs(reset, stdout);

Likewise.


Re: [PATCH] worktree: accept -f as short for --force for removal

2018-04-17 Thread Junio C Hamano
Eric Sunshine  writes:

> On Tue, Apr 17, 2018 at 2:19 PM, Stefan Beller  wrote:
>> The force flag is very common in many commands and is commonly
>> abbreviated with '-f', so add that to worktree removal, too by using
>> OPT__FORCE instead of a self cooked OPT_BOOL for force.
>
> The missing bit of this sentence:
>
> ...self cooked OPT_BOOL for force which forgets '-f'.
>
>> While at it, add the PARSE_OPT_NOCOMPLETE flag to the force command line
>> option as forcing a removal may lose files.
>>
>> The short form of "-f" is already documented in the man page,
>> so we do not have to adjust the docs.
>
> Makes sense. A possible rewrite (of the entire commit message):
>
> worktree: remove: recognize -f as short for --force
>
> Many commands support a --force option, frequently abbreviated as
> -f, however, "git worktree remove"'s hand-rolled OPT_BOOL forgets
> to recognize the short form, despite git-worktree.txt documenting
> -f as supported. Replace OPT_BOOL with OPT__FORCE, which provides
> -f for free, and makes 'remove' consistent with 'add' option
> parsing (which also specifies the PARSE_OPT_NOCOMPLETE flag).
>
>> Helped-by: Eric Sunshine 
>> Signed-off-by: Stefan Beller 

Looks better.  I am not sure if s/--force/-f/ in the synopsis
section is warranted, but '-f' is commonly understood as '--force'
(and that is the point of this patch after all), so it is probably
an improvement to be briefer.

Thanks, both.

>
> The patch itself looks good. Thanks. With or without the above rewrite
> or minor adjustment, this patch is:
>
> Reviewed-by: Eric Sunshine 


Re: [PATCH 2/2] builtin/blame: highlight recently changed lines

2018-04-17 Thread Junio C Hamano
Stefan Beller  writes:

> On Mon, Apr 16, 2018 at 8:29 PM, Junio C Hamano  wrote:
>> It seems that this
>>
>> $ git -c color.blame.repeatedlines=cyan blame --heated-lines builtin/blame.c
>>
>> refuses to run.
>>
>> Would it work if the configuration is in .git/config instead, or
>> would it forever disable --heated-lines once somebody choses to use
>> --color-lines feature by default by configuring it in?
>
> That is the unfortunate part of this series, I have not figured out how to
> treat these two options at the same time.


Perhaps I wasn't clear enough, but I did not want to use both at the
same time.  "git blame --color-lines --heated-lines" that errors out
saying these cannot be used at the same time is an acceptable
limitation.

My sole complaint was that just like command line is used to
override (weaker) configs, the wish to use "repeatedlines" painting
by default expressed in the configuration form should be overriden
when there is an explicit command line --heated-lines option that is
incompatible with it.

In this particular case, you might be able to come up with a scheme
where both can be made effective at the same time, but the principle
still stands, and that is the more important lesson I'd like to see
you learn.

Thanks.



Re: man page for "git remote set-url" seems confusing/contradictory

2018-04-17 Thread Junio C Hamano
Jacob Keller  writes:

> Maybe something like:
>
> "Note that the push URL and the fetch URL, even though they can be set
> differently, are expected to refer to the same repository. For
> example, the fetch URL might use an unauthenticated transport, while
> the push URL generally requires authentication" Then follow this bit
> with the mention of multiple remotes.
>
> (I think "repository" conveys the meaning better than "place".
> Technically, the URLs can be completely different as long as they end
> up contacting the same real server repository.)

Sounds sensible.  Let's see if there is any further input and then
somebody pleaes wrap this up in a final patch ;-)


Re: [PATCH v2] completion: reduce overhead of clearing cached --options

2018-04-17 Thread Junio C Hamano
SZEDER Gábor  writes:

> To get the names of all '$__git_builtin_*' variables caching --options
> of builtin commands in order to unset them, 8b0eaa41f2 (completion:
> clear cached --options when sourcing the completion script,
> 2018-03-22) runs a 'set |sed s///' pipeline.  This works both in Bash
> and in ZSH, but has a higher than necessary overhead with the extra
> processes.
>
> In Bash we can do better: run the 'compgen -v __gitcomp_builtin_'
> builtin command, which lists the same variables, but without a
> pipeline and 'sed' it can do so with lower overhead.
> ZSH will still continue to run that pipeline.
> 
> This change also happens to work around an issue in the default Bash
> ...
> Updated the commit message to explicitly mention that ZSH is
> unaffected.  The patch is the same.

Thanks.


Re: [PATCH 01/11] t9902-completion: add tests demonstrating issues with quoted pathnames

2018-04-17 Thread SZEDER Gábor
On Wed, Apr 18, 2018 at 1:32 AM, SZEDER Gábor  wrote:
> On Tue, Apr 17, 2018 at 5:48 AM, Junio C Hamano  wrote:
>> SZEDER Gábor  writes:
>>
>>> Do any more new tests need FUNNYNAMES* prereq?
>>
>> Hmph, all of these look like they involve some funnynames ;-)
>
> Well, I can' create a directory with a '|' in its name on FAT32 (on
> Linux), so this needs FUNNYNAMES prereq, too.

Or, on second thought, using a different, more widely usable character
would be better, so the test can be run on more platforms.


>> Do we want to test a more common case of a filename that is two
>> words with SP in between, i.e.
>>
>> $ >'hello world' && git add hel
>>
>> or is it known to work just fine without quoting/escaping (because
>> the funny we care about is output from ls-files and SP is not special
>> in its one-item-at-a-time-on-a-line output) and not worth checking?
>
> This particular case already works, even without this patch series.
>
> The problems start when you want to complete the filename after a space,
> e.g. 'hello\ w was the first thing I tried to write a test for, but it didn't work out:
> inside the 'test_completion' helper function the space acts as
> separator, and the completion script then sees 'hello\' and 'w' as two
> separate words.

On another second thought, a test for the already working 'hel'
case could make sure that we won't mess up the value of IFS when filling
COMPREPLY.


Re: [PATCH 01/11] t9902-completion: add tests demonstrating issues with quoted pathnames

2018-04-17 Thread SZEDER Gábor
On Tue, Apr 17, 2018 at 5:48 AM, Junio C Hamano  wrote:
> SZEDER Gábor  writes:
>
>> Do any more new tests need FUNNYNAMES* prereq?
>
> Hmph, all of these look like they involve some funnynames ;-)

Well, I can' create a directory with a '|' in its name on FAT32 (on
Linux), so this needs FUNNYNAMES prereq, too.

>> +test_expect_failure 'complete files - escaped characters on cmdline' '
>> + test_when_finished "rm -rf \"New|Dir\"" &&
>> + mkdir "New|Dir" &&
>> + >"New|Dir/New" &&
>> +
>> + test_completion "git test-path-comp N" \
>> + "New|Dir" &&# Bash will turn this into "New\|Dir/"
>> + test_completion "git test-path-comp New\\|D" \
>> + "New|Dir" &&
>> + test_completion "git test-path-comp New\\|Dir/N" \
>> + "New|Dir/New" && # Bash will turn this into
>> + # "New\|Dir/New\ "
>> + test_completion "git test-path-comp New\\|Dir/New\\" \
>> + "New|Dir/New"
>> +'
>> +
>> +test_expect_failure 'complete files - quoted characters on cmdline' '
>> + test_when_finished "rm -r \"New(Dir\"" &&
>
> This does not use -rf unlike the previous one?

Noted.

The lack of '-f' is leftover from early versions of these tests, when I
had a hard time getting the quoting-escaping right.  Without the '-f'
'rm' errored out when I messed up, and the error message helpfully
contained the path it wasn't able to delete.

>> + mkdir "New(Dir" &&
>> + >"New(Dir/New)File.c" &&
>> +
>> + test_completion "git test-path-comp \"New(D" "New(Dir" &&
>> + test_completion "git test-path-comp \"New(Dir/New)F" \
>> + "New(Dir/New)File.c"
>> +'
>
> OK.
>
>> +test_expect_failure 'complete files - UTF-8 in ls-files output' '
>> + test_when_finished "rm -r árvíztűrő" &&
>> + mkdir árvíztűrő &&
>> + >"árvíztűrő/Сайн яваарай" &&
>> +
>> + test_completion "git test-path-comp á" "árvíztűrő" &&
>> + test_completion "git test-path-comp árvíztűrő/С" \
>> + "árvíztűrő/Сайн яваарай"
>> +'
>> +
>> +if test_have_prereq !MINGW &&
>> +   mkdir 'New\Dir' 2>/dev/null &&
>> +   touch 'New\Dir/New"File.c' 2>/dev/null
>> +then
>> + test_set_prereq FUNNYNAMES_BS_DQ
>> +else
>> + say "Your filesystem does not allow \\ and \" in filenames."
>> + rm -rf 'New\Dir'
>> +fi
>> +test_expect_failure FUNNYNAMES_BS_DQ \
>> +'complete files - C-style escapes in ls-files output' '
>> + test_when_finished "rm -r \"NewDir\"" &&
>> +
>> + test_completion "git test-path-comp N" "New\\Dir" &&
>> + test_completion "git test-path-comp NewD" "New\\Dir" &&
>> + test_completion "git test-path-comp NewDir/N" \
>> + "New\\Dir/New\"File.c" &&
>> + test_completion "git test-path-comp NewDir/New\\\"F" \
>> + "New\\Dir/New\"File.c"
>> +'
>> +
>> +if test_have_prereq !MINGW &&
>> +   mkdir $'New\034Special\035Dir' 2>/dev/null &&
>> +   touch $'New\034Special\035Dir/New\036Special\037File' 2>/dev/null
>
> The $'...' quote is bash-ism, but this is about testing bash
> completion, so as long as we make sure non-bash shells won't touch
> this part of the test, it is OK.
>
> Do we want to test a more common case of a filename that is two
> words with SP in between, i.e.
>
> $ >'hello world' && git add hel
>
> or is it known to work just fine without quoting/escaping (because
> the funny we care about is output from ls-files and SP is not special
> in its one-item-at-a-time-on-a-line output) and not worth checking?

This particular case already works, even without this patch series.

The problems start when you want to complete the filename after a space,
e.g. 'hello\ w

Re: GDPR compliance best practices?

2018-04-17 Thread Peter Backes
On Tue, Apr 17, 2018 at 11:38:26PM +0200, Ævar Arnfjörð Bjarmason wrote:
> I've been loosely following a similar discussion around blockchains and
> my understanding of the situation is that for a project such as say
> Linux the GDPR gives you this potential out for that[1]:
> 
> "the personal data are no longer necessary in relation to the
> purposes for which they were collected or otherwise processed"
> 
> I.e. you understand that when you submit a patch to linux.git how it's
> going to get used, and that it's in a storage system that isn't going to
> be pruned just because you ask for it.
> [...]
> You can make a compelling case that for say submitting your data to the
> Bitcoin blockhcain the above quote from article 17 overrides it

Well, you're quoting from lit. a but there's also lit. b to f! It says 
"one of the following grounds applies", not "all of ...".

> This is very different from you say joining a company, committing to its
> internal git repo, and your name being there in perpetuity, or choosing
> to submit a patch to linux.git or git.git.
>
> I'd think that would be handled the same way as a structural engineering
> firm being able to record in perpetuity who it was that drew up the
> design for some bridge.

Internal repo is entirely unproblematic, since you don't need consent 
for doing that. It is covered by Art. 6 (1) lit. f.

The problem is public repos. Publishing employee information is 
generally considered not to be covered by Art. 6 (1) lit. f. After all, 
you can easily publish the software but not the repo.

> I don't think it's plausible that the GDPR,
> which is probably mainly going to be about consumer protection, is going
> to concern itself with that in practice.

Oh, no, GDPR is about privacy in general. It's not only about consumer 
protection. It applies in the same way to employees in relation to 
their employer and to citizens in relation to the authorities, and to 
open source contributors in relation to the projects, or to any other 
data processing outside family and friends (Art. 2 (2) lit. c).

I am inclined to assume that Art. 6 (1) lit. b might be the solution, 
since the licenses typically demand a history of changes to be 
distributed with the program (for example, GPLv3 section 5 a). After 
all, the author generally wants to be given credit for his changes and 
it can be assumed that this one of the conditions for licensing the 
work in the first place.

On the other hand, of course, the author could waive the condition at 
any time, which means Art. 6 (1) lit. b wouldn't apply anymore and 
you'd have the same issue as with consent-based processing of the 
information (lit. a).

Best wishes
Peter


-- 
Peter Backes, r...@helen.plasma.xg8.de


Re: [PATCH v1 2/5] Add an API creating / discarding cache_entry structs

2018-04-17 Thread Ben Peart



On 4/17/2018 12:34 PM, Jameson Miller wrote:

Add an API around managing the lifetime of cache_entry structs. Abstracting
memory management details behind an API will allow for alternative memory
management strategies without affecting all the call sites.  This commit does
not change how memory is allocated / freed. A later commit in this series will
allocate cache entries from memory pools as appropriate.

Motivation:
It has been observed that the time spent loading an index with a large
number of entries is partly dominated by malloc() calls. This
change is in preparation for using memory pools to reduce the number
of malloc() calls made when loading an index.

This API makes a distinction between cache entries that are intended for use
with a particular to an index and cache entries that are not. 


The wording here is awkward.  Did you mean "intended for use with a 
particular index?"


This enables us

to use the knowledge about how a cache entry will be used to make informed
decisions about how to handle the corresponding memory.
---
  apply.c|  26 ++--
  blame.c|   5 +--
  builtin/checkout.c |   8 ++--
  builtin/difftool.c |   8 ++--
  builtin/reset.c|   6 +--
  builtin/update-index.c |  26 ++--
  cache.h|  29 +-
  merge-recursive.c  |   2 +-
  read-cache.c   | 105 +++--
  resolve-undo.c |   6 ++-
  split-index.c  |   8 ++--
  tree.c |   4 +-
  unpack-trees.c |  27 -
  13 files changed, 166 insertions(+), 94 deletions(-)

diff --git a/apply.c b/apply.c
index 7e5792c996..47903f427b 100644
--- a/apply.c
+++ b/apply.c
@@ -4090,12 +4090,12 @@ static int build_fake_ancestor(struct apply_state 
*state, struct patch *list)
return error(_("sha1 information is lacking or useless "
   "(%s)."), name);
  
-		ce = make_cache_entry(patch->old_mode, oid.hash, name, 0, 0);

+   ce = make_index_cache_entry(, patch->old_mode, oid.hash, 
name, 0, 0);
if (!ce)
-   return error(_("make_cache_entry failed for path '%s'"),
+   return error(_("make_index_cache_entry failed for path 
'%s'"),
 name);
if (add_index_entry(, ce, ADD_CACHE_OK_TO_ADD)) {
-   free(ce);
+   index_cache_entry_discard(ce);


I personally prefer name symmetry.  To me, make_index_cache_entry() 
should be paired with discard_index_cache_entry().


The rest of this patch looks like a fairly mechanical refactoring with 
the biggest exception being the difference between the the 
*_index_cache_entry() APIs and the *_transient_cache_entry() APIs.


There are quite a few changes but I didn't see any instances that were 
missed or any errors.  I see that later patches will put verification 
code in place to detect if any were done incorrectly and to prevent 
regressions moving forward.


Overall, it looks correct and reasonable.



  
-struct cache_entry *make_cache_entry(unsigned int mode,

-   const unsigned char *sha1, const char *path, int stage,
-   unsigned int refresh_options)
+struct cache_entry *make_empty_index_cache_entry(struct index_state *istate, 
size_t len)
+{
+   return xcalloc(1, cache_entry_size(len));
+}
+
+struct cache_entry *make_empty_transient_cache_entry(size_t len)
+{
+   return xcalloc(1, cache_entry_size(len));
+}
+
+struct cache_entry *make_index_cache_entry(struct index_state *istate, 
unsigned int mode,
+   const unsigned char *sha1, const char *path,
+   int stage, unsigned int refresh_options)
  {
-   int size, len;
struct cache_entry *ce, *ret;
+   int len;
  
  	if (!verify_path(path)) {

error("Invalid path '%s'", path);
@@ -758,8 +767,7 @@ struct cache_entry *make_cache_entry(unsigned int mode,
}
  
  	len = strlen(path);

-   size = cache_entry_size(len);
-   ce = xcalloc(1, size);
+   ce = make_empty_index_cache_entry(istate, len);
  
  	hashcpy(ce->oid.hash, sha1);

memcpy(ce->name, path, len);
@@ -769,10 +777,34 @@ struct cache_entry *make_cache_entry(unsigned int mode,
  
  	ret = refresh_cache_entry(_index, ce, refresh_options);

if (ret != ce)
-   free(ce);
+   index_cache_entry_discard(ce);
+
return ret;
  }
  
+struct cache_entry *make_transient_cache_entry(unsigned int mode, const unsigned char *sha1,

+  const char *path, int stage)
+{
+   struct cache_entry *ce;
+   int len;
+
+   if (!verify_path(path)) {
+   error("Invalid path '%s'", path);
+   return NULL;
+   }
+
+   len = strlen(path);
+   ce = make_empty_transient_cache_entry(len);
+
+  

Re: [PATCH/RFC 0/5] Keep all info in command-list.txt in git binary

2018-04-17 Thread Philip Oakley

From: "Duy Nguyen"  : Tuesday, April 17, 2018 5:48 PM

On Tue, Apr 17, 2018 at 06:24:41PM +0200, Duy Nguyen wrote:
On Sun, Apr 15, 2018 at 11:21 PM, Philip Oakley  
wrote:
> From: "Duy Nguyen"  : Saturday, April 14, 2018 4:44 
> PM

>
>> On Thu, Apr 12, 2018 at 12:06 AM, Philip Oakley 
>> wrote:
>>>
>>> I'm only just catching up, but does/can this series also capture the
>>> non-command guides that are available in git so that the 'git 
>>> help -g'

>>> can
>>> begin to list them all?
>>
>>
>> It currently does not. But I don't see why it should not. This should
>> allow git.txt to list all the guides too, for people who skip "git
>> help" and go hard core mode with "man git". Thanks for bringing this
>> up.
>> --
>> Duy
>>
> Is that something I should add to my todo to add a 'guide' category 
> etc.?


I added it too [1]. Not sure if you want anything more on top though.


What I've seen is looking good - I've not had as much time as I'd like..

I'm not sure of the status of the git/generate-cmdlist.sh though. Should 
that also be updated, or did I miss that?

--
Philip



The "anything more" that at least I had in mind was something like
this. Though I'm not sure if it's a good thing to replace a hand
crafted section with an automatedly generated one. This patch on top
combines the "SEE ALSO" and "FURTHER DOCUMENT" into one with most of
documents/guides are extracted from command-list.txt

-- 8< --
diff --git a/Documentation/Makefile b/Documentation/Makefile
index 6232143cb9..3e0ecd2e11 100644
--- a/Documentation/Makefile
+++ b/Documentation/Makefile
@@ -292,6 +292,7 @@ doc.dep : $(docdep_prereqs) $(wildcard *.txt) 
build-docdep.perl


cmds_txt = cmds-ancillaryinterrogators.txt \
 cmds-ancillarymanipulators.txt \
+ cmds-guide.txt \
 cmds-mainporcelain.txt \
 cmds-plumbinginterrogators.txt \
 cmds-plumbingmanipulators.txt \
diff --git a/Documentation/cmd-list.perl b/Documentation/cmd-list.perl
index 5aa73cfe45..e158bd9b96 100755
--- a/Documentation/cmd-list.perl
+++ b/Documentation/cmd-list.perl
@@ -54,6 +54,7 @@ for (sort <>) {

for my $cat (qw(ancillaryinterrogators
 ancillarymanipulators
+ guide
 mainporcelain
 plumbinginterrogators
 plumbingmanipulators
diff --git a/Documentation/git.txt b/Documentation/git.txt
index 4767860e72..d60d2ae0c7 100644
--- a/Documentation/git.txt
+++ b/Documentation/git.txt
@@ -808,29 +808,6 @@ The index is also capable of storing multiple entries 
(called "stages")

for a given pathname.  These stages are used to hold the various
unmerged version of a file when a merge is in progress.

-FURTHER DOCUMENTATION
--
-
-See the references in the "description" section to get started
-using Git.  The following is probably more detail than necessary
-for a first-time user.
-
-The link:user-manual.html#git-concepts[Git concepts chapter of the
-user-manual] and linkgit:gitcore-tutorial[7] both provide
-introductions to the underlying Git architecture.
-
-See linkgit:gitworkflows[7] for an overview of recommended workflows.
-
-See also the link:howto-index.html[howto] documents for some useful
-examples.
-
-The internals are documented in the
-link:technical/api-index.html[Git API documentation].
-
-Users migrating from CVS may also want to
-read linkgit:gitcvs-migration[7].
-
-
Authors
---
Git was started by Linus Torvalds, and is currently maintained by Junio
@@ -854,11 +831,16 @@ the Git Security mailing list 
.


SEE ALSO

-linkgit:gittutorial[7], linkgit:gittutorial-2[7],
-linkgit:giteveryday[7], linkgit:gitcvs-migration[7],
-linkgit:gitglossary[7], linkgit:gitcore-tutorial[7],
-linkgit:gitcli[7], link:user-manual.html[The Git User's Manual],
-linkgit:gitworkflows[7]
+
+See the references in the "description" section to get started
+using Git.  The following is probably more detail than necessary
+for a first-time user.
+
+include::cmds-guide.txt[]
+
+See also the link:howto-index.html[howto] documents for some useful
+examples. The internals are documented in the
+link:technical/api-index.html[Git API documentation].

GIT
---
diff --git a/command-list.txt b/command-list.txt
index 1835f1a928..f26b8acd52 100644
--- a/command-list.txt
+++ b/command-list.txt
@@ -150,10 +150,14 @@ git-whatchanged 
ancillaryinterrogators

git-worktreemainporcelain
git-write-tree  plumbingmanipulators
gitattributes   guide
+gitcvs-migrationguide
+gitcli  guide
+gitcore-tutorialguide
giteveryday guide
gitglossary guide
gitignore   guide
gitmodules  guide
gitrevisionsguide
gittutorial guide
+gittutorial-2   guide

Re: [PATCH 1/1] git-p4: fix `sync --verbose` traceback due to 'fileSize'

2018-04-17 Thread Thandesha VK
Ah. I didn't realize the script is not using p4 sizes to get the size.
I assumed that it is using p4 sizes. Now I am looking at it using p4
-G print.
However, when the stack trace happened, I verified what is wrong and
found out that the fileSize key is not returned for "p4 -G sizes"
command.


So what I am saying is that we cannot use p4 sizes in this case to
solve the problem as even p4 sizes will fail with the same fileSize
not found stack trace.
We can continue as this is not as Critical. However it is a good idea
to let user know that they need to take action to correct this file at
the perforce side.

So irrespective of we are using p4 print or p4 sizes, the fileSize is
not returned in some cases and warning or aborting is something we
need to do.
Just ignoring may not be a good choice.

On Tue, Apr 17, 2018 at 2:38 PM, Mazo, Andrey  wrote:
> Thandesha,
>
> If I read your patch correctly, it adds a warning in case `p4 -G print` 
> doesn't return "fileSize" (not `p4 sizes`).
> I don't see `p4 sizes` being used by git-p4 at all.
> As I said earlier, for our ancient Perforce server, `p4 -G print` _never_ 
> returns "fileSize".
> So, it's definitely not a reason to abort.
>
> Thank you,
> Andrey
>
> From: Thandesha VK 
>> My fix is for the case where p4 -G sizes not returning the key and
>> value for fileSize. This can happen in some cases. Only option at that
>> point of time is to warn the user about the problematic file and keep
>> moving (or should we abort??)
>>
>> Thanks
>> Thandesha
>>
>> On Tue, Apr 17, 2018 at 2:18 PM, Mazo, Andrey  wrote:
>>> Luke,
>>>
>>> Thank you for reviewing and acking my patch!
>>> By the way, did you see Thandesha's proposed patch [1] to print a warning 
>>> in case of the missing "fileSize" attribute?
>>> Should we go that route instead?
>>> Or should we try harder to get the size by running `p4 -G sizes`?
>>>
>>> [1]  
>>> https://public-inbox.org/git/cajjpmi-plb4qcka5alkxa8b1vozfff+oaq0fguq9yviobrp...@mail.gmail.com/t/#m6053d2031020e08edd24ada6c9eb49721ebc4e27
>>>
>>> Thank you,
>>> Andrey
>>>
>>> From: Luke Diamand 
 On Tue, 17 Apr 2018, 09:22 Andrey Mazo,  wrote:
>  Perforce server 2007.2 (and maybe others) doesn't return "fileSize"
> attribute in its reply to `p4 -G print` command.
> This causes the following traceback when running `git p4 sync --verbose`:
> """
> Traceback (most recent call last):
>   File "/usr/libexec/git-core/git-p4", line 3839, in 
> main()
>   File "/usr/libexec/git-core/git-p4", line 3833, in main
> if not cmd.run(args):
>   File "/usr/libexec/git-core/git-p4", line 3567, in run
> self.importChanges(changes)
>   File "/usr/libexec/git-core/git-p4", line 3233, in importChanges
> self.commit(description, filesForCommit, branch, parent)
>   File "/usr/libexec/git-core/git-p4", line 2855, in commit
> self.streamP4Files(files)
>   File "/usr/libexec/git-core/git-p4", line 2747, in streamP4Files
> cb=streamP4FilesCbSelf)
>   File "/usr/libexec/git-core/git-p4", line 552, in p4CmdList
> cb(entry)
>   File "/usr/libexec/git-core/git-p4", line 2741, in 
> streamP4FilesCbSelf
> self.streamP4FilesCb(entry)
>   File "/usr/libexec/git-core/git-p4", line 2689, in streamP4FilesCb
> self.streamOneP4File(self.stream_file, self.stream_contents)
>   File "/usr/libexec/git-core/git-p4", line 2566, in streamOneP4File
> size = int(self.stream_file['fileSize'])
> KeyError: 'fileSize'
> """
>
> Fix this by omitting the file size information from the verbose print out.
> Also, don't use "self.stream_file" directly,
> but rather use passed in "file" argument.
> (which point to the same "self.stream_file" for all existing callers)
>
> Signed-off-by: Andrey Mazo 
> ---
>  git -p4.py | 8 ++--
>  1 file changed, 6 insertions(+), 2 deletions(-)
>
> diff --git a/git-p4.py b/git-p4.py
> index 7bb9cadc6..6f05f915a 100755
> --- a/git-p4.py
> +++ b/git-p4.py
> @@ -2566,8 +2566,12 @@ class P4Sync(Command, P4UserMap):
>  relPath = self.stripRepoPath(file['depotFile'], 
> self.branchPrefixes)
>  relPath = self.encodeWithUTF8(relPath)
>  if verbose:
> -size = int(self.stream_file['fileSize'])
> -sys.stdout.write('\r%s --> %s (%i MB)\n' % 
> (file['depotFile'], relPath, size/1024/1024))
> +size = file.get('fileSize', None)
> +if size is None:
> +sizeStr = ''
> +else:
> +sizeStr = ' (%i MB)' % (int(size)/1024/1024)
> +sys.stdout.write('\r%s --> %s%s\n' % (file['depotFile'], 

[PATCH v2] completion: reduce overhead of clearing cached --options

2018-04-17 Thread SZEDER Gábor
To get the names of all '$__git_builtin_*' variables caching --options
of builtin commands in order to unset them, 8b0eaa41f2 (completion:
clear cached --options when sourcing the completion script,
2018-03-22) runs a 'set |sed s///' pipeline.  This works both in Bash
and in ZSH, but has a higher than necessary overhead with the extra
processes.

In Bash we can do better: run the 'compgen -v __gitcomp_builtin_'
builtin command, which lists the same variables, but without a
pipeline and 'sed' it can do so with lower overhead.
ZSH will still continue to run that pipeline.

This change also happens to work around an issue in the default Bash
version shipped in macOS (3.2.57), reported by users of the Powerline
shell prompt, which was triggered by the same commit 8b0eaa41f2 as
well.  Powerline uses several Unicode Private Use Area code points to
represent some of its pretty text UI elements (arrows and what not),
and these are stored in the $PS1 variable.  Apparently the 'set'
builtin of said Bash version on macOS has issues with these code
points, and produces garbled output where Powerline's special symbols
should be in the $PS1 variable.  This, in turn, triggers the following
error message in the downstream 'sed' process:

  sed: RE error: illegal byte sequence

Other Bash versions, notably 4.4.19 on macOS via homebrew (i.e. a
newer version on the same platform) and 3.2.25 on CentOS (i.e. a
slightly earlier version, though on a different platform) are not
affected.  ZSH in macOS (the versions shipped by default or installed
via homebrew) or on other platforms isn't affected either.

With this patch neither the 'set' builtin is invoked to print garbage,
nor 'sed' to choke on it.

Issue-on-macOS-reported-by: Stephon Harris 
Issue-on-macOS-explained-by: Matthew Coleman 
Signed-off-by: SZEDER Gábor 
---

Updated the commit message to explicitly mention that ZSH is
unaffected.  The patch is the same.

 contrib/completion/git-completion.bash | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/contrib/completion/git-completion.bash 
b/contrib/completion/git-completion.bash
index b09c8a2362..4ef59a51be 100644
--- a/contrib/completion/git-completion.bash
+++ b/contrib/completion/git-completion.bash
@@ -282,7 +282,11 @@ __gitcomp ()
 
 # Clear the variables caching builtins' options when (re-)sourcing
 # the completion script.
-unset $(set |sed -ne 
's/^\(__gitcomp_builtin_[a-zA-Z0-9_][a-zA-Z0-9_]*\)=.*/\1/p') 2>/dev/null
+if [[ -n ${ZSH_VERSION-} ]]; then
+   unset $(set |sed -ne 
's/^\(__gitcomp_builtin_[a-zA-Z0-9_][a-zA-Z0-9_]*\)=.*/\1/p') 2>/dev/null
+else
+   unset $(compgen -v __gitcomp_builtin_)
+fi
 
 # This function is equivalent to
 #
-- 
2.17.0.366.gbe216a3084



Re: [PATCH 2/2] builtin/blame: highlight recently changed lines

2018-04-17 Thread Eric Sunshine
On Tue, Apr 17, 2018 at 5:30 PM, Stefan Beller  wrote:
> Choose a different color for dates and imitate a 'temperature cool down'
> depending upon age.
>
> Originally I had planned to have the temperature cooldown dependent on
> the age of the project or file for example, as that might scale better,
> but that can be added on top of this commit, e.g. instead of giving a
> date, you could imagine giving a percentage that would be the linearly
> interpolated between now and the beginning of the file.
>
> Signed-off-by: Stefan Beller 
>
> # Conflicts:
> #   builtin/blame.c

Meh.


Re: GDPR compliance best practices?

2018-04-17 Thread Ævar Arnfjörð Bjarmason

On Tue, Apr 17 2018, Peter Backes wrote:

> I'd like to ask whether anyone has best practices for achieving GDPR
> compliance for git repos? The GDPR will come into effect in the EU next
> month.
>
> In particular, how do you cope with the "Right to erasure" concerning
> entries in the history of your git repos?
>
> Erasing author names from the history changes the commit hashes.  It is
> well known that this leads to a lot of problems.  So I don't consider
> this a workable solution.
>
> And how do you justify publishing your employee's name/email as part of
> a git commit under GDPR rules in the first place?
>
> github has the following page mentioning the "Right to erasure" but
> AFAICS nothing about how it will be implemented
> https://about.gitlab.com/gdpr/
>
> Here are discussions I found but they do not really provide a solution:
> https://law.stackexchange.com/questions/24623/gdpr-git-history
> https://news.ycombinator.com/item?id=16509755

[Not a lawyer and all that]

I've been loosely following a similar discussion around blockchains and
my understanding of the situation is that for a project such as say
Linux the GDPR gives you this potential out for that[1]:

"the personal data are no longer necessary in relation to the
purposes for which they were collected or otherwise processed"

I.e. you understand that when you submit a patch to linux.git how it's
going to get used, and that it's in a storage system that isn't going to
be pruned just because you ask for it.

In combination with the "Conditions for consent"[2] this becomes a bit
more tricky. I.e. "The data subject shall have the right to withdraw his
or her consent at any time".

You can make a compelling case that for say submitting your data to the
Bitcoin blockhcain the above quote from article 17 overrides it, but can
you for other hash-based-on-hash systems like linux.git? Maybe, maybe
not. I think nobody really knows at this point.

What I do think is for sure is that there's not going to be any one size
fits all solution based on the underlying technology.

If I start storing my webserver access logs with IP information in a git
repo, I don't get to say "sorry git stores stuff this way, I don't want
to rebase it". No court's going to buy that, I've just gone out of my
way to use technology that circumvents the GDPR for no particularly good
reason.

This is very different from you say joining a company, committing to its
internal git repo, and your name being there in perpetuity, or choosing
to submit a patch to linux.git or git.git.

I'd think that would be handled the same way as a structural engineering
firm being able to record in perpetuity who it was that drew up the
design for some bridge. I don't think it's plausible that the GDPR,
which is probably mainly going to be about consumer protection, is going
to concern itself with that in practice.

There's a lot of middle ground in between those two
though. E.g. children are specially protected under the GDPR. Is Linus
going to say he doesn't want to rebase linux.git after some 14 year old
who regrets submitting code doesn't want his name there anymore? Who
knows.

Depending on such common cases maybe git itself should eventually
support some ways to work around the issues. E.g. we could have some
mode to always supply a fake name/e-mail, or make the notice
implicit_ident_advice() spews out somewhat scarier.

1. https://gdpr-info.eu/art-17-gdpr/

2. https://gdpr-info.eu/art-7-gdpr/


[PATCH] docs/git-gc: fix minor rendering issue

2018-04-17 Thread SZEDER Gábor
An unwanted single quote character in the paragraph documenting the
'gc.aggressiveWindow' config variable prevented the name of that
config variable from being rendered correctly, ever since that piece
of docs was added in 0d7566a5ba (Add --aggressive option to 'git gc',
2007-05-09).

Remove that single quote.

Signed-off-by: SZEDER Gábor 
---
 Documentation/git-gc.txt | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/Documentation/git-gc.txt b/Documentation/git-gc.txt
index 3126e0dd00..7c8a2edd48 100644
--- a/Documentation/git-gc.txt
+++ b/Documentation/git-gc.txt
@@ -129,7 +129,7 @@ The optional configuration variable `gc.aggressiveWindow` 
controls how
 much time is spent optimizing the delta compression of the objects in
 the repository when the --aggressive option is specified.  The larger
 the value, the more time is spent optimizing the delta compression.  See
-the documentation for the --window' option in linkgit:git-repack[1] for
+the documentation for the --window option in linkgit:git-repack[1] for
 more details.  This defaults to 250.
 
 Similarly, the optional configuration variable `gc.aggressiveDepth`
-- 
2.17.0.366.gbe216a3084



Re: [PATCH 1/1] git-p4: fix `sync --verbose` traceback due to 'fileSize'

2018-04-17 Thread Mazo, Andrey
Thandesha,

If I read your patch correctly, it adds a warning in case `p4 -G print` doesn't 
return "fileSize" (not `p4 sizes`).
I don't see `p4 sizes` being used by git-p4 at all.
As I said earlier, for our ancient Perforce server, `p4 -G print` _never_ 
returns "fileSize".
So, it's definitely not a reason to abort.

Thank you,
Andrey

From: Thandesha VK 
> My fix is for the case where p4 -G sizes not returning the key and
> value for fileSize. This can happen in some cases. Only option at that
> point of time is to warn the user about the problematic file and keep
> moving (or should we abort??)
> 
> Thanks
> Thandesha
> 
> On Tue, Apr 17, 2018 at 2:18 PM, Mazo, Andrey  wrote:
>> Luke,
>>
>> Thank you for reviewing and acking my patch!
>> By the way, did you see Thandesha's proposed patch [1] to print a warning in 
>> case of the missing "fileSize" attribute?
>> Should we go that route instead?
>> Or should we try harder to get the size by running `p4 -G sizes`?
>>
>> [1]  
>> https://public-inbox.org/git/cajjpmi-plb4qcka5alkxa8b1vozfff+oaq0fguq9yviobrp...@mail.gmail.com/t/#m6053d2031020e08edd24ada6c9eb49721ebc4e27
>>
>> Thank you,
>> Andrey
>>
>> From: Luke Diamand 
>>> On Tue, 17 Apr 2018, 09:22 Andrey Mazo,  wrote:
  Perforce server 2007.2 (and maybe others) doesn't return "fileSize"
 attribute in its reply to `p4 -G print` command.
 This causes the following traceback when running `git p4 sync --verbose`:
 """
 Traceback (most recent call last):
   File "/usr/libexec/git-core/git-p4", line 3839, in 
 main()
   File "/usr/libexec/git-core/git-p4", line 3833, in main
 if not cmd.run(args):
   File "/usr/libexec/git-core/git-p4", line 3567, in run
 self.importChanges(changes)
   File "/usr/libexec/git-core/git-p4", line 3233, in importChanges
 self.commit(description, filesForCommit, branch, parent)
   File "/usr/libexec/git-core/git-p4", line 2855, in commit
 self.streamP4Files(files)
   File "/usr/libexec/git-core/git-p4", line 2747, in streamP4Files
 cb=streamP4FilesCbSelf)
   File "/usr/libexec/git-core/git-p4", line 552, in p4CmdList
 cb(entry)
   File "/usr/libexec/git-core/git-p4", line 2741, in 
streamP4FilesCbSelf
 self.streamP4FilesCb(entry)
   File "/usr/libexec/git-core/git-p4", line 2689, in streamP4FilesCb
 self.streamOneP4File(self.stream_file, self.stream_contents)
   File "/usr/libexec/git-core/git-p4", line 2566, in streamOneP4File
 size = int(self.stream_file['fileSize'])
 KeyError: 'fileSize'
 """

 Fix this by omitting the file size information from the verbose print out.
 Also, don't use "self.stream_file" directly,
 but rather use passed in "file" argument.
 (which point to the same "self.stream_file" for all existing callers)

 Signed-off-by: Andrey Mazo 
 ---
  git -p4.py | 8 ++--
  1 file changed, 6 insertions(+), 2 deletions(-)

 diff --git a/git-p4.py b/git-p4.py
 index 7bb9cadc6..6f05f915a 100755
 --- a/git-p4.py
 +++ b/git-p4.py
 @@ -2566,8 +2566,12 @@ class P4Sync(Command, P4UserMap):
  relPath = self.stripRepoPath(file['depotFile'], 
self.branchPrefixes)
  relPath = self.encodeWithUTF8(relPath)
  if verbose:
 -    size = int(self.stream_file['fileSize'])
 -    sys.stdout.write('\r%s --> %s (%i MB)\n' % 
 (file['depotFile'], relPath, size/1024/1024))
 +    size = file.get('fileSize', None)
 +    if size is None:
 +    sizeStr = ''
 +    else:
 +    sizeStr = ' (%i MB)' % (int(size)/1024/1024)
 +    sys.stdout.write('\r%s --> %s%s\n' % (file['depotFile'], 
 relPath, sizeStr))
  sys.stdout.flush()

  (type_base, type_mods) = split_p4_type(file["type"])
 --
 2.16.1

>>> Thanks, that looks like a good fix to me.  Ack.
>
> -- 
> Thanks & Regards
> Thandesha VK | Cellphone +1 (703) 459-5386


[PATCH 1/2] builtin/blame: dim uninteresting metadata lines

2018-04-17 Thread Stefan Beller
When using git-blame lots of lines contain redundant information, for
example in hunks that consist of multiple lines, the metadata (commit
name, author, date) are repeated. A reader may not be interested in those,
so offer an option to color the information that is repeated from the
previous line differently. Both the command line option '--color-lines'
as well as a config option 'color.blame.colorLines' is provided.

Signed-off-by: Stefan Beller 
---
 Documentation/config.txt |  5 +
 builtin/blame.c  | 42 
 t/t8012-blame-colors.sh  | 29 +++
 3 files changed, 72 insertions(+), 4 deletions(-)
 create mode 100755 t/t8012-blame-colors.sh

diff --git a/Documentation/config.txt b/Documentation/config.txt
index 2659153cb3..8128eee4f9 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -1218,6 +1218,11 @@ color.status.::
status short-format), or
`unmerged` (files which have unmerged changes).
 
+color.blame.repeatedMeta::
+   Use the customized color for the part of git-blame output that
+   is repeated meta information per line (such as commit id,
+   author name, date and timezone). Defaults to cyan.
+
 color.ui::
This variable determines the default value for variables such
as `color.diff` and `color.grep` that control the use of color
diff --git a/builtin/blame.c b/builtin/blame.c
index db38c0b307..5190ff5df3 100644
--- a/builtin/blame.c
+++ b/builtin/blame.c
@@ -7,6 +7,7 @@
 
 #include "cache.h"
 #include "config.h"
+#include "color.h"
 #include "builtin.h"
 #include "commit.h"
 #include "diff.h"
@@ -46,6 +47,7 @@ static int xdl_opts;
 static int abbrev = -1;
 static int no_whole_file_rename;
 static int show_progress;
+static char repeated_meta_color[COLOR_MAXLEN];
 
 static struct date_mode blame_date_mode = { DATE_ISO8601 };
 static size_t blame_date_width;
@@ -316,10 +318,11 @@ static const char *format_time(timestamp_t time, const 
char *tz_str,
 #define OUTPUT_PORCELAIN   010
 #define OUTPUT_SHOW_NAME   020
 #define OUTPUT_SHOW_NUMBER 040
-#define OUTPUT_SHOW_SCORE  0100
-#define OUTPUT_NO_AUTHOR   0200
+#define OUTPUT_SHOW_SCORE  0100
+#define OUTPUT_NO_AUTHOR   0200
 #define OUTPUT_SHOW_EMAIL  0400
-#define OUTPUT_LINE_PORCELAIN 01000
+#define OUTPUT_LINE_PORCELAIN  01000
+#define OUTPUT_COLOR_LINE  02000
 
 static void emit_porcelain_details(struct blame_origin *suspect, int repeat)
 {
@@ -375,6 +378,7 @@ static void emit_other(struct blame_scoreboard *sb, struct 
blame_entry *ent, int
struct commit_info ci;
char hex[GIT_MAX_HEXSZ + 1];
int show_raw_time = !!(opt & OUTPUT_RAW_TIMESTAMP);
+   const char *color = NULL, *reset = NULL;
 
get_commit_info(suspect->commit, , 1);
oid_to_hex_r(hex, >commit->object.oid);
@@ -384,6 +388,18 @@ static void emit_other(struct blame_scoreboard *sb, struct 
blame_entry *ent, int
char ch;
int length = (opt & OUTPUT_LONG_OBJECT_NAME) ? GIT_SHA1_HEXSZ : 
abbrev;
 
+   if (opt & OUTPUT_COLOR_LINE) {
+   if (cnt > 0) {
+   color = repeated_meta_color;
+   reset = GIT_COLOR_RESET;
+   } else  {
+   color = "";
+   reset = "";
+   }
+   }
+   if (color)
+   fputs(color, stdout);
+
if (suspect->commit->object.flags & UNINTERESTING) {
if (blank_boundary)
memset(hex, ' ', length);
@@ -433,6 +449,8 @@ static void emit_other(struct blame_scoreboard *sb, struct 
blame_entry *ent, int
printf(" %*d) ",
   max_digits, ent->lno + 1 + cnt);
}
+   if (reset)
+   fputs(reset, stdout);
do {
ch = *cp++;
putchar(ch);
@@ -585,6 +603,8 @@ static const char *add_prefix(const char *prefix, const 
char *path)
 
 static int git_blame_config(const char *var, const char *value, void *cb)
 {
+   int *output_option = cb;
+
if (!strcmp(var, "blame.showroot")) {
show_root = git_config_bool(var, value);
return 0;
@@ -607,6 +627,13 @@ static int git_blame_config(const char *var, const char 
*value, void *cb)
parse_date_format(value, _date_mode);
return 0;
}
+   if (!strcmp(var, "color.blame.repeatedlines")) {
+   if (color_parse_mem(value, strlen(value), repeated_meta_color))
+   warning(_("invalid color '%s' in 
color.blame.repeatedLines"),
+   value);
+   *output_option |= OUTPUT_COLOR_LINE;
+   return 0;
+  

[PATCH 2/2] builtin/blame: highlight recently changed lines

2018-04-17 Thread Stefan Beller
Choose a different color for dates and imitate a 'temperature cool down'
depending upon age.

Originally I had planned to have the temperature cooldown dependent on
the age of the project or file for example, as that might scale better,
but that can be added on top of this commit, e.g. instead of giving a
date, you could imagine giving a percentage that would be the linearly
interpolated between now and the beginning of the file.

Signed-off-by: Stefan Beller 

# Conflicts:
#   builtin/blame.c
---
 Documentation/config.txt | 17 +
 builtin/blame.c  | 81 ++--
 t/t8012-blame-colors.sh  | 25 +
 3 files changed, 119 insertions(+), 4 deletions(-)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index 8128eee4f9..eae88be662 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -1223,6 +1223,23 @@ color.blame.repeatedMeta::
is repeated meta information per line (such as commit id,
author name, date and timezone). Defaults to cyan.
 
+color.blame.highlightRecent::
+   This can be used to color the metadata of a blame line depending
+   on age of the line.
++
+This setting should be set to a comma-separated list of color and date 
settings,
+starting and ending with a color, the dates should be set from oldest to 
newest.
+The metadata will be colored given the colors if the the line was introduced
+before the given timestamp, overwriting older timestamped colors.
++
+Instead of an absolute timestamp relative timestamps work as well, e.g.
+2.weeks.ago is valid to address anything older than 2 weeks.
++
+It defaults to "blue,12 month ago,white,1 month ago,red", which colors
+everything older than one year blue, recent changes between one month and
+one year old are kept white, and lines introduced within the last month are
+colored red.
+
 color.ui::
This variable determines the default value for variables such
as `color.diff` and `color.grep` that control the use of color
diff --git a/builtin/blame.c b/builtin/blame.c
index 5190ff5df3..53999df511 100644
--- a/builtin/blame.c
+++ b/builtin/blame.c
@@ -24,6 +24,7 @@
 #include "dir.h"
 #include "progress.h"
 #include "blame.h"
+#include "string-list.h"
 
 static char blame_usage[] = N_("git blame [] [] [] 
[--] ");
 
@@ -323,6 +324,7 @@ static const char *format_time(timestamp_t time, const char 
*tz_str,
 #define OUTPUT_SHOW_EMAIL  0400
 #define OUTPUT_LINE_PORCELAIN  01000
 #define OUTPUT_COLOR_LINE  02000
+#define OUTPUT_SHOW_AGE_WITH_COLOR 04000
 
 static void emit_porcelain_details(struct blame_origin *suspect, int repeat)
 {
@@ -370,6 +372,63 @@ static void emit_porcelain(struct blame_scoreboard *sb, 
struct blame_entry *ent,
putchar('\n');
 }
 
+static struct color_field {
+   timestamp_t hop;
+   char col[COLOR_MAXLEN];
+} *colorfield;
+static int colorfield_nr, colorfield_alloc;
+
+static void parse_color_fields(const char *s)
+{
+   struct string_list l = STRING_LIST_INIT_DUP;
+   struct string_list_item *item;
+   enum { EXPECT_DATE, EXPECT_COLOR } next = EXPECT_COLOR;
+
+   colorfield_nr = 0;
+
+   /* Ideally this would be stripped and split at the same time? */
+   string_list_split(, s, ',', -1);
+   ALLOC_GROW(colorfield, colorfield_nr + 1, colorfield_alloc);
+
+   for_each_string_list_item(item, ) {
+   switch (next) {
+   case EXPECT_DATE:
+   colorfield[colorfield_nr].hop = 
approxidate(item->string);
+   next = EXPECT_COLOR;
+   colorfield_nr++;
+   ALLOC_GROW(colorfield, colorfield_nr + 1, 
colorfield_alloc);
+   break;
+   case EXPECT_COLOR:
+   if (color_parse(item->string, 
colorfield[colorfield_nr].col))
+   die(_("expecting a color: %s"), item->string);
+   next = EXPECT_DATE;
+   break;
+   }
+   }
+
+   if (next == EXPECT_COLOR)
+   die (_("must end with a color"));
+
+   colorfield[colorfield_nr].hop = TIME_MAX;
+}
+
+static void setup_default_color_by_age(void)
+{
+   parse_color_fields("blue,12 month ago,white,1 month ago,red");
+}
+
+static void determine_line_heat(struct blame_entry *ent, const char 
**dest_color)
+{
+   int i = 0;
+   struct commit_info ci;
+   get_commit_info(ent->suspect->commit, , 1);
+
+   while (i < colorfield_nr && ci.author_time > colorfield[i].hop)
+   i++;
+
+   *dest_color = colorfield[i].col;
+}
+
 static void emit_other(struct blame_scoreboard *sb, struct blame_entry *ent, 
int opt)
 {
int cnt;
@@ -378,12 +437,19 @@ static void emit_other(struct blame_scoreboard *sb, 
struct blame_entry *ent, int
struct commit_info ci;
char hex[GIT_MAX_HEXSZ + 1];
int 

[PATCH 1/3] completion: rename save_opts to default_opts for stash

2018-04-17 Thread Thomas Gummerer
'git stash save' is deprecated, but we still call the options for
completion 'save_opts'.  Simply renaming them to 'push_opts' is not
ideal because for example '--message foo' can't be passed directly to
'git stash', as non-option arguments are not allowed, and foo would be
treated as one.

Completing '--message' in 'git stash --' would end up being quite
confusing for the user.  Therefore name them 'default_opts', and keep
treating --message specially for 'git stash push'.

The ulterior motive for renaming 'save_opts' is that in the next
commit we're going to stop completing 'git stash save', so 'save_opts'
would be the only remaining thing referring to 'git stash save', which
would probably be confusing for future readers of this code.

Signed-off-by: Thomas Gummerer 
---
 contrib/completion/git-completion.bash | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/contrib/completion/git-completion.bash 
b/contrib/completion/git-completion.bash
index a757073945..39c123926c 100644
--- a/contrib/completion/git-completion.bash
+++ b/contrib/completion/git-completion.bash
@@ -2773,16 +2773,16 @@ _git_show_branch ()
 
 _git_stash ()
 {
-   local save_opts='--all --keep-index --no-keep-index --quiet --patch 
--include-untracked'
+   local default_opts='--all --keep-index --no-keep-index --quiet --patch 
--include-untracked'
local subcommands='push save list show apply clear drop pop create 
branch'
local subcommand="$(__git_find_on_cmdline "$subcommands")"
if [ -z "$subcommand" ]; then
case "$cur" in
--*)
-   __gitcomp "$save_opts"
+   __gitcomp "$default_opts"
;;
*)
-   if [ -z "$(__git_find_on_cmdline "$save_opts")" ]; then
+   if [ -z "$(__git_find_on_cmdline "$default_opts")" ]; 
then
__gitcomp "$subcommands"
fi
;;
@@ -2790,10 +2790,10 @@ _git_stash ()
else
case "$subcommand,$cur" in
push,--*)
-   __gitcomp "$save_opts --message"
+   __gitcomp "$default_opts --message"
;;
save,--*)
-   __gitcomp "$save_opts"
+   __gitcomp "$default_opts"
;;
apply,--*|pop,--*)
__gitcomp "--index --quiet"
-- 
2.17.0.252.gfe0a9eaf31



[PATCH 3/3] completion: make stash -p and alias for stash push -p

2018-04-17 Thread Thomas Gummerer
We define 'git stash -p' as an alias for 'git stash push -p' in the
manpage.  Do the same in the completion script, so all options that
can be given to 'git stash push' are being completed when the user is
using 'git stash -p --'.  Currently the only additional option
the user will get is '--message', but there may be more in the future.

Signed-off-by: Thomas Gummerer 
---
 contrib/completion/git-completion.bash | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/contrib/completion/git-completion.bash 
b/contrib/completion/git-completion.bash
index 452c3d4490..8bd445b7dc 100644
--- a/contrib/completion/git-completion.bash
+++ b/contrib/completion/git-completion.bash
@@ -2776,6 +2776,9 @@ _git_stash ()
local default_opts='--all --keep-index --no-keep-index --quiet --patch 
--include-untracked'
local subcommands='push list show apply clear drop pop create branch'
local subcommand="$(__git_find_on_cmdline "$subcommands")"
+   if [ -n "$(__git_find_on_cmdline "-p")" ]; then
+   subcommand="push"
+   fi
if [ -z "$subcommand" ]; then
case "$cur" in
--*)
-- 
2.17.0.252.gfe0a9eaf31



[PATCH 2/3] completion: stop completing 'save' as stash subcommand

2018-04-17 Thread Thomas Gummerer
The 'save' subcommand in git stash has been deprecated in
fd2ebf14db ("stash: mark "git stash save" deprecated in the man page",
2017-10-22).  It is however still completed by the git bash
completion.

Stop completing the 'save' subcommand as a further step in the
deprecation process.  As the only use of the bash completion is
interactive, this wouldn't break any scripts, but may give users a
hint that the command is deprecated.

Signed-off-by: Thomas Gummerer 
---
 contrib/completion/git-completion.bash | 5 +
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/contrib/completion/git-completion.bash 
b/contrib/completion/git-completion.bash
index 39c123926c..452c3d4490 100644
--- a/contrib/completion/git-completion.bash
+++ b/contrib/completion/git-completion.bash
@@ -2774,7 +2774,7 @@ _git_show_branch ()
 _git_stash ()
 {
local default_opts='--all --keep-index --no-keep-index --quiet --patch 
--include-untracked'
-   local subcommands='push save list show apply clear drop pop create 
branch'
+   local subcommands='push list show apply clear drop pop create branch'
local subcommand="$(__git_find_on_cmdline "$subcommands")"
if [ -z "$subcommand" ]; then
case "$cur" in
@@ -2792,9 +2792,6 @@ _git_stash ()
push,--*)
__gitcomp "$default_opts --message"
;;
-   save,--*)
-   __gitcomp "$default_opts"
-   ;;
apply,--*|pop,--*)
__gitcomp "--index --quiet"
;;
-- 
2.17.0.252.gfe0a9eaf31



[PATCH 0/3] completion: improvements for git stash

2018-04-17 Thread Thomas Gummerer
In this series we stop completing the 'git stash save' subcommand in
git-completion.bash.  The command has been deprecated for a while, so
I think we're doing the users a disservice by still completing it,
making them think it's still supported while we already deprecated it.

The first commit is a preparatory step for doing that, while the third
commit is an improvement that I thought would be good to do "while
there".  The most interesting part of the series should be the second
commit, where we actually remove the completion of 'git stash save'.

Thomas Gummerer (3):
  completion: rename save_opts to default_opts for stash
  completion: stop completing 'save' as stash subcommand
  completion: make stash -p and alias for stash push -p

 contrib/completion/git-completion.bash | 16 
 1 file changed, 8 insertions(+), 8 deletions(-)

-- 
2.17.0.252.gfe0a9eaf31



Re: [PATCH 1/1] git-p4: fix `sync --verbose` traceback due to 'fileSize'

2018-04-17 Thread Thandesha VK
My fix is for the case where p4 -G sizes not returning the key and
value for fileSize. This can happen in some cases. Only option at that
point of time is to warn the user about the problematic file and keep
moving (or should we abort??)

Thanks
Thandesha

On Tue, Apr 17, 2018 at 2:18 PM, Mazo, Andrey  wrote:
> Luke,
>
> Thank you for reviewing and acking my patch!
> By the way, did you see Thandesha's proposed patch [1] to print a warning in 
> case of the missing "fileSize" attribute?
> Should we go that route instead?
> Or should we try harder to get the size by running `p4 -G sizes`?
>
> [1] 
> https://public-inbox.org/git/cajjpmi-plb4qcka5alkxa8b1vozfff+oaq0fguq9yviobrp...@mail.gmail.com/t/#m6053d2031020e08edd24ada6c9eb49721ebc4e27
>
> Thank you,
> Andrey
>
> From: Luke Diamand 
>> On Tue, 17 Apr 2018, 09:22 Andrey Mazo,  wrote:
>>>  Perforce server 2007.2 (and maybe others) doesn't return "fileSize"
>>> attribute in its reply to `p4 -G print` command.
>>> This causes the following traceback when running `git p4 sync --verbose`:
>>> """
>>> Traceback (most recent call last):
>>>   File "/usr/libexec/git-core/git-p4", line 3839, in 
>>> main()
>>>   File "/usr/libexec/git-core/git-p4", line 3833, in main
>>> if not cmd.run(args):
>>>   File "/usr/libexec/git-core/git-p4", line 3567, in run
>>> self.importChanges(changes)
>>>   File "/usr/libexec/git-core/git-p4", line 3233, in importChanges
>>> self.commit(description, filesForCommit, branch, parent)
>>>   File "/usr/libexec/git-core/git-p4", line 2855, in commit
>>> self.streamP4Files(files)
>>>   File "/usr/libexec/git-core/git-p4", line 2747, in streamP4Files
>>> cb=streamP4FilesCbSelf)
>>>   File "/usr/libexec/git-core/git-p4", line 552, in p4CmdList
>>> cb(entry)
>>>   File "/usr/libexec/git-core/git-p4", line 2741, in streamP4FilesCbSelf
>>> self.streamP4FilesCb(entry)
>>>   File "/usr/libexec/git-core/git-p4", line 2689, in streamP4FilesCb
>>> self.streamOneP4File(self.stream_file, self.stream_contents)
>>>   File "/usr/libexec/git-core/git-p4", line 2566, in streamOneP4File
>>> size = int(self.stream_file['fileSize'])
>>> KeyError: 'fileSize'
>>> """
>>>
>>> Fix this by omitting the file size information from the verbose print out.
>>> Also, don't use "self.stream_file" directly,
>>> but rather use passed in "file" argument.
>>> (which point to the same "self.stream_file" for all existing callers)
>>>
>>> Signed-off-by: Andrey Mazo 
>>> ---
>>>  git -p4.py | 8 ++--
>>>  1 file changed, 6 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/git-p4.py b/git-p4.py
>>> index 7bb9cadc6..6f05f915a 100755
>>> --- a/git-p4.py
>>> +++ b/git-p4.py
>>> @@ -2566,8 +2566,12 @@ class P4Sync(Command, P4UserMap):
>>>  relPath = self.stripRepoPath(file['depotFile'], 
>>> self.branchPrefixes)
>>>  relPath = self.encodeWithUTF8(relPath)
>>>  if verbose:
>>> -size = int(self.stream_file['fileSize'])
>>> -sys.stdout.write('\r%s --> %s (%i MB)\n' % (file['depotFile'], 
>>> relPath, size/1024/1024))
>>> +size = file.get('fileSize', None)
>>> +if size is None:
>>> +sizeStr = ''
>>> +else:
>>> +sizeStr = ' (%i MB)' % (int(size)/1024/1024)
>>> +sys.stdout.write('\r%s --> %s%s\n' % (file['depotFile'], 
>>> relPath, sizeStr))
>>>  sys.stdout.flush()
>>>
>>>  (type_base, type_mods) = split_p4_type(file["type"])
>>> --
>>> 2.16.1
>>>
>> Thanks, that looks like a good fix to me.  Ack.



-- 
Thanks & Regards
Thandesha VK | Cellphone +1 (703) 459-5386


Re: [PATCH 1/1] git-p4: fix `sync --verbose` traceback due to 'fileSize'

2018-04-17 Thread Mazo, Andrey
Luke,

Thank you for reviewing and acking my patch!
By the way, did you see Thandesha's proposed patch [1] to print a warning in 
case of the missing "fileSize" attribute?
Should we go that route instead?
Or should we try harder to get the size by running `p4 -G sizes`?

[1] 
https://public-inbox.org/git/cajjpmi-plb4qcka5alkxa8b1vozfff+oaq0fguq9yviobrp...@mail.gmail.com/t/#m6053d2031020e08edd24ada6c9eb49721ebc4e27

Thank you,
Andrey

From: Luke Diamand 
> On Tue, 17 Apr 2018, 09:22 Andrey Mazo,  wrote:
>>  Perforce server 2007.2 (and maybe others) doesn't return "fileSize"
>> attribute in its reply to `p4 -G print` command.
>> This causes the following traceback when running `git p4 sync --verbose`:
>> """
>>     Traceback (most recent call last):
>>       File "/usr/libexec/git-core/git-p4", line 3839, in 
>>         main()
>>       File "/usr/libexec/git-core/git-p4", line 3833, in main
>>         if not cmd.run(args):
>>       File "/usr/libexec/git-core/git-p4", line 3567, in run
>>         self.importChanges(changes)
>>       File "/usr/libexec/git-core/git-p4", line 3233, in importChanges
>>         self.commit(description, filesForCommit, branch, parent)
>>       File "/usr/libexec/git-core/git-p4", line 2855, in commit
>>         self.streamP4Files(files)
>>       File "/usr/libexec/git-core/git-p4", line 2747, in streamP4Files
>>         cb=streamP4FilesCbSelf)
>>       File "/usr/libexec/git-core/git-p4", line 552, in p4CmdList
>>         cb(entry)
>>       File "/usr/libexec/git-core/git-p4", line 2741, in streamP4FilesCbSelf
>>         self.streamP4FilesCb(entry)
>>       File "/usr/libexec/git-core/git-p4", line 2689, in streamP4FilesCb
>>         self.streamOneP4File(self.stream_file, self.stream_contents)
>>       File "/usr/libexec/git-core/git-p4", line 2566, in streamOneP4File
>>         size = int(self.stream_file['fileSize'])
>>     KeyError: 'fileSize'
>> """
>> 
>> Fix this by omitting the file size information from the verbose print out.
>> Also, don't use "self.stream_file" directly,
>> but rather use passed in "file" argument.
>> (which point to the same "self.stream_file" for all existing callers)
>> 
>> Signed-off-by: Andrey Mazo 
>> ---
>>  git -p4.py | 8 ++--
>>  1 file changed, 6 insertions(+), 2 deletions(-)
>> 
>> diff --git a/git-p4.py b/git-p4.py
>> index 7bb9cadc6..6f05f915a 100755
>> --- a/git-p4.py
>> +++ b/git-p4.py
>> @@ -2566,8 +2566,12 @@ class P4Sync(Command, P4UserMap):
>>          relPath = self.stripRepoPath(file['depotFile'], self.branchPrefixes)
>>          relPath = self.encodeWithUTF8(relPath)
>>          if verbose:
>> -            size = int(self.stream_file['fileSize'])
>> -            sys.stdout.write('\r%s --> %s (%i MB)\n' % (file['depotFile'], 
>> relPath, size/1024/1024))
>> +            size = file.get('fileSize', None)
>> +            if size is None:
>> +                sizeStr = ''
>> +            else:
>> +                sizeStr = ' (%i MB)' % (int(size)/1024/1024)
>> +            sys.stdout.write('\r%s --> %s%s\n' % (file['depotFile'], 
>> relPath, sizeStr))
>>              sys.stdout.flush()
>> 
>>          (type_base, type_mods) = split_p4_type(file["type"])
>> -- 
>> 2.16.1
>>
> Thanks, that looks like a good fix to me.  Ack.

[PATCH] send-email: avoid duplicate In-Reply-To/References

2018-04-17 Thread Stefan Agner
In case a patch already has In-Reply-To or References in the header
(e.g. when the patch has been created with format-patch --thread)
git-send-email should not add another pair of those headers.
This is also not allowed according to RFC 5322 Section 3.6:
https://tools.ietf.org/html/rfc5322#section-3.6

Avoid the second pair by reading the current headers into the
appropriate variables.

Signed-off-by: Stefan Agner 
---
This addresses the issue reported here:
https://public-inbox.org/git/997160314bbafb3088a401f1c09cc...@agner.ch/

 git-send-email.perl | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/git-send-email.perl b/git-send-email.perl
index 2fa7818ca..7157397fd 100755
--- a/git-send-email.perl
+++ b/git-send-email.perl
@@ -1642,10 +1642,15 @@ foreach my $t (@files) {
elsif (/^Content-Transfer-Encoding: (.*)/i) {
$xfer_encoding = $1 if not defined 
$xfer_encoding;
}
+   elsif (/^In-Reply-To: (.*)/i) {
+   $in_reply_to = $1;
+   }
+   elsif (/^References: (.*)/i) {
+   $references = $1;
+   }
elsif (!/^Date:\s/i && /^[-A-Za-z]+:\s+\S/) {
push @xh, $_;
}
-
} else {
# In the traditional
# "send lots of email" format,
-- 
2.17.0



Basketball Enthusiasts List

2018-04-17 Thread Stacey Parsons


Hi,

Would you be interested in acquiring an email list of "Basketball Enthusiasts" 
from USA?

We also have data for Tennis Enthusiasts, Running Enthusiasts, Boxing 
Enthusiasts, Golfers List,Health and Fitness Enthusiasts, Soccer Enthusiasts, 
Baseball Enthusiasts, Outdoor Enthusiasts, Students List, Apparel Buyers , 
Skiers Enthusiasts, Spa and Resort Visitors and many more.

Each record in the list contains Contact Name (First, Middle and Last Name), 
Mailing Address, List type and Opt-in email address.

All the contacts are opt-in verified, 100% permission based and can be used for 
unlimited multi-channel marketing.

Please let me know your thoughts towards procuring the Basketball Enthusiasts 
List.

Best Regards,
Stacey Parsons
Research Analyst

We respect your privacy, if you do not wish to receive any further emails from 
our end, please reply with a subject “Leave Out”.


---
This email has been checked for viruses by Avast antivirus software.
https://www.avast.com/antivirus



GDPR compliance best practices?

2018-04-17 Thread Peter Backes
Hi,

I'd like to ask whether anyone has best practices for achieving GDPR 
compliance for git repos? The GDPR will come into effect in the EU next 
month.

In particular, how do you cope with the "Right to erasure" concerning 
entries in the history of your git repos?

Erasing author names from the history changes the commit hashes.  It is 
well known that this leads to a lot of problems.  So I don't consider 
this a workable solution.

And how do you justify publishing your employee's name/email as part of 
a git commit under GDPR rules in the first place?

github has the following page mentioning the "Right to erasure" but 
AFAICS nothing about how it will be implemented
https://about.gitlab.com/gdpr/

Here are discussions I found but they do not really provide a solution:
https://law.stackexchange.com/questions/24623/gdpr-git-history
https://news.ycombinator.com/item?id=16509755

Best wishes
Peter

-- 
Peter Backes, r...@helen.plasma.xg8.de


Re: [PATCH 2/2] builtin/blame: highlight recently changed lines

2018-04-17 Thread Stefan Beller
On Tue, Apr 17, 2018 at 12:31 PM, Stefan Beller  wrote:
> On Mon, Apr 16, 2018 at 8:29 PM, Junio C Hamano  wrote:
>> It seems that this
>>
>> $ git -c color.blame.repeatedlines=cyan blame --heated-lines builtin/blame.c
>>
>> refuses to run.
>>
>> Would it work if the configuration is in .git/config instead, or
>> would it forever disable --heated-lines once somebody choses to use
>> --color-lines feature by default by configuring it in?
>
> That is the unfortunate part of this series, I have not figured out how to
> treat these two options at the same time.
>
> One could take the approach to check the config first and see if there
> are conflicts and then overlay it with the command line options
> (and resolve conflicts there, but CLI taking precedence over config).
>
> Or we'd need to introduce another config
> blame.coloring={none, repeatedlines, highlightrecent}
> which breaks the tie.

Nevermind, we can overlay these color schemes just fine, which I'll do in
a resend.


Re: [PATCH 2/2] builtin/blame: highlight recently changed lines

2018-04-17 Thread Stefan Beller
On Mon, Apr 16, 2018 at 8:29 PM, Junio C Hamano  wrote:
> It seems that this
>
> $ git -c color.blame.repeatedlines=cyan blame --heated-lines builtin/blame.c
>
> refuses to run.
>
> Would it work if the configuration is in .git/config instead, or
> would it forever disable --heated-lines once somebody choses to use
> --color-lines feature by default by configuring it in?

That is the unfortunate part of this series, I have not figured out how to
treat these two options at the same time.

One could take the approach to check the config first and see if there
are conflicts and then overlay it with the command line options
(and resolve conflicts there, but CLI taking precedence over config).

Or we'd need to introduce another config
blame.coloring={none, repeatedlines, highlightrecent}
which breaks the tie.

Thanks,
Stefan


[PATCH 3/4] git-[short]log.txt: unify quoted standalone --

2018-04-17 Thread Martin Ågren
In git-log.txt, we have an instance of \--, which is known to sometimes
render badly. This one is even worse than normal though, since ``\-- ''
(with or without that trailing space) appears to be entirely broken,
both in HTML and manpages, both with AsciiDoc (version 8.6.9) and
Asciidoctor (version 1.5.4).

Further down in git-log.txt we have a ``--'', which renders good. In
git-shortlog.txt, we use "\-- " (including the quotes and the space),
which happens to look fairly good. I failed to find any other similar
instances. So all in all, we quote a double-dash in three different
places and do it differently each time, with various degrees of success.

Switch all of these to `--`. This sets the double-dash in monospace and
matches what we usually do with example command line usages and options.
Note that we drop the trailing space as well, since `-- ` does not
render well. These should still be clear enough since just a few lines
above each instance, the space is clearly visible in a longer context.

Signed-off-by: Martin Ågren 
---
 Documentation/git-log.txt  | 4 ++--
 Documentation/git-shortlog.txt | 2 +-
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/Documentation/git-log.txt b/Documentation/git-log.txt
index be2f10b70b..90761f1694 100644
--- a/Documentation/git-log.txt
+++ b/Documentation/git-log.txt
@@ -96,7 +96,7 @@ include::line-range-format.txt[]
Simplification' below for details and other simplification
modes.
 +
-Paths may need to be prefixed with ``\-- '' to separate them from
+Paths may need to be prefixed with `--` to separate them from
 options or the revision range, when confusion arises.
 
 include::rev-list-options.txt[]
@@ -125,7 +125,7 @@ EXAMPLES
 `git log --since="2 weeks ago" -- gitk`::
 
Show the changes during the last two weeks to the file 'gitk'.
-   The ``--'' is necessary to avoid confusion with the *branch* named
+   The `--` is necessary to avoid confusion with the *branch* named
'gitk'
 
 `git log --name-status release..test`::
diff --git a/Documentation/git-shortlog.txt b/Documentation/git-shortlog.txt
index 00a22152a2..bc80905a8a 100644
--- a/Documentation/git-shortlog.txt
+++ b/Documentation/git-shortlog.txt
@@ -73,7 +73,7 @@ them.
Consider only commits that are enough to explain how the files
that match the specified paths came to be.
 +
-Paths may need to be prefixed with "\-- " to separate them from
+Paths may need to be prefixed with `--` to separate them from
 options or the revision range, when confusion arises.
 
 MAPPING AUTHORS
-- 
2.17.0.252.gfe0a9eaf31



[PATCH 4/4] git-submodule.txt: quote usage in monospace, drop backslash

2018-04-17 Thread Martin Ågren
We tend to quote command line examples using `` to set them in a
monospace font. The immediate motivation for this patch is to get rid of
another instance of \--. As noted in the previous commits, \-- has a
tendency of rendering badly. Here, it renders ok (at least with
AsciiDoc 8.6.9 and Asciidoctor 1.5.4), but by getting rid of this
instance, we reduce the chances of \-- cropping up in places where it
matters more.

Signed-off-by: Martin Ågren 
---
 Documentation/git-submodule.txt | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/Documentation/git-submodule.txt b/Documentation/git-submodule.txt
index 71c5618e82..630999f41a 100644
--- a/Documentation/git-submodule.txt
+++ b/Documentation/git-submodule.txt
@@ -213,8 +213,8 @@ sync [--recursive] [--] [...]::
submodule URLs change upstream and you need to update your local
repositories accordingly.
 +
-"git submodule sync" synchronizes all submodules while
-"git submodule sync \-- A" synchronizes submodule "A" only.
+`git submodule sync` synchronizes all submodules while
+`git submodule sync -- A` synchronizes submodule "A" only.
 +
 If `--recursive` is specified, this command will recurse into the
 registered submodules, and sync any nested submodules within.
-- 
2.17.0.252.gfe0a9eaf31



[PATCH 0/4] doc: cleaning up instances of \--

2018-04-17 Thread Martin Ågren
This is a patch series to convert \-- to -- in our documentation. The
first patch is a reiteration of 1c262bb7b2 (doc: convert \--option to
--option, 2015-05-13) to fix some instances that have appeared since.
The other three patches deal with standalone "\--" which we can't
always turn into "--" since it can be rendered as an em dash.

Martin

Martin Ågren (4):
  doc: convert \--option to --option
  doc: convert [\--] to [--]
  git-[short]log.txt: unify quoted standalone --
  git-submodule.txt: quote usage in monospace, drop backslash

 Documentation/git-format-patch.txt | 2 +-
 Documentation/git-log.txt  | 8 
 Documentation/git-push.txt | 2 +-
 Documentation/git-shortlog.txt | 6 +++---
 Documentation/git-submodule.txt| 4 ++--
 Documentation/gitk.txt | 2 +-
 6 files changed, 12 insertions(+), 12 deletions(-)

-- 
2.17.0.252.gfe0a9eaf31



[PATCH 2/4] doc: convert [\--] to [--]

2018-04-17 Thread Martin Ågren
Commit 1c262bb7b (doc: convert \--option to --option, 2015-05-13)
explains that we used to need to write \--option to play well with older
versions of AsciiDoc, but that we do not support such versions anymore
anyway, and that Asciidoctor literally renders \--.

With [\--], which is used to denote the optional separator between
revisions and paths, Asciidoctor renders the backslash literally.
Change all [\--] to [--]. This changes nothing for AsciiDoc version
8.6.9, but is an improvement for Asciidoctor version 1.5.4.

We use double-dashes in several list entries (\--::). In my testing, it
appears that we do need to use the backslash there, so leave those.

Signed-off-by: Martin Ågren 
---
 Documentation/git-log.txt  | 4 ++--
 Documentation/git-shortlog.txt | 4 ++--
 Documentation/gitk.txt | 2 +-
 3 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/Documentation/git-log.txt b/Documentation/git-log.txt
index 5437f8b0f0..be2f10b70b 100644
--- a/Documentation/git-log.txt
+++ b/Documentation/git-log.txt
@@ -9,7 +9,7 @@ git-log - Show commit logs
 SYNOPSIS
 
 [verse]
-'git log' [] [] [[\--] ...]
+'git log' [] [] [[--] ...]
 
 DESCRIPTION
 ---
@@ -90,7 +90,7 @@ include::line-range-format.txt[]
ways to spell , see the 'Specifying Ranges'
section of linkgit:gitrevisions[7].
 
-[\--] ...::
+[--] ...::
Show only commits that are enough to explain how the files
that match the specified paths came to be.  See 'History
Simplification' below for details and other simplification
diff --git a/Documentation/git-shortlog.txt b/Documentation/git-shortlog.txt
index 5e35ea18ac..00a22152a2 100644
--- a/Documentation/git-shortlog.txt
+++ b/Documentation/git-shortlog.txt
@@ -8,7 +8,7 @@ git-shortlog - Summarize 'git log' output
 SYNOPSIS
 
 [verse]
-'git shortlog' [] [] [[\--] ...]
+'git shortlog' [] [] [[--] ...]
 git log --pretty=short | 'git shortlog' []
 
 DESCRIPTION
@@ -69,7 +69,7 @@ them.
ways to spell , see the "Specifying Ranges"
section of linkgit:gitrevisions[7].
 
-[\--] ...::
+[--] ...::
Consider only commits that are enough to explain how the files
that match the specified paths came to be.
 +
diff --git a/Documentation/gitk.txt b/Documentation/gitk.txt
index ca96c281d1..244cd01493 100644
--- a/Documentation/gitk.txt
+++ b/Documentation/gitk.txt
@@ -8,7 +8,7 @@ gitk - The Git repository browser
 SYNOPSIS
 
 [verse]
-'gitk' [] [] [\--] [...]
+'gitk' [] [] [--] [...]
 
 DESCRIPTION
 ---
-- 
2.17.0.252.gfe0a9eaf31



[PATCH 1/4] doc: convert \--option to --option

2018-04-17 Thread Martin Ågren
Rather than using a backslash in \--foo, with or without ''-quoting,
write `--foo` for better rendering. As explained in commit 1c262bb7b
(doc: convert \--option to --option, 2015-05-13), the backslash is not
needed for the versions of AsciiDoc that we support, but is rendered
literally by Asciidoctor.

Signed-off-by: Martin Ågren 
---
 Documentation/git-format-patch.txt | 2 +-
 Documentation/git-push.txt | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/Documentation/git-format-patch.txt 
b/Documentation/git-format-patch.txt
index 6cbe462a77..b41e1329a7 100644
--- a/Documentation/git-format-patch.txt
+++ b/Documentation/git-format-patch.txt
@@ -47,7 +47,7 @@ There are two ways to specify which commits to operate on.
 
 The first rule takes precedence in the case of a single .  To
 apply the second rule, i.e., format everything since the beginning of
-history up until , use the '\--root' option: `git format-patch
+history up until , use the `--root` option: `git format-patch
 --root `.  If you want to format only  itself, you
 can do this with `git format-patch -1 `.
 
diff --git a/Documentation/git-push.txt b/Documentation/git-push.txt
index 5b08302fc2..34410f9fca 100644
--- a/Documentation/git-push.txt
+++ b/Documentation/git-push.txt
@@ -300,7 +300,7 @@ origin +master` to force a push to the `master` branch). 
See the
These options are passed to linkgit:git-send-pack[1]. A thin transfer
significantly reduces the amount of sent data when the sender and
receiver share many of the same objects in common. The default is
-   \--thin.
+   `--thin`.
 
 -q::
 --quiet::
-- 
2.17.0.252.gfe0a9eaf31



Re: [BUG] git p4 clone fails when p4 sizes does not return 'fileSize' key

2018-04-17 Thread Thandesha VK
I have few cases where even p4 -G sizes (or p4 sizes) is not returning
the size value even with latest version of p4 (17.2). In that case, we
have to regenerate the digest for file save it - It mean something is
wrong with the file in perforce.
Regarding, sys.stdout.write v/s print, I see script using both of them
without a common pattern. I can change it to whatever is more
appropriate.

On Tue, Apr 17, 2018 at 11:47 AM, Mazo, Andrey  wrote:
> Does a missing "fileSize" actually mean that there is something wrong with 
> the file?
> Because, for me, `p4 -G print` doesn't print "fileSize" for _any_ file.
> (which I attribute to our rather ancient (2007.2) Perforce server)
> I'm not an expert in Perforce, so don't know for sure.
>
> However, `p4 -G sizes` works fine even with our p4 server.
> Should we then go one step further and use `p4 -G sizes` to obtain the 
> "fileSize" when it's not returned by `p4 -G print`?
> Or is it an overkill for a simple verbose print out?
>
> Also, please, find one comment inline below.
>
> Thank you,
> Andrey
>
> From: Thandesha VK 
>> Sounds good. How about an enhanced version of fix from both of us.
>> This will let us know that something is not right with the file but
>> will not bark
>>
>> $ git diff
>> diff --git a/git-p4.py b/git-p4.py
>> index 7bb9cadc6..df901976f 100755
>> --- a/git-p4.py
>> +++ b/git-p4.py
>> @@ -2566,7 +2566,12 @@ class P4Sync(Command, P4UserMap):
>>  relPath = self.stripRepoPath(file['depotFile'], self.branchPrefixes)
>>  relPath = self.encodeWithUTF8(relPath)
>>  if verbose:
>> -size = int(self.stream_file['fileSize'])
>> +if 'fileSize' not in self.stream_file:
>> +   print "WARN: File size from perforce unknown. Please verify 
>> by p4 sizes %s" %(file['depotFile'])
> For whatever reason, the code below uses sys.stdout.write() instead of 
> print().
> Should it be used here for consistency as well?
>
>> +   size = "-1"
>> +else:
>> +   size = self.stream_file['fileSize']
>> +size = int(size)
>>  sys.stdout.write('\r%s --> %s (%i MB)\n' %
>> (file['depotFile'], relPath, size/1024/1024))
>>  sys.stdout.flush()
>>
>>
>> On Tue, Apr 17, 2018 at 10:33 AM, Mazo, Andrey  wrote:
>>> Sure, I totally agree.
>>> Sorry, I just wasn't clear enough in my previous email.
>>> I meant that your patch suppresses "%s --> %s (%i MB)" line in case 
>>> "fileSize" is not available,
>>> while my patch suppresses just "(%i MB)" portion if the "fileSize" is not 
>>> known.
>>> In other words,
>>>  * if "fileSize" is known:
>>>  ** both yours and mine patches don't change existing behavior;
>>>  * if "fileSize" is not known:
>>>  ** your patch makes streamOneP4File() not print anything;
>>>  ** my patch makes streamOneP4File() print "%s --> %s".
>>>
>>> Hope, I'm clearer this time.
>>>
>>> Thank you,
>>> Andrey
>>>
>>> From: Thandesha VK 
 *I* think keeping the filesize info is better with --verbose option as
 that gives some clue about the file we are working on. What do you
 think?
 Script has similar checks of key existence at other places where it is
 looking for fileSize.

 On Tue, Apr 17, 2018 at 9:22 AM, Andrey Mazo  wrote:
> Huh, I actually have a slightly different fix for the same issue.
> It doesn't suppress the corresponding verbose output completely, but just 
> removes the size information from it.
>
> Also, I'd mention that the workaround is trivial -- simply omit the 
> "--verbose" option.
>
> Andrey Mazo (1):
>   git-p4: fix `sync --verbose` traceback due to 'fileSize'
>
>  git-p4.py | 8 ++--
>  1 file changed, 6 insertions(+), 2 deletions(-)
>
>
> base-commit: 468165c1d8a442994a825f3684528361727cd8c0
> --
> 2.16.1
>

 --
 Thanks & Regards
 Thandesha VK | Cellphone +1 (703) 459-5386
>>
>>
>>
>> --
>> Thanks & Regards
>> Thandesha VK | Cellphone +1 (703) 459-5386



-- 
Thanks & Regards
Thandesha VK | Cellphone +1 (703) 459-5386


Re: [PATCH 1/2] builtin/blame: dim uninteresting metadata lines

2018-04-17 Thread Stefan Beller
Hi Junio,

>> -#define OUTPUT_SHOW_SCORE  0100
>> -#define OUTPUT_NO_AUTHOR   0200
>> +#define OUTPUT_SHOW_SCORE0100
>> +#define OUTPUT_NO_AUTHOR 0200
>
> I wondered what these are about; they are about aligning with HT
> assuming 8-place tab stop like the other lines.

correct.

>> -#define OUTPUT_LINE_PORCELAIN 01000
>> +#define OUTPUT_LINE_PORCELAIN01000
>
> But then this line has SP plus HT here; it should have been just a
> single HT (i.e. replace a single SP with HT), to be consistent.

fixed

>> @@ -384,6 +388,19 @@ static void emit_other(struct blame_scoreboard *sb, 
>> struct blame_entry *ent, int
>>   char ch;
>>   int length = (opt & OUTPUT_LONG_OBJECT_NAME) ? GIT_SHA1_HEXSZ 
>> : abbrev;
>>
>> + if (!(opt & OUTPUT_ANNOTATE_COMPAT)) {
>> + if (opt & OUTPUT_COLOR_LINE) {
>> + if (cnt > 0) {
>> + color = repeated_meta_color;
>> + reset = GIT_COLOR_RESET;
>> + } else  {
>> + color ="";
>
> You need a SP after '=' that assigns an empty string to 'color'.
>
>> + reset = "";
>> + }
>> + }
>> + fputs(color, stdout);
>> + }
>
> This fputs() should be hidden to the case where color is *NOT* an
> empty string, no?  In any case, it should be inside "if color-line
> is in effect" block, I would think.
>
> Users of "git annotate" would not pass the --color option, so I do
> not think the outer if () block is worth the loss of readability due
> to increased indent level.
>
> I would say that it is a job of the caller of git_config() to make
> sure color.blame.lines would not take effect when the command is
> annotate, if what you are worried about is the configuration in this
> code.

ok, so we'll have to correct these mis aligned configs before hand and
here we just go by the bits set in the flags.

>> @@ -949,8 +979,12 @@ int cmd_blame(int argc, const char **argv, const char 
>> *prefix)
>>
>>   blame_coalesce();
>>
>> - if (!(output_option & OUTPUT_PORCELAIN))
>> + if (!(output_option & OUTPUT_PORCELAIN)) {
>>   find_alignment(, _option);
>> + if (!*repeated_meta_color &&
>> + (output_option & OUTPUT_COLOR_LINE))
>> + strcpy(repeated_meta_color, GIT_COLOR_DARK);
>> + }
>
> This code does not check OUTPUT_ANNOTATE_COMPAT because it assumes
> that OUTPUT_COLOR_LINE won't be in output_option when we are working
> in annotate compatibility mode.  The deeper codepaths we saw above
> should do the same.  It should be enough to drop color-line when
> anno-compat is set, or something like that, immediately after
> reading the config and overriding them from the command line.

Makes sense.

>
>> diff --git a/color.h b/color.h
>> index cd0bcedd08..196be16058 100644
>> --- a/color.h
>> +++ b/color.h
>> @@ -30,6 +30,7 @@ struct strbuf;
>>  #define GIT_COLOR_BLUE   "\033[34m"
>>  #define GIT_COLOR_MAGENTA"\033[35m"
>>  #define GIT_COLOR_CYAN   "\033[36m"
>> +#define GIT_COLOR_DARK   "\033[1;30m"
>>  #define GIT_COLOR_BOLD_RED   "\033[1;31m"
>>  #define GIT_COLOR_BOLD_GREEN "\033[1;32m"
>>  #define GIT_COLOR_BOLD_YELLOW"\033[1;33m"
>
> I wonder if it is worth adding a new color only to give this a
> different default.
>
> Traditionally, we use CYAN for lines that are less interesting than
> others (e.g. hunk header), so reusing it might make the life easier
> to the users, especially because I envision that we may want to
> introduce another "logical" level to give another redirection
> between the configuration keys like color.diff.frag and
> color.blame.repeatedlines and the actual ANSI sequence like
> "\033[36m".  I.e. we update the system so that these two
> configuration keys take "uninteresting" (which is one of the
> "logical" colors) by default, and then map "uninteresting" to
> "\033[36m" at the physical level by default.  The users could then
> change the mapping from "uninteresting" to "\033[1;30m" and
> consistently modify both diff.frag and blame.repeated if they wanted
> to.  Of course, if they want them to be different, they can come up
> with a different "logical" color and split the two.  And from that
> point of view, adding new colors to the default set we have above
> will make life harder for the end users.

That is a good longer term vision. I'll switch to cyan for now.
Thanks,
Stefan


Re: [PATCH v1 1/5] read-cache: teach refresh_cache_entry to take istate

2018-04-17 Thread Ben Peart



On 4/17/2018 12:34 PM, Jameson Miller wrote:

Refactoring dependencies of make_cache_entry to work on a specific


Refactoring/Refactor

It's helpful to refer to functions with parens i.e. make_cache_entry().

In addition, it appears you only needed to update refresh_cache_entry() 
so perhaps something like:



Refactor refresh_cache_entry() to work on a specific index, instead of 
implicitly using the_index. This is in preparation for making the 
make_cache_entry() function work on a specific index.



Also, be sure to certify your work by adding your "Signed-off-by: " line 
to each commit.  Details at:


https://github.com/git/git/blob/master/Documentation/SubmittingPatches



index, instead of implicitly using the_index. This is in preparation
for making the make_cache_entry function work on a specific index.


Re: [RFC PATCH 00/12] Integrate commit-graph into 'fsck' and 'gc'

2018-04-17 Thread Derrick Stolee

On 4/17/2018 2:10 PM, Derrick Stolee wrote:

The commit-graph feature is not useful to end users until the
commit-graph file is maintained automatically by Git during normal
upkeep operations. One natural place to trigger this write is during
'git gc'.

Before automatically generating a commit-graph, we need to be able to
verify the contents of a commit-graph file. Integrate commit-graph
checks into 'fsck' that check the commit-graph contents against commits
in the object database.

Things to think about:

* Are these the right integration points?

* gc.commitGraph defaults to true right now for the purpose of testing,
   but may not be required to start. The goal is to have this default to
   true eventually, but we may want to delay that until the feature is
   stable.

* I implement a "--reachable" option to 'git commit-graph write' that
   iterates over all refs. This does the same as

git show-ref -s | git commit-graph write --stdin-commits

   but I don't know how to pipe two child processes together inside of Git.
   Perhaps this is a better solution, anyway.

What other things should I be considering in this case? I'm unfamiliar
with the inner-workings of 'fsck' and 'gc', so this is a new space for me.

This RFC is based on v3 of ds/generation-numbers, and the first commit
is a fixup! based on a bug in that version that I caught while prepping
this series.

Thanks,
-Stolee

Derrick Stolee (12):
   fixup! commit-graph: always load commit-graph information
   commit-graph: add 'check' subcommand
   commit-graph: check file header information
   commit-graph: parse commit from chosen graph
   commit-graph: check fanout and lookup table
   commit: force commit to parse from object database
   commit-graph: load a root tree from specific graph
   commit-graph: verify commit contents against odb
   fsck: check commit-graph
   commit-graph: add '--reachable' option
   gc: automatically write commit-graph files
   commit-graph: update design document

  Documentation/git-commit-graph.txt   |  15 +-
  Documentation/git-gc.txt |   4 +
  Documentation/technical/commit-graph.txt |   9 --
  builtin/commit-graph.c   |  79 +-
  builtin/fsck.c   |  13 ++
  builtin/gc.c |   8 +
  commit-graph.c   | 178 ++-
  commit-graph.h   |   2 +
  commit.c |  14 +-
  commit.h |   1 +
  t/t5318-commit-graph.sh  |  15 ++
  11 files changed, 311 insertions(+), 27 deletions(-)


This RFC is also available as a GitHub pull request [1]

[1] https://github.com/derrickstolee/git/pull/6



Re: [BUG] git p4 clone fails when p4 sizes does not return 'fileSize' key

2018-04-17 Thread Mazo, Andrey
Does a missing "fileSize" actually mean that there is something wrong with the 
file?
Because, for me, `p4 -G print` doesn't print "fileSize" for _any_ file.
(which I attribute to our rather ancient (2007.2) Perforce server)
I'm not an expert in Perforce, so don't know for sure.

However, `p4 -G sizes` works fine even with our p4 server.
Should we then go one step further and use `p4 -G sizes` to obtain the 
"fileSize" when it's not returned by `p4 -G print`?
Or is it an overkill for a simple verbose print out?

Also, please, find one comment inline below.

Thank you,
Andrey

From: Thandesha VK 
> Sounds good. How about an enhanced version of fix from both of us.
> This will let us know that something is not right with the file but
> will not bark
> 
> $ git diff
> diff --git a/git-p4.py b/git-p4.py
> index 7bb9cadc6..df901976f 100755
> --- a/git-p4.py
> +++ b/git-p4.py
> @@ -2566,7 +2566,12 @@ class P4Sync(Command, P4UserMap):
>  relPath = self.stripRepoPath(file['depotFile'], self.branchPrefixes)
>  relPath = self.encodeWithUTF8(relPath)
>  if verbose:
> -    size = int(self.stream_file['fileSize'])
> +    if 'fileSize' not in self.stream_file:
> +   print "WARN: File size from perforce unknown. Please verify 
> by p4 sizes %s" %(file['depotFile'])
For whatever reason, the code below uses sys.stdout.write() instead of print().
Should it be used here for consistency as well?

> +   size = "-1"
> +    else:
> +   size = self.stream_file['fileSize']
> +    size = int(size)
>  sys.stdout.write('\r%s --> %s (%i MB)\n' %
> (file['depotFile'], relPath, size/1024/1024))
>  sys.stdout.flush()
> 
> 
> On Tue, Apr 17, 2018 at 10:33 AM, Mazo, Andrey  wrote:
>> Sure, I totally agree.
>> Sorry, I just wasn't clear enough in my previous email.
>> I meant that your patch suppresses "%s --> %s (%i MB)" line in case 
>> "fileSize" is not available,
>> while my patch suppresses just "(%i MB)" portion if the "fileSize" is not 
>> known.
>> In other words,
>>  * if "fileSize" is known:
>>  ** both yours and mine patches don't change existing behavior;
>>  * if "fileSize" is not known:
>>  ** your patch makes streamOneP4File() not print anything;
>>  ** my patch makes streamOneP4File() print "%s --> %s".
>>
>> Hope, I'm clearer this time.
>>
>> Thank you,
>> Andrey
>>
>> From: Thandesha VK 
>>> *I* think keeping the filesize info is better with --verbose option as
>>> that gives some clue about the file we are working on. What do you
>>> think?
>>> Script has similar checks of key existence at other places where it is
>>> looking for fileSize.
>>>
>>> On Tue, Apr 17, 2018 at 9:22 AM, Andrey Mazo  wrote:
 Huh, I actually have a slightly different fix for the same issue.
 It doesn't suppress the corresponding verbose output completely, but just 
 removes the size information from it.

 Also, I'd mention that the workaround is trivial -- simply omit the 
 "--verbose" option.

 Andrey Mazo (1):
   git-p4: fix `sync --verbose` traceback due to 'fileSize'

  git-p4.py | 8 ++--
  1 file changed, 6 insertions(+), 2 deletions(-)


 base-commit: 468165c1d8a442994a825f3684528361727cd8c0
 --
 2.16.1

>>>
>>> --
>>> Thanks & Regards
>>> Thandesha VK | Cellphone +1 (703) 459-5386
>
>
>
> -- 
> Thanks & Regards
> Thandesha VK | Cellphone +1 (703) 459-5386

Re: [PATCH] worktree: accept -f as short for --force for removal

2018-04-17 Thread Eric Sunshine
On Tue, Apr 17, 2018 at 2:19 PM, Stefan Beller  wrote:
> The force flag is very common in many commands and is commonly
> abbreviated with '-f', so add that to worktree removal, too by using
> OPT__FORCE instead of a self cooked OPT_BOOL for force.

The missing bit of this sentence:

...self cooked OPT_BOOL for force which forgets '-f'.

> While at it, add the PARSE_OPT_NOCOMPLETE flag to the force command line
> option as forcing a removal may lose files.
>
> The short form of "-f" is already documented in the man page,
> so we do not have to adjust the docs.

Makes sense. A possible rewrite (of the entire commit message):

worktree: remove: recognize -f as short for --force

Many commands support a --force option, frequently abbreviated as
-f, however, "git worktree remove"'s hand-rolled OPT_BOOL forgets
to recognize the short form, despite git-worktree.txt documenting
-f as supported. Replace OPT_BOOL with OPT__FORCE, which provides
-f for free, and makes 'remove' consistent with 'add' option
parsing (which also specifies the PARSE_OPT_NOCOMPLETE flag).

> Helped-by: Eric Sunshine 
> Signed-off-by: Stefan Beller 

The patch itself looks good. Thanks. With or without the above rewrite
or minor adjustment, this patch is:

Reviewed-by: Eric Sunshine 


Re: [PATCH v1 0/5] Allocate cache entries from memory pool

2018-04-17 Thread Ben Peart



On 4/17/2018 12:34 PM, Jameson Miller wrote:


100K

Test   baseline [4]   block_allocation
  

0002.1: read_cache/discard_cache 1 times   0.03(0.01+0.01)0.02(0.01+0.01) 
-33.3%

1M:

Test   baseline   block_allocation
  

0002.1: read_cache/discard_cache 1 times   0.23(0.12+0.11)0.17(0.07+0.09) 
-26.1%

2M:

Test   baseline   block_allocation
  

0002.1: read_cache/discard_cache 1 times   0.45(0.26+0.19)0.39(0.17+0.20) 
-13.3%


100K is not a large enough sample size to show the perf impact of this
change, but we can see a perf improvement with 1M and 2M entries.


I see a 33% change with 100K files which is a substantial improvement 
even in the 100K case.  I do see that the actual wall clock savings 
aren't nearly as much with a small repo as it is with the larger repos 
which makes sense.


Re: [PATCH 8/8] gpg-interface: handle alternative signature types

2018-04-17 Thread Taylor Blau
On Tue, Apr 17, 2018 at 12:08:20PM -0600, Ben Toews wrote:
> On Mon, Apr 16, 2018 at 7:54 PM, Junio C Hamano  wrote:
> > "brian m. carlson"  writes:
> >
> >> If we just want to add gpgsm support, that's fine, but we should be
> >> transparent about that fact and try to avoid making an interface which
> >> is at once too generic and not generic enough.
>
> [...]
>
> My motivation with this series is not just to "add gpgsm support"
> though. I've been working on some other CMS tooling that will be open
> source eventually. My goal was to distribute this tool with a wrapper
> that emulates the GnuPG interface.
>
> To me, this series feels like a good stepping stone towards more
> generalized support for other tooling.

I agree with Ben's assessment. A stand-in tool for gpgsm support will
not be useful without a way to configure it with Git. I think that
treating this series as ``add support for _gpgsm-like tools_'' is
sensible, and a reasonable compromise between:

  - More generalized support.
  - No support at all.

Thanks,
Taylor


[PATCH] worktree: accept -f as short for --force for removal

2018-04-17 Thread Stefan Beller
The force flag is very common in many commands and is commonly
abbreviated with '-f', so add that to worktree removal, too by using
OPT__FORCE instead of a self cooked OPT_BOOL for force.
While at it, add the PARSE_OPT_NOCOMPLETE flag to the force command line
option as forcing a removal may lose files.

The short form of "-f" is already documented in the man page,
so we do not have to adjust the docs.

Helped-by: Eric Sunshine 
Signed-off-by: Stefan Beller 
---

> Should this be using OPT__FORCE, rather than OPT_BOOL, to be
> consistent with worktree.c:add()?

yes it should.

> Also, can you amend the commit message to mention that "git worktree
> remove -f" was already documented, and that it was only the
> implementation which was lacking? Thanks.

done.
I tried to clear up the docs, but after reading them a couple times,
it looks good as-is.

 Documentation/git-worktree.txt | 2 +-
 builtin/worktree.c | 5 +++--
 2 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/Documentation/git-worktree.txt b/Documentation/git-worktree.txt
index e7eb24ab85..99b713c849 100644
--- a/Documentation/git-worktree.txt
+++ b/Documentation/git-worktree.txt
@@ -14,7 +14,7 @@ SYNOPSIS
 'git worktree lock' [--reason ] 
 'git worktree move'  
 'git worktree prune' [-n] [-v] [--expire ]
-'git worktree remove' [--force] 
+'git worktree remove' [-f] 
 'git worktree unlock' 
 
 DESCRIPTION
diff --git a/builtin/worktree.c b/builtin/worktree.c
index 40a438ed6c..30647b30c5 100644
--- a/builtin/worktree.c
+++ b/builtin/worktree.c
@@ -783,8 +783,9 @@ static int remove_worktree(int ac, const char **av, const 
char *prefix)
 {
int force = 0;
struct option options[] = {
-   OPT_BOOL(0, "force", ,
-N_("force removing even if the worktree is dirty")),
+   OPT__FORCE(,
+N_("force removing even if the worktree is dirty"),
+PARSE_OPT_NOCOMPLETE),
OPT_END()
};
struct worktree **worktrees, *wt;
-- 
2.17.0.255.g8bfb7c0704



[PATCH/RFC] completion: complete all possible -no-

2018-04-17 Thread Nguyễn Thái Ngọc Duy
This is not a complete topic but I'd like to present the problem and
my approach to see if it's a good way to go.

We have started recently to rely on parse-options to help complete
options. One of the leftover items is allowing completing --no- form.
This patch enables that (some options should not have --no- form, but
that's easy not included here).

The problem with completing --no- form is that the number of
completable options now usually doubles, taking precious screen space
and also making it hard to find the option you want.

So the other half of this patch, the part in git-completion.bash, is
to uncomplete --no- options. When you do "git checkout --",
instead of displaying all --no- options, this patch simply displays
one item: the --no- prefix. If you do "git checkout --no-" then
all negative options are displayed. This helps reduce completable
options quite efficiently.

Of course life is not that simple, we do have --no- options by default
sometimes (taking priority over the positive form), e.g. "git clone
--no-checkout". Collapsing all --no-options into --no- would be a
regression.

To avoid it, the order of options --git-completion-helper returns does
matter. The first 4 negative options are not collapsed. Only options
after the 4th are. Extra --no- options are always printed at the end,
after all the --no- defined in struct option, this kinda works. Not
pretty but works.

After all this "git checkout --" now looks like this

> ~/w/git $ git co --
--conflict=   --orphan=
--detach  --ours 
--ignore-other-worktrees  --patch 
--ignore-skip-worktree-bits   --progress 
--merge   --quiet 
--no- --recurse-submodules 
--no-detach   --theirs 
--no-quiet--track 
--no-track

and all the no options

> ~/w/git $ git co --no-
--no-conflict--no-patch 
--no-detach  --no-progress 
--no-ignore-other-worktrees  --no-quiet 
--no-ignore-skip-worktree-bits   --no-recurse-submodules 
--no-merge   --no-track 
--no-orphan  

Signed-off-by: Nguyễn Thái Ngọc Duy 
---
 contrib/completion/git-completion.bash | 25 +++-
 parse-options.c| 40 +++---
 2 files changed, 60 insertions(+), 5 deletions(-)

diff --git a/contrib/completion/git-completion.bash 
b/contrib/completion/git-completion.bash
index a757073945..85b9f24465 100644
--- a/contrib/completion/git-completion.bash
+++ b/contrib/completion/git-completion.bash
@@ -266,7 +266,7 @@ __gitcomp ()
case "$cur_" in
--*=)
;;
-   *)
+   --no-*)
local c i=0 IFS=$' \t\n'
for c in $1; do
c="$c${4-}"
@@ -279,6 +279,29 @@ __gitcomp ()
fi
done
;;
+   *)
+   local c i=0 IFS=$' \t\n' n=0
+   for c in $1; do
+   c="$c${4-}"
+   if [[ $c == "$cur_"* ]]; then
+   case $c in
+   --*=*|*.) ;;
+   --no-*)
+   n=$(($n+1))
+   if [ "$n" -eq 4 ]; then
+   c="--no-${4-} "
+   elif [ "$n" -gt 4 ]; then
+   continue
+   else
+   c="$c "
+   fi
+   ;;
+   *) c="$c " ;;
+   esac
+   COMPREPLY[i++]="${2-}$c"
+   fi
+   done
+   ;;
esac
 }
 
diff --git a/parse-options.c b/parse-options.c
index 0f7059a8ab..f6cd7ca8d2 100644
--- a/parse-options.c
+++ b/parse-options.c
@@ -427,13 +427,11 @@ void parse_options_start(struct parse_opt_ctx_t *ctx,
parse_options_check(options);
 }
 
-/*
- * TODO: we are not completing the --no-XXX form yet because there are
- * many options that do not suppress it properly.
- */
 static int show_gitcomp(struct parse_opt_ctx_t *ctx,
const struct option *opts)
 {
+   const struct option *original_opts = opts;
+
for (; opts->type != OPTION_END; opts++) {
const char *suffix = "";
 
@@ -465,6 +463,40 @@ static int show_gitcomp(struct parse_opt_ctx_t *ctx,
suffix = "=";
printf(" --%s%s", opts->long_name, suffix);
}
+   for (opts = original_opts; opts->type != OPTION_END; opts++) {
+   int has_unset_form = 0;
+
+   

[RFC PATCH 05/12] commit-graph: check fanout and lookup table

2018-04-17 Thread Derrick Stolee
While running 'git commit-graph check', verify that the object IDs
are listed in lexicographic order and that the fanout table correctly
navigates into that list of object IDs.

In anticipation of checking the commits in the commit-graph file
against the object database, parse the commits from that file in
advance. We perform this parse now to ensure the object cache contains
only commits from this commit-graph file.

Signed-off-by: Derrick Stolee 
---
 commit-graph.c | 34 ++
 1 file changed, 34 insertions(+)

diff --git a/commit-graph.c b/commit-graph.c
index 6d0d303a7a..6e3c08cd5c 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -835,6 +835,9 @@ static int check_commit_graph_error;
 
 int check_commit_graph(struct commit_graph *g)
 {
+   uint32_t i, cur_fanout_pos = 0;
+   struct object_id prev_oid, cur_oid;
+
if (!g) {
graph_report(_("no commit-graph file loaded"));
return 1;
@@ -859,5 +862,36 @@ int check_commit_graph(struct commit_graph *g)
if (g->hash_len != GRAPH_OID_LEN)
graph_report(_("commit-graph has incorrect hash length: %d"), 
g->hash_len);
 
+   for (i = 0; i < g->num_commits; i++) {
+   struct commit *graph_commit;
+
+   hashcpy(cur_oid.hash, g->chunk_oid_lookup + g->hash_len * i);
+
+   if (i > 0 && oidcmp(_oid, _oid) >= 0)
+   graph_report(_("commit-graph has incorrect oid order: 
%s then %s"),
+
+   oid_to_hex(_oid),
+   oid_to_hex(_oid));
+   oidcpy(_oid, _oid);
+
+   while (cur_oid.hash[0] > cur_fanout_pos) {
+   uint32_t fanout_value = get_be32(g->chunk_oid_fanout + 
cur_fanout_pos);
+   if (i != fanout_value)
+   graph_report(_("commit-graph has incorrect 
fanout value: fanout[%d] = %u != %u"),
+cur_fanout_pos, fanout_value, i);
+
+   cur_fanout_pos++;
+   }
+
+   graph_commit = lookup_commit(_oid);
+
+   if (!parse_commit_in_graph_one(g, graph_commit))
+   graph_report(_("failed to parse %s from commit-graph"), 
oid_to_hex(_oid));
+
+   if (graph_commit->graph_pos != i)
+   graph_report(_("graph_pos for commit %s is %u != %u"), 
oid_to_hex(_oid),
+graph_commit->graph_pos, i);
+   }
+
return check_commit_graph_error;
 }
-- 
2.17.0.39.g685157f7fb



[RFC PATCH 00/12] Integrate commit-graph into 'fsck' and 'gc'

2018-04-17 Thread Derrick Stolee
The commit-graph feature is not useful to end users until the
commit-graph file is maintained automatically by Git during normal
upkeep operations. One natural place to trigger this write is during
'git gc'.

Before automatically generating a commit-graph, we need to be able to
verify the contents of a commit-graph file. Integrate commit-graph
checks into 'fsck' that check the commit-graph contents against commits
in the object database.

Things to think about:

* Are these the right integration points?

* gc.commitGraph defaults to true right now for the purpose of testing,
  but may not be required to start. The goal is to have this default to
  true eventually, but we may want to delay that until the feature is
  stable.

* I implement a "--reachable" option to 'git commit-graph write' that
  iterates over all refs. This does the same as 

git show-ref -s | git commit-graph write --stdin-commits

  but I don't know how to pipe two child processes together inside of Git.
  Perhaps this is a better solution, anyway.

What other things should I be considering in this case? I'm unfamiliar
with the inner-workings of 'fsck' and 'gc', so this is a new space for me.

This RFC is based on v3 of ds/generation-numbers, and the first commit
is a fixup! based on a bug in that version that I caught while prepping
this series.

Thanks,
-Stolee

Derrick Stolee (12):
  fixup! commit-graph: always load commit-graph information
  commit-graph: add 'check' subcommand
  commit-graph: check file header information
  commit-graph: parse commit from chosen graph
  commit-graph: check fanout and lookup table
  commit: force commit to parse from object database
  commit-graph: load a root tree from specific graph
  commit-graph: verify commit contents against odb
  fsck: check commit-graph
  commit-graph: add '--reachable' option
  gc: automatically write commit-graph files
  commit-graph: update design document

 Documentation/git-commit-graph.txt   |  15 +-
 Documentation/git-gc.txt |   4 +
 Documentation/technical/commit-graph.txt |   9 --
 builtin/commit-graph.c   |  79 +-
 builtin/fsck.c   |  13 ++
 builtin/gc.c |   8 +
 commit-graph.c   | 178 ++-
 commit-graph.h   |   2 +
 commit.c |  14 +-
 commit.h |   1 +
 t/t5318-commit-graph.sh  |  15 ++
 11 files changed, 311 insertions(+), 27 deletions(-)

-- 
2.17.0.39.g685157f7fb



[RFC PATCH 03/12] commit-graph: check file header information

2018-04-17 Thread Derrick Stolee
During a run of 'git commit-graph check', list the issues with the
header information in the commit-graph file. Some of this information
is inferred from the loaded 'struct commit_graph'.

Signed-off-by: Derrick Stolee 
---
 commit-graph.c | 29 -
 1 file changed, 28 insertions(+), 1 deletion(-)

diff --git a/commit-graph.c b/commit-graph.c
index cd0634bba0..c5e5a0f860 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -820,7 +820,34 @@ void write_commit_graph(const char *obj_dir,
oids.nr = 0;
 }
 
+static int check_commit_graph_error;
+#define graph_report(...) { check_commit_graph_error = 1; printf(__VA_ARGS__); 
}
+
 int check_commit_graph(struct commit_graph *g)
 {
-   return !g;
+   if (!g) {
+   graph_report(_("no commit-graph file loaded"));
+   return 1;
+   }
+
+   check_commit_graph_error = 0;
+
+   if (get_be32(g->data) != GRAPH_SIGNATURE)
+   graph_report(_("commit-graph file has incorrect header"));
+
+   if (*(g->data + 4) != 1)
+   graph_report(_("commit-graph file version is not 1"));
+   if (*(g->data + 5) != GRAPH_OID_VERSION)
+   graph_report(_("commit-graph OID version is not 1 (SHA1)"));
+
+   if (!g->chunk_oid_fanout)
+   graph_report(_("commit-graph is missing the OID Fanout chunk"));
+   if (!g->chunk_oid_lookup)
+   graph_report(_("commit-graph is missing the OID Lookup chunk"));
+   if (!g->chunk_commit_data)
+   graph_report(_("commit-graph is missing the Commit Data 
chunk"));
+   if (g->hash_len != GRAPH_OID_LEN)
+   graph_report(_("commit-graph has incorrect hash length: %d"), 
g->hash_len);
+
+   return check_commit_graph_error;
 }
-- 
2.17.0.39.g685157f7fb



[RFC PATCH 11/12] gc: automatically write commit-graph files

2018-04-17 Thread Derrick Stolee
The commit-graph file is a very helpful feature for speeding up git
operations. In order to make it more useful, write the commit-graph file
by default during standard garbage collection operations.

Add a 'gc.commitGraph' config setting that triggers writing a
commit-graph file after any non-trivial 'git gc' command.

Signed-off-by: Derrick Stolee 
---
 Documentation/git-gc.txt | 4 
 builtin/gc.c | 8 
 2 files changed, 12 insertions(+)

diff --git a/Documentation/git-gc.txt b/Documentation/git-gc.txt
index 571b5a7e3c..17dd654a59 100644
--- a/Documentation/git-gc.txt
+++ b/Documentation/git-gc.txt
@@ -119,6 +119,10 @@ The optional configuration variable `gc.packRefs` 
determines if
 it within all non-bare repos or it can be set to a boolean value.
 This defaults to true.
 
+The optional configuration variable 'gc.commitGraph' determines if
+'git gc' runs 'git commit-graph write'. This can be set to a boolean
+value. This defaults to false.
+
 The optional configuration variable `gc.aggressiveWindow` controls how
 much time is spent optimizing the delta compression of the objects in
 the repository when the --aggressive option is specified.  The larger
diff --git a/builtin/gc.c b/builtin/gc.c
index 77fa720bd0..070f2a7a3d 100644
--- a/builtin/gc.c
+++ b/builtin/gc.c
@@ -34,6 +34,7 @@ static int aggressive_depth = 50;
 static int aggressive_window = 250;
 static int gc_auto_threshold = 6700;
 static int gc_auto_pack_limit = 50;
+static int gc_commit_graph = 1;
 static int detach_auto = 1;
 static timestamp_t gc_log_expire_time;
 static const char *gc_log_expire = "1.day.ago";
@@ -46,6 +47,7 @@ static struct argv_array repack = ARGV_ARRAY_INIT;
 static struct argv_array prune = ARGV_ARRAY_INIT;
 static struct argv_array prune_worktrees = ARGV_ARRAY_INIT;
 static struct argv_array rerere = ARGV_ARRAY_INIT;
+static struct argv_array commit_graph = ARGV_ARRAY_INIT;
 
 static struct tempfile *pidfile;
 static struct lock_file log_lock;
@@ -121,6 +123,7 @@ static void gc_config(void)
git_config_get_int("gc.aggressivedepth", _depth);
git_config_get_int("gc.auto", _auto_threshold);
git_config_get_int("gc.autopacklimit", _auto_pack_limit);
+   git_config_get_bool("gc.commitgraph", _commit_graph);
git_config_get_bool("gc.autodetach", _auto);
git_config_get_expiry("gc.pruneexpire", _expire);
git_config_get_expiry("gc.worktreepruneexpire", 
_worktrees_expire);
@@ -374,6 +377,7 @@ int cmd_gc(int argc, const char **argv, const char *prefix)
argv_array_pushl(, "prune", "--expire", NULL);
argv_array_pushl(_worktrees, "worktree", "prune", "--expire", 
NULL);
argv_array_pushl(, "rerere", "gc", NULL);
+   argv_array_pushl(_graph, "commit-graph", "write", "--reachable", 
NULL);
 
/* default expiry time, overwritten in gc_config */
gc_config();
@@ -480,6 +484,10 @@ int cmd_gc(int argc, const char **argv, const char *prefix)
if (pack_garbage.nr > 0)
clean_pack_garbage();
 
+   if (gc_commit_graph)
+   if (run_command_v_opt(commit_graph.argv, RUN_GIT_CMD))
+   return error(FAILED_RUN, commit_graph.argv[0]);
+
if (auto_gc && too_many_loose_objects())
warning(_("There are too many unreachable loose objects; "
"run 'git prune' to remove them."));
-- 
2.17.0.39.g685157f7fb



[RFC PATCH 02/12] commit-graph: add 'check' subcommand

2018-04-17 Thread Derrick Stolee
If the commit-graph file becomes corrupt, we need a way to verify
its contents match the object database. In the manner of 'git fsck'
we will implement a 'git commit-graph check' subcommand to report
all issues with the file.

Add the 'check' subcommand to the 'commit-graph' builtin and its
documentation. Add a simple test that ensures the command returns
a zero error code.

Signed-off-by: Derrick Stolee 
---
 Documentation/git-commit-graph.txt |  7 +-
 builtin/commit-graph.c | 38 ++
 commit-graph.c |  5 
 commit-graph.h |  2 ++
 t/t5318-commit-graph.sh|  5 
 5 files changed, 56 insertions(+), 1 deletion(-)

diff --git a/Documentation/git-commit-graph.txt 
b/Documentation/git-commit-graph.txt
index 4c97b555cc..93c7841ba2 100644
--- a/Documentation/git-commit-graph.txt
+++ b/Documentation/git-commit-graph.txt
@@ -9,10 +9,10 @@ git-commit-graph - Write and verify Git commit graph files
 SYNOPSIS
 
 [verse]
+'git commit-graph check' [--object-dir ]
 'git commit-graph read' [--object-dir ]
 'git commit-graph write'  [--object-dir ]
 
-
 DESCRIPTION
 ---
 
@@ -52,6 +52,11 @@ existing commit-graph file.
 Read a graph file given by the commit-graph file and output basic
 details about the graph file. Used for debugging purposes.
 
+'check'::
+
+Read the commit-graph file and verify its contents against the object
+database. Used to check for corrupted data.
+
 
 EXAMPLES
 
diff --git a/builtin/commit-graph.c b/builtin/commit-graph.c
index 37420ae0fd..77c1a04932 100644
--- a/builtin/commit-graph.c
+++ b/builtin/commit-graph.c
@@ -7,11 +7,17 @@
 
 static char const * const builtin_commit_graph_usage[] = {
N_("git commit-graph [--object-dir ]"),
+   N_("git commit-graph check [--object-dir ]"),
N_("git commit-graph read [--object-dir ]"),
N_("git commit-graph write [--object-dir ] [--append] 
[--stdin-packs|--stdin-commits]"),
NULL
 };
 
+static const char * const builtin_commit_graph_check_usage[] = {
+   N_("git commit-graph check [--object-dir ]"),
+   NULL
+};
+
 static const char * const builtin_commit_graph_read_usage[] = {
N_("git commit-graph read [--object-dir ]"),
NULL
@@ -29,6 +35,36 @@ static struct opts_commit_graph {
int append;
 } opts;
 
+
+static int graph_check(int argc, const char **argv)
+{
+   struct commit_graph *graph = 0;
+   char *graph_name;
+
+   static struct option builtin_commit_graph_check_options[] = {
+   OPT_STRING(0, "object-dir", _dir,
+  N_("dir"),
+  N_("The object directory to store the graph")),
+   OPT_END(),
+   };
+
+   argc = parse_options(argc, argv, NULL,
+builtin_commit_graph_check_options,
+builtin_commit_graph_check_usage, 0);
+
+   if (!opts.obj_dir)
+   opts.obj_dir = get_object_directory();
+
+   graph_name = get_commit_graph_filename(opts.obj_dir);
+   graph = load_commit_graph_one(graph_name);
+
+   if (!graph)
+   die("graph file %s does not exist", graph_name);
+   FREE_AND_NULL(graph_name);
+
+   return check_commit_graph(graph);
+}
+
 static int graph_read(int argc, const char **argv)
 {
struct commit_graph *graph = NULL;
@@ -160,6 +196,8 @@ int cmd_commit_graph(int argc, const char **argv, const 
char *prefix)
 PARSE_OPT_STOP_AT_NON_OPTION);
 
if (argc > 0) {
+   if (!strcmp(argv[0], "check"))
+   return graph_check(argc, argv);
if (!strcmp(argv[0], "read"))
return graph_read(argc, argv);
if (!strcmp(argv[0], "write"))
diff --git a/commit-graph.c b/commit-graph.c
index 3f0c142603..cd0634bba0 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -819,3 +819,8 @@ void write_commit_graph(const char *obj_dir,
oids.alloc = 0;
oids.nr = 0;
 }
+
+int check_commit_graph(struct commit_graph *g)
+{
+   return !g;
+}
diff --git a/commit-graph.h b/commit-graph.h
index 96cccb10f3..e8c8d99dff 100644
--- a/commit-graph.h
+++ b/commit-graph.h
@@ -53,4 +53,6 @@ void write_commit_graph(const char *obj_dir,
int nr_commits,
int append);
 
+int check_commit_graph(struct commit_graph *g);
+
 #endif
diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh
index 77d85aefe7..e91053271a 100755
--- a/t/t5318-commit-graph.sh
+++ b/t/t5318-commit-graph.sh
@@ -230,4 +230,9 @@ test_expect_success 'perform fast-forward merge in full 
repo' '
test_cmp expect output
 '
 
+test_expect_success 'git commit-graph check' '
+   cd "$TRASH_DIRECTORY/full" &&
+   git commit-graph check >output
+'
+
 test_done
-- 
2.17.0.39.g685157f7fb



[RFC PATCH 08/12] commit-graph: verify commit contents against odb

2018-04-17 Thread Derrick Stolee
When running 'git commit-graph check', compare the contents of the
commits that are loaded from the commit-graph file with commits that are
loaded directly from the object database. This includes checking the
root tree object ID, commit date, and parents.

In addition, verify the generation number calculation is correct for all
commits in the commit-graph file.

Signed-off-by: Derrick Stolee 
---
 commit-graph.c | 82 ++
 1 file changed, 82 insertions(+)

diff --git a/commit-graph.c b/commit-graph.c
index 80a2ac2a6a..b5bce2bac4 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -899,5 +899,87 @@ int check_commit_graph(struct commit_graph *g)
 graph_commit->graph_pos, i);
}
 
+   for (i = 0; i < g->num_commits; i++) {
+   struct commit *graph_commit, *odb_commit;
+   struct commit_list *graph_parents, *odb_parents;
+   int num_parents = 0;
+
+   hashcpy(cur_oid.hash, g->chunk_oid_lookup + g->hash_len * i);
+
+   graph_commit = lookup_commit(_oid);
+   odb_commit = (struct commit *)create_object(cur_oid.hash, 
alloc_commit_node());
+   if (parse_commit_internal(odb_commit, 0, 0))
+   graph_report(_("failed to parse %s from object 
database"), oid_to_hex(_oid));
+
+   if (oidcmp(_commit_tree_in_graph_one(g, 
graph_commit)->object.oid,
+  get_commit_tree_oid(odb_commit)))
+   graph_report(_("root tree object ID for commit %s in 
commit-graph is %s != %s"),
+oid_to_hex(_oid),
+
oid_to_hex(get_commit_tree_oid(graph_commit)),
+
oid_to_hex(get_commit_tree_oid(odb_commit)));
+
+   if (graph_commit->date != odb_commit->date)
+   graph_report(_("commit date for commit %s in 
commit-graph is %"PRItime" != %"PRItime""),
+oid_to_hex(_oid),
+graph_commit->date,
+odb_commit->date);
+
+
+   graph_parents = graph_commit->parents;
+   odb_parents = odb_commit->parents;
+
+   while (graph_parents) {
+   num_parents++;
+
+   if (odb_parents == NULL)
+   graph_report(_("commit-graph parent list for 
commit %s is too long (%d)"),
+oid_to_hex(_oid),
+num_parents);
+
+   if (oidcmp(_parents->item->object.oid, 
_parents->item->object.oid))
+   graph_report(_("commit-graph parent for %s is 
%s != %s"),
+oid_to_hex(_oid),
+
oid_to_hex(_parents->item->object.oid),
+
oid_to_hex(_parents->item->object.oid));
+
+   graph_parents = graph_parents->next;
+   odb_parents = odb_parents->next;
+   }
+
+   if (odb_parents != NULL)
+   graph_report(_("commit-graph parent list for commit %s 
terminates early"),
+oid_to_hex(_oid));
+
+   if (graph_commit->generation) {
+   uint32_t max_generation = 0;
+   graph_parents = graph_commit->parents;
+
+   while (graph_parents) {
+   if (graph_parents->item->generation == 
GENERATION_NUMBER_ZERO ||
+   graph_parents->item->generation == 
GENERATION_NUMBER_INFINITY)
+   graph_report(_("commit-graph has valid 
generation for %s but not its parent, %s"),
+oid_to_hex(_oid),
+
oid_to_hex(_parents->item->object.oid));
+   if (graph_parents->item->generation > 
max_generation)
+   max_generation = 
graph_parents->item->generation;
+   graph_parents = graph_parents->next;
+   }
+
+   if (graph_commit->generation != max_generation + 1)
+   graph_report(_("commit-graph has incorrect 
generation for %s"),
+oid_to_hex(_oid));
+   } else {
+   graph_parents = graph_commit->parents;
+
+   while (graph_parents) {
+   if (graph_parents->item->generation)
+   graph_report(_("commit-graph has 
generation ZERO for %s but not its parent, %s"),

[RFC PATCH 09/12] fsck: check commit-graph

2018-04-17 Thread Derrick Stolee
If a commit-graph file exists, check its contents during 'git fsck'.

Signed-off-by: Derrick Stolee 
---
 builtin/fsck.c | 13 +
 1 file changed, 13 insertions(+)

diff --git a/builtin/fsck.c b/builtin/fsck.c
index ef78c6c00c..9712f230ba 100644
--- a/builtin/fsck.c
+++ b/builtin/fsck.c
@@ -16,6 +16,7 @@
 #include "streaming.h"
 #include "decorate.h"
 #include "packfile.h"
+#include "run-command.h"
 
 #define REACHABLE 0x0001
 #define SEEN  0x0002
@@ -45,6 +46,7 @@ static int name_objects;
 #define ERROR_REACHABLE 02
 #define ERROR_PACK 04
 #define ERROR_REFS 010
+#define ERROR_COMMIT_GRAPH 020
 
 static const char *describe_object(struct object *obj)
 {
@@ -815,5 +817,16 @@ int cmd_fsck(int argc, const char **argv, const char 
*prefix)
}
 
check_connectivity();
+
+   if (core_commit_graph) {
+   struct child_process commit_graph_check = CHILD_PROCESS_INIT;
+   const char *check_argv[] = { "commit-graph", "check", NULL, 
NULL };
+   commit_graph_check.argv = check_argv;
+   commit_graph_check.git_cmd = 1;
+
+   if (run_command(_graph_check))
+   errors_found |= ERROR_COMMIT_GRAPH;
+   }
+
return errors_found;
 }
-- 
2.17.0.39.g685157f7fb



[RFC PATCH 06/12] commit: force commit to parse from object database

2018-04-17 Thread Derrick Stolee
In anticipation of checking commit-graph file contents against the
object database, create parse_commit_internal() to allow side-stepping
the commit-graph file and parse directly from the object database.

Due to the use of generation numbers, this method should not be called
unless the intention is explicit in avoiding commits from the
commit-graph file.

Signed-off-by: Derrick Stolee 
---
 commit.c | 14 ++
 commit.h |  1 +
 2 files changed, 11 insertions(+), 4 deletions(-)

diff --git a/commit.c b/commit.c
index 9ef6f699bd..07752d8503 100644
--- a/commit.c
+++ b/commit.c
@@ -392,7 +392,8 @@ int parse_commit_buffer(struct commit *item, const void 
*buffer, unsigned long s
return 0;
 }
 
-int parse_commit_gently(struct commit *item, int quiet_on_missing)
+
+int parse_commit_internal(struct commit *item, int quiet_on_missing, int 
use_commit_graph)
 {
enum object_type type;
void *buffer;
@@ -403,17 +404,17 @@ int parse_commit_gently(struct commit *item, int 
quiet_on_missing)
return -1;
if (item->object.parsed)
return 0;
-   if (parse_commit_in_graph(item))
+   if (use_commit_graph && parse_commit_in_graph(item))
return 0;
buffer = read_sha1_file(item->object.oid.hash, , );
if (!buffer)
return quiet_on_missing ? -1 :
error("Could not read %s",
-oid_to_hex(>object.oid));
+   oid_to_hex(>object.oid));
if (type != OBJ_COMMIT) {
free(buffer);
return error("Object %s not a commit",
-oid_to_hex(>object.oid));
+   oid_to_hex(>object.oid));
}
ret = parse_commit_buffer(item, buffer, size, 0);
if (save_commit_buffer && !ret) {
@@ -424,6 +425,11 @@ int parse_commit_gently(struct commit *item, int 
quiet_on_missing)
return ret;
 }
 
+int parse_commit_gently(struct commit *item, int quiet_on_missing)
+{
+   return parse_commit_internal(item, quiet_on_missing, 1);
+}
+
 void parse_commit_or_die(struct commit *item)
 {
if (parse_commit(item))
diff --git a/commit.h b/commit.h
index b5afde1ae9..5fde74fcd7 100644
--- a/commit.h
+++ b/commit.h
@@ -73,6 +73,7 @@ struct commit *lookup_commit_reference_by_name(const char 
*name);
 struct commit *lookup_commit_or_die(const struct object_id *oid, const char 
*ref_name);
 
 int parse_commit_buffer(struct commit *item, const void *buffer, unsigned long 
size, int check_graph);
+int parse_commit_internal(struct commit *item, int quiet_on_missing, int 
use_commit_graph);
 int parse_commit_gently(struct commit *item, int quiet_on_missing);
 static inline int parse_commit(struct commit *item)
 {
-- 
2.17.0.39.g685157f7fb



[RFC PATCH 01/12] fixup! commit-graph: always load commit-graph information

2018-04-17 Thread Derrick Stolee
---
 commit-graph.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/commit-graph.c b/commit-graph.c
index 21e853c21a..3f0c142603 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -304,7 +304,7 @@ static int find_commit_in_graph(struct commit *item, struct 
commit_graph *g, uin
*pos = item->graph_pos;
return 1;
} else {
-   return bsearch_graph(commit_graph, &(item->object.oid), pos);
+   return bsearch_graph(g, &(item->object.oid), pos);
}
 }
 
-- 
2.17.0.39.g685157f7fb



[RFC PATCH 10/12] commit-graph: add '--reachable' option

2018-04-17 Thread Derrick Stolee
When writing commit-graph files, it can be convenient to ask for all
reachable commits (starting at the ref set) in the resulting file. This
is particularly helpful when writing to stdin is complicated, such as a
future integration with 'git gc' which will call
'git commit-graph write --reachable' after performing cleanup of the
object database.

Signed-off-by: Derrick Stolee 
---
 Documentation/git-commit-graph.txt |  8 --
 builtin/commit-graph.c | 41 +++---
 t/t5318-commit-graph.sh| 10 
 3 files changed, 53 insertions(+), 6 deletions(-)

diff --git a/Documentation/git-commit-graph.txt 
b/Documentation/git-commit-graph.txt
index 93c7841ba2..1b14d40590 100644
--- a/Documentation/git-commit-graph.txt
+++ b/Documentation/git-commit-graph.txt
@@ -37,12 +37,16 @@ Write a commit graph file based on the commits found in 
packfiles.
 +
 With the `--stdin-packs` option, generate the new commit graph by
 walking objects only in the specified pack-indexes. (Cannot be combined
-with --stdin-commits.)
+with --stdin-commits or --reachable.)
 +
 With the `--stdin-commits` option, generate the new commit graph by
 walking commits starting at the commits specified in stdin as a list
 of OIDs in hex, one OID per line. (Cannot be combined with
---stdin-packs.)
+--stdin-packs or --reachable.)
++
+With the `--reachable` option, generate the new commit graph by walking
+commits starting at all refs. (Cannot be combined with --stdin-commits
+or --stind-packs.)
 +
 With the `--append` option, include all commits that are present in the
 existing commit-graph file.
diff --git a/builtin/commit-graph.c b/builtin/commit-graph.c
index 77c1a04932..a89285ada8 100644
--- a/builtin/commit-graph.c
+++ b/builtin/commit-graph.c
@@ -3,13 +3,14 @@
 #include "dir.h"
 #include "lockfile.h"
 #include "parse-options.h"
+#include "refs.h"
 #include "commit-graph.h"
 
 static char const * const builtin_commit_graph_usage[] = {
N_("git commit-graph [--object-dir ]"),
N_("git commit-graph check [--object-dir ]"),
N_("git commit-graph read [--object-dir ]"),
-   N_("git commit-graph write [--object-dir ] [--append] 
[--stdin-packs|--stdin-commits]"),
+   N_("git commit-graph write [--object-dir ] [--append] 
[--reachable|--stdin-packs|--stdin-commits]"),
NULL
 };
 
@@ -24,12 +25,13 @@ static const char * const builtin_commit_graph_read_usage[] 
= {
 };
 
 static const char * const builtin_commit_graph_write_usage[] = {
-   N_("git commit-graph write [--object-dir ] [--append] 
[--stdin-packs|--stdin-commits]"),
+   N_("git commit-graph write [--object-dir ] [--append] 
[--reachable|--stdin-packs|--stdin-commits]"),
NULL
 };
 
 static struct opts_commit_graph {
const char *obj_dir;
+   int reachable;
int stdin_packs;
int stdin_commits;
int append;
@@ -113,6 +115,25 @@ static int graph_read(int argc, const char **argv)
return 0;
 }
 
+struct hex_list {
+   char **hex_strs;
+   int hex_nr;
+   int hex_alloc;
+};
+
+static int add_ref_to_list(const char *refname,
+  const struct object_id *oid,
+  int flags, void *cb_data)
+{
+   struct hex_list *list = (struct hex_list*)cb_data;
+
+   ALLOC_GROW(list->hex_strs, list->hex_nr + 1, list->hex_alloc);
+   list->hex_strs[list->hex_nr] = xcalloc(GIT_MAX_HEXSZ + 1, 1);
+   strcpy(list->hex_strs[list->hex_nr], oid_to_hex(oid));
+   list->hex_nr++;
+   return 0;
+}
+
 static int graph_write(int argc, const char **argv)
 {
const char **pack_indexes = NULL;
@@ -127,6 +148,8 @@ static int graph_write(int argc, const char **argv)
OPT_STRING(0, "object-dir", _dir,
N_("dir"),
N_("The object directory to store the graph")),
+   OPT_BOOL(0, "reachable", ,
+   N_("start walk at all refs")),
OPT_BOOL(0, "stdin-packs", _packs,
N_("scan pack-indexes listed by stdin for commits")),
OPT_BOOL(0, "stdin-commits", _commits,
@@ -140,8 +163,8 @@ static int graph_write(int argc, const char **argv)
 builtin_commit_graph_write_options,
 builtin_commit_graph_write_usage, 0);
 
-   if (opts.stdin_packs && opts.stdin_commits)
-   die(_("cannot use both --stdin-commits and --stdin-packs"));
+   if (opts.reachable + opts.stdin_packs + opts.stdin_commits > 1)
+   die(_("use at most one of --reachable, --stdin-commits, or 
--stdin-packs"));
if (!opts.obj_dir)
opts.obj_dir = get_object_directory();
 
@@ -164,6 +187,16 @@ static int graph_write(int argc, const char **argv)
commit_hex = lines;
commits_nr = lines_nr;
}
+   } else if (opts.reachable) 

[RFC PATCH 07/12] commit-graph: load a root tree from specific graph

2018-04-17 Thread Derrick Stolee
When lazy-loading a tree for a commit, it will be important to select
the tree from a specific struct commit_graph. Create a new method that
specifies the commit-graph file and use that in
get_commit_tree_in_graph().

Signed-off-by: Derrick Stolee 
---
 commit-graph.c | 10 --
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/commit-graph.c b/commit-graph.c
index 6e3c08cd5c..80a2ac2a6a 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -354,14 +354,20 @@ static struct tree *load_tree_for_commit(struct 
commit_graph *g, struct commit *
return c->maybe_tree;
 }
 
-struct tree *get_commit_tree_in_graph(const struct commit *c)
+static struct tree *get_commit_tree_in_graph_one(struct commit_graph *g,
+const struct commit *c)
 {
if (c->maybe_tree)
return c->maybe_tree;
if (c->graph_pos == COMMIT_NOT_FROM_GRAPH)
BUG("get_commit_tree_in_graph called from non-commit-graph 
commit");
 
-   return load_tree_for_commit(commit_graph, (struct commit *)c);
+   return load_tree_for_commit(g, (struct commit *)c);
+}
+
+struct tree *get_commit_tree_in_graph(const struct commit *c)
+{
+   return get_commit_tree_in_graph_one(commit_graph, c);
 }
 
 static void write_graph_chunk_fanout(struct hashfile *f,
-- 
2.17.0.39.g685157f7fb



[RFC PATCH 12/12] commit-graph: update design document

2018-04-17 Thread Derrick Stolee
The commit-graph feature is now integrated with 'fsck' and 'gc',
so remove those items from the "Future Work" section of the
commit-graph design document.

Signed-off-by: Derrick Stolee 
---
 Documentation/technical/commit-graph.txt | 9 -
 1 file changed, 9 deletions(-)

diff --git a/Documentation/technical/commit-graph.txt 
b/Documentation/technical/commit-graph.txt
index d9f2713efa..d04657b781 100644
--- a/Documentation/technical/commit-graph.txt
+++ b/Documentation/technical/commit-graph.txt
@@ -118,9 +118,6 @@ Future Work
 - The commit graph feature currently does not honor commit grafts. This can
   be remedied by duplicating or refactoring the current graft logic.
 
-- The 'commit-graph' subcommand does not have a "verify" mode that is
-  necessary for integration with fsck.
-
 - After computing and storing generation numbers, we must make graph
   walks aware of generation numbers to gain the performance benefits they
   enable. This will mostly be accomplished by swapping a commit-date-ordered
@@ -142,12 +139,6 @@ Future Work
   such as "ensure_tree_loaded(commit)" that fully loads a tree before
   using commit->tree.
 
-- The current design uses the 'commit-graph' subcommand to generate the graph.
-  When this feature stabilizes enough to recommend to most users, we should
-  add automatic graph writes to common operations that create many commits.
-  For example, one could compute a graph on 'clone', 'fetch', or 'repack'
-  commands.
-
 - A server could provide a commit graph file as part of the network protocol
   to avoid extra calculations by clients. This feature is only of benefit if
   the user is willing to trust the file, because verifying the file is correct
-- 
2.17.0.39.g685157f7fb



[RFC PATCH 04/12] commit-graph: parse commit from chosen graph

2018-04-17 Thread Derrick Stolee
Before checking a commit-graph file against the object database, we
need to parse all commits from the given commit-graph file. Create
parse_commit_in_graph_one() to target a given struct commit_graph.

Signed-off-by: Derrick Stolee 
---
 commit-graph.c | 18 ++
 1 file changed, 14 insertions(+), 4 deletions(-)

diff --git a/commit-graph.c b/commit-graph.c
index c5e5a0f860..6d0d303a7a 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -308,17 +308,27 @@ static int find_commit_in_graph(struct commit *item, 
struct commit_graph *g, uin
}
 }
 
-int parse_commit_in_graph(struct commit *item)
+int parse_commit_in_graph_one(struct commit_graph *g, struct commit *item)
 {
uint32_t pos;
 
if (item->object.parsed)
-   return 0;
+   return 1;
+
+   if (find_commit_in_graph(item, g, ))
+   return fill_commit_in_graph(item, g, pos);
+
+   return 0;
+}
+
+int parse_commit_in_graph(struct commit *item)
+{
if (!core_commit_graph)
return 0;
+
prepare_commit_graph();
-   if (commit_graph && find_commit_in_graph(item, commit_graph, ))
-   return fill_commit_in_graph(item, commit_graph, pos);
+   if (commit_graph)
+   return parse_commit_in_graph_one(commit_graph, item);
return 0;
 }
 
-- 
2.17.0.39.g685157f7fb



Re: [PATCH 8/8] gpg-interface: handle alternative signature types

2018-04-17 Thread Ben Toews
On Mon, Apr 16, 2018 at 7:54 PM, Junio C Hamano  wrote:
> "brian m. carlson"  writes:
>
>> If we just want to add gpgsm support, that's fine, but we should be
>> transparent about that fact and try to avoid making an interface which
>> is at once too generic and not generic enough.

This patch is definitely designed around PGP and CMS, but the config
options were intended to leave room for supporting other tools in the
future. I think allowing a PEM type to be specified makes a lot of
sense for PGP and CMS, but in the future, we can add a
`signingtool..regex` option. Similarly, the GnuPG specific
command line options and output parsing can be moved into a helper in
the future.

My motivation with this series is not just to "add gpgsm support"
though. I've been working on some other CMS tooling that will be open
source eventually. My goal was to distribute this tool with a wrapper
that emulates the GnuPG interface.

To me, this series feels like a good stepping stone towards more
generalized support for other tooling.

> One thing that makes me somewhat worried is that "add gpgsm support"
> may mean "don't encourage people to use the same PGP like everybody
> else does" and instead promote fragmenting the world.

There are a lot of projects for which PGP doesn't make sense. For
example, many large organizations and government entities don't
operate based on a web of trust, but have established PKI based on
centralized trust. For organizations like this, adopting commit/tag
signing with CMS would be far easier than adopting PGP signing.

There's a chance that 10 different software projects will end up using
10 different signing tools, but I don't see that as a problem if those
tools are well suited to the projects. Developers are already
responsible for learning how to work with the software projects they
contribute to.


Re: What's cooking in git.git (Apr 2018, #02; Tue, 17)

2018-04-17 Thread Stefan Beller
Hi Junio,

> --
> [New Topics]

> * sb/object-store-replace (2018-04-12) 15 commits
...
>  The effort to pass the repository in-core structure throughout the
>  API continues.  This round deals with the code that implements the
>  refs/replace/ mechanism.
>
>  What's the doneness of this thing?  I didn't recall seeing any
>  response, especially ones that demonstrated the reviewer carefully
>  read and thought about the issues surrounding the code.  Not that I
>  spotted any problems in these patches myself, though.

Stolee and Brandon provided a "quick LGTM" type of review
https://public-inbox.org/git/20180409232536.gb102...@google.com/
https://public-inbox.org/git/9ddfee7e-025a-79c9-8d6b-700c65a14...@gmail.com/

I do not recall an in depth review, though Rene had some design guidance
in form of a patch, which is also the first commit of the series
https://public-inbox.org/git/38962a15-1081-bbdb-b4c4-6b46222b5...@web.de/

My plan was to build the next series on top this week while waiting for
further review, though I wonder how much review will happen this week.
(Brandon, Jonathan Tan and Jonathan Nieder are all OOO,
Peff is on vacation, too)

I do not recall any discussion worthy design discussions left over, so
I'd lean on "cook in next for a while".

>
> --
> [Cooking]
>
> * sb/blame-color (2018-04-17) 2 commits
>  - builtin/blame: highlight recently changed lines
>  - builtin/blame: dim uninteresting metadata lines
>
>  "git blame" learns to unhighlight uninteresting metadata from the
>  originating commit on lines that are the same as the previous one,
>  and also paint lines in different colors depending on the age of
>  the commit.
>
>  The code to handle interaction between the config and command line
>  option smelled fishy.  Reviews and discussions are welcomed (not
>  just to this topic but others too ;-).

I'll look at the replies in thread there.


> * sb/submodule-move-nested (2018-03-29) 6 commits
>  - submodule: fixup nested submodules after moving the submodule
>  - submodule-config: remove submodule_from_cache
>  - submodule-config: add repository argument to submodule_from_{name, path}
>  - submodule-config: allow submodule_free to handle arbitrary repositories
>  - grep: remove "repo" arg from non-supporting funcs
>  - submodule.h: drop declaration of connect_work_tree_and_git_dir
>
>  Moving a submodule that itself has submodule in it with "git mv"
>  forgot to make necessary adjustment to the nested sub-submodules;
>  now the codepath learned to recurse into the submodules.
>
>  What's the doneness of this thing?

I considered this done a long time ago,

"All 6 patches look good to me, thanks.
 Reviewed-by: Jonathan Tan "

https://public-inbox.org/git/20180328161727.af10f596dffc8e01205c4...@google.com/


Thanks,
Stefan


Re: [BUG] git p4 clone fails when p4 sizes does not return 'fileSize' key

2018-04-17 Thread Thandesha VK
Sounds good. How about an enhanced version of fix from both of us.
This will let us know that something is not right with the file but
will not bark

$ git diff
diff --git a/git-p4.py b/git-p4.py
index 7bb9cadc6..df901976f 100755
--- a/git-p4.py
+++ b/git-p4.py
@@ -2566,7 +2566,12 @@ class P4Sync(Command, P4UserMap):
 relPath = self.stripRepoPath(file['depotFile'], self.branchPrefixes)
 relPath = self.encodeWithUTF8(relPath)
 if verbose:
-size = int(self.stream_file['fileSize'])
+if 'fileSize' not in self.stream_file:
+   print "WARN: File size from perforce unknown. Please
verify by p4 sizes %s" %(file['depotFile'])
+   size = "-1"
+else:
+   size = self.stream_file['fileSize']
+size = int(size)
 sys.stdout.write('\r%s --> %s (%i MB)\n' %
(file['depotFile'], relPath, size/1024/1024))
 sys.stdout.flush()


On Tue, Apr 17, 2018 at 10:33 AM, Mazo, Andrey  wrote:
> Sure, I totally agree.
> Sorry, I just wasn't clear enough in my previous email.
> I meant that your patch suppresses "%s --> %s (%i MB)" line in case 
> "fileSize" is not available,
> while my patch suppresses just "(%i MB)" portion if the "fileSize" is not 
> known.
> In other words,
>  * if "fileSize" is known:
>  ** both yours and mine patches don't change existing behavior;
>  * if "fileSize" is not known:
>  ** your patch makes streamOneP4File() not print anything;
>  ** my patch makes streamOneP4File() print "%s --> %s".
>
> Hope, I'm clearer this time.
>
> Thank you,
> Andrey
>
> From: Thandesha VK 
>> *I* think keeping the filesize info is better with --verbose option as
>> that gives some clue about the file we are working on. What do you
>> think?
>> Script has similar checks of key existence at other places where it is
>> looking for fileSize.
>>
>> On Tue, Apr 17, 2018 at 9:22 AM, Andrey Mazo  wrote:
>>> Huh, I actually have a slightly different fix for the same issue.
>>> It doesn't suppress the corresponding verbose output completely, but just 
>>> removes the size information from it.
>>>
>>> Also, I'd mention that the workaround is trivial -- simply omit the 
>>> "--verbose" option.
>>>
>>> Andrey Mazo (1):
>>>   git-p4: fix `sync --verbose` traceback due to 'fileSize'
>>>
>>>  git-p4.py | 8 ++--
>>>  1 file changed, 6 insertions(+), 2 deletions(-)
>>>
>>>
>>> base-commit: 468165c1d8a442994a825f3684528361727cd8c0
>>> --
>>> 2.16.1
>>>
>>
>> --
>> Thanks & Regards
>> Thandesha VK | Cellphone +1 (703) 459-5386



-- 
Thanks & Regards
Thandesha VK | Cellphone +1 (703) 459-5386


Re: [PATCH v3 8/9] commit-graph: always load commit-graph information

2018-04-17 Thread Derrick Stolee

On 4/17/2018 1:00 PM, Derrick Stolee wrote:

Most code paths load commits using lookup_commit() and then
parse_commit(). In some cases, including some branch lookups, the commit
is parsed using parse_object_buffer() which side-steps parse_commit() in
favor of parse_commit_buffer().

With generation numbers in the commit-graph, we need to ensure that any
commit that exists in the commit-graph file has its generation number
loaded.

Create new load_commit_graph_info() method to fill in the information
for a commit that exists only in the commit-graph file. Call it from
parse_commit_buffer() after loading the other commit information from
the given buffer. Only fill this information when specified by the
'check_graph' parameter. This avoids duplicate work when we already
checked the graph in parse_commit_gently() or when simply checking the
buffer contents in check_commit().

Signed-off-by: Derrick Stolee 
---
  commit-graph.c | 51 --
  commit-graph.h |  8 
  commit.c   |  7 +--
  commit.h   |  2 +-
  object.c   |  2 +-
  sha1_file.c|  2 +-
  6 files changed, 49 insertions(+), 23 deletions(-)

diff --git a/commit-graph.c b/commit-graph.c
index 688d5b1801..21e853c21a 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -245,13 +245,19 @@ static struct commit_list **insert_parent_or_die(struct 
commit_graph *g,
return _list_insert(c, pptr)->next;
  }
  
+static void fill_commit_graph_info(struct commit *item, struct commit_graph *g, uint32_t pos)

+{
+   const unsigned char *commit_data = g->chunk_commit_data + 
GRAPH_DATA_WIDTH * pos;
+   item->generation = get_be32(commit_data + g->hash_len + 8) >> 2;
+}
+
  static int fill_commit_in_graph(struct commit *item, struct commit_graph *g, 
uint32_t pos)
  {
uint32_t edge_value;
uint32_t *parent_data_ptr;
uint64_t date_low, date_high;
struct commit_list **pptr;
-   const unsigned char *commit_data = g->chunk_commit_data + (g->hash_len 
+ 16) * pos;
+   const unsigned char *commit_data = g->chunk_commit_data + 
GRAPH_DATA_WIDTH * pos;
  
  	item->object.parsed = 1;

item->graph_pos = pos;
@@ -292,31 +298,40 @@ static int fill_commit_in_graph(struct commit *item, 
struct commit_graph *g, uin
return 1;
  }
  
+static int find_commit_in_graph(struct commit *item, struct commit_graph *g, uint32_t *pos)

+{
+   if (item->graph_pos != COMMIT_NOT_FROM_GRAPH) {
+   *pos = item->graph_pos;
+   return 1;
+   } else {
+   return bsearch_graph(commit_graph, &(item->object.oid), pos);


The reference to 'commit_graph' in the above line should be 'g'. Sorry!


+   }
+}
+
  int parse_commit_in_graph(struct commit *item)
  {
+   uint32_t pos;
+
+   if (item->object.parsed)
+   return 0;
if (!core_commit_graph)
return 0;
-   if (item->object.parsed)
-   return 1;
-
prepare_commit_graph();
-   if (commit_graph) {
-   uint32_t pos;
-   int found;
-   if (item->graph_pos != COMMIT_NOT_FROM_GRAPH) {
-   pos = item->graph_pos;
-   found = 1;
-   } else {
-   found = bsearch_graph(commit_graph, &(item->object.oid), 
);
-   }
-
-   if (found)
-   return fill_commit_in_graph(item, commit_graph, pos);
-   }
-
+   if (commit_graph && find_commit_in_graph(item, commit_graph, ))
+   return fill_commit_in_graph(item, commit_graph, pos);
return 0;
  }
  
+void load_commit_graph_info(struct commit *item)

+{
+   uint32_t pos;
+   if (!core_commit_graph)
+   return;
+   prepare_commit_graph();
+   if (commit_graph && find_commit_in_graph(item, commit_graph, ))
+   fill_commit_graph_info(item, commit_graph, pos);
+}
+
  static struct tree *load_tree_for_commit(struct commit_graph *g, struct 
commit *c)
  {
struct object_id oid;
diff --git a/commit-graph.h b/commit-graph.h
index 260a468e73..96cccb10f3 100644
--- a/commit-graph.h
+++ b/commit-graph.h
@@ -17,6 +17,14 @@ char *get_commit_graph_filename(const char *obj_dir);
   */
  int parse_commit_in_graph(struct commit *item);
  
+/*

+ * It is possible that we loaded commit contents from the commit buffer,
+ * but we also want to ensure the commit-graph content is correctly
+ * checked and filled. Fill the graph_pos and generation members of
+ * the given commit.
+ */
+void load_commit_graph_info(struct commit *item);
+
  struct tree *get_commit_tree_in_graph(const struct commit *c);
  
  struct commit_graph {

diff --git a/commit.c b/commit.c
index a70f120878..9ef6f699bd 100644
--- a/commit.c
+++ b/commit.c
@@ -331,7 +331,7 @@ const void *detach_commit_buffer(struct commit *commit, 
unsigned long *sizep)
return ret;
  }
  
-int 

Re: Optimizing writes to unchanged files during merges?

2018-04-17 Thread Jacob Keller
On Tue, Apr 17, 2018 at 10:27 AM, Lars Schneider
 wrote:
>
>> On 16 Apr 2018, at 19:45, Jacob Keller  wrote:
>>
>> On Mon, Apr 16, 2018 at 10:43 AM, Jacob Keller  
>> wrote:
>>> On Mon, Apr 16, 2018 at 9:07 AM, Lars Schneider
>>>  wrote:
 What if Git kept a LRU list that contains file path, content hash, and
 mtime of any file that is removed or modified during a checkout. If a
 file is checked out later with the exact same path and content hash,
 then Git could set the mtime to the previous value. This way the
 compiler would not think that the content has been changed since the
 last rebuild.
>>>
>>> That would only work until they actuall *did* a build on the second
>>> branch, and upon changing back, how would this detect that it needs to
>>> update mtime again? I don't think this solution really works.
>>> Ultimately, the problem is that the build tool relies on the mtime to
>>> determine what to rebuild. I think this would cause worse problems
>>> because we *wouldn't* rebuild in the case. How is git supposed to know
>>> that we rebuilt when switching branches or not?
>>>
>>> Thanks,
>>> Jake
>>
>> I think a better solution for your problem would be to extend the
>> build system you're using to avoid rebuilding when the contents
>> haven't changed since last build (possibly by using hashes?). At the
>> very least, I would not want this to be default, as it could possibly
>> result in *no* build when there should be one, which is far more
>> confusing to debug.
>
> I am 100% with you that this is a build system issue. But changing
> the build system for many teams in a large organization is really
> hard. That's why I wondered if Git could help with a shortcut.
> Looks like there is no shortcut (see my other reply in this thread).
>
> Thanks
> Lars

Right. I think that solutions involving hooks or scripts which "fix"
the mtimes are the best bet for this problem then, given that building
it into git would cause problems for other users. (And personally I
would always ere on the side of causing rebuilds unless we're 100%
sure)

Thanks,
Jake


Re: man page for "git remote set-url" seems confusing/contradictory

2018-04-17 Thread Jacob Keller
On Tue, Apr 17, 2018 at 8:34 AM, Robert P. J. Day  wrote:
> On Tue, 17 Apr 2018, Junio C Hamano wrote:
>
>> Jacob Keller  writes:
>>
>> > Things won't work so well if you set the push url and fetch url to
>> > different repositories. Git assumes that refs updated by "push"
>> > will also be reflected via "fetch".
>> >
>> > I don't know offhand what will break, but likely something will.
>> > For one, when you fetch again it will rewind your remotes after
>> > the push.
>>
>> Exactly.  I still haven't fully embraced it myself, but for a long
>> time, "git push" pretends as if it fetched from that remote and
>> updates the corresponding remote tracking branches (if you have
>> any), so you'll be inviting inconsistent behaviour if you set your
>> fetch and push URLs pointing at two logically separate places.
>
>   ... snip ...
>
>   oh, i totally buy all that now, i'm just suggesting that the man
> page might be tweaked to make that more obvious. in "man git-remote",
> under "set-url", remember that it reads:
>
> "Note that the push URL and the fetch URL, even though they can be set
> differently, must still refer to the same place."
>
>   i think it would be useful to be more specific about what "can be
> set differently" means, since a lot of readers might not immediately
> appreciate that it means just, say, the transport protocols. it never
> hurts to add that little bit of detail.
>
> rday
>
>

Maybe something like:

"Note that the push URL and the fetch URL, even though they can be set
differently, are expected to refer to the same repository. For
example, the fetch URL might use an unauthenticated transport, while
the push URL generally requires authentication" Then follow this bit
with the mention of multiple remotes.

(I think "repository" conveys the meaning better than "place".
Technically, the URLs can be completely different as long as they end
up contacting the same real server repository.)

Thanks,
Jake


Re: [BUG] git p4 clone fails when p4 sizes does not return 'fileSize' key

2018-04-17 Thread Mazo, Andrey
Sure, I totally agree.
Sorry, I just wasn't clear enough in my previous email.
I meant that your patch suppresses "%s --> %s (%i MB)" line in case "fileSize" 
is not available,
while my patch suppresses just "(%i MB)" portion if the "fileSize" is not known.
In other words,
 * if "fileSize" is known:
 ** both yours and mine patches don't change existing behavior;
 * if "fileSize" is not known:
 ** your patch makes streamOneP4File() not print anything;
 ** my patch makes streamOneP4File() print "%s --> %s".

Hope, I'm clearer this time.
 
Thank you,
Andrey

From: Thandesha VK 
> *I* think keeping the filesize info is better with --verbose option as
> that gives some clue about the file we are working on. What do you
> think?
> Script has similar checks of key existence at other places where it is
> looking for fileSize.
> 
> On Tue, Apr 17, 2018 at 9:22 AM, Andrey Mazo  wrote:
>> Huh, I actually have a slightly different fix for the same issue.
>> It doesn't suppress the corresponding verbose output completely, but just 
>> removes the size information from it.
>>
>> Also, I'd mention that the workaround is trivial -- simply omit the 
>> "--verbose" option.
>>
>> Andrey Mazo (1):
>>   git-p4: fix `sync --verbose` traceback due to 'fileSize'
>>
>>  git-p4.py | 8 ++--
>>  1 file changed, 6 insertions(+), 2 deletions(-)
>>
>>
>> base-commit: 468165c1d8a442994a825f3684528361727cd8c0
>> --
>> 2.16.1
>>
> 
> -- 
> Thanks & Regards
> Thandesha VK | Cellphone +1 (703) 459-5386

Re: Optimizing writes to unchanged files during merges?

2018-04-17 Thread Lars Schneider

> On 16 Apr 2018, at 19:45, Jacob Keller  wrote:
> 
> On Mon, Apr 16, 2018 at 10:43 AM, Jacob Keller  wrote:
>> On Mon, Apr 16, 2018 at 9:07 AM, Lars Schneider
>>  wrote:
>>> What if Git kept a LRU list that contains file path, content hash, and
>>> mtime of any file that is removed or modified during a checkout. If a
>>> file is checked out later with the exact same path and content hash,
>>> then Git could set the mtime to the previous value. This way the
>>> compiler would not think that the content has been changed since the
>>> last rebuild.
>> 
>> That would only work until they actuall *did* a build on the second
>> branch, and upon changing back, how would this detect that it needs to
>> update mtime again? I don't think this solution really works.
>> Ultimately, the problem is that the build tool relies on the mtime to
>> determine what to rebuild. I think this would cause worse problems
>> because we *wouldn't* rebuild in the case. How is git supposed to know
>> that we rebuilt when switching branches or not?
>> 
>> Thanks,
>> Jake
> 
> I think a better solution for your problem would be to extend the
> build system you're using to avoid rebuilding when the contents
> haven't changed since last build (possibly by using hashes?). At the
> very least, I would not want this to be default, as it could possibly
> result in *no* build when there should be one, which is far more
> confusing to debug.

I am 100% with you that this is a build system issue. But changing
the build system for many teams in a large organization is really
hard. That's why I wondered if Git could help with a shortcut.
Looks like there is no shortcut (see my other reply in this thread).

Thanks
Lars

Re: Optimizing writes to unchanged files during merges?

2018-04-17 Thread Lars Schneider

> On 16 Apr 2018, at 19:04, Ævar Arnfjörð Bjarmason  wrote:
> 
> 
> On Mon, Apr 16 2018, Lars Schneider wrote:
> 
>>> On 16 Apr 2018, at 04:03, Linus Torvalds  
>>> wrote:
>>> 
>>> On Sun, Apr 15, 2018 at 6:44 PM, Junio C Hamano  wrote:
 
 I think Elijah's corrected was_tracked() also does not care "has
 this been renamed".
>>> 
>>> I'm perfectly happy with the slightly smarter patches. My patch was
>>> really just an RFC and because I had tried it out.
>>> 
 One thing that makes me curious is what happens (and what we want to
 happen) when such a "we already have the changes the side branch
 tries to bring in" path has local (i.e. not yet in the index)
 changes.  For a dirty file that trivially merges (e.g. a path we
 modified since our histories forked, while the other side didn't do
 anything, has local changes in the working tree), we try hard to
 make the merge succeed while keeping the local changes, and we
 should be able to do the same in this case, too.
>>> 
>>> I think it might be nice, but probably not really worth it.
>>> 
>>> I find the "you can merge even if some files are dirty" to be really
>>> convenient, because I often keep stupid test patches in my tree that I
>>> may not even intend to commit, and I then use the same tree for
>>> merging.
>>> 
>>> For example, I sometimes end up editing the Makefile for the release
>>> version early, but I won't *commit* that until I actually cut the
>>> release. But if I pull some branch that has also changed the Makefile,
>>> it's not worth any complexity to try to be nice about the dirty state.
>>> 
>>> If it's a file that actually *has* been changed in the branch I'm
>>> merging, and I'm more than happy to just stage the patch (or throw it
>>> away - I think it's about 50:50 for me).
>>> 
>>> So I don't think it's a big deal, and I'd rather have the merge fail
>>> very early with "that file has seen changes in the branch you are
>>> merging" than add any real complexity to the merge logic.
>> 
>> I am happy to see this discussion and the patches, because long rebuilds
>> are a constant annoyance for us. We might have been bitten by the exact
>> case discussed here, but more often, we have a slightly different
>> situation:
>> 
>> An engineer works on a task branch and runs incremental builds — all
>> is good. The engineer switches to another branch to review another
>> engineer's work. This other branch changes a low-level header file,
>> but no rebuild is triggered. The engineer switches back to the previous
>> task branch. At this point, the incremental build will rebuild
>> everything, as the compiler thinks that the low-level header file has
>> been changed (because the mtime is different).
>> 
>> Of course, this problem can be solved with a separate worktree. However,
>> our engineers forget about that sometimes, and then, they are annoyed by
>> a 4h rebuild.
>> 
>> Is this situation a problem for others too?
>> If yes, what do you think about the following approach:
>> 
>> What if Git kept a LRU list that contains file path, content hash, and
>> mtime of any file that is removed or modified during a checkout. If a
>> file is checked out later with the exact same path and content hash,
>> then Git could set the mtime to the previous value. This way the
>> compiler would not think that the content has been changed since the
>> last rebuild.
>> 
>> I think that would fix the problem that our engineers run into and also
>> the problem that Linus experienced during the merge, wouldn't it?
> 
> Could what you're describing be prototyped as a post-checkout hook that
> looks at the reflog? It sounds to me like it could, but perhaps I've
> missed some subtlety.

Yeah, probably. You don't even need the reflog I think. I just wanted
to get a sense if other people run into this problem too.


> Not re-writing out a file that hasn't changed is one thing, but I think
> for more complex behaviors (such as the "I want everything to have the
> same mtime" mentioned in another thread on-list), and this, it makes
> sense to provide some hook mechanism than have git itself do all the
> work.
> 
> I also don't see how what you're describing could be generalized, or
> even be made to work reliably in the case you're describing. If the
> engineer runs "make" on this branch he's testing out that might produce
> an object file that'll get used as-is once he switches back, since
> you've set the mtime in the past for that file because you re-checked it
> out.

Ohh... you're right. I thought Visual Studio looks *just* at ctime/mtime 
of the files. But this seems not to be true [1]:
 
   "MSBuild to build it quickly checks if any of a project’s input files 
are modified later than any of the project’s outputs"

In that case my idea outlined above is garbage.

Thanks,
Lars


[1] 

Re: [BUG] git p4 clone fails when p4 sizes does not return 'fileSize' key

2018-04-17 Thread Thandesha VK
*I* think keeping the filesize info is better with --verbose option as
that gives some clue about the file we are working on. What do you
think?
Script has similar checks of key existence at other places where it is
looking for fileSize.

On Tue, Apr 17, 2018 at 9:22 AM, Andrey Mazo  wrote:
> Huh, I actually have a slightly different fix for the same issue.
> It doesn't suppress the corresponding verbose output completely, but just 
> removes the size information from it.
>
> Also, I'd mention that the workaround is trivial -- simply omit the 
> "--verbose" option.
>
> Andrey Mazo (1):
>   git-p4: fix `sync --verbose` traceback due to 'fileSize'
>
>  git-p4.py | 8 ++--
>  1 file changed, 6 insertions(+), 2 deletions(-)
>
>
> base-commit: 468165c1d8a442994a825f3684528361727cd8c0
> --
> 2.16.1
>



-- 
Thanks & Regards
Thandesha VK | Cellphone +1 (703) 459-5386


Re: [BUG] git p4 clone fails when p4 sizes does not return 'fileSize' key

2018-04-17 Thread Mazo, Andrey
Huh, I actually have a slightly different fix for the same issue.
It doesn't suppress the corresponding verbose output completely, but just 
removes the size information from it.
I'll (try to) post it as a reply to this email.

Also, I'd mention that the workaround is trivial -- simply omit the "--verbose" 
option.

Thank you,
Andrey

[PATCH v3 9/9] merge: check config before loading commits

2018-04-17 Thread Derrick Stolee
Now that we use generation numbers from the commit-graph, we must
ensure that all commits that exist in the commit-graph are loaded
from that file instead of from the object database. Since the
commit-graph file is only checked if core.commitGraph is true, we
must check the default config before we load any commits.

In the merge builtin, the config was checked after loading the HEAD
commit. This was due to the use of the global 'branch' when checking
merge-specific config settings.

Move the config load to be between the initialization of 'branch' and
the commit lookup.

Without this change, a fast-forward merge would hit a BUG("bad
generation skip") statement in commit.c during paint_down_to_common().
This is because the HEAD commit would be loaded with "infinite"
generation but then reached by commits with "finite" generation
numbers.

Add a test to t5318-commit-graph.sh that exercises this code path to
prevent a regression.

Signed-off-by: Derrick Stolee 
---
 builtin/merge.c | 5 +++--
 t/t5318-commit-graph.sh | 9 +
 2 files changed, 12 insertions(+), 2 deletions(-)

diff --git a/builtin/merge.c b/builtin/merge.c
index 5e5e4497e3..7e1da6c6ea 100644
--- a/builtin/merge.c
+++ b/builtin/merge.c
@@ -1148,13 +1148,14 @@ int cmd_merge(int argc, const char **argv, const char 
*prefix)
branch = branch_to_free = resolve_refdup("HEAD", 0, _oid, NULL);
if (branch)
skip_prefix(branch, "refs/heads/", );
+   init_diff_ui_defaults();
+   git_config(git_merge_config, NULL);
+
if (!branch || is_null_oid(_oid))
head_commit = NULL;
else
head_commit = lookup_commit_or_die(_oid, "HEAD");
 
-   init_diff_ui_defaults();
-   git_config(git_merge_config, NULL);
 
if (branch_mergeoptions)
parse_branch_merge_options(branch_mergeoptions);
diff --git a/t/t5318-commit-graph.sh b/t/t5318-commit-graph.sh
index a380419b65..77d85aefe7 100755
--- a/t/t5318-commit-graph.sh
+++ b/t/t5318-commit-graph.sh
@@ -221,4 +221,13 @@ test_expect_success 'write graph in bare repo' '
 graph_git_behavior 'bare repo with graph, commit 8 vs merge 1' bare commits/8 
merge/1
 graph_git_behavior 'bare repo with graph, commit 8 vs merge 2' bare commits/8 
merge/2
 
+test_expect_success 'perform fast-forward merge in full repo' '
+   cd "$TRASH_DIRECTORY/full" &&
+   git checkout -b merge-5-to-8 commits/5 &&
+   git merge commits/8 &&
+   git show-ref -s merge-5-to-8 >output &&
+   git show-ref -s commits/8 >expect &&
+   test_cmp expect output
+'
+
 test_done
-- 
2.17.0.39.g685157f7fb



[PATCH v3 5/9] ref-filter: use generation number for --contains

2018-04-17 Thread Derrick Stolee
A commit A can reach a commit B only if the generation number of A
is larger than the generation number of B. This condition allows
significantly short-circuiting commit-graph walks.

Use generation number for 'git tag --contains' queries.

On a copy of the Linux repository where HEAD is containd in v4.13
but no earlier tag, the command 'git tag --contains HEAD' had the
following peformance improvement:

Before: 0.81s
After:  0.04s
Rel %:  -95%

Helped-by: Jeff King 
Signed-off-by: Derrick Stolee 
---
 ref-filter.c | 23 +++
 1 file changed, 19 insertions(+), 4 deletions(-)

diff --git a/ref-filter.c b/ref-filter.c
index cffd8bf3ce..e2fea6d635 100644
--- a/ref-filter.c
+++ b/ref-filter.c
@@ -1587,7 +1587,8 @@ static int in_commit_list(const struct commit_list *want, 
struct commit *c)
  */
 static enum contains_result contains_test(struct commit *candidate,
  const struct commit_list *want,
- struct contains_cache *cache)
+ struct contains_cache *cache,
+ uint32_t cutoff)
 {
enum contains_result *cached = contains_cache_at(cache, candidate);
 
@@ -1603,6 +1604,10 @@ static enum contains_result contains_test(struct commit 
*candidate,
 
/* Otherwise, we don't know; prepare to recurse */
parse_commit_or_die(candidate);
+
+   if (candidate->generation < cutoff)
+   return CONTAINS_NO;
+
return CONTAINS_UNKNOWN;
 }
 
@@ -1618,8 +1623,18 @@ static enum contains_result contains_tag_algo(struct 
commit *candidate,
  struct contains_cache *cache)
 {
struct contains_stack contains_stack = { 0, 0, NULL };
-   enum contains_result result = contains_test(candidate, want, cache);
+   enum contains_result result;
+   uint32_t cutoff = GENERATION_NUMBER_INFINITY;
+   const struct commit_list *p;
+
+   for (p = want; p; p = p->next) {
+   struct commit *c = p->item;
+   parse_commit_or_die(c);
+   if (c->generation < cutoff)
+   cutoff = c->generation;
+   }
 
+   result = contains_test(candidate, want, cache, cutoff);
if (result != CONTAINS_UNKNOWN)
return result;
 
@@ -1637,7 +1652,7 @@ static enum contains_result contains_tag_algo(struct 
commit *candidate,
 * If we just popped the stack, parents->item has been marked,
 * therefore contains_test will return a meaningful yes/no.
 */
-   else switch (contains_test(parents->item, want, cache)) {
+   else switch (contains_test(parents->item, want, cache, cutoff)) 
{
case CONTAINS_YES:
*contains_cache_at(cache, commit) = CONTAINS_YES;
contains_stack.nr--;
@@ -1651,7 +1666,7 @@ static enum contains_result contains_tag_algo(struct 
commit *candidate,
}
}
free(contains_stack.contains_stack);
-   return contains_test(candidate, want, cache);
+   return contains_test(candidate, want, cache, cutoff);
 }
 
 static int commit_contains(struct ref_filter *filter, struct commit *commit,
-- 
2.17.0.39.g685157f7fb



[PATCH v3 4/9] commit-graph.txt: update design document

2018-04-17 Thread Derrick Stolee
We now calculate generation numbers in the commit-graph file and use
them in paint_down_to_common().

Expand the section on generation numbers to discuss how the three
special generation numbers GENERATION_NUMBER_INFINITY, _ZERO, and
_MAX interact with other generation numbers.

Signed-off-by: Derrick Stolee 
---
 Documentation/technical/commit-graph.txt | 30 +++-
 1 file changed, 24 insertions(+), 6 deletions(-)

diff --git a/Documentation/technical/commit-graph.txt 
b/Documentation/technical/commit-graph.txt
index 0550c6d0dc..d9f2713efa 100644
--- a/Documentation/technical/commit-graph.txt
+++ b/Documentation/technical/commit-graph.txt
@@ -77,6 +77,29 @@ in the commit graph. We can treat these commits as having 
"infinite"
 generation number and walk until reaching commits with known generation
 number.
 
+We use the macro GENERATION_NUMBER_INFINITY = 0x to mark commits not
+in the commit-graph file. If a commit-graph file was written by a version
+of Git that did not compute generation numbers, then those commits will
+have generation number represented by the macro GENERATION_NUMBER_ZERO = 0.
+
+Since the commit-graph file is closed under reachability, we can guarantee
+the following weaker condition on all commits:
+
+If A and B are commits with generation numbers N amd M, respectively,
+and N < M, then A cannot reach B.
+
+Note how the strict inequality differs from the inequality when we have
+fully-computed generation numbers. Using strict inequality may result in
+walking a few extra commits, but the simplicity in dealing with commits
+with generation number *_INFINITY or *_ZERO is valuable.
+
+We use the macro GENERATION_NUMBER_MAX = 0x3FFF to for commits whose
+generation numbers are computed to be at least this value. We limit at
+this value since it is the largest value that can be stored in the
+commit-graph file using the 30 bits available to generation numbers. This
+presents another case where a commit can have generation number equal to
+that of a parent.
+
 Design Details
 --
 
@@ -98,17 +121,12 @@ Future Work
 - The 'commit-graph' subcommand does not have a "verify" mode that is
   necessary for integration with fsck.
 
-- The file format includes room for precomputed generation numbers. These
-  are not currently computed, so all generation numbers will be marked as
-  0 (or "uncomputed"). A later patch will include this calculation.
-
 - After computing and storing generation numbers, we must make graph
   walks aware of generation numbers to gain the performance benefits they
   enable. This will mostly be accomplished by swapping a commit-date-ordered
   priority queue with one ordered by generation number. The following
-  operations are important candidates:
+  operation is an important candidate:
 
-- paint_down_to_common()
 - 'log --topo-order'
 
 - Currently, parse_commit_gently() requires filling in the root tree
-- 
2.17.0.39.g685157f7fb



[PATCH v3 7/9] commit: add short-circuit to paint_down_to_common()

2018-04-17 Thread Derrick Stolee
When running 'git branch --contains', the in_merge_bases_many()
method calls paint_down_to_common() to discover if a specific
commit is reachable from a set of branches. Commits with lower
generation number are not needed to correctly answer the
containment query of in_merge_bases_many().

Add a new parameter, min_generation, to paint_down_to_common() that
prevents walking commits with generation number strictly less than
min_generation. If 0 is given, then there is no functional change.

For in_merge_bases_many(), we can pass commit->generation as the
cutoff, and this saves time during 'git branch --contains' queries
that would otherwise walk "around" the commit we are inspecting.

For a copy of the Linux repository, where HEAD is checked out at
v4.13~100, we get the following performance improvement for
'git branch --contains' over the previous commit:

Before: 0.21s
After:  0.13s
Rel %: -38%

Signed-off-by: Derrick Stolee 
---
 commit.c | 18 ++
 1 file changed, 14 insertions(+), 4 deletions(-)

diff --git a/commit.c b/commit.c
index bceb79c419..a70f120878 100644
--- a/commit.c
+++ b/commit.c
@@ -805,11 +805,14 @@ static int queue_has_nonstale(struct prio_queue *queue)
 }
 
 /* all input commits in one and twos[] must have been parsed! */
-static struct commit_list *paint_down_to_common(struct commit *one, int n, 
struct commit **twos)
+static struct commit_list *paint_down_to_common(struct commit *one, int n,
+   struct commit **twos,
+   int min_generation)
 {
struct prio_queue queue = { compare_commits_by_gen_then_commit_date };
struct commit_list *result = NULL;
int i;
+   uint32_t last_gen = GENERATION_NUMBER_INFINITY;
 
one->object.flags |= PARENT1;
if (!n) {
@@ -828,6 +831,13 @@ static struct commit_list *paint_down_to_common(struct 
commit *one, int n, struc
struct commit_list *parents;
int flags;
 
+   if (commit->generation > last_gen)
+   BUG("bad generation skip");
+   last_gen = commit->generation;
+
+   if (commit->generation < min_generation)
+   break;
+
flags = commit->object.flags & (PARENT1 | PARENT2 | STALE);
if (flags == (PARENT1 | PARENT2)) {
if (!(commit->object.flags & RESULT)) {
@@ -876,7 +886,7 @@ static struct commit_list *merge_bases_many(struct commit 
*one, int n, struct co
return NULL;
}
 
-   list = paint_down_to_common(one, n, twos);
+   list = paint_down_to_common(one, n, twos, 0);
 
while (list) {
struct commit *commit = pop_commit();
@@ -943,7 +953,7 @@ static int remove_redundant(struct commit **array, int cnt)
filled_index[filled] = j;
work[filled++] = array[j];
}
-   common = paint_down_to_common(array[i], filled, work);
+   common = paint_down_to_common(array[i], filled, work, 0);
if (array[i]->object.flags & PARENT2)
redundant[i] = 1;
for (j = 0; j < filled; j++)
@@ -1067,7 +1077,7 @@ int in_merge_bases_many(struct commit *commit, int 
nr_reference, struct commit *
if (commit->generation > min_generation)
return 0;
 
-   bases = paint_down_to_common(commit, nr_reference, reference);
+   bases = paint_down_to_common(commit, nr_reference, reference, 
commit->generation);
if (commit->object.flags & PARENT2)
ret = 1;
clear_commit_marks(commit, all_flags);
-- 
2.17.0.39.g685157f7fb



[PATCH v3 2/9] commit-graph: compute generation numbers

2018-04-17 Thread Derrick Stolee
While preparing commits to be written into a commit-graph file, compute
the generation numbers using a depth-first strategy.

The only commits that are walked in this depth-first search are those
without a precomputed generation number. Thus, computation time will be
relative to the number of new commits to the commit-graph file.

If a computed generation number would exceed GENERATION_NUMBER_MAX, then
use GENERATION_NUMBER_MAX instead.

Signed-off-by: Derrick Stolee 
---
 commit-graph.c | 46 ++
 1 file changed, 46 insertions(+)

diff --git a/commit-graph.c b/commit-graph.c
index 9ad21c3ffb..688d5b1801 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -439,6 +439,10 @@ static void write_graph_chunk_data(struct hashfile *f, int 
hash_len,
else
packedDate[0] = 0;
 
+   if ((*list)->generation != GENERATION_NUMBER_INFINITY) {
+   packedDate[0] |= htonl((*list)->generation << 2);
+   }
+
packedDate[1] = htonl((*list)->date);
hashwrite(f, packedDate, 8);
 
@@ -571,6 +575,46 @@ static void close_reachable(struct packed_oid_list *oids)
}
 }
 
+static void compute_generation_numbers(struct commit** commits,
+  int nr_commits)
+{
+   int i;
+   struct commit_list *list = NULL;
+
+   for (i = 0; i < nr_commits; i++) {
+   if (commits[i]->generation != GENERATION_NUMBER_INFINITY &&
+   commits[i]->generation != GENERATION_NUMBER_ZERO)
+   continue;
+
+   commit_list_insert(commits[i], );
+   while (list) {
+   struct commit *current = list->item;
+   struct commit_list *parent;
+   int all_parents_computed = 1;
+   uint32_t max_generation = 0;
+
+   for (parent = current->parents; parent; parent = 
parent->next) {
+   if (parent->item->generation == 
GENERATION_NUMBER_INFINITY ||
+   parent->item->generation == 
GENERATION_NUMBER_ZERO) {
+   all_parents_computed = 0;
+   commit_list_insert(parent->item, );
+   break;
+   } else if (parent->item->generation > 
max_generation) {
+   max_generation = 
parent->item->generation;
+   }
+   }
+
+   if (all_parents_computed) {
+   current->generation = max_generation + 1;
+   pop_commit();
+   }
+
+   if (current->generation > GENERATION_NUMBER_MAX)
+   current->generation = GENERATION_NUMBER_MAX;
+   }
+   }
+}
+
 void write_commit_graph(const char *obj_dir,
const char **pack_indexes,
int nr_packs,
@@ -694,6 +738,8 @@ void write_commit_graph(const char *obj_dir,
if (commits.nr >= GRAPH_PARENT_MISSING)
die(_("too many commits to write graph"));
 
+   compute_generation_numbers(commits.list, commits.nr);
+
graph_name = get_commit_graph_filename(obj_dir);
fd = hold_lock_file_for_update(, graph_name, 0);
 
-- 
2.17.0.39.g685157f7fb



[PATCH v3 8/9] commit-graph: always load commit-graph information

2018-04-17 Thread Derrick Stolee
Most code paths load commits using lookup_commit() and then
parse_commit(). In some cases, including some branch lookups, the commit
is parsed using parse_object_buffer() which side-steps parse_commit() in
favor of parse_commit_buffer().

With generation numbers in the commit-graph, we need to ensure that any
commit that exists in the commit-graph file has its generation number
loaded.

Create new load_commit_graph_info() method to fill in the information
for a commit that exists only in the commit-graph file. Call it from
parse_commit_buffer() after loading the other commit information from
the given buffer. Only fill this information when specified by the
'check_graph' parameter. This avoids duplicate work when we already
checked the graph in parse_commit_gently() or when simply checking the
buffer contents in check_commit().

Signed-off-by: Derrick Stolee 
---
 commit-graph.c | 51 --
 commit-graph.h |  8 
 commit.c   |  7 +--
 commit.h   |  2 +-
 object.c   |  2 +-
 sha1_file.c|  2 +-
 6 files changed, 49 insertions(+), 23 deletions(-)

diff --git a/commit-graph.c b/commit-graph.c
index 688d5b1801..21e853c21a 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -245,13 +245,19 @@ static struct commit_list **insert_parent_or_die(struct 
commit_graph *g,
return _list_insert(c, pptr)->next;
 }
 
+static void fill_commit_graph_info(struct commit *item, struct commit_graph 
*g, uint32_t pos)
+{
+   const unsigned char *commit_data = g->chunk_commit_data + 
GRAPH_DATA_WIDTH * pos;
+   item->generation = get_be32(commit_data + g->hash_len + 8) >> 2;
+}
+
 static int fill_commit_in_graph(struct commit *item, struct commit_graph *g, 
uint32_t pos)
 {
uint32_t edge_value;
uint32_t *parent_data_ptr;
uint64_t date_low, date_high;
struct commit_list **pptr;
-   const unsigned char *commit_data = g->chunk_commit_data + (g->hash_len 
+ 16) * pos;
+   const unsigned char *commit_data = g->chunk_commit_data + 
GRAPH_DATA_WIDTH * pos;
 
item->object.parsed = 1;
item->graph_pos = pos;
@@ -292,31 +298,40 @@ static int fill_commit_in_graph(struct commit *item, 
struct commit_graph *g, uin
return 1;
 }
 
+static int find_commit_in_graph(struct commit *item, struct commit_graph *g, 
uint32_t *pos)
+{
+   if (item->graph_pos != COMMIT_NOT_FROM_GRAPH) {
+   *pos = item->graph_pos;
+   return 1;
+   } else {
+   return bsearch_graph(commit_graph, &(item->object.oid), pos);
+   }
+}
+
 int parse_commit_in_graph(struct commit *item)
 {
+   uint32_t pos;
+
+   if (item->object.parsed)
+   return 0;
if (!core_commit_graph)
return 0;
-   if (item->object.parsed)
-   return 1;
-
prepare_commit_graph();
-   if (commit_graph) {
-   uint32_t pos;
-   int found;
-   if (item->graph_pos != COMMIT_NOT_FROM_GRAPH) {
-   pos = item->graph_pos;
-   found = 1;
-   } else {
-   found = bsearch_graph(commit_graph, 
&(item->object.oid), );
-   }
-
-   if (found)
-   return fill_commit_in_graph(item, commit_graph, pos);
-   }
-
+   if (commit_graph && find_commit_in_graph(item, commit_graph, ))
+   return fill_commit_in_graph(item, commit_graph, pos);
return 0;
 }
 
+void load_commit_graph_info(struct commit *item)
+{
+   uint32_t pos;
+   if (!core_commit_graph)
+   return;
+   prepare_commit_graph();
+   if (commit_graph && find_commit_in_graph(item, commit_graph, ))
+   fill_commit_graph_info(item, commit_graph, pos);
+}
+
 static struct tree *load_tree_for_commit(struct commit_graph *g, struct commit 
*c)
 {
struct object_id oid;
diff --git a/commit-graph.h b/commit-graph.h
index 260a468e73..96cccb10f3 100644
--- a/commit-graph.h
+++ b/commit-graph.h
@@ -17,6 +17,14 @@ char *get_commit_graph_filename(const char *obj_dir);
  */
 int parse_commit_in_graph(struct commit *item);
 
+/*
+ * It is possible that we loaded commit contents from the commit buffer,
+ * but we also want to ensure the commit-graph content is correctly
+ * checked and filled. Fill the graph_pos and generation members of
+ * the given commit.
+ */
+void load_commit_graph_info(struct commit *item);
+
 struct tree *get_commit_tree_in_graph(const struct commit *c);
 
 struct commit_graph {
diff --git a/commit.c b/commit.c
index a70f120878..9ef6f699bd 100644
--- a/commit.c
+++ b/commit.c
@@ -331,7 +331,7 @@ const void *detach_commit_buffer(struct commit *commit, 
unsigned long *sizep)
return ret;
 }
 
-int parse_commit_buffer(struct commit *item, const void *buffer, unsigned long 
size)
+int parse_commit_buffer(struct commit *item, const void *buffer, 

[PATCH v3 1/9] commit: add generation number to struct commmit

2018-04-17 Thread Derrick Stolee
The generation number of a commit is defined recursively as follows:

* If a commit A has no parents, then the generation number of A is one.
* If a commit A has parents, then the generation number of A is one
  more than the maximum generation number among the parents of A.

Add a uint32_t generation field to struct commit so we can pass this
information to revision walks. We use three special values to signal
the generation number is invalid:

GENERATION_NUMBER_INFINITY 0x
GENERATION_NUMBER_MAX 0x3FFF
GENERATION_NUMBER_ZERO 0

The first (_INFINITY) means the generation number has not been loaded or
computed. The second (_MAX) means the generation number is too large to
store in the commit-graph file. The third (_ZERO) means the generation
number was loaded from a commit graph file that was written by a version
of git that did not support generation numbers.

Signed-off-by: Derrick Stolee 
---
 alloc.c| 1 +
 commit-graph.c | 2 ++
 commit.h   | 4 
 3 files changed, 7 insertions(+)

diff --git a/alloc.c b/alloc.c
index cf4f8b61e1..e8ab14f4a1 100644
--- a/alloc.c
+++ b/alloc.c
@@ -94,6 +94,7 @@ void *alloc_commit_node(void)
c->object.type = OBJ_COMMIT;
c->index = alloc_commit_index();
c->graph_pos = COMMIT_NOT_FROM_GRAPH;
+   c->generation = GENERATION_NUMBER_INFINITY;
return c;
 }
 
diff --git a/commit-graph.c b/commit-graph.c
index 70fa1b25fd..9ad21c3ffb 100644
--- a/commit-graph.c
+++ b/commit-graph.c
@@ -262,6 +262,8 @@ static int fill_commit_in_graph(struct commit *item, struct 
commit_graph *g, uin
date_low = get_be32(commit_data + g->hash_len + 12);
item->date = (timestamp_t)((date_high << 32) | date_low);
 
+   item->generation = get_be32(commit_data + g->hash_len + 8) >> 2;
+
pptr = >parents;
 
edge_value = get_be32(commit_data + g->hash_len);
diff --git a/commit.h b/commit.h
index 23a3f364ed..aac3b8c56f 100644
--- a/commit.h
+++ b/commit.h
@@ -10,6 +10,9 @@
 #include "pretty.h"
 
 #define COMMIT_NOT_FROM_GRAPH 0x
+#define GENERATION_NUMBER_INFINITY 0x
+#define GENERATION_NUMBER_MAX 0x3FFF
+#define GENERATION_NUMBER_ZERO 0
 
 struct commit_list {
struct commit *item;
@@ -30,6 +33,7 @@ struct commit {
 */
struct tree *maybe_tree;
uint32_t graph_pos;
+   uint32_t generation;
 };
 
 extern int save_commit_buffer;
-- 
2.17.0.39.g685157f7fb



[PATCH v3 6/9] commit: use generation numbers for in_merge_bases()

2018-04-17 Thread Derrick Stolee
The containment algorithm for 'git branch --contains' is different
from that for 'git tag --contains' in that it uses is_descendant_of()
instead of contains_tag_algo(). The expensive portion of the branch
algorithm is computing merge bases.

When a commit-graph file exists with generation numbers computed,
we can avoid this merge-base calculation when the target commit has
a larger generation number than the target commits.

Performance tests were run on a copy of the Linux repository where
HEAD is contained in v4.13 but no earlier tag. Also, all tags were
copied to branches and 'git branch --contains' was tested:

Before: 60.0s
After:   0.4s
Rel %: -99.3%

Reported-by: Jeff King 
Signed-off-by: Derrick Stolee 
---
 commit.c | 9 -
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/commit.c b/commit.c
index a44899c733..bceb79c419 100644
--- a/commit.c
+++ b/commit.c
@@ -1053,12 +1053,19 @@ int in_merge_bases_many(struct commit *commit, int 
nr_reference, struct commit *
 {
struct commit_list *bases;
int ret = 0, i;
+   uint32_t min_generation = GENERATION_NUMBER_INFINITY;
 
if (parse_commit(commit))
return ret;
-   for (i = 0; i < nr_reference; i++)
+   for (i = 0; i < nr_reference; i++) {
if (parse_commit(reference[i]))
return ret;
+   if (min_generation > reference[i]->generation)
+   min_generation = reference[i]->generation;
+   }
+
+   if (commit->generation > min_generation)
+   return 0;
 
bases = paint_down_to_common(commit, nr_reference, reference);
if (commit->object.flags & PARENT2)
-- 
2.17.0.39.g685157f7fb



[PATCH v3 0/9] Compute and consume generation numbers

2018-04-17 Thread Derrick Stolee
Thanks for all the help on v2. Here are a few changes between versions:

* Removed the constant-time check in queue_has_nonstale() due to the
  possibility of a performance hit and no evidence of a performance
  benefit in typical cases.

* Reordered the commits about loading commits from the commit-graph.
  This way it is easier to demonstrate the incorrect checks. On my
  machine, every commit compiles and the test suite passes, but patches
  6-8 have the bug that is fixed in patch 9 "merge: check config before
  loading commits".

* The interaction with parse_commit_in_graph() from parse_object() is
  replaced with a new 'check_graph' parameter in parse_commit_buffer().
  This allows us to fill in the graph_pos and generation values for
  commits that are parsed directly from a buffer. This keeps the existing
  behavior that a commit parsed this way should match its buffer.

* There was discussion about making GENERATION_NUMBER_MAX assignable by
  an environment variable so we could add tests that exercise the behavior
  of capping a generation at that value. Perhaps the code around this is
  simple enough that we do not need to add that complexity.

Thanks,
-Stolee

-- >8 --

This is the one of several "small" patches that follow the serialized
Git commit graph patch (ds/commit-graph) and lazy-loading trees
(ds/lazy-load-trees).

As described in Documentation/technical/commit-graph.txt, the generation
number of a commit is one more than the maximum generation number among
its parents (trivially, a commit with no parents has generation number
one). This section is expanded to describe the interaction with special
generation numbers GENERATION_NUMBER_INFINITY (commits not in the commit-graph
file) and *_ZERO (commits in a commit-graph file written before generation
numbers were implemented).

This series makes the computation of generation numbers part of the
commit-graph write process.

Finally, generation numbers are used to order commits in the priority
queue in paint_down_to_common(). This allows a short-circuit mechanism
to improve performance of `git branch --contains`.

Further, use generation numbers for 'git tag --contains), providing a
significant speedup (at least 95% for some cases).

A more substantial refactoring of revision.c is required before making
'git log --graph' use generation numbers effectively.

This patch series is build on ds/lazy-load-trees.

Derrick Stolee (9):
  commit: add generation number to struct commmit
  commit-graph: compute generation numbers
  commit: use generations in paint_down_to_common()
  commit-graph.txt: update design document
  ref-filter: use generation number for --contains
  commit: use generation numbers for in_merge_bases()
  commit: add short-circuit to paint_down_to_common()
  commit-graph: always load commit-graph information
  merge: check config before loading commits

 Documentation/technical/commit-graph.txt | 30 +--
 alloc.c  |  1 +
 builtin/merge.c  |  5 +-
 commit-graph.c   | 99 +++-
 commit-graph.h   |  8 ++
 commit.c | 54 +++--
 commit.h |  7 +-
 object.c |  2 +-
 ref-filter.c | 23 +-
 sha1_file.c  |  2 +-
 t/t5318-commit-graph.sh  |  9 +++
 11 files changed, 199 insertions(+), 41 deletions(-)


base-commit: 7b8a21dba1bce44d64bd86427d3d92437adc4707
-- 
2.17.0.39.g685157f7fb



[PATCH v3 3/9] commit: use generations in paint_down_to_common()

2018-04-17 Thread Derrick Stolee
Define compare_commits_by_gen_then_commit_date(), which uses generation
numbers as a primary comparison and commit date to break ties (or as a
comparison when both commits do not have computed generation numbers).

Since the commit-graph file is closed under reachability, we know that
all commits in the file have generation at most GENERATION_NUMBER_MAX
which is less than GENERATION_NUMBER_INFINITY.

This change does not affect the number of commits that are walked during
the execution of paint_down_to_common(), only the order that those
commits are inspected. In the case that commit dates violate topological
order (i.e. a parent is "newer" than a child), the previous code could
walk a commit twice: if a commit is reached with the PARENT1 bit, but
later is re-visited with the PARENT2 bit, then that PARENT2 bit must be
propagated to its parents. Using generation numbers avoids this extra
effort, even if it is somewhat rare.

Signed-off-by: Derrick Stolee 
---
 commit.c | 20 +++-
 commit.h |  1 +
 2 files changed, 20 insertions(+), 1 deletion(-)

diff --git a/commit.c b/commit.c
index 711f674c18..a44899c733 100644
--- a/commit.c
+++ b/commit.c
@@ -640,6 +640,24 @@ static int compare_commits_by_author_date(const void *a_, 
const void *b_,
return 0;
 }
 
+int compare_commits_by_gen_then_commit_date(const void *a_, const void *b_, 
void *unused)
+{
+   const struct commit *a = a_, *b = b_;
+
+   /* newer commits first */
+   if (a->generation < b->generation)
+   return 1;
+   else if (a->generation > b->generation)
+   return -1;
+
+   /* use date as a heuristic when generataions are equal */
+   if (a->date < b->date)
+   return 1;
+   else if (a->date > b->date)
+   return -1;
+   return 0;
+}
+
 int compare_commits_by_commit_date(const void *a_, const void *b_, void 
*unused)
 {
const struct commit *a = a_, *b = b_;
@@ -789,7 +807,7 @@ static int queue_has_nonstale(struct prio_queue *queue)
 /* all input commits in one and twos[] must have been parsed! */
 static struct commit_list *paint_down_to_common(struct commit *one, int n, 
struct commit **twos)
 {
-   struct prio_queue queue = { compare_commits_by_commit_date };
+   struct prio_queue queue = { compare_commits_by_gen_then_commit_date };
struct commit_list *result = NULL;
int i;
 
diff --git a/commit.h b/commit.h
index aac3b8c56f..64436ff44e 100644
--- a/commit.h
+++ b/commit.h
@@ -341,6 +341,7 @@ extern int remove_signature(struct strbuf *buf);
 extern int check_commit_signature(const struct commit *commit, struct 
signature_check *sigc);
 
 int compare_commits_by_commit_date(const void *a_, const void *b_, void 
*unused);
+int compare_commits_by_gen_then_commit_date(const void *a_, const void *b_, 
void *unused);
 
 LAST_ARG_MUST_BE_NULL
 extern int run_commit_hook(int editor_is_used, const char *index_file, const 
char *name, ...);
-- 
2.17.0.39.g685157f7fb



Re: [PATCH/RFC 0/5] Keep all info in command-list.txt in git binary

2018-04-17 Thread Duy Nguyen
On Tue, Apr 17, 2018 at 06:24:41PM +0200, Duy Nguyen wrote:
> On Sun, Apr 15, 2018 at 11:21 PM, Philip Oakley  wrote:
> > From: "Duy Nguyen"  : Saturday, April 14, 2018 4:44 PM
> >
> >> On Thu, Apr 12, 2018 at 12:06 AM, Philip Oakley 
> >> wrote:
> >>>
> >>> I'm only just catching up, but does/can this series also capture the
> >>> non-command guides that are available in git so that the 'git help -g'
> >>> can
> >>> begin to list them all?
> >>
> >>
> >> It currently does not. But I don't see why it should not. This should
> >> allow git.txt to list all the guides too, for people who skip "git
> >> help" and go hard core mode with "man git". Thanks for bringing this
> >> up.
> >> --
> >> Duy
> >>
> > Is that something I should add to my todo to add a 'guide' category etc.?
> 
> I added it too [1]. Not sure if you want anything more on top though.

The "anything more" that at least I had in mind was something like
this. Though I'm not sure if it's a good thing to replace a hand
crafted section with an automatedly generated one. This patch on top
combines the "SEE ALSO" and "FURTHER DOCUMENT" into one with most of
documents/guides are extracted from command-list.txt

-- 8< --
diff --git a/Documentation/Makefile b/Documentation/Makefile
index 6232143cb9..3e0ecd2e11 100644
--- a/Documentation/Makefile
+++ b/Documentation/Makefile
@@ -292,6 +292,7 @@ doc.dep : $(docdep_prereqs) $(wildcard *.txt) 
build-docdep.perl
 
 cmds_txt = cmds-ancillaryinterrogators.txt \
cmds-ancillarymanipulators.txt \
+   cmds-guide.txt \
cmds-mainporcelain.txt \
cmds-plumbinginterrogators.txt \
cmds-plumbingmanipulators.txt \
diff --git a/Documentation/cmd-list.perl b/Documentation/cmd-list.perl
index 5aa73cfe45..e158bd9b96 100755
--- a/Documentation/cmd-list.perl
+++ b/Documentation/cmd-list.perl
@@ -54,6 +54,7 @@ for (sort <>) {
 
 for my $cat (qw(ancillaryinterrogators
ancillarymanipulators
+   guide
mainporcelain
plumbinginterrogators
plumbingmanipulators
diff --git a/Documentation/git.txt b/Documentation/git.txt
index 4767860e72..d60d2ae0c7 100644
--- a/Documentation/git.txt
+++ b/Documentation/git.txt
@@ -808,29 +808,6 @@ The index is also capable of storing multiple entries 
(called "stages")
 for a given pathname.  These stages are used to hold the various
 unmerged version of a file when a merge is in progress.
 
-FURTHER DOCUMENTATION
--
-
-See the references in the "description" section to get started
-using Git.  The following is probably more detail than necessary
-for a first-time user.
-
-The link:user-manual.html#git-concepts[Git concepts chapter of the
-user-manual] and linkgit:gitcore-tutorial[7] both provide
-introductions to the underlying Git architecture.
-
-See linkgit:gitworkflows[7] for an overview of recommended workflows.
-
-See also the link:howto-index.html[howto] documents for some useful
-examples.
-
-The internals are documented in the
-link:technical/api-index.html[Git API documentation].
-
-Users migrating from CVS may also want to
-read linkgit:gitcvs-migration[7].
-
-
 Authors
 ---
 Git was started by Linus Torvalds, and is currently maintained by Junio
@@ -854,11 +831,16 @@ the Git Security mailing list 
.
 
 SEE ALSO
 
-linkgit:gittutorial[7], linkgit:gittutorial-2[7],
-linkgit:giteveryday[7], linkgit:gitcvs-migration[7],
-linkgit:gitglossary[7], linkgit:gitcore-tutorial[7],
-linkgit:gitcli[7], link:user-manual.html[The Git User's Manual],
-linkgit:gitworkflows[7]
+
+See the references in the "description" section to get started
+using Git.  The following is probably more detail than necessary
+for a first-time user.
+
+include::cmds-guide.txt[]
+
+See also the link:howto-index.html[howto] documents for some useful
+examples. The internals are documented in the
+link:technical/api-index.html[Git API documentation].
 
 GIT
 ---
diff --git a/command-list.txt b/command-list.txt
index 1835f1a928..f26b8acd52 100644
--- a/command-list.txt
+++ b/command-list.txt
@@ -150,10 +150,14 @@ git-whatchanged 
ancillaryinterrogators
 git-worktreemainporcelain
 git-write-tree  plumbingmanipulators
 gitattributes   guide
+gitcvs-migrationguide
+gitcli  guide
+gitcore-tutorialguide
 giteveryday guide
 gitglossary guide
 gitignore   guide
 gitmodules  guide
 gitrevisionsguide
 gittutorial guide
+gittutorial-2   guide
 gitworkflowsguide
-- 8< --


RE: [PATCH v4 0/3] Extract memory pool logic into reusable component

2018-04-17 Thread Jameson Miller
I think this version (V4) should reflect the latest round of feedback. Please 
let me know if there are any other questions or outstanding work to finish here.

I have submitted a patch series to have a second component use this memory pool 
[1].

Thank you

[1] https://public-inbox.org/git/20180417163400.3875-2-jam...@microsoft.com/

-Original Message-
From: Jameson Miller 
Sent: Wednesday, April 11, 2018 2:38 PM
To: git@vger.kernel.org
Cc: gits...@pobox.com; p...@peff.net; sunsh...@sunshineco.com; 
ram...@ramsayjones.plus.com; Jameson Miller 
Subject: [PATCH v4 0/3] Extract memory pool logic into reusable component

Thank you everyone for taking the time review and provide feedback on this 
patch series.

Changes from v3:

  - Based patch off of new commit, to resolve merge conflict.

  - Updated log message in 2/3 based on feedback.

  - Squashed patch from Ramsay Jones into 2/3 to fix warning from
sparse.

  - Updated variable names in 2/3 to reflect updated usage of
variable.

Jameson Miller (3):
  fast-import: rename mem_pool type to mp_block
  fast-import: introduce mem_pool type
  Move reusable parts of memory pool into its own file

 Makefile  |  1 +
 fast-import.c | 77 +--
 mem-pool.c| 55 ++
 mem-pool.h| 34 ++
 4 files changed, 106 insertions(+), 61 deletions(-)  create mode 100644 
mem-pool.c  create mode 100644 mem-pool.h

--
2.14.3



[PATCH v1 4/5] Allocate cache entries from memory pools

2018-04-17 Thread Jameson Miller
Improve performance of reading a large index by reducing the number of
malloc() calls. When reading an index with a large number of entries,
a portion of the time is dominated in malloc() calls. This can be
mitigated by allocating a single large block of memory up front into a
memory pool and have git hand out chunks of time.

This change moves the cache entry allocation to be on top of memory
pools.

Design:

The index_state struct will gain a notion of an associated memory_pool
from which cache_entry structs will be allocated from. When reading in
the index from disk, we have information on the number of entries and
their size, which can guide us in deciding how large our initial
memory allocation should be. When an index is discarded, the
associated memory_pool and cache entries from the memory pool will be
discarded as well. This means the lifetime of cache_entry structs are
tied to the lifetime of the index_state they were allocated for.

In the case of a Split Index, the following rules are followed. 1st,
some terminology is defined:

Terminology:
  - 'the_index': represents the logical view of the index

  - 'split_index': represents the "base" cache entries. Read from the
split index file.

'the_index' can reference a single split_index, as well as
cache_entries from the split_index. `the_index` will be discarded
before the `split_index` is.  This means that when we are allocating
cache_entries in the presence of a split index, we need to allocate
the entries from the `split_index`'s memory pool. This allows us to
follow the pattern that `the_index` can reference cache_entries from
the `split_index`, and that the cache_entries will not be freed while
they are still being referenced.
---
 cache.h   |  2 ++
 read-cache.c  | 95 +--
 split-index.c | 23 +--
 3 files changed, 95 insertions(+), 25 deletions(-)

diff --git a/cache.h b/cache.h
index eedf154815..7c0d2343c3 100644
--- a/cache.h
+++ b/cache.h
@@ -15,6 +15,7 @@
 #include "path.h"
 #include "sha1-array.h"
 #include "repository.h"
+#include "mem-pool.h"
 
 #include 
 typedef struct git_zstream {
@@ -328,6 +329,7 @@ struct index_state {
struct untracked_cache *untracked;
uint64_t fsmonitor_last_update;
struct ewah_bitmap *fsmonitor_dirty;
+   struct mem_pool *ce_mem_pool;
 };
 
 extern struct index_state the_index;
diff --git a/read-cache.c b/read-cache.c
index 04fa7e1bd0..67438bf375 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -46,6 +46,42 @@
 CE_ENTRY_ADDED | CE_ENTRY_REMOVED | CE_ENTRY_CHANGED | \
 SPLIT_INDEX_ORDERED | UNTRACKED_CHANGED | FSMONITOR_CHANGED)
 
+
+/*
+ * This is an estimate of the average pathname length in the index.  We use
+ * this for V4 index files to guess the un-deltafied size of the index in
+ * memory because of pathname deltafication.  This is not required for V2/V3
+ * index formats because their pathnames are not compressed.  If the initial
+ * amount of memory set aside is not sufficient, the mem pool will allocate
+ * extra memory.
+ */
+#define CACHE_ENTRY_AVG_PATH_LENGTH_ESTIMATE 80
+
+static inline struct cache_entry *mem_pool__ce_alloc(struct mem_pool 
*mem_pool, size_t len)
+{
+   return mem_pool_alloc(mem_pool, cache_entry_size(len));
+}
+
+static inline struct cache_entry *mem_pool__ce_calloc(struct mem_pool 
*mem_pool, size_t len)
+{
+   return mem_pool_calloc(mem_pool, 1, cache_entry_size(len));
+}
+
+static struct mem_pool *find_mem_pool(struct index_state *istate)
+{
+   struct mem_pool **pool_ptr;
+
+   if (istate->split_index && istate->split_index->base)
+   pool_ptr = >split_index->base->ce_mem_pool;
+   else
+   pool_ptr = >ce_mem_pool;
+
+   if (!*pool_ptr)
+   mem_pool_init(pool_ptr, 0);
+
+   return *pool_ptr;
+}
+
 struct index_state the_index;
 static const char *alternate_index_output;
 
@@ -746,7 +782,7 @@ int add_file_to_index(struct index_state *istate, const 
char *path, int flags)
 
 struct cache_entry *make_empty_index_cache_entry(struct index_state *istate, 
size_t len)
 {
-   return xcalloc(1, cache_entry_size(len));
+   return mem_pool__ce_calloc(find_mem_pool(istate), len);
 }
 
 struct cache_entry *make_empty_transient_cache_entry(size_t len)
@@ -1641,13 +1677,13 @@ int read_index(struct index_state *istate)
return read_index_from(istate, get_index_file(), get_git_dir());
 }
 
-static struct cache_entry *cache_entry_from_ondisk(struct index_state *istate,
+static struct cache_entry *cache_entry_from_ondisk(struct mem_pool *mem_pool,
   struct ondisk_cache_entry 
*ondisk,
   unsigned int flags,
   const char *name,
   size_t len)
 {
-   struct cache_entry *ce = 

[PATCH v1 5/5] Add optional memory validations around cache_entry lifecyle

2018-04-17 Thread Jameson Miller
Add an option (controlled by an environment variable) perform extra
validations on mem_pool allocated cache entries. When set:

  1) Invalidate cache_entry memory when discarding cache_entry.

  2) When discarding index_state struct, verify that all cache_entries
 were allocated from expected mem_pool.

  3) When discarding mem_pools, invalidate mem_pool memory.

This should provide extra checks that mem_pools and their allocated
cache_entries are being used as expected.
---
 cache.h  |  7 +++
 git.c|  3 +++
 mem-pool.c   | 35 ++-
 mem-pool.h   |  2 ++
 read-cache.c | 44 
 5 files changed, 90 insertions(+), 1 deletion(-)

diff --git a/cache.h b/cache.h
index 7c0d2343c3..f8934d8113 100644
--- a/cache.h
+++ b/cache.h
@@ -369,6 +369,13 @@ void index_cache_entry_discard(struct cache_entry *ce);
  */
 void transient_cache_entry_discard(struct cache_entry *ce);
 
+/*
+ * Validate the cache entries in the index.  This is an internal
+ * consistency check that the cache_entry structs are allocated from
+ * the expected memory pool.
+ */
+void validate_cache_entries(const struct index_state *istate);
+
 #ifndef NO_THE_INDEX_COMPATIBILITY_MACROS
 #define active_cache (the_index.cache)
 #define active_nr (the_index.cache_nr)
diff --git a/git.c b/git.c
index 3a89893712..16b6c1685b 100644
--- a/git.c
+++ b/git.c
@@ -347,7 +347,10 @@ static int run_builtin(struct cmd_struct *p, int argc, 
const char **argv)
 
trace_argv_printf(argv, "trace: built-in: git");
 
+   validate_cache_entries(_index);
status = p->fn(argc, argv, prefix);
+   validate_cache_entries(_index);
+
if (status)
return status;
 
diff --git a/mem-pool.c b/mem-pool.c
index 09fb78d093..a7e28934b0 100644
--- a/mem-pool.c
+++ b/mem-pool.c
@@ -60,20 +60,44 @@ void mem_pool_discard(struct mem_pool *mem_pool)
 {
int i;
struct mp_block *block, *block_to_free;
+   int invalidate_memory = should_validate_cache_entries();
+
for (block = mem_pool->mp_block; block;) {
block_to_free = block;
block = block->next_block;
+
+   if (invalidate_memory)
+   memset(block_to_free->space, 0xDD, ((char 
*)block_to_free->end) - ((char *)block_to_free->space));
+
free(block_to_free);
}
 
-   for (i = 0; i < mem_pool->nr; i++)
+   for (i = 0; i < mem_pool->nr; i++) {
+   if (invalidate_memory)
+   memset(mem_pool->custom[i], 0xDD, ((char 
*)mem_pool->custom_end[i]) - ((char *)mem_pool->custom[i]));
+
free(mem_pool->custom[i]);
+   }
 
free(mem_pool->custom);
free(mem_pool->custom_end);
free(mem_pool);
 }
 
+int should_validate_cache_entries(void)
+{
+   static int validate_index_cache_entries = -1;
+
+   if (validate_index_cache_entries < 0) {
+   if (getenv("GIT_TEST_VALIDATE_INDEX_CACHE_ENTRIES"))
+   validate_index_cache_entries = 1;
+   else
+   validate_index_cache_entries = 0;
+   }
+
+   return validate_index_cache_entries;
+}
+
 void *mem_pool_alloc(struct mem_pool *mem_pool, size_t len)
 {
struct mp_block *p;
@@ -110,11 +134,20 @@ void *mem_pool_calloc(struct mem_pool *mem_pool, size_t 
count, size_t size)
 int mem_pool_contains(struct mem_pool *mem_pool, void *mem)
 {
struct mp_block *p;
+   int i;
+
+   /* Check if memory is allocated in a block */
for (p = mem_pool->mp_block; p; p = p->next_block)
if ((mem >= ((void *)p->space)) &&
(mem < ((void *)p->end)))
return 1;
 
+   /* Check custom memory allocations */
+   for (i = 0; i < mem_pool->nr; i++)
+   if (mem >= mem_pool->custom[i] &&
+   mem < mem_pool->custom_end[i])
+   return 1;
+
return 0;
 }
 
diff --git a/mem-pool.h b/mem-pool.h
index 34df4fa709..b1f9a920ba 100644
--- a/mem-pool.h
+++ b/mem-pool.h
@@ -63,4 +63,6 @@ void mem_pool_combine(struct mem_pool *dst, struct mem_pool 
*src);
  */
 int mem_pool_contains(struct mem_pool *mem_pool, void *mem);
 
+int should_validate_cache_entries(void);
+
 #endif
diff --git a/read-cache.c b/read-cache.c
index 67438bf375..d2181a0334 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -1290,6 +1290,7 @@ int add_index_entry(struct index_state *istate, struct 
cache_entry *ce, int opti
   istate->cache_nr - pos - 1);
set_index_entry(istate, pos, ce);
istate->cache_changed |= CE_ENTRY_ADDED;
+
return 0;
 }
 
@@ -2013,6 +2014,8 @@ int is_index_unborn(struct index_state *istate)
 
 int discard_index(struct index_state *istate)
 {
+   validate_cache_entries(istate);
+
resolve_undo_clear_index(istate);
istate->cache_nr = 0;

[PATCH v1 3/5] mem-pool: fill out functionality

2018-04-17 Thread Jameson Miller
Adds the following functionality to memory pools:

 - Lifecycle management functions (init, discard)

 - Test whether a memory location is part of the managed pool

 - Function to combine 2 pools

This also adds logic to track all memory allocations made by a memory
pool.

These functions will be used in a future commit in this commit series.

Signed-off-by: Jameson Miller 
---
 mem-pool.c | 103 ++---
 mem-pool.h |  32 +++
 2 files changed, 131 insertions(+), 4 deletions(-)

diff --git a/mem-pool.c b/mem-pool.c
index 389d7af447..09fb78d093 100644
--- a/mem-pool.c
+++ b/mem-pool.c
@@ -5,6 +5,8 @@
 #include "cache.h"
 #include "mem-pool.h"
 
+#define BLOCK_GROWTH_SIZE 1024*1024 - sizeof(struct mp_block);
+
 static struct mp_block *mem_pool_alloc_block(struct mem_pool *mem_pool, size_t 
block_alloc)
 {
struct mp_block *p;
@@ -19,6 +21,59 @@ static struct mp_block *mem_pool_alloc_block(struct mem_pool 
*mem_pool, size_t b
return p;
 }
 
+static void *mem_pool_alloc_custom(struct mem_pool *mem_pool, size_t 
block_alloc)
+{
+   char *p;
+   ALLOC_GROW(mem_pool->custom, mem_pool->nr + 1, mem_pool->alloc);
+   ALLOC_GROW(mem_pool->custom_end, mem_pool->nr_end + 1, 
mem_pool->alloc_end);
+
+   p = xmalloc(block_alloc);
+   mem_pool->custom[mem_pool->nr++] = p;
+   mem_pool->custom_end[mem_pool->nr_end++] = p + block_alloc;
+
+   mem_pool->pool_alloc += block_alloc;
+
+   return mem_pool->custom[mem_pool->nr - 1];
+}
+
+void mem_pool_init(struct mem_pool **mem_pool, size_t initial_size)
+{
+   if (!(*mem_pool))
+   {
+   *mem_pool = xmalloc(sizeof(struct mem_pool));
+   (*mem_pool)->pool_alloc = 0;
+   (*mem_pool)->mp_block = NULL;
+   (*mem_pool)->block_alloc = BLOCK_GROWTH_SIZE;
+   (*mem_pool)->custom = NULL;
+   (*mem_pool)->nr = 0;
+   (*mem_pool)->alloc = 0;
+   (*mem_pool)->custom_end = NULL;
+   (*mem_pool)->nr_end = 0;
+   (*mem_pool)->alloc_end = 0;
+
+   if (initial_size > 0)
+   mem_pool_alloc_block(*mem_pool, initial_size);
+   }
+}
+
+void mem_pool_discard(struct mem_pool *mem_pool)
+{
+   int i;
+   struct mp_block *block, *block_to_free;
+   for (block = mem_pool->mp_block; block;) {
+   block_to_free = block;
+   block = block->next_block;
+   free(block_to_free);
+   }
+
+   for (i = 0; i < mem_pool->nr; i++)
+   free(mem_pool->custom[i]);
+
+   free(mem_pool->custom);
+   free(mem_pool->custom_end);
+   free(mem_pool);
+}
+
 void *mem_pool_alloc(struct mem_pool *mem_pool, size_t len)
 {
struct mp_block *p;
@@ -33,10 +88,8 @@ void *mem_pool_alloc(struct mem_pool *mem_pool, size_t len)
break;
 
if (!p) {
-   if (len >= (mem_pool->block_alloc / 2)) {
-   mem_pool->pool_alloc += len;
-   return xmalloc(len);
-   }
+   if (len >= (mem_pool->block_alloc / 2))
+   return mem_pool_alloc_custom(mem_pool, len);
 
p = mem_pool_alloc_block(mem_pool, mem_pool->block_alloc);
}
@@ -53,3 +106,45 @@ void *mem_pool_calloc(struct mem_pool *mem_pool, size_t 
count, size_t size)
memset(r, 0, len);
return r;
 }
+
+int mem_pool_contains(struct mem_pool *mem_pool, void *mem)
+{
+   struct mp_block *p;
+   for (p = mem_pool->mp_block; p; p = p->next_block)
+   if ((mem >= ((void *)p->space)) &&
+   (mem < ((void *)p->end)))
+   return 1;
+
+   return 0;
+}
+
+void mem_pool_combine(struct mem_pool *dst, struct mem_pool *src)
+{
+   int i;
+   struct mp_block **tail = >mp_block;
+
+   /* Find pointer of dst's last block (if any) */
+   while (*tail)
+   tail = &(*tail)->next_block;
+
+   /* Append the blocks from src to dst */
+   *tail = src->mp_block;
+
+   ALLOC_GROW(dst->custom, dst->nr + src->nr, dst->alloc);
+   ALLOC_GROW(dst->custom_end, dst->nr_end + src->nr_end, dst->alloc_end);
+
+   for (i = 0; i < src->nr; i++) {
+   dst->custom[dst->nr++] = src->custom[i];
+   dst->custom_end[dst->nr_end++] = src->custom_end[i];
+   }
+
+   dst->pool_alloc += src->pool_alloc;
+   src->pool_alloc = 0;
+   src->mp_block = NULL;
+   src->custom = NULL;
+   src->nr = 0;
+   src->alloc = 0;
+   src->custom_end = NULL;
+   src->nr_end = 0;
+   src->alloc_end = 0;
+}
diff --git a/mem-pool.h b/mem-pool.h
index 829ad58ecf..34df4fa709 100644
--- a/mem-pool.h
+++ b/mem-pool.h
@@ -19,8 +19,27 @@ struct mem_pool {
 
/* The total amount of memory allocated by the pool. */
size_t pool_alloc;
+
+   /*

[PATCH v1 2/5] Add an API creating / discarding cache_entry structs

2018-04-17 Thread Jameson Miller
Add an API around managing the lifetime of cache_entry structs. Abstracting
memory management details behind an API will allow for alternative memory
management strategies without affecting all the call sites.  This commit does
not change how memory is allocated / freed. A later commit in this series will
allocate cache entries from memory pools as appropriate.

Motivation:
It has been observed that the time spent loading an index with a large
number of entries is partly dominated by malloc() calls. This
change is in preparation for using memory pools to reduce the number
of malloc() calls made when loading an index.

This API makes a distinction between cache entries that are intended for use
with a particular to an index and cache entries that are not. This enables us
to use the knowledge about how a cache entry will be used to make informed
decisions about how to handle the corresponding memory.
---
 apply.c|  26 ++--
 blame.c|   5 +--
 builtin/checkout.c |   8 ++--
 builtin/difftool.c |   8 ++--
 builtin/reset.c|   6 +--
 builtin/update-index.c |  26 ++--
 cache.h|  29 +-
 merge-recursive.c  |   2 +-
 read-cache.c   | 105 +++--
 resolve-undo.c |   6 ++-
 split-index.c  |   8 ++--
 tree.c |   4 +-
 unpack-trees.c |  27 -
 13 files changed, 166 insertions(+), 94 deletions(-)

diff --git a/apply.c b/apply.c
index 7e5792c996..47903f427b 100644
--- a/apply.c
+++ b/apply.c
@@ -4090,12 +4090,12 @@ static int build_fake_ancestor(struct apply_state 
*state, struct patch *list)
return error(_("sha1 information is lacking or useless "
   "(%s)."), name);
 
-   ce = make_cache_entry(patch->old_mode, oid.hash, name, 0, 0);
+   ce = make_index_cache_entry(, patch->old_mode, oid.hash, 
name, 0, 0);
if (!ce)
-   return error(_("make_cache_entry failed for path '%s'"),
+   return error(_("make_index_cache_entry failed for path 
'%s'"),
 name);
if (add_index_entry(, ce, ADD_CACHE_OK_TO_ADD)) {
-   free(ce);
+   index_cache_entry_discard(ce);
return error(_("could not add %s to temporary index"),
 name);
}
@@ -4263,12 +4263,11 @@ static int add_index_file(struct apply_state *state,
struct stat st;
struct cache_entry *ce;
int namelen = strlen(path);
-   unsigned ce_size = cache_entry_size(namelen);
 
if (!state->update_index)
return 0;
 
-   ce = xcalloc(1, ce_size);
+   ce = make_empty_index_cache_entry(_index, namelen);
memcpy(ce->name, path, namelen);
ce->ce_mode = create_ce_mode(mode);
ce->ce_flags = create_ce_flags(0);
@@ -4278,13 +4277,13 @@ static int add_index_file(struct apply_state *state,
 
if (!skip_prefix(buf, "Subproject commit ", ) ||
get_oid_hex(s, >oid)) {
-   free(ce);
-  return error(_("corrupt patch for submodule %s"), path);
+   index_cache_entry_discard(ce);
+   return error(_("corrupt patch for submodule %s"), path);
}
} else {
if (!state->cached) {
if (lstat(path, ) < 0) {
-   free(ce);
+   index_cache_entry_discard(ce);
return error_errno(_("unable to stat newly "
 "created file '%s'"),
   path);
@@ -4292,13 +4291,13 @@ static int add_index_file(struct apply_state *state,
fill_stat_cache_info(ce, );
}
if (write_object_file(buf, size, blob_type, >oid) < 0) {
-   free(ce);
+   index_cache_entry_discard(ce);
return error(_("unable to create backing store "
   "for newly created file %s"), path);
}
}
if (add_cache_entry(ce, ADD_CACHE_OK_TO_ADD) < 0) {
-   free(ce);
+   index_cache_entry_discard(ce);
return error(_("unable to add cache entry for %s"), path);
}
 
@@ -4422,27 +4421,26 @@ static int add_conflicted_stages_file(struct 
apply_state *state,
   struct patch *patch)
 {
int stage, namelen;
-   unsigned ce_size, mode;
+   unsigned mode;
struct cache_entry *ce;
 
if (!state->update_index)
return 0;
namelen = strlen(patch->new_name);

[PATCH v1 1/5] read-cache: teach refresh_cache_entry to take istate

2018-04-17 Thread Jameson Miller
Refactoring dependencies of make_cache_entry to work on a specific
index, instead of implicitly using the_index. This is in preparation
for making the make_cache_entry function work on a specific index.
---
 cache.h   | 2 +-
 merge-recursive.c | 2 +-
 read-cache.c  | 7 ---
 3 files changed, 6 insertions(+), 5 deletions(-)

diff --git a/cache.h b/cache.h
index bbaf5c349a..e50a847aea 100644
--- a/cache.h
+++ b/cache.h
@@ -743,7 +743,7 @@ extern void fill_stat_cache_info(struct cache_entry *ce, 
struct stat *st);
 #define REFRESH_IGNORE_SUBMODULES  0x0010  /* ignore submodules */
 #define REFRESH_IN_PORCELAIN   0x0020  /* user friendly output, not "needs 
update" */
 extern int refresh_index(struct index_state *, unsigned int flags, const 
struct pathspec *pathspec, char *seen, const char *header_msg);
-extern struct cache_entry *refresh_cache_entry(struct cache_entry *, unsigned 
int);
+extern struct cache_entry *refresh_cache_entry(struct index_state *, struct 
cache_entry *, unsigned int);
 
 /*
  * Opportunistically update the index but do not complain if we can't.
diff --git a/merge-recursive.c b/merge-recursive.c
index 0c0d48624d..693f60e0a3 100644
--- a/merge-recursive.c
+++ b/merge-recursive.c
@@ -260,7 +260,7 @@ static int add_cacheinfo(struct merge_options *o,
if (refresh) {
struct cache_entry *nce;
 
-   nce = refresh_cache_entry(ce, CE_MATCH_REFRESH | 
CE_MATCH_IGNORE_MISSING);
+   nce = refresh_cache_entry(_index, ce, CE_MATCH_REFRESH | 
CE_MATCH_IGNORE_MISSING);
if (!nce)
return err(o, _("addinfo_cache failed for path '%s'"), 
path);
if (nce != ce)
diff --git a/read-cache.c b/read-cache.c
index 10f1c6bb8a..2cb4f53b57 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -767,7 +767,7 @@ struct cache_entry *make_cache_entry(unsigned int mode,
ce->ce_namelen = len;
ce->ce_mode = create_ce_mode(mode);
 
-   ret = refresh_cache_entry(ce, refresh_options);
+   ret = refresh_cache_entry(_index, ce, refresh_options);
if (ret != ce)
free(ce);
return ret;
@@ -1448,10 +1448,11 @@ int refresh_index(struct index_state *istate, unsigned 
int flags,
return has_errors;
 }
 
-struct cache_entry *refresh_cache_entry(struct cache_entry *ce,
+struct cache_entry *refresh_cache_entry(struct index_state *istate,
+  struct cache_entry *ce,
   unsigned int options)
 {
-   return refresh_cache_ent(_index, ce, options, NULL, NULL);
+   return refresh_cache_ent(istate, ce, options, NULL, NULL);
 }
 
 
-- 
2.14.3



[PATCH v1 0/5] Allocate cache entries from memory pool

2018-04-17 Thread Jameson Miller
This patch series improves the performance of loading indexes by
reducing the number of malloc() calls. Loading the index from disk is
partly dominated by the time in malloc(), which is called for each
index entry. This patch series reduces the number of times malloc() is
called as part of loading the index, and instead allocates a block of
memory upfront that is large enough to hold all of the cache entries,
and chunks this memory itself. This change builds on [1], which is a
prerequisite for this change.

Git previously allocated block of memory for the index cache entries
until [2].

This 5 part patch series is broken up as follows:

  1/5, 2/5 - Move cache entry lifecycle methods behind an API

  3/5 - Fill out memory pool API to include lifecycle and other
methods used in later patches

  4/5 - Allocate cache entry structs from memory pool

  5/5 - Add extra optional validation

Performance Benchmarks:

To evaluate the performance of this approach, the p0002-read-cache.sh
test was run with several combinations of allocators (glibc default,
tcmalloc, jemalloc), with and without block allocation, and across
several different index sized (100K, 1M, 2M entries). The details on
how these repositories were constructed can be found in [3].The
p0002-read-cache.sh was run with the iteration count set to 1 and
$GIT_PERF_REPEAT_COUNT=10.

The tests were run with iteration count set to 1 because this best
approximates the real behavior. The read_cache/discard_cache test will
load / free the index N times, and the performance of this logic is
different between N = 1 and N > 1. As the production code does not
read / discard the index in a loop, a better approximation is when N =
1.

100K

Test   baseline [4]   block_allocation
 

0002.1: read_cache/discard_cache 1 times   0.03(0.01+0.01)0.02(0.01+0.01) 
-33.3%

1M:

Test   baseline   block_allocation
 

0002.1: read_cache/discard_cache 1 times   0.23(0.12+0.11)0.17(0.07+0.09) 
-26.1%

2M:

Test   baseline   block_allocation
 

0002.1: read_cache/discard_cache 1 times   0.45(0.26+0.19)0.39(0.17+0.20) 
-13.3%


100K is not a large enough sample size to show the perf impact of this
change, but we can see a perf improvement with 1M and 2M entries.

For completeness, here is the p0002-read-cache tests for git.git and
linux.git:

git.git:

Test  baseline [4] block_allocation
 
-
0002.1: read_cache/discard_cache 1000 times   0.30(0.26+0.03)   0.17(0.13+0.03) 
-43.3%

linux.git:

Test  baseline  block_allocation
 
-
0002.1: read_cache/discard_cache 1000 times   7.05(6.01+0.84)   4.61(3.74+0.66) 
-34.6% 


We also investigated the performance of just using different
allocators. We can see that there is not a consistent performance
gain.

100K

Test   baseline [4]  tcmalloc   
   jemalloc
 
--
0002.1: read_cache/discard_cache 1 times   0.03(0.01+0.01)   0.03(0.01+0.01) 
+0.0% 0.03(0.02+0.01) +0.0% 

1M:

Test   baseline  tcmalloc   
   jemalloc
 
--
0002.1: read_cache/discard_cache 1 times   0.23(0.12+0.11)   0.21(0.10+0.10) 
-8.7% 0.27(0.16+0.10) +17.4%

2M:

Test   baseline  tcmalloc   
   jemalloc
 
--
0002.1: read_cache/discard_cache 1 times   0.45(0.26+0.19)   0.46(0.25+0.21) 
+2.2% 0.57(0.36+0.21) +26.7%



[1] https://public-inbox.org/git/20180321164152.204869-1-jam...@microsoft.com/

[2] debed2a629 (read-cache.c: allocate index entries individually - 2011-10-24)

[3] Constructing test repositories:

The test repositories were constructed with t/perf/repos/many_files.sh with the 
following parameters:

100K:many-files.sh 4 10 9
1M:  many-files.sh 5 10 9
2M:  many-files.sh 6 8 7

[4] baseline commit: 8b026eda Revert "Merge branch 
'en/rename-directory-detection'"

Jameson Miller (5):
  read-cache: teach refresh_cache_entry to take istate
  Add an API creating / discarding cache_entry structs

[PATCH v1 0/5] Allocate cache entries from memory pool

2018-04-17 Thread Jameson Miller
This patch series improves the performance of loading indexes by
reducing the number of malloc() calls. Loading the index from disk is
partly dominated by the time in malloc(), which is called for each
index entry. This patch series reduces the number of times malloc() is
called as part of loading the index, and instead allocates a block of
memory upfront that is large enough to hold all of the cache entries,
and chunks this memory itself. This change builds on [1], which is a
prerequisite for this change.

Git previously allocated block of memory for the index cache entries
until [2].

This 5 part patch series is broken up as follows:

  1/5, 2/5 - Move cache entry lifecycle methods behind an API

  3/5 - Fill out memory pool API to include lifecycle and other
methods used in later patches

  4/5 - Allocate cache entry structs from memory pool

  5/5 - Add extra optional validation

Performance Benchmarks:

To evaluate the performance of this approach, the p0002-read-cache.sh
test was run with several combinations of allocators (glibc default,
tcmalloc, jemalloc), with and without block allocation, and across
several different index sized (100K, 1M, 2M entries). The details on
how these repositories were constructed can be found in [3].The
p0002-read-cache.sh was run with the iteration count set to 1 and
$GIT_PERF_REPEAT_COUNT=10.

The tests were run with iteration count set to 1 because this best
approximates the real behavior. The read_cache/discard_cache test will
load / free the index N times, and the performance of this logic is
different between N = 1 and N > 1. As the production code does not
read / discard the index in a loop, a better approximation is when N =
1.

100K

Test   baseline [4]   block_allocation
 

0002.1: read_cache/discard_cache 1 times   0.03(0.01+0.01)0.02(0.01+0.01) 
-33.3%

1M:

Test   baseline   block_allocation
 

0002.1: read_cache/discard_cache 1 times   0.23(0.12+0.11)0.17(0.07+0.09) 
-26.1%

2M:

Test   baseline   block_allocation
 

0002.1: read_cache/discard_cache 1 times   0.45(0.26+0.19)0.39(0.17+0.20) 
-13.3%


100K is not a large enough sample size to show the perf impact of this
change, but we can see a perf improvement with 1M and 2M entries. For
completeness, here is the p0002-read-cache tests for git.git and
linux.git:

git.git:

Test  baseline  block_allocation
 
-
0002.1: read_cache/discard_cache 1000 times   0.30(0.26+0.03)   0.17(0.13+0.03) 
-43.3%

linux.git:

Test  baseline  block_allocation
 
-
0002.1: read_cache/discard_cache 1000 times   7.05(6.01+0.84)   4.61(3.74+0.66) 
-34.6% 


We also investigated the performance of just using different
allocators. We can see that there is not a consistent performance
gain.

100K

Test   baseline [4]  tcmalloc   
   jemalloc
 
--
0002.1: read_cache/discard_cache 1 times   0.03(0.01+0.01)   0.03(0.01+0.01) 
+0.0% 0.03(0.02+0.01) +0.0% 

1M:

Test   baseline  tcmalloc   
   jemalloc
 
--
0002.1: read_cache/discard_cache 1 times   0.23(0.12+0.11)   0.21(0.10+0.10) 
-8.7% 0.27(0.16+0.10) +17.4%

2M:

Test   baseline  tcmalloc   
   jemalloc
 
--
0002.1: read_cache/discard_cache 1 times   0.45(0.26+0.19)   0.46(0.25+0.21) 
+2.2% 0.57(0.36+0.21) +26.7%


[1] https://public-inbox.org/git/20180321164152.204869-1-jam...@microsoft.com/

[2] debed2a629 (read-cache.c: allocate index entries individually - 2011-10-24)

[3] Constructing test repositories:

The test repositories were constructed with t/perf/repos/many_files.sh with the 
following parameters:

100K:many-files.sh 4 10 9
1M:  many-files.sh 5 10 9
2M:  many-files.sh 6 8 7

[4] baseline commit: 8b026eda Revert "Merge branch 
'en/rename-directory-detection'"

Jameson Miller (5):
  read-cache: teach refresh_cache_entry to take istate
  Add an API creating / discarding cache_entry structs
 

Re: [PATCH/RFC 0/5] Keep all info in command-list.txt in git binary

2018-04-17 Thread Duy Nguyen
On Sun, Apr 15, 2018 at 11:21 PM, Philip Oakley  wrote:
> From: "Duy Nguyen"  : Saturday, April 14, 2018 4:44 PM
>
>> On Thu, Apr 12, 2018 at 12:06 AM, Philip Oakley 
>> wrote:
>>>
>>> I'm only just catching up, but does/can this series also capture the
>>> non-command guides that are available in git so that the 'git help -g'
>>> can
>>> begin to list them all?
>>
>>
>> It currently does not. But I don't see why it should not. This should
>> allow git.txt to list all the guides too, for people who skip "git
>> help" and go hard core mode with "man git". Thanks for bringing this
>> up.
>> --
>> Duy
>>
> Is that something I should add to my todo to add a 'guide' category etc.?

I added it too [1]. Not sure if you want anything more on top though.

[1] https://public-inbox.org/git/20180415164238.9107-7-pclo...@gmail.com/T/#u
-- 
Duy


[PATCH 1/1] git-p4: fix `sync --verbose` traceback due to 'fileSize'

2018-04-17 Thread Andrey Mazo
Perforce server 2007.2 (and maybe others) doesn't return "fileSize"
attribute in its reply to `p4 -G print` command.
This causes the following traceback when running `git p4 sync --verbose`:
"""
Traceback (most recent call last):
  File "/usr/libexec/git-core/git-p4", line 3839, in 
main()
  File "/usr/libexec/git-core/git-p4", line 3833, in main
if not cmd.run(args):
  File "/usr/libexec/git-core/git-p4", line 3567, in run
self.importChanges(changes)
  File "/usr/libexec/git-core/git-p4", line 3233, in importChanges
self.commit(description, filesForCommit, branch, parent)
  File "/usr/libexec/git-core/git-p4", line 2855, in commit
self.streamP4Files(files)
  File "/usr/libexec/git-core/git-p4", line 2747, in streamP4Files
cb=streamP4FilesCbSelf)
  File "/usr/libexec/git-core/git-p4", line 552, in p4CmdList
cb(entry)
  File "/usr/libexec/git-core/git-p4", line 2741, in streamP4FilesCbSelf
self.streamP4FilesCb(entry)
  File "/usr/libexec/git-core/git-p4", line 2689, in streamP4FilesCb
self.streamOneP4File(self.stream_file, self.stream_contents)
  File "/usr/libexec/git-core/git-p4", line 2566, in streamOneP4File
size = int(self.stream_file['fileSize'])
KeyError: 'fileSize'
"""

Fix this by omitting the file size information from the verbose print out.
Also, don't use "self.stream_file" directly,
but rather use passed in "file" argument.
(which point to the same "self.stream_file" for all existing callers)

Signed-off-by: Andrey Mazo 
---
 git-p4.py | 8 ++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/git-p4.py b/git-p4.py
index 7bb9cadc6..6f05f915a 100755
--- a/git-p4.py
+++ b/git-p4.py
@@ -2566,8 +2566,12 @@ class P4Sync(Command, P4UserMap):
 relPath = self.stripRepoPath(file['depotFile'], self.branchPrefixes)
 relPath = self.encodeWithUTF8(relPath)
 if verbose:
-size = int(self.stream_file['fileSize'])
-sys.stdout.write('\r%s --> %s (%i MB)\n' % (file['depotFile'], 
relPath, size/1024/1024))
+size = file.get('fileSize', None)
+if size is None:
+sizeStr = ''
+else:
+sizeStr = ' (%i MB)' % (int(size)/1024/1024)
+sys.stdout.write('\r%s --> %s%s\n' % (file['depotFile'], relPath, 
sizeStr))
 sys.stdout.flush()
 
 (type_base, type_mods) = split_p4_type(file["type"])
-- 
2.16.1



Re: [BUG] git p4 clone fails when p4 sizes does not return 'fileSize' key

2018-04-17 Thread Andrey Mazo
Huh, I actually have a slightly different fix for the same issue.
It doesn't suppress the corresponding verbose output completely, but just 
removes the size information from it.

Also, I'd mention that the workaround is trivial -- simply omit the "--verbose" 
option.

Andrey Mazo (1):
  git-p4: fix `sync --verbose` traceback due to 'fileSize'

 git-p4.py | 8 ++--
 1 file changed, 6 insertions(+), 2 deletions(-)


base-commit: 468165c1d8a442994a825f3684528361727cd8c0
-- 
2.16.1



  1   2   >