Re: js/mingw-http-ssl, was Re: What's cooking in git.git (Oct 2018, #05; Fri, 26)

2018-10-29 Thread Kaartic Sivaraam
Hi Junio,


On 30 October 2018 11:07:39 GMT+05:30, Junio C Hamano  wrote:
>Eric Sunshine  writes:
>>> On platforms with recent cURL library, http.sslBackend
>configuration
>>> variable can be used to choose different SSL backend at runtime.
>>
>> s/choose/& a/
>>
>>> The Windows port uses this mechanism to switch between OpenSSL
>and
>>> Secure Channel while talking over the HTTPS protocol.
>
>Thanks.
>
>By the way, I am beginning to like this phrasing quite a lot.  It
>encourages those with other ports like Linux and Mac to see if they
>want similar runtime switching by singling out the Windows port, which
>was what I wanted to achieve with the original one, but I think the
>updated text does so more effectively.

The new phrasing does indeed read quite a bit better.

-- 
Sivaraam

Sent from my Android device with K-9 Mail. Please excuse my brevity.


Re: js/mingw-http-ssl, was Re: What's cooking in git.git (Oct 2018, #05; Fri, 26)

2018-10-29 Thread Junio C Hamano
Eric Sunshine  writes:

> On Mon, Oct 29, 2018 at 10:10 PM Junio C Hamano  wrote:
>> How's this?
>>
>> On platforms with recent cURL library, http.sslBackend configuration
>> variable can be used to choose different SSL backend at runtime.
>
> s/choose/& a/
>
>> The Windows port uses this mechanism to switch between OpenSSL and
>> Secure Channel while talking over the HTTPS protocol.

Thanks.

By the way, I am beginning to like this phrasing quite a lot.  It
encourages those with other ports like Linux and Mac to see if they
want similar runtime switching by singling out the Windows port, which
was what I wanted to achieve with the original one, but I think the
updated text does so more effectively.


Re: [RFC] Generation Number v2

2018-10-29 Thread Junio C Hamano
Derrick Stolee  writes:

> **V3: Corrected Commit Date.**
> For a commit C, let its _corrected commit date_ (denoted by cdate(C))
> be the maximum of the commit date of C and the commit dates of its
> parents.

"maximum of the commit date of C and the corrected commit dates of
its parents"?

We've talked about exactly this one in the past (long before any of
Microsoft folks other than Dscho came to the scene) but without an
implementation, or detailed design and analysis.  I am very happy to
see the idea among other possibilities to be considered again.  This
time around, we may finally come up with something better than the
"commit dates with SLOP" thing ;-).
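
Junio's amended definition can be put into a small sketch; this is an illustrative Python toy with made-up commit names and timestamps, not the commit-graph implementation:

```python
# Corrected commit date per the amended definition above:
# cdate(C) = max(date(C), cdate(P) over each parent P).
# Since cdate is monotone along parent edges, cdate(A) < cdate(B)
# proves that A cannot reach B (a negative-cut reachability index).

def corrected_dates(date, parents):
    """date: {commit: committer timestamp}; parents: {commit: [parents]}."""
    cdate = {}

    def cd(c):
        if c not in cdate:
            cdate[c] = max([date[c]] + [cd(p) for p in parents.get(c, [])])
        return cdate[c]

    for c in date:
        cd(c)
    return cdate

# Clock skew: B's committer clock ran behind its parent A's.
#   A(100) <-- B(90) <-- C(120)
dates = {"A": 100, "B": 90, "C": 120}
graph = {"B": ["A"], "C": ["B"]}
print(corrected_dates(dates, graph))  # B is lifted up to A's date, 100
```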

> Essentially, the felineY order is selected with the goal of swapping
> positions of topologically-independent commits relative to the felineX
> ordering. The resulting reachability index is as follows:
>
>If felineX(A) < felineY(B), then A cannot reach B.
>If felineY(A) < felineY(B), then A cannot reach B.

I presume that the first line is a typo and you compare the same X index?
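
Assuming the first comparison is indeed meant to be felineX on both sides, a toy sketch of how such a pair of orders is consumed as a negative cut; the index values here are made up for illustration, not computed by any real FELINE implementation:

```python
# Negative-cut filter over two topological orders: either order alone
# can prove "A cannot reach B"; if neither does, the answer is only
# "maybe", and an actual walk is still needed to confirm reachability.

def provably_unreachable(a, b, felineX, felineY):
    return felineX[a] < felineX[b] or felineY[a] < felineY[b]

# Two topologically independent commits: a good felineY deliberately
# swaps their relative positions from felineX (hypothetical values).
fx = {"D": 1, "E": 2}
fy = {"D": 2, "E": 1}
print(provably_unreachable("D", "E", fx, fy))  # True, via felineX
print(provably_unreachable("E", "D", fx, fy))  # True, via felineY
```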

> * **Compatible?** In our test implementation, we use a previously unused
>   byte of data in the commit-graph format to indicate which reachability
>   index version we are using. Existing clients ignore this value, so we
>   will want to consider if these new indexes are _backwards compatible_.
>   That is, will they still report correct values if they ignore this byte
>   and use the generation number column from the commit-graph file assuming
>   the values are minimum generation numbers?

I personally consider the current commit-graph with generation
numbers experimental, so I am not sure how much we care about this.

Having said that.

By the above definition, any new index that is wider than the
current generation number cannot be compatible (can we even tell the
existing clients how wide each element in the ix array is?)

In any case, perhaps the first thing to do is to update the clients
so that they stop ignoring the version number field, and instead
work without generation numbers when there is no version of reach.ix
available in the file?  That way, a better reachability index can
be chosen freely without having to worry about compatibility.

> * **Immutable?** Git objects are _immutable_. If you change an object you
>   actually create a new object with a new object ID. Are the values we
> store
>   for these reachability indexes also immutable?

Even if we do not embed the reachability ix in commit objects,
having an immutable value is probably a must if we want to make them
incrementally computable, so this is a very good property to have.
Unless there is a clever idea to incrementally compute a mutable
reach.ix, my gut instinct says that this property is a must.

Another thing, perhaps related to "local" below, is whether exactly
the same reach.ix is computed by everybody, given an identical commit
history graph (perhaps "reproducibility"?).  I think most of the
candidates you listed are reproducible without a fixed tie-breaker,
but I am not sure about the felineY() thing.

> * **Local?** Are these values **locally computable**? That is, do we only
>   need to look at the parents of a commit (assuming those parents have
>   computed values) in order to determine the value at that commit?

A subset of non-local reachability ix, for example, the ones that
need to know what each commit's children are, cannot be immutable,
as adding new objects to the graph (either by committing locally,
or by transferring objects from other repositories) would affect the
ix; is this true for all non-local reachability ix, I wonder?

> We focused on three types of performance tests that test the indexes
> in different ways. Each test lists the `git` command that is used,
> and the table lists which repository is used and which inputs.
>
> ### Test 1: `git log --topo-order -N`
>
> This test focuses on the number of commits that are parsed during
> a `git log --topo-order` before writing `N` commits to output.

A devil's advocate comment.  Your patches seem to be very focused on
this "unlimited" case for the past few weeks, but I am not so sure
if that is a case worth optimizing for.  If "git log --topo-order -N
HEAD~M.." (for some number M) gives us a similar result to the unlimited
case but with much less effort, wouldn't it be good enough, letting
us concentrate on other use cases instead?

> Based on the performance results alone, we should remove minimum
> generation numbers, (epoch, date) pairs, and FELINE index from
> consideration. There are enough examples of these indexes performing
> poorly.

OK.

> In contrast, maximum generation numbers and corrected commit
> dates both performed quite well. They are frequently the top
> two performing indexes, and rarely significantly different.
>
> The trade-off here now seems to be: which _property_ is more important,
> locally-computable or backwards-compatible?

Nice summary.  

As I already said, I personally do not think 

Re: [PATCH] completion: use builtin completion for format-patch

2018-10-29 Thread Denton Liu
On Tue, Oct 30, 2018 at 11:20:45AM +0900, Junio C Hamano wrote:
> We saw a similar change proposed and then found out it was not such
> a good idea in:
> 
> https://public-inbox.org/git/cacsjy8durvju0hn7kuceo4iv5aimwbytr+e-7kenpvdx90d...@mail.gmail.com/
> 
> It seems that this one loses options like --full-index, --no-prefix,
> etc. compared to the earlier effort?

In _git_send_email, we have the following lines:

__gitcomp "--annotate --bcc --cc --cc-cmd --chain-reply-to
... more options ...
$__git_format_patch_options"

Would it make sense to take the old `__git_format_patch_options` and
just roll them into here, then make `_git_format_patch` use
`__gitcomp_builtin format-patch`? That way, we'd be able to reap the
benefits of using `__gitcomp_builtin` where we can.


Re: [PATCH 0/4] index-pack: optionally turn off SHA-1 collision checking

2018-10-29 Thread Geert Jansen
On Sun, Oct 28, 2018 at 10:50:19PM +, Ævar Arnfjörð Bjarmason wrote:

> I left the door open for that in the new config option 4/4 implements,
> but I suspect for Geert's purposes this is something he'd prefer to
> turn off in git on clone entirely, i.e. because it may be running on
> some random Amazon's customer's EFS instance, and they won't know
> about this new core.checkCollisions option.
> 
> But maybe I'm wrong about that and Geert is happy to just turn on
> core.checkCollisions=false and use this series instead.

I think that the best user experience would probably be if git were fast by
default without having to give up on (future) security by removing the sha1
collision check.  Maybe core.checkCollisions could default to "on" only when
there are no loose objects in the repository? That would give a fast experience
for many common cases (git clone, git init && git fetch) while still doing the
collision check when relevant.

My patch used the --cloning flag as an approximation of "no loose objects".
Maybe a better option would be to check for the non-existence of the [00-ff]
directories under .git/objects.
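
The suggested check can be sketched like this; a Python illustration of the heuristic, not code from the series, assuming the standard .git/objects fan-out layout:

```python
import os, tempfile

# Treat a repository as having no loose objects when none of the 256
# fan-out directories 00..ff exists under its objects directory.
def has_loose_fanout(objects_dir):
    return any(
        os.path.isdir(os.path.join(objects_dir, "%02x" % i))
        for i in range(256)
    )

# Demo on a scratch directory standing in for .git/objects:
objects = tempfile.mkdtemp()
print(has_loose_fanout(objects))       # False: packs only, skip the check
os.mkdir(os.path.join(objects, "1a"))  # a loose object fan-out dir appears
print(has_loose_fanout(objects))       # True: run the collision check
```

The idea being that index-pack could enable the collision check only when this probe reports existing loose objects, instead of keying off the --cloning flag.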


Re: Lost changes after merge

2018-10-29 Thread Gray King
Sorry, it seems the links have expired; here are the new ones:


* Before the merge, output of `git log --format="%h %p %d" -n 20 --all --graph`:

https://cfp.vim-cn.com/cbfq6

* After the merge, output of `git log --format="%h %p %d" -n 20 --all --graph`:

https://cfp.vim-cn.com/cbfq7

On 29 October 2018 at 22:18:07, Jeff King (p...@peff.net) wrote:

> On Mon, Oct 29, 2018 at 09:49:20AM +0100, Gray King wrote:
>
> > Hello,
> >
> > I have a very strange issue described below:
> >
> > * Here is the tree before I merge via `git log --format="%h %p %d" -n
> > 20 --all --graph`:
> >
> > https://upaste.de/9Pe
>
> FWIW, neither this nor the other paste link in your email seem to work
> for me (which makes it hard to comment on the rest of the email).
>
> -Peff


Re: [PATCH v1] speed up refresh_index() by utilizing preload_index()

2018-10-29 Thread Junio C Hamano
Ben Peart  writes:

> From: Ben Peart 
>
> Speed up refresh_index() by utilizing preload_index() to do most of the work
> spread across multiple threads.  This works because most cache entries will
> get marked CE_UPTODATE so that refresh_cache_ent() can bail out early when
> called from within refresh_index().
>
> On a Windows repo with ~200K files, this drops refresh times from 6.64
> seconds to 2.87 seconds for a savings of 57%.
>
> Signed-off-by: Ben Peart 
> ---

OK.  We used to expose only the whole "read the index file into an
istate, and then do the lstat() part in parallel" sequence, but now
we also make the "do the lstat() part" available separately.

Which makes sense.


> diff --git a/cache.h b/cache.h
> index f7fabdde8f..883099db08 100644
> --- a/cache.h
> +++ b/cache.h
> @@ -659,6 +659,9 @@ extern int daemonize(void);
>  /* Initialize and use the cache information */
>  struct lock_file;
>  extern int read_index(struct index_state *);
> +extern void preload_index(struct index_state *index,
> +   const struct pathspec *pathspec,
> +   unsigned int refresh_flags);
>  extern int read_index_preload(struct index_state *,
> const struct pathspec *pathspec,
> unsigned int refresh_flags);
> diff --git a/preload-index.c b/preload-index.c
> index 9e7152ab14..222792ccbc 100644
> --- a/preload-index.c
> +++ b/preload-index.c
> @@ -9,7 +9,7 @@
>  #include "progress.h"
>  
>  #ifdef NO_PTHREADS
> -static void preload_index(struct index_state *index,
> +void preload_index(struct index_state *index,
> const struct pathspec *pathspec,
> unsigned int refresh_flags)
>  {
> @@ -100,9 +100,9 @@ static void *preload_thread(void *_data)
>   return NULL;
>  }
>  
> -static void preload_index(struct index_state *index,
> -   const struct pathspec *pathspec,
> -   unsigned int refresh_flags)
> +void preload_index(struct index_state *index,
> +const struct pathspec *pathspec,
> +unsigned int refresh_flags)
>  {
>   int threads, i, work, offset;
>   struct thread_data data[MAX_PARALLEL];
> diff --git a/read-cache.c b/read-cache.c
> index d57958233e..53733d651d 100644
> --- a/read-cache.c
> +++ b/read-cache.c
> @@ -1496,6 +1496,12 @@ int refresh_index(struct index_state *istate, unsigned int flags,
>   typechange_fmt = (in_porcelain ? "T\t%s\n" : "%s needs update\n");
>   added_fmt = (in_porcelain ? "A\t%s\n" : "%s needs update\n");
>   unmerged_fmt = (in_porcelain ? "U\t%s\n" : "%s: needs merge\n");
> + /*
> +  * Use the multi-threaded preload_index() to refresh most of the
> +  * cache entries quickly then in the single threaded loop below,
> +  * we only have to do the special cases that are left.
> +  */
> + preload_index(istate, pathspec, 0);
>   for (i = 0; i < istate->cache_nr; i++) {
>   struct cache_entry *ce, *new_entry;
>   int cache_errno = 0;
>
> base-commit: c670b1f876521c9f7cd40184bf7ed05aad843433


Re: js/mingw-http-ssl, was Re: What's cooking in git.git (Oct 2018, #05; Fri, 26)

2018-10-29 Thread Eric Sunshine
On Mon, Oct 29, 2018 at 10:10 PM Junio C Hamano  wrote:
> How's this?
>
> On platforms with recent cURL library, http.sslBackend configuration
> variable can be used to choose different SSL backend at runtime.

s/choose/& a/

> The Windows port uses this mechanism to switch between OpenSSL and
> Secure Channel while talking over the HTTPS protocol.


Re: [PATCH] completion: use builtin completion for format-patch

2018-10-29 Thread Junio C Hamano
Denton Liu  writes:

> Signed-off-by: Denton Liu 
> ---
>  contrib/completion/git-completion.bash | 11 +--
>  1 file changed, 1 insertion(+), 10 deletions(-)

We saw a similar change proposed and then found out it was not such
a good idea in:

https://public-inbox.org/git/cacsjy8durvju0hn7kuceo4iv5aimwbytr+e-7kenpvdx90d...@mail.gmail.com/

It seems that this one loses options like --full-index, --no-prefix,
etc. compared to the earlier effort?

> diff --git a/contrib/completion/git-completion.bash b/contrib/completion/git-completion.bash
> index d63d2dffd..da77da481 100644
> --- a/contrib/completion/git-completion.bash
> +++ b/contrib/completion/git-completion.bash
> @@ -1531,15 +1531,6 @@ _git_fetch ()
>   __git_complete_remote_or_refspec
>  }
>  
> -__git_format_patch_options="
> - --stdout --attach --no-attach --thread --thread= --no-thread
> - --numbered --start-number --numbered-files --keep-subject --signoff
> - --signature --no-signature --in-reply-to= --cc= --full-index --binary
> - --not --all --cover-letter --no-prefix --src-prefix= --dst-prefix=
> - --inline --suffix= --ignore-if-in-upstream --subject-prefix=
> - --output-directory --reroll-count --to= --quiet --notes
> -"
> -
>  _git_format_patch ()
>  {
>   case "$cur" in
> @@ -1550,7 +1541,7 @@ _git_format_patch ()
>   return
>   ;;
>   --*)
> - __gitcomp "$__git_format_patch_options"
> + __gitcomp_builtin format-patch
>   return
>   ;;
>   esac


Re: js/mingw-http-ssl, was Re: What's cooking in git.git (Oct 2018, #05; Fri, 26)

2018-10-29 Thread Junio C Hamano
Johannes Schindelin  writes:

>> >>  On Windows with recent enough cURL library, the configuration
>> >>  variable http.sslBackend can be used to choose between OpenSSL and
>> >>  Secure Channel at runtime as the SSL backend while talking over
>> >>  the HTTPS protocol.
>> ... 
>> Yeah, but "http.sslBackend can be used to choose between OpenSSL
>> and Secure Channel" applies only to Windows ...
> ...
>> > The two patches on top are Windows-only, of course, as they really apply
>> > only to the Secure Channel backend (which *is* Windows-only).
>> 
>> Yes, that is why the summary for the topic as a whole focuses on
>> Windows, as that is the primary audience who would benefit from the
>> topic.

How's this?

On platforms with recent cURL library, http.sslBackend configuration
variable can be used to choose different SSL backend at runtime.
The Windows port uses this mechanism to switch between OpenSSL and
Secure Channel while talking over the HTTPS protocol.



Re: [PATCH 09/10] read-cache.c: remove #ifdef NO_PTHREADS

2018-10-29 Thread Junio C Hamano
Duy Nguyen  writes:

>> This shouldn't be needed either.  My assumption was that if someone
>> explicitly asked for multiple threads we're better off failing than
>> silently ignoring their request.  The default/automatic setting will
>> detect the number of cpus and behave correctly.
>
> No. I can have ~/.gitconfig shared between different machines. One
> with multithread support and one without. I should not have to fall
> back to conditional includes in order to use the same config file on
> the machine without multithread.

Our attitude has been to fail unsatisfiable requests from the
command line loudly to let the users know, and tolerate having
elements from future versions of Git in the configuration file
by ignoring them or tweaking them (silently or with a warning).

So this hunk is in line with the current practice, and we should
keep it this way.  Whether we want to rethink that whole attitude
is a separate discussion; it may be worth having, but I think it is
wider than this single topic.


Re: [PATCH 06/10] grep: remove #ifdef NO_PTHREADS

2018-10-29 Thread Junio C Hamano
Jeff King  writes:

> On Mon, Oct 29, 2018 at 05:01:41PM +0100, Duy Nguyen wrote:
>
>> On Mon, Oct 29, 2018 at 3:25 PM Jeff King  wrote:
>> > But if the problem is simply that we are not quite there yet in the grep
>> > code, I am OK with taking this as the first pass, and knowing that there
>> > is more cleanup to be done later (though that sort of thing is IMHO very
>> > useful in a commit message).
>> 
>> Since the problem pops up now, I'm ok with updating/cleaning up all
>> this in this series, unless there's benefits in keeping this series
>> simple and merging it early (probably not?)
>
> Mostly I did not want to tax you. I would rather have this series and
> some cleanup left over, than to not have anything. But if you are
> interested in moving it further, I will not say no. :)

Likewise.


Re: [PATCH 3/3] commit-reach.h: add missing declarations (hdr-check)

2018-10-29 Thread Ramsay Jones



On 29/10/2018 01:13, Junio C Hamano wrote:
> Ramsay Jones  writes:
> 
>> ...
>>24 clear_contains_cache
>>   $
>>
>> you will find 24 copies of the commit-slab routines for the contains_cache.
>> Of course, when you enable optimizations again, these duplicate static
>> functions (mostly) disappear. Compiling with gcc at -O2, leaves two static
>> functions, thus:
>>
>>   $ nm commit-reach.o | grep contains_cache
>>   0870 t contains_cache_at_peek.isra.1.constprop.6
>>   $ nm ref-filter.o | grep contains_cache
>>   02b0 t clear_contains_cache.isra.14
>>   $
>>
>> However, using a shared 'contains_cache' would result in all six of the
>> above functions as external public functions in the git binary.
> 
> Sorry, but you lost me here.  Where does the 6 in the above 'all six'
> come from?  I saw 24 (from unoptimized), and I saw 2 (from
> optimized), but...

As you already gathered, the 'six of the above functions' was the
list of 6 functions which were each duplicated 24 times (you only
left the last one un-snipped above) in the unoptimized git binary.

> Ah, OK, the slab system gives us a family of 6 helper functions to
> deal with the "contains_cache" slab, and we call only 3 of them
> (i.e. the implementation of other three are left unused).  Makes
> sense.

Yep, only clear_contains_cache(), contains_cache_at() and
init_contains_cache() are called.

>> At present,
>> only three of these functions are actually called, so the trade-off
>> seems to favour letting the compiler inline the commit-slab functions.
> 
> OK.  I vaguely recall using a linker that can excise the
> implementation of an unused public function out of the resulting
> executable in the past, which may tip the balance in the opposite
> direction, 

Yes, and I thought I was using such a linker - I was surprised that
seems not to be the case! ;-) [I know I have used such a linker, and
I could have sworn it was on Linux ... ]

As I said in [1], the first version of this patch actually used a
shared contains_cache (so as not to #include "commit.h"). I changed
that just before sending it out, because I felt the patch conflated
two issues - I fully intended to send a "let's use a shared contains
cache instead" follow up patch! (Again, see [1] for the initial version
of that follow up patch).

But then Derrick said he preferred this version of the patch and I
couldn't really justify the follow up patch, other than to say "you
are making your compiler work harder than it needs to ..." - not very
convincing! :-P

For example, applying the RFC/PATCH from [1] on top of this patch
we can see:

  $ nm git | grep contains_cache
  000d21f0 T clear_contains_cache
  000d2400 T contains_cache_at
  000d2240 T contains_cache_at_peek
  000d2410 T contains_cache_peek
  000d21d0 T init_contains_cache
  000d2190 T init_contains_cache_with_stride
  $ size git
 text  data bss dec hex filename
  2531234 70736  274832 2876802  2be582 git
  $ 

whereas, without that patch on top (ie this patch):

  $ nm git | grep contains_cache
  00161e30 t clear_contains_cache.isra.14
  000d2190 t contains_cache_at_peek.isra.1.constprop.6
  $ size git
 text  data bss dec hex filename
  2530962 70736  274832 2876530  2be472 git
  $ 

which means the 'shared contains_cache' patch leads to +272 bytes of
bloat in the text section (from the 3 unused external functions).


[1] 
https://public-inbox.org/git/2809251f-6eba-b0ac-68fe-b972809cc...@ramsayjones.plus.com/

> but the above reasonong certainly makes sense to me.

Thanks!

ATB,
Ramsay Jones



Re: [RFC PATCH] index-pack: improve performance on NFS

2018-10-29 Thread Jeff King
On Mon, Oct 29, 2018 at 03:35:58PM -0700, Stefan Beller wrote:

> On Mon, Oct 29, 2018 at 3:27 PM Jeff King  wrote:
> 
> > So yeah, that's the other thing I'm thinking about regarding having a
> > maximum loose cache size.
> 
> tangent:
> This preloading/caching could be used for a more precise approach
> to decide when to gc instead of using some statistical sampling
> of objects/17, eventually.

Isn't it exactly the same thing? Ideally we'd break down the cache to
the directory level, so we could fill the cache list for "17" and ask
"how full are you". Or we could just readdir objects/17 ourselves. But
either way, it's the same amount of work.

-Peff


Re: [RFC PATCH] index-pack: improve performance on NFS

2018-10-29 Thread Jeff King
On Mon, Oct 29, 2018 at 08:36:07PM +0100, Ævar Arnfjörð Bjarmason wrote:

> > Yeah, especially given recent advances in SHA-1 attacks, I'm not super
> > comfortable with the idea of disabling the duplicate-object check at
> > this point.
> 
> I'd be comfortable with it in my setup since it's been limited to
> collision attacks that are computationally prohibitive, and there being
> no sign of preimage attacks, which is the case we really need to worry
> about.

I agree, and I'm not actually that worried about the current state. But
what makes me more nervous is the life-cycle around Git. In 5 years,
people are still going to be running what we ship today, and will
grumble about upgrading to deal with SHA-1.

I suppose it's not the end of the world as long as they can un-flip a
config switch to get back the more-paranoid behavior (which is all that
you're really proposing).

> It does introduce a race condition where you can introduce a colliding
> object to the repository by doing two concurrent pushes, but as you note
> in
> https://public-inbox.org/git/20181029151842.gj17...@sigill.intra.peff.net/
> this already applies to packs, so you can trigger that with the right
> sized push (depending on transfer.unpackLimit), and we also have this in
> existing forms for other stuff.

Right. It can also trigger currently if somebody runs "git repack"
simultaneously (the loose becomes packed, but we don't re-scan the pack
directory).

> I do think it's amazingly paranoid to be worried about SHA-1 collisions
> in the first place, and a bit odd to leave the door open on these race
> conditions. I.e. it's hard to imagine a state-level[1] actor with
> sufficient motivation to exploit this who wouldn't find some way to make
> the race condition work as an escape hatch.

Yeah, I agree there's an element of that. I think the "push twice
quickly to race" thing is actually not all that interesting, though. In
that case, you're providing both the objects already, so why not just
push the one you want?

What's more interesting is racing with the victim of your collision (I
feed Junio the good half of the collision, and then try to race his
push and get my evil half in at the same time). Or racing a repack. But
timing the race there seems a lot trickier.

I suspect you could open up the window substantially by feeding your
pack really slowly. So I start to push at 1pm, but trickle in a byte at
a time of my 1GB pack, taking several hours. Meanwhile Junio pushes, and
then as soon as I see that, I send the rest of my pack. My index-pack
doesn't see Junio's push because it started before.

And ditto with repack, if the server runs it predictably in response to
load.  So maybe not so tricky after all.

I think the other thing that helps here is that _everybody_ runs the
collision check. So yeah, you can race pushing your evil stuff to my
server. But it only takes one person fetching into their quiescent
laptop repository to notice the collision and sound the alarm.

I'll admit that there's a whole lot of hand-waving there, for a security
claim. I'll be glad to simply move off of SHA-1.

> In a busy repo that gets a lot of branches / branch deletions (so not
> quite as extreme as [2], but close) and the default expiry policy you
> can easily have 20-100K loose objects (something near the lower bound of
> that is the current live state of one server I'm looking at).
> 
> A recursive opendir()/readdir() on that on local disk is really fast if
> it's in cache, but can easily be 1-5 seconds on NFS. So for a push we'd
> now pay up to 5s just populating a cache we'll barely use to accept some
> tiny push with just a few objects.

That 1-5 seconds is a little scary. Locally for a million objects I was
looking at 400ms. But obviously NFS is going to be much worse.

I do agree with your sentiment below that even if this should be on by
default, it should have a config knob. After all, "please flip this
switch and see if things improve" is a good escape hatch to have.

>  * Re-roll my 4 patch series to include the patch you have in
><20181027093300.ga23...@sigill.intra.peff.net>

I don't think it's quite ready for inclusion as-is. I hope to brush it
up a bit, but I have quite a backlog of stuff to review, as well.

-Peff


Re: [PATCH] pretty: Add %(trailer:X) to display single trailer

2018-10-29 Thread Anders Waldenborg


On Mon, Oct 29, 2018 at 3:14 PM Jeff King  wrote:
> Junio's review already covered my biggest question, which is why not
> something like "%(trailers:key=ticket)". And likewise making things like
> comma-separation options.

Jeff, Junio,

thanks!

Your questions pretty much match what I (and a colleague I discussed
this with before posting) were concerned about.

My first try actually had it as an option to "trailers". But it got a
bit messy with the argument parsing, and the fact that there was a
fast path making it work when specified on its own. I did not want to
spend a lot of time reworking that before I had some feedback, so I
went for the smallest possible patch to float the idea with (a patch
is worth a thousand words).

I'll start by reworking my patch to handle %(trailers:key=X)  (I'll
assume keys never contain ')' or ','), and ignore any formatting until
the way forward there is decided (see below).

> But my second question is whether we want to provide something more
> flexible than the always-parentheses that "%d" provides. That has been a
> problem in the past when people want to format the decoration in some
> other way.

Maybe, just like +/-/space can be used directly after %, a () pair
could be allowed.  E.g. "%d" would just be an alias for "%()D", and
for trailers it would be something like "%()(trailers:key=foo)".

There is another special-cased placeholder, %f (sanitized subject line,
suitable for a filename), which could also be changed into a format
specifier, allowing anything to be sanitized, e.g. "%!an" for a
sanitized author name.

Is even the linebreak-to-comma conversion a generic thing?
"% ,()(trailers:key=Ticket)" starts to look a bit silly.

Then there are the padding modifiers, %<() and %<|(). They operate on
the next placeholder, as in "%<(10)%s". Is that a better syntax?
"%()%(trailers:key=Ticket,comma)"

I can also imagine moving all these modifiers into a generic modifier
syntax in brackets (and keeping the old one for backwards compat):
%[lpad=10,ltrunc=10]s == %<(10,trunc)%s
%[nonempty-prefix="%n"]GS == %+GS
%[nonempty-prefix=" (",nonempty-suffix=")"]D == %d
Which would mean something like this for the tickets thing:
%[nonempty-prefix=" (Tickets:",nonempty-suffix=")",commaseparatelines](trailers:key=Ticket,nokey)
which is kinda verbose.

> We have formatting magic for "if this thing is non-empty, then show this
> prefix" in the for-each-ref formatter, but I'm not sure that we do in
> the commit pretty-printer beyond "% ". I wonder if we could/should add a
> a placeholder for "if this thing is non-empty, put in a space and
> enclose it in parentheses".

Would there be any interest in consolidating those formatters? Even
though they are totally separate beasts today. I think having all
attributes available on long form (e.g "%(authorname)") in addition to
existing short forms in pretty-formatter would make sense.

 anders


Re: [RFC PATCH] index-pack: improve performance on NFS

2018-10-29 Thread Stefan Beller
On Mon, Oct 29, 2018 at 3:27 PM Jeff King  wrote:

> So yeah, that's the other thing I'm thinking about regarding having a
> maximum loose cache size.

tangent:
This preloading/caching could be used for a more precise approach
to decide when to gc instead of using some statistical sampling
of objects/17, eventually.


Re: [RFC PATCH] index-pack: improve performance on NFS

2018-10-29 Thread Jeff King
On Mon, Oct 29, 2018 at 09:34:53PM +, Geert Jansen wrote:

> As an example, this means that when you're receiving a pack file with 1K
> objects in a repository with 10K loose objects that the loose-object-cache
> patch has roughly the same performance as the current git. I'm not sure if 
> this
> is something to worry about as I'm not sure people run repos with this many
> loose files. If it is a concern, there could be a flag to turn the loose 
> object
> cache on/off.

So yeah, that's the other thing I'm thinking about regarding having a
maximum loose cache size.

10k objects is only 200KB in memory. That's basically nothing. At some
point you run into pathological cases, like having a million objects
(but that's still only 20MB, much less than we devote to other caches,
though of course they do add up).
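
The arithmetic behind those figures, assuming the cache stores roughly one 20-byte binary SHA-1 per loose object (an assumption about the per-entry footprint, not a measurement):

```python
HASH_BYTES = 20  # binary SHA-1 per cached loose object (assumed footprint)

def cache_size_kb(num_loose_objects):
    return num_loose_objects * HASH_BYTES / 1000

print(cache_size_kb(10_000))     # 200.0 KB for 10k loose objects
print(cache_size_kb(1_000_000))  # 20000.0 KB, i.e. ~20MB for a million
```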

If you have a million loose objects, I strongly suspect you're going to
run into other problems (like space, since you're not getting any
deltas).

The one thing that gives me pause is that if you have a bunch of unused
and unreachable loose objects on disk, most operations won't actually
look at them at all. The majority of operations are only looking for
objects we expect to be present (e.g., resolving a ref, walking a tree)
and are fulfilled by checking the pack indices first.

So it's possible that Git is _tolerable_ for most operations with a
million loose objects, and we could make it slightly worse by loading
the cache. But I find it hard to get too worked up about spending an
extra 20MB (and the time to readdir() it in) in that case. It seems like
about 400ms on my machine, and the correct next step is almost always
going to be "pack" or "prune" anyway.

-Peff


Re: [RFC PATCH] index-pack: improve performance on NFS

2018-10-29 Thread Geert Jansen
On Mon, Oct 29, 2018 at 05:50:50PM -0400, Jeff King wrote:

> > I believe the loose-object-cache approach would have a performance
> > regression when you're receiving a small pack file and there's many
> > loose objects in the repo. Basically you're trading off
> > 
> > MIN(256, num_objects_in_pack / dentries_per_readdir) * readdir_latency
> > 
> > against
> > 
> > num_loose_objects * stat_latency
> 
> Should num_loose_objects and num_objects_in_pack be swapped here? Just
> making sure I understand what you're saying.

Whoops, yes, thanks for spotting that!

> So the 256 in your MIN() is potentially much smaller. We still have to
> deal with the fact that if you have a large number of loose objects,
> they may be split across multiple readdir (or getdents) calls. The "cache
> maximum" we discussed does bound that, but in some ways that's worse:
> you waste time doing the bounded amount of readdir and then don't even
> get the benefit of the cache. ;)

Yup. To get the performance benefit you'd like the cache to hold all loose
objects except in clearly degenerate cases with far too many loose objects.

> > On Amazon EFS (and I expect on other NFS server implementations too)
> > it is more efficient to do readdir() on a large directory than to
> > stat() each of the individual files in the same directory. I don't
> > have exact numbers but based on a very rough calculation the
> > difference is roughly 10x for large directories under normal
> > circumstances.
> 
> I'd expect readdir() to be much faster than stat() in general (e.g., "ls
> -f" versus "ls -l" is faster even on a warm cache; there's more
> formatting going on in the latter, but I think a lot of it is the effort
> to stat).

In the case of NFS, the client usually requests that the READDIR response also
contains some of the stat flags (like st_mode). But even in this case it's
still more efficient to return multiple entries in one batch through READDIR
rather than as individual responses to GETATTR (which is what stat() maps to).


Re: [RFC PATCH] index-pack: improve performance on NFS

2018-10-29 Thread Jeff King
On Mon, Oct 29, 2018 at 09:34:53PM +, Geert Jansen wrote:

> On Mon, Oct 29, 2018 at 09:48:02AM +0900, Junio C Hamano wrote:
> 
> > A real question is how much performance gain, relative to ".cloning"
> > thing, this approach gives us.  If it gives us 80% or more of the
> > gain compared to doing no checking, I'd say we have a clear winner.
> 
> I've tested Jeff's loose-object-cache patch and the performance is
> within error bounds of my .cloning patch. A git clone of the same repo
> as in my initial tests:
> 
>   .cloning -> 10m04
>   loose-object-cache -> 9m59
> 
> Jeff's patch does a little more work (256 readdir() calls, which in case of an
> empty repo translate into 256 LOOKUP calls that return NFS4ERR_NOENT) but that
> appears to be insignificant.

Yep, that makes sense. Thanks for timing it.

> I believe the loose-object-cache approach would have a performance regression
> when you're receiving a small pack file and there's many loose objects in the
> repo. Basically you're trading off
> 
> MIN(256, num_objects_in_pack / dentries_per_readdir) * readdir_latency
> 
> against
> 
> num_loose_objects * stat_latency

Should num_loose_objects and num_objects_in_pack be swapped here? Just
making sure I understand what you're saying.

The patch I showed just blindly reads each of the 256 object
subdirectories. I think if we pursue this (and it seems like everybody
is on board), we should cache each of those individually. So a single
object would incur at most one opendir/readdir (and subsequent objects
may, too, or they may hit that cache if they share the first byte).

So the 256 in your MIN() is potentially much smaller. We still have to
deal with the fact that if you have a large number of loose objects,
they may be split across multiple readdir (or getdents) calls. The "cache
maximum" we discussed does bound that, but in some ways that's worse:
you waste time doing the bounded amount of readdir and then don't even
get the benefit of the cache. ;)
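A minimal sketch of that per-subdirectory scheme, with an in-memory stand-in for opendir()/readdir() — the names here are illustrative, not git's actual API:

```python
class LooseObjectCache:
    """Cache loose object names per first-byte subdirectory,
    filling each of the 256 slots lazily on first lookup."""

    def __init__(self, list_subdir):
        # list_subdir(prefix) stands in for opendir/readdir on
        # .git/objects/<prefix>/ and returns the filenames found there.
        self.list_subdir = list_subdir
        self.slots = {}          # "00".."ff" -> set of object names
        self.readdir_calls = 0   # how many subdirs we actually scanned

    def has_loose(self, oid):
        prefix = oid[:2]
        if prefix not in self.slots:
            self.readdir_calls += 1
            self.slots[prefix] = set(self.list_subdir(prefix))
        return oid[2:] in self.slots[prefix]

# Toy object store: two loose objects sharing the same first byte.
store = {"ab": ["cdef", "1234"], "ff": []}
cache = LooseObjectCache(lambda p: store.get(p, []))

assert cache.has_loose("abcdef")      # first lookup scans objects/ab
assert cache.has_loose("ab1234")      # same prefix: served from cache
assert not cache.has_loose("ffeeee")  # scans objects/ff, finds nothing
assert cache.readdir_calls == 2       # one readdir per touched subdir
```

So a single object incurs at most one scan, and subsequent lookups sharing the first byte are free.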

> On Amazon EFS (and I expect on other NFS server implementations too)
> it is more efficient to do readdir() on a large directory than to
> stat() each of the individual files in the same directory. I don't
> have exact numbers but based on a very rough calculation the
> difference is roughly 10x for large directories under normal
> circumstances.

I'd expect readdir() to be much faster than stat() in general (e.g., "ls
-f" versus "ls -l" is faster even on a warm cache; there's more
formatting going on in the latter, but I think a lot of it is the effort
to stat).

-Peff


Re: [RFC PATCH] index-pack: improve performance on NFS

2018-10-29 Thread Geert Jansen
On Mon, Oct 29, 2018 at 09:48:02AM +0900, Junio C Hamano wrote:

> A real question is how much performance gain, relative to ".cloning"
> thing, this approach gives us.  If it gives us 80% or more of the
> gain compared to doing no checking, I'd say we have a clear winner.

I've tested Jeff's loose-object-cache patch and the performance is within error
bounds of my .cloning patch. A git clone of the same repo as in my initial
tests:

  .cloning -> 10m04
  loose-object-cache -> 9m59

Jeff's patch does a little more work (256 readdir() calls, which in case of an
empty repo translate into 256 LOOKUP calls that return NFS4ERR_NOENT) but that
appears to be insignificant.

I agree that the loose-object-cache approach is preferred as it applies to more
git commands and also benefits performance when there are loose objects
already in the repository.

As pointed out in the thread, maybe the default cache size should be some
integer times gc.auto.

I believe the loose-object-cache approach would have a performance regression
when you're receiving a small pack file and there's many loose objects in the
repo. Basically you're trading off

MIN(256, num_objects_in_pack / dentries_per_readdir) * readdir_latency

against

num_loose_objects * stat_latency

On Amazon EFS (and I expect on other NFS server implementations too) it is more
efficient to do readdir() on a large directory than to stat() each of the
individual files in the same directory. I don't have exact numbers but based on
a very rough calculation the difference is roughly 10x for large directories
under normal circumstances.

As an example, this means that when you're receiving a pack file with 1K
objects in a repository with 10K loose objects, the loose-object-cache
patch has roughly the same performance as the current git. I'm not sure
if this is something to worry about as I'm not sure people run repos with
this many loose files. If it is a concern, there could be a flag to turn
the loose object cache on/off.
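A back-of-the-envelope sketch of that trade-off, with the operands as corrected in the follow-up (readdir cost scales with the number of loose objects, stat cost with the number of pack objects). The unit costs are illustrative assumptions based on the rough 10x per-entry estimate, not measurements:

```python
def cache_cost(num_loose, per_entry_readdir_cost):
    # Loose-object cache: every loose object is paid for once via readdir.
    # (The 256 opendir calls themselves are ignored for simplicity.)
    return num_loose * per_entry_readdir_cost

def status_quo_cost(num_pack_objects, stat_cost):
    # Current git: one stat() per object received in the pack.
    return num_pack_objects * stat_cost

STAT = 1.0          # arbitrary unit
READDIR_ENTRY = 0.1 # ~10x cheaper per entry, per the rough EFS estimate

# The example above: a 1K-object pack against 10K loose objects breaks even.
assert cache_cost(10_000, READDIR_ENTRY) == status_quo_cost(1_000, STAT)

# The cache wins whenever loose/pack stays under the stat/readdir ratio.
assert cache_cost(5_000, READDIR_ENTRY) < status_quo_cost(1_000, STAT)
assert cache_cost(50_000, READDIR_ENTRY) > status_quo_cost(1_000, STAT)
```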


Re: [PATCH v3 2/3] shallow: offer to prune only non-existing entries

2018-10-29 Thread Johannes Schindelin
Hi Jonathan,

On Fri, 26 Oct 2018, Jonathan Tan wrote:

> > So even better would be to use `is_promisor_object(oid) ||
> > has_object_file(oid)`, right?
> > 
> > This is something that is probably not even needed: as I mentioned,
> > the shallow commits are *expected* to be local. It should not ever
> > happen that they are fetched.
> 
> That would help, but I don't think it would help in the "fast-forward
> from A to B where A is B's parent" case I describe in [1].

I don't think that that analysis is correct. It assumes that there could
be a promised commit that is also listed as shallow. I do not think that
is a possible scenario.

And even if it were, why would asking for a promised commit object
download not only that object but also all of its trees and all of its
ancestors? That's not how I understood the idea of the lazy clone: I
understood promised objects to be downloaded on demand, individually.

> My suggestion was:
> 
> > It sounds safer to me to use the fast approach in this patch when the
> > repository is not partial, and stick to the slow approach when it is.
> 
> which can be done by replacing "prune_shallow(0, 1)" in patch 3 with
> "prune_shallow(0, !repository_format_partial_clone)", possibly with a comment
> that the fast method checks object existence for each shallow line directly,
> which is undesirable when the repository is a partial clone.

I am afraid that that would re-introduce the bug pointed out by Peff: you
*really* would need to traverse all reachable commits to use the non-fast
pruning method. And we simply don't.

Ciao,
Dscho

> (repository_format_partial_clone is non-NULL with the name of the promisor
> remote if the repository is a partial clone, and NULL otherwise).
> 
> [1] 
> https://public-inbox.org/git/20181025185459.206127-1-jonathanta...@google.com/
> 


[PATCH v1] speed up refresh_index() by utilizing preload_index()

2018-10-29 Thread Ben Peart
From: Ben Peart 

Speed up refresh_index() by utilizing preload_index() to do most of the work
spread across multiple threads.  This works because most cache entries will
get marked CE_UPTODATE so that refresh_cache_ent() can bail out early when
called from within refresh_index().

On a Windows repo with ~200K files, this drops refresh times from 6.64
seconds to 2.87 seconds for a savings of 57%.

Signed-off-by: Ben Peart 
---

Notes:
Base Ref: master
Web-Diff: https://github.com/benpeart/git/commit/feee1054c2
Checkout: git fetch https://github.com/benpeart/git refresh-index-multithread-preload-v1 && git checkout feee1054c2

 cache.h         | 3 +++
 preload-index.c | 8 ++++----
 read-cache.c    | 6 ++++++
 3 files changed, 13 insertions(+), 4 deletions(-)

diff --git a/cache.h b/cache.h
index f7fabdde8f..883099db08 100644
--- a/cache.h
+++ b/cache.h
@@ -659,6 +659,9 @@ extern int daemonize(void);
 /* Initialize and use the cache information */
 struct lock_file;
 extern int read_index(struct index_state *);
+extern void preload_index(struct index_state *index,
+ const struct pathspec *pathspec,
+ unsigned int refresh_flags);
 extern int read_index_preload(struct index_state *,
  const struct pathspec *pathspec,
  unsigned int refresh_flags);
diff --git a/preload-index.c b/preload-index.c
index 9e7152ab14..222792ccbc 100644
--- a/preload-index.c
+++ b/preload-index.c
@@ -9,7 +9,7 @@
 #include "progress.h"
 
 #ifdef NO_PTHREADS
-static void preload_index(struct index_state *index,
+void preload_index(struct index_state *index,
  const struct pathspec *pathspec,
  unsigned int refresh_flags)
 {
@@ -100,9 +100,9 @@ static void *preload_thread(void *_data)
return NULL;
 }
 
-static void preload_index(struct index_state *index,
- const struct pathspec *pathspec,
- unsigned int refresh_flags)
+void preload_index(struct index_state *index,
+  const struct pathspec *pathspec,
+  unsigned int refresh_flags)
 {
int threads, i, work, offset;
struct thread_data data[MAX_PARALLEL];
diff --git a/read-cache.c b/read-cache.c
index d57958233e..53733d651d 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -1496,6 +1496,12 @@ int refresh_index(struct index_state *istate, unsigned int flags,
typechange_fmt = (in_porcelain ? "T\t%s\n" : "%s needs update\n");
added_fmt = (in_porcelain ? "A\t%s\n" : "%s needs update\n");
unmerged_fmt = (in_porcelain ? "U\t%s\n" : "%s: needs merge\n");
+   /*
+* Use the multi-threaded preload_index() to refresh most of the
+* cache entries quickly then in the single threaded loop below,
+* we only have to do the special cases that are left.
+*/
+   preload_index(istate, pathspec, 0);
for (i = 0; i < istate->cache_nr; i++) {
struct cache_entry *ce, *new_entry;
int cache_errno = 0;

base-commit: c670b1f876521c9f7cd40184bf7ed05aad843433
-- 
2.9.2.gvfs.2.27918.g0990287eef
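In miniature, the idea of the patch above looks like this (a Python analogue with hypothetical names; the real work-splitting lives in the C code of preload-index.c):

```python
from concurrent.futures import ThreadPoolExecutor

CE_UPTODATE = 1 << 2  # stand-in for git's cache-entry flag

def preload_index(entries, stat_fs, threads=4):
    """Stat all entries in parallel; mark unchanged ones CE_UPTODATE."""
    def check(entry):
        if stat_fs(entry["name"]) == entry["stat"]:
            entry["flags"] |= CE_UPTODATE
    with ThreadPoolExecutor(max_workers=threads) as pool:
        list(pool.map(check, entries))

def refresh_index(entries, stat_fs):
    """Single-threaded refresh; bails out early on CE_UPTODATE entries."""
    slow_path = 0
    for entry in entries:
        if entry["flags"] & CE_UPTODATE:
            continue              # refresh_cache_ent() bails out here
        slow_path += 1            # the expensive per-entry work
        if stat_fs(entry["name"]) == entry["stat"]:
            entry["flags"] |= CE_UPTODATE
    return slow_path

disk = {"a.c": 100, "b.c": 200, "c.c": 999}        # mtimes on disk
index = [{"name": n, "stat": s, "flags": 0}
         for n, s in [("a.c", 100), ("b.c", 200), ("c.c", 300)]]

preload_index(index, disk.get)                 # threaded, spreads the stats
assert refresh_index(index, disk.get) == 1     # only the changed file is slow
```

Most entries come out of the threaded preload already marked up to date, so the serial loop only pays for the special cases.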



Re: [RFC] Generation Number v2

2018-10-29 Thread Derrick Stolee

Here is a re-formatted version of the tables I introduced earlier.
The tables were too wide for public-inbox to render correctly (when
paired with my email client). Hopefully this bulleted-list format
works better. Thanks, Stefan, for pointing out the rendering
problems!

### Test 1: `git log --topo-order -N`

This test focuses on the number of commits that are parsed during
a `git log --topo-order` before writing `N` commits to output.

You can reproduce this test using `topo-order-tests.sh` and
see all the data in `topo-order-summary.txt`. The values reported
here are a sampling of the data, ignoring tests where all values
were the same or extremely close in value.

android-base, N = 100
    V0: 5487
    V1: 8534
    V2: 6937
    V3: 6419
    V4: 6453
android-base, N = 1000
    V0: 36029
    V1: 44030
    V2: 41493
    V3: 41206
    V4: 45431
chromium, N = 100
    V0:    101
    V1: 424406
    V2:    101
    V3:    101
    V4:    101
gerrit, N = 100
    V0: 8212
    V1: 8533
    V2:  164
    V3:  159
    V4:  162
gerrit, N = 1000
    V0: 8512
    V1: 8533
    V2: 1990
    V3: 1973
    V4: 3766
Linux, N = 100
    V0: 12458
    V1: 12444
    V2: 13683
    V3: 13123
    V4: 13124
Linux, N = 1000
    V0: 24436
    V1: 26247
    V2: 27878
    V3: 26430
    V4: 27875
Linux, N = 1
    V0: 30364
    V1: 28891
    V2: 27878
    V3: 26430
    V4: 27875
electron, N = 1000
    V0: 19927
    V1: 18733
    V2:  1072
    V3: 18214
    V4: 18214
Ffmpeg, N = 1
    V0: 32154
    V1: 47429
    V2: 10435
    V3: 11054
    V4: 11054
jgit, N = 1000
    V0: 1550
    V1: 6264
    V2: 1067
    V3: 1060
    V4: 1233
julia, N = 1
    V0: 43043
    V1: 43043
    V2: 10201
    V3: 23505
    V4: 23828
odoo, N = 1000
    V0: 17175
    V1:  9714
    V2:  4043
    V3:  4046
    V4:  4111
php-src, N = 1000
    V0: 19014
    V1: 27530
    V2:  1311
    V3:  1305
    V4:  1320
rails, N = 100
    V0: 1420
    V1: 2041
    V2: 1757
    V3: 1428
    V4: 1441
rails, N = 1000
    V0:  7952
    V1: 10145
    V2: 10053
    V3:  8373
    V4:  8373
swift, N = 1000
    V0: 1914
    V1: 4004
    V2: 2071
    V3: 1939
    V4: 1940
tensorflow, N = 1000
    V0: 10019
    V1: 39221
    V2:  6711
    V3: 10051
    V4: 10051
TypeScript, N = 1000
    V0:  2873
    V1: 12014
    V2:  3049
    V3:  2876
    V4:  2876


### Test 2: `git log --topo-order -10 A..B`

This test focuses on the number of commits that are parsed during
a `git log --topo-order A..B` before writing ten commits to output.
Since we fix a very small set of output commits, we care more about
the part of the walk that determines which commits are reachable
from `B` but not reachable from `A`. This part of the walk uses
commit date as a heuristic in the existing implementation.

You can reproduce this test using `topo-compare-tests.sh` and
see all the data in `topo-compare-summary.txt`. The values reported
here are a sampling of the data, ignoring tests where all values
were the same or extremely close in value.

_Note:_ For some of the rows, the `A` and `B` values may be
swapped from what is expected. This is due to (1) a bug in the
reference implementation that doesn't short-circuit the walk
when `A` can reach `B`, and (2) data-entry errors by the author.
The bug can be fixed, but would have introduced strange-looking
rows in this table.

android-base 53c1972bc8f 92f18ac3e39
  OLD: 39403
   V0:  1544
   V1:  6957
   V2:    26
   V3:  1015
   V4:  1098
gerrit c4311f7642 777a8cd1e0
  OLD:  6457
   V0:  7836
   V1: 10869
   V2:   415
   V3:   414
   V4:   445
electron 7da7dd85e addf069f2
  OLD: 18164
   V0:   945
   V1:  6528
   V2:    17
   V3:    17
   V4:    18
julia 7faee1b201 e2022b9f0f
  OLD: 22800
   V0:  4221
   V1: 12710
   V2:   377
   V3:   213
   V4:   213
julia ae69259cd9 c8b5402afc
  OLD:  1864
   V0:  1859
   V1: 13287
   V2:    12
   V3:  1859
   V4:  1859
Linux 69973b830859 c470abd4fde4
  OLD: 111692
   V0:  77263
   V1:  96598
   V2:  80238
   V3:  76332
   V4:  76495
Linux c3b92c878736 19f949f52599
  OLD: 167418
   V0:   5736
   V1:   4684
   V2:   9675
   V3:   3887
   V4:   3923
Linux c8d2bc9bc39e 69973b830859
  OLD: 44940
   V0:  4056
   V1: 16636
   V2: 10405
   V3:  3475
   V4:  4022
odoo 4a31f55d0a0 93fb2b4a616
  OLD: 25139
   V0: 19528
   V1: 20418
   V2: 19874
   V3: 19634
   V4: 27247
swift 4046359efd b34b6a14c7
  OLD: 13411
   V0:   662
   V1:   321
   V2:    12
   V3:    80
   V4:   134
tensorflow ec6d17219c fa1db5eb0d
  OLD: 10373
   V0:  4762
   V1: 36272
   V2:   174
   V3:  3631
   V4:  3632
TypeScript 35ea2bea76 123edced90
  OLD:  3450
   V0:   267
   V1: 10386
   V2:    27
   V3:   259
   V4:   259


### Test 3: `git merge-base A B`

This test focuses on the number of commits that are parsed during
a `git merge-base A B`. This part of the walk uses commit date as
a heuristic in the existing implementation.

You can reproduce this test using `merge-base-tests.sh` and
see all the data in `merge-base-summary.txt`. The values reported
here are a 

Re: [PATCH v2 06/10] preload-index.c: remove #ifdef NO_PTHREADS

2018-10-29 Thread Jeff King
On Mon, Oct 29, 2018 at 02:05:58PM -0400, Ben Peart wrote:

> On 10/29/2018 1:26 PM, Duy Nguyen wrote:
> > On Mon, Oct 29, 2018 at 6:21 PM Ben Peart  wrote:
> > > @@ -114,6 +104,9 @@ static void preload_index(struct index_state *index,
> > >   threads = index->cache_nr / THREAD_COST;
> > >   if ((index->cache_nr > 1) && (threads < 2) &&
> > > git_env_bool("GIT_TEST_PRELOAD_INDEX", 0))
> > >   threads = 2;
> > > +   cpus = online_cpus();
> > > +   if (threads > cpus)
> > > +   threads = cpus;
> > >   if (threads < 2)
> > >   return;
> > >   trace_performance_enter();
> > 
> > Capping the number of threads to online_cpus() does not always make
> > sense. In this case (or at least the original use case where we stat()
> > over nfs) we want to issue as many requests to nfs server as possible
> > to reduce latency and it has nothing to do with how many cores we
> > have. Using more threads than cores might make sense since threads are
> > blocked by nfs client, but we still want to send more to the server.
> > 
> 
> That makes sense.  Some test will be necessary then.
> 
> I guess HAVE_THREADS is functionally the same as online_cpus() == 1. For
> some reason, I prefer the latter - probably because I'm typically already
> calling it and so it doesn't feel like a 'special' value/test I have to
> remember. I just know I need to make sure the logic works correctly with any
> number of cpus greater than zero. :-)

I don't think they're functionally the same. If you see online_cpus()
== 1, you are in one of two cases:

  - the system supports threads, but CPU-bound tasks should stick to a
single thread

  - the system does not even support threading, and calling
pthread_create() will give you ENOSYS

In the first case, if your task is _not_ CPU bound (i.e., the stat()
thing), then you want to distinguish those cases.
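One way to encode that distinction (a sketch with made-up names; git's actual helpers are online_cpus() and the HAVE_THREADS/NO_PTHREADS machinery):

```python
def choose_threads(work_items, cpus, have_threads, cpu_bound,
                   per_thread_cost=2000):
    """Pick a thread count for a task.

    A CPU-bound task gains nothing past the core count, but an
    I/O-bound task (e.g. stat() over NFS) can profitably run many
    more threads than cores, as long as threading works at all.
    """
    if not have_threads:          # pthread_create() would give ENOSYS
        return 1
    threads = work_items // per_thread_cost
    if cpu_bound:
        threads = min(threads, cpus)
    return max(threads, 1)

# Threads unsupported: always single-threaded.
assert choose_threads(200_000, cpus=8, have_threads=False, cpu_bound=True) == 1
# CPU-bound on a single-core box: capped at one thread.
assert choose_threads(200_000, cpus=1, have_threads=True, cpu_bound=True) == 1
# I/O-bound on the same box: still worth fanning out.
assert choose_threads(200_000, cpus=1, have_threads=True, cpu_bound=False) == 100
```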

I'm not sure if I'm violently agreeing, but it just wasn't clear to me
from the above discussion that we all agreed that HAVE_THREADS is still
necessary in some cases.

(I do think in cases where it is not, that we would do just as well to
avoid it; if that is all you were saying, then I agree. ;) ).

-Peff


Re: [RFC] Generation Number v2

2018-10-29 Thread Derrick Stolee

On 10/29/2018 3:22 PM, Stefan Beller wrote:

> > Based on the performance results alone, we should remove minimum
> > generation numbers, (epoch, date) pairs, and FELINE index from
> > consideration. There are enough examples of these indexes performing
> > poorly.
> >
> > In contrast, maximum generation numbers and corrected commit
> > dates both performed quite well. They are frequently the top
> > two performing indexes, and rarely significantly different.
> >
> > The trade-off here now seems to be: which _property_ is more important,
> > locally-computable or backwards-compatible?
> >
> > * Maximum generation number is backwards-compatible but not
> > locally-computable or immutable.
>
> These maximum generation numbers sound like the reverse of
> the generation numbers as they are currently implemented, i.e.
> we count all commits between the commit A and all heads.
> How would this impact creation of a commit?


The indexes listed here would be computed at the same time as a 
commit-graph (and only change on commit-graph writes).


This idea of having something depend on the "global state" isn't ideal 
(in my opinion) but it certainly works for the tests specified below.



> The current generation numbers can be lazily updated or not
> updated at all. In my understanding of the maximum generation
> numbers, a new commit would make these maximum generation
> numbers invalid (i.e. they have to be recomputed).


New commits would change the value you would get by the definition (on
rewrite of the commit-graph) but don't violate the reachability index
property (maxgen(A) < maxgen(B) => A can't reach B). So that doesn't
affect their usefulness.


> Are there ways out by deferring the computation of maximum
> generation numbers while still being able to work with some
> commits that are un-accounted for?


Creating a commit that doesn't exist in the commit-graph file results in 
a commit with generation number INFINITY. The new commits can't be 
reached by commits with finite generation number.
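Sketched out (with illustrative names; in git the sentinel is GENERATION_NUMBER_INFINITY, 0xFFFFFFFF):

```python
GENERATION_INFINITY = 0xFFFFFFFF  # commits not yet in the commit-graph

def generation(commit_graph, oid):
    """Look up a commit's generation, treating absent commits as infinite."""
    return commit_graph.get(oid, GENERATION_INFINITY)

def cannot_reach(commit_graph, a, b):
    """Reachability-index cut-off: if gen(A) < gen(B), A cannot reach B."""
    return generation(commit_graph, a) < generation(commit_graph, b)

graph = {"root": 1, "mid": 2, "tip": 3}   # an already-written commit-graph

# A brand-new commit is "infinite", so no graphed commit can reach it:
assert cannot_reach(graph, "tip", "new-commit")
# ...but the new commit itself might reach anything, so no cut-off applies:
assert not cannot_reach(graph, "new-commit", "tip")
```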



> When recomputing these numbers, the current generation number
> (V0) has the property that already existing numbers can be re-used
> as-is. We only need to compute the numbers for new commits,
> and then insert this to the appropriate data structures (which is
> a separate problem, one could imagine a split generation
> numbers file like the split index)
>
> For the V2 maximum generation numbers, would we need to
> rewrite the numbers for all commits once we recompute them?
> Assuming that is true, it sounds like the benchmark doesn't
> cover the whole costs associated with V2, which is why the
> exceptional performance can be explained.

You're right, we need to recompute the entire list of maximum generation
number values. However, you are a bit hung up on "maxgen" being a fixed 
function that is correct at all times. We could compute the "maxgen" for 
the set of commits that are written to the "split" commit-graph while 
leaving the base the same. There is one tricky part here: we need to 
initialize the values starting at "num commits in the combined 
commit-graph" (commits in the base and split portion). The minimum 
possible value in the split portion will be larger than the maximum 
possible value in the base portion. Since the (future) implementation of 
split commit-graphs would have all commit-parent edges pointing down 
into the base and not from the base into the split, this will continue 
to satisfy our reachability index property.


As for the cost: we need to do a topo-order walk and then a quick 
minimum-value check across all parents. This is O(N), and the constant 
is small. This may be a cost worth paying.



> (Note: The table as read in
> https://public-inbox.org/git/6367e30a-1b3a-4fe9-611b-d931f51ef...@gmail.com/
> seems line broken, using gmails web interface is not good
> for ASCII art and patches, git-send-email would fare better)


Sorry for that. I copy-pasted into Thunderbird. Hopefully most users can 
redirect to the GitHub-rendered markdown if they have trouble.





> > * Corrected commit-date is locally-computable and immutable,
> > but not backwards-compatible.
>
> How are these dates not backwards-compatible?
> We don't have to expose these dates to the user, but
> - just like generation numbers - could store them and use them
> but not tell the user about it.
>
> We would need to be really careful to not expose them at all
> as they look like the real dates, so that could make for an
> awkward bug report.


By "backwards-compatible" I mean that we could store them in the 
commit-graph file without breaking existing clients (or by introducing 
extra data).


We could easily add these corrected commit dates as an optional chunk, 
but that adds 8 bytes per commit, and we already use 8 bytes for the 
(generation, date) pairs.


The solution I made in my prototype is to replace the generation number 
with an offset for how much to add to the commit-date to get the 
corrected commit-date. However, an old client would interpret these 
offsets as generations and have incorrect output.
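A sketch of that corrected-date computation and the offset encoding, where cdate(C) = max(date(C), 1 + max of the parents' cdates) and the stored value is cdate - date (illustrative only, not the commit-graph code):

```python
def corrected_dates(commits, parents, date):
    """Compute corrected commit dates over commits given in topological
    order (parents before children)."""
    cdate = {}
    for c in commits:
        cdate[c] = max(date[c],
                       max((cdate[p] + 1 for p in parents[c]), default=0))
    return cdate

# A tiny history where a clock-skewed commit B predates its parent A.
parents = {"A": [], "B": ["A"], "C": ["B"]}
date = {"A": 1000, "B": 500, "C": 2000}  # B's clock was wrong

cdate = corrected_dates(["A", "B", "C"], parents, date)
assert cdate == {"A": 1000, "B": 1001, "C": 2000}

# The on-disk encoding stores only a per-commit offset:
offsets = {c: cdate[c] - date[c] for c in cdate}
assert offsets == {"A": 0, "B": 501, "C": 0}

# Reachability property: every parent has a strictly smaller cdate.
assert all(cdate[p] < cdate[c] for c in parents for p in parents[c])
```

An old client reading those offsets as generation numbers would see nonsense, which is the compatibility problem described above.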


A 

Re: [RFC PATCH] index-pack: improve performance on NFS

2018-10-29 Thread Ævar Arnfjörð Bjarmason


On Mon, Oct 29 2018, Jeff King wrote:

> On Sat, Oct 27, 2018 at 01:22:16PM +0200, Ævar Arnfjörð Bjarmason wrote:
>
>> > Taking one step back, the root problem in this thread is that stat() on
>> > non-existing files is slow (which makes has_sha1_file slow).
>> >
>> > One solution there is to cache the results of looking in .git/objects
>> > (or any alternate object store) for loose files. And indeed, this whole
>> > scheme is just a specialized form of that: it's a flag to say "hey, we
>> > do not have any objects yet, so do not bother looking".
>> >
>> > Could we implement that in a more direct and central way? And could we
>> > implement it in a way that catches more cases? E.g., if I have _one_
>> > object, that defeats this specialized optimization, but it is probably
>> > still beneficial to cache that knowledge (and the reasonable cutoff is
>> > probably not 1, but some value of N loose objects).
>> [...]
>>
>> The assumption with making it exactly 0 objects and not any value of >0
>> is that we can safely assume that a "clone" or initial "fetch"[1] is
>> special in ways that a clone isn't. I.e. we're starting out with nothing
>> and doing the initial population, that's probably not as true in an
>> existing repo that's getting concurrent fetches, commits, rebases etc.
>
> I assume you mean s/that a clone isn't/that a fetch isn't/.

Yes, sorry.

> I agree there are cases where you might be able to go further if you
> assume a full "0". But my point is that "clone" is an ambiguous concept,
> and it doesn't map completely to what's actually slow here. So if you
> only look at "are we cloning", then:
>
>   - you have a bunch of cases which are "clones", but aren't actually
> starting from scratch
>
>   - you get zero benefit in the non-clone cases, when we could be
> scaling the benefit smoothly

Indeed. It's not special in principle, but I think in practice the
biggest wins are in the clone case, and unlike "fetch" we can safely
assume we're free of race conditions. More on that below.

>> But in the spirit of taking a step back, maybe we should take two steps
>> back and consider why we're doing this at all.
>
> OK, I think it's worth discussing, and I'll do that below. But first I
> want to say...
>
>> Three of our tests fail if we compile git like this, and cloning is much
>> faster (especially on NFS):
>>
>> diff --git a/builtin/index-pack.c b/builtin/index-pack.c
>> index 2004e25da2..0c2d008ee0 100644
>> --- a/builtin/index-pack.c
>> +++ b/builtin/index-pack.c
>> @@ -796,3 +796,3 @@ static void sha1_object(const void *data, struct object_entry *obj_entry,
>>
>> -   if (startup_info->have_repository) {
>> +   if (0) {
>> read_lock();
>>
>> Even on a local disk I'm doing 262759 lstat() calls cloning git.git and
>> spending 5% of my time on that.
>
> With the caching patch I posted earlier, I see roughly the same speedup
> on an index-pack of git.git as I do with disabling the collision check
> entirely (I did see about a 1% difference in favor of what you wrote
> above, which was within the noise, but may well be valid due to slightly
> reduced lock contention).
>
> TBH I'm not sure if any of this is actually worth caring about on a
> normal Linux system, though. There stat() is fast. It might be much more
> interesting on macOS or Windows, or on a Linux system on NFS.

It matters a *lot* on NFS as my performance numbers in
https://public-inbox.org/git/87d0rslhkl@evledraar.gmail.com/ show.

>> But why do we have this in the first place? It's because of 8685da4256
>> ("don't ever allow SHA1 collisions to exist by fetching a pack",
>> 2007-03-20) and your 51054177b3 ("index-pack: detect local corruption in
>> collision check", 2017-04-01).
>>
>> I.e. we are worried about (and those tests check for):
>>
>>  a) A malicious user trying to give us repository where they have
>> created an object with the same SHA-1 that's different, as in the
>> SHAttered attack.
>>
>> I remember (but have not dug up) an old E-Mail from Linus saying
>> that this was an important security aspect of git, i.e. even if
>> SHA-1 was broken you couldn't easily propagate bad objects.
>
> Yeah, especially given recent advances in SHA-1 attacks, I'm not super
> comfortable with the idea of disabling the duplicate-object check at
> this point.

I'd be comfortable with it in my setup since it's been limited to
collision attacks that are computationally prohibitive, and there being
no sign of preimage attacks, which is the case we really need to worry
about.

>>  b) Cases where we've ended up with different content for a SHA-1 due to
>> e.g. a local FS corruption. Which is the subject of your commit in
>> 2017.
>
> Sort of. We actually detected it before my patch, but we just gave a
> really crappy error message. ;)
>
>>  c) Are there cases where fetch.fsckObjects is off and we just flip a
>> bit on the wire and don't notice? I 

Re: [RFC] Generation Number v2

2018-10-29 Thread Stefan Beller
> Based on the performance results alone, we should remove minimum
> generation numbers, (epoch, date) pairs, and FELINE index from
> consideration. There are enough examples of these indexes performing
> poorly.
>
> In contrast, maximum generation numbers and corrected commit
> dates both performed quite well. They are frequently the top
> two performing indexes, and rarely significantly different.
>
> The trade-off here now seems to be: which _property_ is more important,
> locally-computable or backwards-compatible?
>
> * Maximum generation number is backwards-compatible but not
>locally-computable or immutable.

These maximum generation numbers sound like the reverse of
the generation numbers as they are currently implemented, i.e.
we count all commits between the commit A and all heads.
How would this impact creation of a commit?

The current generation numbers can be lazily updated or not
updated at all. In my understanding of the maximum generation
numbers, a new commit would make these maximum generation
numbers invalid (i.e. they have to be recomputed).

Are there ways out by deferring the computation of maximum
generation numbers while still being able to work with some
commits that are un-accounted for?

When recomputing these numbers, the current generation number
(V0) has the property that already existing numbers can be re-used
as-is. We only need to compute the numbers for new commits,
and then insert this to the appropriate data structures (which is
a separate problem, one could imagine a split generation
numbers file like the split index)

For the V2 maximum generation numbers, would we need to
rewrite the numbers for all commits once we recompute them?
Assuming that is true, it sounds like the benchmark doesn't
cover the whole costs associated with V2, which is why the
exceptional performance can be explained.

(Note: The table as read in
https://public-inbox.org/git/6367e30a-1b3a-4fe9-611b-d931f51ef...@gmail.com/
seems line broken, using gmails web interface is not good
for ASCII art and patches, git-send-email would fare better)

>
> * Corrected commit-date is locally-computable and immutable,
>but not backwards-compatible.

How are these dates not backwards-compatible?
We don't have to expose these dates to the user, but
- just like generation numbers - could store them and use them
but not tell the user about it.

We would need to be really careful to not expose them at all
as they look like the real dates, so that could make for an
awkward bug report.

The approach of "corrected commit date" sounds like we could
have a very lazy approach, i.e. no extra data structures needed
for many commits (as the corrected date equals the real date)
and only need to store the corrections for some commits.
Such an approach however would not make it easy to know
if we operate on corrected dates, or if we even checked them
for correctness before.

So if we'd have an additional row in the generation numbers file
telling the corrected date, then we should be able to be backwards
compatible?


Re: [RFC PATCH] index-pack: improve performance on NFS

2018-10-29 Thread Ævar Arnfjörð Bjarmason


On Mon, Oct 29 2018, Jeff King wrote:

> On Mon, Oct 29, 2018 at 09:48:02AM +0900, Junio C Hamano wrote:
>
>> > Of course any cache raises questions of cache invalidation, but I think
>> > we've already dealt with that for this case. When we use
>> > OBJECT_INFO_QUICK, that is a sign that we want to make this kind of
>> > accuracy/speed tradeoff (which does a similar caching thing with
>> > packfiles).
>> >
>> > So putting that all together, could we have something like:
>>
>> I think this conceptually is a vast improvement relative to
>> ".cloning" optimization.  Obviously this does not have the huge
>> downside of the other approach that turns the collision detection
>> completely off.
>>
>> A real question is how much performance gain, relative to ".cloning"
>> thing, this approach gives us.  If it gives us 80% or more of the
>> gain compared to doing no checking, I'd say we have a clear winner.
>
> My test runs showed it improving index-pack by about 3%, versus 4% for
> no collision checking at all. But there was easily 1% of noise. And much
> more importantly, that was on a Linux system on ext4, where stat is
> fast. I'd be much more curious to hear timing results from people on
> macOS or Windows, or from Geert's original NFS case.

At work we make copious use of NetApp over NFS for filers. I'd say this
is probably typical for enterprise environments. Raw I/O performance
over the wire (writing a large file) is really good, but metadata
(e.g. stat) performance tends to be atrocious.

We both host the in-house Git server (GitLab) on such a filer (for HA
etc.), as well as many types of clients.

As noted by Geert upthread you need to mount the git directories with
lookupcache=positive (see e.g. [1]).

Cloning git.git as --bare onto such a partition with my patch:

% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 60.98    1.802091          19     93896     19813 futex
 14.64    0.432782           7     61415        16 read
  9.40    0.277804           1    199576           pread64
  4.88    0.144172           3     49355        11 write
  3.10    0.091498          31      2919      2880 stat
  2.53    0.074812          31      2431       737 lstat
  1.96    0.057934           3     17257      1276 recvfrom
  0.91    0.026815           3      8543           select
  0.62    0.018425           2      8543           poll
[...]
real    0m32.053s
user    0m21.451s
sys     0m7.806s

Without:

% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 71.01   31.653787          50    628265     21608 futex
 24.14   10.761950          41    260658    258964 lstat
  2.22    0.988001           5    199576           pread64
  1.32    0.587844          10     59662         3 read
  0.79    0.350625           7     50376        11 write
  0.22    0.096019          33      2919      2880 stat
  0.13    0.057950           4     15821        12 recvfrom
  0.05    0.022385           3      7949           select
  0.04    0.015988           2      7949           poll
  0.03    0.013622        3406         4           wait4
[...]
real    4m38.670s
user    0m29.015s
sys     0m33.894s

So a reduction in clone time by ~90%.

Performance would be basically the same with your patch. But let's
discuss that elsewhere in this thread. Just wanted to post the
performance numbers here.

1. 
https://gitlab.com/gitlab-com/gl-infra/infrastructure/issues/109#note_12528896
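The caching idea referenced upthread (used with OBJECT_INFO_QUICK) can be sketched as follows: rather than one stat() per loose-object collision check — the operation that is so costly on NFS — list each objects/XX fan-out directory once and answer later lookups from memory. This is a hedged Python sketch; the class and method names are illustrative, not Git's actual API:

```python
import os
import tempfile

class LooseObjectCache:
    """Negative/positive cache over a loose-object directory layout."""
    def __init__(self, objdir):
        self.objdir = objdir
        self.subdirs = {}                     # "XX" -> set of filenames

    def contains(self, oid):
        sub, rest = oid[:2], oid[2:]
        if sub not in self.subdirs:           # one readdir, then cached
            try:
                self.subdirs[sub] = set(
                    os.listdir(os.path.join(self.objdir, sub)))
            except FileNotFoundError:
                self.subdirs[sub] = set()     # cached miss for the whole fan-out
        return rest in self.subdirs[sub]

# demo: one loose object "ab/ccc...c" in a throwaway object store
tmp = tempfile.mkdtemp()
os.makedirs(os.path.join(tmp, "ab"))
open(os.path.join(tmp, "ab", "c" * 38), "w").close()
cache = LooseObjectCache(tmp)
assert cache.contains("ab" + "c" * 38)
assert not cache.contains("ff" + "0" * 38)
```

The trade-off is the usual one for OBJECT_INFO_QUICK: objects created after the directory listing was taken may be missed, which is accepted as a speed/accuracy compromise.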


Re: [PATCH v2 06/10] preload-index.c: remove #ifdef NO_PTHREADS

2018-10-29 Thread Ben Peart




On 10/29/2018 1:26 PM, Duy Nguyen wrote:

On Mon, Oct 29, 2018 at 6:21 PM Ben Peart  wrote:

@@ -114,6 +104,9 @@ static void preload_index(struct index_state *index,
  threads = index->cache_nr / THREAD_COST;
  if ((index->cache_nr > 1) && (threads < 2) &&
git_env_bool("GIT_TEST_PRELOAD_INDEX", 0))
  threads = 2;
+   cpus = online_cpus();
+   if (threads > cpus)
+   threads = cpus;
  if (threads < 2)
  return;
  trace_performance_enter();


Capping the number of threads to online_cpus() does not always make
sense. In this case (or at least the original use case where we stat()
over nfs) we want to issue as many requests to nfs server as possible
to reduce latency and it has nothing to do with how many cores we
have. Using more threads than cores might make sense since threads are
blocked by nfs client, but we still want to send more to the server.



That makes sense.  Some test will be necessary then.

I guess HAVE_THREADS is functionally the same as online_cpus() == 1. 
For some reason, I prefer the latter - probably because I'm typically 
already calling it and so it doesn't feel like a 'special' value/test I 
have to remember. I just know I need to make sure the logic works 
correctly with any number of cpus greater than zero. :-)
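The thread-count logic being debated can be condensed into a sketch: derive a count from the number of index entries, then optionally cap it at online_cpus(). As Duy notes, the cap is wrong for stat() over NFS, where more threads than cores still reduce latency. THREAD_COST and the function name are illustrative:

```python
THREAD_COST = 500   # illustrative; Git uses a similar per-entry cost constant

def preload_threads(cache_nr, cpus, cap_to_cpus):
    threads = cache_nr // THREAD_COST
    if cap_to_cpus:
        threads = min(threads, cpus)   # the contested capping step
    return threads if threads >= 2 else 0   # 0 = not worth threading

assert preload_threads(10000, 4, cap_to_cpus=True) == 4
assert preload_threads(10000, 4, cap_to_cpus=False) == 20  # useful on NFS
assert preload_threads(600, 4, cap_to_cpus=True) == 0      # too little work
```

Whether cap_to_cpus should be applied is exactly what needs measuring on NFS-backed worktrees.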


Re: [PATCH 09/10] read-cache.c: remove #ifdef NO_PTHREADS

2018-10-29 Thread Ben Peart




On 10/29/2018 1:21 PM, Duy Nguyen wrote:

On Mon, Oct 29, 2018 at 6:05 PM Ben Peart  wrote:

@@ -2756,8 +2745,11 @@ static int do_write_index(struct index_state *istate, 
struct tempfile *tempfile,
   if (ce_write(&c, newfd, &hdr, sizeof(hdr)) < 0)
   return -1;

-#ifndef NO_PTHREADS
- nr_threads = git_config_get_index_threads();
+ if (HAVE_THREADS)


This shouldn't be needed either.  My assumption was that if someone
explicitly asked for multiple threads we're better off failing than
silently ignoring their request.  The default/automatic setting will
detect the number of cpus and behave correctly.


No. I can have ~/.gitconfig shared between different machines. One
with multithread support and one without. I should not have to fall
back to conditional includes in order to use the same config file on
the machine without multithread.



If you want to share the ~/.gitconfig across machines that have 
different numbers of CPUs, it makes more sense to me to leave the 
setting at the "auto" setting rather than specifically overriding it to 
a number that won't work on at least one of your machines.  Then it all 
"just works" without the need to conditional includes.


[PATCH] completion: use builtin completion for format-patch

2018-10-29 Thread Denton Liu
Signed-off-by: Denton Liu 
---
 contrib/completion/git-completion.bash | 11 +--
 1 file changed, 1 insertion(+), 10 deletions(-)

diff --git a/contrib/completion/git-completion.bash 
b/contrib/completion/git-completion.bash
index d63d2dffd..da77da481 100644
--- a/contrib/completion/git-completion.bash
+++ b/contrib/completion/git-completion.bash
@@ -1531,15 +1531,6 @@ _git_fetch ()
__git_complete_remote_or_refspec
 }
 
-__git_format_patch_options="
-   --stdout --attach --no-attach --thread --thread= --no-thread
-   --numbered --start-number --numbered-files --keep-subject --signoff
-   --signature --no-signature --in-reply-to= --cc= --full-index --binary
-   --not --all --cover-letter --no-prefix --src-prefix= --dst-prefix=
-   --inline --suffix= --ignore-if-in-upstream --subject-prefix=
-   --output-directory --reroll-count --to= --quiet --notes
-"
-
 _git_format_patch ()
 {
case "$cur" in
@@ -1550,7 +1541,7 @@ _git_format_patch ()
return
;;
--*)
-   __gitcomp "$__git_format_patch_options"
+   __gitcomp_builtin format-patch
return
;;
esac
-- 
2.19.1



Re: [PATCH 12/12] fsck: mark strings for translation

2018-10-29 Thread Ævar Arnfjörð Bjarmason


On Mon, Oct 29 2018, Ævar Arnfjörð Bjarmason wrote:

> On Mon, Oct 29 2018, Duy Nguyen wrote:
>
>> On Mon, Oct 29, 2018 at 3:09 PM Junio C Hamano  wrote:
>>>
>>> SZEDER Gábor  writes:
>>>
>>> >> -fprintf(stderr, "%s in %s %s: %s\n",
>>> >> -msg_type, printable_type(obj), describe_object(obj), err);
>>> >> +fprintf_ln(stderr, _("%s in %s %s: %s"),
>>> >
>>> > Are the (f)printf() -> (f)printf_ln() changes all over
>>> > 'builtin/fsck.c' really necessary to mark strings for translation?
>>>
>>> It is beyond absolute minimum but I saw it argued here that this
>>> makes it easier to manage the .po and .pot files if your message
>>> strings do not end with LF, as you are much less likely to _add_
>>> unneeded LF to the translated string than _lose_ LF at the end of
>>> translated string.
>>
>> Especially when \n plays an important role and we don't trust
>> translators to keep it [1] [2]. It's probably a too defensive stance
>> and often does not apply, so nowadays I do it just to keep a
>> consistent pattern in the code.
>>
>> [1] https://public-inbox.org/git/20120308220131.GA10122@burratino/#t
>> [2] but then translators can crash programs anyway (e.g. changing %d
>> to %s...) we just trust gettext tools to catch problems early.
>
> As Jonathan pointed out in the follow-up message[1] this sort of thing
> is checked for in msgfmt, so sure, let's strip the \n, but it's really
> not something we need to worry about. Likewise with translators turning
> "%s" into "%d" (or "% s") or whatever.
>
> $ git diff -U0
> diff --git a/po/de.po b/po/de.po
> index 47986814c9..62de46c63d 100644
> --- a/po/de.po
> +++ b/po/de.po
> @@ -23 +23 @@ msgid "%shint: %.*s%s\n"
> -msgstr "%sHinweis: %.*s%s\n"
> +msgstr "%sHinweis: %.*s%s"
> $ make [...]
> [...]
> po/de.po:23: 'msgid' and 'msgstr' entries do not both end with '\n'
> msgfmt: found 1 fatal error
> 3470 translated messages.
> make: *** [Makefile:2488: po/build/locale/de/LC_MESSAGES/git.mo] Error 1
>
> 1. 
> https://public-inbox.org/git/cacsjy8acuy9fziiehgc7mel4i+xp6u0pmh1rgor-wznhyt1...@mail.gmail.com/

...to add to that, if there *are* formatting errors that --check doesn't
spot let's hear about it so we can just patch gettext upstream:
https://github.com/autotools-mirror/gettext/blob/master/gettext-tools/src/msgl-check.c#L572-L756

Unlike the rest of our stack where we need to support however many years
old tools we can freely rely on bleeding-edge gettext features, since
the only person we need to convince to upgrade is Jiang. I.e. he accepts
the PRs from translators, (presumably) runs msgfmt --check and then
submits the result to Junio.

See the "Usually..." paragraph in my 66f5f6dca9 ("C style: use standard
style for "TRANSLATORS" comments", 2017-05-11) for an example. We
already rely on a fairly recent gettext toolchain.
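The specific rule that msgfmt --check enforces in the example above can be shown in a couple of lines. This is a minimal rendition of just the trailing-newline rule (real msgfmt checks much more, e.g. matching printf format directives):

```python
def check_newline(msgid, msgstr):
    """True if msgid and msgstr agree on a trailing newline."""
    return msgid.endswith("\n") == msgstr.endswith("\n")

# The de.po example from above: dropping the trailing \n is a fatal error.
assert check_newline("%shint: %.*s%s\n", "%sHinweis: %.*s%s\n")
assert not check_newline("%shint: %.*s%s\n", "%sHinweis: %.*s%s")
```

Which is why stripping \n from translatable strings in the C source sidesteps the whole class of errors: there is then nothing for a translator to lose.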


Re: [PATCH 12/12] fsck: mark strings for translation

2018-10-29 Thread Ævar Arnfjörð Bjarmason


On Mon, Oct 29 2018, Duy Nguyen wrote:

> On Mon, Oct 29, 2018 at 3:09 PM Junio C Hamano  wrote:
>>
>> SZEDER Gábor  writes:
>>
>> >> -fprintf(stderr, "%s in %s %s: %s\n",
>> >> -msg_type, printable_type(obj), describe_object(obj), err);
>> >> +fprintf_ln(stderr, _("%s in %s %s: %s"),
>> >
>> > Are the (f)printf() -> (f)printf_ln() changes all over
>> > 'builtin/fsck.c' really necessary to mark strings for translation?
>>
>> It is beyond absolute minimum but I saw it argued here that this
>> makes it easier to manage the .po and .pot files if your message
>> strings do not end with LF, as you are much less likely to _add_
>> unneeded LF to the translated string than _lose_ LF at the end of
>> translated string.
>
> Especially when \n plays an important role and we don't trust
> translators to keep it [1] [2]. It's probably a too defensive stance
> and often does not apply, so nowadays I do it just to keep a
> consistent pattern in the code.
>
> [1] https://public-inbox.org/git/20120308220131.GA10122@burratino/#t
> [2] but then translators can crash programs anyway (e.g. changing %d
> to %s...) we just trust gettext tools to catch problems early.

As Jonathan pointed out in the follow-up message[1] this sort of thing
is checked for in msgfmt, so sure, let's strip the \n, but it's really
not something we need to worry about. Likewise with translators turning
"%s" into "%d" (or "% s") or whatever.

$ git diff -U0
diff --git a/po/de.po b/po/de.po
index 47986814c9..62de46c63d 100644
--- a/po/de.po
+++ b/po/de.po
@@ -23 +23 @@ msgid "%shint: %.*s%s\n"
-msgstr "%sHinweis: %.*s%s\n"
+msgstr "%sHinweis: %.*s%s"
$ make [...]
[...]
po/de.po:23: 'msgid' and 'msgstr' entries do not both end with '\n'
msgfmt: found 1 fatal error
3470 translated messages.
make: *** [Makefile:2488: po/build/locale/de/LC_MESSAGES/git.mo] Error 1

1. 
https://public-inbox.org/git/cacsjy8acuy9fziiehgc7mel4i+xp6u0pmh1rgor-wznhyt1...@mail.gmail.com/


Re: [PATCH v2 06/10] preload-index.c: remove #ifdef NO_PTHREADS

2018-10-29 Thread Duy Nguyen
On Mon, Oct 29, 2018 at 6:21 PM Ben Peart  wrote:
> @@ -114,6 +104,9 @@ static void preload_index(struct index_state *index,
>  threads = index->cache_nr / THREAD_COST;
>  if ((index->cache_nr > 1) && (threads < 2) &&
> git_env_bool("GIT_TEST_PRELOAD_INDEX", 0))
>  threads = 2;
> +   cpus = online_cpus();
> +   if (threads > cpus)
> +   threads = cpus;
>  if (threads < 2)
>  return;
>  trace_performance_enter();

Capping the number of threads to online_cpus() does not always make
sense. In this case (or at least the original use case where we stat()
over nfs) we want to issue as many requests to nfs server as possible
to reduce latency and it has nothing to do with how many cores we
have. Using more threads than cores might make sense since threads are
blocked by nfs client, but we still want to send more to the server.
-- 
Duy


Re: [PATCH v2 08/10] read-cache.c: remove #ifdef NO_PTHREADS

2018-10-29 Thread Ben Peart




On 10/29/2018 10:30 AM, Jeff King wrote:

On Sat, Oct 27, 2018 at 07:30:06PM +0200, Nguyễn Thái Ngọc Duy wrote:


-#ifndef NO_PTHREADS
-   nr_threads = git_config_get_index_threads();
+   if (HAVE_THREADS) {
+   nr_threads = git_config_get_index_threads();
  
-	/* TODO: does creating more threads than cores help? */

-   if (!nr_threads) {
-   nr_threads = istate->cache_nr / THREAD_COST;
-   cpus = online_cpus();
-   if (nr_threads > cpus)
-   nr_threads = cpus;
+   /* TODO: does creating more threads than cores help? */
+   if (!nr_threads) {
+   nr_threads = istate->cache_nr / THREAD_COST;
+   cpus = online_cpus();
+   if (nr_threads > cpus)
+   nr_threads = cpus;
+   }
+   } else {
+   nr_threads = 1;
}


I'd have thought we could just rely on online_cpus() returning 1 here to
avoid having to ask "do we even have thread support?". But I guess that
TODO comment implies that we might one day want 2 threads on a single
CPU.

-Peff



And here is the patch I created when testing out the idea of removing 
NO_PTHREADS.  It doesn't require any 'HAVE_THREADS' tests.




diff --git a/read-cache.c b/read-cache.c
index 1df5c16dbc..1f088799fc 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -1920,19 +1920,15 @@ struct index_entry_offset_table
struct index_entry_offset entries[FLEX_ARRAY];
 };

-#ifndef NO_PTHREADS
 static struct index_entry_offset_table *read_ieot_extension(const char *mmap, size_t mmap_size, size_t offset);
 static void write_ieot_extension(struct strbuf *sb, struct index_entry_offset_table *ieot);
-#endif

 static size_t read_eoie_extension(const char *mmap, size_t mmap_size);
 static void write_eoie_extension(struct strbuf *sb, git_hash_ctx 
*eoie_context, size_t offset);


 struct load_index_extensions
 {
-#ifndef NO_PTHREADS
pthread_t pthread;
-#endif
struct index_state *istate;
const char *mmap;
size_t mmap_size;
@@ -2010,8 +2006,6 @@ static unsigned long load_all_cache_entries(struct 
index_state *istate,

return consumed;
 }

-#ifndef NO_PTHREADS
-
 /*
  * Mostly randomly chosen maximum thread counts: we
  * cap the parallelism to online_cpus() threads, and we want
@@ -2122,7 +2116,6 @@ static unsigned long 
load_cache_entries_threaded(struct index_state *istate, con



return consumed;
 }
-#endif

 /* remember to discard_cache() before reading a different cache! */
 int do_read_index(struct index_state *istate, const char *path, int 
must_exist)
@@ -2135,10 +2128,8 @@ int do_read_index(struct index_state *istate, 
const char *path, int must_exist)


size_t mmap_size;
struct load_index_extensions p;
size_t extension_offset = 0;
-#ifndef NO_PTHREADS
int nr_threads, cpus;
struct index_entry_offset_table *ieot = NULL;
-#endif

if (istate->initialized)
return istate->cache_nr;
@@ -2181,16 +2172,12 @@ int do_read_index(struct index_state *istate, 
const char *path, int must_exist

)

src_offset = sizeof(*hdr);

-#ifndef NO_PTHREADS
nr_threads = git_config_get_index_threads();
-
-   /* TODO: does creating more threads than cores help? */
-   if (!nr_threads) {
+   if (!nr_threads)
nr_threads = istate->cache_nr / THREAD_COST;
-   cpus = online_cpus();
-   if (nr_threads > cpus)
-   nr_threads = cpus;
-   }
+   cpus = online_cpus();
+   if (nr_threads > cpus)
+   nr_threads = cpus;

if (nr_threads > 1) {
extension_offset = read_eoie_extension(mmap, mmap_size);
@@ -2219,22 +2206,16 @@ int do_read_index(struct index_state *istate, 
const char *path, int must_exist

)
} else {
src_offset += load_all_cache_entries(istate, mmap, 
mmap_size, src_offset);

}
-#else
-   src_offset += load_all_cache_entries(istate, mmap, mmap_size, 
src_offset);

-#endif

istate->timestamp.sec = st.st_mtime;
istate->timestamp.nsec = ST_MTIME_NSEC(st);

/* if we created a thread, join it otherwise load the 
extensions on the primary thread */

-#ifndef NO_PTHREADS
if (extension_offset) {
int ret = pthread_join(p.pthread, NULL);
if (ret)
die(_("unable to join load_index_extensions 
thread: %s"), strerror(ret));

-   }
-#endif
-   if (!extension_offset) {
+   } else {
p.src_offset = src_offset;
load_index_extensions();
}
@@ -2756,7 +2737,6 @@ static int do_write_index(struct index_state 
*istate, struct tempfile *tempfile,


	if (ce_write(&c, newfd, &hdr, sizeof(hdr)) < 0)
return -1;

-#ifndef NO_PTHREADS
nr_threads = 

Re: [PATCH 09/10] read-cache.c: remove #ifdef NO_PTHREADS

2018-10-29 Thread Duy Nguyen
On Mon, Oct 29, 2018 at 6:05 PM Ben Peart  wrote:
> > @@ -2756,8 +2745,11 @@ static int do_write_index(struct index_state 
> > *istate, struct tempfile *tempfile,
> >   if (ce_write(&c, newfd, &hdr, sizeof(hdr)) < 0)
> >   return -1;
> >
> > -#ifndef NO_PTHREADS
> > - nr_threads = git_config_get_index_threads();
> > + if (HAVE_THREADS)
>
> This shouldn't be needed either.  My assumption was that if someone
> explicitly asked for multiple threads we're better off failing than
> silently ignoring their request.  The default/automatic setting will
> detect the number of cpus and behave correctly.

No. I can have ~/.gitconfig shared between different machines. One
with multithread support and one without. I should not have to fall
back to conditional includes in order to use the same config file on
the machine without multithread.
-- 
Duy


Re: [PATCH v2 06/10] preload-index.c: remove #ifdef NO_PTHREADS

2018-10-29 Thread Ben Peart




On 10/27/2018 1:30 PM, Nguyễn Thái Ngọc Duy wrote:

Signed-off-by: Nguyễn Thái Ngọc Duy 
---
  preload-index.c | 15 ++-
  1 file changed, 2 insertions(+), 13 deletions(-)

diff --git a/preload-index.c b/preload-index.c
index 9e7152ab14..0e24886aca 100644
--- a/preload-index.c
+++ b/preload-index.c
@@ -7,17 +7,7 @@
  #include "fsmonitor.h"
  #include "config.h"
  #include "progress.h"
-
-#ifdef NO_PTHREADS
-static void preload_index(struct index_state *index,
- const struct pathspec *pathspec,
- unsigned int refresh_flags)
-{
-   ; /* nothing */
-}
-#else
-
-#include 
+#include "thread-utils.h"
  
  /*

   * Mostly randomly chosen maximum thread counts: we
@@ -108,7 +98,7 @@ static void preload_index(struct index_state *index,
struct thread_data data[MAX_PARALLEL];
struct progress_data pd;
  
-	if (!core_preload_index)

+   if (!HAVE_THREADS || !core_preload_index)
return;
  
  	threads = index->cache_nr / THREAD_COST;

@@ -151,7 +141,6 @@ static void preload_index(struct index_state *index,
  
  	trace_performance_leave("preload index");

  }
-#endif
  
  int read_index_preload(struct index_state *index,

   const struct pathspec *pathspec,




In the theory that code speaks louder than comments, here is the patch I 
used when testing out the idea of getting rid of NO_PTHREADS:



 git diff head~1 preload-index.c
diff --git a/preload-index.c b/preload-index.c
index 9e7152ab14..266bc9d8d7 100644
--- a/preload-index.c
+++ b/preload-index.c
@@ -7,17 +7,7 @@
 #include "fsmonitor.h"
 #include "config.h"
 #include "progress.h"
-
-#ifdef NO_PTHREADS
-static void preload_index(struct index_state *index,
- const struct pathspec *pathspec,
- unsigned int refresh_flags)
-{
-   ; /* nothing */
-}
-#else
-
-#include 
+#include "thread-utils.h"

 /*
  * Mostly randomly chosen maximum thread counts: we
@@ -104,7 +94,7 @@ static void preload_index(struct index_state *index,
  const struct pathspec *pathspec,
  unsigned int refresh_flags)
 {
-   int threads, i, work, offset;
+   int cpus, threads, i, work, offset;
struct thread_data data[MAX_PARALLEL];
struct progress_data pd;

@@ -114,6 +104,9 @@ static void preload_index(struct index_state *index,
threads = index->cache_nr / THREAD_COST;
if ((index->cache_nr > 1) && (threads < 2) && 
git_env_bool("GIT_TEST_PRELOAD_INDEX", 0))

threads = 2;
+   cpus = online_cpus();
+   if (threads > cpus)
+   threads = cpus;
if (threads < 2)
return;
trace_performance_enter();
@@ -151,7 +144,6 @@ static void preload_index(struct index_state *index,

trace_performance_leave("preload index");
 }
-#endif

 int read_index_preload(struct index_state *index,
   const struct pathspec *pathspec,


Re: [PATCH v3 2/8] Add a place for (not) sharing stuff between worktrees

2018-10-29 Thread Duy Nguyen
On Mon, Oct 22, 2018 at 6:28 AM Junio C Hamano  wrote:
> > When multiple worktrees are used, we need rules to determine if
> > something belongs to one worktree or all of them. Instead of keeping
> > adding rules when new stuff comes (*), have a generic rule:
> >
> > - Inside $GIT_DIR, which is per-worktree by default, add
> >   $GIT_DIR/common which is always shared. New features that want to
> >   share stuff should put stuff under this directory.
> >
> > - Inside refs/, which is shared by default except refs/bisect, add
> >   refs/worktree/ which is per-worktree. We may eventually move
> >   refs/bisect to this new location and remove the exception in refs
> >   code.
> >
> > (*) And it may also include stuff from external commands which will
> > have no way to modify common/per-worktree rules.
>
> OK.  Establishing such a convention is a good role for the core-git
> should play to help third-party tools.
>
> Should this play well with the per-worktree configuration as well?
> Is it better to carve out a configuration variable namespace so that
> certain keys are never read from common ones (or per-worktree ones),
> so that people can tell which ones are what?  I know your current
> design says "this is just another new layer, and the users can hang
> themselves with this new rope".  I am wondering if there is a need
> to do something a bit more structured.

I actually find the layered design of config files more elegant. Even
before config.worktree is added, the user has system/per-user/per-repo
places to put stuff in. "git config" gives control to read or write
from a specific layer. We have to add layers to $GIT_DIR/refs to
handle multiple worktrees, but since the "API" for those don't have
the concept of layers at all, we need a single view where items from
different worktrees are accessible via the same path/ref name.

Adding variable namespace in config files does not look easy either.
Suppose we have perWorktree.* section just for per-workree stuff, what
do we do when the user just do "git config -f not-per-worktree-file
perWorktree.something foo" (or --global instead of -f)? What we could
do (and already do for some config keys) is ignore perWorktree.*
unless they come from config.worktree. Which does help a bit, but not
really perfect.

And a namespace like this means the user cannot have subsections
anymore (e.g. perWorktree.group.<name>.var is not supported, if I
remember correctly). So yeah... perhaps we should wait a bit and see
what the user (or third party tool makers) comes up with.
-- 
Duy
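The layered config lookup Duy describes can be sketched compactly: later (more local) layers override earlier ones, and config.worktree simply becomes the most local layer. A hedged illustration, not Git's actual resolution code:

```python
def lookup(key, *layers):
    """Resolve a config key across layers ordered system -> global ->
    repo -> worktree; the last layer that defines the key wins."""
    value = None
    for layer in layers:
        if key in layer:
            value = layer[key]       # each layer overrides the previous
    return value

system   = {"core.pager": "less"}
repo     = {"core.bare": "false", "core.pager": "cat"}
worktree = {"core.sparseCheckout": "true"}

assert lookup("core.pager", system, repo, worktree) == "cat"
assert lookup("core.sparseCheckout", system, repo, worktree) == "true"
assert lookup("user.name", system, repo, worktree) is None
```

Nothing in this scheme needs a dedicated perWorktree.* namespace, which is the elegance Duy is pointing at.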


Re: [PATCH v2 08/10] read-cache.c: remove #ifdef NO_PTHREADS

2018-10-29 Thread Ben Peart




On 10/29/2018 10:30 AM, Jeff King wrote:

On Sat, Oct 27, 2018 at 07:30:06PM +0200, Nguyễn Thái Ngọc Duy wrote:


-#ifndef NO_PTHREADS
-   nr_threads = git_config_get_index_threads();
+   if (HAVE_THREADS) {
+   nr_threads = git_config_get_index_threads();
  
-	/* TODO: does creating more threads than cores help? */

-   if (!nr_threads) {
-   nr_threads = istate->cache_nr / THREAD_COST;
-   cpus = online_cpus();
-   if (nr_threads > cpus)
-   nr_threads = cpus;
+   /* TODO: does creating more threads than cores help? */
+   if (!nr_threads) {
+   nr_threads = istate->cache_nr / THREAD_COST;
+   cpus = online_cpus();
+   if (nr_threads > cpus)
+   nr_threads = cpus;
+   }
+   } else {
+   nr_threads = 1;
}


I'd have thought we could just rely on online_cpus() returning 1 here to
avoid having to ask "do we even have thread support?". But I guess that
TODO comment implies that we might one day want 2 threads on a single
CPU.

-Peff



See my earlier response - the HAVE_THREADS tests are not needed.

We can remove the "TODO" comment - I tested it and it wasn't faster to 
have more threads than CPUs.


Re: [PATCH 09/10] read-cache.c: remove #ifdef NO_PTHREADS

2018-10-29 Thread Ben Peart




On 10/27/2018 3:10 AM, Nguyễn Thái Ngọc Duy wrote:

Signed-off-by: Nguyễn Thái Ngọc Duy 
---
  read-cache.c | 49 ++---
  1 file changed, 18 insertions(+), 31 deletions(-)

diff --git a/read-cache.c b/read-cache.c
index d57958233e..ba870bc3fd 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -1920,19 +1920,15 @@ struct index_entry_offset_table
struct index_entry_offset entries[FLEX_ARRAY];
  };
  
-#ifndef NO_PTHREADS
  static struct index_entry_offset_table *read_ieot_extension(const char *mmap, size_t mmap_size, size_t offset);
  static void write_ieot_extension(struct strbuf *sb, struct index_entry_offset_table *ieot);
-#endif
  
  static size_t read_eoie_extension(const char *mmap, size_t mmap_size);

  static void write_eoie_extension(struct strbuf *sb, git_hash_ctx 
*eoie_context, size_t offset);
  
  struct load_index_extensions

  {
-#ifndef NO_PTHREADS
pthread_t pthread;
-#endif
struct index_state *istate;
const char *mmap;
size_t mmap_size;
@@ -2010,8 +2006,6 @@ static unsigned long load_all_cache_entries(struct 
index_state *istate,
return consumed;
  }
  
-#ifndef NO_PTHREADS

-
  /*
   * Mostly randomly chosen maximum thread counts: we
   * cap the parallelism to online_cpus() threads, and we want
@@ -2122,7 +2116,6 @@ static unsigned long load_cache_entries_threaded(struct 
index_state *istate, con
  
  	return consumed;

  }
-#endif
  
  /* remember to discard_cache() before reading a different cache! */

  int do_read_index(struct index_state *istate, const char *path, int 
must_exist)
@@ -2135,10 +2128,8 @@ int do_read_index(struct index_state *istate, const char 
*path, int must_exist)
size_t mmap_size;
struct load_index_extensions p;
size_t extension_offset = 0;
-#ifndef NO_PTHREADS
int nr_threads, cpus;
struct index_entry_offset_table *ieot = NULL;
-#endif
  
  	if (istate->initialized)

return istate->cache_nr;
@@ -2181,15 +2172,18 @@ int do_read_index(struct index_state *istate, const 
char *path, int must_exist)
  
  	src_offset = sizeof(*hdr);
  
-#ifndef NO_PTHREADS

-   nr_threads = git_config_get_index_threads();
+   if (HAVE_THREADS) {


In this case, nr_threads is already capped to online_cpus() below so 
this HAVE_THREADS test isn't needed.



+   nr_threads = git_config_get_index_threads();
  
-	/* TODO: does creating more threads than cores help? */

-   if (!nr_threads) {
-   nr_threads = istate->cache_nr / THREAD_COST;
-   cpus = online_cpus();
-   if (nr_threads > cpus)
-   nr_threads = cpus;
+   /* TODO: does creating more threads than cores help? */
+   if (!nr_threads) {
+   nr_threads = istate->cache_nr / THREAD_COST;
+   cpus = online_cpus();
+   if (nr_threads > cpus)
+   nr_threads = cpus;
+   }
+   } else {
+   nr_threads = 1;
}
  
  	if (nr_threads > 1) {

@@ -2219,21 +2213,16 @@ int do_read_index(struct index_state *istate, const 
char *path, int must_exist)
} else {
src_offset += load_all_cache_entries(istate, mmap, mmap_size, 
src_offset);
}
-#else
-   src_offset += load_all_cache_entries(istate, mmap, mmap_size, 
src_offset);
-#endif
  
  	istate->timestamp.sec = st.st_mtime;

istate->timestamp.nsec = ST_MTIME_NSEC(st);
  
  	/* if we created a thread, join it otherwise load the extensions on the primary thread */

-#ifndef NO_PTHREADS
-   if (extension_offset) {
+   if (HAVE_THREADS && extension_offset) {


Here extension_offset won't be set if there wasn't a thread created so 
the HAVE_THREADS test is not needed.



int ret = pthread_join(p.pthread, NULL);
if (ret)
die(_("unable to join load_index_extensions thread: 
%s"), strerror(ret));
}
-#endif
if (!extension_offset) {
p.src_offset = src_offset;
load_index_extensions();
@@ -2756,8 +2745,11 @@ static int do_write_index(struct index_state *istate, 
struct tempfile *tempfile,
	if (ce_write(&c, newfd, &hdr, sizeof(hdr)) < 0)
return -1;
  
-#ifndef NO_PTHREADS

-   nr_threads = git_config_get_index_threads();
+   if (HAVE_THREADS)


This shouldn't be needed either.  My assumption was that if someone 
explicitly asked for multiple threads we're better off failing than 
silently ignoring their request.  The default/automatic setting will 
detect the number of cpus and behave correctly.



+   nr_threads = git_config_get_index_threads();
+   else
+   nr_threads = 1;
+
if (nr_threads != 1) {
int ieot_blocks, cpus;
  
@@ -2787,7 +2779,6 @@ static int do_write_index(struct index_state *istate, struct tempfile 

[PATCH v2 1/1] remote-curl.c: xcurl_off_t is not portable (on 32 bit platforms)

2018-10-29 Thread tboegi
From: Torsten Bögershausen 

When setting
DEVELOPER = 1
DEVOPTS = extra-all

"gcc (Raspbian 6.3.0-18+rpi1+deb9u1) 6.3.0 20170516" errors out with
"comparison is always false due to limited range of data type"
"[-Werror=type-limits]"

It turns out that the function xcurl_off_t() has 2 flavours:
- It gives a warning on 32 bit systems, like Linux
- It takes the signed ssize_t as a parameter, but the only caller is using
  a size_t (which is typically unsigned these days)

The original motivation of this function is to make sure that sizes > 2GiB
are handled correctly. The curl documentation says:
"For any given platform/compiler curl_off_t must be typedef'ed to a 64-bit
 wide signed integral data type"
On a 32 bit system "size_t" can be promoted into a 64 bit signed value
without loss of data, and therefore we may see the
"comparison is always false" warning.
On a 64 bit system it may happen, at least in theory, that size_t is > 2^63,
and then the promotion from an unsigned "size_t" into a signed "curl_off_t"
may be a problem.

One solution to suppress a possible compiler warning could be to remove
the function xcurl_off_t().

However, to be on the very safe side, we keep it and improve it:
- The len parameter is changed from ssize_t to size_t
- A temporary variable "size" is used, promoted into uintmax_t and then compared
  with "maximum_signed_value_of_type(curl_off_t)".
  Thanks to Junio C Hamano for this hint.

Signed-off-by: Torsten Bögershausen 
---
 remote-curl.c | 7 ---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/remote-curl.c b/remote-curl.c
index 762a55a75f..1220dffcdc 100644
--- a/remote-curl.c
+++ b/remote-curl.c
@@ -617,10 +617,11 @@ static int probe_rpc(struct rpc_state *rpc, struct slot_results *results)
return err;
 }
 
-static curl_off_t xcurl_off_t(ssize_t len) {
-   if (len > maximum_signed_value_of_type(curl_off_t))
+static curl_off_t xcurl_off_t(size_t len) {
+   uintmax_t size = len;
+   if (size > maximum_signed_value_of_type(curl_off_t))
die("cannot handle pushes this big");
-   return (curl_off_t) len;
+   return (curl_off_t)size;
 }
 
 static int post_rpc(struct rpc_state *rpc)
-- 
2.19.0.271.gfe8321ec05
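The bound check in the patch above can be restated with explicit widths: converting an unsigned 64-bit size_t to a signed 64-bit curl_off_t is only safe below 2**63. A sketch of the same logic (Python, with the curl_off_t maximum written out):

```python
# maximum_signed_value_of_type(curl_off_t) for a 64-bit signed type
CURL_OFF_T_MAX = 2**63 - 1

def xcurl_off_t(size):
    """Reject sizes that would overflow a signed 64-bit curl_off_t."""
    if size > CURL_OFF_T_MAX:
        raise OverflowError("cannot handle pushes this big")
    return size

assert xcurl_off_t(2 * 1024**3) == 2 * 1024**3   # 2 GiB pushes are fine
try:
    xcurl_off_t(2**63)                            # one past the maximum
    raise AssertionError("should have overflowed")
except OverflowError:
    pass
```

On a 32-bit system size_t can never exceed this bound, which is exactly why gcc flags the comparison as always false there.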



[RFC] Generation Number v2

2018-10-29 Thread Derrick Stolee

We've discussed in several places how to improve upon generation
numbers. This RFC is a report based on my investigation into a
few new options, and how they compare for Git's purposes on
several existing open-source repos.

You can find this report and the associated test scripts at
https://github.com/derrickstolee/gen-test.

I have some more work to do in order to make this fully
reproducible (including a single script to clone all of the
repos I used, compute the commit-graphs using the different
index versions, and run the tests).

TL;DR: I recommend we pursue one of the following two options:

1. Maximum Generation Number.
2. Corrected Commit Date.

Each of these performs well, but they have different implementation
drawbacks. I'd like to hear what everyone thinks about those
drawbacks.

Please also let me know about any additional tests that I could
run. Now that I've got a lot of test scripts built up, I can re-run
the test suite pretty quickly.

Thanks,
-Stolee



Generation Number Performance Test
==

Git uses the commit history to answer many basic questions, such as
computing a merge-base. Some of these algorithms can benefit from a
_reachability index_, which is a condition that allows us to answer
"commit A cannot reach commit B" in some cases. These require pre-
computing some values that we can load and use at run-time. Git
already has a notion of _generation number_, stores it in the commit-
graph file, and uses it in several reachability algorithms.

You can read more about generation numbers and how to use them in
algorithms in [this blog 
post](https://blogs.msdn.microsoft.com/devops/2018/07/09/supercharging-the-git-commit-graph-iii-generations/).


However, [generation numbers do not always improve our 
algorithms](https://public-inbox.org/git/pull.28.git.gitgitgad...@gmail.com/T/#u).

Specifically, some algorithms in Git already use commit date as a
heuristic reachability index. This has some problems, though, since
commit date can be incorrect for several reasons (clock skew between
machines, purposefully setting GIT_COMMIT_DATE to the author date, etc.).
However, the speed boost by using commit date as a cutoff was so
important in these cases, that the potential for incorrect answers was
considered acceptable.

When these algorithms were converted to use generation numbers, we
_added_ the extra constraint that the algorithms are _never incorrect_.
Unfortunately, this led to some cases where performance was worse than
before. There are known cases where `git merge-base A B` or
`git log --topo-order A..B` are worse when using generation numbers
than when using commit dates.

This report investigates four replacements for generation numbers, and
compares the number of walked commits to the existing algorithms (both
using generation numbers and not using them at all). We can use this
data to make decisions for the future of the feature.

### Implementation

The reachability indexes below are implemented in
[the `reach-perf` branch in 
https://github.com/derrickstolee/git](https://github.com/derrickstolee/git/tree/reach-perf).

This implementation is in a very rough condition, as it is intended to
be a prototype, not production-quality.

Using this implementation, you can compute commit-graph files for the
specified reachability index using `git commit-graph write --reachable --version=<N>`.

The `git` client will read the version from the file, so our tests
store each version as `.git/objects/info/commit-graph.<N>` and copy
the necessary file to `.git/objects/info/commit-graph` before testing.

The algorithms count the number of commits walked, so as to avoid the
noise that would occur with time-based performance reporting. We use
the (in progress) trace2 library for this. To find the values reported,
use the `GIT_TR2_PERFORMANCE` environment variable.

To ignore reachability indexes entirely and use the old algorithms
(reported here as "OLD" values) use the environment variable
`GIT_TEST_OLD_PAINT=1`.


Reachability Index Versions
---

**V0: (Minimum) Generation Number.**
The _generation number_ of a commit is exactly one more than the maximum
generation number of a parent (by convention, the generation number of a
commit with no parents is 1). This is the definition currently used by
Git (2.19.0 and later). Given two commits A and B, we can then use the
following reachability index:

    If gen(A) < gen(B), then A cannot reach B.

_Commentary:_ One issue with generation numbers is that some algorithms
in Git use commit date as a heuristic, and allow incorrect answers if
there is enough clock skew. Using that heuristic, the algorithms can walk
fewer commits than the algorithms using generation number. The other
reachability indexes introduced below attempt to fix this problem.

**V1: (Epoch, Date) Pairs.**
For each commit, store a pair of values called the _epoch_ and the _date_.
The date is the normal commit date. The _epoch_ of a commit is 

Re: [PATCH 07/10] preload-index.c: remove #ifdef NO_PTHREADS

2018-10-29 Thread Ben Peart




On 10/27/2018 3:10 AM, Nguyễn Thái Ngọc Duy wrote:

Signed-off-by: Nguyễn Thái Ngọc Duy 
---
  preload-index.c | 15 ++-
  1 file changed, 2 insertions(+), 13 deletions(-)

diff --git a/preload-index.c b/preload-index.c
index 9e7152ab14..0e24886aca 100644
--- a/preload-index.c
+++ b/preload-index.c
@@ -7,17 +7,7 @@
  #include "fsmonitor.h"
  #include "config.h"
  #include "progress.h"
-
-#ifdef NO_PTHREADS
-static void preload_index(struct index_state *index,
- const struct pathspec *pathspec,
- unsigned int refresh_flags)
-{
-   ; /* nothing */
-}
-#else
-
-#include 
+#include "thread-utils.h"
  
  /*

   * Mostly randomly chosen maximum thread counts: we
@@ -108,7 +98,7 @@ static void preload_index(struct index_state *index,
struct thread_data data[MAX_PARALLEL];
struct progress_data pd;
  
-	if (!core_preload_index)

+   if (!HAVE_THREADS || !core_preload_index)


I also "hoped in general that the non-threaded code paths could mostly 
just collapse into the same as the "threads == 1" cases, and we wouldn't 
have to ask "are threads even supported" in a lot of places."


In this case, if we cap the threads to online_cpus() the later test for 
'if (threads < 2)' would do that.  I haven't measured this specific case 
but in the other cases I have measured, having more threads than cpus 
did not result in a performance win.



return;
  
  	threads = index->cache_nr / THREAD_COST;

@@ -151,7 +141,6 @@ static void preload_index(struct index_state *index,
  
  	trace_performance_leave("preload index");

  }
-#endif
  
  int read_index_preload(struct index_state *index,

   const struct pathspec *pathspec,



Re: [PATCH/WIP 00/19] Kill the_index, final part

2018-10-29 Thread Duy Nguyen
On Fri, Oct 19, 2018 at 7:38 PM Stefan Beller  wrote:
>
> On Fri, Oct 19, 2018 at 7:52 AM Nguyễn Thái Ngọc Duy  
> wrote:
> >
> > Stefan I see you start removing more references of the_repository in
> > another series (haven't really looked at it yet) so this is just heads
> > up so we could coordinate if needed. Still WIP, not really ready for
> > review yet.
>
> I'll take a brief look anyway, otherwise you wouldn't have put in on
> a mailing list :-P

Thanks. Just fyi. Since there are still plenty conflicts when merging
this with 'pu', I'll just wait until things settle a bit in 'master'
before resending.
-- 
Duy


Re: [PATCH] Prevent warning

2018-10-29 Thread Jeff King
On Mon, Oct 29, 2018 at 03:39:30PM +, Pete wrote:

> Prevent the following warning in the web server error log:
> gitweb.cgi: Odd number of elements in anonymous hash at 
> /usr/share/gitweb/gitweb.cgi line 3305
> [...]
> - $remotes{$remote} ||= { 'heads' => () };
> + $remotes{$remote} ||= { 'heads' => [] };

This will indeed silence the warning, but the end result is different.
In the original, 'heads' would be set to undef, but now it is set to an
empty array reference.

Do the later users of the struct care about the distinction?

Grepping around, I see two callers that look at "heads":

 1. fill_remote_heads() assigns to it unconditionally, overwriting
whatever was there before.

 2. git_remote_block() dereferences it as an array reference, and so
probably would have complained if we ever got there with the undef.
But we never would, because every call to git_remote_block() is
preceded by a call to fill_remote_heads().

So nobody actually ever looks at the value we set here. Is an empty
array reference better than undef as a default value? I'd actually argue
no. If we add new code that does not call fill_remote_heads(), it's
probably better for it to be undef to distinguish it from the empty list
(and possibly raise an error that would tell us that we forgot to call
fill_remote_heads()).

So I'd say that:

  $remotes{$remote} ||= { heads => undef };

is a preferable conversion. Or simply deleting the line entirely, which
has roughly the same effect.

-Peff


Re: [PATCH 06/10] grep: remove #ifdef NO_PTHREADS

2018-10-29 Thread Jeff King
On Mon, Oct 29, 2018 at 05:01:41PM +0100, Duy Nguyen wrote:

> On Mon, Oct 29, 2018 at 3:25 PM Jeff King  wrote:
> > But if the problem is simply that we are not quite there yet in the grep
> > code, I am OK with taking this as the first pass, and knowing that there
> > is more cleanup to be done later (though that sort of thing is IMHO very
> > useful in a commit message).
> 
> Since the problem pops up now, I'm ok with updating/cleaning up all
> this in this series, unless there's benefits in keeping this series
> simple and merging it early (probably not?)

Mostly I did not want to tax you. I would rather have this series and
some cleanup left over, than to not have anything. But if you are
interested in moving it further, I will not say no. :)

-Peff


Re: t7405.17 breakage vanishes with GETTEXT_POISON=1

2018-10-29 Thread Duy Nguyen
On Sun, Oct 28, 2018 at 1:15 PM SZEDER Gábor  wrote:
>
> On Sun, Oct 28, 2018 at 06:41:06AM +0100, Duy Nguyen wrote:
> > Something fishy is going on but I don't think I'll spend time hunting
> > it down so I post here in case somebody else is interested. It might
> > also indicate a problem with poison gettext, not the test case too.
>
> I haven't actually run the test under GETTEXT_POISON, but I strongly
> suspect it's the test, or more accurately the helper function
> 'test_i18ngrep'.
>
> The test in question runs
>
>   test_i18ngrep ! "refusing to lose untracked file at" err
>
> which fails in normal test runs, because 'grep' does find the
> undesired string; that's the known breakage.  Under GETTEXT_POISON,
> however, 'test_i18ngrep' always succeeds because of this condition:
>
>   if test -n "$GETTEXT_POISON"
>   then
>   # pretend success
>   return 0
>   fi
>
> and then in turn the whole test succeeds.

And this is something your poison-with-scrambling code does help ;-) ;-)
-- 
Duy


Re: [PATCH 08/12] remote.c: mark messages for translation

2018-10-29 Thread Duy Nguyen
On Mon, Oct 29, 2018 at 8:56 AM Junio C Hamano  wrote:
> There are other series in flight that touch the same area of code
> and in different ways, causing unnecessary conflicts, which does not
> help us either X-<.

I will of course fix all other comments, but I can hold this off if
it's causing too much trouble. I'm in no hurry to get anything merged
fast.
-- 
Duy


Re: [PATCH 12/12] fsck: mark strings for translation

2018-10-29 Thread Duy Nguyen
On Mon, Oct 29, 2018 at 3:09 PM Junio C Hamano  wrote:
>
> SZEDER Gábor  writes:
>
> >> -fprintf(stderr, "%s in %s %s: %s\n",
> >> -msg_type, printable_type(obj), describe_object(obj), err);
> >> +fprintf_ln(stderr, _("%s in %s %s: %s"),
> >
> > Are the (f)printf() -> (f)printf_ln() changes all over
> > 'builtin/fsck.c' really necessary to mark strings for translation?
>
> It is beyond absolute minimum but I saw it argued here that this
> makes it easier to manage the .po and .pot files if your message
> strings do not end with LF, a you are much less likely to _add_
> unneeded LF to the translated string than _lose_ LF at the end of
> translated string.

Especially when \n plays an important role and we don't trust
translators to keep it [1] [2]. It's probably a too defensive stance
and often does not apply, so nowadays I do it just to keep a
consistent pattern in the code.

[1] https://public-inbox.org/git/20120308220131.GA10122@burratino/#t
[2] but then translators can crash programs anyway (e.g. changing %d
to %s...) we just trust gettext tools to catch problems early.
-- 
Duy


Re: [PATCH 06/10] grep: remove #ifdef NO_PTHREADS

2018-10-29 Thread Duy Nguyen
On Mon, Oct 29, 2018 at 3:25 PM Jeff King  wrote:
> But if the problem is simply that we are not quite there yet in the grep
> code, I am OK with taking this as the first pass, and knowing that there
> is more cleanup to be done later (though that sort of thing is IMHO very
> useful in a commit message).

Since the problem pops up now, I'm ok with updating/cleaning up all
this in this series, unless there's benefits in keeping this series
simple and merging it early (probably not?)
-- 
Duy


Re: [PATCH] Prevent warning

2018-10-29 Thread Torsten Bögershausen
Thanks for the patch.
Could you please sign it off?
The other remark would be that the header line could be more descriptive than
just
"Prevent warning",
to give people digging into the Git history initial information about
where the warning occurred and which module caused it.
Maybe something like this as a headline?

gitweb.perl: Fix Odd number of elements warning



On Mon, Oct 29, 2018 at 03:39:30PM +, Pete wrote:
> Prevent the following warning in the web server error log:
> gitweb.cgi: Odd number of elements in anonymous hash at 
> /usr/share/gitweb/gitweb.cgi line 3305
> ---
>  gitweb/gitweb.perl | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
> index 2594a4badb3d7..200647b683225 100755
> --- a/gitweb/gitweb.perl
> +++ b/gitweb/gitweb.perl
> @@ -3302,7 +3302,7 @@ sub git_get_remotes_list {
>   next if $wanted and not $remote eq $wanted;
>   my ($url, $key) = ($1, $2);
>  
> - $remotes{$remote} ||= { 'heads' => () };
> + $remotes{$remote} ||= { 'heads' => [] };
>   $remotes{$remote}{$key} = $url;
>   }
>   close $fd or return;
> 
> --
> https://github.com/git/git/pull/548


Re: [PATCH] wildmatch: change behavior of "foo**bar" in WM_PATHNAME mode

2018-10-29 Thread Duy Nguyen
On Mon, Oct 29, 2018 at 2:24 PM Ævar Arnfjörð Bjarmason
 wrote:
> This patch looks good to me, but I think it's a bad state of affairs to
> keep changing these semantics and not having something like a
> "gitwildmatch" doc were we document this matching syntax.

While we don't have a separate document for it, the behavior _is_
documented even if perhaps it wasn't as clear. There were even tests
for this corner case.

> Do you have any thoughts on how to proceed with getting this documented
> / into some stable state where we can specify it?

wildmatch has been used in git for a few years if I remember correctly
so to me it is stable. Granted there are corner cases like this, but I
can't prevent all bugs (especially this one which is more like design
mistake than bug per se). You are welcome to refactor gitignore.txt
and add gitwildmatch.txt.
-- 
Duy


Re: [PATCH] test-lib: introduce the '-V' short option for '--verbose-log'

2018-10-29 Thread Jeff King
On Mon, Oct 29, 2018 at 02:53:35PM +0100, SZEDER Gábor wrote:

> > Interesting. I'm not opposed to something like this, but I added
> > "--verbose-log" specifically for scripted cases, like running an
> > unattended "prove" that needs to preserve stdout. When running
> > individual tests, I'd just use "-v" itself, and possibly redirect the
> > output.
> > 
> > For my curiosity, can you describe your use case a bit more?
> 
> Even when I run individual test scripts by hand, I prefer to have a
> file catching all output of the test, because I don't like it when the
> test output floods my terminal (especially with '-x'), and because the
> file is searchable but the terminal isn't.  And that's exactly what
> '--verbose-log' does.
> 
> Redirecting the '-v' output (i.e. stdout) alone is insufficient,
> because any error messages within the tests and the '-x' trace go to
> stderr, so they still end up on the terminal.  Therefore I would have
> to remember to redirect stderr every time as well.
> 
> I find it's much easier to just always use '--verbose-log'... except
> for the length of the option, that is, hence this patch.

OK, fair enough. Maybe I should start using "-V" too, then. ;)

(I find myself most often coupling "-v" with "-i" to stop at the failure
and just read what's left on the screen).

-Peff


[PATCH] Prevent warning

2018-10-29 Thread Pete
Prevent the following warning in the web server error log:
gitweb.cgi: Odd number of elements in anonymous hash at 
/usr/share/gitweb/gitweb.cgi line 3305
---
 gitweb/gitweb.perl | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
index 2594a4badb3d7..200647b683225 100755
--- a/gitweb/gitweb.perl
+++ b/gitweb/gitweb.perl
@@ -3302,7 +3302,7 @@ sub git_get_remotes_list {
next if $wanted and not $remote eq $wanted;
my ($url, $key) = ($1, $2);
 
-   $remotes{$remote} ||= { 'heads' => () };
+   $remotes{$remote} ||= { 'heads' => [] };
$remotes{$remote}{$key} = $url;
}
close $fd or return;

--
https://github.com/git/git/pull/548


Re: [PATCH v2 00/16] sequencer: refactor functions working on a todo_list

2018-10-29 Thread Alban Gruin
Hi Junio,

On 29/10/2018 at 04:05, Junio C Hamano wrote:
> Alban Gruin  writes:
[…]
>> It is based onto ag/rebase-i-in-c (34b4731, "rebase -i: move
>> rebase--helper modes to rebase--interactive").
> 
> As there are quite a lot of fixes to the sequencer machinery since
> that topic forked from the mainline.  For example, [06/16] has
> unpleasant merge conflicts with 1ace63bc ("rebase --exec: make it
> work with --rebase-merges", 2018-08-09) that has been in master for
> the past couple of months.  IOW, the tip of ag/rebase-i-in-c is a
> bit too old a base to work on by now.  
> 
> I think I queued the previous round on the result of merging
> ag/rebase-i-in-c into master, i.e. 61dc7b24 ("Merge branch
> 'ag/rebase-i-in-c' into ag/sequencer-reduce-rewriting-todo",
> 2018-10-09).  That may be a more reasonable place to start this
> update on.
> 

Right.

My next iteration will be based on 61dc7b24, I just rebased my branch
onto it.



Re: [RFC PATCH] index-pack: improve performance on NFS

2018-10-29 Thread Jeff King
On Mon, Oct 29, 2018 at 09:48:02AM +0900, Junio C Hamano wrote:

> > Of course any cache raises questions of cache invalidation, but I think
> > we've already dealt with that for this case. When we use
> > OBJECT_INFO_QUICK, that is a sign that we want to make this kind of
> > accuracy/speed tradeoff (which does a similar caching thing with
> > packfiles).
> >
> > So putting that all together, could we have something like:
> 
> I think this conceptually is a vast improvement relative to
> ".cloning" optimization.  Obviously this does not have the huge
> downside of the other approach that turns the collision detection
> completely off.
> 
> A real question is how much performance gain, relative to ".cloning"
> thing, this approach gives us.  If it gives us 80% or more of the
> gain compared to doing no checking, I'd say we have a clear winner.

My test runs showed it improving index-pack by about 3%, versus 4% for
no collision checking at all. But there was easily 1% of noise. And much
more importantly, that was on a Linux system on ext4, where stat is
fast. I'd be much more curious to hear timing results from people on
macOS or Windows, or from Geert's original NFS case.

-Peff


Re: [RFC PATCH] index-pack: improve performance on NFS

2018-10-29 Thread Jeff King
On Sat, Oct 27, 2018 at 04:04:32PM +0200, Duy Nguyen wrote:

> > Of course any cache raises questions of cache invalidation, but I think
> > we've already dealt with that for this case. When we use
> > OBJECT_INFO_QUICK, that is a sign that we want to make this kind of
> > accuracy/speed tradeoff (which does a similar caching thing with
> > packfiles).
> 
> We don't care about a separate process adding more loose objects while
> index-pack is running, do we? I'm guessing we don't but just to double
> check...

Right. That's basically what QUICK means: don't bother re-examining the
repository to handle simultaneous writes, even if it means saying an
object is not there when it has recently appeared.

So far it has only applied to packs, but this is really just the same
concept (just as we would not notice a new pack arriving, we will not
notice a new loose object arriving).

> > +/* probably should be configurable? */
> > +#define LOOSE_OBJECT_CACHE_MAX 65536
> 
> Yes, perhaps with gc.auto config value (multiplied by 256) as the cut
> point. If it's too big maybe just go with a bloom filter. For this
> particular case we expect like 99% of calls to miss.

I wonder, though, if we should have a maximum at all. The existing
examples I've found of this technique are:

  - mark_complete_and_common_ref(), which is trying to cover this exact
case. It looks like it avoids adding more objects than there are
refs, so I guess it actually has a pretty small cutoff.

  - find_short_object_filename(), which does the same thing with no
limits. And there if we _do_ have a lot of objects, we'd still
prefer to keep the cache.

And really, this list is pretty much equivalent to looking at a pack
.idx. The only difference is that one is mmap'd, but here we'd use the
heap. So it's not shared between processes, but otherwise the working
set size is similar.

-Peff


Re: [RFC PATCH] index-pack: improve performance on NFS

2018-10-29 Thread Jeff King
On Mon, Oct 29, 2018 at 11:04:53AM -0400, Jeff King wrote:

> > Even if someone wants to make the argument that this is behavior that we
> > absolutely *MUST* keep and not make configurable, there's still much
> > smarter ways to do it.
> 
> I don't have any real objection to a configuration like this, if people
> want to experiment with it. But in contrast, the patch I showed earlier:
> 
>   - is safe enough to just turn on all the time, without the user having
> to configure anything nor make a safety tradeoff
> 
>   - speeds up all the other spots that use OBJECT_INFO_QUICK (like
> fetch's tag-following, or what appears to be the exact same
> optimization done manually inside mark_complete_and_common_ref()).

One thing I forgot to add. We're focusing here on the case where the
objects _aren't_ present, and we're primarily trying to get rid of the
stat call.

But when we actually do see a duplicate, we open up the object and
actually compare the bytes. Eliminating the collision check entirely
would save that work, which is obviously something that can't be
improved by just caching the existence of loose objects.

I'm not sure how often that case happens in a normal repository. We see
it a fair amount on GitHub servers because of the way we use alternates (i.e.,
we often already have the object you pushed up because it's present in
another fork and available via objects/info/alternates).

-Peff


Re: [RFC PATCH] index-pack: improve performance on NFS

2018-10-29 Thread Jeff King
On Sat, Oct 27, 2018 at 01:22:16PM +0200, Ævar Arnfjörð Bjarmason wrote:

> > Taking one step back, the root problem in this thread is that stat() on
> > non-existing files is slow (which makes has_sha1_file slow).
> >
> > One solution there is to cache the results of looking in .git/objects
> > (or any alternate object store) for loose files. And indeed, this whole
> > scheme is just a specialized form of that: it's a flag to say "hey, we
> > do not have any objects yet, so do not bother looking".
> >
> > Could we implement that in a more direct and central way? And could we
> > implement it in a way that catches more cases? E.g., if I have _one_
> > object, that defeats this specialized optimization, but it is probably
> > still beneficial to cache that knowledge (and the reasonable cutoff is
> > probably not 1, but some value of N loose objects).
> [...]
> 
> The assumption with making it exactly 0 objects and not any value of >0
> is that we can safely assume that a "clone" or initial "fetch"[1] is
> special in ways that a clone isn't. I.e. we're starting out with nothing
> and doing the initial population, that's probably not as true in an
> existing repo that's getting concurrent fetches, commits, rebases etc.

I assume you mean s/that a clone isn't/that a fetch isn't/.

I agree there are cases where you might be able to go further if you
assume a full "0". But my point is that "clone" is an ambiguous concept,
and it doesn't map completely to what's actually slow here. So if you
only look at "are we cloning", then:

  - you have a bunch of cases which are "clones", but aren't actually
starting from scratch

  - you get zero benefit in the non-clone cases, when we could be
scaling the benefit smoothly

> But in the spirit of taking a step back, maybe we should take two steps
> back and consider why we're doing this at all.

OK, I think it's worth discussing, and I'll do that below. But first I
want to say...

> Three of our tests fail if we compile git like this, and cloning is much
> faster (especially on NFS):
> 
> diff --git a/builtin/index-pack.c b/builtin/index-pack.c
> index 2004e25da2..0c2d008ee0 100644
> --- a/builtin/index-pack.c
> +++ b/builtin/index-pack.c
> @@ -796,3 +796,3 @@ static void sha1_object(const void *data, struct 
> object_entry *obj_entry,
> 
> -   if (startup_info->have_repository) {
> +   if (0) {
> read_lock();
> 
> Even on a local disk I'm doing 262759 lstat() calls cloning git.git and
> spending 5% of my time on that.

With the caching patch I posted earlier, I see roughly the same speedup
on an index-pack of git.git as I do with disabling the collision check
entirely (I did see about a 1% difference in favor of what you wrote
above, which was within the noise, but may well be valid due to slightly
reduced lock contention).

TBH I'm not sure if any of this is actually worth caring about on a
normal Linux system, though. There stat() is fast. It might be much more
interesting on macOS or Windows, or on a Linux system on NFS.

> But why do we have this in the first place? It's because of 8685da4256
> ("don't ever allow SHA1 collisions to exist by fetching a pack",
> 2007-03-20) and your 51054177b3 ("index-pack: detect local corruption in
> collision check", 2017-04-01).
> 
> I.e. we are worried about (and those tests check for):
> 
>  a) A malicious user trying to give us repository where they have
> created an object with the same SHA-1 that's different, as in the
> SHAttered attack.
> 
> I remember (but have not dug up) an old E-Mail from Linus saying
> that this was an important security aspect of git, i.e. even if
> SHA-1 was broken you couldn't easily propagate bad objects.

Yeah, especially given recent advances in SHA-1 attacks, I'm not super
comfortable with the idea of disabling the duplicate-object check at
this point.

>  b) Cases where we've ended up with different content for a SHA-1 due to
> e.g. a local FS corruption. Which is the subject of your commit in
> 2017.

Sort of. We actually detected it before my patch, but we just gave a
really crappy error message. ;)

>  c) Are there cases where fetch.fsckObjects is off and we just flip a
> bit on the wire and don't notice? I think not because we always
> check the pack checksum (don't we), but I'm not 100% sure.

We'd detect bit-blips on the wire due to the pack checksum. But what's
more interesting are bit-flips on the disk of the sender, which would
then put the bogus data into the pack checksum they generate on the fly.

However, we do detect such a bit-flip, even without fsckObjects, because
the sender does not tell us the expected sha-1 of each object. It gives
us a stream of objects, and the receiver computes the sha-1's
themselves. So a bit flip manifests in the connectivity-check when we
say "hey, the other side should have sent us object X but did not" (we
do not say "gee, what is this object Y they 

Re: [PATCH] test-lib: introduce the '-V' short option for '--verbose-log'

2018-10-29 Thread SZEDER Gábor
On Mon, Oct 29, 2018 at 10:21:08AM -0400, Jeff King wrote:
> On Mon, Oct 29, 2018 at 01:13:59PM +0100, SZEDER Gábor wrote:
> 
> > '--verbose-log' is one of the most useful and thus most frequently
> > used test options, but due to its length it's a pain to type on the
> > command line.
> > 
> > Let's introduce the corresponding short option '-V' to save some
> > keystrokes.
> 
> Interesting. I'm not opposed to something like this, but I added
> "--verbose-log" specifically for scripted cases, like running an
> unattended "prove" that needs to preserve stdout. When running
> individual tests, I'd just use "-v" itself, and possibly redirect the
> output.
> 
> For my curiosity, can you describe your use case a bit more?

Even when I run individual test scripts by hand, I prefer to have a
file catching all output of the test, because I don't like it when the
test output floods my terminal (especially with '-x'), and because the
file is searchable but the terminal isn't.  And that's exactly what
'--verbose-log' does.

Redirecting the '-v' output (i.e. stdout) alone is insufficient,
because any error messages within the tests and the '-x' trace go to
stderr, so they still end up on the terminal.  Therefore I would have
to remember to redirect stderr every time as well.

I find it's much easier to just always use '--verbose-log'... except
for the length of the option, that is, hence this patch.




Re: [PATCH v2 10/10] read-cache.c: initialize copy_len to shut up gcc 8

2018-10-29 Thread Jeff King
On Sat, Oct 27, 2018 at 07:30:08PM +0200, Nguyễn Thái Ngọc Duy wrote:

> It was reported that when building with NO_PTHREADS=1,
> -Wmaybe-uninitialized is triggered. Just initialize the variable from
> the beginning to shut the compiler up (because this warning is enabled
> in config.dev)
> 
> Signed-off-by: Nguyễn Thái Ngọc Duy 
> ---
>  read-cache.c | 4 +---
>  1 file changed, 1 insertion(+), 3 deletions(-)
> 
> diff --git a/read-cache.c b/read-cache.c
> index ba870bc3fd..4307b9a7bf 100644
> --- a/read-cache.c
> +++ b/read-cache.c
> @@ -1746,7 +1746,7 @@ static struct cache_entry *create_from_disk(struct 
> mem_pool *ce_mem_pool,
>   size_t len;
>   const char *name;
>   unsigned int flags;
> - size_t copy_len;
> + size_t copy_len = 0;
>   /*
>* Adjacent cache entries tend to share the leading paths, so it makes
>* sense to only store the differences in later entries.  In the v4
> @@ -1786,8 +1786,6 @@ static struct cache_entry *create_from_disk(struct 
> mem_pool *ce_mem_pool,
>   die(_("malformed name field in the index, near 
> path '%s'"),
>   previous_ce->name);
>   copy_len = previous_len - strip_len;
> - } else {
> - copy_len = 0;
>   }
>   name = (const char *)cp;
>   }

Yes, this silences the compiler warning I saw (and is exactly the same
patch I wrote to get past it the other day ;) ).

-Peff


Re: [PATCH v2 08/10] read-cache.c: remove #ifdef NO_PTHREADS

2018-10-29 Thread Jeff King
On Sat, Oct 27, 2018 at 07:30:06PM +0200, Nguyễn Thái Ngọc Duy wrote:

> -#ifndef NO_PTHREADS
> - nr_threads = git_config_get_index_threads();
> + if (HAVE_THREADS) {
> + nr_threads = git_config_get_index_threads();
>  
> - /* TODO: does creating more threads than cores help? */
> - if (!nr_threads) {
> - nr_threads = istate->cache_nr / THREAD_COST;
> - cpus = online_cpus();
> - if (nr_threads > cpus)
> - nr_threads = cpus;
> + /* TODO: does creating more threads than cores help? */
> + if (!nr_threads) {
> + nr_threads = istate->cache_nr / THREAD_COST;
> + cpus = online_cpus();
> + if (nr_threads > cpus)
> + nr_threads = cpus;
> + }
> + } else {
> + nr_threads = 1;
>   }

I'd have thought we could just rely on online_cpus() returning 1 here to
avoid having to ask "do we even have thread support?". But I guess that
TODO comment implies that we might one day want 2 threads on a single
CPU.

-Peff


Re: [PATCH 06/10] grep: remove #ifdef NO_PTHREADS

2018-10-29 Thread Jeff King
On Mon, Oct 29, 2018 at 11:16:39AM +0900, Junio C Hamano wrote:

> Jeff King  writes:
> 
> > Cases like this are kind of weird. I'd expect to see wait_all() return
> > immediately when !HAVE_THREADS. But we don't need to, because later we
> > do this:
> >
> >> -  if (num_threads)
> >> +  if (HAVE_THREADS && num_threads)
> >>hit |= wait_all();
> >
> > Which I think works, but it feels like we're introducing some subtle
> > dependencies about which functions get called in which cases. I'd hoped
> > in general that the non-threaded code paths could mostly just collapse
> > down into the same as the "threads == 1" cases, and we wouldn't have to
> > ask "are threads even supported" in a lot of places.
> 
> True, but the way "grep" code works with threads is already strange,
> and I suspect that the subtle strangeness you feel mostly comes from
> that.  The single threaded code signaled "hit" with return value of
> individual functions like grep_file(), but the meaning of return
> value from them is changed to "does not matter--we do not have
> meaningful result yet at this point" when threading is used.
> 
> In the new world order where "threading is the norm but
> single-threaded is still supported", I wonder if we would want to
> drive the single threaded case the same way with the add_work(run)
> interface and return the "did we see hits?" etc. from the same (or
> at lesat "more similar than today's") interface, so that the flow of
> data smells the same in both cases.

Right, your second paragraph here is a better statement of what I was
trying to get at. ;)

But if the problem is simply that we are not quite there yet in the grep
code, I am OK with taking this as the first pass, and knowing that there
is more cleanup to be done later (though that sort of thing is IMHO very
useful in a commit message).

-Peff


Re: [PATCH] test-lib: introduce the '-V' short option for '--verbose-log'

2018-10-29 Thread Jeff King
On Mon, Oct 29, 2018 at 01:13:59PM +0100, SZEDER Gábor wrote:

> '--verbose-log' is one of the most useful and thus most frequently
> used test options, but due to its length it's a pain to type on the
> command line.
> 
> Let's introduce the corresponding short option '-V' to save some
> keystrokes.

Interesting. I'm not opposed to something like this, but I added
"--verbose-log" specifically for scripted cases, like running an
unattended "prove" that needs to preserve stdout. When running
individual tests, I'd just use "-v" itself, and possibly redirect the
output.

For my curiosity, can you describe your use case a bit more?

>  t/README  | 1 +
>  t/test-lib.sh | 4 ++--
>  2 files changed, 3 insertions(+), 2 deletions(-)

The patch itself looks good to me.

-Peff


Re: Lost changes after merge

2018-10-29 Thread Jeff King
On Mon, Oct 29, 2018 at 09:49:20AM +0100, Gray King wrote:

> Hello,
> 
>   I have a very strange issue described below:
> 
> * Here is the tree before I merge via `git log --format="%h %p %d" -n
> 20 --all --graph`:
> 
> https://upaste.de/9Pe

FWIW, neither this nor the other paste link in your email seem to work
for me (which makes it hard to comment on the rest of the email).

-Peff


Re: [PATCH] alias: detect loops in mixed execution mode

2018-10-29 Thread Jeff King
On Mon, Oct 29, 2018 at 12:44:58PM +0900, Junio C Hamano wrote:

> Jeff King  writes:
> 
> > Hmph. So I was speaking before purely hypothetically, but now that your
> > patch is in 'next', it is part of my daily build. And indeed, I hit a
> > false positive within 5 minutes of building it. ;)
> 
> Sounds like somebody is having a not-so-fun time having an "I told you
> so" moment.  The 'dotgit' thing already feels a bit convoluted but I
> would say that it is still within the realm of reasonable workflow
> elements.

To be clear, the "dotgit" thing _is_ weird and convoluted. And I imagine
that I push Git more than 99% of our users would. But I also won't be
surprised if somebody else has something similarly disgusting in the
wild. :)

TBH, I'm still not really sold on the idea of doing loop detection at
all in this case. But I can live with it if others feel strongly. What
makes the current patch so bad is that there's no escape hatch (i.e.,
even a depth counter with a default of "1" would have given me something
I could bump).

-Peff


Re: [PATCH] pretty: Add %(trailer:X) to display single trailer

2018-10-29 Thread Jeff King
On Sun, Oct 28, 2018 at 01:50:25PM +0100, Anders Waldenborg wrote:

> This new format placeholder allows displaying only a single
> trailer. The formatting done is similar to what is done for
> --decorate/%d using parentheses and comma separation.

Displaying a single trailer makes sense as a goal. It was one of the
things I considered when working on %(trailers), actually, but I ended
up needing something a bit more flexible (hence the ability to dump the
trailers in a parse-able format, where I feed them to another script).
But your ticket example makes sense for just ordinary log displays.

Junio's review already covered my biggest question, which is why not
something like "%(trailers:key=ticket)". And likewise making things like
comma-separation options.

But my second question is whether we want to provide something more
flexible than the always-parentheses that "%d" provides. That has been a
problem in the past when people want to format the decoration in some
other way.

We have formatting magic for "if this thing is non-empty, then show this
prefix" in the for-each-ref formatter, but I'm not sure that we do in
the commit pretty-printer beyond "% ". I wonder if we could/should add
a placeholder for "if this thing is non-empty, put in a space and
enclose it in parentheses".

> diff --git a/Documentation/pretty-formats.txt 
> b/Documentation/pretty-formats.txt
> index 6109ef09aa..a46d0c0717 100644
> --- a/Documentation/pretty-formats.txt
> +++ b/Documentation/pretty-formats.txt
> @@ -211,6 +211,10 @@ endif::git-rev-list[]
>If the `unfold` option is given, behave as if interpret-trailer's
>`--unfold` option was given.  E.g., `%(trailers:only,unfold)` to do
>both.
> +- %(trailer:): display the specified trailer in parentheses (like
> +  %d does for refnames). If there are multiple entries of that trailer
> +  they are shown comma separated. If there are no matching trailers
> +  nothing is displayed.

It might be worth specifying how this match is done. I'm thinking
specifically of whether it's case-sensitive, but I wonder if there
should be any allowance for other normalization (e.g., allowing a regex
to match "coauthored-by" and "co-authored-by" or something).

-Peff


Re: [PATCH 12/12] fsck: mark strings for translation

2018-10-29 Thread Junio C Hamano
SZEDER Gábor  writes:

>> -fprintf(stderr, "%s in %s %s: %s\n",
>> -msg_type, printable_type(obj), describe_object(obj), err);
>> +fprintf_ln(stderr, _("%s in %s %s: %s"),
>
> Are the (f)printf() -> (f)printf_ln() changes all over
> 'builtin/fsck.c' really necessary to mark strings for translation?

It is beyond the absolute minimum, but I saw it argued here that this
makes it easier to manage the .po and .pot files if your message
strings do not end with LF, as you are much less likely to _add_ an
unneeded LF to the translated string than to _lose_ the LF at the end
of a translated string.


Re: [PATCH] wildmatch: change behavior of "foo**bar" in WM_PATHNAME mode

2018-10-29 Thread Ævar Arnfjörð Bjarmason


On Sat, Oct 27 2018, Nguyễn Thái Ngọc Duy wrote:

> In WM_PATHNAME mode (or FNM_PATHNAME), '*' does not match '/' and '**'
> can but only in three patterns:
>
> - '**/' matches zero or more leading directories
> - '/**/' matches zero or more directories in between
> - '/**' matches zero or more trailing directories/files
>
> When '**' is present but not in one of these patterns, the current
> behavior is to consider the pattern invalid and stop matching. In other
> words, 'foo**bar' never matches anything, whatever you throw at it.
>
> This behavior is arguably a bit confusing partly because we can't
> really tell the user their pattern is invalid so that they can fix
> it. So instead, tolerate it and make '**' act like two regular '*'s
> (which is essentially the same as a single asterisk). This behavior
> seems more predictable.
>
> Noticed-by: dana 
> Signed-off-by: Nguyễn Thái Ngọc Duy 
> ---
>  Documentation/gitignore.txt | 3 ++-
>  t/t3070-wildmatch.sh| 4 ++--
>  wildmatch.c | 4 ++--
>  wildmatch.h | 1 -
>  4 files changed, 6 insertions(+), 6 deletions(-)
>
> diff --git a/Documentation/gitignore.txt b/Documentation/gitignore.txt
> index d107daaffd..1c94f08ff4 100644
> --- a/Documentation/gitignore.txt
> +++ b/Documentation/gitignore.txt
> @@ -129,7 +129,8 @@ full pathname may have special meaning:
> matches zero or more directories. For example, "`a/**/b`"
> matches "`a/b`", "`a/x/b`", "`a/x/y/b`" and so on.
>
> - - Other consecutive asterisks are considered invalid.
> + - Other consecutive asterisks are considered regular asterisks and
> +   will match according to the previous rules.
>
>  NOTES
>  -
> diff --git a/t/t3070-wildmatch.sh b/t/t3070-wildmatch.sh
> index 46aca0af10..891d4d7cb9 100755
> --- a/t/t3070-wildmatch.sh
> +++ b/t/t3070-wildmatch.sh
> @@ -237,7 +237,7 @@ match 0 0 0 0 foobar 'foo\*bar'
>  match 1 1 1 1 'f\oo' 'f\\oo'
>  match 1 1 1 1 ball '*[al]?'
>  match 0 0 0 0 ten '[ten]'
> -match 0 0 1 1 ten '**[!te]'
> +match 1 1 1 1 ten '**[!te]'
>  match 0 0 0 0 ten '**[!ten]'
>  match 1 1 1 1 ten 't[a-g]n'
>  match 0 0 0 0 ten 't[!a-g]n'
> @@ -253,7 +253,7 @@ match 1 1 1 1 ']' ']'
>  # Extended slash-matching features
>  match 0 0 1 1 'foo/baz/bar' 'foo*bar'
>  match 0 0 1 1 'foo/baz/bar' 'foo**bar'
> -match 0 0 1 1 'foobazbar' 'foo**bar'
> +match 1 1 1 1 'foobazbar' 'foo**bar'
>  match 1 1 1 1 'foo/baz/bar' 'foo/**/bar'
>  match 1 1 0 0 'foo/baz/bar' 'foo/**/**/bar'
>  match 1 1 1 1 'foo/b/a/z/bar' 'foo/**/bar'
> diff --git a/wildmatch.c b/wildmatch.c
> index d074c1be10..9e9e2a2f95 100644
> --- a/wildmatch.c
> +++ b/wildmatch.c
> @@ -104,8 +104,8 @@ static int dowild(const uchar *p, const uchar *text, 
> unsigned int flags)
>   dowild(p + 1, text, flags) == 
> WM_MATCH)
>   return WM_MATCH;
>   match_slash = 1;
> - } else
> - return WM_ABORT_MALFORMED;
> + } else /* WM_PATHNAME is set */
> + match_slash = 0;
>   } else
>   /* without WM_PATHNAME, '*' == '**' */
>   match_slash = flags & WM_PATHNAME ? 0 : 1;
> diff --git a/wildmatch.h b/wildmatch.h
> index b8c826aa68..5993696298 100644
> --- a/wildmatch.h
> +++ b/wildmatch.h
> @@ -4,7 +4,6 @@
>  #define WM_CASEFOLD 1
>  #define WM_PATHNAME 2
>
> -#define WM_ABORT_MALFORMED 2
>  #define WM_NOMATCH 1
>  #define WM_MATCH 0
>  #define WM_ABORT_ALL -1

This patch looks good to me, but I think it's a bad state of affairs to
keep changing these semantics and not having something like a
"gitwildmatch" doc were we document this matching syntax.

Also I still need to dig up the work for using PCRE as an alternate
matching engine, the PCRE devs produced a bug-for-bug compatible version
of our wildmatch function (all the more reason to document it), so I
think they'll need to change it now that this is in, but I haven't
rebased those ancient patches yet.

Do you have any thoughts on how to proceed with getting this documented
/ into some stable state where we can specify it? Even if we don't end
up using PCRE as a matching engine (sometimes it was faster, sometimes
slower) I think it would be very useful if we can spew out "here's your
pattern as a regex" for self-documentation purposes.

Then that can be piped into e.g. "perl -Mre=debug" to see a step-by-step
guide for how the pattern compiles, and why it does or doesn't match a
given thing.


Re: js/mingw-http-ssl, was Re: What's cooking in git.git (Oct 2018, #05; Fri, 26)

2018-10-29 Thread Johannes Schindelin
Hi Junio,

On Fri, 26 Oct 2018, Junio C Hamano wrote:

> Johannes Schindelin  writes:
> 
> > On Fri, 26 Oct 2018, Junio C Hamano wrote:
> >
> >> * js/mingw-http-ssl (2018-10-26) 3 commits
> >>   (merged to 'next' on 2018-10-26 at 318e82e101)
> >>  + http: when using Secure Channel, ignore sslCAInfo by default
> >>  + http: add support for disabling SSL revocation checks in cURL
> >>  + http: add support for selecting SSL backends at runtime
> >>  (this branch is used by jc/http-curlver-warnings.)
> >> 
> >>  On Windows with recent enough cURL library, the configuration
> >>  variable http.sslBackend can be used to choose between OpenSSL and
> >>  Secure Channel at runtime as the SSL backend while talking over
> >>  the HTTPS protocol.
> >
> > Just a quick clarification: the http.sslBackend feature is in no way
> > Windows-only.  Sure, it was championed there, and sure, we had the first
> > multi-ssl-capable libcurl, but this feature applies to all libcurl
> > versions that are built with multiple SSL/TLS backends.
> 
> Yeah, but "http.sslBackend can be used to choose between OpenSSL
> and Secure Channel" applies only to Windows (especially the "between
> A and B" part, when B is Windows only), right?  I had a hard time
> coming up with a phrasing to summarize what the immediate merit
> users would get from the topic in a simple paragraph.

On Linux, with an appropriately built libcurl, you can use http.sslBackend
to choose between OpenSSL, GNU TLS, NSS and mbedTLS.

> > The two patches on top are Windows-only, of course, as they really apply
> > only to the Secure Channel backend (which *is* Windows-only).
> 
> Yes, that is why the summary for the topic as a whole focuses on
> Windows, as that is the primary audience who would benefit from the
> topic.

In contrast, I think that the main purpose of this patch series is to
bring http.sslBackend to everybody. And then we also include fall-out
patches that are Windows-only. :-)

Ciao,
Dscho


Re: [PATCH] packfile: close multi-pack-index in close_all_packs

2018-10-29 Thread Derrick Stolee

On 10/29/2018 7:10 AM, SZEDER Gábor wrote:

On Thu, Oct 25, 2018 at 12:54:05PM +, Derrick Stolee wrote:

Whenever we delete pack-files from the object directory, we need
to also delete the multi-pack-index that may refer to those
objects. Sometimes, this connection is obvious, like during a
repack. Other times, this is less obvious, like when gc calls
a repack command and then does other actions on the objects, like
write a commit-graph file.

The pattern we use to avoid out-of-date in-memory packed_git
structs is to call close_all_packs(). This should also call
close_midx(). Since we already pass an object store to
close_all_packs(), this is a nicely scoped operation.

This fixes a test failure when running t6500-gc.sh with
GIT_TEST_MULTI_PACK_INDEX=1.

Reported-by: Szeder Gábor 
Signed-off-by: Derrick Stolee 
---

Thanks for the report, Szeder! I think this is the correct fix,
and more likely to be permanent across all builtins that run
auto-GC. I based it on ds/test-multi-pack-index so it could easily
be added on top.

OK, avoiding those errors in the first place is good, of course...
However, I still find it disconcerting that those errors didn't cause
'git gc' to error out, and, consequently, what other MIDX-related
errors/bugs might do the same.

When I added GIT_TEST_MULTI_PACK_INDEX, one of the important pieces was
to delete the multi-pack-index file whenever deleting the pack-files it 
contains. This only happens during a 'git repack'.


The thing I had missed (and is covered by this patch) is when we run a 
subcommand that can remove pack-files while we have a multi-pack-index open.


The reason this wasn't caught is that the test suite did not include any 
cases where the following things happened in order:


1. Open pack-files and multi-pack-index.
2. Delete pack-files, but keep multi-pack-index open.
3. Read objects (from the multi-pack-index).

This step 3 was added to 'git gc' by the commit-graph write, hence the 
break. The pack-file deletion happens in the repack subcommand.


Since close_all_packs() is the way we guarded against this problem in 
the traditional pack-file environment, this is the right place to insert 
a close_midx() call, and will avoid cases like this in the future (at 
least, the multi-pack-index will not be the reason on its own).


Thanks,
-Stolee


[PATCH] test-lib: introduce the '-V' short option for '--verbose-log'

2018-10-29 Thread SZEDER Gábor
'--verbose-log' is one of the most useful and thus most frequently
used test options, but due to its length it's a pain to type on the
command line.

Let's introduce the corresponding short option '-V' to save some
keystrokes.

Signed-off-by: SZEDER Gábor 
---

Or it could be '-L', to emphasize the "log" part of the long option, I
don't really care.

 t/README  | 1 +
 t/test-lib.sh | 4 ++--
 2 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/t/README b/t/README
index 8847489640..2e9bef2852 100644
--- a/t/README
+++ b/t/README
@@ -154,6 +154,7 @@ appropriately before running "make".
As the names depend on the tests' file names, it is safe to
run the tests with this option in parallel.
 
+-V::
 --verbose-log::
Write verbose output to the same logfile as `--tee`, but do
_not_ write it to stdout. Unlike `--tee --verbose`, this option
diff --git a/t/test-lib.sh b/t/test-lib.sh
index 897e6fcc94..47a99aa0ed 100644
--- a/t/test-lib.sh
+++ b/t/test-lib.sh
@@ -67,7 +67,7 @@ case "$GIT_TEST_TEE_STARTED, $* " in
 done,*)
# do not redirect again
;;
-*' --tee '*|*' --va'*|*' --verbose-log '*)
+*' --tee '*|*' --va'*|*' -V '*|*' --verbose-log '*)
mkdir -p "$TEST_OUTPUT_DIRECTORY/test-results"
BASE="$TEST_OUTPUT_DIRECTORY/test-results/$(basename "$0" .sh)"
 
@@ -316,7 +316,7 @@ do
echo >&2 "warning: ignoring -x; '$0' is untraceable 
without BASH_XTRACEFD"
fi
shift ;;
-   --verbose-log)
+   -V|--verbose-log)
verbose_log=t
shift ;;
*)
-- 
2.19.1.723.g6817e59acc



Re: 'ds/test-multi-pack-index' vs. 'ab/commit-graph-progress'

2018-10-29 Thread SZEDER Gábor
On Thu, Oct 25, 2018 at 11:11:55PM +0200, Ævar Arnfjörð Bjarmason wrote:
> > Anyway, this test seems to be too fragile, because that
> >
> >   test_line_count = 1 stderr
> 
> Yeah maybe it's too fragile, on the other hand it caught what seems to
> be a bug under GIT_TEST_MULTI_PACK_INDEX=true, and the rest of the test
> suite passes, so there's that.

I can imagine more prudent approaches that would have done the same,
e.g. '! grep ^error stderr'.

> > line will trigger, when anything else during 'git gc' prints
> > something.  And I find it quite strange that an option called
> > '--no-quiet' only shows the commit-graph progress, but not the regular
> > output of 'git gc'.
> 
> It's confusing, but the reason for this seeming discrepancy is that
> writing the commit-graph happens in-process, whereas the rest of the
> work done by git-gc (and its output) comes from subprocesses. Most of
> that output is from "git-repack" / "git-pack-objects" which doesn't pay
> the same attention to --quiet and --no-quiet, instead it checks
> isatty(). See cmd_pack_objects().

This explains what implementation details led to the current
behaviour, but does not justify the actual inconsistency.



Re: [PATCH] packfile: close multi-pack-index in close_all_packs

2018-10-29 Thread SZEDER Gábor
On Thu, Oct 25, 2018 at 12:54:05PM +, Derrick Stolee wrote:
> Whenever we delete pack-files from the object directory, we need
> to also delete the multi-pack-index that may refer to those
> objects. Sometimes, this connection is obvious, like during a
> repack. Other times, this is less obvious, like when gc calls
> a repack command and then does other actions on the objects, like
> write a commit-graph file.
> 
> The pattern we use to avoid out-of-date in-memory packed_git
> structs is to call close_all_packs(). This should also call
> close_midx(). Since we already pass an object store to
> close_all_packs(), this is a nicely scoped operation.
> 
> This fixes a test failure when running t6500-gc.sh with
> GIT_TEST_MULTI_PACK_INDEX=1.
> 
> Reported-by: Szeder Gábor 
> Signed-off-by: Derrick Stolee 
> ---
> 
> Thanks for the report, Szeder! I think this is the correct fix,
> and more likely to be permanent across all builtins that run
> auto-GC. I based it on ds/test-multi-pack-index so it could easily
> be added on top.

OK, avoiding those errors in the first place is good, of course...
However, I still find it disconcerting that those errors didn't cause
'git gc' to error out, and, consequently, what other MIDX-related
errors/bugs might do the same.

> -Stolee
> 
>  packfile.c | 5 +
>  1 file changed, 5 insertions(+)
> 
> diff --git a/packfile.c b/packfile.c
> index 841b36182f..37fcd8f136 100644
> --- a/packfile.c
> +++ b/packfile.c
> @@ -339,6 +339,11 @@ void close_all_packs(struct raw_object_store *o)
>   BUG("want to close pack marked 'do-not-close'");
>   else
>   close_pack(p);
> +
> + if (o->multi_pack_index) {
> + close_midx(o->multi_pack_index);
> + o->multi_pack_index = NULL;
> + }
>  }
>  
>  /*
> 
> base-commit: 0465a50506023df0932fe0534fe6ac6712c0d854
> -- 
> 2.17.1
> 


Re: [PATCH 12/12] fsck: mark strings for translation

2018-10-29 Thread SZEDER Gábor
On Sun, Oct 28, 2018 at 07:51:57AM +0100, Nguyễn Thái Ngọc Duy wrote:
> Signed-off-by: Nguyễn Thái Ngọc Duy 
> ---
>  builtin/fsck.c | 113 -
>  t/t1410-reflog.sh  |   6 +-
>  t/t1450-fsck.sh|  50 
>  t/t6050-replace.sh |   4 +-
>  t/t7415-submodule-names.sh |   6 +-
>  5 files changed, 94 insertions(+), 85 deletions(-)
> 
> diff --git a/builtin/fsck.c b/builtin/fsck.c
> index 06eb421720..13f8fe35c5 100644
> --- a/builtin/fsck.c
> +++ b/builtin/fsck.c
> @@ -108,8 +108,9 @@ static int fsck_config(const char *var, const char 
> *value, void *cb)
>  static void objreport(struct object *obj, const char *msg_type,
>   const char *err)
>  {
> - fprintf(stderr, "%s in %s %s: %s\n",
> - msg_type, printable_type(obj), describe_object(obj), err);
> + fprintf_ln(stderr, _("%s in %s %s: %s"),

Are the (f)printf() -> (f)printf_ln() changes all over
'builtin/fsck.c' really necessary to mark strings for translation?

> diff --git a/t/t1450-fsck.sh b/t/t1450-fsck.sh
> index 90c765df3a..500c21366e 100755
> --- a/t/t1450-fsck.sh
> +++ b/t/t1450-fsck.sh

> @@ -594,7 +594,7 @@ test_expect_success 'fsck --name-objects' '
>   remove_object $(git rev-parse julius:caesar.t) &&
>   test_must_fail git fsck --name-objects >out &&
>   tree=$(git rev-parse --verify julius:) &&
> - egrep "$tree \((refs/heads/master|HEAD)@\{[0-9]*\}:" out
> + test -n "$GETTEXT_POISON" || egrep "$tree 
> \((refs/heads/master|HEAD)@\{[0-9]*\}:" out

'test_i18ngrep' accepts all 'grep' options, so this could be written
as:

  test_i18ngrep -E "..." out


There might be something else wrong with the patch, because now this
test tends to fail on current 'pu' on Travis CI:

  [... snipped output from 'test_commit' ...]
  +git rev-parse julius:caesar.t
  +remove_object be45bbd3809e0829297cefa576e699c134abacfd
  +sha1_file be45bbd3809e0829297cefa576e699c134abacfd
  +remainder=45bbd3809e0829297cefa576e699c134abacfd
  +firsttwo=be
  +echo .git/objects/be/45bbd3809e0829297cefa576e699c134abacfd
  +rm .git/objects/be/45bbd3809e0829297cefa576e699c134abacfd
  +test_must_fail git fsck --name-objects
  +_test_ok=
  +git fsck --name-objects
  +exit_code=2
  +test 2 -eq 0
  +test_match_signal 13 2
  +test 2 = 141
  +test 2 = 269
  +return 1
  +test 2 -gt 129
  +test 2 -eq 127
  +test 2 -eq 126
  +return 0
  +git rev-parse --verify julius:
  +tree=c2fab98f409a47394d992eca10a20e0b22377c0c
  +test -n 
  +egrep c2fab98f409a47394d992eca10a20e0b22377c0c 
\((refs/heads/master|HEAD)@\{[0-9]*\}: out
  error: last command exited with $?=1
  not ok 65 - fsck --name-objects

The contents of 'out':

  broken link from  tree be45bbd3809e0829297cefa576e699c134abacfd (refs/heads/master@{1112912113}:caesar.t)
                to  blob be45bbd3809e0829297cefa576e699c134abacfd (refs/heads/master@{1112912113}:caesar.t)
  missing blob be45bbd3809e0829297cefa576e699c134abacfd (refs/heads/master@{1112912113}:caesar.t)

On a related (side)note, I think it would be better if this 'grep'
pattern were more explicit about which of the three lines it is
supposed to match.


Anyway, I couldn't reproduce the failure locally yet, but, admittedly,
I didn't try that hard...  And my builds on Travis CI haven't yet come
that far to try this topic on its own.



Re: Lost changes after merge

2018-10-29 Thread Gray King
After the merge, the latest commit (a008c4d580) has been lost and the
second commit (a274b6e7ca) has become the latest,
and its changes are missing too.



On 29 October 2018 at 16:49:20, Gray King
(graykin...@gmail.com) wrote:

>
>
> Hello,
>
> I have a very strange issue described below:
>
> * Here is the tree before I merge via `git log --format="%h %p %d" -n 20 
> --all --graph`:
>
> https://upaste.de/9Pe
>
> * Here is the output of `git log --format="%h %p %d" -2 path/to/file`:
>
> a008c4d580 c61f96eb5d
> a274b6e7ca 67c1000ca3
>
> * Here is the merge commands:
>
> git merge f087081868
> # fix conflicts
>
> * Here is the tree after I merged via `git log --format="%h %p %d" -n 20 
> --all --graph`:
>
> https://upaste.de/8Bx
>
>
> * Here is the output of `git log --format="%h %p %d" -2 path/to/file`:
>
> a274b6e7ca 67c1000ca3
> 67c1000ca3 00bd5c8c89


Lost changes after merge

2018-10-29 Thread Gray King
Hello,

  I have a very strange issue described below:

* Here is the tree before I merge via `git log --format="%h %p %d" -n
20 --all --graph`:

https://upaste.de/9Pe

* Here is the output of `git log --format="%h %p %d" -2 path/to/file`:

a008c4d580 c61f96eb5d
a274b6e7ca 67c1000ca3

* Here is the merge commands:

git merge f087081868
# fix conflicts

* Here is the tree after I merged via `git log --format="%h %p %d" -n
20 --all --graph`:

https://upaste.de/8Bx


* Here is the output of `git log --format="%h %p %d" -2 path/to/file`:

a274b6e7ca 67c1000ca3
67c1000ca3 00bd5c8c89


Re: [PATCH v3 7/8] push: add DWYM support for "git push refs/remotes/...:"

2018-10-29 Thread Ævar Arnfjörð Bjarmason


On Mon, Oct 29 2018, Junio C Hamano wrote:

> Ævar Arnfjörð Bjarmason   writes:
>
>> Add DWYM support for pushing a ref in refs/remotes/* when the 
>
> I think most people call it do-what-*I*-mean, not do-what-you-mean.

FWIW I picked this up from the perl list where both are used depending
on context. I.e. DW[IY]M depending on if the sentence is one where you'd
describe things from a first or second-person perspective "I implemented
this to DWIM, and it'll DWYM if you invoke it as...". Also
"dwimerry"[1].

>> ref is unqualified. Now instead of erroring out we support e.g.:
>>
>> $ ./git-push avar refs/remotes/origin/master:upstream-master -n
>> To github.com:avar/git.git
>>  * [new branch]  origin/master -> upstream-master
>>
>> Before this we wouldn't know what to do with
>> refs/remotes/origin/master, now we'll look it up and discover that
>> it's a commit (or tag) and add a refs/{heads,tags}/ prefix to  as
>> appropriate.
>
> I am not sure if this is a good change, as I already hinted earlier.
> If I were doing "git push" to any of my publishing places, I would
> be irritated if "refs/remotes/ko/master:something" created a local
> "something" branch at the desitnation, instead of erroring out.
>
> On the other hand, I do not think I mind all that much if a src that
> is a tag object to automatically go to refs/tags/ (having a tag
> object in refs/remotes/** is rare enough to matter in the first
> place).

Yeah maybe this is going too far. I don't think so, but happy to be
challenged on that point :)

I don't think so because the only reason I've ever needed this is
because I deleted some branch accidentally and am using a push from
"remotes/*" to bring it back. I.e. I'll always want branch-for-branch,
not to push that as a tag.

Of course it isn't actually a "branch", but just a commit object
(depending on the refspec configuration), so we can't quite make that
hard assumption.

But I figured screwing with how refs/remotes/* works by manually adding
new refspecs is such an advanced feature that the people doing it are
probably all here on-list, and would be the sort of users advanced
enough to either know the semantics or try this with -n.

Whereas the vast majority of users won't ever screw with it, and if they
ever push from remotes/* to another remote they almost definitely want a
new branch under the new name.

1. https://www.urbandictionary.com/define.php?term=DWYM


Re: [PATCH v3 7/8] push: add DWYM support for "git push refs/remotes/...:"

2018-10-29 Thread Ævar Arnfjörð Bjarmason


On Mon, Oct 29 2018, Junio C Hamano wrote:

> Ævar Arnfjörð Bjarmason   writes:
>
>> This is the first use of the %N$ style of printf format in
>> the *.[ch] files in our codebase. It's supported by POSIX[2] and
>> there's existing uses for it in po/*.po files,...
>
> For now, I'll eject this from 'pu', as I had spent way too much time
> trying to make it and other topics work there.

I was compiling with DEVELOPER=1 but as it turns out:

CFLAGS="-O0" DEVELOPER=1

Wasn't doing what I thought, i.e. we just take 'CFLAGS' from the
command-line and don't add any of the DEVELOPER #leftoverbits to
it. Will fix this and other issues raised.

> CC remote.o
> remote.c: In function 'show_push_unqualified_ref_name_error':
> remote.c:1035:2: error: $ operand number used after format without operand 
> number [-Werror=format=]
>   error(_("The destination you provided is not a full refname (i.e.,\n"
>   ^
> cc1: all warnings being treated as errors
> Makefile:2323: recipe for target 'remote.o' failed
> make: *** [remote.o] Error 1

Will fix this and other issues raised. FWIW clang gives a much better
error about the actual issue:

remote.c:1042:46: error: cannot mix positional and non-positional arguments 
in format string [-Werror,-Wformat]
"- Checking if the  being pushed ('%2$s')\n"

I.e. this on top fixes it:

-   "- Looking for a ref that matches '%s' on the remote 
side.\n"
-   "- Checking if the  being pushed ('%s')\n"
+   "- Looking for a ref that matches '%1$s' on the remote 
side.\n"
+   "- Checking if the  being pushed ('%2$s')\n"

Maybe this whole thing isn't worth it and I should just do:

@@ -1042 +1042 @@ static void show_push_unqualified_ref_name_error(const 
char *dst_value,
-   "- Checking if the  being pushed ('%2$s')\n"
+   "- Checking if the  being pushed ('%s')\n"
@@ -1047 +1047 @@ static void show_push_unqualified_ref_name_error(const 
char *dst_value,
- dst_value, matched_src_name);
+ dst_value, matched_src_name, matched_src_name);

But I'm leaning on the side of keeping it for the self-documentation
aspect of "this is a repeated parameter". Your objections to this whole
thing being a stupid idea notwithstanding.


Re: [PATCH v3 7/8] push: add DWYM support for "git push refs/remotes/...:"

2018-10-29 Thread Junio C Hamano
Junio C Hamano  writes:

> Ævar Arnfjörð Bjarmason   writes:
>
>> This is the first use of the %N$ style of printf format in
>> the *.[ch] files in our codebase. It's supported by POSIX[2] and
>> there's existing uses for it in po/*.po files,...
>
> For now, I'll eject this from 'pu', as I had spent way too much time
> trying to make it and other topics work there.
>
> CC remote.o
> remote.c: In function 'show_push_unqualified_ref_name_error':
> remote.c:1035:2: error: $ operand number used after format without operand 
> number [-Werror=format=]
>   error(_("The destination you provided is not a full refname (i.e.,\n"
>   ^
> cc1: all warnings being treated as errors
> Makefile:2323: recipe for target 'remote.o' failed
> make: *** [remote.o] Error 1

I'll redo 'pu' with the following fix-up on top.

 remote.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/remote.c b/remote.c
index c243e3d89e..1927c4b376 100644
--- a/remote.c
+++ b/remote.c
@@ -1035,8 +1035,8 @@ static void show_push_unqualified_ref_name_error(const 
char *dst_value,
error(_("The destination you provided is not a full refname (i.e.,\n"
"starting with \"refs/\"). We tried to guess what you meant 
by:\n"
"\n"
-   "- Looking for a ref that matches '%s' on the remote side.\n"
-   "- Checking if the  being pushed ('%s')\n"
+   "- Looking for a ref that matches '%1$s' on the remote side.\n"
+   "- Checking if the  being pushed ('%2$s')\n"
"  is a ref in \"refs/{heads,tags}/\". If so we add a 
corresponding\n"
"  refs/{heads,tags}/ prefix on the remote side.\n"
"- Checking if the  being pushed ('%2$s')\n"
-- 
2.19.1-593-gc670b1f876



Re: [PATCH 08/12] remote.c: mark messages for translation

2018-10-29 Thread Junio C Hamano
Nguyễn Thái Ngọc Duy   writes:

> @@ -995,12 +995,12 @@ static int match_explicit_lhs(struct ref *src,
>  		 * way to delete 'other' ref at the remote end.
>  		 */
>  		if (try_explicit_object_name(rs->src, match) < 0)
> -			return error("src refspec %s does not match any.", rs->src);
> +			return error(_("src refspec %s does not match any"), rs->src);
>  		if (allocated_match)
>  			*allocated_match = 1;
>  		return 0;
>  	default:
> -		return error("src refspec %s matches more than one.", rs->src);
> +		return error(_("src refspec %s matches more than one"), rs->src);
>  	}
>  }

These minor changes that are not accompanied by their own
justification mean that the patches in this series cannot blindly be
trusted, which in turn means that I won't have bandwidth to process
this series properly for now.

I also saw die() -> BUG() that was not highlighted in the proposed
log message; the single instance I happened to notice looked
sensible, but I am not sure about the others.

There are other series in flight that touch the same area of code
and in different ways, causing unnecessary conflicts, which does not
help us either X-<.




Re: [PATCH v3 7/8] push: add DWYM support for "git push refs/remotes/...:"

2018-10-29 Thread Junio C Hamano
Ævar Arnfjörð Bjarmason   writes:

> This is the first use of the %N$ style of printf format in
> the *.[ch] files in our codebase. It's supported by POSIX[2] and
> there are existing uses of it in po/*.po files,...

For now, I'll eject this from 'pu', as I had spent way too much time
trying to make it and other topics work there.

CC remote.o
remote.c: In function 'show_push_unqualified_ref_name_error':
remote.c:1035:2: error: $ operand number used after format without operand number [-Werror=format=]
  error(_("The destination you provided is not a full refname (i.e.,\n"
  ^
cc1: all warnings being treated as errors
Makefile:2323: recipe for target 'remote.o' failed
make: *** [remote.o] Error 1


Re: [PATCH] worktree: refactor lock_reason_valid and lock_reason to be more sensible

2018-10-29 Thread Junio C Hamano
Nickolai Belakovski  writes:

> Either way, I do see an issue with the current code that anybody who
> wants to know the lock status and/or lock reason of a worktree gets
> faced with a confusing, misleading, and opaque piece of code.

Sorry, I don't.  I do not mind a better documentation for
is_worktree_locked() without doing anything else.

I do not see any reason to remove fields, split the helper function
into two, drop the caching, etc., especially when the only
justification is "I am new to the codebase and find it confusing".



Re: [PATCH] worktree: refactor lock_reason_valid and lock_reason to be more sensible

2018-10-29 Thread Eric Sunshine
On Mon, Oct 29, 2018 at 1:45 AM Nickolai Belakovski
 wrote:
> On Sun, Oct 28, 2018 at 9:01 PM Eric Sunshine  wrote:
> > That said, I wouldn't necessarily oppose renaming the function, but I
> > also don't think it's particularly important to do so.
>
> To me, I would just go lookup the signature of worktree_lock_reason
> and see that it returns a pointer and I'd be satisfied with that. I
> could also infer that from looking at the code if I'm just skimming
> through. But if I see code like "reason = is_worktree_locked(wt)" I'm
> like hold on, what's going on here?! :P

I don't feel strongly about it, and, as indicated, wouldn't
necessarily be opposed to it. If you do want to make that change,
perhaps send it as the second patch of a 2-patch series in which patch
1 just updates the API documentation. That way, if anyone does oppose
the rename in patch 2, then that patch can be dropped without having
to re-send.