Re: js/mingw-http-ssl, was Re: What's cooking in git.git (Oct 2018, #05; Fri, 26)
Hi Junio, On 30 October 2018 11:07:39 GMT+05:30, Junio C Hamano wrote: >Eric Sunshine writes: >>> On platforms with recent cURL library, http.sslBackend >configuration >>> variable can be used to choose different SSL backend at runtime. >> >> s/choose/& a/ >> >>> The Windows port uses this mechanism to switch between OpenSSL >and >>> Secure Channel while talking over the HTTPS protocol. > >Thanks. > >By the way, I am beginning to like this phrasing quite a lot. It >encourages those with other ports like Linux and Mac to see if they >want a similar runtime switching by singling out the Windows port, which >was what I wanted to achieve with the original one, but I think the >updated text does so more effectively. The new phrasing indeed reads quite a bit better. -- Sivaraam Sent from my Android device with K-9 Mail. Please excuse my brevity.
Re: js/mingw-http-ssl, was Re: What's cooking in git.git (Oct 2018, #05; Fri, 26)
Eric Sunshine writes: > On Mon, Oct 29, 2018 at 10:10 PM Junio C Hamano wrote: >> How's this? >> >> On platforms with recent cURL library, http.sslBackend configuration >> variable can be used to choose different SSL backend at runtime. > > s/choose/& a/ > >> The Windows port uses this mechanism to switch between OpenSSL and >> Secure Channel while talking over the HTTPS protocol. Thanks. By the way, I am beginning to like this phrasing quite a lot. It encourages those with other ports like Linux and Mac to see if they want a similar runtime switching by singling out the Windows port, which was what I wanted to achieve with the original one, but I think the updated text does so more effectively.
Re: [RFC] Generation Number v2
Derrick Stolee writes: > **V3: Corrected Commit Date.** > For a commit C, let its _corrected commit date_ (denoted by cdate(C)) > be the maximum of the commit date of C and the commit dates of its > parents. "maximum of the commit date of C and the corrected commit dates of its parents"? We've talked about exactly this one in the past (long before any of the Microsoft folks other than Dscho came to the scene) but without an implementation, or detailed design and analysis. I am very happy to see the idea among other possibilities to be considered again. This time around, we may finally come up with something better than the "commit dates with SLOP" thing ;-). > Essentially, the felineY order is selected with the goal of swapping > positions of topologically-independent commits relative to the felineX > ordering. The resulting reachability index is as follows: > >If felineX(A) < felineY(B), then A cannot reach B. >If felineY(A) < felineY(B), then A cannot reach B. I presume that the first line is a typo and you compare the same X index? > * **Compatible?** In our test implementation, we use a previously unused > byte of data in the commit-graph format to indicate which reachability > index version we are using. Existing clients ignore this value, so we > will want to consider if these new indexes are _backwards compatible_. > That is, will they still report correct values if they ignore this byte > and use the generation number column from the commit-graph file assuming > the values are minimum generation numbers? I personally consider the current commit-graph with generation numbers experimental, so I am not sure how much we care about this. Having said that, by the above definition, any new index that is wider than the current generation number cannot be compatible (can we even tell the existing clients how wide each element in the ix array is?)
In any case, perhaps the first thing to do is to update the clients so that they stop ignoring the version number field, and instead work without generation number when there is no version of reach.ix available in the file? That way, a better reachability index can be chosen freely without having to worry about the compatibility. > * **Immutable?** Git objects are _immutable_. If you change an object you > actually create a new object with a new object ID. Are the values we > store > for these reachability indexes also immutable? Even if we do not embed the reachability ix in commit objects, having an immutable value is probably a must if we want to make them incrementally computable, so this is a very good property to have. Unless there is a clever idea to incrementally compute a mutable reach.ix, my gut instinct says that this property is a must. Another thing, perhaps related to "local" below, is whether exactly the same reach.ix is computed by anybody, given an identical commit history graph (perhaps "reproducibility"?). I think most of the candidates you listed are reproducible without a fixed tie-breaker, but I am not sure about the felineY() thing. > * **Local?** Are these values **locally computable**? That is, do we only > need to look at the parents of a commit (assuming those parents have > computed values) in order to determine the value at that commit? A subset of non-local reachability ix, for example, the ones that need to know what each commit's children are, cannot be immutable, as adding new objects to the graph (either by committing locally, or by transferring objects from other repositories) would affect the ix; is this true for all non-local reachability ix, I wonder? > We focused on three types of performance tests that test the indexes > in different ways. Each test lists the `git` command that is used, > and the table lists which repository is used and which inputs.
> > ### Test 1: `git log --topo-order -N` > > This test focuses on the number of commits that are parsed during > a `git log --topo-order` before writing `N` commits to output. A devil's advocate comment. Your patches seem to be very focused on this "unlimited" case for the past few weeks, but I am not so sure if that is a case worth optimizing for. If "git log --topo-order -N HEAD~M.." (for some number M) gives us a similar result to the unlimited case with much less effort, wouldn't it be good enough, letting us concentrate on other use cases instead? > Based on the performance results alone, we should remove minimum > generation numbers, (epoch, date) pairs, and FELINE index from > consideration. There are enough examples of these indexes performing > poorly. OK. > In contrast, maximum generation numbers and corrected commit > dates both performed quite well. They are frequently the top > two performing indexes, and rarely significantly different. > > The trade-off here now seems to be: which _property_ is more important, > locally-computable or backwards-compatible? Nice summary. As I already said, I personally do not think
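The corrected-commit-date definition under discussion (with the fix Junio points out, i.e. taking the *corrected* dates of the parents) can be sketched in a few lines. The graph, dates, and helper name below are made up for illustration; this is not git's implementation:

```python
# Hypothetical 4-commit history with clock skew: B's commit date (90) is
# earlier than its parent A's (100).
parents = {"A": [], "B": ["A"], "C": ["A"], "D": ["B", "C"]}
commit_date = {"A": 100, "B": 90, "C": 110, "D": 105}

memo = {}
def corrected_date(c):
    # cdate(C) = max(commit date of C, corrected commit dates of C's parents)
    if c not in memo:
        memo[c] = max([commit_date[c]] +
                      [corrected_date(p) for p in parents[c]])
    return memo[c]
```

By construction a commit's corrected date is never smaller than any of its ancestors', which gives the reachability property in the useful direction: if cdate(A) < cdate(B), then A cannot reach B.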
Re: [PATCH] completion: use builtin completion for format-patch
On Tue, Oct 30, 2018 at 11:20:45AM +0900, Junio C Hamano wrote: > We saw a similar change proposed and then found out it was not such > a good idea in: > > https://public-inbox.org/git/cacsjy8durvju0hn7kuceo4iv5aimwbytr+e-7kenpvdx90d...@mail.gmail.com/ > > It seems that this one loses options like --full-index, --no-prefix, > etc. compared to the earlier effort? In _git_send_email, we have the following lines: __gitcomp "--annotate --bcc --cc --cc-cmd --chain-reply-to ... more options ... $__git_format_patch_options" Would it make sense to take the old `__git_format_patch_options` and just roll them into here, then make `_git_format_patch` use `__gitcomp_builtin format-patch`? That way, we'd be able to reap the benefits of using `__gitcomp_builtin` where we can.
Re: [PATCH 0/4] index-pack: optionally turn off SHA-1 collision checking
On Sun, Oct 28, 2018 at 10:50:19PM +, Ævar Arnfjörð Bjarmason wrote: > I left the door open for that in the new config option 4/4 implements, > but I suspect for Geert's purposes this is something he'd prefer to > turn off in git on clone entirely, i.e. because it may be running on > some random Amazon's customer's EFS instance, and they won't know > about this new core.checkCollisions option. > > But maybe I'm wrong about that and Geert is happy to just turn on > core.checkCollisions=false and use this series instead. I think that the best user experience would probably be if git were fast by default without having to give up on (future) security by removing the sha1 collision check. Maybe core.checkCollisions could default to "on" only when there are no loose objects in the repository? That would give a fast experience for many common cases (git clone, git init && git fetch) while still doing the collision check when relevant. My patch used the --cloning flag as an approximation of "no loose objects". Maybe a better option would be to check for the non-existence of the [00-ff] directories under .git/objects.
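The heuristic suggested at the end could look something like this sketch (the helper name and structure are illustrative; real git would do this in C against the repository's object store, and would have to consider races with concurrent writers):

```python
import os

def has_loose_objects(objects_dir):
    # Approximate "no loose objects" by probing the 256 fan-out
    # directories objects/00 .. objects/ff for any entries at all.
    for i in range(256):
        fanout = os.path.join(objects_dir, "%02x" % i)
        if os.path.isdir(fanout) and any(os.scandir(fanout)):
            return True
    return False
```

This is 256 directory probes at most, versus one stat() per incoming object, which is the trade-off discussed in the index-pack thread elsewhere in this digest.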
Re: Lost changes after merge
Sorry, it seems the links have expired; here are the new ones: * Before merge run `git log --format="%h %p %d" -n 20 --all --graph`: https://cfp.vim-cn.com/cbfq6 * After merge run `git log --format="%h %p %d" -n 20 --all --graph`: https://cfp.vim-cn.com/cbfq7 On 29 October 2018 at 22:18:07, Jeff King (p...@peff.net) wrote: > On Mon, Oct 29, 2018 at 09:49:20AM +0100, Gray King wrote: > > Hello, > > > > I have a very strange issue described below: > > > > * Here is the tree before I merge via `git log --format="%h %p %d" -n > > 20 --all --graph`: > > > > https://upaste.de/9Pe > > FWIW, neither this nor the other paste link in your email seem to work > for me (which makes it hard to comment on the rest of the email). > > -Peff
Re: [PATCH v1] speed up refresh_index() by utilizing preload_index()
Ben Peart writes:

> From: Ben Peart
>
> Speed up refresh_index() by utilizing preload_index() to do most of the work
> spread across multiple threads. This works because most cache entries will
> get marked CE_UPTODATE so that refresh_cache_ent() can bail out early when
> called from within refresh_index().
>
> On a Windows repo with ~200K files, this drops refresh times from 6.64
> seconds to 2.87 seconds for a savings of 57%.
>
> Signed-off-by: Ben Peart
> ---

OK. We used to only expose the whole "read the index file into an
istate, and then do the lstat() part in parallel", but now we also
make the "do the lstat() part" available separately. Which makes
sense.

> diff --git a/cache.h b/cache.h
> index f7fabdde8f..883099db08 100644
> --- a/cache.h
> +++ b/cache.h
> @@ -659,6 +659,9 @@ extern int daemonize(void);
>  /* Initialize and use the cache information */
>  struct lock_file;
>  extern int read_index(struct index_state *);
> +extern void preload_index(struct index_state *index,
> +			  const struct pathspec *pathspec,
> +			  unsigned int refresh_flags);
>  extern int read_index_preload(struct index_state *,
>  			      const struct pathspec *pathspec,
>  			      unsigned int refresh_flags);
> diff --git a/preload-index.c b/preload-index.c
> index 9e7152ab14..222792ccbc 100644
> --- a/preload-index.c
> +++ b/preload-index.c
> @@ -9,7 +9,7 @@
>  #include "progress.h"
>  
>  #ifdef NO_PTHREADS
> -static void preload_index(struct index_state *index,
> +void preload_index(struct index_state *index,
>  			  const struct pathspec *pathspec,
>  			  unsigned int refresh_flags)
>  {
> @@ -100,9 +100,9 @@ static void *preload_thread(void *_data)
>  	return NULL;
>  }
>  
> -static void preload_index(struct index_state *index,
> -			  const struct pathspec *pathspec,
> -			  unsigned int refresh_flags)
> +void preload_index(struct index_state *index,
> +		   const struct pathspec *pathspec,
> +		   unsigned int refresh_flags)
>  {
>  	int threads, i, work, offset;
>  	struct thread_data data[MAX_PARALLEL];
> diff --git a/read-cache.c b/read-cache.c
> index d57958233e..53733d651d 100644
> --- a/read-cache.c
> +++ b/read-cache.c
> @@ -1496,6 +1496,12 @@ int refresh_index(struct index_state *istate, unsigned int flags,
>  	typechange_fmt = (in_porcelain ? "T\t%s\n" : "%s needs update\n");
>  	added_fmt = (in_porcelain ? "A\t%s\n" : "%s needs update\n");
>  	unmerged_fmt = (in_porcelain ? "U\t%s\n" : "%s: needs merge\n");
> +	/*
> +	 * Use the multi-threaded preload_index() to refresh most of the
> +	 * cache entries quickly then in the single threaded loop below,
> +	 * we only have to do the special cases that are left.
> +	 */
> +	preload_index(istate, pathspec, 0);
>  	for (i = 0; i < istate->cache_nr; i++) {
>  		struct cache_entry *ce, *new_entry;
>  		int cache_errno = 0;
>  
> base-commit: c670b1f876521c9f7cd40184bf7ed05aad843433
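The shape of the patch, reduced to a pure-Python sketch: stat the index entries in parallel first, so a later single-threaded pass only has to handle the leftovers. The entry structure and helper names below are invented for illustration; this is not git's C code:

```python
import os
from concurrent.futures import ThreadPoolExecutor

def preload(entries):
    """Stat paths in parallel; mark entries that can skip the slow path."""
    def check(path):
        try:
            os.lstat(path)                  # stand-in for the per-entry lstat
            entries[path]["uptodate"] = True
        except OSError:
            pass                            # leave it for the slow loop
    with ThreadPoolExecutor(max_workers=8) as pool:
        list(pool.map(check, entries))

def refresh(entries):
    preload(entries)                        # multi-threaded fast path
    # the single-threaded loop now only sees the special cases that are left
    return [p for p, e in entries.items() if not e["uptodate"]]
```

The win comes from the same observation as the commit message: most entries get marked up to date by the parallel pass, so the serial loop bails out early on almost everything.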
Re: js/mingw-http-ssl, was Re: What's cooking in git.git (Oct 2018, #05; Fri, 26)
On Mon, Oct 29, 2018 at 10:10 PM Junio C Hamano wrote: > How's this? > > On platforms with recent cURL library, http.sslBackend configuration > variable can be used to choose different SSL backend at runtime. s/choose/& a/ > The Windows port uses this mechanism to switch between OpenSSL and > Secure Channel while talking over the HTTPS protocol.
Re: [PATCH] completion: use builtin completion for format-patch
Denton Liu writes:

> Signed-off-by: Denton Liu
> ---
>  contrib/completion/git-completion.bash | 11 +--
>  1 file changed, 1 insertion(+), 10 deletions(-)

We saw a similar change proposed and then found out it was not such
a good idea in:

https://public-inbox.org/git/cacsjy8durvju0hn7kuceo4iv5aimwbytr+e-7kenpvdx90d...@mail.gmail.com/

It seems that this one loses options like --full-index, --no-prefix,
etc. compared to the earlier effort?

> diff --git a/contrib/completion/git-completion.bash b/contrib/completion/git-completion.bash
> index d63d2dffd..da77da481 100644
> --- a/contrib/completion/git-completion.bash
> +++ b/contrib/completion/git-completion.bash
> @@ -1531,15 +1531,6 @@ _git_fetch ()
>  	__git_complete_remote_or_refspec
>  }
>  
> -__git_format_patch_options="
> -	--stdout --attach --no-attach --thread --thread= --no-thread
> -	--numbered --start-number --numbered-files --keep-subject --signoff
> -	--signature --no-signature --in-reply-to= --cc= --full-index --binary
> -	--not --all --cover-letter --no-prefix --src-prefix= --dst-prefix=
> -	--inline --suffix= --ignore-if-in-upstream --subject-prefix=
> -	--output-directory --reroll-count --to= --quiet --notes
> -"
> -
>  _git_format_patch ()
>  {
>  	case "$cur" in
> @@ -1550,7 +1541,7 @@ _git_format_patch ()
>  		return
>  		;;
>  	--*)
> -		__gitcomp "$__git_format_patch_options"
> +		__gitcomp_builtin format-patch
>  		return
>  		;;
>  	esac
Re: js/mingw-http-ssl, was Re: What's cooking in git.git (Oct 2018, #05; Fri, 26)
Johannes Schindelin writes: >> >> On Windows with recent enough cURL library, the configuration >> >> variable http.sslBackend can be used to choose between OpenSSL and >> >> Secure Channel at runtime as the SSL backend while talking over >> >> the HTTPS protocol. >> ... >> Yeah, but "http.sslBackend can be used to choose between OpenSSL >> and Secure Channel" applies only to Windows ... > ... >> > The two patches on top are Windows-only, of course, as they really apply >> > only to the Secure Channel backend (which *is* Windows-only). >> >> Yes, that is why the summary for the topic as a whole focuses on >> Windows, as that is the primary audience who would benefit from the >> topic. How's this? On platforms with recent cURL library, http.sslBackend configuration variable can be used to choose different SSL backend at runtime. The Windows port uses this mechanism to switch between OpenSSL and Secure Channel while talking over the HTTPS protocol.
Re: [PATCH 09/10] read-cache.c: remove #ifdef NO_PTHREADS
Duy Nguyen writes: >> This shouldn't be needed either. My assumption was that if someone >> explicitly asked for multiple threads we're better off failing than >> silently ignoring their request. The default/automatic setting will >> detect the number of cpus and behave correctly. > > No. I can have ~/.gitconfig shared between different machines. One > with multithread support and one without. I should not have to fall > back to conditional includes in order to use the same config file on > the machine without multithread. Our attitude has been to fail unsatisfyable requests from the command line loudly to let the users know, and to tolerate having elements from future versions of Git in the configuration file by ignoring them or tweaking them (silently or with a warning). So this hunk is in line with the current practice, and we should keep it this way. Whether we want to rethink that whole attitude is a separate discussion; it may be worth having, but it is wider than this single topic.
Re: [PATCH 06/10] grep: remove #ifdef NO_PTHREADS
Jeff King writes: > On Mon, Oct 29, 2018 at 05:01:41PM +0100, Duy Nguyen wrote: > >> On Mon, Oct 29, 2018 at 3:25 PM Jeff King wrote: >> > But if the problem is simply that we are not quite there yet in the grep >> > code, I am OK with taking this as the first pass, and knowing that there >> > is more cleanup to be done later (though that sort of thing is IMHO very >> > useful in a commit message). >> >> Since the problem pops up now, I'm ok with updating/cleaning up all >> this in this series, unless there's benefits in keeping this series >> simple and merging it early (probably not?) > > Mostly I did not want to tax you. I would rather have this series and > some cleanup left over, than to not have anything. But if you are > interested in moving it further, I will not say no. :) Likewise.
Re: [PATCH 3/3] commit-reach.h: add missing declarations (hdr-check)
On 29/10/2018 01:13, Junio C Hamano wrote:
> Ramsay Jones writes:
>
>> ...
>>      24 clear_contains_cache
>> $
>>
>> you will find 24 copies of the commit-slab routines for the contains_cache.
>> Of course, when you enable optimizations again, these duplicate static
>> functions (mostly) disappear. Compiling with gcc at -O2, leaves two static
>> functions, thus:
>>
>> $ nm commit-reach.o | grep contains_cache
>> 0870 t contains_cache_at_peek.isra.1.constprop.6
>> $ nm ref-filter.o | grep contains_cache
>> 02b0 t clear_contains_cache.isra.14
>> $
>>
>> However, using a shared 'contains_cache' would result in all six of the
>> above functions as external public functions in the git binary.
>
> Sorry, but you lost me here. Where does the 6 in above 'all six'
> come from? I saw 24 (from unoptimized), and I saw 2 (from
> optimized), but...

As you already gathered, the 'six of the above functions' was the list of
6 functions which were each duplicated 24 times (you only left the last
one un-snipped above) in the unoptimized git binary.

> Ah, OK, the slab system gives us a family of 6 helper functions to
> deal with the "contains_cache" slab, and we call only 3 of them
> (i.e. the implementations of the other three are left unused). Makes
> sense.

Yep, only clear_contains_cache(), contains_cache_at() and
init_contains_cache() are called.

>> At present,
>> only three of these functions are actually called, so the trade-off
>> seems to favour letting the compiler inline the commit-slab functions.
>
> OK. I vaguely recall using a linker that can excise the
> implementation of an unused public function out of the resulting
> executable in the past, which may tip the balance in the opposite
> direction,

Yes, and I thought I was using such a linker - I was surprised that seems
not to be the case! ;-) [I know I have used such a linker, and I could
have sworn it was on Linux ...]

As I said in [1], the first version of this patch actually used a shared
contains_cache (so as not to #include "commit.h"). I changed that just
before sending it out, because I felt the patch conflated two issues - I
fully intended to send a "let's use a shared contains cache instead"
follow up patch! (Again, see [1] for the initial version of that follow
up patch.)

But then Derrick said he preferred this version of the patch and I
couldn't really justify the follow up patch, other than to say "you are
making your compiler work harder than it needs to ..." - not very
convincing! :-P

For example, applying the RFC/PATCH from [1] on top of this patch we can
see:

$ nm git | grep contains_cache
000d21f0 T clear_contains_cache
000d2400 T contains_cache_at
000d2240 T contains_cache_at_peek
000d2410 T contains_cache_peek
000d21d0 T init_contains_cache
000d2190 T init_contains_cache_with_stride
$ size git
   text	   data	    bss	    dec	    hex	filename
2531234	  70736	 274832	2876802	 2be582	git
$

whereas, without that patch on top (ie. this patch):

$ nm git | grep contains_cache
00161e30 t clear_contains_cache.isra.14
000d2190 t contains_cache_at_peek.isra.1.constprop.6
$ size git
   text	   data	    bss	    dec	    hex	filename
2530962	  70736	 274832	2876530	 2be472	git
$

which means the 'shared contains_cache' patch leads to +272 bytes of
bloat in the text section (from the 3 unused external functions).

[1] https://public-inbox.org/git/2809251f-6eba-b0ac-68fe-b972809cc...@ramsayjones.plus.com/

> but the above reasoning certainly makes sense to me.

Thanks!

ATB,
Ramsay Jones
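As a quick cross-check of the "+272 bytes" figure, the numbers come straight out of the two `size git` runs quoted above (text-section sizes only; the data and bss columns are identical in both runs):

```python
# text-section sizes from the two `size git` runs quoted in the mail
with_shared_cache = 2531234   # shared contains_cache patch applied on top
this_patch_only   = 2530962   # this patch alone

# extra text attributed to the 3 unused external slab functions
bloat = with_shared_cache - this_patch_only
```

The same delta shows up in the `dec` column (2876802 - 2876530), since only the text section changed.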
Re: [RFC PATCH] index-pack: improve performance on NFS
On Mon, Oct 29, 2018 at 03:35:58PM -0700, Stefan Beller wrote: > On Mon, Oct 29, 2018 at 3:27 PM Jeff King wrote: > > > So yeah, that's the other thing I'm thinking about regarding having a > > maximum loose cache size. > > tangent: > This preloading/caching could be used for a more precise approach > to decide when to gc instead of using some statistical sampling > of objects/17, eventually. Isn't it exactly the same thing? Ideally we'd break down the cache to the directory level, so we could fill the cache list for "17" and ask "how full are you". Or we could just readdir objects/17 ourselves. But either way, it's the same amount of work. -Peff
Re: [RFC PATCH] index-pack: improve performance on NFS
On Mon, Oct 29, 2018 at 08:36:07PM +0100, Ævar Arnfjörð Bjarmason wrote: > > Yeah, especially given recent advances in SHA-1 attacks, I'm not super > > comfortable with the idea of disabling the duplicate-object check at > > this point. > > I'd be comfortable with it in my setup since it's been limited to > collision attacks that are computationally prohibitive, and there being > no sign of preimage attacks, which is the case we really need to worry > about. I agree, and I'm not actually that worried about the current state. But what makes me more nervous is the life-cycle around Git. In 5 years, people are still going to be running what we ship today, and will grumble about upgrading to deal with SHA-1. I suppose it's not the end of the world as long as they can un-flip a config switch to get back the more-paranoid behavior (which is all that you're really proposing). > It does introduce a race condition where you can introduce a colliding > object to the repository by doing two concurrent pushes, but as you note > in > https://public-inbox.org/git/20181029151842.gj17...@sigill.intra.peff.net/ > this already applies to packs, so you can trigger that with the right > sized push (depending on transfer.unpackLimit), and we also have this in > existing forms for other stuff. Right. It can also trigger currently if somebody runs "git repack" simultaneously (the loose becomes packed, but we don't re-scan the pack directory). > I do think it's amazingly paranoid to be worried about SHA-1 collisions > in the first place, and a bit odd to leave the door open on these race > conditions. I.e. it's hard to imagine a state-level[1] actor with > sufficient motivation to exploit this who wouldn't find some way to make > the race condition work as an escape hatch. Yeah, I agree there's an element of that. I think the "push twice quickly to race" thing is actually not all that interesting, though. 
In that case, you're providing both the objects already, so why not just push the one you want? What's more interesting is racing with the victim of your collision (I feed Junio the good half of the collision, and then try to race his push and get my evil half in at the same time). Or racing a repack. But timing the race there seems a lot trickier. I suspect you could open up the window substantially by feeding your pack really slowly. So I start to push at 1pm, but trickle in a byte at a time of my 1GB pack, taking several hours. Meanwhile Junio pushes, and then as soon as I see that, I send the rest of my pack. My index-pack doesn't see Junio's push because it started before. And ditto with repack, if the servers runs it predictably in response to load. So maybe not so tricky after all. I think the other thing that helps here is that _everybody_ runs the collision check. So yeah, you can race pushing your evil stuff to my server. But it only takes one person fetching into their quiescent laptop repository to notice the collision and sound the alarm. I'll admit that there's a whole lot of hand-waving there, for a security claim. I'll be glad to simply move off of SHA-1. > In a busy repo that gets a lot of branches / branch deletions (so not > quite as extreme as [2], but close) and the default expiry policy you > can easily have 20-100K loose objects (something near the lower bound of > that is the current live state of one server I'm looking at). > > A recursive opendir()/readdir() on that on local disk is really fast if > it's in cache, but can easily be 1-5 seconds on NFS. So for a push we'd > now pay up to 5s just populating a cache we'll bearly use to accept some > tiny push with just a few objects. That 1-5 seconds is a little scary. Locally for a million objects I was looking at 400ms. But obviously NFS is going to be much worse. I do agree with your sentiment below that even if this should be on by default, it should have a config knob. 
After all, "please flip this switch and see if things improve" is a good escape hatch to have. > * Re-roll my 4 patch series to include the patch you have in ><20181027093300.ga23...@sigill.intra.peff.net> I don't think it's quite ready for inclusion as-is. I hope to brush it up a bit, but I have quite a backlog of stuff to review, as well. -Peff
Re: [PATCH] pretty: Add %(trailer:X) to display single trailer
On Mon, Oct 29, 2018 at 3:14 PM Jeff King wrote: > Junio's review already covered my biggest question, which is why not > something like "%(trailers:key=ticket)". And likewise making things like > comma-separation options. Jeff, Junio, thanks! Your questions pretty much match what I (and a colleague I discussed this with before posting) was concerned about. My first try actually had it as an option to "trailers". But it got a bit messy with the argument parsing, and the fact that there was a fast path making it work only when specified on its own. I did not want to spend a lot of time reworking that before I had some feedback, so I went for the smallest possible patch to float the idea with (a patch is worth a 1000 words). I'll start by reworking my patch to handle %(trailers:key=X) (I'll assume keys never contain ')' or ','), and ignore any formatting until the way forward there is decided (see below). > But my second question is whether we want to provide something more > flexible than the always-parentheses that "%d" provides. That has been a > problem in the past when people want to format the decoration in some > other way. Maybe, just like +/-/space can be used directly after %, a () pair could be allowed. E.g. "%d" would just be an alias for "%()D", and for trailers it would be something like "%()(trailers:key=foo)". There is another special-cased placeholder, %f (sanitized subject line, suitable for a filename), which could also be changed into a format specifier allowing anything to be sanitized, e.g. "%!an" for a sanitized author name. Is even the linebreak-to-comma-separation a generic thing? "% ,()(trailers:key=Ticket)" starts to look a bit silly. Then there are the padding modifiers, %<() and %<|(), which operate on the next placeholder: "%<(10)%s". Is that a better syntax?
"%()%(trailers:key=Ticket,comma)" I can also imagine moving all these modifiers into a generic modifier syntax in brackets (and keeping old for backwards compat) %[lpad=10,ltrunc=10]s == %<(10,trunc)%s %[nonempty-prefix="%n"]GS == %+GS %[nonempty-prefix=" (",nonempty-suffix=")"]D == %d Which would mean something like this for tickets thing: %[nonempty-prefix=" (Tickets:",nonempty-suffix=")",commaseparatelines](trailers:key=Ticket,nokey) which is kinda verbose. > We have formatting magic for "if this thing is non-empty, then show this > prefix" in the for-each-ref formatter, but I'm not sure that we do in > the commit pretty-printer beyond "% ". I wonder if we could/should add a > a placeholder for "if this thing is non-empty, put in a space and > enclose it in parentheses". Would there be any interest in consolidating those formatters? Even though they are totally separate beasts today. I think having all attributes available on long form (e.g "%(authorname)") in addition to existing short forms in pretty-formatter would make sense. anders
Re: [RFC PATCH] index-pack: improve performance on NFS
On Mon, Oct 29, 2018 at 3:27 PM Jeff King wrote: > So yeah, that's the other thing I'm thinking about regarding having a > maximum loose cache size. tangent: This preloading/caching could be used for a more precise approach to decide when to gc instead of using some statistical sampling of objects/17, eventually.
Re: [RFC PATCH] index-pack: improve performance on NFS
On Mon, Oct 29, 2018 at 09:34:53PM +, Geert Jansen wrote: > As an example, this means that when you're receiving a pack file with 1K > objects in a repository with 10K loose objects that the loose-object-cache > patch has roughly the same performance as the current git. I'm not sure if > this > is something to worry about as I'm not sure people run repos with this many > loose files. If it is a concern, there could be a flag to turn the loose > object > cache on/off. So yeah, that's the other thing I'm thinking about regarding having a maximum loose cache size. 10k objects is only 200KB in memory. That's basically nothing. At some point you run into pathological cases, like having a million objects (but that's still only 20MB, much less than we devote to other caches, though of course they do add up). If you have a million loose objects, I strongly suspect you're going to run into other problems (like space, since you're not getting any deltas). The one thing that gives me pause is that if you have a bunch of unused and unreachable loose objects on disk, most operations won't actually look at them at all. The majority of operations are only looking for objects we expect to be present (e.g., resolving a ref, walking a tree) and are fulfilled by checking the pack indices first. So it's possible that Git is _tolerable_ for most operations with a million loose objects, and we could make it slightly worse by loading the cache. But I find it hard to get too worked up about spending an extra 20MB (and the time to readdir() it in) in that case. It seems like about 400ms on my machine, and the correct next step is almost always going to be "pack" or "prune" anyway. -Peff
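The memory figures above fall out of storing one 20-byte SHA-1 per cached loose object; a quick check of the arithmetic (counting only the object IDs and ignoring any per-entry container overhead):

```python
OID_LEN = 20  # bytes in a binary SHA-1 object ID

def cache_bytes(n_loose_objects):
    # memory for a loose-object cache holding one object ID per object
    return n_loose_objects * OID_LEN
```

10k objects come to 200KB and a million to 20MB, matching the figures in the mail.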
Re: [RFC PATCH] index-pack: improve performance on NFS
On Mon, Oct 29, 2018 at 05:50:50PM -0400, Jeff King wrote:

> > I believe the loose-object-cache approach would have a performance regression
> > when you're receiving a small pack file and there's many loose objects in the
> > repo. Basically you're trading off
> >
> >     MIN(256, num_objects_in_pack / dentries_per_readdir) * readdir_latency
> >
> > against
> >
> >     num_loose_objects * stat_latency
>
> Should num_loose_objects and num_objects_in_pack be swapped here? Just
> making sure I understand what you're saying.

Whoops, yes, thanks for spotting that!

> So the 256 in your MIN() is potentially much smaller. We still have to
> deal with the fact that if you have a large number of loose objects,
> they may be split across multiple readdir (or getdents) calls. The "cache
> maximum" we discussed does bound that, but in some ways that's worse:
> you waste time doing the bounded amount of readdir and then don't even
> get the benefit of the cache. ;)

Yup. To get the performance benefit you'd like the cache to hold all loose objects except in clearly degenerate cases with far too many loose objects.

> > On Amazon EFS (and I expect on other NFS server implementations too) it is
> > more efficient to do readdir() on a large directory than to stat() each of
> > the individual files in the same directory. I don't have exact numbers but
> > based on a very rough calculation the difference is roughly 10x for large
> > directories under normal circumstances.
>
> I'd expect readdir() to be much faster than stat() in general (e.g., "ls
> -f" versus "ls -l" is faster even on a warm cache; there's more
> formatting going on in the latter, but I think a lot of it is the effort
> to stat).

In the case of NFS, the client usually requests that the READDIR response also contains some of the stat flags (like st_mode). But even in this case it's still more efficient to return multiple entries in one batch through READDIR rather than as individual responses to GETATTR (which is what stat() maps to).
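The trade-off being discussed (with the two counts swapped, per the correction above) can be played with in a toy cost model. Every latency and batch size below is an invented placeholder for illustration, not a measurement:

```python
def cache_cost(num_loose, dentries_per_readdir, readdir_latency):
    """Filling the loose-object cache: readdir batches over all loose
    objects, never fewer than the 256 fan-out directories."""
    return max(256, num_loose // dentries_per_readdir) * readdir_latency

def stat_cost(num_objects_in_pack, stat_latency):
    """Status quo: one existence check (stat) per incoming object."""
    return num_objects_in_pack * stat_latency

# made-up NFS-ish latencies: one READDIR round-trip costs two stats here
READDIR_LAT, STAT_LAT = 1.0, 0.5

# a big pack into a modest repo: the cache is a clear win
assert cache_cost(10_000, 300, READDIR_LAT) < stat_cost(100_000, STAT_LAT)

# the regression case Geert describes: a tiny pack arriving in a repo
# with a huge pile of loose objects makes the cache fill the slower path
assert cache_cost(1_000_000, 300, READDIR_LAT) > stat_cost(100, STAT_LAT)
```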
Re: [RFC PATCH] index-pack: improve performance on NFS
On Mon, Oct 29, 2018 at 09:34:53PM +, Geert Jansen wrote:

> On Mon, Oct 29, 2018 at 09:48:02AM +0900, Junio C Hamano wrote:
>
> > A real question is how much performance gain, relative to ".cloning"
> > thing, this approach gives us. If it gives us 80% or more of the
> > gain compared to doing no checking, I'd say we have a clear winner.
>
> I've tested Jeff's loose-object-cache patch and the performance is within
> error bounds of my .cloning patch. A git clone of the same repo as in my
> initial tests:
>
>     .cloning           -> 10m04
>     loose-object-cache -> 9m59
>
> Jeff's patch does a little more work (256 readdir() calls, which in case of
> an empty repo translate into 256 LOOKUP calls that return NFS4ERR_NOENT) but
> that appears to be insignificant.

Yep, that makes sense. Thanks for timing it.

> I believe the loose-object-cache approach would have a performance regression
> when you're receiving a small pack file and there's many loose objects in the
> repo. Basically you're trading off
>
>     MIN(256, num_objects_in_pack / dentries_per_readdir) * readdir_latency
>
> against
>
>     num_loose_objects * stat_latency

Should num_loose_objects and num_objects_in_pack be swapped here? Just making sure I understand what you're saying.

The patch I showed just blindly reads each of the 256 object subdirectories. I think if we pursue this (and it seems like everybody is on board), we should cache each of those individually. So a single object would incur at most one opendir/readdir (and subsequent objects may, too, or they may hit that cache if they share the first byte).

So the 256 in your MIN() is potentially much smaller. We still have to deal with the fact that if you have a large number of loose objects, they may be split across multiple readdir (or getdents) calls. The "cache maximum" we discussed does bound that, but in some ways that's worse: you waste time doing the bounded amount of readdir and then don't even get the benefit of the cache. ;)

> On Amazon EFS (and I expect on other NFS server implementations too) it is
> more efficient to do readdir() on a large directory than to stat() each of
> the individual files in the same directory. I don't have exact numbers but
> based on a very rough calculation the difference is roughly 10x for large
> directories under normal circumstances.

I'd expect readdir() to be much faster than stat() in general (e.g., "ls -f" versus "ls -l" is faster even on a warm cache; there's more formatting going on in the latter, but I think a lot of it is the effort to stat).

-Peff
Re: [RFC PATCH] index-pack: improve performance on NFS
On Mon, Oct 29, 2018 at 09:48:02AM +0900, Junio C Hamano wrote:

> A real question is how much performance gain, relative to ".cloning"
> thing, this approach gives us. If it gives us 80% or more of the
> gain compared to doing no checking, I'd say we have a clear winner.

I've tested Jeff's loose-object-cache patch and the performance is within error bounds of my .cloning patch. A git clone of the same repo as in my initial tests:

    .cloning           -> 10m04
    loose-object-cache -> 9m59

Jeff's patch does a little more work (256 readdir() calls, which in case of an empty repo translate into 256 LOOKUP calls that return NFS4ERR_NOENT) but that appears to be insignificant.

I agree that the loose-object-cache approach is preferred as it applies to more git commands and also benefits performance when there are loose objects already in the repository. As pointed out in the thread, maybe the default cache size should be some integer times gc.auto.

I believe the loose-object-cache approach would have a performance regression when you're receiving a small pack file and there's many loose objects in the repo. Basically you're trading off

    MIN(256, num_objects_in_pack / dentries_per_readdir) * readdir_latency

against

    num_loose_objects * stat_latency

On Amazon EFS (and I expect on other NFS server implementations too) it is more efficient to do readdir() on a large directory than to stat() each of the individual files in the same directory. I don't have exact numbers but based on a very rough calculation the difference is roughly 10x for large directories under normal circumstances.

As an example, this means that when you're receiving a pack file with 1K objects in a repository with 10K loose objects that the loose-object-cache patch has roughly the same performance as the current git. I'm not sure if this is something to worry about as I'm not sure people run repos with this many loose files. If it is a concern, there could be a flag to turn the loose object cache on/off.
Re: [PATCH v3 2/3] shallow: offer to prune only non-existing entries
Hi Jonathan, On Fri, 26 Oct 2018, Jonathan Tan wrote: > > So even better would be to use `is_promisor_object(oid) || > > has_object_file(oid)`, right? > > > > This is something that is probably not even needed: as I mentioned, > > the shallow commits are *expected* to be local. It should not ever > > happen that they are fetched. > > That would help, but I don't think it would help in the "fast-forward > from A to B where A is B's parent" case I describe in [1]. I don't think that that analysis is correct. It assumes that there could be a promised commit that is also listed as shallow. I do not think that is a possible scenario. And even if it were, why would asking for a promised commit object download not only that object but also all of its trees and all of its ancestors? That's not how I understood the idea of the lazy clone: I understood promised objects to be downloaded on demand, individually. > My suggestion was: > > > It sounds safer to me to use the fast approach in this patch when the > > repository is not partial, and stick to the slow approach when it is. > > which can be done by replacing "prune_shallow(0, 1)" in patch 3 with > "prune_shallow(0, !repository_format_partial_clone)", possibly with a comment > that the fast method checks object existence for each shallow line directly, > which is undesirable when the repository is a partial clone. I am afraid that that would re-introduce the bug pointed out by Peff: you *really* would need to traverse all reachable commits to use the non-fast pruning method. And we simply don't. Ciao, Dscho > (repository_format_partial_clone is non-NULL with the name of the promisor > remote if the repository is a partial clone, and NULL otherwise). > > [1] > https://public-inbox.org/git/20181025185459.206127-1-jonathanta...@google.com/ >
[PATCH v1] speed up refresh_index() by utilizing preload_index()
From: Ben Peart

Speed up refresh_index() by utilizing preload_index() to do most of the work spread across multiple threads. This works because most cache entries will get marked CE_UPTODATE so that refresh_cache_ent() can bail out early when called from within refresh_index().

On a Windows repo with ~200K files, this drops refresh times from 6.64 seconds to 2.87 seconds for a savings of 57%.

Signed-off-by: Ben Peart
---

Notes:
    Base Ref: master
    Web-Diff: https://github.com/benpeart/git/commit/feee1054c2
    Checkout: git fetch https://github.com/benpeart/git refresh-index-multithread-preload-v1 && git checkout feee1054c2

 cache.h         | 3 +++
 preload-index.c | 8 ++++----
 read-cache.c    | 6 ++++++
 3 files changed, 13 insertions(+), 4 deletions(-)

diff --git a/cache.h b/cache.h
index f7fabdde8f..883099db08 100644
--- a/cache.h
+++ b/cache.h
@@ -659,6 +659,9 @@ extern int daemonize(void);
 /* Initialize and use the cache information */
 struct lock_file;
 extern int read_index(struct index_state *);
+extern void preload_index(struct index_state *index,
+			  const struct pathspec *pathspec,
+			  unsigned int refresh_flags);
 extern int read_index_preload(struct index_state *,
			      const struct pathspec *pathspec,
			      unsigned int refresh_flags);
diff --git a/preload-index.c b/preload-index.c
index 9e7152ab14..222792ccbc 100644
--- a/preload-index.c
+++ b/preload-index.c
@@ -9,7 +9,7 @@
 #include "progress.h"
 
 #ifdef NO_PTHREADS
-static void preload_index(struct index_state *index,
+void preload_index(struct index_state *index,
 		   const struct pathspec *pathspec,
 		   unsigned int refresh_flags)
 {
@@ -100,9 +100,9 @@ static void *preload_thread(void *_data)
 	return NULL;
 }
 
-static void preload_index(struct index_state *index,
-			  const struct pathspec *pathspec,
-			  unsigned int refresh_flags)
+void preload_index(struct index_state *index,
+		   const struct pathspec *pathspec,
+		   unsigned int refresh_flags)
 {
 	int threads, i, work, offset;
 	struct thread_data data[MAX_PARALLEL];
diff --git a/read-cache.c b/read-cache.c
index d57958233e..53733d651d 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -1496,6 +1496,12 @@ int refresh_index(struct index_state *istate, unsigned int flags,
 	typechange_fmt = (in_porcelain ? "T\t%s\n" : "%s needs update\n");
 	added_fmt = (in_porcelain ? "A\t%s\n" : "%s needs update\n");
 	unmerged_fmt = (in_porcelain ? "U\t%s\n" : "%s: needs merge\n");
+	/*
+	 * Use the multi-threaded preload_index() to refresh most of the
+	 * cache entries quickly then in the single threaded loop below,
+	 * we only have to do the special cases that are left.
+	 */
+	preload_index(istate, pathspec, 0);
 	for (i = 0; i < istate->cache_nr; i++) {
 		struct cache_entry *ce, *new_entry;
 		int cache_errno = 0;

base-commit: c670b1f876521c9f7cd40184bf7ed05aad843433
-- 
2.9.2.gvfs.2.27918.g0990287eef
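The shape of the optimization above — a parallel stat() pass that settles most entries up front, leaving only stragglers for the serial loop — can be sketched outside of git. This is an illustrative Python model under my own names, not the C implementation:

```python
import os
from concurrent.futures import ThreadPoolExecutor

def preload_stats(paths, threads=8):
    """Parallel stat() pass over all paths; threads pay off when each
    stat is a network round-trip (NFS) rather than a warm-cache hit."""
    def safe_stat(p):
        try:
            return p, os.stat(p)
        except OSError:
            return p, None
    with ThreadPoolExecutor(max_workers=threads) as pool:
        return dict(pool.map(safe_stat, paths))

def refresh(paths):
    """The serial pass only handles what the preload could not settle,
    mirroring how CE_UPTODATE lets refresh_cache_ent() bail out early."""
    cached = preload_stats(paths)
    return [p for p, st in cached.items() if st is None]
```

Calling `refresh([...])` returns just the paths that still need the slow path, e.g. names that failed to stat.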
Re: [RFC] Generation Number v2
Here is a re-formatted version of the tables I introduced earlier. The tables were too wide for public-inbox to render correctly (when paired with my email client). Hopefully this bulleted-list format works better. Thanks, Stefan, for pointing out the rendering problems!

### Test 1: `git log --topo-order -N`

This test focuses on the number of commits that are parsed during a `git log --topo-order` before writing `N` commits to output. You can reproduce this test using `topo-order-tests.sh` and see all the data in `topo-order-summary.txt`. The values reported here are a sampling of the data, ignoring tests where all values were the same or extremely close in value.

- android-base, N = 100: V0: 5487, V1: 8534, V2: 6937, V3: 6419, V4: 6453
- android-base, N = 1000: V0: 36029, V1: 44030, V2: 41493, V3: 41206, V4: 45431
- chromium, N = 100: V0: 101, V1: 424406, V2: 101, V3: 101, V4: 101
- gerrit, N = 100: V0: 8212, V1: 8533, V2: 164, V3: 159, V4: 162
- gerrit, N = 1000: V0: 8512, V1: 8533, V2: 1990, V3: 1973, V4: 3766
- Linux, N = 100: V0: 12458, V1: 12444, V2: 13683, V3: 13123, V4: 13124
- Linux, N = 1000: V0: 24436, V1: 26247, V2: 27878, V3: 26430, V4: 27875
- Linux, N = 1: V0: 30364, V1: 28891, V2: 27878, V3: 26430, V4: 27875
- electron, N = 1000: V0: 19927, V1: 18733, V2: 1072, V3: 18214, V4: 18214
- Ffmpeg, N = 1: V0: 32154, V1: 47429, V2: 10435, V3: 11054, V4: 11054
- jgit, N = 1000: V0: 1550, V1: 6264, V2: 1067, V3: 1060, V4: 1233
- julia, N = 1: V0: 43043, V1: 43043, V2: 10201, V3: 23505, V4: 23828
- odoo, N = 1000: V0: 17175, V1: 9714, V2: 4043, V3: 4046, V4: 4111
- php-src, N = 1000: V0: 19014, V1: 27530, V2: 1311, V3: 1305, V4: 1320
- rails, N = 100: V0: 1420, V1: 2041, V2: 1757, V3: 1428, V4: 1441
- rails, N = 1000: V0: 7952, V1: 10145, V2: 10053, V3: 8373, V4: 8373
- swift, N = 1000: V0: 1914, V1: 4004, V2: 2071, V3: 1939, V4: 1940
- tensorflow, N = 1000: V0: 10019, V1: 39221, V2: 6711, V3: 10051, V4: 10051
- TypeScript, N = 1000: V0: 2873, V1: 12014, V2: 3049, V3: 2876, V4: 2876

### Test 2: `git log --topo-order -10 A..B`

This test focuses on the number of commits that are parsed during a `git log --topo-order A..B` before writing ten commits to output. Since we fix a very small set of output commits, we care more about the part of the walk that determines which commits are reachable from `B` but not reachable from `A`. This part of the walk uses commit date as a heuristic in the existing implementation. You can reproduce this test using `topo-compare-tests.sh` and see all the data in `topo-compare-summary.txt`. The values reported here are a sampling of the data, ignoring tests where all values were the same or extremely close in value.

_Note:_ For some of the rows, the `A` and `B` values may be swapped from what is expected. This is due to (1) a bug in the reference implementation that doesn't short-circuit the walk when `A` can reach `B`, and (2) data-entry errors by the author. The bug can be fixed, but would have introduced strange-looking rows in this table.

- android-base 53c1972bc8f 92f18ac3e39: OLD: 39403, V0: 1544, V1: 6957, V2: 26, V3: 1015, V4: 1098
- gerrit c4311f7642 777a8cd1e0: OLD: 6457, V0: 7836, V1: 10869, V2: 415, V3: 414, V4: 445
- electron 7da7dd85e addf069f2: OLD: 18164, V0: 945, V1: 6528, V2: 17, V3: 17, V4: 18
- julia 7faee1b201 e2022b9f0f: OLD: 22800, V0: 4221, V1: 12710, V2: 377, V3: 213, V4: 213
- julia ae69259cd9 c8b5402afc: OLD: 1864, V0: 1859, V1: 13287, V2: 12, V3: 1859, V4: 1859
- Linux 69973b830859 c470abd4fde4: OLD: 111692, V0: 77263, V1: 96598, V2: 80238, V3: 76332, V4: 76495
- Linux c3b92c878736 19f949f52599: OLD: 167418, V0: 5736, V1: 4684, V2: 9675, V3: 3887, V4: 3923
- Linux c8d2bc9bc39e 69973b830859: OLD: 44940, V0: 4056, V1: 16636, V2: 10405, V3: 3475, V4: 4022
- odoo 4a31f55d0a0 93fb2b4a616: OLD: 25139, V0: 19528, V1: 20418, V2: 19874, V3: 19634, V4: 27247
- swift 4046359efd b34b6a14c7: OLD: 13411, V0: 662, V1: 321, V2: 12, V3: 80, V4: 134
- tensorflow ec6d17219c fa1db5eb0d: OLD: 10373, V0: 4762, V1: 36272, V2: 174, V3: 3631, V4: 3632
- TypeScript 35ea2bea76 123edced90: OLD: 3450, V0: 267, V1: 10386, V2: 27, V3: 259, V4: 259

### Test 3: `git merge-base A B`

This test focuses on the number of commits that are parsed during a `git merge-base A B`. This part of the walk uses commit date as a heuristic in the existing implementation. You can reproduce this test using `merge-base-tests.sh` and see all the data in `merge-base-summary.txt`. The values reported here are a
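Every index version in these tests is consumed through the same property: if index(A) <= index(B) and A != B, then A cannot reach B, which is what lets these walks cut off early. A toy illustration using V0 minimum generation numbers (graph and commit names invented):

```python
def generation(commit, parents, memo=None):
    """V0 minimum generation number: 1 + max over parents, 1 for roots."""
    if memo is None:
        memo = {}
    if commit not in memo:
        memo[commit] = 1 + max(
            (generation(p, parents, memo) for p in parents[commit]),
            default=0)
    return memo[commit]

def can_reach(a, b, parents, gen):
    """Ancestor walk from a, pruned by the reachability property:
    any commit C != b with gen[C] <= gen[b] cannot reach b."""
    stack, seen = [a], set()
    while stack:
        c = stack.pop()
        if c == b:
            return True
        if c in seen or gen[c] <= gen[b]:
            continue
        seen.add(c)
        stack.extend(parents[c])
    return False

# toy history: R <- X <- A on one branch, R <- Y on another
parents = {"R": [], "X": ["R"], "Y": ["R"], "A": ["X"]}
gen = {c: generation(c, parents) for c in parents}
assert can_reach("A", "R", parents, gen)
assert not can_reach("Y", "X", parents, gen)  # pruned: equal generations
```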
Re: [PATCH v2 06/10] preload-index.c: remove #ifdef NO_PTHREADS
On Mon, Oct 29, 2018 at 02:05:58PM -0400, Ben Peart wrote: > On 10/29/2018 1:26 PM, Duy Nguyen wrote: > > On Mon, Oct 29, 2018 at 6:21 PM Ben Peart wrote: > > > @@ -114,6 +104,9 @@ static void preload_index(struct index_state *index, > > > threads = index->cache_nr / THREAD_COST; > > > if ((index->cache_nr > 1) && (threads < 2) && > > > git_env_bool("GIT_TEST_PRELOAD_INDEX", 0)) > > > threads = 2; > > > + cpus = online_cpus(); > > > + if (threads > cpus) > > > + threads = cpus; > > > if (threads < 2) > > > return; > > > trace_performance_enter(); > > > > Capping the number of threads to online_cpus() does not always make > > sense. In this case (or at least the original use case where we stat() > > over nfs) we want to issue as many requests to nfs server as possible > > to reduce latency and it has nothing to do with how many cores we > > have. Using more threads than cores might make sense since threads are > > blocked by nfs client, but we still want to send more to the server. > > > > That makes sense. Some test will be necessary then. > > I guess HAVE_THREADS is functionally the same as online_cpus() == 1. For > some reason, I prefer the latter - probably because I'm typically already > calling it and so it doesn't feel like a 'special' value/test I have to > remember. I just know I need to make sure the logic works correctly with any > number of cps greater than zero. :-) I don't think they're functionally the same. If you see online_cpus() == 1, you are in one of two cases: - the system supports threads, but CPU-bound tasks should stick to a single thread - the system does not even support threading, and calling pthread_create() will give you ENOSYS In the first case, if your task is _not_ CPU bound (i.e., the stat() thing), then you want to distinguish those cases. I'm not sure if I'm violently agreeing, but it just wasn't clear to me from the above discussion that we all agreed that HAVE_THREADS is still necessary in some cases. 
(I do think in cases where it is not, that we would do just as well to avoid it; if that is all you were saying, then I agree. ;) ). -Peff
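The distinction Peff draws — "one CPU" versus "no thread support at all" — matters because the right worker count differs for CPU-bound and IO-bound work. A sketch of such a policy (the function and the 8x oversubscription factor are invented for illustration, not git's logic):

```python
import os

def worker_count(tasks, io_bound, have_threads=True):
    """CPU-bound work is capped at online CPUs; IO-bound work (stat over
    NFS) may oversubscribe, since the threads mostly sleep in syscalls."""
    if not have_threads:          # pthread_create would fail with ENOSYS
        return 1
    cpus = os.cpu_count() or 1
    cap = cpus * 8 if io_bound else cpus   # 8x is an arbitrary choice
    return max(1, min(tasks, cap))

# CPU-bound work never exceeds the core count; no threads means one worker
assert worker_count(1000, io_bound=False) <= (os.cpu_count() or 1)
assert worker_count(1000, io_bound=True, have_threads=False) == 1
```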
Re: [RFC] Generation Number v2
On 10/29/2018 3:22 PM, Stefan Beller wrote:
>> Based on the performance results alone, we should remove minimum
>> generation numbers, (epoch, date) pairs, and FELINE index from
>> consideration. There are enough examples of these indexes performing
>> poorly.
>>
>> In contrast, maximum generation numbers and corrected commit
>> dates both performed quite well. They are frequently the top
>> two performing indexes, and rarely significantly different.
>>
>> The trade-off here now seems to be: which _property_ is more important,
>> locally-computable or backwards-compatible?
>>
>> * Maximum generation number is backwards-compatible but not
>>   locally-computable or immutable.
>
> These maximum generation numbers sound like the reverse of the
> generation numbers as they are currently implemented, i.e. we count
> all commits between the commit A and all heads. How would this impact
> creation of a commit?

The indexes listed here would be computed at the same time as a commit-graph (and only change on commit-graph writes). This idea of having something depend on the "global state" isn't ideal (in my opinion) but it certainly works for the tests specified below.

> The current generation numbers can be lazily updated or not updated at
> all. In my understanding of the maximum generation numbers, a new
> commit would make these maximum generation numbers invalid (i.e. they
> have to be recomputed).

New commits would change the value you would get by the definition (on rewrite of the commit-graph) but don't violate the reachability index property (maxgen(A) < maxgen(B) => A can't reach B). So that doesn't affect their usefulness.

> Are there ways out by deferring the computation of maximum generation
> numbers while still being able to work with some commits that are
> un-accounted for?

Creating a commit that doesn't exist in the commit-graph file results in a commit with generation number INFINITY. The new commits can't be reached by commits with finite generation number.
> When recomputing these numbers, the current generation number (V0) has
> the property that already existing numbers can be re-used as-is. We
> only need to compute the numbers for new commits, and then insert this
> to the appropriate data structures (which is a separate problem, one
> could imagine a split generation numbers file like the split index)
>
> For the V2 maximum generation numbers, would we need to rewrite the
> numbers for all commits once we recompute them? Assuming that is true,
> it sounds like the benchmark doesn't cover the whole costs associated
> with V2, which is why the exceptional performance can be explained.

You're right, we need to recompute the entire list of maximum generation number values. However, you are a bit hung up on "maxgen" being a fixed function that is correct at all times. We could compute the "maxgen" for the set of commits that are written to the "split" commit-graph while leaving the base the same.

There is one tricky part here: we need to initialize the values starting at "num commits in the combined commit-graph" (commits in the base and split portion). The minimum possible value in the split portion will be larger than the maximum possible value in the base portion. Since the (future) implementation of split commit-graphs would have all commit-parent edges pointing down into the base and not from the base into the split, this will continue to satisfy our reachability index property.

As for the cost: we need to do a topo-order walk and then a quick minimum-value check across all parents. This is O(N), and the constant is small. This may be a cost worth paying.

> (Note: The table as read in
> https://public-inbox.org/git/6367e30a-1b3a-4fe9-611b-d931f51ef...@gmail.com/
> seems line broken, using gmails web interface is not good for ASCII
> art and patches, git-send-email would fare better)

Sorry for that. I copy-pasted into Thunderbird. Hopefully most users can redirect to the GitHub-rendered markdown if they have trouble.
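The recompute being described can be sketched as a children-before-parents walk: heads start at the total commit count N, and every other commit gets one less than the minimum value among its children, preserving maxgen(parent) < maxgen(child) on every edge. This is a best-guess reading of V2 for illustration, not the reference implementation:

```python
def max_generation(parents):
    """One way to realize 'maximum generation numbers': visit every
    child before its parents, give heads the commit count N, and give
    each other commit min(maxgen of children) - 1."""
    children = {c: [] for c in parents}
    for c, ps in parents.items():
        for p in ps:
            children[p].append(c)
    n = len(parents)
    pending = {c: len(children[c]) for c in parents}
    stack = [c for c in parents if not children[c]]  # heads first
    maxgen = {}
    while stack:
        c = stack.pop()
        kids = children[c]
        maxgen[c] = n if not kids else min(maxgen[k] for k in kids) - 1
        for p in parents[c]:
            pending[p] -= 1
            if pending[p] == 0:     # all children done; parent is ready
                stack.append(p)
    return maxgen

# toy history: R is the root; X and A sit on one branch, Y on another
parents = {"R": [], "X": ["R"], "Y": ["R"], "A": ["X"]}
mg = max_generation(parents)
# every parent gets a strictly smaller value than each of its children,
# which is exactly the reachability index property
assert all(mg[p] < mg[c] for c in parents for p in parents[c])
```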
>> * Corrected commit-date is locally-computable and immutable,
>>   but not backwards-compatible.
>
> How are these dates not backwards-compatible? We don't have to expose
> these dates to the user, but - just like generation numbers - could
> store them and use them but not tell the user about it. We would need
> to be really careful to not expose them at all as they look like the
> real dates, so that could make for an awkward bug report.

By "backwards-compatible" I mean that we could store them in the commit-graph file without breaking existing clients (or by introducing extra data). We could easily add these corrected commit dates as an optional chunk, but that adds 8 bytes per commit, and we already use 8 bytes for the (generation, date) pairs.

The solution I made in my prototype is to replace the generation number with an offset for how much to add to the commit-date to get the corrected commit-date. However, an old client would interpret these offsets as generations and have incorrect output. A
Re: [RFC PATCH] index-pack: improve performance on NFS
On Mon, Oct 29 2018, Jeff King wrote: > On Sat, Oct 27, 2018 at 01:22:16PM +0200, Ævar Arnfjörð Bjarmason wrote: > >> > Taking one step back, the root problem in this thread is that stat() on >> > non-existing files is slow (which makes has_sha1_file slow). >> > >> > One solution there is to cache the results of looking in .git/objects >> > (or any alternate object store) for loose files. And indeed, this whole >> > scheme is just a specialized form of that: it's a flag to say "hey, we >> > do not have any objects yet, so do not bother looking". >> > >> > Could we implement that in a more direct and central way? And could we >> > implement it in a way that catches more cases? E.g., if I have _one_ >> > object, that defeats this specialized optimization, but it is probably >> > still beneficial to cache that knowledge (and the reasonable cutoff is >> > probably not 1, but some value of N loose objects). >> [...] >> >> The assumption with making it exactly 0 objects and not any value of >0 >> is that we can safely assume that a "clone" or initial "fetch"[1] is >> special in ways that a clone isn't. I.e. we're starting out with nothing >> and doing the initial population, that's probably not as true in an >> existing repo that's getting concurrent fetches, commits, rebases etc. > > I assume you mean s/that a clone isn't/that a fetch isn't/. Yes, sorry. > I agree there are cases where you might be able to go further if you > assume a full "0". But my point is that "clone" is an ambiguous concept, > and it doesn't map completely to what's actually slow here. So if you > only look at "are we cloning", then: > > - you have a bunch of cases which are "clones", but aren't actually > starting from scratch > > - you get zero benefit in the non-clone cases, when we could be > scaling the benefit smoothly Indeed. It's not special in principle, but I think in practice the biggest wins are in the clone case, and unlike "fetch" we can safely assume we're free of race conditions. 
More on that below. >> But in the spirit of taking a step back, maybe we should take two steps >> back and consider why we're doing this at all. > > OK, I think it's worth discussing, and I'll do that below. But first I > want to say... > >> Three of our tests fail if we compile git like this, and cloning is much >> faster (especially on NFS): >> >> diff --git a/builtin/index-pack.c b/builtin/index-pack.c >> index 2004e25da2..0c2d008ee0 100644 >> --- a/builtin/index-pack.c >> +++ b/builtin/index-pack.c >> @@ -796,3 +796,3 @@ static void sha1_object(const void *data, struct >> object_entry *obj_entry, >> >> - if (startup_info->have_repository) { >> + if (0) { >> read_lock(); >> >> Even on a local disk I'm doing 262759 lstat() calls cloning git.git and >> spending 5% of my time on that. > > With the caching patch I posted earlier, I see roughly the same speedup > on an index-pack of git.git as I do with disabling the collision check > entirely (I did see about a 1% difference in favor of what you wrote > above, which was within the noise, but may well be valid due to slightly > reduced lock contention). > > TBH I'm not sure if any of this is actually worth caring about on a > normal Linux system, though. There stat() is fast. It might be much more > interesting on macOS or Windows, or on a Linux system on NFS. It matters a *lot* on NFS as my performance numbers in https://public-inbox.org/git/87d0rslhkl@evledraar.gmail.com/ show. >> But why do we have this in the first place? It's because of 8685da4256 >> ("don't ever allow SHA1 collisions to exist by fetching a pack", >> 2007-03-20) and your 51054177b3 ("index-pack: detect local corruption in >> collision check", 2017-04-01). >> >> I.e. we are worried about (and those tests check for): >> >> a) A malicious user trying to give us repository where they have >> created an object with the same SHA-1 that's different, as in the >> SHAttered attack. 
>> >> I remember (but have not dug up) an old E-Mail from Linus saying >> that this was an important security aspect of git, i.e. even if >> SHA-1 was broken you couldn't easily propagate bad objects. > > Yeah, especially given recent advances in SHA-1 attacks, I'm not super > comfortable with the idea of disabling the duplicate-object check at > this point. I'd be comfortable with it in my setup since it's been limited to collision attacks that are computationally prohibitive, and there being no sign of preimage attacks, which is the case we really need to worry about. >> b) Cases where we've ended up with different content for a SHA-1 due to >> e.g. a local FS corruption. Which is the subject of your commit in >> 2017. > > Sort of. We actually detected it before my patch, but we just gave a > really crappy error message. ;) > >> c) Are there cases where fetch.fsckObjects is off and we just flip a >> bit on the wire and don't notice? I
Re: [RFC] Generation Number v2
> Based on the performance results alone, we should remove minimum > generation numbers, (epoch, date) pairs, and FELINE index from > consideration. There are enough examples of these indexes performing > poorly. > > In contrast, maximum generation numbers and corrected commit > dates both performed quite well. They are frequently the top > two performing indexes, and rarely significantly different. > > The trade-off here now seems to be: which _property_ is more important, > locally-computable or backwards-compatible? > > * Maximum generation number is backwards-compatible but not >locally-computable or immutable. These maximum generation numbers sound like the reverse of the generation numbers as they are currently implemented, i.e. we count all commits between the commit A and all heads. How would this impact creation of a commit? The current generation numbers can be lazily updated or not updated at all. In my understanding of the maximum generation numbers, a new commit would make these maximum generation numbers invalid (i.e. they have to be recomputed). Are there ways out by deferring the computation of maximum generation numbers while still being able to work with some commits that are un-accounted for? When recomputing these numbers, the current generation number (V0) has the property that already existing numbers can be re-used as-is. We only need to compute the numbers for new commits, and then insert this to the appropriate data structures (which is a separate problem, one could imagine a split generation numbers file like the split index) For the V2 maximum generation numbers, would we need to rewrite the numbers for all commits once we recompute them? Assuming that is true, it sounds like the benchmark doesn't cover the whole costs associated with V2, which is why the exceptional performance can be explained. 
(Note: The table as read in https://public-inbox.org/git/6367e30a-1b3a-4fe9-611b-d931f51ef...@gmail.com/ seems line broken, using gmails web interface is not good for ASCII art and patches, git-send-email would fare better)

> * Corrected commit-date is locally-computable and immutable,
>   but not backwards-compatible.

How are these dates not backwards-compatible? We don't have to expose these dates to the user, but - just like generation numbers - could store them and use them but not tell the user about it. We would need to be really careful to not expose them at all as they look like the real dates, so that could make for an awkward bug report.

The approach of "corrected commit date" sounds like we could have a very lazy approach, i.e. no extra data structures needed for many commits (as the corrected date equals the real date) and only need to store the corrections for some commits. Such an approach however would not make it easy to know if we operate on corrected dates, or if we even checked them for correctness before. So if we'd have an additional row in the generation numbers file telling the corrected date, then we should be able to be backwards compatible?
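The lazy-offset idea can be sketched as follows. Toy data; defining cdate(C) as strictly greater than every parent's cdate (the "+1") is one possible formulation for illustration, not necessarily the exact definition in the RFC:

```python
def corrected_dates(parents, date):
    """cdate(C) = max(date(C), cdate(parent) + 1 for each parent), so a
    commit's corrected date always exceeds its parents' corrected dates."""
    memo = {}
    def cdate(c):
        if c not in memo:
            memo[c] = max([date[c]] + [cdate(p) + 1 for p in parents[c]])
        return memo[c]
    return {c: cdate(c) for c in parents}

# toy history with a clock-skewed commit X, dated before its parent R
parents = {"R": [], "X": ["R"], "A": ["X"]}
date = {"R": 100, "X": 50, "A": 200}
cd = corrected_dates(parents, date)
assert cd == {"R": 100, "X": 101, "A": 200}

# store only the offsets (cdate - date): zero for most commits, so the
# corrections stay small and could be kept lazily/compactly
offsets = {c: cd[c] - date[c] for c in parents}
assert offsets == {"R": 0, "X": 51, "A": 0}
```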
Re: [RFC PATCH] index-pack: improve performance on NFS
On Mon, Oct 29 2018, Jeff King wrote: > On Mon, Oct 29, 2018 at 09:48:02AM +0900, Junio C Hamano wrote: > >> > Of course any cache raises questions of cache invalidation, but I think >> > we've already dealt with that for this case. When we use >> > OBJECT_INFO_QUICK, that is a sign that we want to make this kind of >> > accuracy/speed tradeoff (which does a similar caching thing with >> > packfiles). >> > >> > So putting that all together, could we have something like: >> >> I think this conceptually is a vast improvement relative to >> ".cloning" optimization. Obviously this does not have the huge >> downside of the other approach that turns the collision detection >> completely off. >> >> A real question is how much performance gain, relative to ".cloning" >> thing, this approach gives us. If it gives us 80% or more of the >> gain compared to doing no checking, I'd say we have a clear winner. > > My test runs showed it improving index-pack by about 3%, versus 4% for > no collision checking at all. But there was easily 1% of noise. And much > more importantly, that was on a Linux system on ext4, where stat is > fast. I'd be much more curious to hear timing results from people on > macOS or Windows, or from Geert's original NFS case. At work we make copious use of NetApp over NFS for filers. I'd say this is probably typical for enterprise environments. Raw I/O performance over the wire (writing a large file) is really good, but metadata (e.g. stat) performance tends to be atrocious. We both host the in-house Git server (GitLab) on such a filer (for HA etc.), as well as many types of clients. As noted by Geert upthread you need to mount the git directories with lookupcache=positive (see e.g. [1]). 
Cloning git.git as --bare onto such a partition with my patch:

    % time     seconds  usecs/call     calls    errors syscall
    ------ ----------- ----------- --------- --------- ----------------
     60.98    1.802091          19     93896     19813 futex
     14.64    0.432782           7     61415        16 read
      9.40    0.277804           1    199576           pread64
      4.88    0.144172           3     49355        11 write
      3.10    0.091498          31      2919      2880 stat
      2.53    0.074812          31      2431       737 lstat
      1.96    0.057934           3     17257      1276 recvfrom
      0.91    0.026815           3      8543           select
      0.62    0.018425           2      8543           poll
    [...]

    real    0m32.053s
    user    0m21.451s
    sys     0m7.806s

Without:

    % time     seconds  usecs/call     calls    errors syscall
    ------ ----------- ----------- --------- --------- ----------------
     71.01   31.653787          50    628265     21608 futex
     24.14   10.761950          41    260658    258964 lstat
      2.22    0.988001           5    199576           pread64
      1.32    0.587844          10     59662         3 read
      0.79    0.350625           7     50376        11 write
      0.22    0.096019          33      2919      2880 stat
      0.13    0.057950           4     15821        12 recvfrom
      0.05    0.022385           3      7949           select
      0.04    0.015988           2      7949           poll
      0.03    0.013622        3406         4           wait4
    [...]

    real    4m38.670s
    user    0m29.015s
    sys     0m33.894s

So a reduction in clone time by ~90%. Performance would be basically the same with your patch. But let's discuss that elsewhere in this thread. Just wanted to post the performance numbers here.

1. https://gitlab.com/gitlab-com/gl-infra/infrastructure/issues/109#note_12528896
Re: [PATCH v2 06/10] preload-index.c: remove #ifdef NO_PTHREADS
On 10/29/2018 1:26 PM, Duy Nguyen wrote:
> On Mon, Oct 29, 2018 at 6:21 PM Ben Peart wrote:
>> @@ -114,6 +104,9 @@ static void preload_index(struct index_state *index,
>>          threads = index->cache_nr / THREAD_COST;
>>          if ((index->cache_nr > 1) && (threads < 2) &&
>>              git_env_bool("GIT_TEST_PRELOAD_INDEX", 0))
>>                  threads = 2;
>> +        cpus = online_cpus();
>> +        if (threads > cpus)
>> +                threads = cpus;
>>          if (threads < 2)
>>                  return;
>>          trace_performance_enter();
>
> Capping the number of threads to online_cpus() does not always make
> sense. In this case (or at least the original use case where we stat()
> over nfs) we want to issue as many requests to nfs server as possible
> to reduce latency and it has nothing to do with how many cores we
> have. Using more threads than cores might make sense since threads are
> blocked by nfs client, but we still want to send more to the server.

That makes sense. Some test will be necessary then.

I guess HAVE_THREADS is functionally the same as online_cpus() == 1. For some reason, I prefer the latter - probably because I'm typically already calling it and so it doesn't feel like a 'special' value/test I have to remember. I just know I need to make sure the logic works correctly with any number of cpus greater than zero. :-)
Re: [PATCH 09/10] read-cache.c: remove #ifdef NO_PTHREADS
On 10/29/2018 1:21 PM, Duy Nguyen wrote: On Mon, Oct 29, 2018 at 6:05 PM Ben Peart wrote: @@ -2756,8 +2745,11 @@ static int do_write_index(struct index_state *istate, struct tempfile *tempfile, if (ce_write(&c, newfd, &hdr, sizeof(hdr)) < 0) return -1; -#ifndef NO_PTHREADS - nr_threads = git_config_get_index_threads(); + if (HAVE_THREADS) This shouldn't be needed either. My assumption was that if someone explicitly asked for multiple threads we're better off failing than silently ignoring their request. The default/automatic setting will detect the number of cpus and behave correctly. No. I can have ~/.gitconfig shared between different machines. One with multithread support and one without. I should not have to fall back to conditional includes in order to use the same config file on the machine without multithread. If you want to share the ~/.gitconfig across machines that have different numbers of CPUs, it makes more sense to me to leave the setting at the "auto" setting rather than specifically overriding it to a number that won't work on at least one of your machines. Then it all "just works" without the need for conditional includes.
[PATCH] completion: use builtin completion for format-patch
Signed-off-by: Denton Liu --- contrib/completion/git-completion.bash | 11 +-- 1 file changed, 1 insertion(+), 10 deletions(-) diff --git a/contrib/completion/git-completion.bash b/contrib/completion/git-completion.bash index d63d2dffd..da77da481 100644 --- a/contrib/completion/git-completion.bash +++ b/contrib/completion/git-completion.bash @@ -1531,15 +1531,6 @@ _git_fetch () __git_complete_remote_or_refspec } -__git_format_patch_options=" - --stdout --attach --no-attach --thread --thread= --no-thread - --numbered --start-number --numbered-files --keep-subject --signoff - --signature --no-signature --in-reply-to= --cc= --full-index --binary - --not --all --cover-letter --no-prefix --src-prefix= --dst-prefix= - --inline --suffix= --ignore-if-in-upstream --subject-prefix= - --output-directory --reroll-count --to= --quiet --notes -" - _git_format_patch () { case "$cur" in @@ -1550,7 +1541,7 @@ _git_format_patch () return ;; --*) - __gitcomp "$__git_format_patch_options" + __gitcomp_builtin format-patch return ;; esac -- 2.19.1
Re: [PATCH 12/12] fsck: mark strings for translation
On Mon, Oct 29 2018, Ævar Arnfjörð Bjarmason wrote: > On Mon, Oct 29 2018, Duy Nguyen wrote: > >> On Mon, Oct 29, 2018 at 3:09 PM Junio C Hamano wrote: >>> >>> SZEDER Gábor writes: >>> >>> >> -fprintf(stderr, "%s in %s %s: %s\n", >>> >> -msg_type, printable_type(obj), describe_object(obj), err); >>> >> +fprintf_ln(stderr, _("%s in %s %s: %s"), >>> > >>> > Are the (f)printf() -> (f)printf_ln() changes all over >>> > 'builtin/fsck.c' really necessary to mark strings for translation? >>> >>> It is beyond absolute minimum but I saw it argued here that this >>> makes it easier to manage the .po and .pot files if your message >>> strings do not end with LF, as you are much less likely to _add_ >>> unneeded LF to the translated string than _lose_ LF at the end of >>> translated string. >> >> Especially when \n plays an important role and we don't trust >> translators to keep it [1] [2]. It's probably a too defensive stance >> and often does not apply, so nowadays I do it just to keep a >> consistent pattern in the code. >> >> [1] https://public-inbox.org/git/20120308220131.GA10122@burratino/#t >> [2] but then translators can crash programs anyway (e.g. changing %d >> to %s...) we just trust gettext tools to catch problems early. > > As Jonathan pointed out in the follow-up message[1] this sort of thing > is checked for in msgfmt, so sure, let's strip the \n, but it's really > not something we need to worry about. Likewise with translators turning > "%s" into "%d" (or "% s") or whatever. > > $ git diff -U0 > diff --git a/po/de.po b/po/de.po > index 47986814c9..62de46c63d 100644 > --- a/po/de.po > +++ b/po/de.po > @@ -23 +23 @@ msgid "%shint: %.*s%s\n" > -msgstr "%sHinweis: %.*s%s\n" > +msgstr "%sHinweis: %.*s%s" > $ make [...] > [...] > po/de.po:23: 'msgid' and 'msgstr' entries do not both end with '\n' > msgfmt: found 1 fatal error > 3470 translated messages. > make: *** [Makefile:2488: po/build/locale/de/LC_MESSAGES/git.mo] Error 1 > > 1. 
> https://public-inbox.org/git/cacsjy8acuy9fziiehgc7mel4i+xp6u0pmh1rgor-wznhyt1...@mail.gmail.com/ ...to add to that, if there *are* formatting errors that --check doesn't spot let's hear about it so we can just patch gettext upstream: https://github.com/autotools-mirror/gettext/blob/master/gettext-tools/src/msgl-check.c#L572-L756 Unlike the rest of our stack where we need to support however many years old tools we can freely rely on bleeding-edge gettext features, since the only person we need to convince to upgrade is Jiang. I.e. he accepts the PRs from translators, (presumably) runs msgfmt --check and then submits the result to Junio. See the "Usually..." paragraph in my 66f5f6dca9 ("C style: use standard style for "TRANSLATORS" comments", 2017-05-11) for an example. We already rely on a fairly recent gettext toolchain.
Re: [PATCH 12/12] fsck: mark strings for translation
On Mon, Oct 29 2018, Duy Nguyen wrote: > On Mon, Oct 29, 2018 at 3:09 PM Junio C Hamano wrote: >> >> SZEDER Gábor writes: >> >> >> -fprintf(stderr, "%s in %s %s: %s\n", >> >> -msg_type, printable_type(obj), describe_object(obj), err); >> >> +fprintf_ln(stderr, _("%s in %s %s: %s"), >> > >> > Are the (f)printf() -> (f)printf_ln() changes all over >> > 'builtin/fsck.c' really necessary to mark strings for translation? >> >> It is beyond absolute minimum but I saw it argued here that this >> makes it easier to manage the .po and .pot files if your message >> strings do not end with LF, as you are much less likely to _add_ >> unneeded LF to the translated string than _lose_ LF at the end of >> translated string. > > Especially when \n plays an important role and we don't trust > translators to keep it [1] [2]. It's probably a too defensive stance > and often does not apply, so nowadays I do it just to keep a > consistent pattern in the code. > > [1] https://public-inbox.org/git/20120308220131.GA10122@burratino/#t > [2] but then translators can crash programs anyway (e.g. changing %d > to %s...) we just trust gettext tools to catch problems early. As Jonathan pointed out in the follow-up message[1] this sort of thing is checked for in msgfmt, so sure, let's strip the \n, but it's really not something we need to worry about. Likewise with translators turning "%s" into "%d" (or "% s") or whatever. $ git diff -U0 diff --git a/po/de.po b/po/de.po index 47986814c9..62de46c63d 100644 --- a/po/de.po +++ b/po/de.po @@ -23 +23 @@ msgid "%shint: %.*s%s\n" -msgstr "%sHinweis: %.*s%s\n" +msgstr "%sHinweis: %.*s%s" $ make [...] [...] po/de.po:23: 'msgid' and 'msgstr' entries do not both end with '\n' msgfmt: found 1 fatal error 3470 translated messages. make: *** [Makefile:2488: po/build/locale/de/LC_MESSAGES/git.mo] Error 1 1. https://public-inbox.org/git/cacsjy8acuy9fziiehgc7mel4i+xp6u0pmh1rgor-wznhyt1...@mail.gmail.com/
Re: [PATCH v2 06/10] preload-index.c: remove #ifdef NO_PTHREADS
On Mon, Oct 29, 2018 at 6:21 PM Ben Peart wrote: > @@ -114,6 +104,9 @@ static void preload_index(struct index_state *index, > threads = index->cache_nr / THREAD_COST; > if ((index->cache_nr > 1) && (threads < 2) && > git_env_bool("GIT_TEST_PRELOAD_INDEX", 0)) > threads = 2; > + cpus = online_cpus(); > + if (threads > cpus) > + threads = cpus; > if (threads < 2) > return; > trace_performance_enter(); Capping the number of threads to online_cpus() does not always make sense. In this case (or at least the original use case where we stat() over nfs) we want to issue as many requests to nfs server as possible to reduce latency and it has nothing to do with how many cores we have. Using more threads than cores might make sense since threads are blocked by nfs client, but we still want to send more to the server. -- Duy
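For context, the function containing the quoted hunk splits the index into contiguous, roughly equal chunks, one per thread, and that partitioning is independent of where the thread count itself comes from. A standalone sketch of the idea (`struct chunk` and `split_entries` are invented names; only the DIV_ROUND_UP arithmetic mirrors what preload-index.c does):

```c
#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

struct chunk {
	int offset; /* index of the first entry this thread handles */
	int work;   /* number of entries it handles */
};

/* Fill chunks[0..threads-1]; returns how many threads actually get
 * work.  The last chunk may be short, and trailing threads may get
 * nothing when nr < threads. */
static int split_entries(int nr, int threads, struct chunk *chunks)
{
	int work = DIV_ROUND_UP(nr, threads);
	int offset = 0, used = 0;

	while (offset < nr && used < threads) {
		chunks[used].offset = offset;
		chunks[used].work = (nr - offset < work) ? nr - offset : work;
		offset += chunks[used].work;
		used++;
	}
	return used;
}
```

Whether `threads` is capped to the CPU count or deliberately oversubscribed for I/O-bound stat() calls, as argued above, only changes the number of chunks, not the shape of the split.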
Re: [PATCH v2 08/10] read-cache.c: remove #ifdef NO_PTHREADS
On 10/29/2018 10:30 AM, Jeff King wrote: On Sat, Oct 27, 2018 at 07:30:06PM +0200, Nguyễn Thái Ngọc Duy wrote: -#ifndef NO_PTHREADS - nr_threads = git_config_get_index_threads(); + if (HAVE_THREADS) { + nr_threads = git_config_get_index_threads(); - /* TODO: does creating more threads than cores help? */ - if (!nr_threads) { - nr_threads = istate->cache_nr / THREAD_COST; - cpus = online_cpus(); - if (nr_threads > cpus) - nr_threads = cpus; + /* TODO: does creating more threads than cores help? */ + if (!nr_threads) { + nr_threads = istate->cache_nr / THREAD_COST; + cpus = online_cpus(); + if (nr_threads > cpus) + nr_threads = cpus; + } + } else { + nr_threads = 1; } I'd have thought we could just rely on online_cpus() returning 1 here to avoid having to ask "do we even have thread support?". But I guess that TODO comment implies that we might one day want 2 threads on a single CPU. -Peff And here is the patch I created when testing out the idea of removing NO_PTHREADS. It doesn't require any 'HAVE_THREADS' tests.
diff --git a/read-cache.c b/read-cache.c index 1df5c16dbc..1f088799fc 100644 --- a/read-cache.c +++ b/read-cache.c @@ -1920,19 +1920,15 @@ struct index_entry_offset_table struct index_entry_offset entries[FLEX_ARRAY]; }; -#ifndef NO_PTHREADS static struct index_entry_offset_table *read_ieot_extension(const char *mmap, size_t mmap_size, size_t offset ); static void write_ieot_extension(struct strbuf *sb, struct index_entry_offset_table *ieot); -#endif static size_t read_eoie_extension(const char *mmap, size_t mmap_size); static void write_eoie_extension(struct strbuf *sb, git_hash_ctx *eoie_context, size_t offset); struct load_index_extensions { -#ifndef NO_PTHREADS pthread_t pthread; -#endif struct index_state *istate; const char *mmap; size_t mmap_size; @@ -2010,8 +2006,6 @@ static unsigned long load_all_cache_entries(struct index_state *istate, return consumed; } -#ifndef NO_PTHREADS - /* * Mostly randomly chosen maximum thread counts: we * cap the parallelism to online_cpus() threads, and we want @@ -2122,7 +2116,6 @@ static unsigned long load_cache_entries_threaded(struct index_state *istate, con return consumed; } -#endif /* remember to discard_cache() before reading a different cache! */ int do_read_index(struct index_state *istate, const char *path, int must_exist) @@ -2135,10 +2128,8 @@ int do_read_index(struct index_state *istate, const char *path, int must_exist) size_t mmap_size; struct load_index_extensions p; size_t extension_offset = 0; -#ifndef NO_PTHREADS int nr_threads, cpus; struct index_entry_offset_table *ieot = NULL; -#endif if (istate->initialized) return istate->cache_nr; @@ -2181,16 +2172,12 @@ int do_read_index(struct index_state *istate, const char *path, int must_exist ) src_offset = sizeof(*hdr); -#ifndef NO_PTHREADS nr_threads = git_config_get_index_threads(); - - /* TODO: does creating more threads than cores help? 
*/ - if (!nr_threads) { + if (!nr_threads) nr_threads = istate->cache_nr / THREAD_COST; - cpus = online_cpus(); - if (nr_threads > cpus) - nr_threads = cpus; - } + cpus = online_cpus(); + if (nr_threads > cpus) + nr_threads = cpus; if (nr_threads > 1) { extension_offset = read_eoie_extension(mmap, mmap_size); @@ -2219,22 +2206,16 @@ int do_read_index(struct index_state *istate, const char *path, int must_exist) } else { src_offset += load_all_cache_entries(istate, mmap, mmap_size, src_offset); } -#else - src_offset += load_all_cache_entries(istate, mmap, mmap_size, src_offset); -#endif istate->timestamp.sec = st.st_mtime; istate->timestamp.nsec = ST_MTIME_NSEC(st); /* if we created a thread, join it otherwise load the extensions on the primary thread */ -#ifndef NO_PTHREADS if (extension_offset) { int ret = pthread_join(p.pthread, NULL); if (ret) die(_("unable to join load_index_extensions thread: %s"), strerror(ret)); - } -#endif - if (!extension_offset) { + } else { p.src_offset = src_offset; load_index_extensions(&p); } @@ -2756,7 +2737,6 @@ static int do_write_index(struct index_state *istate, struct tempfile *tempfile, if (ce_write(&c, newfd, &hdr, sizeof(hdr)) < 0) return -1; -#ifndef NO_PTHREADS nr_threads =
Re: [PATCH 09/10] read-cache.c: remove #ifdef NO_PTHREADS
On Mon, Oct 29, 2018 at 6:05 PM Ben Peart wrote: > > @@ -2756,8 +2745,11 @@ static int do_write_index(struct index_state > > *istate, struct tempfile *tempfile, > > if (ce_write(&c, newfd, &hdr, sizeof(hdr)) < 0) > > return -1; > > > > -#ifndef NO_PTHREADS > > - nr_threads = git_config_get_index_threads(); > > + if (HAVE_THREADS) > > This shouldn't be needed either. My assumption was that if someone > explicitly asked for multiple threads we're better off failing than > silently ignoring their request. The default/automatic setting will > detect the number of cpus and behave correctly. No. I can have ~/.gitconfig shared between different machines. One with multithread support and one without. I should not have to fall back to conditional includes in order to use the same config file on the machine without multithread. -- Duy
Re: [PATCH v2 06/10] preload-index.c: remove #ifdef NO_PTHREADS
On 10/27/2018 1:30 PM, Nguyễn Thái Ngọc Duy wrote: Signed-off-by: Nguyễn Thái Ngọc Duy --- preload-index.c | 15 ++- 1 file changed, 2 insertions(+), 13 deletions(-) diff --git a/preload-index.c b/preload-index.c index 9e7152ab14..0e24886aca 100644 --- a/preload-index.c +++ b/preload-index.c @@ -7,17 +7,7 @@ #include "fsmonitor.h" #include "config.h" #include "progress.h" - -#ifdef NO_PTHREADS -static void preload_index(struct index_state *index, - const struct pathspec *pathspec, - unsigned int refresh_flags) -{ - ; /* nothing */ -} -#else - -#include <pthread.h> +#include "thread-utils.h" /* * Mostly randomly chosen maximum thread counts: we @@ -108,7 +98,7 @@ static void preload_index(struct index_state *index, struct thread_data data[MAX_PARALLEL]; struct progress_data pd; - if (!core_preload_index) + if (!HAVE_THREADS || !core_preload_index) return; threads = index->cache_nr / THREAD_COST; @@ -151,7 +141,6 @@ static void preload_index(struct index_state *index, trace_performance_leave("preload index"); } -#endif int read_index_preload(struct index_state *index, const struct pathspec *pathspec, On the theory that code speaks louder than comments, here is the patch I used when testing out the idea of getting rid of NO_PTHREADS: git diff head~1 preload-index.c diff --git a/preload-index.c b/preload-index.c index 9e7152ab14..266bc9d8d7 100644 --- a/preload-index.c +++ b/preload-index.c @@ -7,17 +7,7 @@ #include "fsmonitor.h" #include "config.h" #include "progress.h" - -#ifdef NO_PTHREADS -static void preload_index(struct index_state *index, - const struct pathspec *pathspec, - unsigned int refresh_flags) -{ - ; /* nothing */ -} -#else - -#include <pthread.h> +#include "thread-utils.h" /* * Mostly randomly chosen maximum thread counts: we @@ -104,7 +94,7 @@ static void preload_index(struct index_state *index, const struct pathspec *pathspec, unsigned int refresh_flags) { - int threads, i, work, offset; + int cpus, threads, i, work, offset; struct thread_data data[MAX_PARALLEL]; struct
progress_data pd; @@ -114,6 +104,9 @@ static void preload_index(struct index_state *index, threads = index->cache_nr / THREAD_COST; if ((index->cache_nr > 1) && (threads < 2) && git_env_bool("GIT_TEST_PRELOAD_INDEX", 0)) threads = 2; + cpus = online_cpus(); + if (threads > cpus) + threads = cpus; if (threads < 2) return; trace_performance_enter(); @@ -151,7 +144,6 @@ static void preload_index(struct index_state *index, trace_performance_leave("preload index"); } -#endif int read_index_preload(struct index_state *index, const struct pathspec *pathspec,
Re: [PATCH v3 2/8] Add a place for (not) sharing stuff between worktrees
On Mon, Oct 22, 2018 at 6:28 AM Junio C Hamano wrote: > > When multiple worktrees are used, we need rules to determine if > > something belongs to one worktree or all of them. Instead of keeping > > adding rules when new stuff comes (*), have a generic rule: > > > > - Inside $GIT_DIR, which is per-worktree by default, add > > $GIT_DIR/common which is always shared. New features that want to > > share stuff should put stuff under this directory. > > > > - Inside refs/, which is shared by default except refs/bisect, add > > refs/worktree/ which is per-worktree. We may eventually move > > refs/bisect to this new location and remove the exception in refs > > code. > > > > (*) And it may also include stuff from external commands which will > > have no way to modify common/per-worktree rules. > > OK. Establishing such a convention is a good role for core-git to > play to help third-party tools. > > Should this play well with the per-worktree configuration as well? > Is it better to carve out a configuration variable namespace so that > certain keys are never read from common ones (or per-worktree ones), > so that people can tell which ones are what? I know your current > design says "this is just another new layer, and the users can hang > themselves with this new rope". I am wondering if there is a need > to do something a bit more structured. I actually find the layered design of config files more elegant. Even before config.worktree is added, the user has system/per-user/per-repo places to put stuff in. "git config" gives control to read or write from a specific layer. We have to add layers to $GIT_DIR/refs to handle multiple worktrees, but since the "API" for those don't have the concept of layers at all, we need a single view where items from different worktrees are accessible via the same path/ref name. Adding a variable namespace in config files does not look easy either. 
Suppose we have a perWorktree.* section just for per-worktree stuff, what do we do when the user just does "git config -f not-per-worktree-file perWorktree.something foo" (or --global instead of -f)? What we could do (and already do for some config keys) is ignore perWorktree.* unless they come from config.worktree. Which does help a bit, but not really perfect. And a namespace like this means the user cannot have subsections anymore (e.g. perWorktree.group..var is not supported if I remember correctly). So yeah... perhaps we should wait a bit and see what the user (or third party tool makers) comes up with. -- Duy
Re: [PATCH v2 08/10] read-cache.c: remove #ifdef NO_PTHREADS
On 10/29/2018 10:30 AM, Jeff King wrote: On Sat, Oct 27, 2018 at 07:30:06PM +0200, Nguyễn Thái Ngọc Duy wrote: -#ifndef NO_PTHREADS - nr_threads = git_config_get_index_threads(); + if (HAVE_THREADS) { + nr_threads = git_config_get_index_threads(); - /* TODO: does creating more threads than cores help? */ - if (!nr_threads) { - nr_threads = istate->cache_nr / THREAD_COST; - cpus = online_cpus(); - if (nr_threads > cpus) - nr_threads = cpus; + /* TODO: does creating more threads than cores help? */ + if (!nr_threads) { + nr_threads = istate->cache_nr / THREAD_COST; + cpus = online_cpus(); + if (nr_threads > cpus) + nr_threads = cpus; + } + } else { + nr_threads = 1; } I'd have thought we could just rely on online_cpus() returning 1 here to avoid having to ask "do we even have thread support?". But I guess that TODO comment implies that we might one day want 2 threads on a single CPU. -Peff See my earlier response - the HAVE_THREADS tests are not needed. We can remove the "TODO" comment - I tested it and it wasn't faster to have more threads than CPUs.
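The join-or-run-inline shape being simplified in this series can be sketched in isolation. Everything below is invented for illustration: the real code hands index-extension parsing to the worker while the main thread parses cache entries, whereas this toy just sums an array.

```c
#include <pthread.h>

/* Toy stand-in for the "optionally offload to one worker thread,
 * then join it; otherwise call the function inline" pattern. */
struct ext_job {
	pthread_t thread;
	const int *data;
	int n;
	long result;
};

static void *ext_worker(void *arg)
{
	struct ext_job *job = arg;
	long sum = 0;
	int i;

	for (i = 0; i < job->n; i++)
		sum += job->data[i];
	job->result = sum;
	return NULL;
}

static long load_with_optional_thread(const int *data, int n, int use_thread)
{
	struct ext_job job = { .data = data, .n = n, .result = 0 };

	if (use_thread && !pthread_create(&job.thread, NULL, ext_worker, &job)) {
		/* the main thread could do other parsing here, in parallel */
		pthread_join(job.thread, NULL);
	} else {
		ext_worker(&job); /* serial fallback, as in a NO_PTHREADS build */
	}
	return job.result;
}
```

Both paths produce the same answer, which is the point of the cleanup: callers should not have to care whether a thread was actually created.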
Re: [PATCH 09/10] read-cache.c: remove #ifdef NO_PTHREADS
On 10/27/2018 3:10 AM, Nguyễn Thái Ngọc Duy wrote: Signed-off-by: Nguyễn Thái Ngọc Duy --- read-cache.c | 49 ++--- 1 file changed, 18 insertions(+), 31 deletions(-) diff --git a/read-cache.c b/read-cache.c index d57958233e..ba870bc3fd 100644 --- a/read-cache.c +++ b/read-cache.c @@ -1920,19 +1920,15 @@ struct index_entry_offset_table struct index_entry_offset entries[FLEX_ARRAY]; }; -#ifndef NO_PTHREADS static struct index_entry_offset_table *read_ieot_extension(const char *mmap, size_t mmap_size, size_t offset); static void write_ieot_extension(struct strbuf *sb, struct index_entry_offset_table *ieot); -#endif static size_t read_eoie_extension(const char *mmap, size_t mmap_size); static void write_eoie_extension(struct strbuf *sb, git_hash_ctx *eoie_context, size_t offset); struct load_index_extensions { -#ifndef NO_PTHREADS pthread_t pthread; -#endif struct index_state *istate; const char *mmap; size_t mmap_size; @@ -2010,8 +2006,6 @@ static unsigned long load_all_cache_entries(struct index_state *istate, return consumed; } -#ifndef NO_PTHREADS - /* * Mostly randomly chosen maximum thread counts: we * cap the parallelism to online_cpus() threads, and we want @@ -2122,7 +2116,6 @@ static unsigned long load_cache_entries_threaded(struct index_state *istate, con return consumed; } -#endif /* remember to discard_cache() before reading a different cache! 
*/ int do_read_index(struct index_state *istate, const char *path, int must_exist) @@ -2135,10 +2128,8 @@ int do_read_index(struct index_state *istate, const char *path, int must_exist) size_t mmap_size; struct load_index_extensions p; size_t extension_offset = 0; -#ifndef NO_PTHREADS int nr_threads, cpus; struct index_entry_offset_table *ieot = NULL; -#endif if (istate->initialized) return istate->cache_nr; @@ -2181,15 +2172,18 @@ int do_read_index(struct index_state *istate, const char *path, int must_exist) src_offset = sizeof(*hdr); -#ifndef NO_PTHREADS - nr_threads = git_config_get_index_threads(); + if (HAVE_THREADS) { In this case, nr_threads is already capped to online_cpus() below so this HAVE_THREADS test isn't needed. + nr_threads = git_config_get_index_threads(); - /* TODO: does creating more threads than cores help? */ - if (!nr_threads) { - nr_threads = istate->cache_nr / THREAD_COST; - cpus = online_cpus(); - if (nr_threads > cpus) - nr_threads = cpus; + /* TODO: does creating more threads than cores help? */ + if (!nr_threads) { + nr_threads = istate->cache_nr / THREAD_COST; + cpus = online_cpus(); + if (nr_threads > cpus) + nr_threads = cpus; + } + } else { + nr_threads = 1; } if (nr_threads > 1) { @@ -2219,21 +2213,16 @@ int do_read_index(struct index_state *istate, const char *path, int must_exist) } else { src_offset += load_all_cache_entries(istate, mmap, mmap_size, src_offset); } -#else - src_offset += load_all_cache_entries(istate, mmap, mmap_size, src_offset); -#endif istate->timestamp.sec = st.st_mtime; istate->timestamp.nsec = ST_MTIME_NSEC(st); /* if we created a thread, join it otherwise load the extensions on the primary thread */ -#ifndef NO_PTHREADS - if (extension_offset) { + if (HAVE_THREADS && extension_offset) { Here extension_offset won't be set if there wasn't a thread created so the HAVE_THREADS test is not needed. 
int ret = pthread_join(p.pthread, NULL); if (ret) die(_("unable to join load_index_extensions thread: %s"), strerror(ret)); } -#endif if (!extension_offset) { p.src_offset = src_offset; load_index_extensions(&p); @@ -2756,8 +2745,11 @@ static int do_write_index(struct index_state *istate, struct tempfile *tempfile, if (ce_write(&c, newfd, &hdr, sizeof(hdr)) < 0) return -1; -#ifndef NO_PTHREADS - nr_threads = git_config_get_index_threads(); + if (HAVE_THREADS) This shouldn't be needed either. My assumption was that if someone explicitly asked for multiple threads we're better off failing than silently ignoring their request. The default/automatic setting will detect the number of cpus and behave correctly. + nr_threads = git_config_get_index_threads(); + else + nr_threads = 1; + if (nr_threads != 1) { int ieot_blocks, cpus; @@ -2787,7 +2779,6 @@ static int do_write_index(struct index_state *istate, struct tempfile
[PATCH v2 1/1] remote-curl.c: xcurl_off_t is not portable (on 32 bit platforms)
From: Torsten Bögershausen When setting DEVELOPER = 1 DEVOPTS = extra-all "gcc (Raspbian 6.3.0-18+rpi1+deb9u1) 6.3.0 20170516" errors out with "comparison is always false due to limited range of data type" "[-Werror=type-limits]" It turns out that the function xcurl_off_t() has 2 flavours: - It gives a warning on 32 bit systems, like Linux - It takes the signed ssize_t as a parameter, but the only caller is using a size_t (which is typically unsigned these days) The original motivation of this function is to make sure that sizes > 2GiB are handled correctly. The curl documentation says: "For any given platform/compiler curl_off_t must be typedef'ed to a 64-bit wide signed integral data type" On a 32 bit system "size_t" can be promoted into a 64 bit signed value without loss of data, and therefore we may see the "comparison is always false" warning. On a 64 bit system it may happen, at least in theory, that size_t is > 2^63, and then the promotion from an unsigned "size_t" into a signed "curl_off_t" may be a problem. One solution to suppress a possible compiler warning could be to remove the function xcurl_off_t(). However, to be on the very safe side, we keep it and improve it: - The len parameter is changed from ssize_t to size_t - A temporary variable "size" is used, promoted into uintmax_t and then compared with "maximum_signed_value_of_type(curl_off_t)". Thanks to Junio C Hamano for this hint. 
Signed-off-by: Torsten Bögershausen --- remote-curl.c | 7 --- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/remote-curl.c b/remote-curl.c index 762a55a75f..1220dffcdc 100644 --- a/remote-curl.c +++ b/remote-curl.c @@ -617,10 +617,11 @@ static int probe_rpc(struct rpc_state *rpc, struct slot_results *results) return err; } -static curl_off_t xcurl_off_t(ssize_t len) { - if (len > maximum_signed_value_of_type(curl_off_t)) +static curl_off_t xcurl_off_t(size_t len) { + uintmax_t size = len; + if (size > maximum_signed_value_of_type(curl_off_t)) die("cannot handle pushes this big"); - return (curl_off_t) len; + return (curl_off_t)size; } static int post_rpc(struct rpc_state *rpc) -- 2.19.0.271.gfe8321ec05
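A self-contained sketch of the check this patch implements: promote the unsigned size to uintmax_t first, then compare against the maximum value of the signed destination type. Here `fake_curl_off_t` is a stand-in so the sketch needs no curl headers, and `fits_curl_off_t()` returns a flag where the real `xcurl_off_t()` calls die(); the two macros are modeled on the ones in git-compat-util.h.

```c
#include <limits.h>   /* CHAR_BIT, INTMAX_MAX */
#include <stdint.h>   /* intmax_t, uintmax_t, int64_t */
#include <stddef.h>   /* size_t */

#define bitsizeof(x) (CHAR_BIT * sizeof(x))
#define maximum_signed_value_of_type(a) \
	(INTMAX_MAX >> (bitsizeof(intmax_t) - bitsizeof(a)))

/* stand-in for curl_off_t ("a 64-bit wide signed integral data type") */
typedef int64_t fake_curl_off_t;

/* 1 if len is representable as fake_curl_off_t, else 0.  Promoting to
 * uintmax_t first keeps the comparison meaningful whether size_t is
 * 32 or 64 bits wide, without tripping -Wtype-limits. */
static int fits_curl_off_t(size_t len)
{
	uintmax_t size = len;

	return size <= (uintmax_t)maximum_signed_value_of_type(fake_curl_off_t);
}
```

On a 32 bit size_t every value fits and the compiler can see that, which is exactly why the pre-patch direct comparison drew the "always false" warning; the detour through uintmax_t sidesteps that while still catching a 64 bit size_t above 2^63 - 1.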
[RFC] Generation Number v2
We've discussed in several places how to improve upon generation numbers. This RFC is a report based on my investigation into a few new options, and how they compare for Git's purposes on several existing open-source repos. You can find this report and the associated test scripts at https://github.com/derrickstolee/gen-test. I have some more work to do in order to make this fully reproducible (including a single script to clone all of the repos I used, compute the commit-graphs using the different index versions, and run the tests). TL;DR: I recommend we pursue one of the following two options: 1. Maximum Generation Number. 2. Corrected Commit Date. Each of these perform well, but have different implementation drawbacks. I'd like to hear what everyone thinks about those drawbacks. Please also let me know about any additional tests that I could run. Now that I've got a lot of test scripts built up, I can re-run the test suite pretty quickly. Thanks, -Stolee Generation Number Performance Test == Git uses the commit history to answer many basic questions, such as computing a merge-base. Some of these algorithms can benefit from a _reachability index_, which is a condition that allows us to answer "commit A cannot reach commit B" in some cases. These require pre-computing some values that we can load and use at run-time. Git already has a notion of _generation number_, stores it in the commit-graph file, and uses it in several reachability algorithms. You can read more about generation numbers and how to use them in algorithms in [this blog post](https://blogs.msdn.microsoft.com/devops/2018/07/09/supercharging-the-git-commit-graph-iii-generations/). However, [generation numbers do not always improve our algorithms](https://public-inbox.org/git/pull.28.git.gitgitgad...@gmail.com/T/#u). Specifically, some algorithms in Git already use commit date as a heuristic reachability index. 
This has some problems, though, since commit date can be incorrect for several reasons (clock skew between machines, purposefully setting GIT_COMMIT_DATE to the author date, etc.). However, the speed boost by using commit date as a cutoff was so important in these cases, that the potential for incorrect answers was considered acceptable. When these algorithms were converted to use generation numbers, we _added_ the extra constraint that the algorithms are _never incorrect_. Unfortunately, this led to some cases where performance was worse than before. There are known cases where `git merge-base A B` or `git log --topo-order A..B` are worse when using generation numbers than when using commit dates. This report investigates four replacements for generation numbers, and compares the number of walked commits to the existing algorithms (both using generation numbers and not using them at all). We can use this data to make decisions for the future of the feature. ### Implementation The reachability indexes below are implemented in [the `reach-perf` branch in https://github.com/derrickstolee/git](https://github.com/derrickstolee/git/tree/reach-perf). This implementation is in a very rough condition, as it is intended to be a prototype, not production-quality. Using this implementation, you can compute commit-graph files for the specified reachability index using `git commit-graph write --reachable --version=`. The `git` client will read the version from the file, so our tests store each version as `.git/objects/info/commit-graph.` and copy the necessary file to `.git/objects/info/commit-graph` before testing. The algorithms count the number of commits walked, as to avoid the noise that would occur with time-based performance reporting. We use the (in progress) trace2 library for this. To find the values reported, use the `GIT_TR2_PERFORMANCE` environment variable. 
To ignore reachability indexes entirely and use the old algorithms (reported here as "OLD" values) use the environment variable `GIT_TEST_OLD_PAINT=1`. Reachability Index Versions --- **V0: (Minimum) Generation Number.** The _generation number_ of a commit is exactly one more than the maximum generation number of a parent (by convention, the generation number of a commit with no parents is 1). This is the definition currently used by Git (2.19.0 and later). Given two commits A and B, we can then use the following reachability index: If gen(A) < gen(B), then A cannot reach B. _Commentary:_ One issue with generation numbers is that some algorithms in Git use commit date as a heuristic, and allow incorrect answers if there is enough clock skew. Using that heuristic, the algorithms can walk fewer commits than the algorithms using generation number. The other reachability indexes introduced below attempt to fix this problem. **V1: (Epoch, Date) Pairs.** For each commit, store a pair of values called the _epoch_ and the _date_. The date is the normal commit date. The _epoch_ of a commit is
Re: [PATCH 07/10] preload-index.c: remove #ifdef NO_PTHREADS
On 10/27/2018 3:10 AM, Nguyễn Thái Ngọc Duy wrote: Signed-off-by: Nguyễn Thái Ngọc Duy --- preload-index.c | 15 ++- 1 file changed, 2 insertions(+), 13 deletions(-) diff --git a/preload-index.c b/preload-index.c index 9e7152ab14..0e24886aca 100644 --- a/preload-index.c +++ b/preload-index.c @@ -7,17 +7,7 @@ #include "fsmonitor.h" #include "config.h" #include "progress.h" - -#ifdef NO_PTHREADS -static void preload_index(struct index_state *index, - const struct pathspec *pathspec, - unsigned int refresh_flags) -{ - ; /* nothing */ -} -#else - -#include <pthread.h> +#include "thread-utils.h" /* * Mostly randomly chosen maximum thread counts: we @@ -108,7 +98,7 @@ static void preload_index(struct index_state *index, struct thread_data data[MAX_PARALLEL]; struct progress_data pd; - if (!core_preload_index) + if (!HAVE_THREADS || !core_preload_index) I also "hoped in general that the non-threaded code paths could mostly just collapse into the same as the "threads == 1" cases, and we wouldn't have to ask "are threads even supported" in a lot of places." In this case, if we cap the threads to online_cpus() the later test for 'if (threads < 2)' would do that. I haven't measured this specific case but in the other cases I have measured, having more threads than cpus did not result in a performance win. return; threads = index->cache_nr / THREAD_COST; @@ -151,7 +141,6 @@ static void preload_index(struct index_state *index, trace_performance_leave("preload index"); } -#endif int read_index_preload(struct index_state *index, const struct pathspec *pathspec,
Re: [PATCH/WIP 00/19] Kill the_index, final part
On Fri, Oct 19, 2018 at 7:38 PM Stefan Beller wrote: > > On Fri, Oct 19, 2018 at 7:52 AM Nguyễn Thái Ngọc Duy > wrote: > > > > Stefan I see you start removing more references of the_repository in > > another series (haven't really looked at it yet) so this is just a heads > > up so we could coordinate if needed. Still WIP, not really ready for > > review yet. > > I'll take a brief look anyway, otherwise you wouldn't have put it on > a mailing list :-P Thanks. Just fyi. Since there are still plenty of conflicts when merging this with 'pu', I'll just wait until things settle a bit in 'master' before resending. -- Duy
Re: [PATCH] Prevent warning
On Mon, Oct 29, 2018 at 03:39:30PM +, Pete wrote:

> Prevent the following warning in the web server error log:
>
>   gitweb.cgi: Odd number of elements in anonymous hash at
>   /usr/share/gitweb/gitweb.cgi line 3305
> [...]
> -	$remotes{$remote} ||= { 'heads' => () };
> +	$remotes{$remote} ||= { 'heads' => [] };

This will indeed silence the warning, but the end result is different. In the original, 'heads' would be set to undef, but now it is set to an empty array reference. Do the later users of the struct care about the distinction?

Grepping around, I see two callers that look at "heads":

  1. fill_remote_heads() assigns to it unconditionally, overwriting
     whatever was there before.

  2. git_remote_block() dereferences it as an array reference, and so
     probably would have complained if we ever got there with the
     undef. But we never would, because every call to git_remote_block()
     is preceded by a call to fill_remote_heads().

So nobody actually ever looks at the value we set here. Is an empty array reference better than undef as a default value? I'd actually argue no. If we add new code that does not call fill_remote_heads(), it's probably better for it to be undef to distinguish it from the empty list (and possibly raise an error that would tell us that we forgot to call fill_remote_heads()).

So I'd say that:

  $remotes{$remote} ||= { heads => undef };

is a preferable conversion. Or simply deleting the line entirely, which has roughly the same effect.

-Peff
Re: [PATCH 06/10] grep: remove #ifdef NO_PTHREADS
On Mon, Oct 29, 2018 at 05:01:41PM +0100, Duy Nguyen wrote: > On Mon, Oct 29, 2018 at 3:25 PM Jeff King wrote: > > But if the problem is simply that we are not quite there yet in the grep > > code, I am OK with taking this as the first pass, and knowing that there > > is more cleanup to be done later (though that sort of thing is IMHO very > > useful in a commit message). > > Since the problem pops up now, I'm ok with updating/cleaning up all > this in this series, unless there's benefits in keeping this series > simple and merging it early (probably not?) Mostly I did not want to tax you. I would rather have this series and some cleanup left over, than to not have anything. But if you are interested in moving it further, I will not say no. :) -Peff
Re: t7405.17 breakage vanishes with GETTEXT_POISON=1
On Sun, Oct 28, 2018 at 1:15 PM SZEDER Gábor wrote:
>
> On Sun, Oct 28, 2018 at 06:41:06AM +0100, Duy Nguyen wrote:
> > Something fishy is going on but I don't think I'll spend time hunting
> > it down so I post here in case somebody else is interested. It might
> > also indicate a problem with poison gettext, not the test case too.
>
> I haven't actually run the test under GETTEXT_POISON, but I strongly
> suspect it's the test, or more accurately the helper function
> 'test_i18ngrep'.
>
> The test in question runs
>
>   test_i18ngrep ! "refusing to lose untracked file at" err
>
> which fails in normal test runs, because 'grep' does find the
> undesired string; that's the known breakage. Under GETTEXT_POISON,
> however, 'test_i18ngrep' always succeeds because of this condition:
>
>   if test -n "$GETTEXT_POISON"
>   then
>       # pretend success
>       return 0
>   fi
>
> and then in turn the whole test succeeds.

And this is something your poison-with-scrambling code does help ;-) ;-) -- Duy
Re: [PATCH 08/12] remote.c: mark messages for translation
On Mon, Oct 29, 2018 at 8:56 AM Junio C Hamano wrote: > There are other series in flight that touch the same area of code > and in different ways, causing unnecessary conflicts, which does not > help us either X-<. I will of course fix all other comments, but I can hold this off if it's causing too much trouble. I'm in no hurry to get anything merged fast. -- Duy
Re: [PATCH 12/12] fsck: mark strings for translation
On Mon, Oct 29, 2018 at 3:09 PM Junio C Hamano wrote: > > SZEDER Gábor writes: > > >> -fprintf(stderr, "%s in %s %s: %s\n", > >> -msg_type, printable_type(obj), describe_object(obj), err); > >> +fprintf_ln(stderr, _("%s in %s %s: %s"), > > > > Are the (f)printf() -> (f)printf_ln() changes all over > > 'builtin/fsck.c' really necessary to mark strings for translation? > > It is beyond absolute minimum but I saw it argued here that this > makes it easier to manage the .po and .pot files if your message > strings do not end with LF, as you are much less likely to _add_ > unneeded LF to the translated string than _lose_ LF at the end of > translated string. Especially when \n plays an important role and we don't trust translators to keep it [1] [2]. It's probably a too defensive stance and often does not apply, so nowadays I do it just to keep a consistent pattern in the code. [1] https://public-inbox.org/git/20120308220131.GA10122@burratino/#t [2] but then translators can crash programs anyway (e.g. changing %d to %s...) we just trust gettext tools to catch problems early. -- Duy
Re: [PATCH 06/10] grep: remove #ifdef NO_PTHREADS
On Mon, Oct 29, 2018 at 3:25 PM Jeff King wrote: > But if the problem is simply that we are not quite there yet in the grep > code, I am OK with taking this as the first pass, and knowing that there > is more cleanup to be done later (though that sort of thing is IMHO very > useful in a commit message). Since the problem pops up now, I'm ok with updating/cleaning up all this in this series, unless there's benefits in keeping this series simple and merging it early (probably not?) -- Duy
Re: [PATCH] Prevent warning
Thanks for the patch. Could you please sign it off?

The other remark would be if the header line could be written longer than just "Prevent warning", to give people digging into the Git history initial information, where the warning occurred and from which module it was caused.

Maybe something like this as a headline?

  gitweb.perl: Fix Odd number of elements warning

On Mon, Oct 29, 2018 at 03:39:30PM +, Pete wrote:
> Prevent the following warning in the web server error log:
> gitweb.cgi: Odd number of elements in anonymous hash at
> /usr/share/gitweb/gitweb.cgi line 3305
> ---
>  gitweb/gitweb.perl | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
> index 2594a4badb3d7..200647b683225 100755
> --- a/gitweb/gitweb.perl
> +++ b/gitweb/gitweb.perl
> @@ -3302,7 +3302,7 @@ sub git_get_remotes_list {
>  		next if $wanted and not $remote eq $wanted;
>  		my ($url, $key) = ($1, $2);
>
> -		$remotes{$remote} ||= { 'heads' => () };
> +		$remotes{$remote} ||= { 'heads' => [] };
>  		$remotes{$remote}{$key} = $url;
>  	}
>  	close $fd or return;
>
> --
> https://github.com/git/git/pull/548
Re: [PATCH] wildmatch: change behavior of "foo**bar" in WM_PATHNAME mode
On Mon, Oct 29, 2018 at 2:24 PM Ævar Arnfjörð Bjarmason wrote: > This patch looks good to me, but I think it's a bad state of affairs to > keep changing these semantics and not having something like a > "gitwildmatch" doc were we document this matching syntax. While we don't have a separate document for it, the behavior _is_ documented even if perhaps it wasn't as clear. There were even tests for this corner case. > Do you have any thoughts on how to proceed with getting this documented > / into some stable state where we can specify it? wildmatch has been used in git for a few years if I remember correctly so to me it is stable. Granted there are corner cases like this, but I can't prevent all bugs (especially this one which is more like design mistake than bug per se). You are welcome to refactor gitignore.txt and add gitwildmatch.txt. -- Duy
Re: [PATCH] test-lib: introduce the '-V' short option for '--verbose-log'
On Mon, Oct 29, 2018 at 02:53:35PM +0100, SZEDER Gábor wrote: > > Interesting. I'm not opposed to something like this, but I added > > "--verbose-log" specifically for scripted cases, like running an > > unattended "prove" that needs to preserve stdout. When running > > individual tests, I'd just use "-v" itself, and possibly redirect the > > output. > > > > For my curiosity, can you describe your use case a bit more? > > Even when I run individual test scripts by hand, I prefer to have a > file catching all output of the test, because I don't like it when the > test output floods my terminal (especially with '-x'), and because the > file is searchable but the terminal isn't. And that's exactly what > '--verbose-log' does. > > Redirecting the '-v' output (i.e. stdout) alone is insufficient, > because any error messages within the tests and the '-x' trace go to > stderr, so they still end up on the terminal. Therefore I would have > to remember to redirect stderr every time as well. > > I find it's much easier to just always use '--verbose-log'... except > for the length of the option, that is, hence this patch. OK, fair enough. Maybe I should start using "-V" too, then. ;) (I find myself most often coupling "-v" with "-i" to stop at the failure and just read what's left on the screen). -Peff
[PATCH] Prevent warning
Prevent the following warning in the web server error log:

  gitweb.cgi: Odd number of elements in anonymous hash at
  /usr/share/gitweb/gitweb.cgi line 3305

---
 gitweb/gitweb.perl | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gitweb/gitweb.perl b/gitweb/gitweb.perl
index 2594a4badb3d7..200647b683225 100755
--- a/gitweb/gitweb.perl
+++ b/gitweb/gitweb.perl
@@ -3302,7 +3302,7 @@ sub git_get_remotes_list {
 		next if $wanted and not $remote eq $wanted;
 		my ($url, $key) = ($1, $2);
 
-		$remotes{$remote} ||= { 'heads' => () };
+		$remotes{$remote} ||= { 'heads' => [] };
 		$remotes{$remote}{$key} = $url;
 	}
 	close $fd or return;

--
https://github.com/git/git/pull/548
Re: [PATCH v2 00/16] sequencer: refactor functions working on a todo_list
Hi Junio, On 29/10/2018 at 04:05, Junio C Hamano wrote: > Alban Gruin writes: […] >> It is based onto ag/rebase-i-in-c (34b4731, "rebase -i: move >> rebase--helper modes to rebase--interactive"). > > As there are quite a lot of fixes to the sequencer machinery since > that topic forked from the mainline. For example, [06/16] has > unpleasant merge conflicts with 1ace63bc ("rebase --exec: make it > work with --rebase-merges", 2018-08-09) that has been in master for > the past couple of months. IOW, the tip of ag/rebase-i-in-c is a > bit too old a base to work on by now. > > I think I queued the previous round on the result of merging > ag/rebase-i-in-c into master, i.e. 61dc7b24 ("Merge branch > 'ag/rebase-i-in-c' into ag/sequencer-reduce-rewriting-todo", > 2018-10-09). That may be a more reasonable place to start this > update on. > Right. My next iteration will be based on 61dc7b24, I just rebased my branch onto it.
Re: [RFC PATCH] index-pack: improve performance on NFS
On Mon, Oct 29, 2018 at 09:48:02AM +0900, Junio C Hamano wrote: > > Of course any cache raises questions of cache invalidation, but I think > > we've already dealt with that for this case. When we use > > OBJECT_INFO_QUICK, that is a sign that we want to make this kind of > > accuracy/speed tradeoff (which does a similar caching thing with > > packfiles). > > > > So putting that all together, could we have something like: > > I think this conceptually is a vast improvement relative to > ".cloning" optimization. Obviously this does not have the huge > downside of the other approach that turns the collision detection > completely off. > > A real question is how much performance gain, relative to ".cloning" > thing, this approach gives us. If it gives us 80% or more of the > gain compared to doing no checking, I'd say we have a clear winner. My test runs showed it improving index-pack by about 3%, versus 4% for no collision checking at all. But there was easily 1% of noise. And much more importantly, that was on a Linux system on ext4, where stat is fast. I'd be much more curious to hear timing results from people on macOS or Windows, or from Geert's original NFS case. -Peff
Re: [RFC PATCH] index-pack: improve performance on NFS
On Sat, Oct 27, 2018 at 04:04:32PM +0200, Duy Nguyen wrote: > > Of course any cache raises questions of cache invalidation, but I think > > we've already dealt with that for this case. When we use > > OBJECT_INFO_QUICK, that is a sign that we want to make this kind of > > accuracy/speed tradeoff (which does a similar caching thing with > > packfiles). > > We don't care about a separate process adding more loose objects while > index-pack is running, do we? I'm guessing we don't but just to double > check... Right. That's basically what QUICK means: don't bother re-examining the repository to handle simultaneous writes, even if it means saying an object is not there when it has recently appeared. So far it has only applied to packs, but this is really just the same concept (just as we would not notice a new pack arriving, we will not notice a new loose object arriving). > > +/* probably should be configurable? */ > > +#define LOOSE_OBJECT_CACHE_MAX 65536 > > Yes, perhaps with gc.auto config value (multiplied by 256) as the cut > point. If it's too big maybe just go with a bloom filter. For this > particular case we expect like 99% of calls to miss. I wonder, though, if we should have a maximum at all. The existing examples I've found of this technique are: - mark_complete_and_common_ref(), which is trying to cover this exact case. It looks like it avoids adding more objects than there are refs, so I guess it actually has a pretty small cutoff. - find_short_object_filename(), which does the same thing with no limits. And there if we _do_ have a lot of objects, we'd still prefer to keep the cache. And really, this list is pretty much equivalent to looking at a pack .idx. The only difference is that one is mmap'd, but here we'd use the heap. So it's not shared between processes, but otherwise the working set size is similar. -Peff
Re: [RFC PATCH] index-pack: improve performance on NFS
On Mon, Oct 29, 2018 at 11:04:53AM -0400, Jeff King wrote: > > Even if someone wants to make the argument that this is behavior that we > > absolutely *MUST* keep and not make configurable, there's still much > > smarter ways to do it. > > I don't have any real objection to a configuration like this, if people > want to experiment with it. But in contrast, the patch I showed earlier: > > - is safe enough to just turn on all the time, without the user having > to configure anything nor make a safety tradeoff > > - speeds up all the other spots that use OBJECT_INFO_QUICK (like > fetch's tag-following, or what appears to be the exact same > optimization done manually inside mark_complete_and_common_ref()). One thing I forgot to add. We're focusing here on the case where the objects _aren't_ present, and we're primarily trying to get rid of the stat call. But when we actually do see a duplicate, we open up the object and actually compare the bytes. Eliminating the collision check entirely would save that work, which is obviously something that can't be improved by just caching the existence of loose objects. I'm not sure how often that case happens in a normal repository. We see it a fair amount on GitHub servers because of the way we use alternates (i.e., we often already have the object you pushed up because it's present in another fork and available via objects/info/alternates). -Peff
Re: [RFC PATCH] index-pack: improve performance on NFS
On Sat, Oct 27, 2018 at 01:22:16PM +0200, Ævar Arnfjörð Bjarmason wrote: > > Taking one step back, the root problem in this thread is that stat() on > > non-existing files is slow (which makes has_sha1_file slow). > > > > One solution there is to cache the results of looking in .git/objects > > (or any alternate object store) for loose files. And indeed, this whole > > scheme is just a specialized form of that: it's a flag to say "hey, we > > do not have any objects yet, so do not bother looking". > > > > Could we implement that in a more direct and central way? And could we > > implement it in a way that catches more cases? E.g., if I have _one_ > > object, that defeats this specialized optimization, but it is probably > > still beneficial to cache that knowledge (and the reasonable cutoff is > > probably not 1, but some value of N loose objects). > [...] > > The assumption with making it exactly 0 objects and not any value of >0 > is that we can safely assume that a "clone" or initial "fetch"[1] is > special in ways that a clone isn't. I.e. we're starting out with nothing > and doing the initial population, that's probably not as true in an > existing repo that's getting concurrent fetches, commits, rebases etc. I assume you mean s/that a clone isn't/that a fetch isn't/. I agree there are cases where you might be able to go further if you assume a full "0". But my point is that "clone" is an ambiguous concept, and it doesn't map completely to what's actually slow here. So if you only look at "are we cloning", then: - you have a bunch of cases which are "clones", but aren't actually starting from scratch - you get zero benefit in the non-clone cases, when we could be scaling the benefit smoothly > But in the spirit of taking a step back, maybe we should take two steps > back and consider why we're doing this at all. OK, I think it's worth discussing, and I'll do that below. But first I want to say... 
> Three of our tests fail if we compile git like this, and cloning is much
> faster (especially on NFS):
>
> diff --git a/builtin/index-pack.c b/builtin/index-pack.c
> index 2004e25da2..0c2d008ee0 100644
> --- a/builtin/index-pack.c
> +++ b/builtin/index-pack.c
> @@ -796,3 +796,3 @@ static void sha1_object(const void *data, struct object_entry *obj_entry,
>
> -	if (startup_info->have_repository) {
> +	if (0) {
>  		read_lock();
>
> Even on a local disk I'm doing 262759 lstat() calls cloning git.git and
> spending 5% of my time on that.

With the caching patch I posted earlier, I see roughly the same speedup on an index-pack of git.git as I do with disabling the collision check entirely (I did see about a 1% difference in favor of what you wrote above, which was within the noise, but may well be valid due to slightly reduced lock contention).

TBH I'm not sure if any of this is actually worth caring about on a normal Linux system, though. There stat() is fast. It might be much more interesting on macOS or Windows, or on a Linux system on NFS.

> But why do we have this in the first place? It's because of 8685da4256
> ("don't ever allow SHA1 collisions to exist by fetching a pack",
> 2007-03-20) and your 51054177b3 ("index-pack: detect local corruption in
> collision check", 2017-04-01).
>
> I.e. we are worried about (and those tests check for):
>
> a) A malicious user trying to give us repository where they have
>    created an object with the same SHA-1 that's different, as in the
>    SHAttered attack.
>
>    I remember (but have not dug up) an old E-Mail from Linus saying
>    that this was an important security aspect of git, i.e. even if
>    SHA-1 was broken you couldn't easily propagate bad objects.

Yeah, especially given recent advances in SHA-1 attacks, I'm not super comfortable with the idea of disabling the duplicate-object check at this point.

> b) Cases where we've ended up with different content for a SHA-1 due to
>    e.g. a local FS corruption.
> Which is the subject of your commit in 2017. Sort of. We actually detected it before my patch, but we just gave a really crappy error message. ;) > c) Are there cases where fetch.fsckObjects is off and we just flip a > bit on the wire and don't notice? I think not because we always > check the pack checksum (don't we), but I'm not 100% sure. We'd detect bit-flips on the wire due to the pack checksum. But what's more interesting are bit-flips on the disk of the sender, which would then put the bogus data into the pack checksum they generate on the fly. However, we do detect such a bit-flip, even without fsckObjects, because the sender does not tell us the expected sha-1 of each object. It gives us a stream of objects, and the receiver computes the sha-1's themselves. So a bit flip manifests in the connectivity-check when we say "hey, the other side should have sent us object X but did not" (we do not say "gee, what is this object Y they
Re: [PATCH] test-lib: introduce the '-V' short option for '--verbose-log'
On Mon, Oct 29, 2018 at 10:21:08AM -0400, Jeff King wrote: > On Mon, Oct 29, 2018 at 01:13:59PM +0100, SZEDER Gábor wrote: > > > '--verbose-log' is one of the most useful and thus most frequently > > used test options, but due to its length it's a pain to type on the > > command line. > > > > Let's introduce the corresponding short option '-V' to save some > > keystrokes. > > Interesting. I'm not opposed to something like this, but I added > "--verbose-log" specifically for scripted cases, like running an > unattended "prove" that needs to preserve stdout. When running > individual tests, I'd just use "-v" itself, and possibly redirect the > output. > > For my curiosity, can you describe your use case a bit more? Even when I run individual test scripts by hand, I prefer to have a file catching all output of the test, because I don't like it when the test output floods my terminal (especially with '-x'), and because the file is searchable but the terminal isn't. And that's exactly what '--verbose-log' does. Redirecting the '-v' output (i.e. stdout) alone is insufficient, because any error messages within the tests and the '-x' trace go to stderr, so they still end up on the terminal. Therefore I would have to remember to redirect stderr every time as well. I find it's much easier to just always use '--verbose-log'... except for the length of the option, that is, hence this patch.
Re: [PATCH v2 10/10] read-cache.c: initialize copy_len to shut up gcc 8
On Sat, Oct 27, 2018 at 07:30:08PM +0200, Nguyễn Thái Ngọc Duy wrote:
> It was reported that when building with NO_PTHREADS=1,
> -Wmaybe-uninitialized is triggered. Just initialize the variable from
> the beginning to shut the compiler up (because this warning is enabled
> in config.dev)
>
> Signed-off-by: Nguyễn Thái Ngọc Duy
> ---
>  read-cache.c | 4 +---
>  1 file changed, 1 insertion(+), 3 deletions(-)
>
> diff --git a/read-cache.c b/read-cache.c
> index ba870bc3fd..4307b9a7bf 100644
> --- a/read-cache.c
> +++ b/read-cache.c
> @@ -1746,7 +1746,7 @@ static struct cache_entry *create_from_disk(struct mem_pool *ce_mem_pool,
>  	size_t len;
>  	const char *name;
>  	unsigned int flags;
> -	size_t copy_len;
> +	size_t copy_len = 0;
>  	/*
>  	 * Adjacent cache entries tend to share the leading paths, so it makes
>  	 * sense to only store the differences in later entries. In the v4
> @@ -1786,8 +1786,6 @@ static struct cache_entry *create_from_disk(struct mem_pool *ce_mem_pool,
>  			die(_("malformed name field in the index, near path '%s'"),
>  			    previous_ce->name);
>  			copy_len = previous_len - strip_len;
> -		} else {
> -			copy_len = 0;
>  		}
>  		name = (const char *)cp;
>  	}

Yes, this silences the compiler warning I saw (and is exactly the same patch I wrote to get past it the other day ;) ).

-Peff
Re: [PATCH v2 08/10] read-cache.c: remove #ifdef NO_PTHREADS
On Sat, Oct 27, 2018 at 07:30:06PM +0200, Nguyễn Thái Ngọc Duy wrote:

> -#ifndef NO_PTHREADS
> -	nr_threads = git_config_get_index_threads();
> +	if (HAVE_THREADS) {
> +		nr_threads = git_config_get_index_threads();
>
> -	/* TODO: does creating more threads than cores help? */
> -	if (!nr_threads) {
> -		nr_threads = istate->cache_nr / THREAD_COST;
> -		cpus = online_cpus();
> -		if (nr_threads > cpus)
> -			nr_threads = cpus;
> +		/* TODO: does creating more threads than cores help? */
> +		if (!nr_threads) {
> +			nr_threads = istate->cache_nr / THREAD_COST;
> +			cpus = online_cpus();
> +			if (nr_threads > cpus)
> +				nr_threads = cpus;
> +		}
> +	} else {
> +		nr_threads = 1;
>  	}

I'd have thought we could just rely on online_cpus() returning 1 here to avoid having to ask "do we even have thread support?". But I guess that TODO comment implies that we might one day want 2 threads on a single CPU.

-Peff
Re: [PATCH 06/10] grep: remove #ifdef NO_PTHREADS
On Mon, Oct 29, 2018 at 11:16:39AM +0900, Junio C Hamano wrote: > Jeff King writes: > > > Cases like this are kind of weird. I'd expect to see wait_all() return > > immediately when !HAVE_THREADS. But we don't need to, because later we > > do this: > > > >> - if (num_threads) > >> + if (HAVE_THREADS && num_threads) > >> hit |= wait_all(); > > > > Which I think works, but it feels like we're introducing some subtle > > dependencies about which functions get called in which cases. I'd hoped > > in general that the non-threaded code paths could mostly just collapse > > down into the same as the "threads == 1" cases, and we wouldn't have to > > ask "are threads even supported" in a lot of places. > > True, but the way "grep" code works with threads is already strange, > and I suspect that the subtle strangeness you feel mostly comes from > that. The single threaded code signaled "hit" with return value of > individual functions like grep_file(), but the meaning of return > value from them is changed to "does not matter--we do not have > meaningful result yet at this point" when threading is used. > > In the new world order where "threading is the norm but > single-threaded is still supported", I wonder if we would want to > drive the single threaded case the same way with the add_work(run) > interface and return the "did we see hits?" etc. from the same (or > at least "more similar than today's") interface, so that the flow of > data smells the same in both cases. Right, your second paragraph here is a better statement of what I was trying to get at. ;) But if the problem is simply that we are not quite there yet in the grep code, I am OK with taking this as the first pass, and knowing that there is more cleanup to be done later (though that sort of thing is IMHO very useful in a commit message). -Peff
Re: [PATCH] test-lib: introduce the '-V' short option for '--verbose-log'
On Mon, Oct 29, 2018 at 01:13:59PM +0100, SZEDER Gábor wrote: > '--verbose-log' is one of the most useful and thus most frequently > used test options, but due to its length it's a pain to type on the > command line. > > Let's introduce the corresponding short option '-V' to save some > keystrokes. Interesting. I'm not opposed to something like this, but I added "--verbose-log" specifically for scripted cases, like running an unattended "prove" that needs to preserve stdout. When running individual tests, I'd just use "-v" itself, and possibly redirect the output. For my curiosity, can you describe your use case a bit more? > t/README | 1 + > t/test-lib.sh | 4 ++-- > 2 files changed, 3 insertions(+), 2 deletions(-) The patch itself looks good to me. -Peff
Re: Lost changes after merge
On Mon, Oct 29, 2018 at 09:49:20AM +0100, Gray King wrote: > Hello, > > I have a very strange issue described below: > > * Here is the tree before I merge via `git log --format="%h %p %d" -n > 20 --all --graph`: > > https://upaste.de/9Pe FWIW, neither this nor the other paste link in your email seem to work for me (which makes it hard to comment on the rest of the email). -Peff
Re: [PATCH] alias: detect loops in mixed execution mode
On Mon, Oct 29, 2018 at 12:44:58PM +0900, Junio C Hamano wrote: > Jeff King writes: > > > Hmph. So I was speaking before purely hypothetically, but now that your > > patch is in 'next', it is part of my daily build. And indeed, I hit a > > false positive within 5 minutes of building it. ;) > > Sounds like somebody is having not-so-fun-a-time having "I told you > so" moment. The 'dotgit' thing already feels bit convoluted but I > would say that it is still within the realm of reasonable workflow > elements. To be clear, the "dotgit" thing _is_ weird and convoluted. And I imagine that I push Git more than 99% of our users would. But I also won't be surprised if somebody else has something similarly disgusting in the wild. :) TBH, I'm still not really sold on the idea of doing loop detection at all in this case. But I can live with it if others feel strongly. What makes the current patch so bad is that there's no escape hatch (i.e., even a depth counter with a default of "1" would have given me something I could bump). -Peff
Re: [PATCH] pretty: Add %(trailer:X) to display single trailer
On Sun, Oct 28, 2018 at 01:50:25PM +0100, Anders Waldenborg wrote: > This new format placeholder allows displaying only a single > trailer. The formatting done is similar to what is done for > --decorate/%d using parentheses and comma separation. Displaying a single trailer makes sense as a goal. It was one of the things I considered when working on %(trailers), actually, but I ended up needing something a bit more flexible (hence the ability to dump the trailers in a parse-able format, where I feed them to another script). But your ticket example makes sense for just ordinary log displays. Junio's review already covered my biggest question, which is why not something like "%(trailers:key=ticket)". And likewise making things like comma-separation options. But my second question is whether we want to provide something more flexible than the always-parentheses that "%d" provides. That has been a problem in the past when people want to format the decoration in some other way. We have formatting magic for "if this thing is non-empty, then show this prefix" in the for-each-ref formatter, but I'm not sure that we do in the commit pretty-printer beyond "% ". I wonder if we could/should add a placeholder for "if this thing is non-empty, put in a space and enclose it in parentheses". > diff --git a/Documentation/pretty-formats.txt > b/Documentation/pretty-formats.txt > index 6109ef09aa..a46d0c0717 100644 > --- a/Documentation/pretty-formats.txt > +++ b/Documentation/pretty-formats.txt > @@ -211,6 +211,10 @@ endif::git-rev-list[] >If the `unfold` option is given, behave as if interpret-trailer's >`--unfold` option was given. E.g., `%(trailers:only,unfold)` to do >both. > +- %(trailer:): display the specified trailer in parentheses (like > + %d does for refnames). If there are multiple entries of that trailer > + they are shown comma separated. If there are no matching trailers > + nothing is displayed. It might be worth specifying how this match is done. 
I'm thinking specifically of whether it's case-sensitive, but I wonder if there should be any allowance for other normalization (e.g., allowing a regex to match "coauthored-by" and "co-authored-by" or something). -Peff
Re: [PATCH 12/12] fsck: mark strings for translation
SZEDER Gábor writes: >> -fprintf(stderr, "%s in %s %s: %s\n", >> -msg_type, printable_type(obj), describe_object(obj), err); >> +fprintf_ln(stderr, _("%s in %s %s: %s"), > > Are the (f)printf() -> (f)printf_ln() changes all over > 'builtin/fsck.c' really necessary to mark strings for translation? It is beyond absolute minimum but I saw it argued here that this makes it easier to manage the .po and .pot files if your message strings do not end with LF, as you are much less likely to _add_ unneeded LF to the translated string than _lose_ LF at the end of translated string.
Re: [PATCH] wildmatch: change behavior of "foo**bar" in WM_PATHNAME mode
On Sat, Oct 27 2018, Nguyễn Thái Ngọc Duy wrote:

> In WM_PATHNAME mode (or FNM_PATHNAME), '*' does not match '/' and '**'
> can but only in three patterns:
>
> - '**/' matches zero or more leading directories
> - '/**/' matches zero or more directories in between
> - '/**' matches zero or more trailing directories/files
>
> When '**' is present but not in one of these patterns, the current
> behavior is consider the pattern invalid and stop matching. In other
> words, 'foo**bar' never matches anything, whatever you throw at it.
>
> This behavior is arguably a bit confusing partly because we can't
> really tell the user their pattern is invalid so that they can fix
> it. So instead, tolerate it and make '**' act like two regular '*'s
> (which is essentially the same as a single asterisk). This behavior
> seems more predictable.
>
> Noticed-by: dana
> Signed-off-by: Nguyễn Thái Ngọc Duy
> ---
>  Documentation/gitignore.txt | 3 ++-
>  t/t3070-wildmatch.sh        | 4 ++--
>  wildmatch.c                 | 4 ++--
>  wildmatch.h                 | 1 -
>  4 files changed, 6 insertions(+), 6 deletions(-)
>
> diff --git a/Documentation/gitignore.txt b/Documentation/gitignore.txt
> index d107daaffd..1c94f08ff4 100644
> --- a/Documentation/gitignore.txt
> +++ b/Documentation/gitignore.txt
> @@ -129,7 +129,8 @@ full pathname may have special meaning:
>     matches zero or more directories. For example, "`a/**/b`"
>     matches "`a/b`", "`a/x/b`", "`a/x/y/b`" and so on.
>
> - - Other consecutive asterisks are considered invalid.
> + - Other consecutive asterisks are considered regular asterisks and
> +   will match according to the previous rules.
>
> NOTES
> -
>
> diff --git a/t/t3070-wildmatch.sh b/t/t3070-wildmatch.sh
> index 46aca0af10..891d4d7cb9 100755
> --- a/t/t3070-wildmatch.sh
> +++ b/t/t3070-wildmatch.sh
> @@ -237,7 +237,7 @@ match 0 0 0 0 foobar 'foo\*bar'
>  match 1 1 1 1 'f\oo' 'f\\oo'
>  match 1 1 1 1 ball '*[al]?'
> match 0 0 0 0 ten '[ten]' > -match 0 0 1 1 ten '**[!te]' > +match 1 1 1 1 ten '**[!te]' > match 0 0 0 0 ten '**[!ten]' > match 1 1 1 1 ten 't[a-g]n' > match 0 0 0 0 ten 't[!a-g]n' > @@ -253,7 +253,7 @@ match 1 1 1 1 ']' ']' > # Extended slash-matching features > match 0 0 1 1 'foo/baz/bar' 'foo*bar' > match 0 0 1 1 'foo/baz/bar' 'foo**bar' > -match 0 0 1 1 'foobazbar' 'foo**bar' > +match 1 1 1 1 'foobazbar' 'foo**bar' > match 1 1 1 1 'foo/baz/bar' 'foo/**/bar' > match 1 1 0 0 'foo/baz/bar' 'foo/**/**/bar' > match 1 1 1 1 'foo/b/a/z/bar' 'foo/**/bar' > diff --git a/wildmatch.c b/wildmatch.c > index d074c1be10..9e9e2a2f95 100644 > --- a/wildmatch.c > +++ b/wildmatch.c > @@ -104,8 +104,8 @@ static int dowild(const uchar *p, const uchar *text, > unsigned int flags) > dowild(p + 1, text, flags) == > WM_MATCH) > return WM_MATCH; > match_slash = 1; > - } else > - return WM_ABORT_MALFORMED; > + } else /* WM_PATHNAME is set */ > + match_slash = 0; > } else > /* without WM_PATHNAME, '*' == '**' */ > match_slash = flags & WM_PATHNAME ? 0 : 1; > diff --git a/wildmatch.h b/wildmatch.h > index b8c826aa68..5993696298 100644 > --- a/wildmatch.h > +++ b/wildmatch.h > @@ -4,7 +4,6 @@ > #define WM_CASEFOLD 1 > #define WM_PATHNAME 2 > > -#define WM_ABORT_MALFORMED 2 > #define WM_NOMATCH 1 > #define WM_MATCH 0 > #define WM_ABORT_ALL -1 This patch looks good to me, but I think it's a bad state of affairs to keep changing these semantics and not having something like a "gitwildmatch" doc where we document this matching syntax. Also I still need to dig up the work for using PCRE as an alternate matching engine, the PCRE devs produced a bug-for-bug compatible version of our wildmatch function (all the more reason to document it), so I think they'll need to change it now that this is in, but I haven't rebased those ancient patches yet. Do you have any thoughts on how to proceed with getting this documented / into some stable state where we can specify it? 
Even if we don't end up using PCRE as a matching engine (sometimes it was faster, sometimes slower) I think it would be very useful if we can spew out "here's your pattern as a regex" for self-documentation purposes. Then that can be piped into e.g. "perl -Mre=debug" to see a step-by-step guide for how the pattern compiles, and why it does or doesn't match a given thing.
Re: js/mingw-http-ssl, was Re: What's cooking in git.git (Oct 2018, #05; Fri, 26)
Hi Junio, On Fri, 26 Oct 2018, Junio C Hamano wrote: > Johannes Schindelin writes: > > > On Fri, 26 Oct 2018, Junio C Hamano wrote: > > > >> * js/mingw-http-ssl (2018-10-26) 3 commits > >> (merged to 'next' on 2018-10-26 at 318e82e101) > >> + http: when using Secure Channel, ignore sslCAInfo by default > >> + http: add support for disabling SSL revocation checks in cURL > >> + http: add support for selecting SSL backends at runtime > >> (this branch is used by jc/http-curlver-warnings.) > >> > >> On Windows with recent enough cURL library, the configuration > >> variable http.sslBackend can be used to choose between OpenSSL and > >> Secure Channel at runtime as the SSL backend while talking over > >> the HTTPS protocol. > > > > Just a quick clarification: the http.sslBackend feature is in no way > > Windows-only. Sure, it was championed there, and sure, we had the first > > multi-ssl-capable libcurl, but this feature applies to all libcurl > > versions that are built with multiple SSL/TLS backends. > > Yeah, but "http.sslBackend can be used to choose betnween OpenSSL > and Scure Channel" applies only to Windows (especially the "between > A and B" part, when B is Windows only), right? I had a hard time > coming up with a phrasing to summarize what the immediate merit > users would get from the topic in a simple paragraph. On Linux, with an appropriately built libcurl, you can use http.sslBackend to choose between OpenSSL, GNU TLS, NSS and mbedTLS. > > The two patches on top are Windows-only, of course, as they really apply > > only to the Secure Channel backend (which *is* Windows-only). > > Yes, that is why the summary for the topic as a whole focuses on > Windows, as that is the primary audience who would benefit from the > topic. In contrast, I think that the main purpose of this patch series is to bring http.sslBackend to everybody. And then we also include fall-out patches that are Windows-only. :-) Ciao, Dscho
Re: [PATCH] packfile: close multi-pack-index in close_all_packs
On 10/29/2018 7:10 AM, SZEDER Gábor wrote:
> On Thu, Oct 25, 2018 at 12:54:05PM +, Derrick Stolee wrote:
>> Whenever we delete pack-files from the object directory, we need
>> to also delete the multi-pack-index that may refer to those
>> objects. Sometimes, this connection is obvious, like during a
>> repack. Other times, this is less obvious, like when gc calls
>> a repack command and then does other actions on the objects, like
>> write a commit-graph file.
>>
>> The pattern we use to avoid out-of-date in-memory packed_git
>> structs is to call close_all_packs(). This should also call
>> close_midx(). Since we already pass an object store to
>> close_all_packs(), this is a nicely scoped operation.
>>
>> This fixes a test failure when running t6500-gc.sh with
>> GIT_TEST_MULTI_PACK_INDEX=1.
>>
>> Reported-by: Szeder Gábor
>> Signed-off-by: Derrick Stolee
>> ---
>>
>> Thanks for the report, Szeder! I think this is the correct fix,
>> and more likely to be permanent across all builtins that run
>> auto-GC. I based it on ds/test-multi-pack-index so it could easily
>> be added on top.
>
> OK, avoiding those errors in the first place is good, of course...
> However, I still find it disconcerting that those errors didn't
> cause 'git gc' to error out, and, consequently, what other
> MIDX-related errors/bugs might do the same.

When I added GIT_TEST_MULTI_PACK_INDEX, one of the important pieces was to delete the multi-pack-index file whenever deleting the pack-files it contains. This only happens during a 'git repack'. The thing I had missed (and is covered by this patch) is when we run a subcommand that can remove pack-files while we have a multi-pack-index open.

The reason this wasn't caught is that the test suite did not include any cases where the following things happened in order:

1. Open pack-files and multi-pack-index.
2. Delete pack-files, but keep multi-pack-index open.
3. Read objects (from the multi-pack-index).

This step 3 was added to 'git gc' by the commit-graph write, hence the break.
The pack-file deletion happens in the repack subcommand. Since close_all_packs() is the way we guarded against this problem in the traditional pack-file environment, this is the right place to insert a close_midx() call, and will avoid cases like this in the future (at least, the multi-pack-index will not be the reason on its own). Thanks, -Stolee
[PATCH] test-lib: introduce the '-V' short option for '--verbose-log'
'--verbose-log' is one of the most useful and thus most frequently used test options, but due to its length it's a pain to type on the command line. Let's introduce the corresponding short option '-V' to save some keystrokes. Signed-off-by: SZEDER Gábor --- Or it could be '-L', to emphasize the "log" part of the long option, I don't really care. t/README | 1 + t/test-lib.sh | 4 ++-- 2 files changed, 3 insertions(+), 2 deletions(-) diff --git a/t/README b/t/README index 8847489640..2e9bef2852 100644 --- a/t/README +++ b/t/README @@ -154,6 +154,7 @@ appropriately before running "make". As the names depend on the tests' file names, it is safe to run the tests with this option in parallel. +-V:: --verbose-log:: Write verbose output to the same logfile as `--tee`, but do _not_ write it to stdout. Unlike `--tee --verbose`, this option diff --git a/t/test-lib.sh b/t/test-lib.sh index 897e6fcc94..47a99aa0ed 100644 --- a/t/test-lib.sh +++ b/t/test-lib.sh @@ -67,7 +67,7 @@ case "$GIT_TEST_TEE_STARTED, $* " in done,*) # do not redirect again ;; -*' --tee '*|*' --va'*|*' --verbose-log '*) +*' --tee '*|*' --va'*|*' -V '*|*' --verbose-log '*) mkdir -p "$TEST_OUTPUT_DIRECTORY/test-results" BASE="$TEST_OUTPUT_DIRECTORY/test-results/$(basename "$0" .sh)" @@ -316,7 +316,7 @@ do echo >&2 "warning: ignoring -x; '$0' is untraceable without BASH_XTRACEFD" fi shift ;; - --verbose-log) + -V|--verbose-log) verbose_log=t shift ;; *) -- 2.19.1.723.g6817e59acc
Re: 'ds/test-multi-pack-index' vs. 'ab/commit-graph-progress'
On Thu, Oct 25, 2018 at 11:11:55PM +0200, Ævar Arnfjörð Bjarmason wrote: > > Anyway, this test seems to be too fragile, because that > > > > test_line_count = 1 stderr > > Yeah maybe it's too fragile, on the other hand it caught what seems to > be a bug under GIT_TEST_MULTI_PACK_INDEX=true, and the rest of the test > suite passes, so there's that. I can imagine more prudent approaches that would have done the same, e.g. '! grep ^error stderr'. > > line will trigger, when anything else during 'git gc' prints > > something. And I find it quite strange that an option called > > '--no-quiet' only shows the commit-graph progress, but not the regular > > output of 'git gc'. > > It's confusing, but the reason for this seeming discrepancy is that > writing the commit-graph happens in-process, whereas the rest of the > work done by git-gc (and its output) comes from subprocesses. Most of > that output is from "git-repack" / "git-pack-objects" which doesn't pay > the same attention to --quiet and --no-quiet, instead it checks > isatty(). See cmd_pack_objects(). This explains what implementation details led to the current behaviour, but does not justify the actual inconsistency.
Re: [PATCH] packfile: close multi-pack-index in close_all_packs
On Thu, Oct 25, 2018 at 12:54:05PM +, Derrick Stolee wrote: > Whenever we delete pack-files from the object directory, we need > to also delete the multi-pack-index that may refer to those > objects. Sometimes, this connection is obvious, like during a > repack. Other times, this is less obvious, like when gc calls > a repack command and then does other actions on the objects, like > write a commit-graph file. > > The pattern we use to avoid out-of-date in-memory packed_git > structs is to call close_all_packs(). This should also call > close_midx(). Since we already pass an object store to > close_all_packs(), this is a nicely scoped operation. > > This fixes a test failure when running t6500-gc.sh with > GIT_TEST_MULTI_PACK_INDEX=1. > > Reported-by: Szeder Gábor > Signed-off-by: Derrick Stolee > --- > > Thanks for the report, Szeder! I think this is the correct fix, > and more likely to be permanent across all builtins that run > auto-GC. I based it on ds/test-multi-pack-index so it could easily > be added on top. OK, avoiding those errors in the first place is good, of course... However, I still find it disconcerting that those errors didn't cause 'git gc' to error out, and, consequently, what other MIDX-related errors/bugs might do the same. > -Stolee > > packfile.c | 5 + > 1 file changed, 5 insertions(+) > > diff --git a/packfile.c b/packfile.c > index 841b36182f..37fcd8f136 100644 > --- a/packfile.c > +++ b/packfile.c > @@ -339,6 +339,11 @@ void close_all_packs(struct raw_object_store *o) > BUG("want to close pack marked 'do-not-close'"); > else > close_pack(p); > + > + if (o->multi_pack_index) { > + close_midx(o->multi_pack_index); > + o->multi_pack_index = NULL; > + } > } > > /* > > base-commit: 0465a50506023df0932fe0534fe6ac6712c0d854 > -- > 2.17.1 >
Re: [PATCH 12/12] fsck: mark strings for translation
On Sun, Oct 28, 2018 at 07:51:57AM +0100, Nguyễn Thái Ngọc Duy wrote: > Signed-off-by: Nguyễn Thái Ngọc Duy > --- > builtin/fsck.c | 113 - > t/t1410-reflog.sh | 6 +- > t/t1450-fsck.sh| 50 > t/t6050-replace.sh | 4 +- > t/t7415-submodule-names.sh | 6 +- > 5 files changed, 94 insertions(+), 85 deletions(-) > > diff --git a/builtin/fsck.c b/builtin/fsck.c > index 06eb421720..13f8fe35c5 100644 > --- a/builtin/fsck.c > +++ b/builtin/fsck.c > @@ -108,8 +108,9 @@ static int fsck_config(const char *var, const char > *value, void *cb) > static void objreport(struct object *obj, const char *msg_type, > const char *err) > { > - fprintf(stderr, "%s in %s %s: %s\n", > - msg_type, printable_type(obj), describe_object(obj), err); > + fprintf_ln(stderr, _("%s in %s %s: %s"), Are the (f)printf() -> (f)printf_ln() changes all over 'builtin/fsck.c' really necessary to mark strings for translation? > diff --git a/t/t1450-fsck.sh b/t/t1450-fsck.sh > index 90c765df3a..500c21366e 100755 > --- a/t/t1450-fsck.sh > +++ b/t/t1450-fsck.sh > @@ -594,7 +594,7 @@ test_expect_success 'fsck --name-objects' ' > remove_object $(git rev-parse julius:caesar.t) && > test_must_fail git fsck --name-objects >out && > tree=$(git rev-parse --verify julius:) && > - egrep "$tree \((refs/heads/master|HEAD)@\{[0-9]*\}:" out > + test -n "$GETTEXT_POISON" || egrep "$tree > \((refs/heads/master|HEAD)@\{[0-9]*\}:" out 'test_i18ngrep' accepts all 'grep' options, so this could be written as: test_i18ngrep -E "..." out There might be something else wrong with the patch, because now this test tends to fail on current 'pu' on Travis CI: [... snipped output from 'test_commit' ...] 
+git rev-parse julius:caesar.t +remove_object be45bbd3809e0829297cefa576e699c134abacfd +sha1_file be45bbd3809e0829297cefa576e699c134abacfd +remainder=45bbd3809e0829297cefa576e699c134abacfd +firsttwo=be +echo .git/objects/be/45bbd3809e0829297cefa576e699c134abacfd +rm .git/objects/be/45bbd3809e0829297cefa576e699c134abacfd +test_must_fail git fsck --name-objects +_test_ok= +git fsck --name-objects +exit_code=2 +test 2 -eq 0 +test_match_signal 13 2 +test 2 = 141 +test 2 = 269 +return 1 +test 2 -gt 129 +test 2 -eq 127 +test 2 -eq 126 +return 0 +git rev-parse --verify julius: +tree=c2fab98f409a47394d992eca10a20e0b22377c0c +test -n +egrep c2fab98f409a47394d992eca10a20e0b22377c0c \((refs/heads/master|HEAD)@\{[0-9]*\}: out error: last command exited with $?=1 not ok 65 - fsck --name-objects The contents of 'out': broken link from tree be45bbd3809e0829297cefa576e699c134abacfd (refs/heads/master@{1112912113}:caesar.t) to blob be45bbd3809e0829297cefa576e699c134abacfd (refs/heads/master@{1112912113}:caesar.t) missing blob be45bbd3809e0829297cefa576e699c134abacfd (refs/heads/master@{1112912113}:caesar.t) On a related (side)note, I think it would be better if this 'grep' pattern were more explicit about which of the three lines it is supposed to match. Anyway, I couldn't reproduce the failure locally yet, but, admittedly, I didn't try that hard... And my builds on Travis CI haven't yet come that far to try this topic on its own.
Re: Lost changes after merge
After the merge, the latest commit (a008c4d580) has been lost and the second commit (a274b6e7ca) has become the latest, and its changes are missing too. On 29 October 2018 at 4:49:20 PM, Gray King (graykin...@gmail.com(mailto:graykin...@gmail.com)) wrote: > > > Hello, > > I have a very strange issue described below: > > * Here is the tree before I merge via `git log --format="%h %p %d" -n 20 > --all --graph`: > > https://upaste.de/9Pe > > * Here is the output of `git log --format="%h %p %d" -2 path/to/file`: > > a008c4d580 c61f96eb5d > a274b6e7ca 67c1000ca3 > > * Here is the merge commands: > > git merge f087081868 > # fix conflicts > > * Here is the tree after I merged via `git log --format="%h %p %d" -n 20 > --all --graph`: > > https://upaste.de/8Bx > > > * Here is the output of `git log --format="%h %p %d" -2 path/to/file`: > > a274b6e7ca 67c1000ca3 > 67c1000ca3 00bd5c8c89
Lost changes after merge
Hello, I have a very strange issue described below: * Here is the tree before I merge via `git log --format="%h %p %d" -n 20 --all --graph`: https://upaste.de/9Pe * Here is the output of `git log --format="%h %p %d" -2 path/to/file`: a008c4d580 c61f96eb5d a274b6e7ca 67c1000ca3 * Here is the merge commands: git merge f087081868 # fix conflicts * Here is the tree after I merged via `git log --format="%h %p %d" -n 20 --all --graph`: https://upaste.de/8Bx * Here is the output of `git log --format="%h %p %d" -2 path/to/file`: a274b6e7ca 67c1000ca3 67c1000ca3 00bd5c8c89
Re: [PATCH v3 7/8] push: add DWYM support for "git push refs/remotes/...:"
On Mon, Oct 29 2018, Junio C Hamano wrote: > Ævar Arnfjörð Bjarmason writes: > >> Add DWYM support for pushing a ref in refs/remotes/* when the > > I think most people call it do-what-*I*-mean, not do-what-you-mean. FWIW I picked this up from the perl list where both are used depending on context. I.e. DW[IY]M depending on whether the sentence is one where you'd describe things from a first or second-person perspective "I implemented this to DWIM, and it'll DWYM if you invoke it as...". Also "dwimerry"[1]. >> ref is unqualified. Now instead of erroring out we support e.g.: >> >> $ ./git-push avar refs/remotes/origin/master:upstream-master -n >> To github.com:avar/git.git >> * [new branch]origin/master -> upstream-master >> >> Before this we wouldn't know what do do with >> refs/remotes/origin/master, now we'll look it up and discover that >> it's a commit (or tag) and add a refs/{heads,tags}/ prefix to as >> appropriate. > > I am not sure if this is a good change, as I already hinted earlier. > If I were doing "git push" to any of my publishing places, I would > be irritated if "refs/remotes/ko/master:something" created a local > "something" branch at the desitnation, instead of erroring out. > > On the other hand, I do not think I mind all that much if a src that > is a tag object to automatically go to refs/tags/ (having a tag > object in refs/remotes/** is rare enough to matter in the first > place). Yeah maybe this is going too far. I don't think so, but happy to be challenged on that point :) I don't think so because the only reason I've ever needed this is because I deleted some branch accidentally and am using a push from "remotes/*" to bring it back. I.e. I'll always want branch-for-branch, not to push that as a tag. Of course it isn't actually a "branch", but just a commit object (depending on the refspec configuration), so we can't quite make that hard assumption. 
But I figured screwing with how refs/remotes/* works by manually adding new refspecs is such an advanced feature that the people doing it are probably all here on-list, and would be the sort of users advanced enough to either know the semantics or try this with -n. Whereas the vast majority of users won't ever screw with it, and if they ever push from remotes/* to another remote almost definitely want a new branch under the new name. 1. https://www.urbandictionary.com/define.php?term=DWYM
Re: [PATCH v3 7/8] push: add DWYM support for "git push refs/remotes/...:"
On Mon, Oct 29 2018, Junio C Hamano wrote: > Ævar Arnfjörð Bjarmason writes: > >> This is the first use of the %N$ style of printf format in >> the *.[ch] files in our codebase. It's supported by POSIX[2] and >> there's existing uses for it in po/*.po files,... > > For now, I'll eject this from 'pu', as I had spent way too much time > trying to make it and other topics work there. I was compiling with DEVELOPER=1 but as it turns out: CFLAGS="-O0" DEVELOPER=1 Wasn't doing what I thought, i.e. we just take 'CFLAGS' from the command-line and don't add any of the DEVELOPER #leftoverbits to it. Will fix this and other issues raised. > CC remote.o > remote.c: In function 'show_push_unqualified_ref_name_error': > remote.c:1035:2: error: $ operand number used after format without operand > number [-Werror=format=] > error(_("The destination you provided is not a full refname (i.e.,\n" > ^ > cc1: all warnings being treated as errors > Makefile:2323: recipe for target 'remote.o' failed > make: *** [remote.o] Error 1 FWIW clang gives a much better error about the actual issue: remote.c:1042:46: error: cannot mix positional and non-positional arguments in format string [-Werror,-Wformat] "- Checking if the being pushed ('%2$s')\n" I.e. 
this on top fixes it: - "- Looking for a ref that matches '%s' on the remote side.\n" - "- Checking if the being pushed ('%s')\n" + "- Looking for a ref that matches '%1$s' on the remote side.\n" + "- Checking if the being pushed ('%2$s')\n" Maybe this whole thing isn't worth it and I should just do: @@ -1042 +1042 @@ static void show_push_unqualified_ref_name_error(const char *dst_value, - "- Checking if the being pushed ('%2$s')\n" + "- Checking if the being pushed ('%s')\n" @@ -1047 +1047 @@ static void show_push_unqualified_ref_name_error(const char *dst_value, - dst_value, matched_src_name); + dst_value, matched_src_name, matched_src_name); But I'm leaning on the side of keeping it for the self-documentation aspect of "this is a repeated parameter". Your objections to this whole thing being a stupid idea non-withstanding.
Re: [PATCH v3 7/8] push: add DWYM support for "git push refs/remotes/...:"
Junio C Hamano writes: > Ævar Arnfjörð Bjarmason writes: > >> This is the first use of the %N$ style of printf format in >> the *.[ch] files in our codebase. It's supported by POSIX[2] and >> there's existing uses for it in po/*.po files,... > > For now, I'll eject this from 'pu', as I had spent way too much time > trying to make it and other topics work there. > > CC remote.o > remote.c: In function 'show_push_unqualified_ref_name_error': > remote.c:1035:2: error: $ operand number used after format without operand > number [-Werror=format=] > error(_("The destination you provided is not a full refname (i.e.,\n" > ^ > cc1: all warnings being treated as errors > Makefile:2323: recipe for target 'remote.o' failed > make: *** [remote.o] Error 1 I'll redo 'pu' with the following fix-up on top. remote.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/remote.c b/remote.c index c243e3d89e..1927c4b376 100644 --- a/remote.c +++ b/remote.c @@ -1035,8 +1035,8 @@ static void show_push_unqualified_ref_name_error(const char *dst_value, error(_("The destination you provided is not a full refname (i.e.,\n" "starting with \"refs/\"). We tried to guess what you meant by:\n" "\n" - "- Looking for a ref that matches '%s' on the remote side.\n" - "- Checking if the being pushed ('%s')\n" + "- Looking for a ref that matches '%1$s' on the remote side.\n" + "- Checking if the being pushed ('%2$s')\n" " is a ref in \"refs/{heads,tags}/\". If so we add a corresponding\n" " refs/{heads,tags}/ prefix on the remote side.\n" "- Checking if the being pushed ('%2$s')\n" -- 2.19.1-593-gc670b1f876
Re: [PATCH 08/12] remote.c: mark messages for translation
Nguyễn Thái Ngọc Duy writes: > @@ -995,12 +995,12 @@ static int match_explicit_lhs(struct ref *src, >* way to delete 'other' ref at the remote end. >*/ > if (try_explicit_object_name(rs->src, match) < 0) > - return error("src refspec %s does not match any.", > rs->src); > + return error(_("src refspec %s does not match any"), > rs->src); > if (allocated_match) > *allocated_match = 1; > return 0; > default: > - return error("src refspec %s matches more than one.", rs->src); > + return error(_("src refspec %s matches more than one"), > rs->src); > } > } These minor changes that are not accompanied by their own justification mean that the patches in this series cannot blindly be trusted, which in turn means that I won't have bandwidth to process this series properly for now. I also saw die() -> BUG() that was not highlighted in the proposed log message; the single instance I happened to notice looked sensible, but I am not sure about the others. There are other series in flight that touch the same area of code and in different ways, causing unnecessary conflicts, which does not help us either X-<.
Re: [PATCH v3 7/8] push: add DWYM support for "git push refs/remotes/...:"
Ævar Arnfjörð Bjarmason writes: > This is the first use of the %N$ style of printf format in > the *.[ch] files in our codebase. It's supported by POSIX[2] and > there's existing uses for it in po/*.po files,... For now, I'll eject this from 'pu', as I had spent way too much time trying to make it and other topics work there. CC remote.o remote.c: In function 'show_push_unqualified_ref_name_error': remote.c:1035:2: error: $ operand number used after format without operand number [-Werror=format=] error(_("The destination you provided is not a full refname (i.e.,\n" ^ cc1: all warnings being treated as errors Makefile:2323: recipe for target 'remote.o' failed make: *** [remote.o] Error 1
Re: [PATCH] worktree: refactor lock_reason_valid and lock_reason to be more sensible
Nickolai Belakovski writes: > Either way, I do see an issue with the current code that anybody who > wants to know the lock status and/or lock reason of a worktree gets > faced with a confusing, misleading, and opaque piece of code. Sorry, I don't. I do not mind a better documentation for is_worktree_locked() without doing anything else. I do not see any reason to remove fields, split the helper function into two, drop the caching, etc., especially when the only justification is "I am new to the codebase and find it confusing".
Re: [PATCH] worktree: refactor lock_reason_valid and lock_reason to be more sensible
On Mon, Oct 29, 2018 at 1:45 AM Nickolai Belakovski wrote: > On Sun, Oct 28, 2018 at 9:01 PM Eric Sunshine wrote: > > That said, I wouldn't necessarily oppose renaming the function, but I > > also don't think it's particularly important to do so. > > To me, I would just go lookup the signature of worktree_lock_reason > and see that it returns a pointer and I'd be satisfied with that. I > could also infer that from looking at the code if I'm just skimming > through. But if I see code like "reason = is_worktree_locked(wt)" I'm > like hold on, what's going on here?! :P I don't feel strongly about it, and, as indicated, wouldn't necessarily be opposed to it. If you do want to make that change, perhaps send it as the second patch of a 2-patch series in which patch 1 just updates the API documentation. That way, if anyone does oppose the rename in patch 2, then that patch can be dropped without having to re-send.