Re: [PATCH] hex: use unsigned index for ring buffer
Am 23.10.2016 um 11:11 schrieb Jeff King: On Sun, Oct 23, 2016 at 11:00:48AM +0200, René Scharfe wrote: Overflow is defined for unsigned integers, but not for signed ones. Make the ring buffer index in sha1_to_hex() unsigned to be on the safe side. Signed-off-by: Rene Scharfe--- Hard to trigger, but probably even harder to diagnose once someone somehow manages to do it on some uncommon architecture. Indeed. If we are worried about overflow, we would also want to assume that it wraps at a multiple of 4, but that is probably a sane assumption. Hmm, I can't think of a way to violate this assumption except with unsigned integers that are only a single bit wide. That would be a weird machine. Are there other possibilities? diff --git a/hex.c b/hex.c index ab2610e..8c6c189 100644 --- a/hex.c +++ b/hex.c @@ -76,7 +76,7 @@ char *oid_to_hex_r(char *buffer, const struct object_id *oid) char *sha1_to_hex(const unsigned char *sha1) { - static int bufno; + static unsigned int bufno; static char hexbuffer[4][GIT_SHA1_HEXSZ + 1]; return sha1_to_hex_r(hexbuffer[3 & ++bufno], sha1); } I wonder if just truncating bufno would be conceptually simpler (albeit longer): bufno++; bufno &= 3; return sha1_to_hex_r(hexbuffer[bufno], sha1); You could also write the second line like: bufno %= ARRAY_SIZE(hexbuffer); which is less magical (right now the set of buffers must be a power of 2). I expect the compiler could turn that into a bitmask itself. Expelling magic is a good idea. And indeed, at least gcc, clang and icc on x86-64 are smart enough to use an AND instead of dividing (https://godbolt.org/g/rFPpzF). But gcc also adds a sign extension (cltq/cdqe) if we store the truncated value, unlike the other two compilers. I don't see why -- the bit mask operation enforces a value between 0 and 3 (inclusive) and the upper bits of eax are zeroed automatically, so the cltq is effectively a noop. Using size_t gets us rid of the extra instruction and is the right type anyway. It would suffice on its own, hmm.. I'm fine with any of the options. I guess you'd want a similar patch for find_unique_abbrev on top of jk/no-looking-at-dotgit-outside-repo. Actually I'd want you to want to amend your series yourself. ;) Maybe I can convince Coccinelle to handle that issue for us. And there's also path.c::get_pathname(). That's enough cases to justify adding a macro, I'd say: --- cache.h | 3 +++ hex.c | 4 ++-- path.c | 4 ++-- 3 files changed, 7 insertions(+), 4 deletions(-) diff --git a/cache.h b/cache.h index 05ecb88..8bb4918 100644 --- a/cache.h +++ b/cache.h @@ -555,6 +555,9 @@ extern int daemonize(void); } \ } while (0) +#define NEXT_RING_ITEM(array, index) \ + (array)[(index) = ((index) + 1) % ARRAY_SIZE(array)] + /* Initialize and use the cache information */ struct lock_file; extern int read_index(struct index_state *); diff --git a/hex.c b/hex.c index ab2610e..5e711b9 100644 --- a/hex.c +++ b/hex.c @@ -76,9 +76,9 @@ char *oid_to_hex_r(char *buffer, const struct object_id *oid) char *sha1_to_hex(const unsigned char *sha1) { - static int bufno; + static size_t bufno; static char hexbuffer[4][GIT_SHA1_HEXSZ + 1]; - return sha1_to_hex_r(hexbuffer[3 & ++bufno], sha1); + return sha1_to_hex_r(NEXT_RING_ITEM(hexbuffer, bufno), sha1); } char *oid_to_hex(const struct object_id *oid) diff --git a/path.c b/path.c index a8e7295..60dba6a 100644 --- a/path.c +++ b/path.c @@ -24,8 +24,8 @@ static struct strbuf *get_pathname(void) static struct strbuf pathname_array[4] = { STRBUF_INIT, STRBUF_INIT, STRBUF_INIT, STRBUF_INIT }; - static int index; - struct strbuf *sb = _array[3 & ++index]; + static size_t index; + struct strbuf *sb = _RING_ITEM(pathname_array, index); strbuf_reset(sb); return sb; } -- 2.10.1
Re: RFC Failover url for fetches?
W dniu 21.10.2016 o 21:03, Junio C Hamano pisze: > Stefan Bellerwrites: > >> So when pushing it is possible to have multiple urls >> (remote..url) configured. >> >> When fetching only the first configured url is considered. >> Would it make sense to allow multiple urls and >> try them one by one until one works? > > I do not think the two are related. Pushing to multiple is not "I > want to update at least one of them" in the first place. Push is/should be 'update all', fetch is/should be 'fetch any'. I thought that multiple remote..url values provide this fallback for fetch, but it looks like it isn't so... > > As to fetching from two or more places as "fallback", I am > moderately negative to add it as a dumb feature that does nothing > more than "My fetch from A failed, so let's blindly try it from B". > I'd prefer to keep the "My fetch from A is failing" knowledge near > the surface of end user's consciousness as a mechanism to pressure A > to fix it--that way everybody who is fetching from A benefits. > After all, doing "git remote add B" once (you'd need to tell the URL > for B anyway to Git) and issuing "git fetch B" after seeing your > regular "git fetch" fails once in a blue moon is not all that > cumbersome, I would think. One would need to configure fallback B remote to use the same remote-branch namespace as remote A, if it is to be used as fallback, I would think. > > Some people _may_ have objection based on A and B going out of sync, > especially B may fall behind even yourself and cause non-ff errors, > but I personally am not worried about that, because when somebody > configures B as a fallback for A, there is an expectation that B is > kept reasonably up to date. It would be a problem if some refs are > expected to be constantly rewound at A (e.g. 'pu' in my tree) and > configured to always force-fetch, though. A stale B would silently > set such a branch in your repository back without failing. Nb. there is also http-alternates mechanism... which nowadays doesn't matter anyway, I would think. -- Jakub Narębski
Re: [PATCH v1 10/19] read-cache: regenerate shared index if necessary
On 23/10/16 10:26, Christian Couder wrote: > When writing a new split-index and there is a big number of cache > entries in the split-index compared to the shared index, it is a > good idea to regenerate the shared index. > > By default when the ratio reaches 20%, we will push back all > the entries from the split-index into a new shared index file > instead of just creating a new split-index file. > > The threshold can be configured using the > "splitIndex.maxPercentChange" config variable. > > We need to adjust the existing tests in t1700 by setting > "splitIndex.maxPercentChange" to 100 at the beginning of t1700, > as the existing tests are assuming that the shared index is > regenerated only when `git update-index --split-index` is used. > > Signed-off-by: Christian Couder> --- > read-cache.c | 33 - > t/t1700-split-index.sh | 1 + > 2 files changed, 33 insertions(+), 1 deletion(-) > > diff --git a/read-cache.c b/read-cache.c > index bb53823..a91fabe 100644 > --- a/read-cache.c > +++ b/read-cache.c > @@ -2216,6 +2216,36 @@ static int write_shared_index(struct index_state > *istate, > return ret; > } > > +static const int default_max_percent_split_change = 20; > + > +int too_many_not_shared_entries(struct index_state *istate) This function is a file-loacal symbol; could you please make it a static function. Thanks. ATB, Ramsay Jones
Re: [PATCH 28/36] attr: keep attr stack for each check
On 23/10/16 00:32, Stefan Beller wrote: > Instead of having a global attr stack, attach the stack to each check. > This allows to use the attr in a multithreaded way. > > > > Signed-off-by: Stefan Beller> --- > attr.c| 101 > +++--- > attr.h| 4 ++- > hashmap.h | 2 ++ > 3 files changed, 69 insertions(+), 38 deletions(-) > > diff --git a/attr.c b/attr.c > index 89ae155..b65437d 100644 > --- a/attr.c > +++ b/attr.c > @@ -372,15 +372,17 @@ static struct match_attr *parse_attr_line(const char > *line, const char *src, > * .gitignore file and info/excludes file as a fallback. > */ > > -/* NEEDSWORK: This will become per git_attr_check */ > -static struct attr_stack { > +struct attr_stack { > struct attr_stack *prev; > char *origin; > size_t originlen; > unsigned num_matches; > unsigned alloc; > struct match_attr **attrs; > -} *attr_stack; > +}; > + > +struct hashmap all_attr_stacks; > +int all_attr_stacks_init; Mark symbols 'all_attr_stacks' and 'all_attr_stacks_init' with the static keyword. (ie. these are file-local symbols). ATB, Ramsay Jones
Re: [PATCH 17/36] attr: expose validity check for attribute names
On 23/10/16 00:32, Stefan Beller wrote: > From: Junio C Hamano> > Export attr_name_valid() function, and a helper function that > returns the message to be given when a given pair > is not a good name for an attribute. > > We could later update the message to exactly spell out what the > rules for a good attribute name are, etc. > > Signed-off-by: Junio C Hamano > Signed-off-by: Stefan Beller > --- [snip] > +extern int attr_name_valid(const char *name, size_t namelen); > +extern void invalid_attr_name_message(struct strbuf *, const char *, int); > + The symbol 'attr_name_valid()' is not used outside of attr.c, even by the end of this series. Do you expect this function to be used in any future series? (The export is deliberate and it certainly seems like it should be part of the public interface, but ...) In contrast, the 'invalid_attr_name_message()' function is called from code in pathspec.c, which relies on 'git_attr_counted()' to call 'attr_name_valid()' internally to check for validity. :-D ATB, Ramsay Jones
Re: git clone --bare --origin incompatible?
# sch...@linux-m68k.org / 2016-10-23 14:29:55 +0200: > On Okt 23 2016, Roman Neuhauserwrote: > > > what is the reason clone --bare prohibits --origin? > > > > % git clone --bare -o fubar anything anywhere > > fatal: --bare and --origin fubar options are incompatible. > > Since a bare clone maps remote branches directly to local branches, > without any remote-tracking branches, --origin doesn't make sense. is it going to break something though? i can still go and rename the remote in the bare repo's config file afterwards. -- roman
Re: git clone --bare --origin incompatible?
On Okt 23 2016, Roman Neuhauserwrote: > what is the reason clone --bare prohibits --origin? > > % git clone --bare -o fubar anything anywhere > fatal: --bare and --origin fubar options are incompatible. Since a bare clone maps remote branches directly to local branches, without any remote-tracking branches, --origin doesn't make sense. Andreas. -- Andreas Schwab, sch...@linux-m68k.org GPG Key fingerprint = 58CA 54C7 6D53 942B 1756 01D3 44D5 214B 8276 4ED5 "And now for something completely different."
git clone --bare --origin incompatible?
hello, what is the reason clone --bare prohibits --origin? % git clone --bare -o fubar anything anywhere fatal: --bare and --origin fubar options are incompatible. -- roman
Re: tools for easily "uncommitting" parts of a patch I just commited?
On Sun, Oct 23, 2016 at 8:38 AM, Jeff Kingwrote: > On Sun, Oct 23, 2016 at 08:23:01AM +0700, Duy Nguyen wrote: > >> I hit the same problem sometimes, but in my case sometimes I >> accidentally do "git add" after "git add -p" and a configuration in >> "git commit -a" won't help me. I'd prefer we could undo changes in >> index instead. Something like reflog but for index. > > An index write always writes the whole file from scratch, so you really > just need to save a copy of the old file. Perhaps something like: > > rm -f $GIT_DIR/index.old > ln $GIT_DIR/index.old $GIT_DIR/index > ... and then open $GIT_DIR/index.tmp ... > ... and then rename(index.tmp, index) ... > > could do it cheaply. It's a little more complicated if you want to save > a sequence of versions, and eventually would take a lot of space, but > presumably a handful of saved indexes would be sufficient. Yeah. I had something [1] like that but never sorted out the UI for it :( > Another option would be an index format that journals, and you could > potentially walk back the journal to a point. That seems like a much > bigger change (and has weird layering, because deciding when to fold in > the journal is usually a performance thing, but obviously this would > have user-visible impact about how far back you could undo). v2 [2] goes in this direction (but not a full blown COW, the journal does not take part in any core operations of the index) [1] https://public-inbox.org/git/%3c1375597720-13236-1-git-send-email-pclo...@gmail.com%3E/ [2] https://public-inbox.org/git/1375966270-10968-1-git-send-email-pclo...@gmail.com/ -- Duy
Re: [PATCH 2/3] submodule tests: replace cloning from . by "$(pwd)"
Am 22.10.2016 um 22:46 schrieb Stefan Beller: I have looked into it again, and by now I think the bug is a feature, actually. Consider this: git clone . super git -C super submodule add ../submodule # we thought the previous line is buggy git clone super super-clone At this point, we *should* have this if there were no bugs (at least that is my assumption): /tmp ! + submodule <- submodule's remote repo ! + foo <- we are here (.), super's remote repo ! + super <- remote.origin.url=/tmp/foo/. ! + submodule <- remote.origin.url=/tmp/foo/./../submodule submodule.submodule.url=../submodule When I test this, 'git submodule add' fails: foo@master> git -C super submodule add ../submodule fatal: repository '/tmp/foo/submodule' does not exist fatal: clone of '/tmp/foo/submodule' into submodule path '/tmp/foo/super/submodule' failed Now in the super-clone the ../submodule is the correct relative url, because the url where we cloned from doesn't end in /. I do not understand why this would be relevant. The question is not how the submodule's remote URL ends, but how the submodule's remote URL is constructed from the super-project's URL and the relative path specified for 'git submodule add'. Whether ../submodule or ./submodule is the correct relative URL depends on where the origin of the submodule is located relative to the origin of the super-project. In the above example, it is ../submodule. However, the error message tells us that git looked in /tmp/foo/submodule, which looks like the /. bug! I do not understand where you see a feature here. What am I missing? -- Hannes
Re: [PATCH v5 00/27] Prepare the sequencer for the upcoming rebase -i patches
Hi Junio, On Sun, 23 Oct 2016, Johannes Schindelin wrote: > On Sat, 22 Oct 2016, Junio C Hamano wrote: > > > Johannes Schindelinwrites: > > > > > This patch series marks the '4' in the countdown to speed up rebase -i > > > by implementing large parts in C (read: there will be three more patch > > > series after that before the full benefit hits git.git: sequencer-i, > > > rebase--helper and rebase-i-extra). > > > ... > > > It would be *really* nice if we could get this patch series at least > > > into `next` soon, as it gets late and later for the rest of the > > > patches to make it into `master` in time for v2.11 (and it is not for > > > lack of trying on my end...). > > > > This "countdown 4" step can affect cherry-pick and revert, even Oh, I forgot to comment on this tidbit of your mail, sorry. This *is* the countdown 4, as the remaining 3 patch series depend on each other in the order I sent them out. Ciao, Dscho
Re: [PATCH v5 00/27] Prepare the sequencer for the upcoming rebase -i patches
Hi Junio, On Sat, 22 Oct 2016, Junio C Hamano wrote: > Johannes Schindelinwrites: > > > This patch series marks the '4' in the countdown to speed up rebase -i > > by implementing large parts in C (read: there will be three more patch > > series after that before the full benefit hits git.git: sequencer-i, > > rebase--helper and rebase-i-extra). > > ... > > It would be *really* nice if we could get this patch series at least > > into `next` soon, as it gets late and later for the rest of the > > patches to make it into `master` in time for v2.11 (and it is not for > > lack of trying on my end...). > > This "countdown 4" step can affect cherry-pick and revert, even > though we were careful to review changes to the sequencer.c code. As I pointed out in another mail in this thread: we should not fall into the trap of overrating review. In the case of the rebase--helper patches, so far the review mainly resulted in more work for me (having to change spellings elsewhere, for example), not in improving the changes I intended to introduce into git.git's code. Sure, there has been the occasional improvement, but it certainly feels as if I spent about 80% of the work after each -v1 iteration on things that have positively nothing at all to do with accelerating rebase -i. > I prefer to cook it in 'next' sufficiently long to ensure that we hear > feedbacks from non-Windows users if there is any unexpected breakage. FWIW I am using the same patches not only on Windows but also in my Linux VM. > There isn't enough time to include this topic in the upcoming > release within the current https://tinyurl.com/gitCal calendar, > however, which places the final on Nov 11th. More is the pity. Thank you, though, for being upfront with me. I will shift my focus to tasks that require my attention more urgently, then. Ciao, Dscho
Re: [PATCH v5 00/27] Prepare the sequencer for the upcoming rebase -i patches
Hi Junio, On Fri, 21 Oct 2016, Junio C Hamano wrote: > Johannes Schindelinwrites: > > I still do not understand (note that I am not saying "I do not > accept"--acceptance or rejection happens after an understandable > explanation is given, and "do not understand" means no such > explanation has been given yet) your justification behind adding a > technical debt to reimplement the author-script parser and not > sharing it with "git am" in 13/27. At this point, I am most of all reluctant to introduce such a huge change, which surely would introduce a regression. This is what happened a couple of times to me, most recently with the hide-dot-gitdir patch series that worked flawlessly for years, had to be dramatically changed during review to enter git.git, and introduced the major regression that `core.hideDotFiles = gitDirOnly` was broken. The lesson I learned: review should not be valued more than the test of time. This lesson has been reinforced by all the regressions that have not been caught by review nor the test suite running on Linux only. It would be a different matter if I still had the cross-validator in place (which I did when I sent out v1 of this patch series) and tons of time to spend on accommodating your wishes, however I may disagree with them. And in this instance, I thought I made clear that I disagree, and why: Internally, git-am and git-rebase-i handle the author-script very differently. That may change at some stage in the future, and it would be a good time then and there to take care of unifying this code. Currently, not so much, as the only excuse to use the same parser would be that they both read the same file, while they have to do very different things with the parsed output (in fact, your suggestion would ask the parser in the sequencer to rip apart the information into key/value pairs, only to re-glue them back together when they are used as the environment variables as which rebase-i treats the contents of the author-script file). So no, at this point I am not willing to risk introducing breakages in code that has been proven to work in practice. Ciao, Dscho
Re: [PATCH v5 22/27] sequencer: teach write_message() to append an optional LF
Hi Junio, On Fri, 21 Oct 2016, Junio C Hamano wrote: > Johannes Schindelinwrites: > > > This commit prepares for future callers that will have a pointer/length > > to some text to be written that lacks an LF, yet an LF is desired. > > Instead of requiring the caller to append an LF to the buffer (and > > potentially allocate memory to do so), the write_message() function > > learns to append an LF at the end of the file. > > As no existing callers need this, it probably is better left out and > added to the series that actually needs the new feature as a > preparatory step. Apart from this patch series being semantically the right place ("prepare-sequencer"), there is also the following consideration: The next patch series is already quite long. Taking this current patch series into account, which started out as a 22-patch series and needed to bloat by 25% through four subsequent iterations, it is probably not a wise idea to move this patch to a patch series that already weighs 34 patches. So I respectfully, and forcefully, disagree, Dscho
Re: Stash pop/apply conflict and --theirs and --ours
On Sun, Oct 23, 2016 at 12:58:12AM +0200, Sven Strickroth wrote: > I regularly experience that beginners have problems unterstanding that > --ours and --theirs are swapped when a conflict occurrs on git stash > apply or stash pop. > > From the HCI perspective this is really counter intuitive. I know that people have complained about "rebase" swapping the two, but I don't think anybody has ever mentioned it for stash. I'm not sure I that they are swapped, though. The "ours" content is generally what was in the HEAD before the operation started, and "theirs" is what the operation is bringing into that history. That is true of "merge" and "cherry-pick". And AFAICT, it is true of "stash", too (I basically think of "stash apply" as a cherry-pick). So with a setup like: git init echo base >file git add file git commit -m file echo stash >file git stash echo master >file git commit -am master git checkout -b branch HEAD^ echo branch >file git commit -am branch if we merge, then --theirs is the branch we are merging: git checkout master git merge branch git checkout --theirs file cat file # "branch" Likewise, if we cherry-pick: git reset --hard git cherry-pick branch git checkout --theirs file cat file # "branch" And likewise if we apply the stash: git reset --hard git stash apply git checkout --theirs file cat file # "stash" So that seems consistent to me. I guess if you are stashing in order to pull somebody else's work, like: git stash git pull git stash pop then conceptually the stash is "ours" and HEAD is "theirs". This is exactly like the rebase case. E.g., if you instead did: git commit -m 'tmp stash' git pull --rebase So I sympathize, but I don't think that having "stash" flip the order would be the right thing to do in all cases. In theory there could be some kind of option (and things like pull autostash could use it), but I suspect it may be hard to implement in practice. The unpack-trees code does not treat "ours" and "theirs" entirely symmetrically (the "ours" side represents the current working tree, so we might do things like check whether the index is fresh). I guess you could flip the "1" and "2" bits in the index after the conflicted merge completes. I'm still not convinced it's a good idea, though. -Peff
Re: [PATCH v4 20/25] sequencer: refactor write_message()
Hi Junio, On Fri, 21 Oct 2016, Junio C Hamano wrote: > Johannes Schindelinwrites: > > >> Ah, make that four steps. The final one is: > >> > >> - add append_eol parameter that nobody uses at this step in the > >> series. > >> > >> This is a new feature to the helper. While it is OK to have it as a > >> preparatory step in this series, it is easier to understand if it > >> kept as a separate step. It is even more preferrable if it is made > >> as a preparatory step in a series that adds a caller that passes > >> true to append_eol to this helper... > > > > Done, > > Dscho > > Hmm, what has been done exactly? I still see append_eol in v5 where > nobody uses it yet. Confused... Your bullet point suggests that this change should be a separate patch. I did that. And since this patch series' purpose is to prepare the sequencer for the upcoming series that teaches it to understand git-rebase-todo scripts, this patch falls fair and square into the preparatory phase. Ciao, Dscho
[PATCH v1 14/19] read-cache: touch shared index files when used
When a split-index file is created, let's update the mtime of the shared index file that the split-index file is referencing. In a following commit we will make shared index file expire depending on their mtime, so updating the mtime makes sure that the shared index file will not be deleted soon. Signed-off-by: Christian Couder--- read-cache.c | 6 ++ 1 file changed, 6 insertions(+) diff --git a/read-cache.c b/read-cache.c index a91fabe..3aeff77 100644 --- a/read-cache.c +++ b/read-cache.c @@ -2268,6 +2268,12 @@ int write_locked_index(struct index_state *istate, struct lock_file *lock, int ret = write_shared_index(istate, lock, flags); if (ret) return ret; + } else { + /* Signal that the shared index is used */ + const char *shared_index = git_path("sharedindex.%s", + sha1_to_hex(si->base_sha1)); + if (!check_and_freshen_file(shared_index, 1)) + warning("could not freshen '%s'", shared_index); } return write_split_index(istate, lock, flags); -- 2.10.1.462.g7e1e03a
[PATCH v1 08/19] Documentation/git-update-index: talk about core.splitIndex config var
Signed-off-by: Christian Couder--- Documentation/git-update-index.txt | 6 ++ 1 file changed, 6 insertions(+) diff --git a/Documentation/git-update-index.txt b/Documentation/git-update-index.txt index 7386c93..e091b2a 100644 --- a/Documentation/git-update-index.txt +++ b/Documentation/git-update-index.txt @@ -171,6 +171,12 @@ may not support it yet. given again, all changes in $GIT_DIR/index are pushed back to the shared index file. This mode is designed for very large indexes that take a significant amount of time to read or write. ++ +These options take effect whatever the value of the `core.splitIndex` +configuration variable (see linkgit:git-config[1]). But a warning is +emitted when the change goes against the configured value, as the +configured value will take effect next time the index is read and this +will remove the intended effect of the option. --untracked-cache:: --no-untracked-cache:: -- 2.10.1.462.g7e1e03a
[PATCH v1 06/19] t1700: add tests for core.splitIndex
Signed-off-by: Christian Couder--- t/t1700-split-index.sh | 37 + 1 file changed, 37 insertions(+) diff --git a/t/t1700-split-index.sh b/t/t1700-split-index.sh index 292a072..db8c39f 100755 --- a/t/t1700-split-index.sh +++ b/t/t1700-split-index.sh @@ -200,4 +200,41 @@ EOF test_cmp expect actual ' +test_expect_success 'set core.splitIndex config variable to true' ' + git config core.splitIndex true && + : >three && + git update-index --add three && + git ls-files --stage >ls-files.actual && + cat >ls-files.expect expect ls-files.expect expect <
[PATCH v1 16/19] read-cache: unlink old sharedindex files
Everytime split index is turned on, it creates a "sharedindex." file in the git directory. This change makes sure that shared index files that haven't been used for a long time are removed when a new shared index file is created. The new "splitIndex.sharedIndexExpire" config variable is created to tell the delay after which an unused shared index file can be deleted. It defaults to "1.week.ago". A previous commit made sure that each time a split index file is created the mtime of the shared index file it references is updated. This makes sure that recently used shared index file will not be deleted. Signed-off-by: Christian Couder--- read-cache.c | 63 +++- 1 file changed, 62 insertions(+), 1 deletion(-) diff --git a/read-cache.c b/read-cache.c index 3aeff77..65ceb29 100644 --- a/read-cache.c +++ b/read-cache.c @@ -2190,6 +2190,64 @@ static int write_split_index(struct index_state *istate, return ret; } +static const char *shared_index_expire = "1.week.ago"; + +static unsigned long get_shared_index_expire_date(void) +{ + static unsigned long shared_index_expire_date; + static int shared_index_expire_date_prepared; + + if (!shared_index_expire_date_prepared) { + git_config_get_date_string("splitindex.sharedindexexpire", + _index_expire); + shared_index_expire_date = approxidate(shared_index_expire); + shared_index_expire_date_prepared = 1; + } + + return shared_index_expire_date; +} + +static int can_delete_shared_index(const char *shared_sha1_hex) +{ + struct stat st; + unsigned long expiration; + const char *shared_index = git_path("sharedindex.%s", shared_sha1_hex); + + /* Check timestamp */ + expiration = get_shared_index_expire_date(); + if (!expiration) + return 0; + if (stat(shared_index, )) + return error_errno("could not stat '%s", shared_index); + if (st.st_mtime > expiration) + return 0; + + return 1; +} + +static void clean_shared_index_files(const char *current_hex) +{ + struct dirent *de; + DIR *dir = opendir(get_git_dir()); + + if (!dir) { + error_errno("unable to open git dir: %s", get_git_dir()); + return; + } + + while ((de = readdir(dir)) != NULL) { + const char *sha1_hex; + if (!skip_prefix(de->d_name, "sharedindex.", _hex)) + continue; + if (!strcmp(sha1_hex, current_hex)) + continue; + if (can_delete_shared_index(sha1_hex) > 0 && + unlink(git_path("%s", de->d_name))) + error_errno("unable to unlink: %s", git_path("%s", de->d_name)); + } + closedir(dir); +} + static struct tempfile temporary_sharedindex; static int write_shared_index(struct index_state *istate, @@ -2211,8 +2269,11 @@ static int write_shared_index(struct index_state *istate, } ret = rename_tempfile(_sharedindex, git_path("sharedindex.%s", sha1_to_hex(si->base->sha1))); - if (!ret) + if (!ret) { hashcpy(si->base_sha1, si->base->sha1); + clean_shared_index_files(sha1_to_hex(si->base->sha1)); + } + return ret; } -- 2.10.1.462.g7e1e03a
[PATCH v1 15/19] config: add git_config_get_date_string() from gc.c
This function will be used in a following commit to get the expiration time of the shared index files from the config, and it is generic enough to be put in "config.c". Signed-off-by: Christian Couder--- builtin/gc.c | 15 ++- cache.h | 1 + config.c | 13 + 3 files changed, 16 insertions(+), 13 deletions(-) diff --git a/builtin/gc.c b/builtin/gc.c index 069950d..c1e9602 100644 --- a/builtin/gc.c +++ b/builtin/gc.c @@ -62,17 +62,6 @@ static void report_pack_garbage(unsigned seen_bits, const char *path) string_list_append(_garbage, path); } -static void git_config_date_string(const char *key, const char **output) -{ - if (git_config_get_string_const(key, output)) - return; - if (strcmp(*output, "now")) { - unsigned long now = approxidate("now"); - if (approxidate(*output) >= now) - git_die_config(key, _("Invalid %s: '%s'"), key, *output); - } -} - static void process_log_file(void) { struct stat st; @@ -111,8 +100,8 @@ static void gc_config(void) git_config_get_int("gc.auto", _auto_threshold); git_config_get_int("gc.autopacklimit", _auto_pack_limit); git_config_get_bool("gc.autodetach", _auto); - git_config_date_string("gc.pruneexpire", _expire); - git_config_date_string("gc.worktreepruneexpire", _worktrees_expire); + git_config_get_date_string("gc.pruneexpire", _expire); + git_config_get_date_string("gc.worktreepruneexpire", _worktrees_expire); git_config(git_default_config, NULL); } diff --git a/cache.h b/cache.h index a625b47..bcfc0f1 100644 --- a/cache.h +++ b/cache.h @@ -1811,6 +1811,7 @@ extern int git_config_get_bool(const char *key, int *dest); extern int git_config_get_bool_or_int(const char *key, int *is_bool, int *dest); extern int git_config_get_maybe_bool(const char *key, int *dest); extern int git_config_get_pathname(const char *key, const char **dest); +extern int git_config_get_date_string(const char *key, const char **output); extern int git_config_get_untracked_cache(void); extern int git_config_get_split_index(void); extern int git_config_get_max_percent_split_change(void); diff --git a/config.c b/config.c index 5580f56..f88c61b 100644 --- a/config.c +++ b/config.c @@ -1685,6 +1685,19 @@ int git_config_get_pathname(const char *key, const char **dest) return ret; } +int git_config_get_date_string(const char *key, const char **output) +{ + int ret = git_config_get_string_const(key, output); + if (ret) + return ret; + if (strcmp(*output, "now")) { + unsigned long now = approxidate("now"); + if (approxidate(*output) >= now) + git_die_config(key, _("Invalid %s: '%s'"), key, *output); + } + return ret; +} + int git_config_get_untracked_cache(void) { int val = -1; -- 2.10.1.462.g7e1e03a
[PATCH v1 03/19] split-index: add {add,remove}_split_index() functions
Also use the functions in cmd_update_index() in builtin/update-index.c. These functions will be used in a following commit to tweak our use of the split-index feature depending on the setting of a configuration variable. Signed-off-by: Christian Couder--- builtin/update-index.c | 18 ++ split-index.c | 22 ++ split-index.h | 2 ++ 3 files changed, 30 insertions(+), 12 deletions(-) diff --git a/builtin/update-index.c b/builtin/update-index.c index f3f07e7..b75ea03 100644 --- a/builtin/update-index.c +++ b/builtin/update-index.c @@ -1098,18 +1098,12 @@ int cmd_update_index(int argc, const char **argv, const char *prefix) } if (split_index > 0) { - init_split_index(_index); - the_index.cache_changed |= SPLIT_INDEX_ORDERED; - } else if (!split_index && the_index.split_index) { - /* -* can't discard_split_index(_index); because that -* will destroy split_index->base->cache[], which may -* be shared with the_index.cache[]. So yeah we're -* leaking a bit here. -*/ - the_index.split_index = NULL; - the_index.cache_changed |= SOMETHING_CHANGED; - } + if (the_index.split_index) + the_index.cache_changed |= SPLIT_INDEX_ORDERED; + else + add_split_index(_index); + } else if (!split_index) + remove_split_index(_index); switch (untracked_cache) { case UC_UNSPECIFIED: diff --git a/split-index.c b/split-index.c index 615f4ca..f519e60 100644 --- a/split-index.c +++ b/split-index.c @@ -317,3 +317,25 @@ void replace_index_entry_in_base(struct index_state *istate, istate->split_index->base->cache[new->index - 1] = new; } } + +void add_split_index(struct index_state *istate) +{ + if (!istate->split_index) { + init_split_index(istate); + istate->cache_changed |= SPLIT_INDEX_ORDERED; + } +} + +void remove_split_index(struct index_state *istate) +{ + if (istate->split_index) { + /* +* can't discard_split_index(_index); because that +* will destroy split_index->base->cache[], which may +* be shared with the_index.cache[]. So yeah we're +* leaking a bit here. +*/ + istate->split_index = NULL; + istate->cache_changed |= SOMETHING_CHANGED; + } +} diff --git a/split-index.h b/split-index.h index c1324f5..df91c1b 100644 --- a/split-index.h +++ b/split-index.h @@ -31,5 +31,7 @@ void merge_base_index(struct index_state *istate); void prepare_to_write_split_index(struct index_state *istate); void finish_writing_split_index(struct index_state *istate); void discard_split_index(struct index_state *istate); +void add_split_index(struct index_state *istate); +void remove_split_index(struct index_state *istate); #endif -- 2.10.1.462.g7e1e03a
[PATCH v1 07/19] Documentation/config: add information for core.splitIndex
Signed-off-by: Christian Couder--- Documentation/config.txt | 4 1 file changed, 4 insertions(+) diff --git a/Documentation/config.txt b/Documentation/config.txt index 27069ac..96521a4 100644 --- a/Documentation/config.txt +++ b/Documentation/config.txt @@ -331,6 +331,10 @@ core.trustctime:: crawlers and some backup systems). See linkgit:git-update-index[1]. True by default. +core.splitIndex:: + If true, the split-index feature of the index will be used. + See linkgit:git-update-index[1]. False by default. + core.untrackedCache:: Determines what to do about the untracked cache feature of the index. It will be kept, if this variable is unset or set to -- 2.10.1.462.g7e1e03a
[PATCH v1 10/19] read-cache: regenerate shared index if necessary
When writing a new split-index and there is a big number of cache entries in the split-index compared to the shared index, it is a good idea to regenerate the shared index. By default when the ratio reaches 20%, we will push back all the entries from the split-index into a new shared index file instead of just creating a new split-index file. The threshold can be configured using the "splitIndex.maxPercentChange" config variable. We need to adjust the existing tests in t1700 by setting "splitIndex.maxPercentChange" to 100 at the beginning of t1700, as the existing tests are assuming that the shared index is regenerated only when `git update-index --split-index` is used. Signed-off-by: Christian Couder--- read-cache.c | 33 - t/t1700-split-index.sh | 1 + 2 files changed, 33 insertions(+), 1 deletion(-) diff --git a/read-cache.c b/read-cache.c index bb53823..a91fabe 100644 --- a/read-cache.c +++ b/read-cache.c @@ -2216,6 +2216,36 @@ static int write_shared_index(struct index_state *istate, return ret; } +static const int default_max_percent_split_change = 20; + +int too_many_not_shared_entries(struct index_state *istate) +{ + int i, not_shared = 0; + int max_split = git_config_get_max_percent_split_change(); + + switch (max_split) { + case -1: + /* not or badly configured: use the default value */ + max_split = default_max_percent_split_change; + break; + case 0: + return 1; /* 0% means always write a new shared index */ + case 100: + return 0; /* 100% means never write a new shared index */ + default: + ; /* do nothing: just use the configured value */ + } + + /* Count not shared entries */ + for (i = 0; i < istate->cache_nr; i++) { + struct cache_entry *ce = istate->cache[i]; + if (!ce->index) + not_shared++; + } + + return istate->cache_nr * max_split < not_shared * 100; +} + int write_locked_index(struct index_state *istate, struct lock_file *lock, unsigned flags) { @@ -2233,7 +2263,8 @@ int write_locked_index(struct index_state *istate, struct lock_file *lock, if ((v & 15) < 6) istate->cache_changed |= SPLIT_INDEX_ORDERED; } - if (istate->cache_changed & SPLIT_INDEX_ORDERED) { + if (istate->cache_changed & SPLIT_INDEX_ORDERED || + too_many_not_shared_entries(istate)) { int ret = write_shared_index(istate, lock, flags); if (ret) return ret; diff --git a/t/t1700-split-index.sh b/t/t1700-split-index.sh index db8c39f..507a1dd 100755 --- a/t/t1700-split-index.sh +++ b/t/t1700-split-index.sh @@ -8,6 +8,7 @@ test_description='split index mode tests' sane_unset GIT_TEST_SPLIT_INDEX test_expect_success 'enable split index' ' + git config splitIndex.maxPercentChange 100 && git update-index --split-index && test-dump-split-index .git/index >actual && indexversion=$(test-index-version <.git/index) && -- 2.10.1.462.g7e1e03a
[PATCH v1 12/19] Documentation/config: add splitIndex.maxPercentChange
Signed-off-by: Christian Couder--- Documentation/config.txt | 13 + 1 file changed, 13 insertions(+) diff --git a/Documentation/config.txt b/Documentation/config.txt index 96521a4..380eeb8 100644 --- a/Documentation/config.txt +++ b/Documentation/config.txt @@ -2763,6 +2763,19 @@ showbranch.default:: The default set of branches for linkgit:git-show-branch[1]. See linkgit:git-show-branch[1]. +splitIndex.maxPercentChange:: + When the split index feature is used, this specifies the + percent of entries the split index can contain compared to the + whole number of entries in both the split index and the shared + index before a new shared index is written. + The value should be between 0 and 100. If the value is 0 then + a new shared index is always written, if it is 100 a new + shared index is never written. + By default the value is 20, so a new shared index is written + if the number of entries in the split index would be greater + than 20 percent of the total number of entries. + See linkgit:git-update-index[1]. + status.relativePaths:: By default, linkgit:git-status[1] shows paths relative to the current directory. Setting this variable to `false` shows paths -- 2.10.1.462.g7e1e03a
[PATCH v1 05/19] update-index: warn in case of split-index incoherency
When users are using `git update-index --(no-)split-index`, they may expect the split-index feature to be used or not according to the option they just used, but this might not be the case if the new "core.splitIndex" config variable has been set. In this case let's warn about what will happen and why. Signed-off-by: Christian Couder--- builtin/update-index.c | 11 ++- 1 file changed, 10 insertions(+), 1 deletion(-) diff --git a/builtin/update-index.c b/builtin/update-index.c index b75ea03..a14dbf2 100644 --- a/builtin/update-index.c +++ b/builtin/update-index.c @@ -1098,12 +1098,21 @@ int cmd_update_index(int argc, const char **argv, const char *prefix) } if (split_index > 0) { + if (git_config_get_split_index() == 0) + warning("core.splitIndex is set to false; " + "remove or change it, if you really want to " + "enable split index"); if (the_index.split_index) the_index.cache_changed |= SPLIT_INDEX_ORDERED; else add_split_index(_index); - } else if (!split_index) + } else if (!split_index) { + if (git_config_get_split_index() == 1) + warning("core.splitIndex is set to true; " + "remove or change it, if you really want to " + "disable split index"); remove_split_index(_index); + } switch (untracked_cache) { case UC_UNSPECIFIED: -- 2.10.1.462.g7e1e03a
[PATCH v1 17/19] t1700: test shared index file expiration
Signed-off-by: Christian Couder--- t/t1700-split-index.sh | 44 1 file changed, 44 insertions(+) diff --git a/t/t1700-split-index.sh b/t/t1700-split-index.sh index f03addf..f448fc1 100755 --- a/t/t1700-split-index.sh +++ b/t/t1700-split-index.sh @@ -310,4 +310,48 @@ EOF test_cmp expect actual ' +test_expect_success 'shared index files expire after 7 days by default' ' + : >ten && + git update-index --add ten && + test $(ls .git/sharedindex.* | wc -l) -gt 1 && + just_under_7_days_ago=$((1-7*86400)) && + test-chmtime =$just_under_7_days_ago .git/sharedindex.* && + : >eleven && + git update-index --add eleven && + test $(ls .git/sharedindex.* | wc -l) -gt 1 && + just_over_7_days_ago=$((-1-7*86400)) && + test-chmtime =$just_over_7_days_ago .git/sharedindex.* && + : >twelve && + git update-index --add twelve && + test $(ls .git/sharedindex.* | wc -l) = 1 +' + +test_expect_success 'check splitIndex.sharedIndexExpire set to 8 days' ' + git config splitIndex.sharedIndexExpire "8.days.ago" && + test-chmtime =$just_over_7_days_ago .git/sharedindex.* && + : >thirteen && + git update-index --add thirteen && + test $(ls .git/sharedindex.* | wc -l) -gt 1 && + just_over_8_days_ago=$((-1-8*86400)) && + test-chmtime =$just_over_8_days_ago .git/sharedindex.* && + : >fourteen && + git update-index --add fourteen && + test $(ls .git/sharedindex.* | wc -l) = 1 +' + +test_expect_success 'check splitIndex.sharedIndexExpire set to "never" and "now"' ' + git config splitIndex.sharedIndexExpire never && + just_10_years_ago=$((-365*10*86400)) && + test-chmtime =$just_10_years_ago .git/sharedindex.* && + : >fifteen && + git update-index --add fifteen && + test $(ls .git/sharedindex.* | wc -l) -gt 1 && + git config splitIndex.sharedIndexExpire now && + just_1_second_ago=-1 && + test-chmtime =$just_1_second_ago .git/sharedindex.* && + : >sixteen && + git update-index --add sixteen && + test $(ls .git/sharedindex.* | wc -l) = 1 +' + test_done -- 2.10.1.462.g7e1e03a
[PATCH v1 11/19] t1700: add tests for splitIndex.maxPercentChange
Signed-off-by: Christian Couder--- t/t1700-split-index.sh | 72 ++ 1 file changed, 72 insertions(+) diff --git a/t/t1700-split-index.sh b/t/t1700-split-index.sh index 507a1dd..f03addf 100755 --- a/t/t1700-split-index.sh +++ b/t/t1700-split-index.sh @@ -238,4 +238,76 @@ EOF test_cmp expect actual ' +test_expect_success 'set core.splitIndex config variable to true' ' + git config core.splitIndex true && + : >three && + git update-index --add three && + BASE=$(test-dump-split-index .git/index | grep "^base") && + test-dump-split-index .git/index | sed "/^own/d" >actual && + cat >expect actual && + cat >expect actual && + cat >expect actual && + cat >expect actual && + cat >expect actual && + cat >expect <
[PATCH v1 18/19] Documentation/config: add splitIndex.sharedIndexExpire
Signed-off-by: Christian Couder--- Documentation/config.txt | 11 +++ 1 file changed, 11 insertions(+) diff --git a/Documentation/config.txt b/Documentation/config.txt index 380eeb8..5212a97 100644 --- a/Documentation/config.txt +++ b/Documentation/config.txt @@ -2776,6 +2776,17 @@ splitIndex.maxPercentChange:: than 20 percent of the total number of entries. See linkgit:git-update-index[1]. +splitIndex.sharedIndexExpire:: + When the split index feature is used, shared index files with + a mtime older than this time will be removed when a new shared + index file is created. The value "now" expires all entries + immediately, and "never" suppresses expiration altogether. + The default value is "one.week.ago". + Note that each time a new split-index file is created, the + mtime of the related shared index file is updated to the + current time. + See linkgit:git-update-index[1]. + status.relativePaths:: By default, linkgit:git-status[1] shows paths relative to the current directory. Setting this variable to `false` shows paths -- 2.10.1.462.g7e1e03a
[PATCH v1 01/19] split-index: s/eith/with/ typo fix
Signed-off-by: Christian Couder--- split-index.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/split-index.c b/split-index.c index 35da553..615f4ca 100644 --- a/split-index.c +++ b/split-index.c @@ -187,7 +187,7 @@ void prepare_to_write_split_index(struct index_state *istate) /* Go through istate->cache[] and mark CE_MATCHED to * entry with positive index. We'll go through * base->cache[] later to delete all entries in base -* that are not marked eith either CE_MATCHED or +* that are not marked with either CE_MATCHED or * CE_UPDATE_IN_BASE. If istate->cache[i] is a * duplicate, deduplicate it. */ -- 2.10.1.462.g7e1e03a
[PATCH v1 13/19] sha1_file: make check_and_freshen_file() non static
This function will be used in a commit soon, so let's make it available globally. Signed-off-by: Christian Couder--- cache.h | 3 +++ sha1_file.c | 2 +- 2 files changed, 4 insertions(+), 1 deletion(-) diff --git a/cache.h b/cache.h index faceceb..a625b47 100644 --- a/cache.h +++ b/cache.h @@ -1167,6 +1167,9 @@ extern int has_pack_index(const unsigned char *sha1); extern void assert_sha1_type(const unsigned char *sha1, enum object_type expect); +/* Helper to check and "touch" a file */ +extern int check_and_freshen_file(const char *fn, int freshen); + extern const signed char hexval_table[256]; static inline unsigned int hexval(unsigned char c) { diff --git a/sha1_file.c b/sha1_file.c index 266152d..29f927e 100644 --- a/sha1_file.c +++ b/sha1_file.c @@ -576,7 +576,7 @@ static int freshen_file(const char *fn) * either does not exist on disk, or has a stale mtime and may be subject to * pruning). */ -static int check_and_freshen_file(const char *fn, int freshen) +int check_and_freshen_file(const char *fn, int freshen) { if (access(fn, F_OK)) return 0; -- 2.10.1.462.g7e1e03a
[PATCH v1 04/19] read-cache: add and then use tweak_split_index()
This will make us use the split-index feature or not depending on the value of the "core.splitIndex" config variable. Signed-off-by: Christian Couder--- read-cache.c | 17 + 1 file changed, 17 insertions(+) diff --git a/read-cache.c b/read-cache.c index 38d67fa..bb53823 100644 --- a/read-cache.c +++ b/read-cache.c @@ -1562,10 +1562,27 @@ static void tweak_untracked_cache(struct index_state *istate) } } +static void tweak_split_index(struct index_state *istate) +{ + switch (git_config_get_split_index()) { + case -1: /* unset: do nothing */ + break; + case 0: /* false */ + remove_split_index(istate); + break; + case 1: /* true */ + add_split_index(istate); + break; + default: /* unknown value: do nothing */ + break; + } +} + static void post_read_index_from(struct index_state *istate) { check_ce_order(istate); tweak_untracked_cache(istate); + tweak_split_index(istate); } /* remember to discard_cache() before reading a different cache! */ -- 2.10.1.462.g7e1e03a
[PATCH v1 00/19] Add configuration options for split-index
Goal We want to make it possible to use the split-index feature automatically by just setting a new "core.splitIndex" configuration variable to true. This can be valuable as split-index can help significantly speed up `git rebase` especially along with the work to libify `git apply` that has been recently merged to master (see https://github.com/git/git/commit/81358dc238372793b1590efa149cc1581d1fbd98). Design ~~ The design is similar as the previous work that introduced "core.untrackedCache". The new "core.splitIndex" configuration option can be either true, false or undefined which is the default. When it is true, the split index is created, if it does not already exists, when the index is read. When it is false, the split index is removed if it exists, when the index is read. Otherwise it is left as is. Along with this new configuration variable, the two following options are also introduced: - splitIndex.maxPercentChange This is to avoid having too many changes accumulating in the split index while in split index mode. The git-update-index documentation says: If split-index mode is already enabled and `--split-index` is given again, all changes in $GIT_DIR/index are pushed back to the shared index file. but it is probably better to not expect the user to think about it and to have a mechanism that pushes back all changes to the shared index file automatically when some threshold is reached. The default threshold is when the number of entries in the split index file reaches 20% (by default) of the number of entries in the shared index file. The new "splitIndex.maxPercentChange" config option lets people tweak this value. - splitIndex.sharedIndexExpire To make sure that old sharedindex files are eventually removed when a new one has been created, we "touch" the shared index file every time it is used by a new split index file. Then we can delete shared indexes with an mtime older than one week (by default), when we create a new shared index file. The new "splitIndex.sharedIndexExpire" config option lets people tweak this grace period. This idea was suggested by Duy in: https://public-inbox.org/git/cacsjy8bqmfashf5kjguh+bd7xg98cafnyde964vjypxz-em...@mail.gmail.com/ and after some experiments, I agree that it is much simpler than what I thought could be done during our discussion. Highlevel view of the patches in the series ~~~ Except for patch 1/19, there are 3 big steps, one for each new configuration variable introduced. The main difference between this patch series and the RFC patch series sent last July is that the Step 2 and 3 are new and have been implemented as suggested by Duy. Thanks Duy! - Patch 1/19 is a typo fix in a comment that can be applied separately. Step 1 is: - Patches 2/19 to 5/19 introduce the functions that are reading the "core.splitIndex" configuration variable and tweaking the split index depending on its value. - Patch 6/19 adds a few tests for the new feature. - Patches 7/19 and 8/19 add some documentation for the new feature. Step 2 is: - Patches 9/19 and 10/19 introduce the functions that are reading the "splitIndex.maxPercentChange" configuration variable and regenerating a new shared index file depending on its value. - Patch 11/19 adds a few tests for the new feature. - Patch 12/19 add some documentation for the new feature. Step 3 is: - Patches 13/19 to 16/19 introduce the functions that are reading the "splitIndex.sharedIndexExpire" configuration variable and expiring old shared index files depending on its value. - Patch 17/19 adds a few tests for the new feature. - Patches 18/19 and 19/19 add some documentation for the new feature. Links ~ This patch series is also available here: https://github.com/chriscool/git/commits/config-split-index The previous RFC version was: https://github.com/chriscool/git/commits/config-split-index7 On the mailing list the related patch series and discussions were: https://public-inbox.org/git/20160711172254.13439-1-chrisc...@tuxfamily.org/ Christian Couder (19): split-index: s/eith/with/ typo fix config: add git_config_get_split_index() split-index: add {add,remove}_split_index() functions read-cache: add and then use tweak_split_index() update-index: warn in case of split-index incoherency t1700: add tests for core.splitIndex Documentation/config: add information for core.splitIndex Documentation/git-update-index: talk about core.splitIndex config var config: add git_config_get_max_percent_split_change() read-cache: regenerate shared index if necessary t1700: add tests for splitIndex.maxPercentChange Documentation/config: add splitIndex.maxPercentChange sha1_file: make check_and_freshen_file() non static
[PATCH v1 02/19] config: add git_config_get_split_index()
This new function will be used in a following commit to know if we want to use the split index feature or not. Signed-off-by: Christian Couder--- cache.h | 1 + config.c | 10 ++ 2 files changed, 11 insertions(+) diff --git a/cache.h b/cache.h index 05ecb88..394da60 100644 --- a/cache.h +++ b/cache.h @@ -1809,6 +1809,7 @@ extern int git_config_get_bool_or_int(const char *key, int *is_bool, int *dest); extern int git_config_get_maybe_bool(const char *key, int *dest); extern int git_config_get_pathname(const char *key, const char **dest); extern int git_config_get_untracked_cache(void); +extern int git_config_get_split_index(void); /* * This is a hack for test programs like test-dump-untracked-cache to diff --git a/config.c b/config.c index 83fdecb..036e29b 100644 --- a/config.c +++ b/config.c @@ -1709,6 +1709,16 @@ int git_config_get_untracked_cache(void) return -1; /* default value */ } +int git_config_get_split_index(void) +{ + int val = -1; + + if (!git_config_get_maybe_bool("core.splitindex", )) + return val; + + return -1; /* default value */ +} + NORETURN void git_die_config_linenr(const char *key, const char *filename, int linenr) { -- 2.10.1.462.g7e1e03a
[PATCH v1 19/19] Documentation/git-update-index: explain splitIndex.*
Signed-off-by: Christian Couder--- Documentation/git-update-index.txt | 33 + 1 file changed, 25 insertions(+), 8 deletions(-) diff --git a/Documentation/git-update-index.txt b/Documentation/git-update-index.txt index e091b2a..635d157 100644 --- a/Documentation/git-update-index.txt +++ b/Documentation/git-update-index.txt @@ -163,14 +163,10 @@ may not support it yet. --split-index:: --no-split-index:: - Enable or disable split index mode. If enabled, the index is - split into two files, $GIT_DIR/index and $GIT_DIR/sharedindex.. - Changes are accumulated in $GIT_DIR/index while the shared - index file contains all index entries stays unchanged. If - split-index mode is already enabled and `--split-index` is - given again, all changes in $GIT_DIR/index are pushed back to - the shared index file. This mode is designed for very large - indexes that take a significant amount of time to read or write. + Enable or disable split index mode. If split-index mode is + already enabled and `--split-index` is given again, all + changes in $GIT_DIR/index are pushed back to the shared index + file. + These options take effect whatever the value of the `core.splitIndex` configuration variable (see linkgit:git-config[1]). But a warning is @@ -394,6 +390,27 @@ Although this bit looks similar to assume-unchanged bit, its goal is different from assume-unchanged bit's. Skip-worktree also takes precedence over assume-unchanged bit when both are set. +Split index +--- + +This mode is designed for very large indexes that take a significant +amount of time to read or write. + +In this mode, the index is split into two files, $GIT_DIR/index and +$GIT_DIR/sharedindex.. Changes are accumulated in +$GIT_DIR/index, the split index, while the shared index file contains +all index entries and stays unchanged. + +All changes in the split index are pushed back to the shared index +file when the number of entries in the split index reaches a level +specified by the splitIndex.maxPercentChange config variable (see +linkgit:git-config[1]). + +Each time a new shared index file is created, the old shared index +files are deleted if they are older than what is specified by the +splitIndex.sharedIndexExpire config variable (see +linkgit:git-config[1]). + Untracked cache --- -- 2.10.1.462.g7e1e03a
[PATCH v1 09/19] config: add git_config_get_max_percent_split_change()
This new function will be used in a following commit to get the value of the "splitIndex.maxPercentChange" config variable. Signed-off-by: Christian Couder--- cache.h | 1 + config.c | 16 2 files changed, 17 insertions(+) diff --git a/cache.h b/cache.h index 394da60..faceceb 100644 --- a/cache.h +++ b/cache.h @@ -1810,6 +1810,7 @@ extern int git_config_get_maybe_bool(const char *key, int *dest); extern int git_config_get_pathname(const char *key, const char **dest); extern int git_config_get_untracked_cache(void); extern int git_config_get_split_index(void); +extern int git_config_get_max_percent_split_change(void); /* * This is a hack for test programs like test-dump-untracked-cache to diff --git a/config.c b/config.c index 036e29b..5580f56 100644 --- a/config.c +++ b/config.c @@ -1719,6 +1719,22 @@ int git_config_get_split_index(void) return -1; /* default value */ } +int git_config_get_max_percent_split_change(void) +{ + int val = -1; + + if (!git_config_get_int("splitindex.maxpercentchange", )) { + if (0 <= val && val <= 100) + return val; + + error("splitindex.maxpercentchange value '%d' " + "should be between 0 and 100", val); + return -1; + } + + return -1; /* default value */ +} + NORETURN void git_die_config_linenr(const char *key, const char *filename, int linenr) { -- 2.10.1.462.g7e1e03a
Re: [PATCH] hex: use unsigned index for ring buffer
On Sun, Oct 23, 2016 at 11:00:48AM +0200, René Scharfe wrote: > Overflow is defined for unsigned integers, but not for signed ones. > Make the ring buffer index in sha1_to_hex() unsigned to be on the > safe side. > > Signed-off-by: Rene Scharfe> --- > Hard to trigger, but probably even harder to diagnose once someone > somehow manages to do it on some uncommon architecture. Indeed. If we are worried about overflow, we would also want to assume that it wraps at a multiple of 4, but that is probably a sane assumption. > diff --git a/hex.c b/hex.c > index ab2610e..8c6c189 100644 > --- a/hex.c > +++ b/hex.c > @@ -76,7 +76,7 @@ char *oid_to_hex_r(char *buffer, const struct object_id > *oid) > > char *sha1_to_hex(const unsigned char *sha1) > { > - static int bufno; > + static unsigned int bufno; > static char hexbuffer[4][GIT_SHA1_HEXSZ + 1]; > return sha1_to_hex_r(hexbuffer[3 & ++bufno], sha1); > } I wonder if just truncating bufno would be conceptually simpler (albeit longer): bufno++; bufno &= 3; return sha1_to_hex_r(hexbuffer[bufno], sha1); You could also write the second line like: bufno %= ARRAY_SIZE(hexbuffer); which is less magical (right now the set of buffers must be a power of 2). I expect the compiler could turn that into a bitmask itself. I'm fine with any of the options. I guess you'd want a similar patch for find_unique_abbrev on top of jk/no-looking-at-dotgit-outside-repo. -Peff
Re: [PATCH] Allow stashes to be referenced by index only
On Thu, Sep 08, 2016 at 07:46:37PM -0400, Aaron M Watson wrote: > Instead of referencing "stash@{n}" explicitly, it can simply be > referenced as "n". Most users only reference stashes by their position > in the stash stask (what I refer to as the "index"). The syntax for the > typical stash (stash@{n}) is slightly annoying and easy to forget, and > sometimes difficult to escape properly in a script. Because of this the > capability to do things with the stash by simply referencing the index > is desirable. > > This patch includes the superior implementation provided by Øsse Walle > (thanks for that), with a slight change to fix a broken test in the test > suite. I also merged the test scripts as suggested by Jeff King, and > un-wrapped the documentation as suggested by Junio Hamano. Just cleaning out my list of leftover bits, and it looks like this patch fell through the cracks. The last thing before this was generally people agreeing with the approach that Øsse showed: http://public-inbox.org/git/CAFaJEqu-JUcwLjrQBk_huSa3DZfCf8O4eAZ=UgcXHzN=clg...@mail.gmail.com/ http://public-inbox.org/git/xmqqbmzzoazy@gitster.mtv.corp.google.com/ and we just needed to roll that up into a patch. Which this _mostly_ is, but there are a few funny things still: > diff --git a/git-stash.sh b/git-stash.sh > index 826af18..d8d3b8d 100755 > --- a/git-stash.sh > +++ b/git-stash.sh > @@ -384,9 +384,10 @@ parse_flags_and_rev() > i_tree= > u_tree= > > - REV=$(git rev-parse --no-flags --symbolic --sq "$@") || exit 1 > + REV=$(git rev-parse --no-flags --symbolic --sq "$@" 2> /dev/null) Style: we don't put a space between ">" and the filename. So here we assign REV, and we no longer exit (to handle something like "1" by itself, which is good). > FLAGS= > + ARGV= > for opt > do > case "$opt" in > @@ -404,10 +405,13 @@ parse_flags_and_rev() > die "$(eval_gettext "unknown option: > \$opt")" > FLAGS="${FLAGS}${FLAGS:+ }$opt" > ;; > + *) > + ARGV="${ARGV}${ARGV:+ }'$opt'" > + ;; > esac > done > > - eval set -- $REV > + eval set -- $ARGV But what's going on here? Why did we bother running rev-parse earlier if we don't actually use the value of REV? You mentioned tweaking it to fix a broken test, and indeed, just using $REV here breaks a few tests in t3903. Offhand, I do not see anything wrong with pulling the non-option values out in the loop. But in that case I think the assignment of REV can just go away completely. > diff --git a/t/t3903-stash.sh b/t/t3903-stash.sh > index 2142c1f..f82a8c4 100755 > --- a/t/t3903-stash.sh > +++ b/t/t3903-stash.sh > @@ -131,6 +131,26 @@ test_expect_success 'drop middle stash' ' > test 1 = $(git show HEAD:file) > ' > > +test_expect_success 'drop middle stash by index' ' > + git reset --hard && > + echo 8 > file && > + git stash && > + echo 9 > file && Ditto on the ">" style here and elsewhere. The tests otherwise look good to me. -Peff
[PATCH] hex: use unsigned index for ring buffer
Overflow is defined for unsigned integers, but not for signed ones. Make the ring buffer index in sha1_to_hex() unsigned to be on the safe side. Signed-off-by: Rene Scharfe--- Hard to trigger, but probably even harder to diagnose once someone somehow manages to do it on some uncommon architecture. hex.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/hex.c b/hex.c index ab2610e..8c6c189 100644 --- a/hex.c +++ b/hex.c @@ -76,7 +76,7 @@ char *oid_to_hex_r(char *buffer, const struct object_id *oid) char *sha1_to_hex(const unsigned char *sha1) { - static int bufno; + static unsigned int bufno; static char hexbuffer[4][GIT_SHA1_HEXSZ + 1]; return sha1_to_hex_r(hexbuffer[3 & ++bufno], sha1); } -- 2.10.1
revisiting zstd timings
You may recall some "what if" timings I did recently, where I replaced zlib with zstd: http://public-inbox.org/git/20160914235843.nacr54ekvl6rj...@sigill.intra.peff.net/ The aim was that it would give us about the same compression level, but much faster inflating and deflating. My numbers there showed that yes, inflate is faster (which speeds up normal operations like traversals and diffs), but that compression seemed to be much slower. Since then, zstd has released a new version of their zlib wrapper which performs much better for our case. Basically, the prior version's transparent zstd setup was not optimized well for our case of doing tons of small deflates. I'm cc-ing Przemyslaw Skibinski, who wrote the wrapper code, and who pointed me in the right direction (thanks!). I'll include the new patch at the end, but it's basically the same as the old with one minor symbol change: diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c index 7da2fd44b0..07f45aa171 100644 --- a/builtin/pack-objects.c +++ b/builtin/pack-objects.c @@ -2287,7 +2287,7 @@ static int git_pack_config(const char *k, const char *v, void *cb) return config_error_nonbool(k); #ifdef USE_ZSTD if (skip_prefix(v, "zstd", )) { - useZSTD(1); + ZWRAP_useZSTDcompression(1); /* XXX doesn't seem to be a constant? */ pack_compression_max = 22; } Here are the numbers I get with the newer version (see the prior email for descriptions of each test, but I did run them all from scratch again). zlib - pack 1127899484 - repack real4m55.099s user17m46.796s sys 0m6.164s - rev-list real0m27.402s user0m27.080s sys 0m0.320s - log -Sfoo real5m16.326s user5m13.860s sys 0m2.448s That gives us a baseline for the time of each operation, and the size of the compressed pack. The rest of the values will be expressed as percentages from this baseline. zstd=disabled - pack 1127884778 (+0%) - repack real4m49.443s user17m39.468s sys 0m5.996s (-1.9%) - rev-list real0m27.527s user0m27.316s sys 0m0.208s (+0.4%) - log -Sfoo real5m21.029s user5m17.704s sys 0m3.308s (+1.4%) Timing the wrapper making use of zlib, it produces the same output and takes a slightly larger amount of time (the repack is quicker, but there's quite a bit of run-to-run noise in that measurement). Nothing surprising. zstd=1 - pack 1191357029 (+5.6%) - repack real4m8.323s user16m24.664s sys 0m6.816s (-15.8%) - rev-list real0m25.780s user0m25.488s sys 0m0.288s (-5.9%) - log -Sfoo real4m19.301s user4m16.488s sys 0m2.796s (-18%) Here's where we get interesting; zstd turned down to its lowest setting. As before, the inflate step shows a nice speedup for some common read-only operations. The repack is much faster, and the resulting pack is slightly larger. We'd probably want to tune the cpu/size tradeoff for deflate a bit higher. zstd=3 - pack 1163632679 (+3.1%) - repack real4m11.800s user16m34.260s sys 0m6.788s (-14.6%) - rev-list real0m25.678s user0m25.376s sys 0m0.296s (-6.2%) - log -Sfoo real4m19.902s user4m16.604s sys 0m3.280s (-17.8%) And here that is. Without spending much more CPU on deflate, we shaved off a few percent. The reading side remains fast (I wondered if it might get faster as we shrink the pack just because there are fewer bytes to deal with on the input side, but it doesn't seem to really matter. And anyway, we're only talking a few percent of the input size). zstd=5 - pack 1137697830 (+0.8%) - repack real4m20.930s user16m45.844s sys 0m7.100s (-11.5%) Turning it up higher, we're starting to reach parity with zlib for the pack size. And the deflate process is still faster. I stopped measuring the read side, since it seemed to be remaining constant. zstd=7 - pack 1130545314 (+0.2%) - repack real4m28.960s user16m50.028s sys 0m6.884s (-8.8%) zstd=9 - pack 1128847500 (+0%) - repack real4m52.234s user17m22.644s sys 0m7.080s (-0.9%) zstd=11 - pack 1128415787 (+0%) - repack real5m33.187s user18m12.792s sys 0m7.436s (+12.9%) And these numbers show we can pretty much dial in the cpu/size tradeoff wherever we