Re: [PATCH] hex: use unsigned index for ring buffer

2016-10-23 Thread René Scharfe

Am 23.10.2016 um 11:11 schrieb Jeff King:

On Sun, Oct 23, 2016 at 11:00:48AM +0200, René Scharfe wrote:


Overflow is defined for unsigned integers, but not for signed ones.
Make the ring buffer index in sha1_to_hex() unsigned to be on the
safe side.

Signed-off-by: Rene Scharfe 
---
Hard to trigger, but probably even harder to diagnose once someone
somehow manages to do it on some uncommon architecture.


Indeed. If we are worried about overflow, we would also want to assume
that it wraps at a multiple of 4, but that is probably a sane
assumption.


Hmm, I can't think of a way to violate this assumption except with 
unsigned integers that are only a single bit wide.  That would be a 
weird machine.  Are there other possibilities?



diff --git a/hex.c b/hex.c
index ab2610e..8c6c189 100644
--- a/hex.c
+++ b/hex.c
@@ -76,7 +76,7 @@ char *oid_to_hex_r(char *buffer, const struct object_id *oid)

 char *sha1_to_hex(const unsigned char *sha1)
 {
-   static int bufno;
+   static unsigned int bufno;
static char hexbuffer[4][GIT_SHA1_HEXSZ + 1];
return sha1_to_hex_r(hexbuffer[3 & ++bufno], sha1);
 }


I wonder if just truncating bufno would be conceptually simpler (albeit
longer):

  bufno++;
  bufno &= 3;
  return sha1_to_hex_r(hexbuffer[bufno], sha1);

You could also write the second line like:

  bufno %= ARRAY_SIZE(hexbuffer);

which is less magical (right now the set of buffers must be a power of
2). I expect the compiler could turn that into a bitmask itself.


Expelling magic is a good idea.  And indeed, at least gcc, clang and icc 
on x86-64 are smart enough to use an AND instead of dividing 
(https://godbolt.org/g/rFPpzF).


But gcc also adds a sign extension (cltq/cdqe) if we store the truncated 
value, unlike the other two compilers.  I don't see why -- the bit mask 
operation enforces a value between 0 and 3 (inclusive) and the upper 
bits of eax are zeroed automatically, so the cltq is effectively a noop.


Using size_t gets us rid of the extra instruction and is the right type 
anyway.  It would suffice on its own, hmm..



I'm fine with any of the options. I guess you'd want a similar patch for
find_unique_abbrev on top of jk/no-looking-at-dotgit-outside-repo.


Actually I'd want you to want to amend your series yourself. ;)  Maybe I 
can convince Coccinelle to handle that issue for us.


And there's also path.c::get_pathname().  That's enough cases to justify 
adding a macro, I'd say:


---
 cache.h | 3 +++
 hex.c   | 4 ++--
 path.c  | 4 ++--
 3 files changed, 7 insertions(+), 4 deletions(-)

diff --git a/cache.h b/cache.h
index 05ecb88..8bb4918 100644
--- a/cache.h
+++ b/cache.h
@@ -555,6 +555,9 @@ extern int daemonize(void);
} \
} while (0)

+#define NEXT_RING_ITEM(array, index) \
+   (array)[(index) = ((index) + 1) % ARRAY_SIZE(array)]
+
 /* Initialize and use the cache information */
 struct lock_file;
 extern int read_index(struct index_state *);
diff --git a/hex.c b/hex.c
index ab2610e..5e711b9 100644
--- a/hex.c
+++ b/hex.c
@@ -76,9 +76,9 @@ char *oid_to_hex_r(char *buffer, const struct 
object_id *oid)


 char *sha1_to_hex(const unsigned char *sha1)
 {
-   static int bufno;
+   static size_t bufno;
static char hexbuffer[4][GIT_SHA1_HEXSZ + 1];
-   return sha1_to_hex_r(hexbuffer[3 & ++bufno], sha1);
+   return sha1_to_hex_r(NEXT_RING_ITEM(hexbuffer, bufno), sha1);
 }

 char *oid_to_hex(const struct object_id *oid)
diff --git a/path.c b/path.c
index a8e7295..60dba6a 100644
--- a/path.c
+++ b/path.c
@@ -24,8 +24,8 @@ static struct strbuf *get_pathname(void)
static struct strbuf pathname_array[4] = {
STRBUF_INIT, STRBUF_INIT, STRBUF_INIT, STRBUF_INIT
};
-   static int index;
-   struct strbuf *sb = _array[3 & ++index];
+   static size_t index;
+   struct strbuf *sb = _RING_ITEM(pathname_array, index);
strbuf_reset(sb);
return sb;
 }
--
2.10.1




Re: RFC Failover url for fetches?

2016-10-23 Thread Jakub Narębski
W dniu 21.10.2016 o 21:03, Junio C Hamano pisze:
> Stefan Beller  writes:
> 
>> So when pushing it is possible to have multiple urls
>> (remote..url) configured.
>>
>> When fetching only the first configured url is considered.
>> Would it make sense to allow multiple urls and
>> try them one by one until one works?
> 
> I do not think the two are related.  Pushing to multiple is not "I
> want to update at least one of them" in the first place.

Push is/should be 'update all', fetch is/should be 'fetch any'.
I thought that multiple remote..url values provide this
fallback for fetch, but it looks like it isn't so...

> 
> As to fetching from two or more places as "fallback", I am
> moderately negative to add it as a dumb feature that does nothing
> more than "My fetch from A failed, so let's blindly try it from B".
> I'd prefer to keep the "My fetch from A is failing" knowledge near
> the surface of end user's consciousness as a mechanism to pressure A
> to fix it--that way everybody who is fetching from A benefits.
> After all, doing "git remote add B" once (you'd need to tell the URL
> for B anyway to Git) and issuing "git fetch B" after seeing your
> regular "git fetch" fails once in a blue moon is not all that
> cumbersome, I would think.

One would need to configure fallback B remote to use the same
remote-branch namespace as remote A, if it is to be used as fallback,
I would think.

> 
> Some people _may_ have objection based on A and B going out of sync,
> especially B may fall behind even yourself and cause non-ff errors,
> but I personally am not worried about that, because when somebody
> configures B as a fallback for A, there is an expectation that B is
> kept reasonably up to date.  It would be a problem if some refs are
> expected to be constantly rewound at A (e.g. 'pu' in my tree) and
> configured to always force-fetch, though.  A stale B would silently
> set such a branch in your repository back without failing.

Nb. there is also http-alternates mechanism... which nowadays doesn't
matter anyway, I would think.

-- 
Jakub Narębski



Re: [PATCH v1 10/19] read-cache: regenerate shared index if necessary

2016-10-23 Thread Ramsay Jones


On 23/10/16 10:26, Christian Couder wrote:
> When writing a new split-index and there is a big number of cache
> entries in the split-index compared to the shared index, it is a
> good idea to regenerate the shared index.
> 
> By default when the ratio reaches 20%, we will push back all
> the entries from the split-index into a new shared index file
> instead of just creating a new split-index file.
> 
> The threshold can be configured using the
> "splitIndex.maxPercentChange" config variable.
> 
> We need to adjust the existing tests in t1700 by setting
> "splitIndex.maxPercentChange" to 100 at the beginning of t1700,
> as the existing tests are assuming that the shared index is
> regenerated only when `git update-index --split-index` is used.
> 
> Signed-off-by: Christian Couder 
> ---
>  read-cache.c   | 33 -
>  t/t1700-split-index.sh |  1 +
>  2 files changed, 33 insertions(+), 1 deletion(-)
> 
> diff --git a/read-cache.c b/read-cache.c
> index bb53823..a91fabe 100644
> --- a/read-cache.c
> +++ b/read-cache.c
> @@ -2216,6 +2216,36 @@ static int write_shared_index(struct index_state 
> *istate,
>   return ret;
>  }
>  
> +static const int default_max_percent_split_change = 20;
> +
> +int too_many_not_shared_entries(struct index_state *istate)

This function is a file-loacal symbol; could you please make it
a static function.

Thanks.

ATB,
Ramsay Jones


Re: [PATCH 28/36] attr: keep attr stack for each check

2016-10-23 Thread Ramsay Jones


On 23/10/16 00:32, Stefan Beller wrote:
> Instead of having a global attr stack, attach the stack to each check.
> This allows to use the attr in a multithreaded way.
> 
> 
> 
> Signed-off-by: Stefan Beller 
> ---
>  attr.c| 101 
> +++---
>  attr.h|   4 ++-
>  hashmap.h |   2 ++
>  3 files changed, 69 insertions(+), 38 deletions(-)
> 
> diff --git a/attr.c b/attr.c
> index 89ae155..b65437d 100644
> --- a/attr.c
> +++ b/attr.c
> @@ -372,15 +372,17 @@ static struct match_attr *parse_attr_line(const char 
> *line, const char *src,
>   * .gitignore file and info/excludes file as a fallback.
>   */
>  
> -/* NEEDSWORK: This will become per git_attr_check */
> -static struct attr_stack {
> +struct attr_stack {
>   struct attr_stack *prev;
>   char *origin;
>   size_t originlen;
>   unsigned num_matches;
>   unsigned alloc;
>   struct match_attr **attrs;
> -} *attr_stack;
> +};
> +
> +struct hashmap all_attr_stacks;
> +int all_attr_stacks_init;

Mark symbols 'all_attr_stacks' and 'all_attr_stacks_init' with
the static keyword. (ie. these are file-local symbols).

ATB,
Ramsay Jones



Re: [PATCH 17/36] attr: expose validity check for attribute names

2016-10-23 Thread Ramsay Jones


On 23/10/16 00:32, Stefan Beller wrote:
> From: Junio C Hamano 
> 
> Export attr_name_valid() function, and a helper function that
> returns the message to be given when a given  pair
> is not a good name for an attribute.
> 
> We could later update the message to exactly spell out what the
> rules for a good attribute name are, etc.
> 
> Signed-off-by: Junio C Hamano 
> Signed-off-by: Stefan Beller 
> ---

[snip]

> +extern int attr_name_valid(const char *name, size_t namelen);
> +extern void invalid_attr_name_message(struct strbuf *, const char *, int);
> +

The symbol 'attr_name_valid()' is not used outside of attr.c, even
by the end of this series. Do you expect this function to be used
in any future series? (The export is deliberate and it certainly
seems like it should be part of the public interface, but ...)

In contrast, the 'invalid_attr_name_message()' function is called
from code in pathspec.c, which relies on 'git_attr_counted()' to
call 'attr_name_valid()' internally to check for validity. :-D

ATB,
Ramsay Jones





Re: git clone --bare --origin incompatible?

2016-10-23 Thread Roman Neuhauser
# sch...@linux-m68k.org / 2016-10-23 14:29:55 +0200:
> On Okt 23 2016, Roman Neuhauser  wrote:
> 
> > what is the reason clone --bare prohibits --origin?
> >
> >   % git clone --bare -o fubar anything anywhere
> >   fatal: --bare and --origin fubar options are incompatible.
> 
> Since a bare clone maps remote branches directly to local branches,
> without any remote-tracking branches, --origin doesn't make sense.

is it going to break something though?  i can still go and rename
the remote in the bare repo's config file afterwards.

-- 
roman


Re: git clone --bare --origin incompatible?

2016-10-23 Thread Andreas Schwab
On Okt 23 2016, Roman Neuhauser  wrote:

> what is the reason clone --bare prohibits --origin?
>
>   % git clone --bare -o fubar anything anywhere
>   fatal: --bare and --origin fubar options are incompatible.

Since a bare clone maps remote branches directly to local branches,
without any remote-tracking branches, --origin doesn't make sense.

Andreas.

-- 
Andreas Schwab, sch...@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
"And now for something completely different."


git clone --bare --origin incompatible?

2016-10-23 Thread Roman Neuhauser
hello,

what is the reason clone --bare prohibits --origin?

  % git clone --bare -o fubar anything anywhere
  fatal: --bare and --origin fubar options are incompatible.


-- 
roman


Re: tools for easily "uncommitting" parts of a patch I just commited?

2016-10-23 Thread Duy Nguyen
On Sun, Oct 23, 2016 at 8:38 AM, Jeff King  wrote:
> On Sun, Oct 23, 2016 at 08:23:01AM +0700, Duy Nguyen wrote:
>
>> I hit the same problem sometimes, but in my case sometimes I
>> accidentally do "git add" after "git add -p" and a configuration in
>> "git commit -a" won't help me. I'd prefer we could undo changes in
>> index instead. Something like reflog but for index.
>
> An index write always writes the whole file from scratch, so you really
> just need to save a copy of the old file. Perhaps something like:
>
>   rm -f $GIT_DIR/index.old
>   ln $GIT_DIR/index.old $GIT_DIR/index
>   ... and then open $GIT_DIR/index.tmp ...
>   ... and then rename(index.tmp, index) ...
>
> could do it cheaply. It's a little more complicated if you want to save
> a sequence of versions, and eventually would take a lot of space, but
> presumably a handful of saved indexes would be sufficient.

Yeah. I had something [1] like that but never sorted out the UI for it :(

> Another option would be an index format that journals, and you could
> potentially walk back the journal to a point. That seems like a much
> bigger change (and has weird layering, because deciding when to fold in
> the journal is usually a performance thing, but obviously this would
> have user-visible impact about how far back you could undo).

v2 [2] goes in this direction (but not a full blown COW, the journal
does not take part in any core operations of the index)

[1] 
https://public-inbox.org/git/%3c1375597720-13236-1-git-send-email-pclo...@gmail.com%3E/
[2] 
https://public-inbox.org/git/1375966270-10968-1-git-send-email-pclo...@gmail.com/
-- 
Duy


Re: [PATCH 2/3] submodule tests: replace cloning from . by "$(pwd)"

2016-10-23 Thread Johannes Sixt

Am 22.10.2016 um 22:46 schrieb Stefan Beller:

I have looked into it again, and by now I think the bug is a feature,
actually.

Consider this:

git clone . super
git -C super submodule add ../submodule
# we thought the previous line is buggy
git clone super super-clone


At this point, we *should* have this if there were no bugs (at least 
that is my assumption):


  /tmp
  !
  + submodule <- submodule's remote repo
  !
  + foo   <- we are here (.), super's remote repo
!
+ super   <- remote.origin.url=/tmp/foo/.
  !
  + submodule <- remote.origin.url=/tmp/foo/./../submodule
 submodule.submodule.url=../submodule

When I test this, 'git submodule add' fails:

foo@master> git -C super submodule add ../submodule
fatal: repository '/tmp/foo/submodule' does not exist
fatal: clone of '/tmp/foo/submodule' into submodule path 
'/tmp/foo/super/submodule' failed



Now in the super-clone the ../submodule is the correct
relative url, because the url where we cloned from doesn't
end in /.


I do not understand why this would be relevant. The question is not how 
the submodule's remote URL ends, but how the submodule's remote URL is 
constructed from the super-project's URL and the relative path specified 
for 'git submodule add'.


Whether ../submodule or ./submodule is the correct relative URL depends 
on where the origin of the submodule is located relative to the origin 
of the super-project. In the above example, it is ../submodule. However, 
the error message tells us that git looked in /tmp/foo/submodule, which 
looks like the /. bug!


I do not understand where you see a feature here. What am I missing?

-- Hannes



Re: [PATCH v5 00/27] Prepare the sequencer for the upcoming rebase -i patches

2016-10-23 Thread Johannes Schindelin
Hi Junio,

On Sun, 23 Oct 2016, Johannes Schindelin wrote:

> On Sat, 22 Oct 2016, Junio C Hamano wrote:
> 
> > Johannes Schindelin  writes:
> > 
> > > This patch series marks the '4' in the countdown to speed up rebase -i
> > > by implementing large parts in C (read: there will be three more patch
> > > series after that before the full benefit hits git.git: sequencer-i,
> > > rebase--helper and rebase-i-extra).
> > > ...
> > > It would be *really* nice if we could get this patch series at least
> > > into `next` soon, as it gets late and later for the rest of the
> > > patches to make it into `master` in time for v2.11 (and it is not for
> > > lack of trying on my end...).
> > 
> > This "countdown 4" step can affect cherry-pick and revert, even

Oh, I forgot to comment on this tidbit of your mail, sorry.

This *is* the countdown 4, as the remaining 3 patch series depend on each
other in the order I sent them out.

Ciao,
Dscho


Re: [PATCH v5 00/27] Prepare the sequencer for the upcoming rebase -i patches

2016-10-23 Thread Johannes Schindelin
Hi Junio,

On Sat, 22 Oct 2016, Junio C Hamano wrote:

> Johannes Schindelin  writes:
> 
> > This patch series marks the '4' in the countdown to speed up rebase -i
> > by implementing large parts in C (read: there will be three more patch
> > series after that before the full benefit hits git.git: sequencer-i,
> > rebase--helper and rebase-i-extra).
> > ...
> > It would be *really* nice if we could get this patch series at least
> > into `next` soon, as it gets late and later for the rest of the
> > patches to make it into `master` in time for v2.11 (and it is not for
> > lack of trying on my end...).
> 
> This "countdown 4" step can affect cherry-pick and revert, even
> though we were careful to review changes to the sequencer.c code.

As I pointed out in another mail in this thread: we should not fall into
the trap of overrating review.

In the case of the rebase--helper patches, so far the review mainly
resulted in more work for me (having to change spellings elsewhere, for
example), not in improving the changes I intended to introduce into
git.git's code.

Sure, there has been the occasional improvement, but it certainly feels as
if I spent about 80% of the work after each -v1 iteration on things that
have positively nothing at all to do with accelerating rebase -i.

> I prefer to cook it in 'next' sufficiently long to ensure that we hear
> feedbacks from non-Windows users if there is any unexpected breakage.

FWIW I am using the same patches not only on Windows but also in my Linux
VM.

> There isn't enough time to include this topic in the upcoming
> release within the current https://tinyurl.com/gitCal calendar,
> however, which places the final on Nov 11th.

More is the pity.

Thank you, though, for being upfront with me. I will shift my focus to
tasks that require my attention more urgently, then.

Ciao,
Dscho


Re: [PATCH v5 00/27] Prepare the sequencer for the upcoming rebase -i patches

2016-10-23 Thread Johannes Schindelin
Hi Junio,

On Fri, 21 Oct 2016, Junio C Hamano wrote:

> Johannes Schindelin  writes:
> 
> I still do not understand (note that I am not saying "I do not
> accept"--acceptance or rejection happens after an understandable
> explanation is given, and "do not understand" means no such
> explanation has been given yet) your justification behind adding a
> technical debt to reimplement the author-script parser and not
> sharing it with "git am" in 13/27.

At this point, I am most of all reluctant to introduce such a huge change,
which surely would introduce a regression.

This is what happened a couple of times to me, most recently with the
hide-dot-gitdir patch series that worked flawlessly for years, had to be
dramatically changed during review to enter git.git, and introduced the
major regression that `core.hideDotFiles = gitDirOnly` was broken.

The lesson I learned: review should not be valued more than the test of
time. This lesson has been reinforced by all the regressions that have not
been caught by review nor the test suite running on Linux only.

It would be a different matter if I still had the cross-validator in place
(which I did when I sent out v1 of this patch series) and tons of time to
spend on accommodating your wishes, however I may disagree with them. And
in this instance, I thought I made clear that I disagree, and why:
Internally, git-am and git-rebase-i handle the author-script very
differently. That may change at some stage in the future, and it would be
a good time then and there to take care of unifying this code. Currently,
not so much, as the only excuse to use the same parser would be that they
both read the same file, while they have to do very different things with
the parsed output (in fact, your suggestion would ask the parser in the
sequencer to rip apart the information into key/value pairs, only to
re-glue them back together when they are used as the environment variables
as which rebase-i treats the contents of the author-script file).

So no, at this point I am not willing to risk introducing breakages in
code that has been proven to work in practice.

Ciao,
Dscho


Re: [PATCH v5 22/27] sequencer: teach write_message() to append an optional LF

2016-10-23 Thread Johannes Schindelin
Hi Junio,

On Fri, 21 Oct 2016, Junio C Hamano wrote:

> Johannes Schindelin  writes:
> 
> > This commit prepares for future callers that will have a pointer/length
> > to some text to be written that lacks an LF, yet an LF is desired.
> > Instead of requiring the caller to append an LF to the buffer (and
> > potentially allocate memory to do so), the write_message() function
> > learns to append an LF at the end of the file.
> 
> As no existing callers need this, it probably is better left out and
> added to the series that actually needs the new feature as a
> preparatory step.

Apart from this patch series being semantically the right place
("prepare-sequencer"), there is also the following consideration:
The next patch series is already quite long. Taking this current patch
series into account, which started out as a 22-patch series and needed to
bloat by 25% through four subsequent iterations, it is probably not a wise
idea to move this patch to a patch series that already weighs 34 patches.

So I respectfully, and forcefully, disagree,
Dscho


Re: Stash pop/apply conflict and --theirs and --ours

2016-10-23 Thread Jeff King
On Sun, Oct 23, 2016 at 12:58:12AM +0200, Sven Strickroth wrote:

> I regularly experience that beginners have problems unterstanding that
> --ours and --theirs are swapped when a conflict occurrs on git stash
> apply or stash pop.
> 
> From the HCI perspective this is really counter intuitive.

I know that people have complained about "rebase" swapping the two, but
I don't think anybody has ever mentioned it for stash. I'm not sure I
that they are swapped, though.

The "ours" content is generally what was in the HEAD before the
operation started, and "theirs" is what the operation is bringing into
that history. That is true of "merge" and "cherry-pick". And AFAICT, it
is true of "stash", too (I basically think of "stash apply" as a
cherry-pick).

So with a setup like:

  git init
  echo base >file
  git add file
  git commit -m file

  echo stash >file
  git stash

  echo master >file
  git commit -am master

  git checkout -b branch HEAD^
  echo branch >file
  git commit -am branch

if we merge, then --theirs is the branch we are merging:

  git checkout master
  git merge branch
  git checkout --theirs file
  cat file
  # "branch"

Likewise, if we cherry-pick:

  git reset --hard
  git cherry-pick branch
  git checkout --theirs file
  cat file
  # "branch"

And likewise if we apply the stash:

  git reset --hard
  git stash apply
  git checkout --theirs file
  cat file
  # "stash"

So that seems consistent to me.

I guess if you are stashing in order to pull somebody else's work, like:

  git stash
  git pull
  git stash pop

then conceptually the stash is "ours" and HEAD is "theirs". This is
exactly like the rebase case. E.g., if you instead did:

  git commit -m 'tmp stash'
  git pull --rebase

So I sympathize, but I don't think that having "stash" flip the order
would be the right thing to do in all cases. In theory there could be
some kind of option (and things like pull autostash could use it), but I
suspect it may be hard to implement in practice. The unpack-trees code
does not treat "ours" and "theirs" entirely symmetrically (the "ours"
side represents the current working tree, so we might do things like
check whether the index is fresh). I guess you could flip the "1" and
"2" bits in the index after the conflicted merge completes.

I'm still not convinced it's a good idea, though.

-Peff


Re: [PATCH v4 20/25] sequencer: refactor write_message()

2016-10-23 Thread Johannes Schindelin
Hi Junio,

On Fri, 21 Oct 2016, Junio C Hamano wrote:

> Johannes Schindelin  writes:
> 
> >> Ah, make that four steps.  The final one is:
> >> 
> >> - add append_eol parameter that nobody uses at this step in the
> >>   series.
> >>
> >> This is a new feature to the helper.  While it is OK to have it as a
> >> preparatory step in this series, it is easier to understand if it
> >> kept as a separate step.  It is even more preferrable if it is made
> >> as a preparatory step in a series that adds a caller that passes
> >> true to append_eol to this helper...
> >
> > Done,
> > Dscho
> 
> Hmm, what has been done exactly?  I still see append_eol in v5 where
> nobody uses it yet.  Confused...

Your bullet point suggests that this change should be a separate patch. I
did that.

And since this patch series' purpose is to prepare the sequencer for the
upcoming series that teaches it to understand git-rebase-todo scripts,
this patch falls fair and square into the preparatory phase.

Ciao,
Dscho


[PATCH v1 14/19] read-cache: touch shared index files when used

2016-10-23 Thread Christian Couder
When a split-index file is created, let's update the mtime of the
shared index file that the split-index file is referencing.

In a following commit we will make shared index file expire
depending on their mtime, so updating the mtime makes sure that
the shared index file will not be deleted soon.

Signed-off-by: Christian Couder 
---
 read-cache.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/read-cache.c b/read-cache.c
index a91fabe..3aeff77 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -2268,6 +2268,12 @@ int write_locked_index(struct index_state *istate, 
struct lock_file *lock,
int ret = write_shared_index(istate, lock, flags);
if (ret)
return ret;
+   } else {
+   /* Signal that the shared index is used */
+   const char *shared_index = git_path("sharedindex.%s",
+   sha1_to_hex(si->base_sha1));
+   if (!check_and_freshen_file(shared_index, 1))
+   warning("could not freshen '%s'", shared_index);
}
 
return write_split_index(istate, lock, flags);
-- 
2.10.1.462.g7e1e03a



[PATCH v1 08/19] Documentation/git-update-index: talk about core.splitIndex config var

2016-10-23 Thread Christian Couder
Signed-off-by: Christian Couder 
---
 Documentation/git-update-index.txt | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/Documentation/git-update-index.txt 
b/Documentation/git-update-index.txt
index 7386c93..e091b2a 100644
--- a/Documentation/git-update-index.txt
+++ b/Documentation/git-update-index.txt
@@ -171,6 +171,12 @@ may not support it yet.
given again, all changes in $GIT_DIR/index are pushed back to
the shared index file. This mode is designed for very large
indexes that take a significant amount of time to read or write.
++
+These options take effect whatever the value of the `core.splitIndex`
+configuration variable (see linkgit:git-config[1]). But a warning is
+emitted when the change goes against the configured value, as the
+configured value will take effect next time the index is read and this
+will remove the intended effect of the option.
 
 --untracked-cache::
 --no-untracked-cache::
-- 
2.10.1.462.g7e1e03a



[PATCH v1 06/19] t1700: add tests for core.splitIndex

2016-10-23 Thread Christian Couder
Signed-off-by: Christian Couder 
---
 t/t1700-split-index.sh | 37 +
 1 file changed, 37 insertions(+)

diff --git a/t/t1700-split-index.sh b/t/t1700-split-index.sh
index 292a072..db8c39f 100755
--- a/t/t1700-split-index.sh
+++ b/t/t1700-split-index.sh
@@ -200,4 +200,41 @@ EOF
test_cmp expect actual
 '
 
+test_expect_success 'set core.splitIndex config variable to true' '
+   git config core.splitIndex true &&
+   : >three &&
+   git update-index --add three &&
+   git ls-files --stage >ls-files.actual &&
+   cat >ls-files.expect expect ls-files.expect expect <

[PATCH v1 16/19] read-cache: unlink old sharedindex files

2016-10-23 Thread Christian Couder
Everytime split index is turned on, it creates a "sharedindex."
file in the git directory. This change makes sure that shared index
files that haven't been used for a long time are removed when a new
shared index file is created.

The new "splitIndex.sharedIndexExpire" config variable is created
to tell the delay after which an unused shared index file can be
deleted. It defaults to "1.week.ago".

A previous commit made sure that each time a split index file is
created the mtime of the shared index file it references is updated.
This makes sure that recently used shared index file will not be
deleted.

Signed-off-by: Christian Couder 
---
 read-cache.c | 63 +++-
 1 file changed, 62 insertions(+), 1 deletion(-)

diff --git a/read-cache.c b/read-cache.c
index 3aeff77..65ceb29 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -2190,6 +2190,64 @@ static int write_split_index(struct index_state *istate,
return ret;
 }
 
+static const char *shared_index_expire = "1.week.ago";
+
+static unsigned long get_shared_index_expire_date(void)
+{
+   static unsigned long shared_index_expire_date;
+   static int shared_index_expire_date_prepared;
+
+   if (!shared_index_expire_date_prepared) {
+   git_config_get_date_string("splitindex.sharedindexexpire",
+  _index_expire);
+   shared_index_expire_date = approxidate(shared_index_expire);
+   shared_index_expire_date_prepared = 1;
+   }
+
+   return shared_index_expire_date;
+}
+
+static int can_delete_shared_index(const char *shared_sha1_hex)
+{
+   struct stat st;
+   unsigned long expiration;
+   const char *shared_index = git_path("sharedindex.%s", shared_sha1_hex);
+
+   /* Check timestamp */
+   expiration = get_shared_index_expire_date();
+   if (!expiration)
+   return 0;
+   if (stat(shared_index, ))
+   return error_errno("could not stat '%s", shared_index);
+   if (st.st_mtime > expiration)
+   return 0;
+
+   return 1;
+}
+
+static void clean_shared_index_files(const char *current_hex)
+{
+   struct dirent *de;
+   DIR *dir = opendir(get_git_dir());
+
+   if (!dir) {
+   error_errno("unable to open git dir: %s", get_git_dir());
+   return;
+   }
+
+   while ((de = readdir(dir)) != NULL) {
+   const char *sha1_hex;
+   if (!skip_prefix(de->d_name, "sharedindex.", _hex))
+   continue;
+   if (!strcmp(sha1_hex, current_hex))
+   continue;
+   if (can_delete_shared_index(sha1_hex) > 0 &&
+   unlink(git_path("%s", de->d_name)))
+   error_errno("unable to unlink: %s", git_path("%s", 
de->d_name));
+   }
+   closedir(dir);
+}
+
 static struct tempfile temporary_sharedindex;
 
 static int write_shared_index(struct index_state *istate,
@@ -2211,8 +2269,11 @@ static int write_shared_index(struct index_state *istate,
}
ret = rename_tempfile(_sharedindex,
  git_path("sharedindex.%s", 
sha1_to_hex(si->base->sha1)));
-   if (!ret)
+   if (!ret) {
hashcpy(si->base_sha1, si->base->sha1);
+   clean_shared_index_files(sha1_to_hex(si->base->sha1));
+   }
+
return ret;
 }
 
-- 
2.10.1.462.g7e1e03a



[PATCH v1 15/19] config: add git_config_get_date_string() from gc.c

2016-10-23 Thread Christian Couder
This function will be used in a following commit to get the expiration
time of the shared index files from the config, and it is generic
enough to be put in "config.c".

Signed-off-by: Christian Couder 
---
 builtin/gc.c | 15 ++-
 cache.h  |  1 +
 config.c | 13 +
 3 files changed, 16 insertions(+), 13 deletions(-)

diff --git a/builtin/gc.c b/builtin/gc.c
index 069950d..c1e9602 100644
--- a/builtin/gc.c
+++ b/builtin/gc.c
@@ -62,17 +62,6 @@ static void report_pack_garbage(unsigned seen_bits, const 
char *path)
string_list_append(_garbage, path);
 }
 
-static void git_config_date_string(const char *key, const char **output)
-{
-   if (git_config_get_string_const(key, output))
-   return;
-   if (strcmp(*output, "now")) {
-   unsigned long now = approxidate("now");
-   if (approxidate(*output) >= now)
-   git_die_config(key, _("Invalid %s: '%s'"), key, 
*output);
-   }
-}
-
 static void process_log_file(void)
 {
struct stat st;
@@ -111,8 +100,8 @@ static void gc_config(void)
git_config_get_int("gc.auto", _auto_threshold);
git_config_get_int("gc.autopacklimit", _auto_pack_limit);
git_config_get_bool("gc.autodetach", _auto);
-   git_config_date_string("gc.pruneexpire", _expire);
-   git_config_date_string("gc.worktreepruneexpire", 
_worktrees_expire);
+   git_config_get_date_string("gc.pruneexpire", _expire);
+   git_config_get_date_string("gc.worktreepruneexpire", 
_worktrees_expire);
git_config(git_default_config, NULL);
 }
 
diff --git a/cache.h b/cache.h
index a625b47..bcfc0f1 100644
--- a/cache.h
+++ b/cache.h
@@ -1811,6 +1811,7 @@ extern int git_config_get_bool(const char *key, int 
*dest);
 extern int git_config_get_bool_or_int(const char *key, int *is_bool, int 
*dest);
 extern int git_config_get_maybe_bool(const char *key, int *dest);
 extern int git_config_get_pathname(const char *key, const char **dest);
+extern int git_config_get_date_string(const char *key, const char **output);
 extern int git_config_get_untracked_cache(void);
 extern int git_config_get_split_index(void);
 extern int git_config_get_max_percent_split_change(void);
diff --git a/config.c b/config.c
index 5580f56..f88c61b 100644
--- a/config.c
+++ b/config.c
@@ -1685,6 +1685,19 @@ int git_config_get_pathname(const char *key, const char 
**dest)
return ret;
 }
 
+int git_config_get_date_string(const char *key, const char **output)
+{
+   int ret = git_config_get_string_const(key, output);
+   if (ret)
+   return ret;
+   if (strcmp(*output, "now")) {
+   unsigned long now = approxidate("now");
+   if (approxidate(*output) >= now)
+   git_die_config(key, _("Invalid %s: '%s'"), key, 
*output);
+   }
+   return ret;
+}
+
 int git_config_get_untracked_cache(void)
 {
int val = -1;
-- 
2.10.1.462.g7e1e03a



[PATCH v1 03/19] split-index: add {add,remove}_split_index() functions

2016-10-23 Thread Christian Couder
Also use the functions in cmd_update_index() in
builtin/update-index.c.

These functions will be used in a following commit to tweak
our use of the split-index feature depending on the setting
of a configuration variable.

Signed-off-by: Christian Couder 
---
 builtin/update-index.c | 18 ++
 split-index.c  | 22 ++
 split-index.h  |  2 ++
 3 files changed, 30 insertions(+), 12 deletions(-)

diff --git a/builtin/update-index.c b/builtin/update-index.c
index f3f07e7..b75ea03 100644
--- a/builtin/update-index.c
+++ b/builtin/update-index.c
@@ -1098,18 +1098,12 @@ int cmd_update_index(int argc, const char **argv, const 
char *prefix)
}
 
if (split_index > 0) {
-   init_split_index(_index);
-   the_index.cache_changed |= SPLIT_INDEX_ORDERED;
-   } else if (!split_index && the_index.split_index) {
-   /*
-* can't discard_split_index(_index); because that
-* will destroy split_index->base->cache[], which may
-* be shared with the_index.cache[]. So yeah we're
-* leaking a bit here.
-*/
-   the_index.split_index = NULL;
-   the_index.cache_changed |= SOMETHING_CHANGED;
-   }
+   if (the_index.split_index)
+   the_index.cache_changed |= SPLIT_INDEX_ORDERED;
+   else
+   add_split_index(_index);
+   } else if (!split_index)
+   remove_split_index(_index);
 
switch (untracked_cache) {
case UC_UNSPECIFIED:
diff --git a/split-index.c b/split-index.c
index 615f4ca..f519e60 100644
--- a/split-index.c
+++ b/split-index.c
@@ -317,3 +317,25 @@ void replace_index_entry_in_base(struct index_state 
*istate,
istate->split_index->base->cache[new->index - 1] = new;
}
 }
+
+void add_split_index(struct index_state *istate)
+{
+   if (!istate->split_index) {
+   init_split_index(istate);
+   istate->cache_changed |= SPLIT_INDEX_ORDERED;
+   }
+}
+
+void remove_split_index(struct index_state *istate)
+{
+   if (istate->split_index) {
+   /*
+* can't discard_split_index(_index); because that
+* will destroy split_index->base->cache[], which may
+* be shared with the_index.cache[]. So yeah we're
+* leaking a bit here.
+*/
+   istate->split_index = NULL;
+   istate->cache_changed |= SOMETHING_CHANGED;
+   }
+}
diff --git a/split-index.h b/split-index.h
index c1324f5..df91c1b 100644
--- a/split-index.h
+++ b/split-index.h
@@ -31,5 +31,7 @@ void merge_base_index(struct index_state *istate);
 void prepare_to_write_split_index(struct index_state *istate);
 void finish_writing_split_index(struct index_state *istate);
 void discard_split_index(struct index_state *istate);
+void add_split_index(struct index_state *istate);
+void remove_split_index(struct index_state *istate);
 
 #endif
-- 
2.10.1.462.g7e1e03a



[PATCH v1 07/19] Documentation/config: add information for core.splitIndex

2016-10-23 Thread Christian Couder
Signed-off-by: Christian Couder 
---
 Documentation/config.txt | 4 
 1 file changed, 4 insertions(+)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index 27069ac..96521a4 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -331,6 +331,10 @@ core.trustctime::
crawlers and some backup systems).
See linkgit:git-update-index[1]. True by default.
 
+core.splitIndex::
+   If true, the split-index feature of the index will be used.
+   See linkgit:git-update-index[1]. False by default.
+
 core.untrackedCache::
Determines what to do about the untracked cache feature of the
index. It will be kept, if this variable is unset or set to
-- 
2.10.1.462.g7e1e03a



[PATCH v1 10/19] read-cache: regenerate shared index if necessary

2016-10-23 Thread Christian Couder
When writing a new split-index and there is a big number of cache
entries in the split-index compared to the shared index, it is a
good idea to regenerate the shared index.

By default when the ratio reaches 20%, we will push back all
the entries from the split-index into a new shared index file
instead of just creating a new split-index file.

The threshold can be configured using the
"splitIndex.maxPercentChange" config variable.

We need to adjust the existing tests in t1700 by setting
"splitIndex.maxPercentChange" to 100 at the beginning of t1700,
as the existing tests are assuming that the shared index is
regenerated only when `git update-index --split-index` is used.

Signed-off-by: Christian Couder 
---
 read-cache.c   | 33 -
 t/t1700-split-index.sh |  1 +
 2 files changed, 33 insertions(+), 1 deletion(-)

diff --git a/read-cache.c b/read-cache.c
index bb53823..a91fabe 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -2216,6 +2216,36 @@ static int write_shared_index(struct index_state *istate,
return ret;
 }
 
+static const int default_max_percent_split_change = 20;
+
+int too_many_not_shared_entries(struct index_state *istate)
+{
+   int i, not_shared = 0;
+   int max_split = git_config_get_max_percent_split_change();
+
+   switch (max_split) {
+   case -1:
+   /* not or badly configured: use the default value */
+   max_split = default_max_percent_split_change;
+   break;
+   case 0:
+   return 1; /* 0% means always write a new shared index */
+   case 100:
+   return 0; /* 100% means never write a new shared index */
+   default:
+   ; /* do nothing: just use the configured value */
+   }
+
+   /* Count not shared entries */
+   for (i = 0; i < istate->cache_nr; i++) {
+   struct cache_entry *ce = istate->cache[i];
+   if (!ce->index)
+   not_shared++;
+   }
+
+   return istate->cache_nr * max_split < not_shared * 100;
+}
+
 int write_locked_index(struct index_state *istate, struct lock_file *lock,
   unsigned flags)
 {
@@ -2233,7 +2263,8 @@ int write_locked_index(struct index_state *istate, struct 
lock_file *lock,
if ((v & 15) < 6)
istate->cache_changed |= SPLIT_INDEX_ORDERED;
}
-   if (istate->cache_changed & SPLIT_INDEX_ORDERED) {
+   if (istate->cache_changed & SPLIT_INDEX_ORDERED ||
+   too_many_not_shared_entries(istate)) {
int ret = write_shared_index(istate, lock, flags);
if (ret)
return ret;
diff --git a/t/t1700-split-index.sh b/t/t1700-split-index.sh
index db8c39f..507a1dd 100755
--- a/t/t1700-split-index.sh
+++ b/t/t1700-split-index.sh
@@ -8,6 +8,7 @@ test_description='split index mode tests'
 sane_unset GIT_TEST_SPLIT_INDEX
 
 test_expect_success 'enable split index' '
+   git config splitIndex.maxPercentChange 100 &&
git update-index --split-index &&
test-dump-split-index .git/index >actual &&
indexversion=$(test-index-version <.git/index) &&
-- 
2.10.1.462.g7e1e03a



[PATCH v1 12/19] Documentation/config: add splitIndex.maxPercentChange

2016-10-23 Thread Christian Couder
Signed-off-by: Christian Couder 
---
 Documentation/config.txt | 13 +
 1 file changed, 13 insertions(+)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index 96521a4..380eeb8 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -2763,6 +2763,19 @@ showbranch.default::
The default set of branches for linkgit:git-show-branch[1].
See linkgit:git-show-branch[1].
 
+splitIndex.maxPercentChange::
+   When the split index feature is used, this specifies the
+   percent of entries the split index can contain compared to the
+   whole number of entries in both the split index and the shared
+   index before a new shared index is written.
+   The value should be between 0 and 100. If the value is 0 then
+   a new shared index is always written, if it is 100 a new
+   shared index is never written.
+   By default the value is 20, so a new shared index is written
+   if the number of entries in the split index would be greater
+   than 20 percent of the total number of entries.
+   See linkgit:git-update-index[1].
+
 status.relativePaths::
By default, linkgit:git-status[1] shows paths relative to the
current directory. Setting this variable to `false` shows paths
-- 
2.10.1.462.g7e1e03a



[PATCH v1 05/19] update-index: warn in case of split-index incoherency

2016-10-23 Thread Christian Couder
When users are using `git update-index --(no-)split-index`, they
may expect the split-index feature to be used or not according to
the option they just used, but this might not be the case if the
new "core.splitIndex" config variable has been set. In this case
let's warn about what will happen and why.

Signed-off-by: Christian Couder 
---
 builtin/update-index.c | 11 ++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/builtin/update-index.c b/builtin/update-index.c
index b75ea03..a14dbf2 100644
--- a/builtin/update-index.c
+++ b/builtin/update-index.c
@@ -1098,12 +1098,21 @@ int cmd_update_index(int argc, const char **argv, const 
char *prefix)
}
 
if (split_index > 0) {
+   if (git_config_get_split_index() == 0)
+   warning("core.splitIndex is set to false; "
+   "remove or change it, if you really want to "
+   "enable split index");
if (the_index.split_index)
the_index.cache_changed |= SPLIT_INDEX_ORDERED;
else
add_split_index(_index);
-   } else if (!split_index)
+   } else if (!split_index) {
+   if (git_config_get_split_index() == 1)
+   warning("core.splitIndex is set to true; "
+   "remove or change it, if you really want to "
+   "disable split index");
remove_split_index(_index);
+   }
 
switch (untracked_cache) {
case UC_UNSPECIFIED:
-- 
2.10.1.462.g7e1e03a



[PATCH v1 17/19] t1700: test shared index file expiration

2016-10-23 Thread Christian Couder
Signed-off-by: Christian Couder 
---
 t/t1700-split-index.sh | 44 
 1 file changed, 44 insertions(+)

diff --git a/t/t1700-split-index.sh b/t/t1700-split-index.sh
index f03addf..f448fc1 100755
--- a/t/t1700-split-index.sh
+++ b/t/t1700-split-index.sh
@@ -310,4 +310,48 @@ EOF
test_cmp expect actual
 '
 
+test_expect_success 'shared index files expire after 7 days by default' '
+   : >ten &&
+   git update-index --add ten &&
+   test $(ls .git/sharedindex.* | wc -l) -gt 1 &&
+   just_under_7_days_ago=$((1-7*86400)) &&
+   test-chmtime =$just_under_7_days_ago .git/sharedindex.* &&
+   : >eleven &&
+   git update-index --add eleven &&
+   test $(ls .git/sharedindex.* | wc -l) -gt 1 &&
+   just_over_7_days_ago=$((-1-7*86400)) &&
+   test-chmtime =$just_over_7_days_ago .git/sharedindex.* &&
+   : >twelve &&
+   git update-index --add twelve &&
+   test $(ls .git/sharedindex.* | wc -l) = 1
+'
+
+test_expect_success 'check splitIndex.sharedIndexExpire set to 8 days' '
+   git config splitIndex.sharedIndexExpire "8.days.ago" &&
+   test-chmtime =$just_over_7_days_ago .git/sharedindex.* &&
+   : >thirteen &&
+   git update-index --add thirteen &&
+   test $(ls .git/sharedindex.* | wc -l) -gt 1 &&
+   just_over_8_days_ago=$((-1-8*86400)) &&
+   test-chmtime =$just_over_8_days_ago .git/sharedindex.* &&
+   : >fourteen &&
+   git update-index --add fourteen &&
+   test $(ls .git/sharedindex.* | wc -l) = 1
+'
+
+test_expect_success 'check splitIndex.sharedIndexExpire set to "never" and 
"now"' '
+   git config splitIndex.sharedIndexExpire never &&
+   just_10_years_ago=$((-365*10*86400)) &&
+   test-chmtime =$just_10_years_ago .git/sharedindex.* &&
+   : >fifteen &&
+   git update-index --add fifteen &&
+   test $(ls .git/sharedindex.* | wc -l) -gt 1 &&
+   git config splitIndex.sharedIndexExpire now &&
+   just_1_second_ago=-1 &&
+   test-chmtime =$just_1_second_ago .git/sharedindex.* &&
+   : >sixteen &&
+   git update-index --add sixteen &&
+   test $(ls .git/sharedindex.* | wc -l) = 1
+'
+
 test_done
-- 
2.10.1.462.g7e1e03a



[PATCH v1 11/19] t1700: add tests for splitIndex.maxPercentChange

2016-10-23 Thread Christian Couder
Signed-off-by: Christian Couder 
---
 t/t1700-split-index.sh | 72 ++
 1 file changed, 72 insertions(+)

diff --git a/t/t1700-split-index.sh b/t/t1700-split-index.sh
index 507a1dd..f03addf 100755
--- a/t/t1700-split-index.sh
+++ b/t/t1700-split-index.sh
@@ -238,4 +238,76 @@ EOF
test_cmp expect actual
 '
 
+test_expect_success 'set core.splitIndex config variable to true' '
+   git config core.splitIndex true &&
+   : >three &&
+   git update-index --add three &&
+   BASE=$(test-dump-split-index .git/index | grep "^base") &&
+   test-dump-split-index .git/index | sed "/^own/d" >actual &&
+   cat >expect actual &&
+   cat >expect actual &&
+   cat >expect actual &&
+   cat >expect actual &&
+   cat >expect actual &&
+   cat >expect <

[PATCH v1 18/19] Documentation/config: add splitIndex.sharedIndexExpire

2016-10-23 Thread Christian Couder
Signed-off-by: Christian Couder 
---
 Documentation/config.txt | 11 +++
 1 file changed, 11 insertions(+)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index 380eeb8..5212a97 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -2776,6 +2776,17 @@ splitIndex.maxPercentChange::
than 20 percent of the total number of entries.
See linkgit:git-update-index[1].
 
+splitIndex.sharedIndexExpire::
+   When the split index feature is used, shared index files with
+   a mtime older than this time will be removed when a new shared
+   index file is created. The value "now" expires all entries
+   immediately, and "never" suppresses expiration altogether.
+   The default value is "one.week.ago".
+   Note that each time a new split-index file is created, the
+   mtime of the related shared index file is updated to the
+   current time.
+   See linkgit:git-update-index[1].
+
 status.relativePaths::
By default, linkgit:git-status[1] shows paths relative to the
current directory. Setting this variable to `false` shows paths
-- 
2.10.1.462.g7e1e03a



[PATCH v1 01/19] split-index: s/eith/with/ typo fix

2016-10-23 Thread Christian Couder
Signed-off-by: Christian Couder 
---
 split-index.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/split-index.c b/split-index.c
index 35da553..615f4ca 100644
--- a/split-index.c
+++ b/split-index.c
@@ -187,7 +187,7 @@ void prepare_to_write_split_index(struct index_state 
*istate)
/* Go through istate->cache[] and mark CE_MATCHED to
 * entry with positive index. We'll go through
 * base->cache[] later to delete all entries in base
-* that are not marked eith either CE_MATCHED or
+* that are not marked with either CE_MATCHED or
 * CE_UPDATE_IN_BASE. If istate->cache[i] is a
 * duplicate, deduplicate it.
 */
-- 
2.10.1.462.g7e1e03a



[PATCH v1 13/19] sha1_file: make check_and_freshen_file() non static

2016-10-23 Thread Christian Couder
This function will be used in a commit soon, so let's make
it available globally.

Signed-off-by: Christian Couder 
---
 cache.h | 3 +++
 sha1_file.c | 2 +-
 2 files changed, 4 insertions(+), 1 deletion(-)

diff --git a/cache.h b/cache.h
index faceceb..a625b47 100644
--- a/cache.h
+++ b/cache.h
@@ -1167,6 +1167,9 @@ extern int has_pack_index(const unsigned char *sha1);
 
 extern void assert_sha1_type(const unsigned char *sha1, enum object_type 
expect);
 
+/* Helper to check and "touch" a file */
+extern int check_and_freshen_file(const char *fn, int freshen);
+
 extern const signed char hexval_table[256];
 static inline unsigned int hexval(unsigned char c)
 {
diff --git a/sha1_file.c b/sha1_file.c
index 266152d..29f927e 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -576,7 +576,7 @@ static int freshen_file(const char *fn)
  * either does not exist on disk, or has a stale mtime and may be subject to
  * pruning).
  */
-static int check_and_freshen_file(const char *fn, int freshen)
+int check_and_freshen_file(const char *fn, int freshen)
 {
if (access(fn, F_OK))
return 0;
-- 
2.10.1.462.g7e1e03a



[PATCH v1 04/19] read-cache: add and then use tweak_split_index()

2016-10-23 Thread Christian Couder
This will make us use the split-index feature or not depending
on the value of the "core.splitIndex" config variable.

Signed-off-by: Christian Couder 
---
 read-cache.c | 17 +
 1 file changed, 17 insertions(+)

diff --git a/read-cache.c b/read-cache.c
index 38d67fa..bb53823 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -1562,10 +1562,27 @@ static void tweak_untracked_cache(struct index_state 
*istate)
}
 }
 
+static void tweak_split_index(struct index_state *istate)
+{
+   switch (git_config_get_split_index()) {
+   case -1: /* unset: do nothing */
+   break;
+   case 0: /* false */
+   remove_split_index(istate);
+   break;
+   case 1: /* true */
+   add_split_index(istate);
+   break;
+   default: /* unknown value: do nothing */
+   break;
+   }
+}
+
 static void post_read_index_from(struct index_state *istate)
 {
check_ce_order(istate);
tweak_untracked_cache(istate);
+   tweak_split_index(istate);
 }
 
 /* remember to discard_cache() before reading a different cache! */
-- 
2.10.1.462.g7e1e03a



[PATCH v1 00/19] Add configuration options for split-index

2016-10-23 Thread Christian Couder
Goal


We want to make it possible to use the split-index feature
automatically by just setting a new "core.splitIndex" configuration
variable to true.

This can be valuable as split-index can help significantly speed up
`git rebase` especially along with the work to libify `git apply`
that has been recently merged to master
(see 
https://github.com/git/git/commit/81358dc238372793b1590efa149cc1581d1fbd98).

Design
~~

The design is similar as the previous work that introduced
"core.untrackedCache". 

The new "core.splitIndex" configuration option can be either true,
false or undefined which is the default.

When it is true, the split index is created, if it does not already
exists, when the index is read. When it is false, the split index is
removed if it exists, when the index is read. Otherwise it is left as
is.

Along with this new configuration variable, the two following options
are also introduced:

- splitIndex.maxPercentChange

This is to avoid having too many changes accumulating in the split
index while in split index mode. The git-update-index
documentation says:

If split-index mode is already enabled and `--split-index` is
given again, all changes in $GIT_DIR/index are pushed back to
the shared index file.

but it is probably better to not expect the user to think about it
and to have a mechanism that pushes back all changes to the shared
index file automatically when some threshold is reached.

The default threshold is when the number of entries in the split
index file reaches 20% (by default) of the number of entries in
the shared index file. The new "splitIndex.maxPercentChange"
config option lets people tweak this value.

- splitIndex.sharedIndexExpire

To make sure that old sharedindex files are eventually removed
when a new one has been created, we "touch" the shared index file
every time it is used by a new split index file. Then we can
delete shared indexes with an mtime older than one week (by
default), when we create a new shared index file. The new
"splitIndex.sharedIndexExpire" config option lets people tweak
this grace period.

This idea was suggested by Duy in:


https://public-inbox.org/git/cacsjy8bqmfashf5kjguh+bd7xg98cafnyde964vjypxz-em...@mail.gmail.com/

and after some experiments, I agree that it is much simpler than
what I thought could be done during our discussion.

Highlevel view of the patches in the series
~~~

Except for patch 1/19, there are 3 big steps, one for each new
configuration variable introduced.

The main difference between this patch series and the RFC patch series
sent last July is that the Step 2 and 3 are new and have been
implemented as suggested by Duy. Thanks Duy!

- Patch 1/19 is a typo fix in a comment that can be applied
  separately.

Step 1 is:

- Patches 2/19 to 5/19 introduce the functions that are reading
  the "core.splitIndex" configuration variable and tweaking the
  split index depending on its value.

- Patch 6/19 adds a few tests for the new feature.

- Patches 7/19 and 8/19 add some documentation for the new
  feature.

Step 2 is:

- Patches 9/19 and 10/19 introduce the functions that are reading
  the "splitIndex.maxPercentChange" configuration variable and
  regenerating a new shared index file depending on its value.

- Patch 11/19 adds a few tests for the new feature.

- Patch 12/19 add some documentation for the new feature.

Step 3 is:

- Patches 13/19 to 16/19 introduce the functions that are reading
  the "splitIndex.sharedIndexExpire" configuration variable and
  expiring old shared index files depending on its value.

- Patch 17/19 adds a few tests for the new feature.

- Patches 18/19 and 19/19 add some documentation for the new
  feature.

Links
~

This patch series is also available here:

  https://github.com/chriscool/git/commits/config-split-index

The previous RFC version was:

  https://github.com/chriscool/git/commits/config-split-index7

On the mailing list the related patch series and discussions were:

  https://public-inbox.org/git/20160711172254.13439-1-chrisc...@tuxfamily.org/

Christian Couder (19):
  split-index: s/eith/with/ typo fix
  config: add git_config_get_split_index()
  split-index: add {add,remove}_split_index() functions
  read-cache: add and then use tweak_split_index()
  update-index: warn in case of split-index incoherency
  t1700: add tests for core.splitIndex
  Documentation/config: add information for core.splitIndex
  Documentation/git-update-index: talk about core.splitIndex config var
  config: add git_config_get_max_percent_split_change()
  read-cache: regenerate shared index if necessary
  t1700: add tests for splitIndex.maxPercentChange
  Documentation/config: add splitIndex.maxPercentChange
  sha1_file: make check_and_freshen_file() non static
 

[PATCH v1 02/19] config: add git_config_get_split_index()

2016-10-23 Thread Christian Couder
This new function will be used in a following commit to know
if we want to use the split index feature or not.

Signed-off-by: Christian Couder 
---
 cache.h  |  1 +
 config.c | 10 ++
 2 files changed, 11 insertions(+)

diff --git a/cache.h b/cache.h
index 05ecb88..394da60 100644
--- a/cache.h
+++ b/cache.h
@@ -1809,6 +1809,7 @@ extern int git_config_get_bool_or_int(const char *key, 
int *is_bool, int *dest);
 extern int git_config_get_maybe_bool(const char *key, int *dest);
 extern int git_config_get_pathname(const char *key, const char **dest);
 extern int git_config_get_untracked_cache(void);
+extern int git_config_get_split_index(void);
 
 /*
  * This is a hack for test programs like test-dump-untracked-cache to
diff --git a/config.c b/config.c
index 83fdecb..036e29b 100644
--- a/config.c
+++ b/config.c
@@ -1709,6 +1709,16 @@ int git_config_get_untracked_cache(void)
return -1; /* default value */
 }
 
+int git_config_get_split_index(void)
+{
+   int val = -1;
+
+   if (!git_config_get_maybe_bool("core.splitindex", ))
+   return val;
+
+   return -1; /* default value */
+}
+
 NORETURN
 void git_die_config_linenr(const char *key, const char *filename, int linenr)
 {
-- 
2.10.1.462.g7e1e03a



[PATCH v1 19/19] Documentation/git-update-index: explain splitIndex.*

2016-10-23 Thread Christian Couder
Signed-off-by: Christian Couder 
---
 Documentation/git-update-index.txt | 33 +
 1 file changed, 25 insertions(+), 8 deletions(-)

diff --git a/Documentation/git-update-index.txt 
b/Documentation/git-update-index.txt
index e091b2a..635d157 100644
--- a/Documentation/git-update-index.txt
+++ b/Documentation/git-update-index.txt
@@ -163,14 +163,10 @@ may not support it yet.
 
 --split-index::
 --no-split-index::
-   Enable or disable split index mode. If enabled, the index is
-   split into two files, $GIT_DIR/index and $GIT_DIR/sharedindex..
-   Changes are accumulated in $GIT_DIR/index while the shared
-   index file contains all index entries stays unchanged. If
-   split-index mode is already enabled and `--split-index` is
-   given again, all changes in $GIT_DIR/index are pushed back to
-   the shared index file. This mode is designed for very large
-   indexes that take a significant amount of time to read or write.
+   Enable or disable split index mode. If split-index mode is
+   already enabled and `--split-index` is given again, all
+   changes in $GIT_DIR/index are pushed back to the shared index
+   file.
 +
 These options take effect whatever the value of the `core.splitIndex`
 configuration variable (see linkgit:git-config[1]). But a warning is
@@ -394,6 +390,27 @@ Although this bit looks similar to assume-unchanged bit, 
its goal is
 different from assume-unchanged bit's. Skip-worktree also takes
 precedence over assume-unchanged bit when both are set.
 
+Split index
+---
+
+This mode is designed for very large indexes that take a significant
+amount of time to read or write.
+
+In this mode, the index is split into two files, $GIT_DIR/index and
+$GIT_DIR/sharedindex.. Changes are accumulated in
+$GIT_DIR/index, the split index, while the shared index file contains
+all index entries and stays unchanged.
+
+All changes in the split index are pushed back to the shared index
+file when the number of entries in the split index reaches a level
+specified by the splitIndex.maxPercentChange config variable (see
+linkgit:git-config[1]).
+
+Each time a new shared index file is created, the old shared index
+files are deleted if they are older than what is specified by the
+splitIndex.sharedIndexExpire config variable (see
+linkgit:git-config[1]).
+
 Untracked cache
 ---
 
-- 
2.10.1.462.g7e1e03a



[PATCH v1 09/19] config: add git_config_get_max_percent_split_change()

2016-10-23 Thread Christian Couder
This new function will be used in a following commit to get the
value of the "splitIndex.maxPercentChange" config variable.

Signed-off-by: Christian Couder 
---
 cache.h  |  1 +
 config.c | 16 
 2 files changed, 17 insertions(+)

diff --git a/cache.h b/cache.h
index 394da60..faceceb 100644
--- a/cache.h
+++ b/cache.h
@@ -1810,6 +1810,7 @@ extern int git_config_get_maybe_bool(const char *key, int 
*dest);
 extern int git_config_get_pathname(const char *key, const char **dest);
 extern int git_config_get_untracked_cache(void);
 extern int git_config_get_split_index(void);
+extern int git_config_get_max_percent_split_change(void);
 
 /*
  * This is a hack for test programs like test-dump-untracked-cache to
diff --git a/config.c b/config.c
index 036e29b..5580f56 100644
--- a/config.c
+++ b/config.c
@@ -1719,6 +1719,22 @@ int git_config_get_split_index(void)
return -1; /* default value */
 }
 
+int git_config_get_max_percent_split_change(void)
+{
+   int val = -1;
+
+   if (!git_config_get_int("splitindex.maxpercentchange", )) {
+   if (0 <= val && val <= 100)
+   return val;
+
+   error("splitindex.maxpercentchange value '%d' "
+ "should be between 0 and 100", val);
+   return -1;
+   }
+
+   return -1; /* default value */
+}
+
 NORETURN
 void git_die_config_linenr(const char *key, const char *filename, int linenr)
 {
-- 
2.10.1.462.g7e1e03a



Re: [PATCH] hex: use unsigned index for ring buffer

2016-10-23 Thread Jeff King
On Sun, Oct 23, 2016 at 11:00:48AM +0200, René Scharfe wrote:

> Overflow is defined for unsigned integers, but not for signed ones.
> Make the ring buffer index in sha1_to_hex() unsigned to be on the
> safe side.
> 
> Signed-off-by: Rene Scharfe 
> ---
> Hard to trigger, but probably even harder to diagnose once someone
> somehow manages to do it on some uncommon architecture.

Indeed. If we are worried about overflow, we would also want to assume
that it wraps at a multiple of 4, but that is probably a sane
assumption.

> diff --git a/hex.c b/hex.c
> index ab2610e..8c6c189 100644
> --- a/hex.c
> +++ b/hex.c
> @@ -76,7 +76,7 @@ char *oid_to_hex_r(char *buffer, const struct object_id 
> *oid)
>  
>  char *sha1_to_hex(const unsigned char *sha1)
>  {
> - static int bufno;
> + static unsigned int bufno;
>   static char hexbuffer[4][GIT_SHA1_HEXSZ + 1];
>   return sha1_to_hex_r(hexbuffer[3 & ++bufno], sha1);
>  }

I wonder if just truncating bufno would be conceptually simpler (albeit
longer):

  bufno++;
  bufno &= 3;
  return sha1_to_hex_r(hexbuffer[bufno], sha1);

You could also write the second line like:

  bufno %= ARRAY_SIZE(hexbuffer);

which is less magical (right now the set of buffers must be a power of
2). I expect the compiler could turn that into a bitmask itself.

I'm fine with any of the options. I guess you'd want a similar patch for
find_unique_abbrev on top of jk/no-looking-at-dotgit-outside-repo.

-Peff


Re: [PATCH] Allow stashes to be referenced by index only

2016-10-23 Thread Jeff King
On Thu, Sep 08, 2016 at 07:46:37PM -0400, Aaron M Watson wrote:

> Instead of referencing "stash@{n}" explicitly, it can simply be
> referenced as "n".  Most users only reference stashes by their position
> in the stash stask (what I refer to as the "index"). The syntax for the
> typical stash (stash@{n}) is slightly annoying and easy to forget, and
> sometimes difficult to escape properly in a script. Because of this the
> capability to do things with the stash by simply referencing the index
> is desirable.
> 
> This patch includes the superior implementation provided by Øsse Walle
> (thanks for that), with a slight change to fix a broken test in the test
> suite. I also merged the test scripts as suggested by Jeff King, and
> un-wrapped the documentation as suggested by Junio Hamano.

Just cleaning out my list of leftover bits, and it looks like this patch
fell through the cracks.

The last thing before this was generally people agreeing with the
approach that Øsse showed:

  
http://public-inbox.org/git/CAFaJEqu-JUcwLjrQBk_huSa3DZfCf8O4eAZ=UgcXHzN=clg...@mail.gmail.com/

  http://public-inbox.org/git/xmqqbmzzoazy@gitster.mtv.corp.google.com/

and we just needed to roll that up into a patch. Which this _mostly_ is,
but there are a few funny things still:

> diff --git a/git-stash.sh b/git-stash.sh
> index 826af18..d8d3b8d 100755
> --- a/git-stash.sh
> +++ b/git-stash.sh
> @@ -384,9 +384,10 @@ parse_flags_and_rev()
>   i_tree=
>   u_tree=
>  
> - REV=$(git rev-parse --no-flags --symbolic --sq "$@") || exit 1
> + REV=$(git rev-parse --no-flags --symbolic --sq "$@" 2> /dev/null)

Style: we don't put a space between ">" and the filename.

So here we assign REV, and we no longer exit (to handle something like
"1" by itself, which is good).

>   FLAGS=
> + ARGV=
>   for opt
>   do
>   case "$opt" in
> @@ -404,10 +405,13 @@ parse_flags_and_rev()
>   die "$(eval_gettext "unknown option: 
> \$opt")"
>   FLAGS="${FLAGS}${FLAGS:+ }$opt"
>   ;;
> + *)
> + ARGV="${ARGV}${ARGV:+ }'$opt'"
> + ;;
>   esac
>   done
>  
> - eval set -- $REV
> + eval set -- $ARGV

But what's going on here? Why did we bother running rev-parse earlier if
we don't actually use the value of REV?

You mentioned tweaking it to fix a broken test, and indeed, just using
$REV here breaks a few tests in t3903.

Offhand, I do not see anything wrong with pulling the non-option values
out in the loop. But in that case I think the assignment of REV can just
go away completely.

> diff --git a/t/t3903-stash.sh b/t/t3903-stash.sh
> index 2142c1f..f82a8c4 100755
> --- a/t/t3903-stash.sh
> +++ b/t/t3903-stash.sh
> @@ -131,6 +131,26 @@ test_expect_success 'drop middle stash' '
>   test 1 = $(git show HEAD:file)
>  '
>  
> +test_expect_success 'drop middle stash by index' '
> + git reset --hard &&
> + echo 8 > file &&
> + git stash &&
> + echo 9 > file &&

Ditto on the ">" style here and elsewhere.

The tests otherwise look good to me.

-Peff


[PATCH] hex: use unsigned index for ring buffer

2016-10-23 Thread René Scharfe
Overflow is defined for unsigned integers, but not for signed ones.
Make the ring buffer index in sha1_to_hex() unsigned to be on the
safe side.

Signed-off-by: Rene Scharfe 
---
Hard to trigger, but probably even harder to diagnose once someone
somehow manages to do it on some uncommon architecture.

 hex.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hex.c b/hex.c
index ab2610e..8c6c189 100644
--- a/hex.c
+++ b/hex.c
@@ -76,7 +76,7 @@ char *oid_to_hex_r(char *buffer, const struct object_id *oid)
 
 char *sha1_to_hex(const unsigned char *sha1)
 {
-   static int bufno;
+   static unsigned int bufno;
static char hexbuffer[4][GIT_SHA1_HEXSZ + 1];
return sha1_to_hex_r(hexbuffer[3 & ++bufno], sha1);
 }
-- 
2.10.1



revisiting zstd timings

2016-10-23 Thread Jeff King
You may recall some "what if" timings I did recently, where I replaced
zlib with zstd:

  
http://public-inbox.org/git/20160914235843.nacr54ekvl6rj...@sigill.intra.peff.net/

The aim was that it would give us about the same compression level, but
much faster inflating and deflating. My numbers there showed that yes,
inflate is faster (which speeds up normal operations like traversals and
diffs), but that compression seemed to be much slower.

Since then, zstd has released a new version of their zlib wrapper which
performs much better for our case. Basically, the prior version's
transparent zstd setup was not optimized well for our case of doing tons
of small deflates. I'm cc-ing Przemyslaw Skibinski, who wrote the
wrapper code, and who pointed me in the right direction (thanks!).

I'll include the new patch at the end, but it's basically the same as
the old with one minor symbol change:

diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 7da2fd44b0..07f45aa171 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -2287,7 +2287,7 @@ static int git_pack_config(const char *k, const 
char *v, void *cb)
return config_error_nonbool(k);
 #ifdef USE_ZSTD
if (skip_prefix(v, "zstd", )) {
-   useZSTD(1);
+   ZWRAP_useZSTDcompression(1);
/* XXX doesn't seem to be a constant? */
pack_compression_max = 22;
}

Here are the numbers I get with the newer version (see the prior email
for descriptions of each test, but I did run them all from scratch
again).

zlib
 - pack
 1127899484
 - repack
 real4m55.099s
 user17m46.796s
 sys 0m6.164s
 - rev-list
 real0m27.402s
 user0m27.080s
 sys 0m0.320s
 - log -Sfoo
 real5m16.326s
 user5m13.860s
 sys 0m2.448s

That gives us a baseline for the time of each operation, and the size of
the compressed pack. The rest of the values will be expressed as
percentages from this baseline.

zstd=disabled
 - pack
 1127884778
 (+0%)
 - repack
 real4m49.443s
 user17m39.468s
 sys 0m5.996s
 (-1.9%)
 - rev-list
 real0m27.527s
 user0m27.316s
 sys 0m0.208s
 (+0.4%)
 - log -Sfoo
 real5m21.029s
 user5m17.704s
 sys 0m3.308s
 (+1.4%)

Timing the wrapper making use of zlib, it produces the same output and
takes a slightly larger amount of time (the repack is quicker, but
there's quite a bit of run-to-run noise in that measurement). Nothing
surprising.

zstd=1
 - pack
 1191357029
 (+5.6%)
 - repack
 real4m8.323s
 user16m24.664s
 sys 0m6.816s
 (-15.8%)
 - rev-list
 real0m25.780s
 user0m25.488s
 sys 0m0.288s
 (-5.9%)
 - log -Sfoo
 real4m19.301s
 user4m16.488s
 sys 0m2.796s
 (-18%)

Here's where we get interesting; zstd turned down to its lowest setting.
As before, the inflate step shows a nice speedup for some common
read-only operations. The repack is much faster, and the resulting pack
is slightly larger. We'd probably want to tune the cpu/size tradeoff for
deflate a bit higher.

zstd=3
 - pack
 1163632679
 (+3.1%)
 - repack
 real4m11.800s
 user16m34.260s
 sys 0m6.788s
 (-14.6%)
 - rev-list
 real0m25.678s
 user0m25.376s
 sys 0m0.296s
 (-6.2%)
 - log -Sfoo
 real4m19.902s
 user4m16.604s
 sys 0m3.280s
 (-17.8%)

And here that is. Without spending much more CPU on deflate, we shaved
off a few percent. The reading side remains fast (I wondered if it might
get faster as we shrink the pack just because there are fewer bytes to
deal with on the input side, but it doesn't seem to really matter. And
anyway, we're only talking a few percent of the input size).

zstd=5
 - pack
 1137697830
 (+0.8%)
 - repack
 real4m20.930s
 user16m45.844s
 sys 0m7.100s
 (-11.5%)

Turning it up higher, we're starting to reach parity with zlib for the
pack size. And the deflate process is still faster. I stopped measuring
the read side, since it seemed to be remaining constant.

zstd=7
 - pack
 1130545314
 (+0.2%)
 - repack
 real4m28.960s
 user16m50.028s
 sys 0m6.884s
 (-8.8%)

zstd=9
 - pack
 1128847500
 (+0%)
 - repack
 real4m52.234s
 user17m22.644s
 sys 0m7.080s
 (-0.9%)

zstd=11
 - pack
 1128415787
 (+0%)
 - repack
 real5m33.187s
 user18m12.792s
 sys 0m7.436s
 (+12.9%)

And these numbers show we can pretty much dial in the cpu/size tradeoff
wherever we