Re: [PATCH v8 08/14] commit-graph: implement git commit-graph read

2018-04-14 Thread Eric Sunshine
On Sat, Apr 14, 2018 at 6:15 PM, Jakub Narebski  wrote:
> Derrick Stolee  writes:
>> + NUM_CHUNKS=$((3 + $(echo "$2" | wc -w)))
>
> I don't know if it is possible to do the above in a portable shell
> without using external 'wc' command.  Also, isn't $(( ... )) bashism?

$((...)) is POSIX and used heavily in existing Git test scripts.


[PATCH] t1510-repo-setup.sh: rm useless mkdir

2018-04-14 Thread qingyunha
From: Tao Qingyun <845767...@qq.com>

Signed-off-by: Tao Qingyun <845767...@qq.com>
---
 t/t1510-repo-setup.sh | 1 -
 1 file changed, 1 deletion(-)

diff --git a/t/t1510-repo-setup.sh b/t/t1510-repo-setup.sh
index e6854b828..972bd9c78 100755
--- a/t/t1510-repo-setup.sh
+++ b/t/t1510-repo-setup.sh
@@ -238,7 +238,6 @@ test_expect_success '#0: nonbare repo, no explicit 
configuration' '
 '
 
 test_expect_success '#1: GIT_WORK_TREE without explicit GIT_DIR is accepted' '
-   mkdir -p wt &&
try_repo 1 "$here" unset unset "" unset \
"$here/1/.git" "$here" "$here" 1/ \
"$here/1/.git" "$here" "$here" 1/sub/ 2>message &&
-- 
2.16.2





Request for Quotation

2018-04-14 Thread Mohammed
Hello,
 
Good day,
 
I am  Mohammed, Our company is interested in your product.
We have gone through your product site online and wish to make order of your
product.
Please do send us details of your products and company to our {email} Also
provide with the recent price
 
We await your response with quotation and specification.
[1] Payment terms
[2] And your products Warranty
(3] Minimum Order Quantity
 
 
Mohammed /Purchasing Manager
Telephone: +966 3 867 1902
Fax: +966 3 867 3435
tr.export.imp...@outlook.com
 
PAN TRADING  EQUIPMENT'S WORLDWIDE
 
Address: Dallah street, Al Rehab
Saudi Arabia


Re: [PATCH 0/7] nd/repack-keep-pack update

2018-04-14 Thread Junio C Hamano
On Sun, Apr 15, 2018 at 4:47 AM, Ævar Arnfjörð Bjarmason
 wrote:
>
> The only (trivial) issue I found in the patches themselves was that
> between 4/5 and 5/7 you're adding an empty line to config.txt in 4/7
> just to erase it in 5/7, better not to add it to begin with, but
> hopefully Junio can fix that up (if he cares).

I do care but I'd wish I do not have to waste time and concentration to spend
on doing so, even though I would be fully capable of skipping this round and
remembering to queue a rerolled one.

I've seen mentions like the above one a few times on the list recently, so let
me make it clear. Some things are easier to tweak locally than others, and
I'd rather not to waste my time on cleaning other people's mess. A simple
typofix that does not cascade through to later steps in a series is one thing.
A tweak that changes number of lines in a hunk that forces a later step to
compensate is more involved.

Don't expect your traffic cop to wash your care while you're stopping at a
red signal.


Re: [PATCH v8 08/14] commit-graph: implement git commit-graph read

2018-04-14 Thread Jakub Narebski
Derrick Stolee  writes:

> From: Derrick Stolee 
> Subject: [PATCH v8 08/14] commit-graph: implement git commit-graph read

Minor nit: this is one commit message [subject] among all others that
uses "git commit-graph" instead of "git-commit-graph" in the
description.

Also, perhaps this (and similarly titled commits in this series) would
read better with quotes, that is as:

  commit-graph: implement "git commit-graph read"

Though that might be a matter of personal taste.

>
> Teach git-commit-graph to read commit graph files and summarize their 
> contents.
>
> Use the read subcommand to verify the contents of a commit graph file in the
> tests.

Better would be, in my opinion

  Use the 'read' subcommand

or

  Use the "read" subcommand

>
> Signed-off-by: Derrick Stolee 
> ---
>  Documentation/git-commit-graph.txt |  12 +++
>  builtin/commit-graph.c |  56 
>  commit-graph.c | 137 -
>  commit-graph.h |  23 +
>  t/t5318-commit-graph.sh|  32 +--
>  5 files changed, 254 insertions(+), 6 deletions(-)
>
> diff --git a/Documentation/git-commit-graph.txt 
> b/Documentation/git-commit-graph.txt
> index 47996e8f89..8aad8303f5 100644
> --- a/Documentation/git-commit-graph.txt
> +++ b/Documentation/git-commit-graph.txt
> @@ -9,6 +9,7 @@ git-commit-graph - Write and verify Git commit graph files
>  SYNOPSIS
>  
>  [verse]
> +'git commit-graph read' [--object-dir ]
>  'git commit-graph write'  [--object-dir ]

Why do you need this '[--object-dir ]' parameter?  Anyway, because
Git has the GIT_OBJECT_DIRECTORY environment variable support, I would
expect '--object-dir' to be parameter to the 'git' wrapper/command, like
'--git-dir' is, not to the 'git commit-graph' command, or even only its
selected individual subcommands.

>  
>  
> @@ -35,6 +36,11 @@ COMMANDS
>  Write a commit graph file based on the commits found in packfiles.
>  Includes all commits from the existing commit graph file.
>  
> +'read'::
> +
> +Read a graph file given by the commit-graph file

The above part of sentence reads very strange, as a truism.

>   and output basic
> +details about the graph file. Used for debugging purposes.

I would say that it is 'used' for testing, and is 'useful' (or 'can be
used') for debugging purposes.

> +
>  
>  EXAMPLES
>  
> @@ -45,6 +51,12 @@ EXAMPLES
>  $ git commit-graph write
>  
>  
> +* Read basic information from the commit-graph file.
> ++
> +
> +$ git commit-graph read
> +

I would personally prefer to have example output together with example
calling convention.

> +
>  
>  GIT
>  ---
> diff --git a/builtin/commit-graph.c b/builtin/commit-graph.c
> index 26b6360289..efd39331d7 100644
> --- a/builtin/commit-graph.c
> +++ b/builtin/commit-graph.c
> @@ -7,10 +7,16 @@
>  
>  static char const * const builtin_commit_graph_usage[] = {
>   N_("git commit-graph [--object-dir ]"),
> + N_("git commit-graph read [--object-dir ]"),
>   N_("git commit-graph write [--object-dir ]"),
>   NULL
>  };
>  
> +static const char * const builtin_commit_graph_read_usage[] = {
> + N_("git commit-graph read [--object-dir ]"),
> + NULL
> +};
> +
>  static const char * const builtin_commit_graph_write_usage[] = {
>   N_("git commit-graph write [--object-dir ]"),
>   NULL
> @@ -20,6 +26,54 @@ static struct opts_commit_graph {
>   const char *obj_dir;
>  } opts;
>  
> +static int graph_read(int argc, const char **argv)
> +{
> + struct commit_graph *graph = NULL;
> + char *graph_name;
> +
> + static struct option builtin_commit_graph_read_options[] = {
> + OPT_STRING(0, "object-dir", _dir,
> + N_("dir"),
> + N_("The object directory to store the graph")),

Actually it is not the object directory to store the graph, but it is
the object directory to read the commit-graph file from.

> + OPT_END(),
> + };
> +
> + argc = parse_options(argc, argv, NULL,
> +  builtin_commit_graph_read_options,
> +  builtin_commit_graph_read_usage, 0);
> +
> + if (!opts.obj_dir)
> + opts.obj_dir = get_object_directory();
> +
> + graph_name = get_commit_graph_filename(opts.obj_dir);
> + graph = load_commit_graph_one(graph_name);
> +
> + if (!graph)
> + die("graph file %s does not exist", graph_name);

It might be better to use single quotes around '%s'; this is absolute
pathname (if I understand it correctly), and it may contain spaces in
it.

> + FREE_AND_NULL(graph_name);
> +
> + printf("header: %08x %d %d %d %d\n",

Wouldn't it be better to print signature 

Re: Feature Request: Add diff.word-diff and perhaps diff.word-diff-regex configuration options to enable always using word-diffs in git diff

2018-04-14 Thread Ævar Arnfjörð Bjarmason

On Sat, Apr 14 2018, Doron Behar wrote:

> I've just came across the wonderful command line option for `git diff`:
> `--word-diff`. It could be great to have a configuration option that
> will enable this feature by default when running `git diff`.

Do you know you can do:

git config --global alias.wdiff "diff --word-diff"

?


Feature Request: Add diff.word-diff and perhaps diff.word-diff-regex configuration options to enable always using word-diffs in git diff

2018-04-14 Thread Doron Behar
I've just came across the wonderful command line option for `git diff`:
`--word-diff`. It could be great to have a configuration option that
will enable this feature by default when running `git diff`.


signature.asc
Description: PGP signature


Re: [PATCH 8/8] gpg-interface: handle alternative signature types

2018-04-14 Thread brian m. carlson
On Tue, Apr 10, 2018 at 04:24:27AM -0400, Eric Sunshine wrote:
> How confident are we that _all_ possible signing programs will conform
> to the "-BEGIN %s-" pattern? If we're not confident, then
> perhaps the user should be providing the full string here, not just
> the '%s' part?

This is not likely to be true of other signing schemes.  In fact, other
than OpenPGP, PEM, and CMS (S/MIME), this is probably not true at all.
I know OpenBSD's signify has no wrappers (except a mandatory "untrusted
comment:" line at the beginning).  There wouldn't be a way to match such
a signature unless we implemented prefix or regex support.

It's currently possible to hack other signatures in with wrappers if
they wrap the actual signature in OpenPGP-like armor; someone (I believe
Eric Wong) has gotten this to work with signify.  I only mention signify
because other than OpenPGP and CMS, it's the only scheme I've seen
people use with Git.
-- 
brian m. carlson: Houston, Texas, US
OpenPGP: https://keybase.io/bk2204


signature.asc
Description: PGP signature


Re: [PATCH 01/15] read-cache.c: make $GIT_TEST_SPLIT_INDEX boolean

2018-04-14 Thread Ævar Arnfjörð Bjarmason

On Sat, Apr 14 2018, Nguyễn Thái Ngọc Duy wrote:

> While at there, document about this special mode when running the test
> suite.

This whole series looks good to me, and I see this patch addressed the
confusing test env variable situation I pointed out in v8
(https://public-inbox.org/git/87d0zkw8r9@evledraar.gmail.com/),
thanks!


Re: [PATCH 0/7] nd/repack-keep-pack update

2018-04-14 Thread Ævar Arnfjörð Bjarmason

On Sat, Apr 14 2018, Nguyễn Thái Ngọc Duy wrote:

> This is basically a resend from the last round but rebased on
> 'master'. The old base results in some conflicts with packfile and oid
> conversion series. This one should reduce merge conflicts on 'pu' to
> trivial ones.

Thanks. I've been running this at work and as noted in
https://public-inbox.org/git/87vadpxv27@evledraar.gmail.com/ it's
had big performance impact to the better, users even started noticing it
(they'd previously get noticeable slowdowns while doing other task on
GC).

I also tried to see just how much worse this was making performance, my
hunch was that the difference should be trivial but noticeable since
we'll produce a less efficient pack.

What I found was the opposite, under real-world conditions it seems to
be making things 1-2% better on common git operations, which I suspect
is because once we've done a few pulls and coalesced those into their
own pack(s) there's more cache locality for the data we're actually
looking at.

I.e. once you've got a repo has a big pack you're not touching, and a
few weeks of updates from upstream that you've coalesced into another
pack there's a higher density of stuff you care about near HEAD per FS
page in the recent smaller pack, which if you're pressed for memory and
parts of your pack are getting paged out of the FS cache is a win. I
haven't confirmed that, it's just a hypothesis.

The only (trivial) issue I found in the patches themselves was that
between 4/5 and 5/7 you're adding an empty line to config.txt in 4/7
just to erase it in 5/7, better not to add it to begin with, but
hopefully Junio can fix that up (if he cares).


Re: [RFC PATCH] tests: fix PATH for GIT_TEST_INSTALLED tests

2018-04-14 Thread Guillaume Maudoux (Layus)
Hi Joannnes,

Le 14 avril 2018 14:30:38 UTC+02:00, Johannes Schindelin 
 a écrit :
>Hi Guillaume,
>
>On Sat, 14 Apr 2018, Guillaume Maudoux wrote:
>
>> From: Guillaume Maudoux 
>> 
>> When running tests on an existing git installation with
>> GIT_TEST_INSTALLED (as described in t/README), the test helpers are
>> missing in the PATH.
>> 
>> This fixes the test suite in a way that allows all the tests to pass.
>> 
>> Signed-off-by: Guillaume Maudoux 
>> ---
>> 
>> This is more a bug report than a real patch. The issue is described
>> above and this patch does solve it. I however think that someone with
>> more knowledge should refactor all that chunck of code that was last
>> changed in 2010.
>> 
>> In particular, it seems that the GIT_TEST_INSTALLED path does not use
>> bin-wrappers at all. This may imply that --with-dashes also breaks
>> tests.
>> 
>>  t/test-lib.sh | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>> 
>> diff --git t/test-lib.sh t/test-lib.sh
>> index 7740d511d..0d51261f7 100644
>> --- t/test-lib.sh
>> +++ t/test-lib.sh
>> @@ -923,7 +923,7 @@ elif test -n "$GIT_TEST_INSTALLED"
>>  then
>>  GIT_EXEC_PATH=$($GIT_TEST_INSTALLED/git --exec-path)  ||
>>  error "Cannot run git from $GIT_TEST_INSTALLED."
>> -PATH=$GIT_TEST_INSTALLED:$GIT_BUILD_DIR:$PATH
>>
>+  PATH=$GIT_TEST_INSTALLED:$GIT_BUILD_DIR/t/helper:$GIT_BUILD_DIR:$PATH
>>  GIT_EXEC_PATH=${GIT_TEST_EXEC_PATH:-$GIT_EXEC_PATH}
>>  else # normal case, use ../bin-wrappers only unless $with_dashes:
>>  git_bin_dir="$GIT_BUILD_DIR/bin-wrappers"
>
>This is essentially identical to what we have in
>
>http://github.com/git-for-windows/git/commit/e408b7517d
>
>So: ACK.
>
>You might also want to go a bit further and let the test suite run with
>GIT_TEST_INSTALLED when Git has not actually be built, but only the
>test
>helpers. I started something along those lines here:
>
>http://github.com/git-for-windows/git/commit/a80f047abc5
>
>I always meant to come back to polish those patches and submit them to
>the
>Git mailing list, so: thank you for getting the ball rolling.
>
>FWIW my use case is that I want to test a "MinGit" package, i.e. a
>subset
>of Git for Windows intended to cater to third-party applications
>requiring
>Git functionality (but not requiring any interactive parts of it).
>
>What is your use case?
>
>Ciao,
>Johannes

My use case is packaging git in Nix. We were bitten last week by the 2.17.0 
update that modified default perl paths, and we broke the package without 
warning. See https://github.com/NixOS/nixpkgs/pull/38924 for the install tests 
and https://github.com/NixOS/nixpkgs/pull/38636#comment-380712557 for an idea 
of the mess we created...

I first tried to run the tests, but it turns out to be pretty uninformative 
because everything is setup properly in the build environment. Plus you do not 
expect meaningful test failures on a released version. What I really want is 
testing the final, installed version with all the weird wrapping, hacking and 
tuning that Nix requires.

As a packager, it would be nice to have dedicated install tests (integration 
tests?) that check the setup of perl path or the ability for got to find 
libexec executables without testing internal features. But we cannot require 
that from every project.

Running `make clean` before running the install test would be a good idea, 
because I have the feeling that the current implementation of 
GIT_TEST_INSTALLED relies a bit too much on build products in the source tree. 
Or at least that tests are not clearly separated from the build.

I will look at your patches and would be happy to review anything that you deem 
ready to upstream.

Cheers,

Guillaume, aka layus.


[PATCH v4 2/4] Makefile: detect compiler and enable more warnings in DEVELOPER=1

2018-04-14 Thread Ævar Arnfjörð Bjarmason
From: Nguyễn Thái Ngọc Duy 

The set of extra warnings we enable when DEVELOPER has to be
conservative because we can't assume any compiler version the
developer may use. Detect the compiler version so we know when it's
safe to enable -Wextra and maybe more.

These warning settings are mostly from my custom config.mak a long
time ago when I tried to enable as many warnings as possible that can
still build without showing warnings. Some of those warnings are
probably worth fixing instead of just suppressing in future.

Helped-by: Jeff King 
Signed-off-by: Nguyễn Thái Ngọc Duy 
---
 Makefile| 15 +-
 config.mak.dev  | 38 +++
 detect-compiler | 53 +
 3 files changed, 96 insertions(+), 10 deletions(-)
 create mode 100644 config.mak.dev
 create mode 100755 detect-compiler

diff --git a/Makefile b/Makefile
index f181687250..3038c78325 100644
--- a/Makefile
+++ b/Makefile
@@ -441,6 +441,10 @@ all::
 #
 # When cross-compiling, define HOST_CPU as the canonical name of the CPU on
 # which the built Git will run (for instance "x86_64").
+#
+# Define DEVELOPER to enable more compiler warnings. Compiler version
+# and family are auto detected, but could be overridden by defining
+# COMPILER_FEATURES (see config.mak.dev)
 
 GIT-VERSION-FILE: FORCE
@$(SHELL_PATH) ./GIT-VERSION-GEN
@@ -449,15 +453,6 @@ GIT-VERSION-FILE: FORCE
 # CFLAGS and LDFLAGS are for the users to override from the command line.
 
 CFLAGS = -g -O2 -Wall
-DEVELOPER_CFLAGS = -Werror \
-   -Wdeclaration-after-statement \
-   -Wno-format-zero-length \
-   -Wold-style-definition \
-   -Woverflow \
-   -Wpointer-arith \
-   -Wstrict-prototypes \
-   -Wunused \
-   -Wvla
 LDFLAGS =
 ALL_CFLAGS = $(CPPFLAGS) $(CFLAGS)
 ALL_LDFLAGS = $(LDFLAGS)
@@ -1062,7 +1057,7 @@ include config.mak.uname
 -include config.mak
 
 ifdef DEVELOPER
-CFLAGS += $(DEVELOPER_CFLAGS)
+include config.mak.dev
 endif
 
 comma := ,
diff --git a/config.mak.dev b/config.mak.dev
new file mode 100644
index 00..716a14ecc7
--- /dev/null
+++ b/config.mak.dev
@@ -0,0 +1,38 @@
+CFLAGS += -Werror
+CFLAGS += -Wdeclaration-after-statement
+CFLAGS += -Wno-format-zero-length
+CFLAGS += -Wold-style-definition
+CFLAGS += -Woverflow
+CFLAGS += -Wpointer-arith
+CFLAGS += -Wstrict-prototypes
+CFLAGS += -Wunused
+CFLAGS += -Wvla
+
+ifndef COMPILER_FEATURES
+COMPILER_FEATURES := $(shell ./detect-compiler $(CC))
+endif
+
+ifneq ($(filter clang4,$(COMPILER_FEATURES)),)
+CFLAGS += -Wtautological-constant-out-of-range-compare
+endif
+
+ifneq ($(or $(filter gcc6,$(COMPILER_FEATURES)),$(filter 
clang4,$(COMPILER_FEATURES))),)
+CFLAGS += -Wextra
+# if a function is public, there should be a prototype and the right
+# header file should be included. If not, it should be static.
+CFLAGS += -Wmissing-prototypes
+# These are disabled because we have these all over the place.
+CFLAGS += -Wno-empty-body
+CFLAGS += -Wno-missing-field-initializers
+CFLAGS += -Wno-sign-compare
+CFLAGS += -Wno-unused-function
+CFLAGS += -Wno-unused-parameter
+endif
+
+# uninitialized warnings on gcc 4.9.2 in xdiff/xdiffi.c and config.c
+# not worth fixing since newer compilers correctly stop complaining
+ifneq ($(filter gcc4,$(COMPILER_FEATURES)),)
+ifeq ($(filter gcc5,$(COMPILER_FEATURES)),)
+CFLAGS += -Wno-uninitialized
+endif
+endif
diff --git a/detect-compiler b/detect-compiler
new file mode 100755
index 00..70b754481c
--- /dev/null
+++ b/detect-compiler
@@ -0,0 +1,53 @@
+#!/bin/sh
+#
+# Probe the compiler for vintage, version, etc. This is used for setting
+# optional make knobs under the DEVELOPER knob.
+
+CC="$*"
+
+# we get something like (this is at least true for gcc and clang)
+#
+# FreeBSD clang version 3.4.1 (tags/RELEASE...)
+get_version_line() {
+   $CC -v 2>&1 | grep ' version '
+}
+
+get_family() {
+   get_version_line | sed 's/^\(.*\) version [0-9][^ ]* .*/\1/'
+}
+
+get_version() {
+   get_version_line | sed 's/^.* version \([0-9][^ ]*\) .*/\1/'
+}
+
+print_flags() {
+   family=$1
+   version=$(get_version | cut -f 1 -d .)
+
+   # Print a feature flag not only for the current version, but also
+   # for any prior versions we encompass. This avoids needing to do
+   # numeric comparisons in make, which are awkward.
+   while test "$version" -gt 0
+   do
+   echo $family$version
+   version=$((version - 1))
+   done
+}
+
+case "$(get_family)" in
+gcc)
+   print_flags gcc
+   ;;
+clang)
+   print_flags clang
+   ;;
+"FreeBSD clang")
+   print_flags clang
+   ;;
+"Apple LLVM")
+   print_flags clang
+   ;;
+*)
+   : unknown compiler family
+   ;;
+esac
-- 
2.17.0.290.gded63e768a



[PATCH v4 4/4] Makefile: add a DEVOPTS to get all of -Wextra

2018-04-14 Thread Ævar Arnfjörð Bjarmason
Change DEVOPTS to understand a "extra-all" option. When the DEVELOPER
flag is enabled we turn on -Wextra, but manually switch some of the
warnings it turns on off.

This is because we have many existing occurrences of them in the code
base. This mode will stop the suppression, let the developer see and
decide whether to  fix them.

This change is a slight alteration of Nguyễn Thái Ngọc Duy
EAGER_DEVELOPER mode patch[1]

1. "[PATCH v3 3/3] Makefile: add EAGER_DEVELOPER
mode" (<20180329150322.10722-4-pclo...@gmail.com>;
https://public-inbox.org/git/20180329150322.10722-4-pclo...@gmail.com/)

Signed-off-by: Ævar Arnfjörð Bjarmason 
---
 Makefile   | 6 ++
 config.mak.dev | 2 ++
 2 files changed, 8 insertions(+)

diff --git a/Makefile b/Makefile
index 9a57495fae..47ddc85aae 100644
--- a/Makefile
+++ b/Makefile
@@ -455,6 +455,12 @@ all::
 #suppresses the -Werror that implicitly comes with
 #DEVELOPER=1. Useful for getting the full set of errors
 #without immediately dying, or for logging them.
+#
+#extra-all:
+#
+#The DEVELOPER mode enables -Wextra with a few exceptions. By
+#setting this flag the exceptions are removed, and all of
+#-Wextra is used.
 
 GIT-VERSION-FILE: FORCE
@$(SHELL_PATH) ./GIT-VERSION-GEN
diff --git a/config.mak.dev b/config.mak.dev
index 7a54426d78..2d244ca470 100644
--- a/config.mak.dev
+++ b/config.mak.dev
@@ -23,6 +23,7 @@ CFLAGS += -Wextra
 # if a function is public, there should be a prototype and the right
 # header file should be included. If not, it should be static.
 CFLAGS += -Wmissing-prototypes
+ifeq ($(filter extra-all,$(DEVOPTS)),)
 # These are disabled because we have these all over the place.
 CFLAGS += -Wno-empty-body
 CFLAGS += -Wno-missing-field-initializers
@@ -30,6 +31,7 @@ CFLAGS += -Wno-sign-compare
 CFLAGS += -Wno-unused-function
 CFLAGS += -Wno-unused-parameter
 endif
+endif
 
 # uninitialized warnings on gcc 4.9.2 in xdiff/xdiffi.c and config.c
 # not worth fixing since newer compilers correctly stop complaining
-- 
2.17.0.290.gded63e768a



[PATCH v4 3/4] Makefile: add a DEVOPTS to suppress -Werror under DEVELOPER

2018-04-14 Thread Ævar Arnfjörð Bjarmason
Add a DEVOPTS variable that'll be used to tweak the behavior of
DEVELOPER.

I've long wanted to use DEVELOPER=1 in my production builds, but on
some old systems I still get warnings, and thus the build would
fail. However if the build/tests fail for some other reason, it would
still be useful to scroll up and see what the relevant code is warning
about.

This change allows for that. Now setting DEVELOPER will set -Werror as
before, but if DEVOPTS=no-error is provided is set you'll get the same
warnings, but without -Werror.

Helped-by: Nguyễn Thái Ngọc Duy 
Signed-off-by: Ævar Arnfjörð Bjarmason 
---
 Makefile   | 10 ++
 config.mak.dev |  2 ++
 2 files changed, 12 insertions(+)

diff --git a/Makefile b/Makefile
index 3038c78325..9a57495fae 100644
--- a/Makefile
+++ b/Makefile
@@ -445,6 +445,16 @@ all::
 # Define DEVELOPER to enable more compiler warnings. Compiler version
 # and family are auto detected, but could be overridden by defining
 # COMPILER_FEATURES (see config.mak.dev)
+#
+# When DEVELOPER is set, DEVOPTS can be used to control compiler
+# options.  This variable contains keywords separated by
+# whitespace. The following keywords are are recognized:
+#
+#no-error:
+#
+#suppresses the -Werror that implicitly comes with
+#DEVELOPER=1. Useful for getting the full set of errors
+#without immediately dying, or for logging them.
 
 GIT-VERSION-FILE: FORCE
@$(SHELL_PATH) ./GIT-VERSION-GEN
diff --git a/config.mak.dev b/config.mak.dev
index 716a14ecc7..7a54426d78 100644
--- a/config.mak.dev
+++ b/config.mak.dev
@@ -1,4 +1,6 @@
+ifeq ($(filter no-error,$(DEVOPTS)),)
 CFLAGS += -Werror
+endif
 CFLAGS += -Wdeclaration-after-statement
 CFLAGS += -Wno-format-zero-length
 CFLAGS += -Wold-style-definition
-- 
2.17.0.290.gded63e768a



[PATCH v4 1/4] connect.c: mark die_initial_contact() NORETURN

2018-04-14 Thread Ævar Arnfjörð Bjarmason
From: Nguyễn Thái Ngọc Duy 

There is a series running in parallel with this one that adds code
like this

switch (...) {
case ...:
die_initial_contact();
case ...:

There is nothing wrong with this. There is no actual falling
through. But since gcc is not that smart and gcc 7.x introduces
-Wimplicit-fallthrough, it raises a false alarm in this case.

This class of warnings may be useful elsewhere, so instead of
suppressing the whole class, let's try to fix just this code. gcc is
smart enough to realize that no execution can continue after a
NORETURN function call and no longer raises the warning.

Signed-off-by: Nguyễn Thái Ngọc Duy 
---
 connect.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/connect.c b/connect.c
index c3a014c5ba..49eca46462 100644
--- a/connect.c
+++ b/connect.c
@@ -46,7 +46,7 @@ int check_ref_type(const struct ref *ref, int flags)
return check_ref(ref->name, flags);
 }
 
-static void die_initial_contact(int unexpected)
+static NORETURN void die_initial_contact(int unexpected)
 {
if (unexpected)
die(_("The remote end hung up upon initial contact"));
-- 
2.17.0.290.gded63e768a



[PATCH v4 0/4] Make DEVELOPER more more flexible with DEVOPTS

2018-04-14 Thread Ævar Arnfjörð Bjarmason
This is a v4 and replacement of gitster/nd/warn-more-for-devs. I'm
sending this with Duy's blessing.

The first two patches are the same, except for one trivial
s/faimily/family/ typo fix.

The third patch in gitster/nd/warn-more-for-devs ("Makefile: add
EAGER_DEVELOPER mode") is gone, instead there's now a DEVOPTS
option. The 3/4 and 4/4 add a way to turn off -Werror & the -Wextra
suppression, respectively.

Duy was right in [1] that this is a much better and extensible way of
doing this than my "Makefile: untangle DEVELOPER and -Werror" patch.

Most of 3/4 & 4/4 are just tweaked from g...@github.com:pclouds/git.git
pclouds/more-warnings and combined with my previous 4/3 patch[2].

I changed the "no-suppression" name in Duy's WIP patch to "extra-all",
and "gentle" to "no-error". I think those are cleare,r and leave
things more open to future expansion, e.g. if we'd like pedantic-all.

1. 
https://public-inbox.org/git/cacsjy8cyb0igy365nmkswsgai9_rf+xbomqyj7xw6iqxqic...@mail.gmail.com/
2. https://public-inbox.org/git/20180331164009.2264-1-ava...@gmail.com/

Nguyễn Thái Ngọc Duy (2):
  connect.c: mark die_initial_contact() NORETURN
  Makefile: detect compiler and enable more warnings in DEVELOPER=1

Ævar Arnfjörð Bjarmason (2):
  Makefile: add a DEVOPTS to suppress -Werror under DEVELOPER
  Makefile: add a DEVOPTS to get all of -Wextra

 Makefile| 31 +++--
 config.mak.dev  | 42 +++
 connect.c   |  2 +-
 detect-compiler | 53 +
 4 files changed, 117 insertions(+), 11 deletions(-)
 create mode 100644 config.mak.dev
 create mode 100755 detect-compiler

-- 
2.17.0.290.gded63e768a



Re: [PATCH v8 09/14] commit-graph: add core.commitGraph setting

2018-04-14 Thread Jakub Narebski
Derrick Stolee  writes:

> From: Derrick Stolee 
>
> The commit graph feature is controlled by the new core.commitGraph config
> setting. This defaults to 0, so the feature is opt-in.
>
> The intention of core.commitGraph is that a user can always stop checking
> for or parsing commit graph files if core.commitGraph=0.

This is bool-valued setting, so the commit message should talk about
'true' and 'false', not '1' or '0'.

> Signed-off-by: Derrick Stolee 
> ---
>  Documentation/config.txt | 4 
>  cache.h  | 1 +
>  config.c | 5 +
>  environment.c| 1 +
>  4 files changed, 11 insertions(+)
>
> diff --git a/Documentation/config.txt b/Documentation/config.txt
> index 4e0cff87f6..e5c7013fb0 100644
> --- a/Documentation/config.txt
> +++ b/Documentation/config.txt
> @@ -898,6 +898,10 @@ core.notesRef::
>  This setting defaults to "refs/notes/commits", and it can be overridden by
>  the `GIT_NOTES_REF` environment variable.  See linkgit:git-notes[1].
>  
> +core.commitGraph::
> + Enable git commit graph feature. Allows reading from the
> + commit-graph file.
> +

This is a very minimal description of this config variable.  In my
opionion it lack two things:

1. The reference to the documentation where one can read more, for
   example linkgit:git-commit-graph[1] manpage, like in the description
   of core.sparseCheckout feature described below.

2. The information about restrictions, for example something like the
   following:

  "The feature does not work for shallow clones, neither when
  `git-replace` or grafts file are used."

   Perhaps even with references to each commit-graph disabling feature.

When the feature matures, and it will be on by default (well, the
reading part at least), it would probably acquire wording similar to the
one for `pack.useBitmaps` config option[1], isn't it?

[1]: https://git-scm.com/docs/git-config#git-config-packuseBitmaps

>  core.sparseCheckout::
>   Enable "sparse checkout" feature. See section "Sparse checkout" in
>   linkgit:git-read-tree[1] for more information.
> diff --git a/cache.h b/cache.h
> index a61b2d3f0d..8bdbcbbbf7 100644
> --- a/cache.h
> +++ b/cache.h
> @@ -805,6 +805,7 @@ extern char *git_replace_ref_base;
>  
>  extern int fsync_object_files;
>  extern int core_preload_index;
> +extern int core_commit_graph;
>  extern int core_apply_sparse_checkout;
>  extern int precomposed_unicode;
>  extern int protect_hfs;
> diff --git a/config.c b/config.c
> index b0c20e6cb8..25ee4a676c 100644
> --- a/config.c
> +++ b/config.c
> @@ -1226,6 +1226,11 @@ static int git_default_core_config(const char *var, 
> const char *value)
>   return 0;
>   }
>  
> + if (!strcmp(var, "core.commitgraph")) {
> + core_commit_graph = git_config_bool(var, value);
> + return 0;
> + }
> +
>   if (!strcmp(var, "core.sparsecheckout")) {
>   core_apply_sparse_checkout = git_config_bool(var, value);
>   return 0;
> diff --git a/environment.c b/environment.c
> index d6dd64662c..8853e2f0dd 100644
> --- a/environment.c
> +++ b/environment.c
> @@ -62,6 +62,7 @@ enum push_default_type push_default = 
> PUSH_DEFAULT_UNSPECIFIED;
>  enum object_creation_mode object_creation_mode = OBJECT_CREATION_MODE;
>  char *notes_ref_name;
>  int grafts_replace_parents = 1;
> +int core_commit_graph;
>  int core_apply_sparse_checkout;
>  int merge_log_config = -1;
>  int precomposed_unicode = -1; /* see probe_utf8_pathname_composition() */

So this is just a config variable handling.  Nicely short patch to
review; the code part looks good to me.

-- 
Jakub Narębski


Re: [PATCH 0/6] Compute and consume generation numbers

2018-04-14 Thread Jakub Narebski
Derrick Stolee  writes:
> On 4/11/2018 3:32 PM, Jakub Narebski wrote:

>> What would you suggest as a good test that could imply performance? The
>> Google Colab notebook linked to above includes a function to count
>> number of commits (nodes / vertices in the commit graph) walked,
>> currently in the worst case scenario.
>
> The two main questions to consider are:
>
> 1. Can X reach Y?

That is easy to do.  The function generic_is_reachable() does
that... though using direct translation of the pseudocode for
"Algorithm 3: Reachable" from FELINE paper, which is recursive and
doesn't check if vertex was already visited was not good idea for large
graphs such as Linux kernel commit graph, oops.  That is why
generic_is_reachable_large() was created.

> 2. What is the set of merge-bases between X and Y?

I don't have an algorithm for that in the Google Colaboratory notebook.
Though I see that there exist algorithms for calculating lowest common
ancestors in DAGs...

I'll have to take a look how Git does that.

>
> And the thing to measure is a commit count. If possible, it would be
> good to count commits walked (commits whose parent list is enumerated)
> and commits inspected (commits that were listed as a parent of some
> walked commit). Walked commits require a commit parse -- albeit from
> the commit-graph instead of the ODB now -- while inspected commits
> only check the in-memory cache.

I don't quite see the distinction.  Whether we access generation number
of a commit (information about level of vertex in graph), or a parent
list (vertex successors / neighbours), it both needs accessing
commit-graph; well, accessing parents may be more costly for octopus
merges (due to having to go through EDGE chunk).

I can easily return the set of visited commits (vertices), or just size
of said set.

>
> For git.git and Linux, I like to use the release tags as tests. They
> provide a realistic view of the linear history, and maintenance
> releases have their own history from the major releases.

Hmmm... testing for v4.9-rc5..v4.9 in Linux kernel commit graphs, the
FELINE index does not bring any improvements over using just level
(generation number) filter.  But that may be caused by narrowing od
commit DAG around releases.

I try do do the same between commits in wide part, with many commits
with the same level (same generation number) both for source and for
target commit.  Though this may be unfair to level filter, though...


Note however that FELINE index is not unabiguous, like generation
numbers are (modulo decision whether to start at 0 or at 1); it depends
on the topological ordering chosen for the X elements.

>> I have tried finding number of false positives for level (generation
>> number) filter and for FELINE index, and number of false negatives for
>> min-post intervals in the spanning tree (for DFS tree) for 1
>> randomly selected pairs of commits... but I don't think this is a good
>> benchmark.
>
> What is a false-positive? A case where gen(X) < gen(Y) but Y cannot
> reach X?

Yes.  (And equivalent for FELINE index, which is a pair of integers).

> I do not think that is a great benchmark, but I guess it is
> something to measure.

I have simply used it to have something to compare.

>> I Linux kernel sources 
>> (https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git)
>> that has 750832 nodes and 811733 edges, and 563747941392 possible
>> directed pairs, we have for 1 randomly selected pairs of commits:
>>
>>level-filter has91 =  0.91% [all] false positives
>>FELINE index has78 =  0.78% [all] false positives
>>FELINE index has 1.16667 less false positives than level filter
>>
>>min-post spanning-tree intervals has  3641 = 36.41% [all] false
>>negatives
>
> Perhaps something you can do instead of sampling from N^2 commits in
> total is to select a pair of generations (say, G = 2, G' = 20100)
> or regions of generations ( 2 <= G <= 20050, 20100 <= G' <= 20150)
> and see how many false positives you see by testing all pairs (one
> from each level). The delta between the generations may need to be
> smaller to actually have a large proportion of unreachable pairs. Try
> different levels, since major version releases tend to "pinch" the
> commit graph to a common history.

That's a good idea.

>> For git.git repository (https://github.com/git/git.git) that has 52950
>> nodes and 65887 edges the numbers are slighly more in FELINE index
>> favor (also out of 1 random pairs):
>>
>>level-filter has   504 =  9.11% false positives
>>FELINE index has   125 =  2.26% false positives
>>FELINE index has 4.032 less false positives than level filter
>>
>> This is for FELINE which does not use level / generatio-numbers filter.

Best,
-- 
Jakub Narębski


Re: Need help debugging issue in git

2018-04-14 Thread Robert Dailey
On Mon, Apr 2, 2018 at 1:53 AM, Johannes Sixt  wrote:
> Am 02.04.2018 um 02:36 schrieb Robert Dailey:
>>
>> I'm struggling with a bug that I found introduced in git v2.13.2. The
>> bug was not reproducible in v2.13.1. The issue is that using arguments
>> like "@{-1}" to aliases causes those curly braces to be removed, so
>> once the command is executed after alias processing the argument looks
>> like "@-1". This breaks any aliases you have that wrap `git log` and
>> such. I originally opened the bug on the Git for Windows project
>> (since I use Git mostly on Windows):
>>
>> https://github.com/git-for-windows/git/issues/1220
>
> ...
>>
>> Here is the alias being used for a test:
>>
>> [alias]
>>  lgtest = !git log --oneline \"$@\"
>>
>> And here is the command I invoke for the test:
>>
>> $ git lgtest @{-1}
>>
>> I should get logs for the previously-checked-out branch.
>>
>> When `prepare_shell_cmd()` is called in run-command.c, it gets expanded
>> like so:
>>
>> + [0] "sh" const char *
>> + [1] "-c" const char *
>> + [2] "git log --oneline \"$@\" \"$@\"" const char *
>> + [3] "git log --oneline \"$@\"" const char *
>> + [4] "@{-1}" const char *
>>
>> With my modifications (again, patch inline below) I get this:
>>
>> + [0] "sh" const char *
>> + [1] "-c" const char *
>> + [2] "git log --oneline \"$@\"" const char *
>> + [3] "@{-1}" const char *
>>
>> The second version looks much better.
>
>
> But this is wrong. Try this on the command line:
>
>   sh -c 'echo "$@"' a b c
>
> Notice how this prints only 'b c', not 'a b c'. The reason is that the
> argument 'a' is treated like a "script" name, i.e. what you get for "$0",
> and 'b' and 'c' as the actual arguments to the "script".
>
> That is, you must fill in some dummy "script" name at slot [3], and
> run_command chooses to put the alias text there.
>
>> I think the constant nesting of
>> commands inside each other that the first version does is somehow
>> causing curly braces to be removed. I don't understand enough about
>> shell processing to know why it would do this.
>
>
> Some shells expand the curly braces. They must get lost somewhere by one of
> the two shell invocations that happen on the way.
>
> BTW, you don't happen to have a file named '@-1' in your directory, most
> likely by accident?

Thanks for your help. I checked for @-1 but I do not have a file with
that name (good catch though). I contacted the MinGW mailing list and
they seem to indicate that {-1} is a valid brace expansion. I was able
to verify the git.exe code itself is not causing this problem. It
seems to be GNU bash doing it. But oddly enough, the Ubuntu version of
Bash for example does not process {-1} as a brace expansion. It seems
weird to me that Git uses a syntax for a portion of its revision
specification that could be ambiguously treated as syntax processed by
Bash. In other words, I feel like this would have been designed into
Git years ago, so I'm not sure why this is a problem now all of a
sudden.

Their suggested solution was to start quoting items in the list or
escaping the braces, but that will make git revisions less intuitive
to use and make commands more tedious to type. I am still discussing
things over there but for the purposes of the Git mailing list, I
think it's clear at this point this is not an issue with Git itself.
Having said that, thanks again for the help!!


Re: [PATCH v1 0/2] fsexcludes: Add programmatic way to exclude files

2018-04-14 Thread Duy Nguyen
On Tue, Apr 10, 2018 at 11:04 PM, Ben Peart  wrote:
> In git repos with large working directories an external file system monitor
> (like fsmonitor or gvfs) can track what files in the working directory have 
> been
> modified.  This information can be used to speed up git operations that scale
> based on the size of the working directory so that they become O(# of modified
> files) vs O(# of files in the working directory).
>
> The fsmonitor patch series added logic to limit what files git had to stat() 
> to
> the set of modified files provided by the fsmonitor hook proc.  It also used 
> the
> untracked cache (if enabled) to limit the files/folders git had to scan 
> looking
> for new/untracked files.  GVFS is another external file system model that also
> speeds up git working directory based operations that has been using a 
> different
> mechanism (programmatically generating an excludes file) to enable git to be
> O(# of modified files).
>
> This patch series will introduce a new way to limit git�s traversal of the
> working directory that does not require the untracked cache (fsmonitor) or 
> using
> the excludes feature (GVFS).  It does this by enhancing the existing excludes
> logic in dir.c to support a new �File System Excludes� or fsexcludes API that 
> is
> better tuned to these programmatic applications.

I have not had a chance to really look at the patches yet but I think
these three paragraphs should somehow be included in the commit
description of 1/2 (or spread out between 1/2 and 2/2). 1/2
description for example briefly talks about how to use the new thing,
but not really tell what it's for, why you need to add it.
-- 
Duy


Re: [PATCH/RFC 0/5] Keep all info in command-list.txt in git binary

2018-04-14 Thread Duy Nguyen
On Thu, Apr 12, 2018 at 12:06 AM, Philip Oakley  wrote:
> I'm only just catching up, but does/can this series also capture the
> non-command guides that are available in git so that the 'git help -g' can
> begin to list them all?

It currently does not. But I don't see why it should not. This should
allow git.txt to list all the guides too, for people who skip "git
help" and go hard core mode with "man git". Thanks for bringing this
up.
-- 
Duy


[PATCH 06/15] pack-objects: move in_pack_pos out of struct object_entry

2018-04-14 Thread Nguyễn Thái Ngọc Duy
This field is only need for pack-bitmap, which is an optional
feature. Move it to a separate array that is only allocated when
pack-bitmap is used (like objects[], it is not freed, since we need it
until the end of the process)

Signed-off-by: Nguyễn Thái Ngọc Duy 
---
 builtin/pack-objects.c |  3 ++-
 pack-bitmap-write.c|  8 +---
 pack-bitmap.c  |  2 +-
 pack-bitmap.h  |  4 +++-
 pack-objects.h | 16 +++-
 5 files changed, 26 insertions(+), 7 deletions(-)

diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index b231e80f17..d9c89e87cd 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -879,7 +879,8 @@ static void write_pack_file(void)
 
if (write_bitmap_index) {
bitmap_writer_set_checksum(oid.hash);
-   bitmap_writer_build_type_index(written_list, 
nr_written);
+   bitmap_writer_build_type_index(
+   _pack, written_list, nr_written);
}
 
finish_tmp_packfile(, pack_tmp_name,
diff --git a/pack-bitmap-write.c b/pack-bitmap-write.c
index 2df7b3e144..d707fc9ea2 100644
--- a/pack-bitmap-write.c
+++ b/pack-bitmap-write.c
@@ -48,7 +48,8 @@ void bitmap_writer_show_progress(int show)
 /**
  * Build the initial type index for the packfile
  */
-void bitmap_writer_build_type_index(struct pack_idx_entry **index,
+void bitmap_writer_build_type_index(struct packing_data *to_pack,
+   struct pack_idx_entry **index,
uint32_t index_nr)
 {
uint32_t i;
@@ -57,12 +58,13 @@ void bitmap_writer_build_type_index(struct pack_idx_entry 
**index,
writer.trees = ewah_new();
writer.blobs = ewah_new();
writer.tags = ewah_new();
+   ALLOC_ARRAY(to_pack->in_pack_pos, to_pack->nr_objects);
 
for (i = 0; i < index_nr; ++i) {
struct object_entry *entry = (struct object_entry *)index[i];
enum object_type real_type;
 
-   entry->in_pack_pos = i;
+   oe_set_in_pack_pos(to_pack, entry, i);
 
switch (oe_type(entry)) {
case OBJ_COMMIT:
@@ -146,7 +148,7 @@ static uint32_t find_object_pos(const unsigned char *sha1)
"(object %s is missing)", sha1_to_hex(sha1));
}
 
-   return entry->in_pack_pos;
+   return oe_in_pack_pos(writer.to_pack, entry);
 }
 
 static void show_object(struct object *object, const char *name, void *data)
diff --git a/pack-bitmap.c b/pack-bitmap.c
index 3f2dab340f..c9e90d1bb5 100644
--- a/pack-bitmap.c
+++ b/pack-bitmap.c
@@ -1033,7 +1033,7 @@ int rebuild_existing_bitmaps(struct packing_data *mapping,
oe = packlist_find(mapping, sha1, NULL);
 
if (oe)
-   reposition[i] = oe->in_pack_pos + 1;
+   reposition[i] = oe_in_pack_pos(mapping, oe) + 1;
}
 
rebuild = bitmap_new();
diff --git a/pack-bitmap.h b/pack-bitmap.h
index 3742a00e14..5ded2f139a 100644
--- a/pack-bitmap.h
+++ b/pack-bitmap.h
@@ -44,7 +44,9 @@ int rebuild_existing_bitmaps(struct packing_data *mapping, 
khash_sha1 *reused_bi
 
 void bitmap_writer_show_progress(int show);
 void bitmap_writer_set_checksum(unsigned char *sha1);
-void bitmap_writer_build_type_index(struct pack_idx_entry **index, uint32_t 
index_nr);
+void bitmap_writer_build_type_index(struct packing_data *to_pack,
+   struct pack_idx_entry **index,
+   uint32_t index_nr);
 void bitmap_writer_reuse_bitmaps(struct packing_data *to_pack);
 void bitmap_writer_select_commits(struct commit **indexed_commits,
unsigned int indexed_commits_nr, int max_bitmaps);
diff --git a/pack-objects.h b/pack-objects.h
index cdce1648de..71ea992c3c 100644
--- a/pack-objects.h
+++ b/pack-objects.h
@@ -79,7 +79,6 @@ struct object_entry {
unsigned in_pack_type:TYPE_BITS; /* could be delta */
unsigned type_valid:1;
uint32_t hash;  /* name hint hash */
-   unsigned int in_pack_pos;
unsigned char in_pack_header_size;
unsigned preferred_base:1; /*
* we do not pack this, but is available
@@ -99,6 +98,8 @@ struct packing_data {
 
int32_t *index;
uint32_t index_size;
+
+   unsigned int *in_pack_pos;
 };
 
 struct object_entry *packlist_alloc(struct packing_data *pdata,
@@ -144,4 +145,17 @@ static inline void oe_set_type(struct object_entry *e,
e->type_ = (unsigned)type;
 }
 
+static inline unsigned int oe_in_pack_pos(const struct packing_data *pack,
+ const struct object_entry *e)
+{
+   return pack->in_pack_pos[e - pack->objects];
+}
+
+static inline void oe_set_in_pack_pos(const struct 

[PATCH 11/15] pack-objects: clarify the use of object_entry::size

2018-04-14 Thread Nguyễn Thái Ngọc Duy
While this field most of the time contains the canonical object size,
there is one case it does not: when we have found that the base object
of the delta in question is also to be packed, we will very happily
reuse the delta by copying it over instead of regenerating the new
delta.

"size" in this case will record the delta size, not canonical object
size. Later on in write_reuse_object(), we reconstruct the delta
header and "size" is used for this purpose. When this happens, the
"type" field contains a delta type instead of a canonical type.
Highlight this in the code since it could be tricky to see.

Signed-off-by: Nguyễn Thái Ngọc Duy 
---
 builtin/pack-objects.c | 11 ---
 pack-objects.h |  4 +++-
 2 files changed, 11 insertions(+), 4 deletions(-)

diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index e75693176e..779f14a45e 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -1418,6 +1418,7 @@ static void check_object(struct object_entry *entry)
off_t ofs;
unsigned char *buf, c;
enum object_type type;
+   unsigned long in_pack_size;
 
buf = use_pack(p, _curs, entry->in_pack_offset, );
 
@@ -1427,7 +1428,7 @@ static void check_object(struct object_entry *entry)
 */
used = unpack_object_header_buffer(buf, avail,
   ,
-  >size);
+  _pack_size);
if (used == 0)
goto give_up;
 
@@ -1444,6 +1445,7 @@ static void check_object(struct object_entry *entry)
default:
/* Not a delta hence we've already got all we need. */
oe_set_type(entry, entry->in_pack_type);
+   entry->size = in_pack_size;
entry->in_pack_header_size = used;
if (oe_type(entry) < OBJ_COMMIT || oe_type(entry) > 
OBJ_BLOB)
goto give_up;
@@ -1500,6 +1502,7 @@ static void check_object(struct object_entry *entry)
 * circular deltas.
 */
oe_set_type(entry, entry->in_pack_type);
+   entry->size = in_pack_size; /* delta size */
SET_DELTA(entry, base_entry);
entry->delta_size = entry->size;
entry->delta_sibling_idx = base_entry->delta_child_idx;
@@ -1509,13 +1512,15 @@ static void check_object(struct object_entry *entry)
}
 
if (oe_type(entry)) {
+   off_t delta_pos;
+
/*
 * This must be a delta and we already know what the
 * final object type is.  Let's extract the actual
 * object size from the delta header.
 */
-   entry->size = get_size_from_delta(p, _curs,
-   entry->in_pack_offset + 
entry->in_pack_header_size);
+   delta_pos = entry->in_pack_offset + 
entry->in_pack_header_size;
+   entry->size = get_size_from_delta(p, _curs, 
delta_pos);
if (entry->size == 0)
goto give_up;
unuse_pack(_curs);
diff --git a/pack-objects.h b/pack-objects.h
index 9d0391c173..e4ea6a350c 100644
--- a/pack-objects.h
+++ b/pack-objects.h
@@ -32,7 +32,9 @@ enum dfs_state {
  *
  * "size" is the uncompressed object size. Compressed size of the raw
  * data for an object in a pack is not stored anywhere but is computed
- * and made available when reverse .idx is made.
+ * and made available when reverse .idx is made. Note that when a
+ * delta is reused, "size" is the uncompressed _delta_ size, not the
+ * canonical one after the delta has been applied.
  *
  * "hash" contains a path name hash which is used for sorting the
  * delta list and also during delta searching. Once prepare_pack()
-- 
2.17.0.367.g5dd2e386c3



[PATCH 12/15] pack-objects: shrink size field in struct object_entry

2018-04-14 Thread Nguyễn Thái Ngọc Duy
It's very very rare that an uncompressed object is larger than 4GB
(partly because Git does not handle those large files very well to
begin with). Let's optimize it for the common case where object size
is smaller than this limit.

Shrink size field down to 31 bits and one overflow bit. If the size is
too large, we read it back from disk. As noted in the previous patch,
we need to return the delta size instead of canonical size when the
to-be-reused object entry type is a delta instead of a canonical one.

Add two compare helpers that can take advantage of the overflow
bit (e.g. if the file is 4GB+, chances are it's already larger than
core.bigFileThreshold and there's no point in comparing the actual
value).

Another note about oe_get_size_slow(). This function MUST be thread
safe because SIZE() macro is used inside try_delta() which may run in
parallel. Outside parallel code, no-contention locking should be dirt
cheap (or insignificant compared to i/o access anyway). To exercise
this code, it's best to run the test suite with something like

make test GIT_TEST_OE_SIZE=4

which forces this code on all objects larger than 3 bytes.

Signed-off-by: Nguyễn Thái Ngọc Duy 
---
 builtin/pack-objects.c | 105 +++--
 pack-objects.c |   3 ++
 pack-objects.h |  57 +-
 t/README   |   6 +++
 4 files changed, 145 insertions(+), 26 deletions(-)

diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 779f14a45e..cccd0f8040 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -32,6 +32,8 @@
 #include "object-store.h"
 
 #define IN_PACK(obj) oe_in_pack(_pack, obj)
+#define SIZE(obj) oe_size(_pack, obj)
+#define SET_SIZE(obj,size) oe_set_size(_pack, obj, size)
 #define DELTA(obj) oe_delta(_pack, obj)
 #define DELTA_CHILD(obj) oe_delta_child(_pack, obj)
 #define DELTA_SIBLING(obj) oe_delta_sibling(_pack, obj)
@@ -276,7 +278,7 @@ static unsigned long write_no_reuse_object(struct hashfile 
*f, struct object_ent
 
if (!usable_delta) {
if (oe_type(entry) == OBJ_BLOB &&
-   entry->size > big_file_threshold &&
+   oe_size_greater_than(_pack, entry, big_file_threshold) &&
(st = open_istream(>idx.oid, , , NULL)) != 
NULL)
buf = NULL;
else {
@@ -385,12 +387,13 @@ static off_t write_reuse_object(struct hashfile *f, 
struct object_entry *entry,
unsigned char header[MAX_PACK_OBJECT_HEADER],
  dheader[MAX_PACK_OBJECT_HEADER];
unsigned hdrlen;
+   unsigned long entry_size = SIZE(entry);
 
if (DELTA(entry))
type = (allow_ofs_delta && DELTA(entry)->idx.offset) ?
OBJ_OFS_DELTA : OBJ_REF_DELTA;
hdrlen = encode_in_pack_object_header(header, sizeof(header),
- type, entry->size);
+ type, entry_size);
 
offset = entry->in_pack_offset;
revidx = find_pack_revindex(p, offset);
@@ -407,7 +410,7 @@ static off_t write_reuse_object(struct hashfile *f, struct 
object_entry *entry,
datalen -= entry->in_pack_header_size;
 
if (!pack_to_stdout && p->index_version == 1 &&
-   check_pack_inflate(p, _curs, offset, datalen, entry->size)) {
+   check_pack_inflate(p, _curs, offset, datalen, entry_size)) {
error("corrupt packed object for %s",
  oid_to_hex(>idx.oid));
unuse_pack(_curs);
@@ -1408,6 +1411,8 @@ static void cleanup_preferred_base(void)
 
 static void check_object(struct object_entry *entry)
 {
+   unsigned long canonical_size;
+
if (IN_PACK(entry)) {
struct packed_git *p = IN_PACK(entry);
struct pack_window *w_curs = NULL;
@@ -1445,7 +1450,7 @@ static void check_object(struct object_entry *entry)
default:
/* Not a delta hence we've already got all we need. */
oe_set_type(entry, entry->in_pack_type);
-   entry->size = in_pack_size;
+   SET_SIZE(entry, in_pack_size);
entry->in_pack_header_size = used;
if (oe_type(entry) < OBJ_COMMIT || oe_type(entry) > 
OBJ_BLOB)
goto give_up;
@@ -1502,9 +1507,9 @@ static void check_object(struct object_entry *entry)
 * circular deltas.
 */
oe_set_type(entry, entry->in_pack_type);
-   entry->size = in_pack_size; /* delta size */
+   SET_SIZE(entry, in_pack_size); /* delta size */
SET_DELTA(entry, base_entry);
-   entry->delta_size = entry->size;
+   entry->delta_size = in_pack_size;
  

[PATCH 03/15] pack-objects: turn type and in_pack_type to bitfields

2018-04-14 Thread Nguyễn Thái Ngọc Duy
An extra field type_valid is added to carry the equivalent of OBJ_BAD
in the original "type" field. in_pack_type always contains a valid
type so we only need 3 bits for it.

A note about accepting OBJ_NONE as "valid" type. The function
read_object_list_from_stdin() can pass this value [1] and it
eventually calls create_object_entry() where current code skip setting
"type" field if the incoming type is zero. This does not have any bad
side effects because "type" field should be memset()'d anyway.

But since we also need to set type_valid now, skipping oe_set_type()
leaves type_valid zero/false, which will make oe_type() return
OBJ_BAD, not OBJ_NONE anymore. Apparently we do care about OBJ_NONE in
prepare_pack(). This switch from OBJ_NONE to OBJ_BAD may trigger

fatal: unable to get type of object ...

Accepting OBJ_NONE [2] does sound wrong, but this is how it is has
been for a very long time and I haven't time to dig in further.

[1] See 5c49c11686 (pack-objects: better check_object() performances -
2007-04-16)

[2] 21666f1aae (convert object type handling from a string to a number
- 2007-02-26)

Signed-off-by: Nguyễn Thái Ngọc Duy 
---
 builtin/pack-objects.c | 59 +-
 cache.h|  2 ++
 object.h   |  1 -
 pack-bitmap-write.c|  6 ++---
 pack-objects.h | 20 --
 5 files changed, 58 insertions(+), 30 deletions(-)

diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 4bdae5a1d8..88877f1f59 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -266,7 +266,7 @@ static unsigned long write_no_reuse_object(struct hashfile 
*f, struct object_ent
struct git_istream *st = NULL;
 
if (!usable_delta) {
-   if (entry->type == OBJ_BLOB &&
+   if (oe_type(entry) == OBJ_BLOB &&
entry->size > big_file_threshold &&
(st = open_istream(>idx.oid, , , NULL)) != 
NULL)
buf = NULL;
@@ -371,7 +371,7 @@ static off_t write_reuse_object(struct hashfile *f, struct 
object_entry *entry,
struct pack_window *w_curs = NULL;
struct revindex_entry *revidx;
off_t offset;
-   enum object_type type = entry->type;
+   enum object_type type = oe_type(entry);
off_t datalen;
unsigned char header[MAX_PACK_OBJECT_HEADER],
  dheader[MAX_PACK_OBJECT_HEADER];
@@ -480,11 +480,12 @@ static off_t write_object(struct hashfile *f,
to_reuse = 0;   /* explicit */
else if (!entry->in_pack)
to_reuse = 0;   /* can't reuse what we don't have */
-   else if (entry->type == OBJ_REF_DELTA || entry->type == OBJ_OFS_DELTA)
+   else if (oe_type(entry) == OBJ_REF_DELTA ||
+oe_type(entry) == OBJ_OFS_DELTA)
/* check_object() decided it for us ... */
to_reuse = usable_delta;
/* ... but pack split may override that */
-   else if (entry->type != entry->in_pack_type)
+   else if (oe_type(entry) != entry->in_pack_type)
to_reuse = 0;   /* pack has delta which is unusable */
else if (entry->delta)
to_reuse = 0;   /* we want to pack afresh */
@@ -705,8 +706,8 @@ static struct object_entry **compute_write_order(void)
 * And then all remaining commits and tags.
 */
for (i = last_untagged; i < to_pack.nr_objects; i++) {
-   if (objects[i].type != OBJ_COMMIT &&
-   objects[i].type != OBJ_TAG)
+   if (oe_type([i]) != OBJ_COMMIT &&
+   oe_type([i]) != OBJ_TAG)
continue;
add_to_write_order(wo, _end, [i]);
}
@@ -715,7 +716,7 @@ static struct object_entry **compute_write_order(void)
 * And then all the trees.
 */
for (i = last_untagged; i < to_pack.nr_objects; i++) {
-   if (objects[i].type != OBJ_TREE)
+   if (oe_type([i]) != OBJ_TREE)
continue;
add_to_write_order(wo, _end, [i]);
}
@@ -1066,8 +1067,7 @@ static void create_object_entry(const struct object_id 
*oid,
 
entry = packlist_alloc(_pack, oid->hash, index_pos);
entry->hash = hash;
-   if (type)
-   entry->type = type;
+   oe_set_type(entry, type);
if (exclude)
entry->preferred_base = 1;
else
@@ -1407,6 +1407,7 @@ static void check_object(struct object_entry *entry)
unsigned long avail;
off_t ofs;
unsigned char *buf, c;
+   enum object_type type;
 
buf = use_pack(p, _curs, entry->in_pack_offset, );
 
@@ -1415,11 +1416,15 @@ static void check_object(struct object_entry *entry)
 * since non-delta representations could still be reused.
 */
 

[PATCH 08/15] pack-objects: refer to delta objects by index instead of pointer

2018-04-14 Thread Nguyễn Thái Ngọc Duy
These delta pointers always point to elements in the objects[] array
in packing_data struct. We can only hold maximum 4G of those objects
because the array size in nr_objects is uint32_t. We could use
uint32_t indexes to address these elements instead of pointers. On
64-bit architecture (8 bytes per pointer) this would save 4 bytes per
pointer.

Convert these delta pointers to indexes. Since we need to handle NULL
pointers as well, the index is shifted by one [1].

[1] This means we can only index 2^32-2 objects even though nr_objects
could contain 2^32-1 objects. It should not be a problem in
practice because when we grow objects[], nr_alloc would probably
blow up long before nr_objects hits the wall.

Signed-off-by: Nguyễn Thái Ngọc Duy 
---
 builtin/pack-objects.c | 117 ++---
 pack-objects.h |  67 +--
 2 files changed, 125 insertions(+), 59 deletions(-)

diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 2784d58ec2..ec02641d2e 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -32,6 +32,12 @@
 #include "object-store.h"
 
 #define IN_PACK(obj) oe_in_pack(_pack, obj)
+#define DELTA(obj) oe_delta(_pack, obj)
+#define DELTA_CHILD(obj) oe_delta_child(_pack, obj)
+#define DELTA_SIBLING(obj) oe_delta_sibling(_pack, obj)
+#define SET_DELTA(obj, val) oe_set_delta(_pack, obj, val)
+#define SET_DELTA_CHILD(obj, val) oe_set_delta_child(_pack, obj, val)
+#define SET_DELTA_SIBLING(obj, val) oe_set_delta_sibling(_pack, obj, val)
 
 static const char *pack_usage[] = {
N_("git pack-objects --stdout [...] [<  | < 
]"),
@@ -129,10 +135,11 @@ static void *get_delta(struct object_entry *entry)
buf = read_object_file(>idx.oid, , );
if (!buf)
die("unable to read %s", oid_to_hex(>idx.oid));
-   base_buf = read_object_file(>delta->idx.oid, , _size);
+   base_buf = read_object_file((entry)->idx.oid, ,
+   _size);
if (!base_buf)
die("unable to read %s",
-   oid_to_hex(>delta->idx.oid));
+   oid_to_hex((entry)->idx.oid));
delta_buf = diff_delta(base_buf, base_size,
   buf, size, _size, 0);
if (!delta_buf || delta_size != entry->delta_size)
@@ -288,12 +295,12 @@ static unsigned long write_no_reuse_object(struct 
hashfile *f, struct object_ent
size = entry->delta_size;
buf = entry->delta_data;
entry->delta_data = NULL;
-   type = (allow_ofs_delta && entry->delta->idx.offset) ?
+   type = (allow_ofs_delta && DELTA(entry)->idx.offset) ?
OBJ_OFS_DELTA : OBJ_REF_DELTA;
} else {
buf = get_delta(entry);
size = entry->delta_size;
-   type = (allow_ofs_delta && entry->delta->idx.offset) ?
+   type = (allow_ofs_delta && DELTA(entry)->idx.offset) ?
OBJ_OFS_DELTA : OBJ_REF_DELTA;
}
 
@@ -317,7 +324,7 @@ static unsigned long write_no_reuse_object(struct hashfile 
*f, struct object_ent
 * encoding of the relative offset for the delta
 * base from this object's position in the pack.
 */
-   off_t ofs = entry->idx.offset - entry->delta->idx.offset;
+   off_t ofs = entry->idx.offset - DELTA(entry)->idx.offset;
unsigned pos = sizeof(dheader) - 1;
dheader[pos] = ofs & 127;
while (ofs >>= 7)
@@ -343,7 +350,7 @@ static unsigned long write_no_reuse_object(struct hashfile 
*f, struct object_ent
return 0;
}
hashwrite(f, header, hdrlen);
-   hashwrite(f, entry->delta->idx.oid.hash, 20);
+   hashwrite(f, DELTA(entry)->idx.oid.hash, 20);
hdrlen += 20;
} else {
if (limit && hdrlen + datalen + 20 >= limit) {
@@ -379,8 +386,8 @@ static off_t write_reuse_object(struct hashfile *f, struct 
object_entry *entry,
  dheader[MAX_PACK_OBJECT_HEADER];
unsigned hdrlen;
 
-   if (entry->delta)
-   type = (allow_ofs_delta && entry->delta->idx.offset) ?
+   if (DELTA(entry))
+   type = (allow_ofs_delta && DELTA(entry)->idx.offset) ?
OBJ_OFS_DELTA : OBJ_REF_DELTA;
hdrlen = encode_in_pack_object_header(header, sizeof(header),
  type, entry->size);
@@ -408,7 +415,7 @@ static off_t write_reuse_object(struct hashfile *f, struct 
object_entry *entry,
}
 
if (type == OBJ_OFS_DELTA) {
-   off_t ofs = entry->idx.offset - entry->delta->idx.offset;
+   off_t ofs = entry->idx.offset - DELTA(entry)->idx.offset;
unsigned pos = sizeof(dheader) - 1;

[PATCH 07/15] pack-objects: move in_pack out of struct object_entry

2018-04-14 Thread Nguyễn Thái Ngọc Duy
Instead of using 8 bytes (on 64 bit arch) to store a pointer to a
pack. Use an index instead since the number of packs should be
relatively small.

This limits the number of packs we can handle to 1k. Since we can't be
sure people can never run into the situation where they have more than
1k pack files. Provide a fall back route for it.

If we find out they have too many packs, the new in_pack_by_idx[]
array (which has at most 1k elements) will not be used. Instead we
allocate in_pack[] array that holds nr_objects elements. This is
similar to how the optional in_pack_pos field is handled.

The new simple test is just to make sure the too-many-packs code path
is at least executed. The true test is running

make test GIT_TEST_FULL_IN_PACK_ARRAY=1

to take advantage of other special case tests.

Signed-off-by: Nguyễn Thái Ngọc Duy 
---
 builtin/pack-objects.c | 28 +++---
 object-store.h |  1 +
 pack-objects.c | 65 ++
 pack-objects.h | 38 +++-
 t/README   |  5 
 t/t5300-pack-object.sh |  5 
 6 files changed, 130 insertions(+), 12 deletions(-)

diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index d9c89e87cd..2784d58ec2 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -31,6 +31,8 @@
 #include "packfile.h"
 #include "object-store.h"
 
+#define IN_PACK(obj) oe_in_pack(_pack, obj)
+
 static const char *pack_usage[] = {
N_("git pack-objects --stdout [...] [<  | < 
]"),
N_("git pack-objects [...]  [<  | < 
]"),
@@ -367,7 +369,7 @@ static unsigned long write_no_reuse_object(struct hashfile 
*f, struct object_ent
 static off_t write_reuse_object(struct hashfile *f, struct object_entry *entry,
unsigned long limit, int usable_delta)
 {
-   struct packed_git *p = entry->in_pack;
+   struct packed_git *p = IN_PACK(entry);
struct pack_window *w_curs = NULL;
struct revindex_entry *revidx;
off_t offset;
@@ -478,7 +480,7 @@ static off_t write_object(struct hashfile *f,
 
if (!reuse_object)
to_reuse = 0;   /* explicit */
-   else if (!entry->in_pack)
+   else if (!IN_PACK(entry))
to_reuse = 0;   /* can't reuse what we don't have */
else if (oe_type(entry) == OBJ_REF_DELTA ||
 oe_type(entry) == OBJ_OFS_DELTA)
@@ -1074,7 +1076,7 @@ static void create_object_entry(const struct object_id 
*oid,
else
nr_result++;
if (found_pack) {
-   entry->in_pack = found_pack;
+   oe_set_in_pack(_pack, entry, found_pack);
entry->in_pack_offset = found_offset;
}
 
@@ -1399,8 +1401,8 @@ static void cleanup_preferred_base(void)
 
 static void check_object(struct object_entry *entry)
 {
-   if (entry->in_pack) {
-   struct packed_git *p = entry->in_pack;
+   if (IN_PACK(entry)) {
+   struct packed_git *p = IN_PACK(entry);
struct pack_window *w_curs = NULL;
const unsigned char *base_ref = NULL;
struct object_entry *base_entry;
@@ -1535,14 +1537,16 @@ static int pack_offset_sort(const void *_a, const void 
*_b)
 {
const struct object_entry *a = *(struct object_entry **)_a;
const struct object_entry *b = *(struct object_entry **)_b;
+   const struct packed_git *a_in_pack = IN_PACK(a);
+   const struct packed_git *b_in_pack = IN_PACK(b);
 
/* avoid filesystem trashing with loose objects */
-   if (!a->in_pack && !b->in_pack)
+   if (!a_in_pack && !b_in_pack)
return oidcmp(>idx.oid, >idx.oid);
 
-   if (a->in_pack < b->in_pack)
+   if (a_in_pack < b_in_pack)
return -1;
-   if (a->in_pack > b->in_pack)
+   if (a_in_pack > b_in_pack)
return 1;
return a->in_pack_offset < b->in_pack_offset ? -1 :
(a->in_pack_offset > b->in_pack_offset);
@@ -1578,7 +1582,7 @@ static void drop_reused_delta(struct object_entry *entry)
 
oi.sizep = >size;
oi.typep = 
-   if (packed_object_info(entry->in_pack, entry->in_pack_offset, ) < 0) 
{
+   if (packed_object_info(IN_PACK(entry), entry->in_pack_offset, ) < 0) 
{
/*
 * We failed to get the info from this pack for some reason;
 * fall back to sha1_object_info, which may find another copy.
@@ -1848,8 +1852,8 @@ static int try_delta(struct unpacked *trg, struct 
unpacked *src,
 * it, we will still save the transfer cost, as we already know
 * the other side has it and we won't send src_entry at all.
 */
-   if (reuse_delta && trg_entry->in_pack &&
-   trg_entry->in_pack == src_entry->in_pack &&
+   if (reuse_delta && IN_PACK(trg_entry) &&
+   IN_PACK(trg_entry) == IN_PACK(src_entry) &&

[PATCH 10/15] pack-objects: don't check size when the object is bad

2018-04-14 Thread Nguyễn Thái Ngọc Duy
sha1_object_info() in check_objects() may fail to locate an object in
the pack and return type OBJ_BAD. In that case, it will likely leave
the "size" field untouched. We delay error handling until later in
prepare_pack() though. Until then, do not touch "size" field.

This field should contain the default value zero, but we can't say
sha1_object_info() cannot damage it. This becomes more important later
when the object size may have to be retrieved back from the
(non-existing) pack.

Signed-off-by: Nguyễn Thái Ngọc Duy 
---
 builtin/pack-objects.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 211bb1ad0e..e75693176e 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -1742,7 +1742,7 @@ static void get_object_details(void)
for (i = 0; i < to_pack.nr_objects; i++) {
struct object_entry *entry = sorted_by_offset[i];
check_object(entry);
-   if (big_file_threshold < entry->size)
+   if (entry->type_valid && big_file_threshold < entry->size)
entry->no_try_delta = 1;
}
 
@@ -2453,7 +2453,7 @@ static void prepare_pack(int window, int depth)
 */
continue;
 
-   if (entry->size < 50)
+   if (!entry->type_valid || entry->size < 50)
continue;
 
if (entry->no_try_delta)
-- 
2.17.0.367.g5dd2e386c3



[PATCH 05/15] pack-objects: use bitfield for object_entry::depth

2018-04-14 Thread Nguyễn Thái Ngọc Duy
Because of struct packing from now on we can only handle max depth
4095 (or even lower when new booleans are added in this struct). This
should be ok since long delta chain will cause significant slow down
anyway.

Signed-off-by: Nguyễn Thái Ngọc Duy 
---
 Documentation/config.txt   | 1 +
 Documentation/git-pack-objects.txt | 4 +++-
 Documentation/git-repack.txt   | 4 +++-
 builtin/pack-objects.c | 6 ++
 pack-objects.h | 5 ++---
 5 files changed, 15 insertions(+), 5 deletions(-)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index 2659153cb3..d97f10722c 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -2422,6 +2422,7 @@ pack.window::
 pack.depth::
The maximum delta depth used by linkgit:git-pack-objects[1] when no
maximum depth is given on the command line. Defaults to 50.
+   Maximum value is 4095.
 
 pack.windowMemory::
The maximum size of memory that is consumed by each thread
diff --git a/Documentation/git-pack-objects.txt 
b/Documentation/git-pack-objects.txt
index 81bc490ac5..3503c9e3e6 100644
--- a/Documentation/git-pack-objects.txt
+++ b/Documentation/git-pack-objects.txt
@@ -96,7 +96,9 @@ base-name::
it too deep affects the performance on the unpacker
side, because delta data needs to be applied that many
times to get to the necessary object.
-   The default value for --window is 10 and --depth is 50.
++
+The default value for --window is 10 and --depth is 50. The maximum
+depth is 4095.
 
 --window-memory=::
This option provides an additional limit on top of `--window`;
diff --git a/Documentation/git-repack.txt b/Documentation/git-repack.txt
index ae750e9e11..25c83c4927 100644
--- a/Documentation/git-repack.txt
+++ b/Documentation/git-repack.txt
@@ -90,7 +90,9 @@ other objects in that pack they already have locally.
space. `--depth` limits the maximum delta depth; making it too deep
affects the performance on the unpacker side, because delta data needs
to be applied that many times to get to the necessary object.
-   The default value for --window is 10 and --depth is 50.
++
+The default value for --window is 10 and --depth is 50. The maximum
+depth is 4095.
 
 --threads=::
This option is passed through to `git pack-objects`.
diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index cc3c31747e..b231e80f17 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -3068,6 +3068,12 @@ int cmd_pack_objects(int argc, const char **argv, const 
char *prefix)
if (pack_to_stdout != !base_name || argc)
usage_with_options(pack_usage, pack_objects_options);
 
+   if (depth >= (1 << OE_DEPTH_BITS)) {
+   warning(_("delta chain depth %d is too deep, forcing %d"),
+   depth, (1 << OE_DEPTH_BITS) - 1);
+   depth = (1 << OE_DEPTH_BITS) - 1;
+   }
+
argv_array_push(, "pack-objects");
if (thin) {
use_internal_rev_list = 1;
diff --git a/pack-objects.h b/pack-objects.h
index 080ef62d31..cdce1648de 100644
--- a/pack-objects.h
+++ b/pack-objects.h
@@ -2,6 +2,7 @@
 #define PACK_OBJECTS_H
 
 #define OE_DFS_STATE_BITS  2
+#define OE_DEPTH_BITS  12
 
 /*
  * State flags for depth-first search used for analyzing delta cycles.
@@ -89,9 +90,7 @@ struct object_entry {
unsigned tagged:1; /* near the very tip of refs */
unsigned filled:1; /* assigned write-order */
unsigned dfs_state:OE_DFS_STATE_BITS;
-
-   int depth;
-
+   unsigned depth:OE_DEPTH_BITS;
 };
 
 struct packing_data {
-- 
2.17.0.367.g5dd2e386c3



[PATCH 09/15] pack-objects: shrink z_delta_size field in struct object_entry

2018-04-14 Thread Nguyễn Thái Ngọc Duy
We only cache deltas when it's smaller than a certain limit. This limit
defaults to 1000 but save its compressed length in a 64-bit field.
Shrink that field down to 20 bits, so you can only cache 1MB deltas.
Larger deltas must be recomputed at when the pack is written down.

Signed-off-by: Nguyễn Thái Ngọc Duy 
---
 Documentation/config.txt |  3 ++-
 builtin/pack-objects.c   | 24 ++--
 pack-objects.h   |  3 ++-
 3 files changed, 22 insertions(+), 8 deletions(-)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index d97f10722c..a62134264e 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -2459,7 +2459,8 @@ pack.deltaCacheLimit::
The maximum size of a delta, that is cached in
linkgit:git-pack-objects[1]. This cache is used to speed up the
writing object phase by not having to recompute the final delta
-   result once the best match for all objects is found. Defaults to 1000.
+   result once the best match for all objects is found.
+   Defaults to 1000. Maximum value is 65535.
 
 pack.threads::
Specifies the number of threads to spawn when searching for best
diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index ec02641d2e..211bb1ad0e 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -2099,12 +2099,19 @@ static void find_deltas(struct object_entry **list, 
unsigned *list_size,
 * between writes at that moment.
 */
if (entry->delta_data && !pack_to_stdout) {
-   entry->z_delta_size = do_compress(>delta_data,
- entry->delta_size);
-   cache_lock();
-   delta_cache_size -= entry->delta_size;
-   delta_cache_size += entry->z_delta_size;
-   cache_unlock();
+   unsigned long size;
+
+   size = do_compress(>delta_data, 
entry->delta_size);
+   if (size < (1U << OE_Z_DELTA_BITS)) {
+   entry->z_delta_size = size;
+   cache_lock();
+   delta_cache_size -= entry->delta_size;
+   delta_cache_size += entry->z_delta_size;
+   cache_unlock();
+   } else {
+   FREE_AND_NULL(entry->delta_data);
+   entry->z_delta_size = 0;
+   }
}
 
/* if we made n a delta, and if n is already at max
@@ -3087,6 +3094,11 @@ int cmd_pack_objects(int argc, const char **argv, const 
char *prefix)
depth, (1 << OE_DEPTH_BITS) - 1);
depth = (1 << OE_DEPTH_BITS) - 1;
}
+   if (cache_max_small_delta_size >= (1U << OE_Z_DELTA_BITS)) {
+   warning(_("pack.deltaCacheLimit is too high, forcing %d"),
+   (1U << OE_Z_DELTA_BITS) - 1);
+   cache_max_small_delta_size = (1U << OE_Z_DELTA_BITS) - 1;
+   }
 
argv_array_push(, "pack-objects");
if (thin) {
diff --git a/pack-objects.h b/pack-objects.h
index e962dce3c0..9d0391c173 100644
--- a/pack-objects.h
+++ b/pack-objects.h
@@ -6,6 +6,7 @@
 #define OE_DFS_STATE_BITS  2
 #define OE_DEPTH_BITS  12
 #define OE_IN_PACK_BITS10
+#define OE_Z_DELTA_BITS20
 
 /*
  * State flags for depth-first search used for analyzing delta cycles.
@@ -77,7 +78,7 @@ struct object_entry {
 */
void *delta_data;   /* cached delta (uncompressed) */
unsigned long delta_size;   /* delta data size (uncompressed) */
-   unsigned long z_delta_size; /* delta data size (compressed) */
+   unsigned z_delta_size:OE_Z_DELTA_BITS;
unsigned type_:TYPE_BITS;
unsigned in_pack_type:TYPE_BITS; /* could be delta */
unsigned type_valid:1;
-- 
2.17.0.367.g5dd2e386c3



[PATCH 13/15] pack-objects: shrink delta_size field in struct object_entry

2018-04-14 Thread Nguyễn Thái Ngọc Duy
Allowing a delta size of 64 bits is crazy. Shrink this field down to
20 bits with one overflow bit.

If we find an existing delta larger than 1MB, we do not cache
delta_size at all and will get the value from oe_size(), potentially
from disk if it's larger than 4GB.

Note, since DELTA_SIZE() is used in try_delta() code, it must be
thread-safe. Luckily oe_size() does guarantee this so we it is
thread-safe.

Signed-off-by: Nguyễn Thái Ngọc Duy 
---
 builtin/pack-objects.c | 26 --
 pack-objects.h | 23 ++-
 2 files changed, 38 insertions(+), 11 deletions(-)

diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index cccd0f8040..88d2bb8153 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -34,10 +34,12 @@
 #define IN_PACK(obj) oe_in_pack(_pack, obj)
 #define SIZE(obj) oe_size(_pack, obj)
 #define SET_SIZE(obj,size) oe_set_size(_pack, obj, size)
+#define DELTA_SIZE(obj) oe_delta_size(_pack, obj)
 #define DELTA(obj) oe_delta(_pack, obj)
 #define DELTA_CHILD(obj) oe_delta_child(_pack, obj)
 #define DELTA_SIBLING(obj) oe_delta_sibling(_pack, obj)
 #define SET_DELTA(obj, val) oe_set_delta(_pack, obj, val)
+#define SET_DELTA_SIZE(obj, val) oe_set_delta_size(_pack, obj, val)
 #define SET_DELTA_CHILD(obj, val) oe_set_delta_child(_pack, obj, val)
 #define SET_DELTA_SIBLING(obj, val) oe_set_delta_sibling(_pack, obj, val)
 
@@ -144,7 +146,7 @@ static void *get_delta(struct object_entry *entry)
oid_to_hex((entry)->idx.oid));
delta_buf = diff_delta(base_buf, base_size,
   buf, size, _size, 0);
-   if (!delta_buf || delta_size != entry->delta_size)
+   if (!delta_buf || delta_size != DELTA_SIZE(entry))
die("delta size changed");
free(buf);
free(base_buf);
@@ -294,14 +296,14 @@ static unsigned long write_no_reuse_object(struct 
hashfile *f, struct object_ent
FREE_AND_NULL(entry->delta_data);
entry->z_delta_size = 0;
} else if (entry->delta_data) {
-   size = entry->delta_size;
+   size = DELTA_SIZE(entry);
buf = entry->delta_data;
entry->delta_data = NULL;
type = (allow_ofs_delta && DELTA(entry)->idx.offset) ?
OBJ_OFS_DELTA : OBJ_REF_DELTA;
} else {
buf = get_delta(entry);
-   size = entry->delta_size;
+   size = DELTA_SIZE(entry);
type = (allow_ofs_delta && DELTA(entry)->idx.offset) ?
OBJ_OFS_DELTA : OBJ_REF_DELTA;
}
@@ -1509,7 +1511,7 @@ static void check_object(struct object_entry *entry)
oe_set_type(entry, entry->in_pack_type);
SET_SIZE(entry, in_pack_size); /* delta size */
SET_DELTA(entry, base_entry);
-   entry->delta_size = in_pack_size;
+   SET_DELTA_SIZE(entry, in_pack_size);
entry->delta_sibling_idx = base_entry->delta_child_idx;
SET_DELTA_CHILD(base_entry, entry);
unuse_pack(_curs);
@@ -1937,7 +1939,7 @@ static int try_delta(struct unpacked *trg, struct 
unpacked *src,
max_size = trg_size/2 - 20;
ref_depth = 1;
} else {
-   max_size = trg_entry->delta_size;
+   max_size = DELTA_SIZE(trg_entry);
ref_depth = trg->depth;
}
max_size = (uint64_t)max_size * (max_depth - src->depth) /
@@ -2006,10 +2008,14 @@ static int try_delta(struct unpacked *trg, struct 
unpacked *src,
delta_buf = create_delta(src->index, trg->data, trg_size, _size, 
max_size);
if (!delta_buf)
return 0;
+   if (delta_size >= (1U << OE_DELTA_SIZE_BITS)) {
+   free(delta_buf);
+   return 0;
+   }
 
if (DELTA(trg_entry)) {
/* Prefer only shallower same-sized deltas. */
-   if (delta_size == trg_entry->delta_size &&
+   if (delta_size == DELTA_SIZE(trg_entry) &&
src->depth + 1 >= trg->depth) {
free(delta_buf);
return 0;
@@ -2024,7 +2030,7 @@ static int try_delta(struct unpacked *trg, struct 
unpacked *src,
free(trg_entry->delta_data);
cache_lock();
if (trg_entry->delta_data) {
-   delta_cache_size -= trg_entry->delta_size;
+   delta_cache_size -= DELTA_SIZE(trg_entry);
trg_entry->delta_data = NULL;
}
if (delta_cacheable(src_size, trg_size, delta_size)) {
@@ -2037,7 +2043,7 @@ static int try_delta(struct unpacked *trg, struct 
unpacked *src,
}
 
SET_DELTA(trg_entry, src_entry);
-   trg_entry->delta_size = delta_size;
+   SET_DELTA_SIZE(trg_entry, delta_size);
trg->depth = 

[PATCH 15/15] ci: exercise the whole test suite with uncommon code in pack-objects

2018-04-14 Thread Nguyễn Thái Ngọc Duy
Some recent optimizations have been added to pack-objects to reduce
memory usage and some code paths are split into two: one for common
use cases and one for rare ones. Make sure the rare cases are tested
with Travis since it requires manual test configuration that is
unlikely to be done by developers.

Signed-off-by: Nguyễn Thái Ngọc Duy 
---
 ci/run-build-and-tests.sh | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/ci/run-build-and-tests.sh b/ci/run-build-and-tests.sh
index 8190a753de..4b04c75b7f 100755
--- a/ci/run-build-and-tests.sh
+++ b/ci/run-build-and-tests.sh
@@ -11,7 +11,10 @@ make --jobs=2
 make --quiet test
 if test "$jobname" = "linux-gcc"
 then
-   GIT_TEST_SPLIT_INDEX=yes make --quiet test
+   export GIT_TEST_SPLIT_INDEX=yes
+   export GIT_TEST_FULL_IN_PACK_ARRAY=true
+   export GIT_TEST_OE_SIZE=10
+   make --quiet test
 fi
 
 check_unignored_build_artifacts
-- 
2.17.0.367.g5dd2e386c3



[PATCH 14/15] pack-objects: reorder members to shrink struct object_entry

2018-04-14 Thread Nguyễn Thái Ngọc Duy
Previous patches leave lots of holes and padding in this struct. This
patch reorders the members and shrinks the struct down to 80 bytes
(from 136 bytes on 64-bit systems, before any field shrinking is done)
with 16 bits to spare (and a couple more in in_pack_header_size when
we really run out of bits).

This is the last in a series of memory reduction patches (see
"pack-objects: a bit of document about struct object_entry" for the
first one).

Overall they've reduced repack memory size on linux-2.6.git from
3.747G to 3.424G, or by around 320M, a decrease of 8.5%. The runtime
of repack has stayed the same throughout this series. Ævar's testing
on a big monorepo he has access to (bigger than linux-2.6.git) has
shown a 7.9% reduction, so the overall expected improvement should be
somewhere around 8%.

See 87po42cwql@evledraar.gmail.com on-list
(https://public-inbox.org/git/87po42cwql@evledraar.gmail.com/) for
more detailed numbers and a test script used to produce the numbers
cited above.

Signed-off-by: Nguyễn Thái Ngọc Duy 
---
 pack-objects.h | 28 +---
 1 file changed, 21 insertions(+), 7 deletions(-)

diff --git a/pack-objects.h b/pack-objects.h
index 1c588184b2..e5456c6c89 100644
--- a/pack-objects.h
+++ b/pack-objects.h
@@ -28,6 +28,10 @@ enum dfs_state {
 };
 
 /*
+ * The size of struct nearly determines pack-objects's memory
+ * consumption. This struct is packed tight for that reason. When you
+ * add or reorder something in this struct, think a bit about this.
+ *
  * basic object info
  * -
  * idx.oid is filled up before delta searching starts. idx.crc32 is
@@ -76,34 +80,44 @@ enum dfs_state {
  */
 struct object_entry {
struct pack_idx_entry idx;
+   void *delta_data;   /* cached delta (uncompressed) */
+   off_t in_pack_offset;
+   uint32_t hash;  /* name hint hash */
unsigned size_:OE_SIZE_BITS;
unsigned size_valid:1;
-   unsigned in_pack_idx:OE_IN_PACK_BITS;   /* already in pack */
-   off_t in_pack_offset;
uint32_t delta_idx; /* delta base object */
uint32_t delta_child_idx; /* deltified objects who bases me */
uint32_t delta_sibling_idx; /* other deltified objects who
 * uses the same base as me
 */
-   void *delta_data;   /* cached delta (uncompressed) */
unsigned delta_size_:OE_DELTA_SIZE_BITS; /* delta data size 
(uncompressed) */
unsigned delta_size_valid:1;
+   unsigned in_pack_idx:OE_IN_PACK_BITS;   /* already in pack */
unsigned z_delta_size:OE_Z_DELTA_BITS;
+   unsigned type_valid:1;
unsigned type_:TYPE_BITS;
+   unsigned no_try_delta:1;
unsigned in_pack_type:TYPE_BITS; /* could be delta */
-   unsigned type_valid:1;
-   uint32_t hash;  /* name hint hash */
-   unsigned char in_pack_header_size;
unsigned preferred_base:1; /*
* we do not pack this, but is available
* to be used as the base object to delta
* objects against.
*/
-   unsigned no_try_delta:1;
unsigned tagged:1; /* near the very tip of refs */
unsigned filled:1; /* assigned write-order */
unsigned dfs_state:OE_DFS_STATE_BITS;
+   unsigned char in_pack_header_size;
unsigned depth:OE_DEPTH_BITS;
+
+   /*
+* pahole results on 64-bit linux (gcc and clang)
+*
+*   size: 80, bit_padding: 20 bits, holes: 8 bits
+*
+* and on 32-bit (gcc)
+*
+*   size: 76, bit_padding: 20 bits, holes: 8 bits
+*/
 };
 
 struct packing_data {
-- 
2.17.0.367.g5dd2e386c3



[PATCH 01/15] read-cache.c: make $GIT_TEST_SPLIT_INDEX boolean

2018-04-14 Thread Nguyễn Thái Ngọc Duy
While at there, document about this special mode when running the test
suite.

Signed-off-by: Nguyễn Thái Ngọc Duy 
---
 ci/run-build-and-tests.sh |  2 +-
 read-cache.c  |  4 ++--
 t/README  | 11 +++
 3 files changed, 14 insertions(+), 3 deletions(-)

diff --git a/ci/run-build-and-tests.sh b/ci/run-build-and-tests.sh
index 3735ce413f..8190a753de 100755
--- a/ci/run-build-and-tests.sh
+++ b/ci/run-build-and-tests.sh
@@ -11,7 +11,7 @@ make --jobs=2
 make --quiet test
 if test "$jobname" = "linux-gcc"
 then
-   GIT_TEST_SPLIT_INDEX=YesPlease make --quiet test
+   GIT_TEST_SPLIT_INDEX=yes make --quiet test
 fi
 
 check_unignored_build_artifacts
diff --git a/read-cache.c b/read-cache.c
index 10f1c6bb8a..fa3df2e72e 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -2268,7 +2268,7 @@ static int do_write_index(struct index_state *istate, 
struct tempfile *tempfile,
 
if (!istate->version) {
istate->version = get_index_format_default();
-   if (getenv("GIT_TEST_SPLIT_INDEX"))
+   if (git_env_bool("GIT_TEST_SPLIT_INDEX", 0))
init_split_index(istate);
}
 
@@ -2559,7 +2559,7 @@ int write_locked_index(struct index_state *istate, struct 
lock_file *lock,
goto out;
}
 
-   if (getenv("GIT_TEST_SPLIT_INDEX")) {
+   if (git_env_bool("GIT_TEST_SPLIT_INDEX", 0)) {
int v = si->base_sha1[0];
if ((v & 15) < 6)
istate->cache_changed |= SPLIT_INDEX_ORDERED;
diff --git a/t/README b/t/README
index 24ddebfabf..d5e0a3c786 100644
--- a/t/README
+++ b/t/README
@@ -293,6 +293,17 @@ and know what setup is needed for it.  Or when you want to 
run
 everything up to a certain test.
 
 
+Running tests with special setups
+-
+
+The whole test suite could be run to test some special features
+that cannot be easily covered by a few specific test cases. These
+could be enabled by running the test suite with correct GIT_TEST_
+environment set.
+
+GIT_TEST_SPLIT_INDEX= forces split-index mode on the whole
+test suite. Accept any boolean values that are accepted by git-config.
+
 Naming Tests
 
 
-- 
2.17.0.367.g5dd2e386c3



[PATCH 02/15] pack-objects: a bit of document about struct object_entry

2018-04-14 Thread Nguyễn Thái Ngọc Duy
The role of this comment block becomes more important after we shuffle
fields around to shrink this struct. It will be much harder to see what
field is related to what.

Signed-off-by: Nguyễn Thái Ngọc Duy 
---
 pack-objects.h | 45 +
 1 file changed, 45 insertions(+)

diff --git a/pack-objects.h b/pack-objects.h
index 03f1191659..c0a1f61aac 100644
--- a/pack-objects.h
+++ b/pack-objects.h
@@ -1,6 +1,51 @@
 #ifndef PACK_OBJECTS_H
 #define PACK_OBJECTS_H
 
+/*
+ * basic object info
+ * -
+ * idx.oid is filled up before delta searching starts. idx.crc32 is
+ * only valid after the object is written out and will be used for
+ * generating the index. idx.offset will be both gradually set and
+ * used in writing phase (base objects get offset first, then deltas
+ * refer to them)
+ *
+ * "size" is the uncompressed object size. Compressed size of the raw
+ * data for an object in a pack is not stored anywhere but is computed
+ * and made available when reverse .idx is made.
+ *
+ * "hash" contains a path name hash which is used for sorting the
+ * delta list and also during delta searching. Once prepare_pack()
+ * returns it's no longer needed.
+ *
+ * source pack info
+ * 
+ * The (in_pack, in_pack_offset) tuple contains the location of the
+ * object in the source pack. in_pack_header_size allows quickly
+ * skipping the header and going straight to the zlib stream.
+ *
+ * "type" and "in_pack_type" both describe object type. in_pack_type
+ * may contain a delta type, while type is always the canonical type.
+ *
+ * deltas
+ * --
+ * Delta links (delta, delta_child and delta_sibling) are created to
+ * reflect that delta graph from the source pack then updated or added
+ * during delta searching phase when we find better deltas.
+ *
+ * delta_child and delta_sibling are last needed in
+ * compute_write_order(). "delta" and "delta_size" must remain valid
+ * at object writing phase in case the delta is not cached.
+ *
+ * If a delta is cached in memory and is compressed, delta_data points
+ * to the data and z_delta_size contains the compressed size. If it's
+ * uncompressed [1], z_delta_size must be zero. delta_size is always
+ * the uncompressed size and must be valid even if the delta is not
+ * cached.
+ *
+ * [1] during try_delta phase we don't bother with compressing because
+ * the delta could be quickly replaced with a better one.
+ */
 struct object_entry {
struct pack_idx_entry idx;
unsigned long size; /* uncompressed size */
-- 
2.17.0.367.g5dd2e386c3



[PATCH 04/15] pack-objects: use bitfield for object_entry::dfs_state

2018-04-14 Thread Nguyễn Thái Ngọc Duy
Signed-off-by: Nguyễn Thái Ngọc Duy 
---
 builtin/pack-objects.c |  3 +++
 pack-objects.h | 28 +---
 2 files changed, 20 insertions(+), 11 deletions(-)

diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 88877f1f59..cc3c31747e 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -3049,6 +3049,9 @@ int cmd_pack_objects(int argc, const char **argv, const 
char *prefix)
OPT_END(),
};
 
+   if (DFS_NUM_STATES > (1 << OE_DFS_STATE_BITS))
+   BUG("too many dfs states, increase OE_DFS_STATE_BITS");
+
check_replace_refs = 0;
 
reset_pack_idx_option(_idx_opts);
diff --git a/pack-objects.h b/pack-objects.h
index b4a83a6123..080ef62d31 100644
--- a/pack-objects.h
+++ b/pack-objects.h
@@ -1,6 +1,21 @@
 #ifndef PACK_OBJECTS_H
 #define PACK_OBJECTS_H
 
+#define OE_DFS_STATE_BITS  2
+
+/*
+ * State flags for depth-first search used for analyzing delta cycles.
+ *
+ * The depth is measured in delta-links to the base (so if A is a delta
+ * against B, then A has a depth of 1, and B a depth of 0).
+ */
+enum dfs_state {
+   DFS_NONE = 0,
+   DFS_ACTIVE,
+   DFS_DONE,
+   DFS_NUM_STATES
+};
+
 /*
  * basic object info
  * -
@@ -73,19 +88,10 @@ struct object_entry {
unsigned no_try_delta:1;
unsigned tagged:1; /* near the very tip of refs */
unsigned filled:1; /* assigned write-order */
+   unsigned dfs_state:OE_DFS_STATE_BITS;
 
-   /*
-* State flags for depth-first search used for analyzing delta cycles.
-*
-* The depth is measured in delta-links to the base (so if A is a delta
-* against B, then A has a depth of 1, and B a depth of 0).
-*/
-   enum {
-   DFS_NONE = 0,
-   DFS_ACTIVE,
-   DFS_DONE
-   } dfs_state;
int depth;
+
 };
 
 struct packing_data {
-- 
2.17.0.367.g5dd2e386c3



[PATCH 00/15] nd/pack-objects-pack-struct update

2018-04-14 Thread Nguyễn Thái Ngọc Duy
This is basically a resend from the last round but rebased on latest
master to take advantage of new object-store and oid changes.
One minor change is t/README now mentions about git-config when a
variable accepts a boolean value.

Nguyễn Thái Ngọc Duy (15):
  read-cache.c: make $GIT_TEST_SPLIT_INDEX boolean
  pack-objects: a bit of document about struct object_entry
  pack-objects: turn type and in_pack_type to bitfields
  pack-objects: use bitfield for object_entry::dfs_state
  pack-objects: use bitfield for object_entry::depth
  pack-objects: move in_pack_pos out of struct object_entry
  pack-objects: move in_pack out of struct object_entry
  pack-objects: refer to delta objects by index instead of pointer
  pack-objects: shrink z_delta_size field in struct object_entry
  pack-objects: don't check size when the object is bad
  pack-objects: clarify the use of object_entry::size
  pack-objects: shrink size field in struct object_entry
  pack-objects: shrink delta_size field in struct object_entry
  pack-objects: reorder members to shrink struct object_entry
  ci: exercise the whole test suite with uncommon code in pack-objects

 Documentation/config.txt   |   4 +-
 Documentation/git-pack-objects.txt |   4 +-
 Documentation/git-repack.txt   |   4 +-
 builtin/pack-objects.c | 364 +++--
 cache.h|   2 +
 ci/run-build-and-tests.sh  |   5 +-
 object-store.h |   1 +
 object.h   |   1 -
 pack-bitmap-write.c|  14 +-
 pack-bitmap.c  |   2 +-
 pack-bitmap.h  |   4 +-
 pack-objects.c |  68 ++
 pack-objects.h | 314 +++--
 read-cache.c   |   4 +-
 t/README   |  22 ++
 t/t5300-pack-object.sh |   5 +
 16 files changed, 654 insertions(+), 164 deletions(-)

-- 
2.17.0.367.g5dd2e386c3



[PATCH 0/7] nd/repack-keep-pack update

2018-04-14 Thread Nguyễn Thái Ngọc Duy
This is basically a resend from the last round but rebased on
'master'. The old base results in some conflicts with packfile and oid
conversion series. This one should reduce merge conflicts on 'pu' to
trivial ones.

Nguyễn Thái Ngọc Duy (7):
  t7700: have closing quote of a test at the beginning of line
  repack: add --keep-pack option
  gc: add --keep-largest-pack option
  gc: add gc.bigPackThreshold config
  gc: handle a corner case in gc.bigPackThreshold
  gc --auto: exclude base pack if not enough mem to "repack -ad"
  pack-objects: show some progress when counting kept objects

 Documentation/config.txt   |  12 +++
 Documentation/git-gc.txt   |  19 +++-
 Documentation/git-pack-objects.txt |   9 +-
 Documentation/git-repack.txt   |   9 +-
 builtin/gc.c   | 165 +++--
 builtin/pack-objects.c |  83 +++
 builtin/repack.c   |  21 +++-
 config.mak.uname   |   1 +
 git-compat-util.h  |   4 +
 object-store.h |   1 +
 pack-objects.h |   2 +
 t/t6500-gc.sh  |  32 ++
 t/t7700-repack.sh  |  27 -
 13 files changed, 349 insertions(+), 36 deletions(-)

-- 
2.17.0.367.g5dd2e386c3



[PATCH 1/7] t7700: have closing quote of a test at the beginning of line

2018-04-14 Thread Nguyễn Thái Ngọc Duy
The closing quote of a test body by convention is always at the start
of line.

Signed-off-by: Nguyễn Thái Ngọc Duy 
Signed-off-by: Junio C Hamano 
---
 t/t7700-repack.sh | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/t/t7700-repack.sh b/t/t7700-repack.sh
index 6061a04147..38247afbec 100755
--- a/t/t7700-repack.sh
+++ b/t/t7700-repack.sh
@@ -194,7 +194,7 @@ test_expect_success 'objects made unreachable by grafts 
only are kept' '
git reflog expire --expire=$test_tick --expire-unreachable=$test_tick 
--all &&
git repack -a -d &&
git cat-file -t $H1
-   '
+'
 
 test_done
 
-- 
2.17.0.367.g5dd2e386c3



[PATCH 5/7] gc: handle a corner case in gc.bigPackThreshold

2018-04-14 Thread Nguyễn Thái Ngọc Duy
This config allows us to keep  packs back if their size is larger
than a limit. But if this N >= gc.autoPackLimit, we may have a
problem. We are supposed to reduce the number of packs after a
threshold because it affects performance.

We could tell the user that they have incompatible gc.bigPackThreshold
and gc.autoPackLimit, but it's kinda hard when 'git gc --auto' runs in
background. Instead let's fall back to the next best stategy: try to
reduce the number of packs anyway, but keep the base pack out. This
reduces the number of packs to two and hopefully won't take up too
much resources to repack (the assumption still is the base pack takes
most resources to handle).

Signed-off-by: Nguyễn Thái Ngọc Duy 
---
 Documentation/config.txt | 6 +-
 builtin/gc.c | 8 +++-
 2 files changed, 12 insertions(+), 2 deletions(-)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index 40516675b6..4c3f1d7651 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -1564,7 +1564,11 @@ gc.bigPackThreshold::
except that all packs that meet the threshold are kept, not
just the base pack. Defaults to zero. Common unit suffixes of
'k', 'm', or 'g' are supported.
-
++
+Note that if the number of kept packs is more than gc.autoPackLimit,
+this configuration variable is ignored, all packs except the base pack
+will be repacked. After this the number of packs should go below
+gc.autoPackLimit and gc.bigPackThreshold should be respected again.
 
 gc.logExpiry::
If the file gc.log exists, then `git gc --auto` won't run
diff --git a/builtin/gc.c b/builtin/gc.c
index 81ecdc5ffa..23c17ba7e9 100644
--- a/builtin/gc.c
+++ b/builtin/gc.c
@@ -253,8 +253,14 @@ static int need_to_gc(void)
if (too_many_packs()) {
struct string_list keep_pack = STRING_LIST_INIT_NODUP;
 
-   if (big_pack_threshold)
+   if (big_pack_threshold) {
find_base_packs(_pack, big_pack_threshold);
+   if (keep_pack.nr >= gc_auto_pack_limit) {
+   big_pack_threshold = 0;
+   string_list_clear(_pack, 0);
+   find_base_packs(_pack, 0);
+   }
+   }
 
add_repack_all_option(_pack);
string_list_clear(_pack, 0);
-- 
2.17.0.367.g5dd2e386c3



[PATCH 6/7] gc --auto: exclude base pack if not enough mem to "repack -ad"

2018-04-14 Thread Nguyễn Thái Ngọc Duy
pack-objects could be a big memory hog especially on large repos,
everybody knows that. The suggestion to stick a .keep file on the
giant base pack to avoid this problem is also known for a long time.

Recent patches add an option to do just this, but it has to be either
configured or activated manually. This patch lets `git gc --auto`
activate this mode automatically when it thinks `repack -ad` will use
a lot of memory and start affecting the system due to swapping or
flushing OS cache.

gc --auto decides to do this based on an estimation of pack-objects
memory usage, which is quite accurate at least for the heap part, and
whether that fits in half of system memory (the assumption here is for
desktop environment where there are many other applications running).

This mechanism only kicks in if gc.bigBasePackThreshold is not configured.
If it is, it is assumed that the user already knows what they want.

Signed-off-by: Nguyễn Thái Ngọc Duy 
---
 Documentation/git-gc.txt |  9 +++-
 builtin/gc.c | 98 +++-
 builtin/pack-objects.c   |  2 +-
 config.mak.uname |  1 +
 git-compat-util.h|  4 ++
 pack-objects.h   |  2 +
 t/t6500-gc.sh|  7 +++
 7 files changed, 119 insertions(+), 4 deletions(-)

diff --git a/Documentation/git-gc.txt b/Documentation/git-gc.txt
index 649faddfa6..0fb1d43b2c 100644
--- a/Documentation/git-gc.txt
+++ b/Documentation/git-gc.txt
@@ -59,8 +59,13 @@ If the number of packs exceeds the value of 
`gc.autoPackLimit`,
 then existing packs (except those marked with a `.keep` file
 or over `gc.bigPackThreshold` limit)
 are consolidated into a single pack by using the `-A` option of
-'git repack'. Setting `gc.autoPackLimit` to 0 disables
-automatic consolidation of packs.
+'git repack'.
+If the amount of memory is estimated not enough for `git repack` to
+run smoothly and `gc.bigPackThreshold` is not set, the largest
+pack will also be excluded (this is the equivalent of running `git gc`
+with `--keep-base-pack`).
+Setting `gc.autoPackLimit` to 0 disables automatic consolidation of
+packs.
 +
 If houskeeping is required due to many loose objects or packs, all
 other housekeeping tasks (e.g. rerere, working trees, reflog...) will
diff --git a/builtin/gc.c b/builtin/gc.c
index 23c17ba7e9..3c7c93e961 100644
--- a/builtin/gc.c
+++ b/builtin/gc.c
@@ -22,6 +22,10 @@
 #include "commit.h"
 #include "packfile.h"
 #include "object-store.h"
+#include "pack.h"
+#include "pack-objects.h"
+#include "blob.h"
+#include "tree.h"
 
 #define FAILED_RUN "failed to run %s"
 
@@ -42,6 +46,7 @@ static const char *gc_log_expire = "1.day.ago";
 static const char *prune_expire = "2.weeks.ago";
 static const char *prune_worktrees_expire = "3.months.ago";
 static unsigned long big_pack_threshold;
+static unsigned long max_delta_cache_size = DEFAULT_DELTA_CACHE_SIZE;
 
 static struct argv_array pack_refs_cmd = ARGV_ARRAY_INIT;
 static struct argv_array reflog = ARGV_ARRAY_INIT;
@@ -130,6 +135,7 @@ static void gc_config(void)
git_config_get_expiry("gc.logexpiry", _log_expire);
 
git_config_get_ulong("gc.bigpackthreshold", _pack_threshold);
+   git_config_get_ulong("pack.deltacachesize", _delta_cache_size);
 
git_config(git_default_config, NULL);
 }
@@ -169,7 +175,8 @@ static int too_many_loose_objects(void)
return needed;
 }
 
-static void find_base_packs(struct string_list *packs, unsigned long limit)
+static struct packed_git *find_base_packs(struct string_list *packs,
+ unsigned long limit)
 {
struct packed_git *p, *base = NULL;
 
@@ -186,6 +193,8 @@ static void find_base_packs(struct string_list *packs, 
unsigned long limit)
 
if (base)
string_list_append(packs, base->pack_name);
+
+   return base;
 }
 
 static int too_many_packs(void)
@@ -210,6 +219,79 @@ static int too_many_packs(void)
return gc_auto_pack_limit < cnt;
 }
 
+static uint64_t total_ram(void)
+{
+#if defined(HAVE_SYSINFO)
+   struct sysinfo si;
+
+   if (!sysinfo())
+   return si.totalram;
+#elif defined(HAVE_BSD_SYSCTL) && (defined(HW_MEMSIZE) || defined(HW_PHYSMEM))
+   int64_t physical_memory;
+   int mib[2];
+   size_t length;
+
+   mib[0] = CTL_HW;
+# if defined(HW_MEMSIZE)
+   mib[1] = HW_MEMSIZE;
+# else
+   mib[1] = HW_PHYSMEM;
+# endif
+   length = sizeof(int64_t);
+   if (!sysctl(mib, 2, _memory, , NULL, 0))
+   return physical_memory;
+#elif defined(GIT_WINDOWS_NATIVE)
+   MEMORYSTATUSEX memInfo;
+
+   memInfo.dwLength = sizeof(MEMORYSTATUSEX);
+   if (GlobalMemoryStatusEx())
+   return memInfo.ullTotalPhys;
+#endif
+   return 0;
+}
+
+static uint64_t estimate_repack_memory(struct packed_git *pack)
+{
+   unsigned long nr_objects = approximate_object_count();
+   size_t os_cache, heap;
+
+   if (!pack || !nr_objects)
+  

[PATCH 4/7] gc: add gc.bigPackThreshold config

2018-04-14 Thread Nguyễn Thái Ngọc Duy
The --keep-largest-pack option is not very convenient to use because
you need to tell gc to do this explicitly (and probably on just a few
large repos).

Add a config key that enables this mode when packs larger than a limit
are found. Note that there's a slight behavior difference compared to
--keep-largest-pack: all packs larger than the threshold are kept, not
just the largest one.

Signed-off-by: Nguyễn Thái Ngọc Duy 
---
 Documentation/config.txt |  8 
 Documentation/git-gc.txt |  6 --
 builtin/gc.c | 26 --
 3 files changed, 32 insertions(+), 8 deletions(-)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index 2659153cb3..40516675b6 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -1558,6 +1558,14 @@ gc.autoDetach::
Make `git gc --auto` return immediately and run in background
if the system supports it. Default is true.
 
+gc.bigPackThreshold::
+   If non-zero, all packs larger than this limit are kept when
+   `git gc` is run. This is very similar to `--keep-base-pack`
+   except that all packs that meet the threshold are kept, not
+   just the base pack. Defaults to zero. Common unit suffixes of
+   'k', 'm', or 'g' are supported.
+
+
 gc.logExpiry::
If the file gc.log exists, then `git gc --auto` won't run
unless that file is more than 'gc.logExpiry' old.  Default is
diff --git a/Documentation/git-gc.txt b/Documentation/git-gc.txt
index 8f903231da..649faddfa6 100644
--- a/Documentation/git-gc.txt
+++ b/Documentation/git-gc.txt
@@ -56,7 +56,8 @@ single pack using `git repack -d -l`.  Setting the value of 
`gc.auto`
 to 0 disables automatic packing of loose objects.
 +
 If the number of packs exceeds the value of `gc.autoPackLimit`,
-then existing packs (except those marked with a `.keep` file)
+then existing packs (except those marked with a `.keep` file
+or over `gc.bigPackThreshold` limit)
 are consolidated into a single pack by using the `-A` option of
 'git repack'. Setting `gc.autoPackLimit` to 0 disables
 automatic consolidation of packs.
@@ -86,7 +87,8 @@ be performed as well.
 
 --keep-largest-pack::
All packs except the largest pack and those marked with a
-   `.keep` files are consolidated into a single pack.
+   `.keep` files are consolidated into a single pack. When this
+   option is used, `gc.bigPackThreshold` is ignored.
 
 Configuration
 -
diff --git a/builtin/gc.c b/builtin/gc.c
index f251662a8f..81ecdc5ffa 100644
--- a/builtin/gc.c
+++ b/builtin/gc.c
@@ -41,6 +41,7 @@ static timestamp_t gc_log_expire_time;
 static const char *gc_log_expire = "1.day.ago";
 static const char *prune_expire = "2.weeks.ago";
 static const char *prune_worktrees_expire = "3.months.ago";
+static unsigned long big_pack_threshold;
 
 static struct argv_array pack_refs_cmd = ARGV_ARRAY_INIT;
 static struct argv_array reflog = ARGV_ARRAY_INIT;
@@ -128,6 +129,8 @@ static void gc_config(void)
git_config_get_expiry("gc.worktreepruneexpire", 
_worktrees_expire);
git_config_get_expiry("gc.logexpiry", _log_expire);
 
+   git_config_get_ulong("gc.bigpackthreshold", _pack_threshold);
+
git_config(git_default_config, NULL);
 }
 
@@ -166,14 +169,17 @@ static int too_many_loose_objects(void)
return needed;
 }
 
-static void find_base_packs(struct string_list *packs)
+static void find_base_packs(struct string_list *packs, unsigned long limit)
 {
struct packed_git *p, *base = NULL;
 
for (p = get_packed_git(the_repository); p; p = p->next) {
if (!p->pack_local)
continue;
-   if (!base || base->pack_size < p->pack_size) {
+   if (limit) {
+   if (p->pack_size >= limit)
+   string_list_append(packs, p->pack_name);
+   } else if (!base || base->pack_size < p->pack_size) {
base = p;
}
}
@@ -244,9 +250,15 @@ static int need_to_gc(void)
 * we run "repack -A -d -l".  Otherwise we tell the caller
 * there is no need.
 */
-   if (too_many_packs())
-   add_repack_all_option(NULL);
-   else if (too_many_loose_objects())
+   if (too_many_packs()) {
+   struct string_list keep_pack = STRING_LIST_INIT_NODUP;
+
+   if (big_pack_threshold)
+   find_base_packs(_pack, big_pack_threshold);
+
+   add_repack_all_option(_pack);
+   string_list_clear(_pack, 0);
+   } else if (too_many_loose_objects())
add_repack_incremental_option();
else
return 0;
@@ -464,7 +476,9 @@ int cmd_gc(int argc, const char **argv, const char *prefix)
 
if (keep_base_pack != -1) {
if (keep_base_pack)
-   find_base_packs(_pack);
+ 

[PATCH 7/7] pack-objects: show some progress when counting kept objects

2018-04-14 Thread Nguyễn Thái Ngọc Duy
We only show progress when there are new objects to be packed. But
when --keep-pack is specified on the base pack, we will exclude most
of objects. This makes 'pack-objects' stay silent for a long time
while the counting phase is going.

Let's show some progress whenever we visit an object instead. The old
"Counting objects" is renamed to "Enumerating objects" and a new
progress "Counting objects" line is added.

This new "Counting objects" line should progress pretty quick when the
system is beefy. But when the system is under pressure, the reading
object header done in this phase could be slow and showing progress is
an improvement over staying silent in the current code.

Signed-off-by: Nguyễn Thái Ngọc Duy 
---
 builtin/pack-objects.c | 18 --
 1 file changed, 12 insertions(+), 6 deletions(-)

diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index c77bea404d..6a1346c41f 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -46,7 +46,7 @@ static const char *pack_usage[] = {
 static struct packing_data to_pack;
 
 static struct pack_idx_entry **written_list;
-static uint32_t nr_result, nr_written;
+static uint32_t nr_result, nr_written, nr_seen;
 
 static int non_empty;
 static int reuse_delta = 1, reuse_object = 1;
@@ -1096,6 +1096,8 @@ static int add_object_entry(const struct object_id *oid, 
enum object_type type,
off_t found_offset = 0;
uint32_t index_pos;
 
+   display_progress(progress_state, ++nr_seen);
+
if (have_duplicate_entry(oid, exclude, _pos))
return 0;
 
@@ -,8 +1113,6 @@ static int add_object_entry(const struct object_id *oid, 
enum object_type type,
create_object_entry(oid, type, pack_name_hash(name),
exclude, name && no_try_delta(name),
index_pos, found_pack, found_offset);
-
-   display_progress(progress_state, nr_result);
return 1;
 }
 
@@ -1123,6 +1123,8 @@ static int add_object_entry_from_bitmap(const struct 
object_id *oid,
 {
uint32_t index_pos;
 
+   display_progress(progress_state, ++nr_seen);
+
if (have_duplicate_entry(oid, 0, _pos))
return 0;
 
@@ -1130,8 +1132,6 @@ static int add_object_entry_from_bitmap(const struct 
object_id *oid,
return 0;
 
create_object_entry(oid, type, name_hash, 0, 0, index_pos, pack, 
offset);
-
-   display_progress(progress_state, nr_result);
return 1;
 }
 
@@ -1716,6 +1716,10 @@ static void get_object_details(void)
uint32_t i;
struct object_entry **sorted_by_offset;
 
+   if (progress)
+   progress_state = start_progress(_("Counting objects"),
+   to_pack.nr_objects);
+
sorted_by_offset = xcalloc(to_pack.nr_objects, sizeof(struct 
object_entry *));
for (i = 0; i < to_pack.nr_objects; i++)
sorted_by_offset[i] = to_pack.objects + i;
@@ -1726,7 +1730,9 @@ static void get_object_details(void)
check_object(entry);
if (big_file_threshold < entry->size)
entry->no_try_delta = 1;
+   display_progress(progress_state, i + 1);
}
+   stop_progress(_state);
 
/*
 * This must happen in a second pass, since we rely on the delta
@@ -3209,7 +3215,7 @@ int cmd_pack_objects(int argc, const char **argv, const 
char *prefix)
}
 
if (progress)
-   progress_state = start_progress(_("Counting objects"), 0);
+   progress_state = start_progress(_("Enumerating objects"), 0);
if (!use_internal_rev_list)
read_object_list_from_stdin();
else {
-- 
2.17.0.367.g5dd2e386c3



[PATCH 2/7] repack: add --keep-pack option

2018-04-14 Thread Nguyễn Thái Ngọc Duy
We allow to keep existing packs by having companion .keep files. This
is helpful when a pack is permanently kept. In the next patch, git-gc
just wants to keep a pack temporarily, for one pack-objects
run. git-gc can use --keep-pack for this use case.

A note about why the pack_keep field cannot be reused and
pack_keep_in_core has to be added. This is about the case when
--keep-pack is specified together with either --keep-unreachable or
--unpack-unreachable, but --honor-pack-keep is NOT specified.

In this case, we want to exclude objects from the packs specified on
command line, not from ones with .keep files. If only one bit flag is
used, we have to clear pack_keep on pack files with the .keep file.

But we can't make any assumption about unreachable objects in .keep
packs. If "pack_keep" field is false for .keep packs, we could
potentially pull lots of unreachable objects into the new pack, or
unpack them loose. The safer approach is ignore all packs with either
.keep file or --keep-pack.

Signed-off-by: Nguyễn Thái Ngọc Duy 
---
 Documentation/git-pack-objects.txt |  9 -
 Documentation/git-repack.txt   |  9 -
 builtin/pack-objects.c | 63 --
 builtin/repack.c   | 21 --
 object-store.h |  1 +
 t/t7700-repack.sh  | 25 
 6 files changed, 110 insertions(+), 18 deletions(-)

diff --git a/Documentation/git-pack-objects.txt 
b/Documentation/git-pack-objects.txt
index 81bc490ac5..403524652a 100644
--- a/Documentation/git-pack-objects.txt
+++ b/Documentation/git-pack-objects.txt
@@ -12,7 +12,7 @@ SYNOPSIS
 'git pack-objects' [-q | --progress | --all-progress] [--all-progress-implied]
[--no-reuse-delta] [--delta-base-offset] [--non-empty]
[--local] [--incremental] [--window=] [--depth=]
-   [--revs [--unpacked | --all]]
+   [--revs [--unpacked | --all]] [--keep-pack=]
[--stdout [--filter=] | base-name]
[--shallow] [--keep-true-parents] < object-list
 
@@ -126,6 +126,13 @@ base-name::
has a .keep file to be ignored, even if it would have
otherwise been packed.
 
+--keep-pack=::
+   This flag causes an object already in the given pack to be
+   ignored, even if it would have otherwise been
+   packed. `` is the the pack file name without
+   leading directory (e.g. `pack-123.pack`). The option could be
+   specified multiple times to keep multiple packs.
+
 --incremental::
This flag causes an object already in a pack to be ignored
even if it would have otherwise been packed.
diff --git a/Documentation/git-repack.txt b/Documentation/git-repack.txt
index ae750e9e11..ce497d9d12 100644
--- a/Documentation/git-repack.txt
+++ b/Documentation/git-repack.txt
@@ -9,7 +9,7 @@ git-repack - Pack unpacked objects in a repository
 SYNOPSIS
 
 [verse]
-'git repack' [-a] [-A] [-d] [-f] [-F] [-l] [-n] [-q] [-b] [--window=] 
[--depth=] [--threads=]
+'git repack' [-a] [-A] [-d] [-f] [-F] [-l] [-n] [-q] [-b] [--window=] 
[--depth=] [--threads=] [--keep-pack=]
 
 DESCRIPTION
 ---
@@ -133,6 +133,13 @@ other objects in that pack they already have locally.
with `-b` or `repack.writeBitmaps`, as it ensures that the
bitmapped packfile has the necessary objects.
 
+--keep-pack=::
+   Exclude the given pack from repacking. This is the equivalent
+   of having `.keep` file on the pack. `` is the the
+   pack file name without leading directory (e.g. `pack-123.pack`).
+   The option could be specified multiple times to keep multiple
+   packs.
+
 --unpack-unreachable=::
When loosening unreachable objects, do not bother loosening any
objects older than ``. This can be used to optimize out
diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index 4bdae5a1d8..9b9a6d6268 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -30,6 +30,7 @@
 #include "list.h"
 #include "packfile.h"
 #include "object-store.h"
+#include "dir.h"
 
 static const char *pack_usage[] = {
N_("git pack-objects --stdout [...] [<  | < 
]"),
@@ -55,7 +56,8 @@ static int pack_loose_unreachable;
 static int local;
 static int have_non_local_packs;
 static int incremental;
-static int ignore_packed_keep;
+static int ignore_packed_keep_on_disk;
+static int ignore_packed_keep_in_core;
 static int allow_ofs_delta;
 static struct pack_idx_option pack_idx_opts;
 static const char *base_name;
@@ -982,13 +984,16 @@ static int want_found_object(int exclude, struct 
packed_git *p)
 * Otherwise, we signal "-1" at the end to tell the caller that we do
 * not know either way, and it needs to check more packs.
 */
-   if (!ignore_packed_keep &&
+   if (!ignore_packed_keep_on_disk &&
+   !ignore_packed_keep_in_core &&
(!local || !have_non_local_packs))
return 1;
 
if (local && !p->pack_local)

[PATCH 3/7] gc: add --keep-largest-pack option

2018-04-14 Thread Nguyễn Thái Ngọc Duy
This adds a new repack mode that combines everything into a secondary
pack, leaving the largest pack alone.

This could help reduce memory pressure. On linux-2.6.git, valgrind
massif reports 1.6GB heap in "pack all" case, and 535MB in "pack
all except the base pack" case. We save roughly 1GB memory by
excluding the base pack.

This should also lower I/O because we don't have to rewrite a giant
pack every time (e.g. for linux-2.6.git that's a 1.4GB pack file)..

PS. The use of string_list here seems overkill, but we'll need it in
the next patch...

Signed-off-by: Nguyễn Thái Ngọc Duy 
---
 Documentation/git-gc.txt |  6 +-
 builtin/gc.c | 45 
 t/t6500-gc.sh| 25 ++
 3 files changed, 71 insertions(+), 5 deletions(-)

diff --git a/Documentation/git-gc.txt b/Documentation/git-gc.txt
index 3126e0dd00..8f903231da 100644
--- a/Documentation/git-gc.txt
+++ b/Documentation/git-gc.txt
@@ -9,7 +9,7 @@ git-gc - Cleanup unnecessary files and optimize the local 
repository
 SYNOPSIS
 
 [verse]
-'git gc' [--aggressive] [--auto] [--quiet] [--prune= | --no-prune] 
[--force]
+'git gc' [--aggressive] [--auto] [--quiet] [--prune= | --no-prune] 
[--force] [--keep-largest-pack]
 
 DESCRIPTION
 ---
@@ -84,6 +84,10 @@ be performed as well.
Force `git gc` to run even if there may be another `git gc`
instance running on this repository.
 
+--keep-largest-pack::
+   All packs except the largest pack and those marked with a
+   `.keep` files are consolidated into a single pack.
+
 Configuration
 -
 
diff --git a/builtin/gc.c b/builtin/gc.c
index 3e67124eaa..f251662a8f 100644
--- a/builtin/gc.c
+++ b/builtin/gc.c
@@ -166,6 +166,22 @@ static int too_many_loose_objects(void)
return needed;
 }
 
+static void find_base_packs(struct string_list *packs)
+{
+   struct packed_git *p, *base = NULL;
+
+   for (p = get_packed_git(the_repository); p; p = p->next) {
+   if (!p->pack_local)
+   continue;
+   if (!base || base->pack_size < p->pack_size) {
+   base = p;
+   }
+   }
+
+   if (base)
+   string_list_append(packs, base->pack_name);
+}
+
 static int too_many_packs(void)
 {
struct packed_git *p;
@@ -188,7 +204,13 @@ static int too_many_packs(void)
return gc_auto_pack_limit < cnt;
 }
 
-static void add_repack_all_option(void)
+static int keep_one_pack(struct string_list_item *item, void *data)
+{
+   argv_array_pushf(, "--keep-pack=%s", basename(item->string));
+   return 0;
+}
+
+static void add_repack_all_option(struct string_list *keep_pack)
 {
if (prune_expire && !strcmp(prune_expire, "now"))
argv_array_push(, "-a");
@@ -197,6 +219,9 @@ static void add_repack_all_option(void)
if (prune_expire)
argv_array_pushf(, "--unpack-unreachable=%s", 
prune_expire);
}
+
+   if (keep_pack)
+   for_each_string_list(keep_pack, keep_one_pack, NULL);
 }
 
 static void add_repack_incremental_option(void)
@@ -220,7 +245,7 @@ static int need_to_gc(void)
 * there is no need.
 */
if (too_many_packs())
-   add_repack_all_option();
+   add_repack_all_option(NULL);
else if (too_many_loose_objects())
add_repack_incremental_option();
else
@@ -354,6 +379,7 @@ int cmd_gc(int argc, const char **argv, const char *prefix)
const char *name;
pid_t pid;
int daemonized = 0;
+   int keep_base_pack = -1;
 
struct option builtin_gc_options[] = {
OPT__QUIET(, N_("suppress progress reporting")),
@@ -366,6 +392,8 @@ int cmd_gc(int argc, const char **argv, const char *prefix)
OPT_BOOL_F(0, "force", ,
   N_("force running gc even if there may be another gc 
running"),
   PARSE_OPT_NOCOMPLETE),
+   OPT_BOOL(0, "keep-largest-pack", _base_pack,
+N_("repack all other packs except the largest pack")),
OPT_END()
};
 
@@ -431,8 +459,17 @@ int cmd_gc(int argc, const char **argv, const char *prefix)
 */
daemonized = !daemonize();
}
-   } else
-   add_repack_all_option();
+   } else {
+   struct string_list keep_pack = STRING_LIST_INIT_NODUP;
+
+   if (keep_base_pack != -1) {
+   if (keep_base_pack)
+   find_base_packs(_pack);
+   }
+
+   add_repack_all_option(_pack);
+   string_list_clear(_pack, 0);
+   }
 
name = lock_repo_for_gc(force, );
if (name) {
diff --git a/t/t6500-gc.sh b/t/t6500-gc.sh
index d5255dd576..c42f60bc5b 100755
--- a/t/t6500-gc.sh
+++ 

Re: [PATCH] completion: reduce overhead of clearing cached --options

2018-04-14 Thread Jakub Narebski
SZEDER Gábor  writes:
> On Fri, Apr 13, 2018 at 11:44 PM, Jakub Narebski  wrote:
>> SZEDER Gábor  writes:
>>>
>>> In Bash we can do better: run the 'compgen -v __gitcomp_builtin_'
>>> builtin command, which lists the same variables, but without a
>>> pipeline and 'sed' it can do so with lower overhead.
>>
>> What about ZSH?
>
> Nothing, ZSH is unaffected by this patch.

All right, so for ZSH we would need LC_ALL=C trick, or come with some
equivalent of 'compgen -v __gitcomp_builtin_' for this shell.

Good patch, though it does not solve whole of the problem.

Best,
-- 
Jakub Narębski


Re: Bug: rebase -i creates committer time inversions on 'reword'

2018-04-14 Thread Johannes Schindelin
Hi,

On Sat, 14 Apr 2018, Phillip Wood wrote:

> On 13/04/18 17:52, Johannes Sixt wrote:
> > 
> > I just noticed that all commits in a 70-commit branch have the same
> > committer timestamp. This is very unusual on Windows, where rebase -i of
> > such a long branch takes more than one second (but not more than 3 or
> > so thanks to the builtin nature of the command!).
> > 
> > And, in fact, if you mark some commits with 'reword' to delay the quick
> > processing of the patches, then the reworded commits have later time
> > stamps, but subsequent not reworded commits receive the earlier time
> > stamp. This is clearly not intended.
> 
> Oh dear, I think this is probably due to my series making rebase commit
> in-process when the commit message isn't being edited. I didn't realize
> that git cached the commit date rather than using the current time when
> calling commit_tree_extended(). I'll take a look at it next week.

Thanks.

However, a quick lock at `git log @{u}.. --format=%ct` in my
`sequencer-shears` branch thicket (which I rebase frequently on top of
upstream's `master` using the last known-good `rebase-merges` sub-branch)
shows that the commits have different-enough commit timestamps. (It is
satisfying to see that multiple commits were made during the same second,
of course.)

So while I cannot find anything in the code that disagrees with Hannes'
assessment, it looks on the surface as if I did not encounter the bug
here.

Curious.

FWIW I agree with Hannes' patch.

> I think 'git am' probably gives all patches the same commit time as well
> if the commit date is cached though it wont suffer from the time-travel
> problem.

I thought that `git am` was the subject of such a complaint recently, but
I thought that had been resolved? Apparently I misremember...

Ciao,
Dscho


Re: [RFC PATCH] tests: fix PATH for GIT_TEST_INSTALLED tests

2018-04-14 Thread Johannes Schindelin
Hi Guillaume,

On Sat, 14 Apr 2018, Guillaume Maudoux wrote:

> From: Guillaume Maudoux 
> 
> When running tests on an existing git installation with
> GIT_TEST_INSTALLED (as described in t/README), the test helpers are
> missing in the PATH.
> 
> This fixes the test suite in a way that allows all the tests to pass.
> 
> Signed-off-by: Guillaume Maudoux 
> ---
> 
> This is more a bug report than a real patch. The issue is described
> above and this patch does solve it. I however think that someone with
> more knowledge should refactor all that chunck of code that was last
> changed in 2010.
> 
> In particular, it seems that the GIT_TEST_INSTALLED path does not use
> bin-wrappers at all. This may imply that --with-dashes also breaks
> tests.
> 
>  t/test-lib.sh | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git t/test-lib.sh t/test-lib.sh
> index 7740d511d..0d51261f7 100644
> --- t/test-lib.sh
> +++ t/test-lib.sh
> @@ -923,7 +923,7 @@ elif test -n "$GIT_TEST_INSTALLED"
>  then
>   GIT_EXEC_PATH=$($GIT_TEST_INSTALLED/git --exec-path)  ||
>   error "Cannot run git from $GIT_TEST_INSTALLED."
> - PATH=$GIT_TEST_INSTALLED:$GIT_BUILD_DIR:$PATH
> + PATH=$GIT_TEST_INSTALLED:$GIT_BUILD_DIR/t/helper:$GIT_BUILD_DIR:$PATH
>   GIT_EXEC_PATH=${GIT_TEST_EXEC_PATH:-$GIT_EXEC_PATH}
>  else # normal case, use ../bin-wrappers only unless $with_dashes:
>   git_bin_dir="$GIT_BUILD_DIR/bin-wrappers"

This is essentially identical to what we have in

http://github.com/git-for-windows/git/commit/e408b7517d

So: ACK.

You might also want to go a bit further and let the test suite run with
GIT_TEST_INSTALLED when Git has not actually be built, but only the test
helpers. I started something along those lines here:

http://github.com/git-for-windows/git/commit/a80f047abc5

I always meant to come back to polish those patches and submit them to the
Git mailing list, so: thank you for getting the ball rolling.

FWIW my use case is that I want to test a "MinGit" package, i.e. a subset
of Git for Windows intended to cater to third-party applications requiring
Git functionality (but not requiring any interactive parts of it).

What is your use case?

Ciao,
Johannes


Re: Bug: rebase -i creates committer time inversions on 'reword'

2018-04-14 Thread Phillip Wood

On 13/04/18 17:52, Johannes Sixt wrote:
> 
> I just noticed that all commits in a 70-commit branch have the same
> committer timestamp. This is very unusual on Windows, where rebase -i of
> such a long branch takes more than one second (but not more than 3 or
> so thanks to the builtin nature of the command!).
> 
> And, in fact, if you mark some commits with 'reword' to delay the quick
> processing of the patches, then the reworded commits have later time
> stamps, but subsequent not reworded commits receive the earlier time
> stamp. This is clearly not intended.

Oh dear, I think this is probably due to my series making rebase commit
in-process when the commit message isn't being edited. I didn't realize
that git cached the commit date rather than using the current time when
calling commit_tree_extended(). I'll take a look at it next week. I
think 'git am' probably gives all patches the same commit time as well
if the commit date is cached though it wont suffer from the time-travel
problem.

Best Wishes

Phillip

> Perhaps something like this below is needed.
> 
> diff --git a/ident.c b/ident.c
> index 327abe557f..2c6bff7b9d 100644
> --- a/ident.c
> +++ b/ident.c
> @@ -178,8 +178,8 @@ const char *ident_default_email(void)
>  
>  static const char *ident_default_date(void)
>  {
> - if (!git_default_date.len)
> - datestamp(_default_date);
> + strbuf_reset(_default_date);
> + datestamp(_default_date);
>   return git_default_date.buf;
>  }
>  
> 



Gruß

2018-04-14 Thread Vladimir Volf
Gruß

In einer kurzen Einleitung bin ich der Anwalt Vladimir Volf aus
zamberk Tschechische Republik, aber jetzt lebe ich in London, ich habe
dir eine E-Mail über meinen verstorbenen Klienten geschickt, aber ich
habe keine Antwort von dir erhalten, der Verstorbene ist ein Bürger
Von deinem Land mit dem gleichen Nachnamen mit dir, er ist ein
Exporteur von Gold hier in London.

Er starb vor ein paar Jahren mit der Familie und verließ seine
Firma,und riesige Geldbeträge in UBS Investment Bank hier in London
hinterlegt, Ich bin sein persönlicher Anwalt und ich brauche deine
Mitarbeit, damit wir das Geld bekommen können. Das Geld wird in Höhe
von £ 5,7 Millionen geschätzt, bevor die Regierung das Geld endgültig
beschlagnahmt hat, aber ich sage dir weitere Details darüber, wenn ich
von dir höre.