Re: [PATCH v4 3/7] gc: add --keep-largest-pack option

2018-03-25 Thread Junio C Hamano
Duy Nguyen  writes:

>> This conflicts with master because of your own 7e1eeaa431 ("completion:
>> use __gitcomp_builtin in _git_gc", 2018-02-09). I pushed out a
>> avar-pclouds/gc-auto-keep-base-pack branch to github.com/avar/git which
>> resolves it as:
>>
>> @@ -365,6 +393,8 @@ int cmd_gc(int argc, const char **argv, const char 
>> *prefix)
>> OPT_BOOL_F(0, "force", ,
>>N_("force running gc even if there may be 
>> another gc running"),
>>PARSE_OPT_NOCOMPLETE),
>> +   OPT_BOOL(0, "keep-largest-pack", _base_pack,
>> +N_("repack all other packs except the largest 
>> pack")),
>> OPT_END()
>> };
>>
>> I assume that's the intention here.
>
> Yeah, I want  to keep the same base for easy interdiff. There are
> worse conflicts are with the other series I'm helping Stefan.

Right now there are a couple of topics that wants to touch options
array and also the builtins command table, which are both good
examples of "central registry" that everybody needs to touch and are
bound to become a source of conflict.  Until I scream (and I try
hard not to have to), contributors should not have to worry too much
about causing conflicts---being traffic cop and directing orderly
integration of these topics is what the maintainer(s) do.

Thanks.


Re: [PATCH v4 3/7] gc: add --keep-largest-pack option

2018-03-24 Thread Duy Nguyen
On Sat, Mar 24, 2018 at 10:36 AM, Ævar Arnfjörð Bjarmason
 wrote:
>
> On Sat, Mar 24 2018, Nguyễn Thái Ngọc Duy wrote:
>
>>   struct option builtin_gc_options[] = {
>>   OPT__QUIET(, N_("suppress progress reporting")),
>> @@ -362,6 +390,8 @@ int cmd_gc(int argc, const char **argv, const char 
>> *prefix)
>>   OPT_BOOL(0, "aggressive", , N_("be more thorough 
>> (increased runtime)")),
>>   OPT_BOOL(0, "auto", _gc, N_("enable auto-gc mode")),
>>   OPT_BOOL(0, "force", , N_("force running gc even if 
>> there may be another gc running")),
>> + OPT_BOOL(0, "keep-largest-pack", _base_pack,
>> +  N_("repack all other packs except the largest pack")),
>>   OPT_END()
>>   };
>
> This conflicts with master because of your own 7e1eeaa431 ("completion:
> use __gitcomp_builtin in _git_gc", 2018-02-09). I pushed out a
> avar-pclouds/gc-auto-keep-base-pack branch to github.com/avar/git which
> resolves it as:
>
> @@ -365,6 +393,8 @@ int cmd_gc(int argc, const char **argv, const char 
> *prefix)
> OPT_BOOL_F(0, "force", ,
>N_("force running gc even if there may be 
> another gc running"),
>PARSE_OPT_NOCOMPLETE),
> +   OPT_BOOL(0, "keep-largest-pack", _base_pack,
> +N_("repack all other packs except the largest 
> pack")),
> OPT_END()
> };
>
> I assume that's the intention here.

Yeah, I want  to keep the same base for easy interdiff. There are
worse conflicts are with the other series I'm helping Stefan.
-- 
Duy


Re: [PATCH v4 3/7] gc: add --keep-largest-pack option

2018-03-24 Thread Ævar Arnfjörð Bjarmason

On Sat, Mar 24 2018, Nguyễn Thái Ngọc Duy wrote:

>   struct option builtin_gc_options[] = {
>   OPT__QUIET(, N_("suppress progress reporting")),
> @@ -362,6 +390,8 @@ int cmd_gc(int argc, const char **argv, const char 
> *prefix)
>   OPT_BOOL(0, "aggressive", , N_("be more thorough 
> (increased runtime)")),
>   OPT_BOOL(0, "auto", _gc, N_("enable auto-gc mode")),
>   OPT_BOOL(0, "force", , N_("force running gc even if there 
> may be another gc running")),
> + OPT_BOOL(0, "keep-largest-pack", _base_pack,
> +  N_("repack all other packs except the largest pack")),
>   OPT_END()
>   };

This conflicts with master because of your own 7e1eeaa431 ("completion:
use __gitcomp_builtin in _git_gc", 2018-02-09). I pushed out a
avar-pclouds/gc-auto-keep-base-pack branch to github.com/avar/git which
resolves it as:

@@ -365,6 +393,8 @@ int cmd_gc(int argc, const char **argv, const char 
*prefix)
OPT_BOOL_F(0, "force", ,
   N_("force running gc even if there may be 
another gc running"),
   PARSE_OPT_NOCOMPLETE),
+   OPT_BOOL(0, "keep-largest-pack", _base_pack,
+N_("repack all other packs except the largest 
pack")),
OPT_END()
};

I assume that's the intention here.


[PATCH v4 3/7] gc: add --keep-largest-pack option

2018-03-24 Thread Nguyễn Thái Ngọc Duy
This adds a new repack mode that combines everything into a secondary
pack, leaving the largest pack alone.

This could help reduce memory pressure. On linux-2.6.git, valgrind
massif reports 1.6GB heap in "pack all" case, and 535MB in "pack
all except the base pack" case. We save roughly 1GB memory by
excluding the base pack.

This should also lower I/O because we don't have to rewrite a giant
pack every time (e.g. for linux-2.6.git that's a 1.4GB pack file)..

PS. The use of string_list here seems overkill, but we'll need it in
the next patch...

Signed-off-by: Nguyễn Thái Ngọc Duy 
---
 Documentation/git-gc.txt |  6 -
 builtin/gc.c | 47 
 t/t6500-gc.sh| 25 +
 3 files changed, 73 insertions(+), 5 deletions(-)

diff --git a/Documentation/git-gc.txt b/Documentation/git-gc.txt
index 571b5a7e3c..bf81b8de30 100644
--- a/Documentation/git-gc.txt
+++ b/Documentation/git-gc.txt
@@ -9,7 +9,7 @@ git-gc - Cleanup unnecessary files and optimize the local 
repository
 SYNOPSIS
 
 [verse]
-'git gc' [--aggressive] [--auto] [--quiet] [--prune= | --no-prune] 
[--force]
+'git gc' [--aggressive] [--auto] [--quiet] [--prune= | --no-prune] 
[--force] [--keep-largest-pack]
 
 DESCRIPTION
 ---
@@ -78,6 +78,10 @@ automatic consolidation of packs.
Force `git gc` to run even if there may be another `git gc`
instance running on this repository.
 
+--keep-largest-pack::
+   All packs except the largest pack and those marked with a
+   `.keep` files are consolidated into a single pack.
+
 Configuration
 -
 
diff --git a/builtin/gc.c b/builtin/gc.c
index 77fa720bd0..9a09cf53b0 100644
--- a/builtin/gc.c
+++ b/builtin/gc.c
@@ -164,6 +164,24 @@ static int too_many_loose_objects(void)
return needed;
 }
 
+static void find_base_packs(struct string_list *packs)
+{
+   struct packed_git *p, *base = NULL;
+
+   prepare_packed_git();
+
+   for (p = packed_git; p; p = p->next) {
+   if (!p->pack_local)
+   continue;
+   if (!base || base->pack_size < p->pack_size) {
+   base = p;
+   }
+   }
+
+   if (base)
+   string_list_append(packs, base->pack_name);
+}
+
 static int too_many_packs(void)
 {
struct packed_git *p;
@@ -187,7 +205,13 @@ static int too_many_packs(void)
return gc_auto_pack_limit < cnt;
 }
 
-static void add_repack_all_option(void)
+static int keep_one_pack(struct string_list_item *item, void *data)
+{
+   argv_array_pushf(, "--keep-pack=%s", basename(item->string));
+   return 0;
+}
+
+static void add_repack_all_option(struct string_list *keep_pack)
 {
if (prune_expire && !strcmp(prune_expire, "now"))
argv_array_push(, "-a");
@@ -196,6 +220,9 @@ static void add_repack_all_option(void)
if (prune_expire)
argv_array_pushf(, "--unpack-unreachable=%s", 
prune_expire);
}
+
+   if (keep_pack)
+   for_each_string_list(keep_pack, keep_one_pack, NULL);
 }
 
 static void add_repack_incremental_option(void)
@@ -219,7 +246,7 @@ static int need_to_gc(void)
 * there is no need.
 */
if (too_many_packs())
-   add_repack_all_option();
+   add_repack_all_option(NULL);
else if (too_many_loose_objects())
add_repack_incremental_option();
else
@@ -353,6 +380,7 @@ int cmd_gc(int argc, const char **argv, const char *prefix)
const char *name;
pid_t pid;
int daemonized = 0;
+   int keep_base_pack = -1;
 
struct option builtin_gc_options[] = {
OPT__QUIET(, N_("suppress progress reporting")),
@@ -362,6 +390,8 @@ int cmd_gc(int argc, const char **argv, const char *prefix)
OPT_BOOL(0, "aggressive", , N_("be more thorough 
(increased runtime)")),
OPT_BOOL(0, "auto", _gc, N_("enable auto-gc mode")),
OPT_BOOL(0, "force", , N_("force running gc even if there 
may be another gc running")),
+   OPT_BOOL(0, "keep-largest-pack", _base_pack,
+N_("repack all other packs except the largest pack")),
OPT_END()
};
 
@@ -427,8 +457,17 @@ int cmd_gc(int argc, const char **argv, const char *prefix)
 */
daemonized = !daemonize();
}
-   } else
-   add_repack_all_option();
+   } else {
+   struct string_list keep_pack = STRING_LIST_INIT_NODUP;
+
+   if (keep_base_pack != -1) {
+   if (keep_base_pack)
+   find_base_packs(_pack);
+   }
+
+   add_repack_all_option(_pack);
+   string_list_clear(_pack, 0);
+   }
 
name = lock_repo_for_gc(force, );
if (name) {

[PATCH v4 3/7] gc: add --keep-largest-pack option

2018-03-24 Thread Nguyễn Thái Ngọc Duy
This adds a new repack mode that combines everything into a secondary
pack, leaving the largest pack alone.

This could help reduce memory pressure. On linux-2.6.git, valgrind
massif reports 1.6GB heap in "pack all" case, and 535MB in "pack
all except the base pack" case. We save roughly 1GB memory by
excluding the base pack.

This should also lower I/O because we don't have to rewrite a giant
pack every time (e.g. for linux-2.6.git that's a 1.4GB pack file)..

PS. The use of string_list here seems overkill, but we'll need it in
the next patch...

Signed-off-by: Nguyễn Thái Ngọc Duy 
---
 Documentation/git-gc.txt |  6 -
 builtin/gc.c | 47 
 t/t6500-gc.sh| 25 +
 3 files changed, 73 insertions(+), 5 deletions(-)

diff --git a/Documentation/git-gc.txt b/Documentation/git-gc.txt
index 571b5a7e3c..bf81b8de30 100644
--- a/Documentation/git-gc.txt
+++ b/Documentation/git-gc.txt
@@ -9,7 +9,7 @@ git-gc - Cleanup unnecessary files and optimize the local 
repository
 SYNOPSIS
 
 [verse]
-'git gc' [--aggressive] [--auto] [--quiet] [--prune= | --no-prune] 
[--force]
+'git gc' [--aggressive] [--auto] [--quiet] [--prune= | --no-prune] 
[--force] [--keep-largest-pack]
 
 DESCRIPTION
 ---
@@ -78,6 +78,10 @@ automatic consolidation of packs.
Force `git gc` to run even if there may be another `git gc`
instance running on this repository.
 
+--keep-largest-pack::
+   All packs except the largest pack and those marked with a
+   `.keep` files are consolidated into a single pack.
+
 Configuration
 -
 
diff --git a/builtin/gc.c b/builtin/gc.c
index 77fa720bd0..9a09cf53b0 100644
--- a/builtin/gc.c
+++ b/builtin/gc.c
@@ -164,6 +164,24 @@ static int too_many_loose_objects(void)
return needed;
 }
 
+static void find_base_packs(struct string_list *packs)
+{
+   struct packed_git *p, *base = NULL;
+
+   prepare_packed_git();
+
+   for (p = packed_git; p; p = p->next) {
+   if (!p->pack_local)
+   continue;
+   if (!base || base->pack_size < p->pack_size) {
+   base = p;
+   }
+   }
+
+   if (base)
+   string_list_append(packs, base->pack_name);
+}
+
 static int too_many_packs(void)
 {
struct packed_git *p;
@@ -187,7 +205,13 @@ static int too_many_packs(void)
return gc_auto_pack_limit < cnt;
 }
 
-static void add_repack_all_option(void)
+static int keep_one_pack(struct string_list_item *item, void *data)
+{
+   argv_array_pushf(, "--keep-pack=%s", basename(item->string));
+   return 0;
+}
+
+static void add_repack_all_option(struct string_list *keep_pack)
 {
if (prune_expire && !strcmp(prune_expire, "now"))
argv_array_push(, "-a");
@@ -196,6 +220,9 @@ static void add_repack_all_option(void)
if (prune_expire)
argv_array_pushf(, "--unpack-unreachable=%s", 
prune_expire);
}
+
+   if (keep_pack)
+   for_each_string_list(keep_pack, keep_one_pack, NULL);
 }
 
 static void add_repack_incremental_option(void)
@@ -219,7 +246,7 @@ static int need_to_gc(void)
 * there is no need.
 */
if (too_many_packs())
-   add_repack_all_option();
+   add_repack_all_option(NULL);
else if (too_many_loose_objects())
add_repack_incremental_option();
else
@@ -353,6 +380,7 @@ int cmd_gc(int argc, const char **argv, const char *prefix)
const char *name;
pid_t pid;
int daemonized = 0;
+   int keep_base_pack = -1;
 
struct option builtin_gc_options[] = {
OPT__QUIET(, N_("suppress progress reporting")),
@@ -362,6 +390,8 @@ int cmd_gc(int argc, const char **argv, const char *prefix)
OPT_BOOL(0, "aggressive", , N_("be more thorough 
(increased runtime)")),
OPT_BOOL(0, "auto", _gc, N_("enable auto-gc mode")),
OPT_BOOL(0, "force", , N_("force running gc even if there 
may be another gc running")),
+   OPT_BOOL(0, "keep-largest-pack", _base_pack,
+N_("repack all other packs except the largest pack")),
OPT_END()
};
 
@@ -427,8 +457,17 @@ int cmd_gc(int argc, const char **argv, const char *prefix)
 */
daemonized = !daemonize();
}
-   } else
-   add_repack_all_option();
+   } else {
+   struct string_list keep_pack = STRING_LIST_INIT_NODUP;
+
+   if (keep_base_pack != -1) {
+   if (keep_base_pack)
+   find_base_packs(_pack);
+   }
+
+   add_repack_all_option(_pack);
+   string_list_clear(_pack, 0);
+   }
 
name = lock_repo_for_gc(force, );
if (name) {