On Mon, Aug 27 2018, Ævar Arnfjörð Bjarmason wrote:
> From: René Scharfe
>
> Object IDs to skip are stored in a shared static oid_array. Lookups do
> a binary search on the sorted array. The code checks if the object IDs
> are already in the correct order while loading and skips sorting in that
> case. Lookups are done before reporting a (non-fatal) corruption and
> before checking .gitmodules files.
>
> Simplify the code by using an oidset instead. Memory usage is a bit
> higher, but we don't need to worry about any sort order anymore. Embed
> the oidset into struct fsck_options to make its ownership clear (no
> hidden sharing) and avoid unnecessary pointer indirection.
>
> Performance on repositories with a low number of reported issues and
> .gitmodules files (i.e. the usual case) won't be affected much. The
> oidset should be a bit quicker with higher numbers of bad objects in
> the skipList.
I didn't change this commit message at all, but FWIW this still has me
somewhat confused. What is the interaction with .gitmodules being
described here? You also mentioned it in
https://public-inbox.org/git/113aa2d7-6f66-8d03-dda4-7337cda9b...@web.de/
(but I don't get that either)
Does that just mean that when cloning with --recursive with
transfer.fsckObjects=true we'll re-read the file for each "clone"
invocation, both for the main project and everything listed in
.gitmodules?
If so, I think something like this commit message would be clearer:
fsck: use oidset instead of oid_array for skipList
Change the implementation of the skipList feature to use oidset
instead of oid_array to store SHA-1s for later lookup.
This list is parsed once on startup by fsck, fetch-pack or
receive-pack depending on the *.skipList config in use, so for fetch
it's currently re-parsed for each submodule fetch.
Memory usage is a bit higher, but we don't need to keep track of the
sort order anymore. Embed the oidset into struct fsck_options to
make its ownership clear (no hidden sharing) and avoid unnecessary
pointer indirection.
Then let's just attach the test program you wrote in
https://public-inbox.org/git/113aa2d7-6f66-8d03-dda4-7337cda9b...@web.de/
and note the results and let them speak for themselves.
I can do all that if that seems fine to you. I also notice that the test
case only sets up a case where all the items on the skip list are bad
commits in the repo, it would be interesting to test with a few needles
in a large haystack (I can modify it to do that...).
> Helped-by: Ævar Arnfjörð Bjarmason
> Signed-off-by: Rene Scharfe
> Signed-off-by: Ævar Arnfjörð Bjarmason
> ---
> Documentation/config.txt | 11 ++-
> fsck.c | 23 ++-
> fsck.h | 8 +---
> 3 files changed, 13 insertions(+), 29 deletions(-)
>
> diff --git a/Documentation/config.txt b/Documentation/config.txt
> index a8dfafa61d..3d0556e85d 100644
> --- a/Documentation/config.txt
> +++ b/Documentation/config.txt
> @@ -1729,11 +1729,12 @@ all three of them they must all set to the same
> values.
> +
> Older versions of Git (before 2.20) documented that the object names
> list should be sorted. This was never a requirement, the object names
> -can appear in any order, but when reading the list we track whether
> -the list is sorted for the purposes of an internal binary search
> -implementation, which can save itself some work with an already sorted
> -list. Unless you have a humongous list there's no reason to go out of
> -your way to pre-sort the list.
> +could appear in any order, but when reading the list we tracked whether
> +the list was sorted for the purposes of an internal binary search
> +implementation, which could save itself some work with an already sorted
> +list. Unless you had a humongous list there was no reason to go out of
> +your way to pre-sort the list. After Git version 2.20 a hash implementation
> +is used instead, so there's now no reason to pre-sort the list.
>
> gc.aggressiveDepth::
> The depth parameter used in the delta compression
> diff --git a/fsck.c b/fsck.c
> index 972a26b9ba..4c643f1d40 100644
> --- a/fsck.c
> +++ b/fsck.c
> @@ -10,7 +10,6 @@
> #include "fsck.h"
> #include "refs.h"
> #include "utf8.h"
> -#include "sha1-array.h"
> #include "decorate.h"
> #include "oidset.h"
> #include "packfile.h"
> @@ -182,19 +181,10 @@ static int fsck_msg_type(enum fsck_msg_id msg_id,
>
> static void init_skiplist(struct fsck_options *options, const char *path)
> {
> - static struct oid_array skiplist = OID_ARRAY_INIT;
> - int sorted;
> FILE *fp;
> struct strbuf sb = STRBUF_INIT;
> struct object_id oid;
>
> - if (options->skiplist)
> - sorted = options->skiplist->sorted;
> - else {
> - sorted = 1;
> - options->skiplist =
> - }
> -
> fp = fopen(path, "r");
> if (!fp)
> die("Could not open skip list: %s",