[PATCH 3/4] archive: don't queue excluded directories
Reject directories with the attribute export-ignore already while queuing them. This prevents read_tree_recursive() from descending into them and this avoids write_archive_entry() rejecting them later on, which queue_or_write_archive_entry() is not prepared for. Borrow the existing strbuf to build the full path to avoid string copies and extra allocations; just make sure we restore the original value before moving on. Keep checking any other attributes in write_archive_entry() as before, but avoid checking them twice. Signed-off-by: Rene Scharfe--- archive.c | 32 +--- t/t5001-archive-attr.sh | 4 ++-- 2 files changed, 27 insertions(+), 9 deletions(-) diff --git a/archive.c b/archive.c index 8e5f632912..1ab8d3a1d7 100644 --- a/archive.c +++ b/archive.c @@ -121,17 +121,21 @@ static int check_attr_export_subst(const struct attr_check *check) return check && ATTR_TRUE(check->items[1].value); } +static int should_queue_directories(const struct archiver_args *args) +{ + return args->pathspec.has_wildcard; +} + static int write_archive_entry(const unsigned char *sha1, const char *base, int baselen, const char *filename, unsigned mode, int stage, void *context) { static struct strbuf path = STRBUF_INIT; - const struct attr_check *check; struct archiver_context *c = context; struct archiver_args *args = c->args; write_archive_entry_fn_t write_entry = c->write_entry; - const char *path_without_prefix; int err; + const char *path_without_prefix; args->convert = 0; strbuf_reset(); @@ -143,10 +147,13 @@ static int write_archive_entry(const unsigned char *sha1, const char *base, strbuf_addch(, '/'); path_without_prefix = path.buf + args->baselen; - check = get_archive_attrs(path_without_prefix); - if (check_attr_export_ignore(check)) - return 0; - args->convert = check_attr_export_subst(check); + if (!S_ISDIR(mode) || !should_queue_directories(args)) { + const struct attr_check *check; + check = get_archive_attrs(path_without_prefix); + if (check_attr_export_ignore(check)) + return 0; + args->convert = check_attr_export_subst(check); + } if (S_ISDIR(mode) || S_ISGITLINK(mode)) { if (args->verbose) @@ -219,6 +226,17 @@ static int queue_or_write_archive_entry(const unsigned char *sha1, } if (S_ISDIR(mode)) { + size_t baselen = base->len; + const struct attr_check *check; + + /* Borrow base, but restore its original value when done. */ + strbuf_addstr(base, filename); + strbuf_addch(base, '/'); + check = get_archive_attrs(base->buf); + strbuf_setlen(base, baselen); + + if (check_attr_export_ignore(check)) + return 0; queue_directory(sha1, base, filename, mode, stage, c); return READ_TREE_RECURSIVE; @@ -272,7 +290,7 @@ int write_archive_entries(struct archiver_args *args, } err = read_tree_recursive(args->tree, "", 0, 0, >pathspec, - args->pathspec.has_wildcard ? + should_queue_directories(args) ? queue_or_write_archive_entry : write_archive_entry_buf, ); diff --git a/t/t5001-archive-attr.sh b/t/t5001-archive-attr.sh index 063622bc71..897f6f06d5 100755 --- a/t/t5001-archive-attr.sh +++ b/t/t5001-archive-attr.sh @@ -76,7 +76,7 @@ test_expect_existsarchive-pathspec/ignored-by-worktree test_expect_missingarchive-pathspec/excluded-by-pathspec.d failure test_expect_missingarchive-pathspec/excluded-by-pathspec.d/file -test_expect_failure 'git archive with wildcard pathspec' ' +test_expect_success 'git archive with wildcard pathspec' ' git archive HEAD ":!excluded-by-p*" >archive-pathspec-wildcard.tar && extract_tar_to_dir archive-pathspec-wildcard ' @@ -85,7 +85,7 @@ test_expect_missing archive-pathspec-wildcard/ignored test_expect_missingarchive-pathspec-wildcard/ignored-by-tree test_expect_missingarchive-pathspec-wildcard/ignored-by-tree.d test_expect_missingarchive-pathspec-wildcard/ignored-by-tree.d/file -test_expect_exists archive-pathspec-wildcard/ignored-by-worktree failure +test_expect_exists archive-pathspec-wildcard/ignored-by-worktree test_expect_missingarchive-pathspec-wildcard/excluded-by-pathspec.d test_expect_missingarchive-pathspec-wildcard/excluded-by-pathspec.d/file -- 2.14.1
[PATCH 4/4] archive: queue directories for all types of pathspecs
When read_tree_recursive() encounters a directory excluded by a pathspec then it enters it anyway because it might contain included entries. It calls the callback function before it is able to decide if the directory is actually needed. For that reason git archive adds directories to a queue and writes entries for them only when it encounters the first child item -- but only if pathspecs with wildcards are used. Do the same for literal pathspecs as well, as the reasoning above applies to them, too. This prevents git archive from writing entries for excluded directories. Signed-off-by: Rene Scharfe--- archive.c | 2 +- t/t5001-archive-attr.sh | 2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/archive.c b/archive.c index 1ab8d3a1d7..174c0555b6 100644 --- a/archive.c +++ b/archive.c @@ -123,7 +123,7 @@ static int check_attr_export_subst(const struct attr_check *check) static int should_queue_directories(const struct archiver_args *args) { - return args->pathspec.has_wildcard; + return args->pathspec.nr; } static int write_archive_entry(const unsigned char *sha1, const char *base, diff --git a/t/t5001-archive-attr.sh b/t/t5001-archive-attr.sh index 897f6f06d5..e9aa97117a 100755 --- a/t/t5001-archive-attr.sh +++ b/t/t5001-archive-attr.sh @@ -73,7 +73,7 @@ test_expect_missing archive-pathspec/ignored-by-tree test_expect_missingarchive-pathspec/ignored-by-tree.d test_expect_missingarchive-pathspec/ignored-by-tree.d/file test_expect_exists archive-pathspec/ignored-by-worktree -test_expect_missingarchive-pathspec/excluded-by-pathspec.d failure +test_expect_missingarchive-pathspec/excluded-by-pathspec.d test_expect_missingarchive-pathspec/excluded-by-pathspec.d/file test_expect_success 'git archive with wildcard pathspec' ' -- 2.14.1
[PATCH 2/4] archive: factor out helper functions for handling attributes
Add helpers for accessing attributes that encapsulate the details of how to retrieve their values. Signed-off-by: Rene Scharfe--- archive.c | 31 +++ 1 file changed, 23 insertions(+), 8 deletions(-) diff --git a/archive.c b/archive.c index 557dd2db85..8e5f632912 100644 --- a/archive.c +++ b/archive.c @@ -103,12 +103,30 @@ struct archiver_context { struct directory *bottom; }; +static const struct attr_check *get_archive_attrs(const char *path) +{ + static struct attr_check *check; + if (!check) + check = attr_check_initl("export-ignore", "export-subst", NULL); + return git_check_attr(path, check) ? NULL : check; +} + +static int check_attr_export_ignore(const struct attr_check *check) +{ + return check && ATTR_TRUE(check->items[0].value); +} + +static int check_attr_export_subst(const struct attr_check *check) +{ + return check && ATTR_TRUE(check->items[1].value); +} + static int write_archive_entry(const unsigned char *sha1, const char *base, int baselen, const char *filename, unsigned mode, int stage, void *context) { static struct strbuf path = STRBUF_INIT; - static struct attr_check *check; + const struct attr_check *check; struct archiver_context *c = context; struct archiver_args *args = c->args; write_archive_entry_fn_t write_entry = c->write_entry; @@ -125,13 +143,10 @@ static int write_archive_entry(const unsigned char *sha1, const char *base, strbuf_addch(, '/'); path_without_prefix = path.buf + args->baselen; - if (!check) - check = attr_check_initl("export-ignore", "export-subst", NULL); - if (!git_check_attr(path_without_prefix, check)) { - if (ATTR_TRUE(check->items[0].value)) - return 0; - args->convert = ATTR_TRUE(check->items[1].value); - } + check = get_archive_attrs(path_without_prefix); + if (check_attr_export_ignore(check)) + return 0; + args->convert = check_attr_export_subst(check); if (S_ISDIR(mode) || S_ISGITLINK(mode)) { if (args->verbose) -- 2.14.1
[PATCH 1/4] t5001: add tests for export-ignore attributes and exclude pathspecs
Demonstrate mishandling of the attribute export-ignore by git archive when used together with pathspecs. Wildcard pathspecs can even cause it to abort. And a directory excluded without a wildcard is still included as an empty folder in the archive. Test-case-by: David AdamSigned-off-by: Rene Scharfe --- t/t5001-archive-attr.sh | 47 --- 1 file changed, 44 insertions(+), 3 deletions(-) diff --git a/t/t5001-archive-attr.sh b/t/t5001-archive-attr.sh index b04d955bfa..063622bc71 100755 --- a/t/t5001-archive-attr.sh +++ b/t/t5001-archive-attr.sh @@ -7,11 +7,15 @@ test_description='git archive attribute tests' SUBSTFORMAT='%H (%h)%n' test_expect_exists() { - test_expect_success " $1 exists" "test -e $1" + test_expect_${2:-success} " $1 exists" "test -e $1" } test_expect_missing() { - test_expect_success " $1 does not exist" "test ! -e $1" + test_expect_${2:-success} " $1 does not exist" "test ! -e $1" +} + +extract_tar_to_dir () { + (mkdir "$1" && cd "$1" && "$TAR" xf -) <"$1.tar" } test_expect_success 'setup' ' @@ -21,12 +25,19 @@ test_expect_success 'setup' ' echo ignored by tree >ignored-by-tree && echo ignored-by-tree export-ignore >.gitattributes && - git add ignored-by-tree .gitattributes && + mkdir ignored-by-tree.d && + >ignored-by-tree.d/file && + echo ignored-by-tree.d export-ignore >>.gitattributes && + git add ignored-by-tree ignored-by-tree.d .gitattributes && echo ignored by worktree >ignored-by-worktree && echo ignored-by-worktree export-ignore >.gitattributes && git add ignored-by-worktree && + mkdir excluded-by-pathspec.d && + >excluded-by-pathspec.d/file && + git add excluded-by-pathspec.d && + printf "A\$Format:%s\$O" "$SUBSTFORMAT" >nosubstfile && printf "A\$Format:%s\$O" "$SUBSTFORMAT" >substfile1 && printf "A not substituted O" >substfile2 && @@ -46,7 +57,37 @@ test_expect_success 'git archive' ' test_expect_missingarchive/ignored test_expect_missingarchive/ignored-by-tree +test_expect_missingarchive/ignored-by-tree.d +test_expect_missingarchive/ignored-by-tree.d/file test_expect_exists archive/ignored-by-worktree +test_expect_exists archive/excluded-by-pathspec.d +test_expect_exists archive/excluded-by-pathspec.d/file + +test_expect_success 'git archive with pathspec' ' + git archive HEAD ":!excluded-by-pathspec.d" >archive-pathspec.tar && + extract_tar_to_dir archive-pathspec +' + +test_expect_missingarchive-pathspec/ignored +test_expect_missingarchive-pathspec/ignored-by-tree +test_expect_missingarchive-pathspec/ignored-by-tree.d +test_expect_missingarchive-pathspec/ignored-by-tree.d/file +test_expect_exists archive-pathspec/ignored-by-worktree +test_expect_missingarchive-pathspec/excluded-by-pathspec.d failure +test_expect_missingarchive-pathspec/excluded-by-pathspec.d/file + +test_expect_failure 'git archive with wildcard pathspec' ' + git archive HEAD ":!excluded-by-p*" >archive-pathspec-wildcard.tar && + extract_tar_to_dir archive-pathspec-wildcard +' + +test_expect_missingarchive-pathspec-wildcard/ignored +test_expect_missingarchive-pathspec-wildcard/ignored-by-tree +test_expect_missingarchive-pathspec-wildcard/ignored-by-tree.d +test_expect_missingarchive-pathspec-wildcard/ignored-by-tree.d/file +test_expect_exists archive-pathspec-wildcard/ignored-by-worktree failure +test_expect_missingarchive-pathspec-wildcard/excluded-by-pathspec.d +test_expect_missingarchive-pathspec-wildcard/excluded-by-pathspec.d/file test_expect_success 'git archive with worktree attributes' ' git archive --worktree-attributes HEAD >worktree.tar && -- 2.14.1
Re: Bug?: git archive exclude pathspec and gitattributes export-ignore
Am 14.08.2017 um 18:43 schrieb René Scharfe: > The real solution is probably to teach tree-walk.c::do_match() how to > handle attributes and then inject ":(attr:-export-ignore)" as a default > internal pathspec in archive.c::parse_pathspec_arg() instead of handling > it in archive.c::write_archive_entry(). That's complicated and I'm not sure anymore if it's even a good idea. Let's solve this in git archive for now. t5001: add tests for export-ignore attributes and exclude pathspecs archive: factor out helper functions for handling attributes archive: don't queue excluded directories archive: queue directories for all types of pathspecs archive.c | 49 + t/t5001-archive-attr.sh | 47 --- 2 files changed, 85 insertions(+), 11 deletions(-) -- 2.14.1
Re: [PATCH v2 00/25] Move exported packfile funcs to its own file
On Fri, 11 Aug 2017 15:41:28 -0400 Ben Peartwrote: > Nice to see the pack file functions being refactored out. I looked at > the end result and it looked good to me. Thanks. > Do you have the energy to do a similar refactoring for the remaining > public functions residing in sha1_file.c? Perhaps a new sha1_file.h? It > would be nice to get more things out of cache.h. :) I agree that that would be desirable, but for now I'll leave that for someone else to do :-)
Re: [RFC PATCH] Updated "imported object" design
On Fri, 18 Aug 2017 10:18:37 -0400 Ben Peartwrote: > > But if there was a good way to refer to the "anti-projection" in a > > virtualized system (that is, the "real" thing or "object" behind the > > "virtual" thing or "image"), then maybe the "virtualized" language is > > the best. (And I would gladly change - I'm having a hard time coming up > > with a name for the "anti-projection" in the "lazy" language.) > > > > The most common "anti-virtual" language I'm familiar with is "physical." > Virtual machine <-> physical machine. Virtual world <-> physical > world. Virtual repo, commit, tree, blob - physical repo, commit, tree, > blob. I'm not thrilled but I think it works... I was thinking more along the lines of the "entity that projects the virtualization", not the opposite of a "virtualization" - "physical" might work for the latter but probably not the former. After some in-office discussion, if we stick to the "promise" concept, maybe we have something like this: In a partial clone, the origin acts as a promisor of objects. Every object obtained from the promisor also acts as a promise that any object directly or indirectly referenced from that object is fetchable from the promisor. > > This is not true if you're fetching from another repo > > This isn't a case we've explicitly dealt with (multiple remotes into a > virtualized repo). Our behavior today would be that once you set the > "virtual repo" flag on the repo (this happens at clone for us), all > remotes are treated as virtual as well (ie we don't differentiate > behavior based on which remote was used). Our "custom fetcher" always > uses "origin" and some custom settings for a cache-server saved in the > .git/config file when asked to fetch missing objects. > > This is probably a good model to stick with at least initially as trying > to solve multiple possible "virtual" remotes as well as mingling > virtualized and non-virtualized remotes and all the mixed cases that can > come up makes my head hurt. We should probably address that in a > different thread. :) OK, let's stick to the current model first then, whether our opinion on other remotes is (1) "we won't have any other remotes so we don't care", (2) "we have other remotes but it's fine to make sure that they don't introduce any new missing objects", or (3) "we need other remotes to introduce missing objects, but we can build that after this foundation is laid". > > or if you're using > > receive-pack, but (1) I think these are not used as much in such a > > situation, and (2) if you do use them, the slowness only "kicks in" if > > you do not have the objects referred to (whether non-"imported" or > > "imported") and thus have to check the references in all "imported" > > objects. > > > > Is there any case where receive-pack is used on the client side? I'm > only aware of it being used on the server side to receive packs pushed > from the client. If it is not used in a virtualized client, then we > would not need to do anything different for receive-pack. This happens if another repo decides to push to the virtualized client, which (as I wrote) I don't expect to happen often. My intention is to ensure that receive-pack will still work. > That is another good point. Given the discussion above about not > needing to do the connectivity test for fetch/clone - the potential perf > hit of loading/parsing all the various objects to build up the oidset is > much less of an issue. Agreed.
Git makes a merge commit but as a normal (non-merge) commit
While merging if I do certain actions then the merge commit is made with the merge message but as a normal (non-merge) commit. Repro steps: - Set GIT_MERGE_AUTOEDIT=yes (set other than "no") in .bashrc - Make a merge commit with no conflicts. (external text editor shows the generated merge message) - Focus on Git Bash and Ctrl-C. - Commit (git commit). Actual behavior: Git makes a normal (non-merge) commit (squash merge) but with the merge commit message. It looks like a bug to me. This is very confusing later on as the repo topology would show that the branch is not merged in and there is not an easy way to find out when the merge was made. Expected behavior: Git should stay in a MERGING state. The user can choose to either abort the merge or continue the merge (git merge --continue OR git commit). This does not happen in case of conflicts (at least I'm not able to repro). I get a (master|MERGING) prompt till I resolve the conflicts and commit, which goes through correctly as a merge commit. Environment: $ git version git version 2.14.0.windows.2 $ bash --version GNU bash, version 4.4.12(1)-release (x86_64-pc-msys) Thanks, RM
Re: Revision resolution for remote-helpers?
Mike Hommey wrote[1]: > On Fri, Aug 18, 2017 at 03:06:37PM -0700, Jonathan Nieder wrote: >> Mike Hommey wrote: >>> The reason for the :: prefix is that it matches the :: >>> prefix used for remote helpers. >>> >>> Now, there are a few caveats: [...] >>> - msys likes to completely fuck up command lines when they contain ::. >>> For remote helpers, the alternative that works is >>> :///etc. >> >> Hm --- is there a bug already open about this (e.g. in the Git for >> Windows project or in msys) where I can read more? > > It's entirely an msys problem. Msys has weird rules to translate between > unix paths and windows paths on the command line, and botches everything > as a consequence. That's by "design". > > http://www.mingw.org/wiki/Posix_path_conversion > > (Particularly, see the last two entries) > > That only happens when calling native Windows programs from a msys > shell. Cc-ing the Git for Windows mailing list as an FYI. I have faint memories that some developers on that project have had to delve deep into Msys path modification rules. It's possible they can give us advice (e.g. about :: having been a bad choice of syntax in the first place :)). Thanks, Jonathan [1] https://public-inbox.org/git/20170818221754.3rbh35aewj5xn...@glandium.org/
[PATCH v3 04/23] pack: move open_pack_index(), parse_pack_index()
alloc_packed_git() in packfile.c is duplicated from sha1_file.c. In a subsequent commit, alloc_packed_git() will be removed from sha1_file.c. Signed-off-by: Jonathan Tan--- builtin/count-objects.c | 1 + builtin/fsck.c | 1 + builtin/pack-objects.c | 1 + cache.h | 8 --- pack-bitmap.c | 1 + pack-check.c| 1 + packfile.c | 149 packfile.h | 8 +++ sha1_file.c | 140 - sha1_name.c | 1 + 10 files changed, 163 insertions(+), 148 deletions(-) diff --git a/builtin/count-objects.c b/builtin/count-objects.c index 1d82e61f2..33343818c 100644 --- a/builtin/count-objects.c +++ b/builtin/count-objects.c @@ -10,6 +10,7 @@ #include "builtin.h" #include "parse-options.h" #include "quote.h" +#include "packfile.h" static unsigned long garbage; static off_t size_garbage; diff --git a/builtin/fsck.c b/builtin/fsck.c index 99dea7adf..c56207b21 100644 --- a/builtin/fsck.c +++ b/builtin/fsck.c @@ -15,6 +15,7 @@ #include "progress.h" #include "streaming.h" #include "decorate.h" +#include "packfile.h" #define REACHABLE 0x0001 #define SEEN 0x0002 diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c index 5c5d3d507..08f05cb84 100644 --- a/builtin/pack-objects.c +++ b/builtin/pack-objects.c @@ -25,6 +25,7 @@ #include "sha1-array.h" #include "argv-array.h" #include "mru.h" +#include "packfile.h" static const char *pack_usage[] = { N_("git pack-objects --stdout [...] [< | < ]"), diff --git a/cache.h b/cache.h index a0497d469..f271033db 100644 --- a/cache.h +++ b/cache.h @@ -1611,8 +1611,6 @@ struct pack_entry { struct packed_git *p; }; -extern struct packed_git *parse_pack_index(unsigned char *sha1, const char *idx_path); - /* A hook to report invalid files in pack directory */ #define PACKDIR_FILE_PACK 1 #define PACKDIR_FILE_IDX 2 @@ -1647,12 +1645,6 @@ extern int odb_mkstemp(struct strbuf *template, const char *pattern); */ extern int odb_pack_keep(const char *name); -/* - * mmap the index file for the specified packfile (if it is not - * already mmapped). Return 0 on success. - */ -extern int open_pack_index(struct packed_git *); - /* * munmap the index file for the specified packfile (if it is * currently mmapped). diff --git a/pack-bitmap.c b/pack-bitmap.c index 327634cd7..cb3d14ba4 100644 --- a/pack-bitmap.c +++ b/pack-bitmap.c @@ -9,6 +9,7 @@ #include "pack-bitmap.h" #include "pack-revindex.h" #include "pack-objects.h" +#include "packfile.h" /* * An entry on the bitmap index, representing the bitmap for a given diff --git a/pack-check.c b/pack-check.c index 84469168a..2086f5bb7 100644 --- a/pack-check.c +++ b/pack-check.c @@ -2,6 +2,7 @@ #include "pack.h" #include "pack-revindex.h" #include "progress.h" +#include "packfile.h" struct idx_entry { off_toffset; diff --git a/packfile.c b/packfile.c index 60d9fc3b0..6edc43228 100644 --- a/packfile.c +++ b/packfile.c @@ -1,5 +1,6 @@ #include "cache.h" #include "mru.h" +#include "pack.h" char *odb_pack_name(struct strbuf *buf, const unsigned char *sha1, @@ -59,3 +60,151 @@ void pack_report(void) pack_open_windows, peak_pack_open_windows, sz_fmt(pack_mapped), sz_fmt(peak_pack_mapped)); } + +/* + * Open and mmap the index file at path, perform a couple of + * consistency checks, then record its information to p. Return 0 on + * success. + */ +static int check_packed_git_idx(const char *path, struct packed_git *p) +{ + void *idx_map; + struct pack_idx_header *hdr; + size_t idx_size; + uint32_t version, nr, i, *index; + int fd = git_open(path); + struct stat st; + + if (fd < 0) + return -1; + if (fstat(fd, )) { + close(fd); + return -1; + } + idx_size = xsize_t(st.st_size); + if (idx_size < 4 * 256 + 20 + 20) { + close(fd); + return error("index file %s is too small", path); + } + idx_map = xmmap(NULL, idx_size, PROT_READ, MAP_PRIVATE, fd, 0); + close(fd); + + hdr = idx_map; + if (hdr->idx_signature == htonl(PACK_IDX_SIGNATURE)) { + version = ntohl(hdr->idx_version); + if (version < 2 || version > 2) { + munmap(idx_map, idx_size); + return error("index file %s is version %"PRIu32 +" and is not supported by this binary" +" (try upgrading GIT to a newer version)", +path, version); + } + } else + version = 1; + + nr = 0; + index = idx_map; + if (version > 1) + index += 2; /* skip index
[PATCH v3 08/23] pack: move unuse_pack()
Signed-off-by: Jonathan Tan--- cache.h | 1 - packfile.c | 9 + packfile.h | 1 + sha1_file.c | 9 - 4 files changed, 10 insertions(+), 10 deletions(-) diff --git a/cache.h b/cache.h index a27018210..0313b0b8d 100644 --- a/cache.h +++ b/cache.h @@ -1645,7 +1645,6 @@ extern int odb_mkstemp(struct strbuf *template, const char *pattern); */ extern int odb_pack_keep(const char *name); -extern void unuse_pack(struct pack_window **); extern void clear_delta_base_cache(void); extern struct packed_git *add_packed_git(const char *path, size_t path_len, int local); diff --git a/packfile.c b/packfile.c index ea451d27e..0c97c3a1a 100644 --- a/packfile.c +++ b/packfile.c @@ -596,3 +596,12 @@ unsigned char *use_pack(struct packed_git *p, *left = win->len - xsize_t(offset); return win->base + offset; } + +void unuse_pack(struct pack_window **w_cursor) +{ + struct pack_window *w = *w_cursor; + if (w) { + w->inuse_cnt--; + *w_cursor = NULL; + } +} diff --git a/packfile.h b/packfile.h index 97cfc5e70..b5db490ab 100644 --- a/packfile.h +++ b/packfile.h @@ -45,6 +45,7 @@ extern void close_pack_index(struct packed_git *); extern unsigned char *use_pack(struct packed_git *, struct pack_window **, off_t, size_t *); extern void close_pack_windows(struct packed_git *); extern void close_all_packs(void); +extern void unuse_pack(struct pack_window **); extern void release_pack_memory(size_t); diff --git a/sha1_file.c b/sha1_file.c index 7704801d1..84d96d0ab 100644 --- a/sha1_file.c +++ b/sha1_file.c @@ -718,15 +718,6 @@ void *xmmap(void *start, size_t length, return ret; } -void unuse_pack(struct pack_window **w_cursor) -{ - struct pack_window *w = *w_cursor; - if (w) { - w->inuse_cnt--; - *w_cursor = NULL; - } -} - static struct packed_git *alloc_packed_git(int extra) { struct packed_git *p = xmalloc(st_add(sizeof(*p), extra)); -- 2.14.1.480.gb18f417b89-goog
[PATCH v3 09/23] pack: move add_packed_git()
Signed-off-by: Jonathan Tan--- cache.h | 1 - connected.c | 1 + packfile.c | 53 + packfile.h | 1 + sha1_file.c | 61 - 5 files changed, 55 insertions(+), 62 deletions(-) diff --git a/cache.h b/cache.h index 0313b0b8d..3625509f9 100644 --- a/cache.h +++ b/cache.h @@ -1646,7 +1646,6 @@ extern int odb_mkstemp(struct strbuf *template, const char *pattern); extern int odb_pack_keep(const char *name); extern void clear_delta_base_cache(void); -extern struct packed_git *add_packed_git(const char *path, size_t path_len, int local); /* * Make sure that a pointer access into an mmap'd index file is within bounds, diff --git a/connected.c b/connected.c index 136c2ac16..3e3f0148c 100644 --- a/connected.c +++ b/connected.c @@ -3,6 +3,7 @@ #include "sigchain.h" #include "connected.h" #include "transport.h" +#include "pack.h" /* * If we feed all the commits we want to verify to this command diff --git a/packfile.c b/packfile.c index 0c97c3a1a..d1433d8c7 100644 --- a/packfile.c +++ b/packfile.c @@ -605,3 +605,56 @@ void unuse_pack(struct pack_window **w_cursor) *w_cursor = NULL; } } + +static void try_to_free_pack_memory(size_t size) +{ + release_pack_memory(size); +} + +struct packed_git *add_packed_git(const char *path, size_t path_len, int local) +{ + static int have_set_try_to_free_routine; + struct stat st; + size_t alloc; + struct packed_git *p; + + if (!have_set_try_to_free_routine) { + have_set_try_to_free_routine = 1; + set_try_to_free_routine(try_to_free_pack_memory); + } + + /* +* Make sure a corresponding .pack file exists and that +* the index looks sane. +*/ + if (!strip_suffix_mem(path, _len, ".idx")) + return NULL; + + /* +* ".pack" is long enough to hold any suffix we're adding (and +* the use xsnprintf double-checks that) +*/ + alloc = st_add3(path_len, strlen(".pack"), 1); + p = alloc_packed_git(alloc); + memcpy(p->pack_name, path, path_len); + + xsnprintf(p->pack_name + path_len, alloc - path_len, ".keep"); + if (!access(p->pack_name, F_OK)) + p->pack_keep = 1; + + xsnprintf(p->pack_name + path_len, alloc - path_len, ".pack"); + if (stat(p->pack_name, ) || !S_ISREG(st.st_mode)) { + free(p); + return NULL; + } + + /* ok, it looks sane as far as we can check without +* actually mapping the pack file. +*/ + p->pack_size = st.st_size; + p->pack_local = local; + p->mtime = st.st_mtime; + if (path_len < 40 || get_sha1_hex(path + path_len - 40, p->sha1)) + hashclr(p->sha1); + return p; +} diff --git a/packfile.h b/packfile.h index b5db490ab..1e932a49e 100644 --- a/packfile.h +++ b/packfile.h @@ -46,6 +46,7 @@ extern unsigned char *use_pack(struct packed_git *, struct pack_window **, off_t extern void close_pack_windows(struct packed_git *); extern void close_all_packs(void); extern void unuse_pack(struct pack_window **); +extern struct packed_git *add_packed_git(const char *path, size_t path_len, int local); extern void release_pack_memory(size_t); diff --git a/sha1_file.c b/sha1_file.c index 84d96d0ab..0929fc10e 100644 --- a/sha1_file.c +++ b/sha1_file.c @@ -718,67 +718,6 @@ void *xmmap(void *start, size_t length, return ret; } -static struct packed_git *alloc_packed_git(int extra) -{ - struct packed_git *p = xmalloc(st_add(sizeof(*p), extra)); - memset(p, 0, sizeof(*p)); - p->pack_fd = -1; - return p; -} - -static void try_to_free_pack_memory(size_t size) -{ - release_pack_memory(size); -} - -struct packed_git *add_packed_git(const char *path, size_t path_len, int local) -{ - static int have_set_try_to_free_routine; - struct stat st; - size_t alloc; - struct packed_git *p; - - if (!have_set_try_to_free_routine) { - have_set_try_to_free_routine = 1; - set_try_to_free_routine(try_to_free_pack_memory); - } - - /* -* Make sure a corresponding .pack file exists and that -* the index looks sane. -*/ - if (!strip_suffix_mem(path, _len, ".idx")) - return NULL; - - /* -* ".pack" is long enough to hold any suffix we're adding (and -* the use xsnprintf double-checks that) -*/ - alloc = st_add3(path_len, strlen(".pack"), 1); - p = alloc_packed_git(alloc); - memcpy(p->pack_name, path, path_len); - - xsnprintf(p->pack_name + path_len, alloc - path_len, ".keep"); - if (!access(p->pack_name, F_OK)) - p->pack_keep = 1; - - xsnprintf(p->pack_name + path_len, alloc - path_len, ".pack"); -
[PATCH v3 20/23] pack: move find_pack_entry() and make it global
This function needs to be global as it is used by sha1_file.c and will be used by packfile.c. Signed-off-by: Jonathan Tan--- packfile.c | 53 + packfile.h | 2 ++ sha1_file.c | 53 - 3 files changed, 55 insertions(+), 53 deletions(-) diff --git a/packfile.c b/packfile.c index ba3a5eb3a..ae5395f5f 100644 --- a/packfile.c +++ b/packfile.c @@ -1787,3 +1787,56 @@ struct packed_git *find_sha1_pack(const unsigned char *sha1, return NULL; } + +static int fill_pack_entry(const unsigned char *sha1, + struct pack_entry *e, + struct packed_git *p) +{ + off_t offset; + + if (p->num_bad_objects) { + unsigned i; + for (i = 0; i < p->num_bad_objects; i++) + if (!hashcmp(sha1, p->bad_object_sha1 + 20 * i)) + return 0; + } + + offset = find_pack_entry_one(sha1, p); + if (!offset) + return 0; + + /* +* We are about to tell the caller where they can locate the +* requested object. We better make sure the packfile is +* still here and can be accessed before supplying that +* answer, as it may have been deleted since the index was +* loaded! +*/ + if (!is_pack_valid(p)) + return 0; + e->offset = offset; + e->p = p; + hashcpy(e->sha1, sha1); + return 1; +} + +/* + * Iff a pack file contains the object named by sha1, return true and + * store its location to e. + */ +int find_pack_entry(const unsigned char *sha1, struct pack_entry *e) +{ + struct mru_entry *p; + + prepare_packed_git(); + if (!packed_git) + return 0; + + for (p = packed_git_mru->head; p; p = p->next) { + if (fill_pack_entry(sha1, e, p->item)) { + mru_mark(packed_git_mru, p); + return 1; + } + } + return 0; +} diff --git a/packfile.h b/packfile.h index a4ff6f6ed..c9b4fcfaf 100644 --- a/packfile.h +++ b/packfile.h @@ -118,4 +118,6 @@ extern int packed_object_info(struct packed_git *pack, off_t offset, struct obje extern void mark_bad_packed_object(struct packed_git *p, const unsigned char *sha1); extern const struct packed_git *has_packed_and_bad(const unsigned char *sha1); +extern int find_pack_entry(const unsigned char *sha1, struct pack_entry *e); + #endif diff --git a/sha1_file.c b/sha1_file.c index 8853672d2..76c86639c 100644 --- a/sha1_file.c +++ b/sha1_file.c @@ -1074,59 +1074,6 @@ int parse_sha1_header(const char *hdr, unsigned long *sizep) return parse_sha1_header_extended(hdr, , 0); } -static int fill_pack_entry(const unsigned char *sha1, - struct pack_entry *e, - struct packed_git *p) -{ - off_t offset; - - if (p->num_bad_objects) { - unsigned i; - for (i = 0; i < p->num_bad_objects; i++) - if (!hashcmp(sha1, p->bad_object_sha1 + 20 * i)) - return 0; - } - - offset = find_pack_entry_one(sha1, p); - if (!offset) - return 0; - - /* -* We are about to tell the caller where they can locate the -* requested object. We better make sure the packfile is -* still here and can be accessed before supplying that -* answer, as it may have been deleted since the index was -* loaded! -*/ - if (!is_pack_valid(p)) - return 0; - e->offset = offset; - e->p = p; - hashcpy(e->sha1, sha1); - return 1; -} - -/* - * Iff a pack file contains the object named by sha1, return true and - * store its location to e. - */ -static int find_pack_entry(const unsigned char *sha1, struct pack_entry *e) -{ - struct mru_entry *p; - - prepare_packed_git(); - if (!packed_git) - return 0; - - for (p = packed_git_mru->head; p; p = p->next) { - if (fill_pack_entry(sha1, e, p->item)) { - mru_mark(packed_git_mru, p); - return 1; - } - } - return 0; -} - static int sha1_loose_object_info(const unsigned char *sha1, struct object_info *oi, int flags) -- 2.14.1.480.gb18f417b89-goog
[PATCH v3 10/23] pack: move install_packed_git()
Signed-off-by: Jonathan Tan--- cache.h | 1 - packfile.c | 11 ++- packfile.h | 2 ++ sha1_file.c | 9 - 4 files changed, 12 insertions(+), 11 deletions(-) diff --git a/cache.h b/cache.h index 3625509f9..c4d8bee52 100644 --- a/cache.h +++ b/cache.h @@ -1619,7 +1619,6 @@ extern void (*report_garbage)(unsigned seen_bits, const char *path); extern void prepare_packed_git(void); extern void reprepare_packed_git(void); -extern void install_packed_git(struct packed_git *pack); /* * Give a rough count of objects in the repository. This sacrifices accuracy diff --git a/packfile.c b/packfile.c index d1433d8c7..9a65aa4f6 100644 --- a/packfile.c +++ b/packfile.c @@ -28,7 +28,7 @@ static unsigned int pack_used_ctr; static unsigned int pack_mmap_calls; static unsigned int peak_pack_open_windows; static unsigned int pack_open_windows; -unsigned int pack_open_fds; +static unsigned int pack_open_fds; static unsigned int pack_max_fds; static size_t peak_pack_mapped; static size_t pack_mapped; @@ -658,3 +658,12 @@ struct packed_git *add_packed_git(const char *path, size_t path_len, int local) hashclr(p->sha1); return p; } + +void install_packed_git(struct packed_git *pack) +{ + if (pack->pack_fd != -1) + pack_open_fds++; + + pack->next = packed_git; + packed_git = pack; +} diff --git a/packfile.h b/packfile.h index 1e932a49e..a18029184 100644 --- a/packfile.h +++ b/packfile.h @@ -28,6 +28,8 @@ extern unsigned int pack_open_fds; extern struct packed_git *parse_pack_index(unsigned char *sha1, const char *idx_path); +extern void install_packed_git(struct packed_git *pack); + extern void pack_report(void); /* diff --git a/sha1_file.c b/sha1_file.c index 0929fc10e..b77e7e3c3 100644 --- a/sha1_file.c +++ b/sha1_file.c @@ -718,15 +718,6 @@ void *xmmap(void *start, size_t length, return ret; } -void install_packed_git(struct packed_git *pack) -{ - if (pack->pack_fd != -1) - pack_open_fds++; - - pack->next = packed_git; - packed_git = pack; -} - void (*report_garbage)(unsigned seen_bits, const char *path); static void report_helper(const struct string_list *list, -- 2.14.1.480.gb18f417b89-goog
[PATCH v3 12/23] pack: move unpack_object_header_buffer()
Signed-off-by: Jonathan Tan--- cache.h | 1 - packfile.c | 25 + packfile.h | 2 ++ sha1_file.c | 25 - 4 files changed, 27 insertions(+), 26 deletions(-) diff --git a/cache.h b/cache.h index 63765d481..75cc0c497 100644 --- a/cache.h +++ b/cache.h @@ -1669,7 +1669,6 @@ extern off_t find_pack_entry_one(const unsigned char *sha1, struct packed_git *) extern int is_pack_valid(struct packed_git *); extern void *unpack_entry(struct packed_git *, off_t, enum object_type *, unsigned long *); -extern unsigned long unpack_object_header_buffer(const unsigned char *buf, unsigned long len, enum object_type *type, unsigned long *sizep); extern unsigned long get_size_from_delta(struct packed_git *, struct pack_window **, off_t); extern int unpack_object_header(struct packed_git *, struct pack_window **, off_t *, unsigned long *); diff --git a/packfile.c b/packfile.c index 9cf462856..43b708812 100644 --- a/packfile.c +++ b/packfile.c @@ -884,3 +884,28 @@ void reprepare_packed_git(void) prepare_packed_git_run_once = 0; prepare_packed_git(); } + +unsigned long unpack_object_header_buffer(const unsigned char *buf, + unsigned long len, enum object_type *type, unsigned long *sizep) +{ + unsigned shift; + unsigned long size, c; + unsigned long used = 0; + + c = buf[used++]; + *type = (c >> 4) & 7; + size = c & 15; + shift = 4; + while (c & 0x80) { + if (len <= used || bitsizeof(long) <= shift) { + error("bad object header"); + size = used = 0; + break; + } + c = buf[used++]; + size += (c & 0x7f) << shift; + shift += 7; + } + *sizep = size; + return used; +} diff --git a/packfile.h b/packfile.h index 1cfda1d00..9f36e0112 100644 --- a/packfile.h +++ b/packfile.h @@ -62,6 +62,8 @@ extern void close_all_packs(void); extern void unuse_pack(struct pack_window **); extern struct packed_git *add_packed_git(const char *path, size_t path_len, int local); +extern unsigned long unpack_object_header_buffer(const unsigned char *buf, unsigned long len, enum object_type *type, unsigned long *sizep); + extern void release_pack_memory(size_t); extern int open_packed_git(struct packed_git *p); diff --git a/sha1_file.c b/sha1_file.c index 51bb4d1db..b57b0 100644 --- a/sha1_file.c +++ b/sha1_file.c @@ -914,31 +914,6 @@ void *map_sha1_file(const unsigned char *sha1, unsigned long *size) return map_sha1_file_1(NULL, sha1, size); } -unsigned long unpack_object_header_buffer(const unsigned char *buf, - unsigned long len, enum object_type *type, unsigned long *sizep) -{ - unsigned shift; - unsigned long size, c; - unsigned long used = 0; - - c = buf[used++]; - *type = (c >> 4) & 7; - size = c & 15; - shift = 4; - while (c & 0x80) { - if (len <= used || bitsizeof(long) <= shift) { - error("bad object header"); - size = used = 0; - break; - } - c = buf[used++]; - size += (c & 0x7f) << shift; - shift += 7; - } - *sizep = size; - return used; -} - static int unpack_sha1_short_header(git_zstream *stream, unsigned char *map, unsigned long mapsize, void *buffer, unsigned long bufsiz) -- 2.14.1.480.gb18f417b89-goog
[PATCH v3 11/23] pack: move {,re}prepare_packed_git and approximate_object_count
Signed-off-by: Jonathan Tan--- builtin/gc.c | 1 + bulk-checkin.c | 1 + cache.h| 15 connected.c| 2 +- fetch-pack.c | 1 + http-backend.c | 1 + packfile.c | 217 + packfile.h | 16 - path.c | 1 + server-info.c | 1 + sha1_file.c| 214 11 files changed, 238 insertions(+), 232 deletions(-) diff --git a/builtin/gc.c b/builtin/gc.c index e6b84475a..3c78fcb9b 100644 --- a/builtin/gc.c +++ b/builtin/gc.c @@ -19,6 +19,7 @@ #include "sigchain.h" #include "argv-array.h" #include "commit.h" +#include "packfile.h" #define FAILED_RUN "failed to run %s" diff --git a/bulk-checkin.c b/bulk-checkin.c index 5be7ce5c7..9a1f6c49a 100644 --- a/bulk-checkin.c +++ b/bulk-checkin.c @@ -6,6 +6,7 @@ #include "csum-file.h" #include "pack.h" #include "strbuf.h" +#include "packfile.h" static struct bulk_checkin_state { unsigned plugged:1; diff --git a/cache.h b/cache.h index c4d8bee52..63765d481 100644 --- a/cache.h +++ b/cache.h @@ -1611,21 +1611,6 @@ struct pack_entry { struct packed_git *p; }; -/* A hook to report invalid files in pack directory */ -#define PACKDIR_FILE_PACK 1 -#define PACKDIR_FILE_IDX 2 -#define PACKDIR_FILE_GARBAGE 4 -extern void (*report_garbage)(unsigned seen_bits, const char *path); - -extern void prepare_packed_git(void); -extern void reprepare_packed_git(void); - -/* - * Give a rough count of objects in the repository. This sacrifices accuracy - * for speed. - */ -unsigned long approximate_object_count(void); - extern struct packed_git *find_sha1_pack(const unsigned char *sha1, struct packed_git *packs); diff --git a/connected.c b/connected.c index 3e3f0148c..f416b0505 100644 --- a/connected.c +++ b/connected.c @@ -3,7 +3,7 @@ #include "sigchain.h" #include "connected.h" #include "transport.h" -#include "pack.h" +#include "packfile.h" /* * If we feed all the commits we want to verify to this command diff --git a/fetch-pack.c b/fetch-pack.c index fbbc99c88..105506e9a 100644 --- a/fetch-pack.c +++ b/fetch-pack.c @@ -17,6 +17,7 @@ #include "prio-queue.h" #include "sha1-array.h" #include "oidset.h" +#include "packfile.h" static int transfer_unpack_limit = -1; static int fetch_unpack_limit = -1; diff --git a/http-backend.c b/http-backend.c index 519025d2c..8076b1d5e 100644 --- a/http-backend.c +++ b/http-backend.c @@ -9,6 +9,7 @@ #include "string-list.h" #include "url.h" #include "argv-array.h" +#include "packfile.h" static const char content_type[] = "Content-Type"; static const char content_length[] = "Content-Length"; diff --git a/packfile.c b/packfile.c index 9a65aa4f6..9cf462856 100644 --- a/packfile.c +++ b/packfile.c @@ -1,6 +1,9 @@ #include "cache.h" #include "mru.h" #include "pack.h" +#include "dir.h" +#include "mergesort.h" +#include "packfile.h" char *odb_pack_name(struct strbuf *buf, const unsigned char *sha1, @@ -667,3 +670,217 @@ void install_packed_git(struct packed_git *pack) pack->next = packed_git; packed_git = pack; } + +void (*report_garbage)(unsigned seen_bits, const char *path); + +static void report_helper(const struct string_list *list, + int seen_bits, int first, int last) +{ + if (seen_bits == (PACKDIR_FILE_PACK|PACKDIR_FILE_IDX)) + return; + + for (; first < last; first++) + report_garbage(seen_bits, list->items[first].string); +} + +static void report_pack_garbage(struct string_list *list) +{ + int i, baselen = -1, first = 0, seen_bits = 0; + + if (!report_garbage) + return; + + string_list_sort(list); + + for (i = 0; i < list->nr; i++) { + const char *path = list->items[i].string; + if (baselen != -1 && + strncmp(path, list->items[first].string, baselen)) { + report_helper(list, seen_bits, first, i); + baselen = -1; + seen_bits = 0; + } + if (baselen == -1) { + const char *dot = strrchr(path, '.'); + if (!dot) { + report_garbage(PACKDIR_FILE_GARBAGE, path); + continue; + } + baselen = dot - path + 1; + first = i; + } + if (!strcmp(path + baselen, "pack")) + seen_bits |= 1; + else if (!strcmp(path + baselen, "idx")) + seen_bits |= 2; + } + report_helper(list, seen_bits, first, list->nr); +} + +static void prepare_packed_git_one(char *objdir, int local) +{ + struct strbuf path = STRBUF_INIT; + size_t dirnamelen; + DIR
[PATCH v3 05/23] pack: move release_pack_memory()
The function unuse_one_window() needs to be temporarily made global. Its scope will be restored to static in a subsequent commit. Signed-off-by: Jonathan Tan--- git-compat-util.h | 2 -- packfile.c| 49 + packfile.h| 4 sha1_file.c | 49 - 4 files changed, 53 insertions(+), 51 deletions(-) diff --git a/git-compat-util.h b/git-compat-util.h index db9c22de7..201056e2d 100644 --- a/git-compat-util.h +++ b/git-compat-util.h @@ -749,8 +749,6 @@ const char *inet_ntop(int af, const void *src, char *dst, size_t size); extern int git_atexit(void (*handler)(void)); #endif -extern void release_pack_memory(size_t); - typedef void (*try_to_free_t)(size_t); extern try_to_free_t set_try_to_free_routine(try_to_free_t); diff --git a/packfile.c b/packfile.c index 6edc43228..8daa74ad1 100644 --- a/packfile.c +++ b/packfile.c @@ -208,3 +208,52 @@ struct packed_git *parse_pack_index(unsigned char *sha1, const char *idx_path) return p; } + +static void scan_windows(struct packed_git *p, + struct packed_git **lru_p, + struct pack_window **lru_w, + struct pack_window **lru_l) +{ + struct pack_window *w, *w_l; + + for (w_l = NULL, w = p->windows; w; w = w->next) { + if (!w->inuse_cnt) { + if (!*lru_w || w->last_used < (*lru_w)->last_used) { + *lru_p = p; + *lru_w = w; + *lru_l = w_l; + } + } + w_l = w; + } +} + +int unuse_one_window(struct packed_git *current) +{ + struct packed_git *p, *lru_p = NULL; + struct pack_window *lru_w = NULL, *lru_l = NULL; + + if (current) + scan_windows(current, _p, _w, _l); + for (p = packed_git; p; p = p->next) + scan_windows(p, _p, _w, _l); + if (lru_p) { + munmap(lru_w->base, lru_w->len); + pack_mapped -= lru_w->len; + if (lru_l) + lru_l->next = lru_w->next; + else + lru_p->windows = lru_w->next; + free(lru_w); + pack_open_windows--; + return 1; + } + return 0; +} + +void release_pack_memory(size_t need) +{ + size_t cur = pack_mapped; + while (need >= (cur - pack_mapped) && unuse_one_window(NULL)) + ; /* nothing */ +} diff --git a/packfile.h b/packfile.h index 703887d41..f6fe1c741 100644 --- a/packfile.h +++ b/packfile.h @@ -43,4 +43,8 @@ extern void pack_report(void); */ extern int open_pack_index(struct packed_git *); +extern int unuse_one_window(struct packed_git *current); + +extern void release_pack_memory(size_t); + #endif diff --git a/sha1_file.c b/sha1_file.c index 475d2032d..d51efd78d 100644 --- a/sha1_file.c +++ b/sha1_file.c @@ -680,55 +680,6 @@ static int has_loose_object(const unsigned char *sha1) return check_and_freshen(sha1, 0); } -static void scan_windows(struct packed_git *p, - struct packed_git **lru_p, - struct pack_window **lru_w, - struct pack_window **lru_l) -{ - struct pack_window *w, *w_l; - - for (w_l = NULL, w = p->windows; w; w = w->next) { - if (!w->inuse_cnt) { - if (!*lru_w || w->last_used < (*lru_w)->last_used) { - *lru_p = p; - *lru_w = w; - *lru_l = w_l; - } - } - w_l = w; - } -} - -static int unuse_one_window(struct packed_git *current) -{ - struct packed_git *p, *lru_p = NULL; - struct pack_window *lru_w = NULL, *lru_l = NULL; - - if (current) - scan_windows(current, _p, _w, _l); - for (p = packed_git; p; p = p->next) - scan_windows(p, _p, _w, _l); - if (lru_p) { - munmap(lru_w->base, lru_w->len); - pack_mapped -= lru_w->len; - if (lru_l) - lru_l->next = lru_w->next; - else - lru_p->windows = lru_w->next; - free(lru_w); - pack_open_windows--; - return 1; - } - return 0; -} - -void release_pack_memory(size_t need) -{ - size_t cur = pack_mapped; - while (need >= (cur - pack_mapped) && unuse_one_window(NULL)) - ; /* nothing */ -} - static void mmap_limit_check(size_t length) { static size_t limit = 0; -- 2.14.1.480.gb18f417b89-goog
[PATCH v3 18/23] pack: move find_pack_entry_one(), is_pack_valid()
Signed-off-by: Jonathan Tan--- cache.h | 8 --- packfile.c | 76 - packfile.h | 9 ++-- sha1_file.c | 73 -- 4 files changed, 82 insertions(+), 84 deletions(-) diff --git a/cache.h b/cache.h index ee75a4949..9297d078a 100644 --- a/cache.h +++ b/cache.h @@ -1626,14 +1626,6 @@ extern int odb_mkstemp(struct strbuf *template, const char *pattern); */ extern int odb_pack_keep(const char *name); -/* - * If the object named sha1 is present in the specified packfile, - * return its offset within the packfile; otherwise, return 0. - */ -extern off_t find_pack_entry_one(const unsigned char *sha1, struct packed_git *); - -extern int is_pack_valid(struct packed_git *); - /* * Iterate over the files in the loose-object parts of the object * directory "path", triggering the following callbacks: diff --git a/packfile.c b/packfile.c index e914422e9..ad7336594 100644 --- a/packfile.c +++ b/packfile.c @@ -7,6 +7,7 @@ #include "delta.h" #include "list.h" #include "streaming.h" +#include "sha1-lookup.h" char *odb_pack_name(struct strbuf *buf, const unsigned char *sha1, @@ -509,7 +510,7 @@ static int open_packed_git_1(struct packed_git *p) return 0; } -int open_packed_git(struct packed_git *p) +static int open_packed_git(struct packed_git *p) { if (!open_packed_git_1(p)) return 0; @@ -1700,3 +1701,76 @@ off_t nth_packed_object_offset(const struct packed_git *p, uint32_t n) ntohl(*((uint32_t *)(index + 4))); } } + +off_t find_pack_entry_one(const unsigned char *sha1, + struct packed_git *p) +{ + const uint32_t *level1_ofs = p->index_data; + const unsigned char *index = p->index_data; + unsigned hi, lo, stride; + static int debug_lookup = -1; + + if (debug_lookup < 0) + debug_lookup = !!getenv("GIT_DEBUG_LOOKUP"); + + if (!index) { + if (open_pack_index(p)) + return 0; + level1_ofs = p->index_data; + index = p->index_data; + } + if (p->index_version > 1) { + level1_ofs += 2; + index += 8; + } + index += 4 * 256; + hi = ntohl(level1_ofs[*sha1]); + lo = ((*sha1 == 0x0) ? 0 : ntohl(level1_ofs[*sha1 - 1])); + if (p->index_version > 1) { + stride = 20; + } else { + stride = 24; + index += 4; + } + + if (debug_lookup) + printf("%02x%02x%02x... lo %u hi %u nr %"PRIu32"\n", + sha1[0], sha1[1], sha1[2], lo, hi, p->num_objects); + + while (lo < hi) { + unsigned mi = (lo + hi) / 2; + int cmp = hashcmp(index + mi * stride, sha1); + + if (debug_lookup) + printf("lo %u hi %u rg %u mi %u\n", + lo, hi, hi - lo, mi); + if (!cmp) + return nth_packed_object_offset(p, mi); + if (cmp > 0) + hi = mi; + else + lo = mi+1; + } + return 0; +} + +int is_pack_valid(struct packed_git *p) +{ + /* An already open pack is known to be valid. */ + if (p->pack_fd != -1) + return 1; + + /* If the pack has one window completely covering the +* file size, the pack is known to be valid even if +* the descriptor is not currently open. +*/ + if (p->windows) { + struct pack_window *w = p->windows; + + if (!w->offset && w->len == p->pack_size) + return 1; + } + + /* Force the pack to open to prove its valid. */ + return !open_packed_git(p); +} diff --git a/packfile.h b/packfile.h index 8deb84bd1..4fca6fb28 100644 --- a/packfile.h +++ b/packfile.h @@ -93,6 +93,13 @@ extern const struct object_id *nth_packed_object_oid(struct object_id *, struct */ extern off_t nth_packed_object_offset(const struct packed_git *, uint32_t n); +/* + * If the object named sha1 is present in the specified packfile, + * return its offset within the packfile; otherwise, return 0. + */ +extern off_t find_pack_entry_one(const unsigned char *sha1, struct packed_git *); + +extern int is_pack_valid(struct packed_git *); extern void *unpack_entry(struct packed_git *, off_t, enum object_type *, unsigned long *); extern unsigned long unpack_object_header_buffer(const unsigned char *buf, unsigned long len, enum object_type *type, unsigned long *sizep); extern unsigned long get_size_from_delta(struct packed_git *, struct pack_window **, off_t); @@ -100,8 +107,6 @@ extern int unpack_object_header(struct packed_git *, struct pack_window **, off_ extern void
[PATCH v3 03/23] pack: move pack_report()
Signed-off-by: Jonathan Tan--- cache.h | 2 -- packfile.c | 24 packfile.h | 2 ++ sha1_file.c | 24 4 files changed, 26 insertions(+), 26 deletions(-) diff --git a/cache.h b/cache.h index aa2b4d390..a0497d469 100644 --- a/cache.h +++ b/cache.h @@ -1632,8 +1632,6 @@ unsigned long approximate_object_count(void); extern struct packed_git *find_sha1_pack(const unsigned char *sha1, struct packed_git *packs); -extern void pack_report(void); - /* * Create a temporary file rooted in the object database directory, or * die on failure. The filename is taken from "pattern", which should have the diff --git a/packfile.c b/packfile.c index 0f46e0617..60d9fc3b0 100644 --- a/packfile.c +++ b/packfile.c @@ -35,3 +35,27 @@ struct packed_git *packed_git; static struct mru packed_git_mru_storage; struct mru *packed_git_mru = _git_mru_storage; + +#define SZ_FMT PRIuMAX +static inline uintmax_t sz_fmt(size_t s) { return s; } + +void pack_report(void) +{ + fprintf(stderr, + "pack_report: getpagesize()= %10" SZ_FMT "\n" + "pack_report: core.packedGitWindowSize = %10" SZ_FMT "\n" + "pack_report: core.packedGitLimit = %10" SZ_FMT "\n", + sz_fmt(getpagesize()), + sz_fmt(packed_git_window_size), + sz_fmt(packed_git_limit)); + fprintf(stderr, + "pack_report: pack_used_ctr= %10u\n" + "pack_report: pack_mmap_calls = %10u\n" + "pack_report: pack_open_windows= %10u / %10u\n" + "pack_report: pack_mapped = " + "%10" SZ_FMT " / %10" SZ_FMT "\n", + pack_used_ctr, + pack_mmap_calls, + pack_open_windows, peak_pack_open_windows, + sz_fmt(pack_mapped), sz_fmt(peak_pack_mapped)); +} diff --git a/packfile.h b/packfile.h index a76bb7cec..bfa94c8fe 100644 --- a/packfile.h +++ b/packfile.h @@ -33,4 +33,6 @@ extern unsigned int pack_max_fds; extern size_t peak_pack_mapped; extern size_t pack_mapped; +extern void pack_report(void); + #endif diff --git a/sha1_file.c b/sha1_file.c index 2b5ce9959..f7c8152ac 100644 --- a/sha1_file.c +++ b/sha1_file.c @@ -30,9 +30,6 @@ #include "quote.h" #include "packfile.h" -#define SZ_FMT PRIuMAX -static inline uintmax_t sz_fmt(size_t s) { return s; } - const unsigned char null_sha1[20]; const struct object_id null_oid; const struct object_id empty_tree_oid = { @@ -683,27 +680,6 @@ static int has_loose_object(const unsigned char *sha1) return check_and_freshen(sha1, 0); } -void pack_report(void) -{ - fprintf(stderr, - "pack_report: getpagesize()= %10" SZ_FMT "\n" - "pack_report: core.packedGitWindowSize = %10" SZ_FMT "\n" - "pack_report: core.packedGitLimit = %10" SZ_FMT "\n", - sz_fmt(getpagesize()), - sz_fmt(packed_git_window_size), - sz_fmt(packed_git_limit)); - fprintf(stderr, - "pack_report: pack_used_ctr= %10u\n" - "pack_report: pack_mmap_calls = %10u\n" - "pack_report: pack_open_windows= %10u / %10u\n" - "pack_report: pack_mapped = " - "%10" SZ_FMT " / %10" SZ_FMT "\n", - pack_used_ctr, - pack_mmap_calls, - pack_open_windows, peak_pack_open_windows, - sz_fmt(pack_mapped), sz_fmt(peak_pack_mapped)); -} - /* * Open and mmap the index file at path, perform a couple of * consistency checks, then record its information to p. Return 0 on -- 2.14.1.480.gb18f417b89-goog
[PATCH v3 19/23] pack: move find_sha1_pack()
Signed-off-by: Jonathan Tan--- cache.h | 3 --- http-push.c | 1 + http-walker.c | 1 + packfile.c| 13 + packfile.h| 3 +++ sha1_file.c | 13 - 6 files changed, 18 insertions(+), 16 deletions(-) diff --git a/cache.h b/cache.h index 9297d078a..1e90bb754 100644 --- a/cache.h +++ b/cache.h @@ -1608,9 +1608,6 @@ struct pack_entry { struct packed_git *p; }; -extern struct packed_git *find_sha1_pack(const unsigned char *sha1, -struct packed_git *packs); - /* * Create a temporary file rooted in the object database directory, or * die on failure. The filename is taken from "pattern", which should have the diff --git a/http-push.c b/http-push.c index c91f40a61..e4c9b065c 100644 --- a/http-push.c +++ b/http-push.c @@ -11,6 +11,7 @@ #include "list-objects.h" #include "sigchain.h" #include "argv-array.h" +#include "packfile.h" #ifdef EXPAT_NEEDS_XMLPARSE_H #include diff --git a/http-walker.c b/http-walker.c index ee049cb13..1ae8363de 100644 --- a/http-walker.c +++ b/http-walker.c @@ -4,6 +4,7 @@ #include "http.h" #include "list.h" #include "transport.h" +#include "packfile.h" struct alt_base { char *base; diff --git a/packfile.c b/packfile.c index ad7336594..ba3a5eb3a 100644 --- a/packfile.c +++ b/packfile.c @@ -1774,3 +1774,16 @@ int is_pack_valid(struct packed_git *p) /* Force the pack to open to prove its valid. */ return !open_packed_git(p); } + +struct packed_git *find_sha1_pack(const unsigned char *sha1, + struct packed_git *packs) +{ + struct packed_git *p; + + for (p = packs; p; p = p->next) { + if (find_pack_entry_one(sha1, p)) + return p; + } + return NULL; + +} diff --git a/packfile.h b/packfile.h index 4fca6fb28..a4ff6f6ed 100644 --- a/packfile.h +++ b/packfile.h @@ -42,6 +42,9 @@ extern void install_packed_git(struct packed_git *pack); */ unsigned long approximate_object_count(void); +extern struct packed_git *find_sha1_pack(const unsigned char *sha1, +struct packed_git *packs); + extern void pack_report(void); /* diff --git a/sha1_file.c b/sha1_file.c index 27714f5e1..8853672d2 100644 --- a/sha1_file.c +++ b/sha1_file.c @@ -1127,19 +1127,6 @@ static int find_pack_entry(const unsigned char *sha1, struct pack_entry *e) return 0; } -struct packed_git *find_sha1_pack(const unsigned char *sha1, - struct packed_git *packs) -{ - struct packed_git *p; - - for (p = packs; p; p = p->next) { - if (find_pack_entry_one(sha1, p)) - return p; - } - return NULL; - -} - static int sha1_loose_object_info(const unsigned char *sha1, struct object_info *oi, int flags) -- 2.14.1.480.gb18f417b89-goog
[PATCH v3 21/23] pack: move has_sha1_pack()
Signed-off-by: Jonathan Tan--- builtin/prune-packed.c | 1 + cache.h| 2 -- diff.c | 1 + packfile.c | 6 ++ packfile.h | 2 ++ revision.c | 1 + sha1_file.c| 6 -- 7 files changed, 11 insertions(+), 8 deletions(-) diff --git a/builtin/prune-packed.c b/builtin/prune-packed.c index ac978ad40..97bfde24b 100644 --- a/builtin/prune-packed.c +++ b/builtin/prune-packed.c @@ -2,6 +2,7 @@ #include "cache.h" #include "progress.h" #include "parse-options.h" +#include "packfile.h" static const char * const prune_packed_usage[] = { N_("git prune-packed [-n | --dry-run] [-q | --quiet]"), diff --git a/cache.h b/cache.h index 1e90bb754..286891df4 100644 --- a/cache.h +++ b/cache.h @@ -1198,8 +1198,6 @@ extern int check_sha1_signature(const unsigned char *sha1, void *buf, unsigned l extern int finalize_object_file(const char *tmpfile, const char *filename); -extern int has_sha1_pack(const unsigned char *sha1); - /* * Open the loose object at path, check its sha1, and return the contents, * type, and size. If the object is a blob, then "contents" may return NULL, diff --git a/diff.c b/diff.c index 85e714f6c..e9a1d6162 100644 --- a/diff.c +++ b/diff.c @@ -20,6 +20,7 @@ #include "string-list.h" #include "argv-array.h" #include "graph.h" +#include "packfile.h" #ifdef NO_FAST_WORKING_DIRECTORY #define FAST_WORKING_DIRECTORY 0 diff --git a/packfile.c b/packfile.c index ae5395f5f..7472ab816 100644 --- a/packfile.c +++ b/packfile.c @@ -1840,3 +1840,9 @@ int find_pack_entry(const unsigned char *sha1, struct pack_entry *e) } return 0; } + +int has_sha1_pack(const unsigned char *sha1) +{ + struct pack_entry e; + return find_pack_entry(sha1, ); +} diff --git a/packfile.h b/packfile.h index c9b4fcfaf..4945a1505 100644 --- a/packfile.h +++ b/packfile.h @@ -120,4 +120,6 @@ extern const struct packed_git *has_packed_and_bad(const unsigned char *sha1); extern int find_pack_entry(const unsigned char *sha1, struct pack_entry *e); +extern int has_sha1_pack(const unsigned char *sha1); + #endif diff --git a/revision.c b/revision.c index 6603af944..db97b9221 100644 --- a/revision.c +++ b/revision.c @@ -19,6 +19,7 @@ #include "dir.h" #include "cache-tree.h" #include "bisect.h" +#include "packfile.h" volatile show_early_output_fn_t show_early_output; diff --git a/sha1_file.c b/sha1_file.c index 76c86639c..e4975e0ae 100644 --- a/sha1_file.c +++ b/sha1_file.c @@ -1630,12 +1630,6 @@ int has_pack_index(const unsigned char *sha1) return 1; } -int has_sha1_pack(const unsigned char *sha1) -{ - struct pack_entry e; - return find_pack_entry(sha1, ); -} - int has_sha1_file_with_flags(const unsigned char *sha1, int flags) { if (!startup_info->have_repository) -- 2.14.1.480.gb18f417b89-goog
[PATCH v3 14/23] pack: move unpack_object_header()
Signed-off-by: Jonathan Tan--- cache.h | 1 - packfile.c | 26 ++ packfile.h | 1 + sha1_file.c | 26 -- 4 files changed, 27 insertions(+), 27 deletions(-) diff --git a/cache.h b/cache.h index 87f65aeea..7adbc587d 100644 --- a/cache.h +++ b/cache.h @@ -1669,7 +1669,6 @@ extern off_t find_pack_entry_one(const unsigned char *sha1, struct packed_git *) extern int is_pack_valid(struct packed_git *); extern void *unpack_entry(struct packed_git *, off_t, enum object_type *, unsigned long *); -extern int unpack_object_header(struct packed_git *, struct pack_window **, off_t *, unsigned long *); /* * Iterate over the files in the loose-object parts of the object diff --git a/packfile.c b/packfile.c index fa90b643e..3543b37b8 100644 --- a/packfile.c +++ b/packfile.c @@ -949,3 +949,29 @@ unsigned long get_size_from_delta(struct packed_git *p, /* Read the result size */ return get_delta_hdr_size(, delta_head+sizeof(delta_head)); } + +int unpack_object_header(struct packed_git *p, +struct pack_window **w_curs, +off_t *curpos, +unsigned long *sizep) +{ + unsigned char *base; + size_t left; + size_t used; + enum object_type type; + + /* use_pack() assures us we have [base, base + 20) available +* as a range that we can look at. (Its actually the hash +* size that is assured.) With our object header encoding +* the maximum deflated object size is 2^137, which is just +* insane, so we know won't exceed what we have been given. +*/ + base = use_pack(p, w_curs, *curpos, ); + used = unpack_object_header_buffer(base, left, , sizep); + if (!used) { + type = OBJ_BAD; + } else + *curpos += used; + + return type; +} diff --git a/packfile.h b/packfile.h index 9c3bce6b2..d22a528b5 100644 --- a/packfile.h +++ b/packfile.h @@ -64,6 +64,7 @@ extern struct packed_git *add_packed_git(const char *path, size_t path_len, int extern unsigned long unpack_object_header_buffer(const unsigned char *buf, unsigned long len, enum object_type *type, unsigned long *sizep); extern unsigned long get_size_from_delta(struct packed_git *, struct pack_window **, off_t); +extern int unpack_object_header(struct packed_git *, struct pack_window **, off_t *, unsigned long *); extern void release_pack_memory(size_t); diff --git a/sha1_file.c b/sha1_file.c index 5d016ad6b..681dcf1c0 100644 --- a/sha1_file.c +++ b/sha1_file.c @@ -1171,32 +1171,6 @@ static const unsigned char *get_delta_base_sha1(struct packed_git *p, return NULL; } -int unpack_object_header(struct packed_git *p, -struct pack_window **w_curs, -off_t *curpos, -unsigned long *sizep) -{ - unsigned char *base; - size_t left; - size_t used; - enum object_type type; - - /* use_pack() assures us we have [base, base + 20) available -* as a range that we can look at. (Its actually the hash -* size that is assured.) With our object header encoding -* the maximum deflated object size is 2^137, which is just -* insane, so we know won't exceed what we have been given. -*/ - base = use_pack(p, w_curs, *curpos, ); - used = unpack_object_header_buffer(base, left, , sizep); - if (!used) { - type = OBJ_BAD; - } else - *curpos += used; - - return type; -} - static int retry_bad_packed_offset(struct packed_git *p, off_t obj_offset) { int type; -- 2.14.1.480.gb18f417b89-goog
[PATCH v3 15/23] pack: move clear_delta_base_cache(), packed_object_info(), unpack_entry()
Both sha1_file.c and packfile.c now need read_object(), so a copy of read_object() was created in packfile.c. This patch makes both mark_bad_packed_object() and has_packed_and_bad() global. Unlike most of the other patches in this series, these 2 functions need to remain global. Signed-off-by: Jonathan Tan--- cache.h | 7 - packfile.c | 661 ++ packfile.h | 10 + sha1_file.c | 677 ++-- 4 files changed, 685 insertions(+), 670 deletions(-) diff --git a/cache.h b/cache.h index 7adbc587d..11aa18e6a 100644 --- a/cache.h +++ b/cache.h @@ -1194,9 +1194,6 @@ extern void *map_sha1_file(const unsigned char *sha1, unsigned long *size); extern int unpack_sha1_header(git_zstream *stream, unsigned char *map, unsigned long mapsize, void *buffer, unsigned long bufsiz); extern int parse_sha1_header(const char *hdr, unsigned long *sizep); -/* global flag to enable extra checks when accessing packed objects */ -extern int do_check_packed_object_crc; - extern int check_sha1_signature(const unsigned char *sha1, void *buf, unsigned long size, const char *type); extern int finalize_object_file(const char *tmpfile, const char *filename); @@ -1629,8 +1626,6 @@ extern int odb_mkstemp(struct strbuf *template, const char *pattern); */ extern int odb_pack_keep(const char *name); -extern void clear_delta_base_cache(void); - /* * Make sure that a pointer access into an mmap'd index file is within bounds, * and can provide at least 8 bytes of data. @@ -1668,7 +1663,6 @@ extern off_t nth_packed_object_offset(const struct packed_git *, uint32_t n); extern off_t find_pack_entry_one(const unsigned char *sha1, struct packed_git *); extern int is_pack_valid(struct packed_git *); -extern void *unpack_entry(struct packed_git *, off_t, enum object_type *, unsigned long *); /* * Iterate over the files in the loose-object parts of the object @@ -1779,7 +1773,6 @@ struct object_info { /* Do not retry packed storage after checking packed and loose storage */ #define OBJECT_INFO_QUICK 8 extern int sha1_object_info_extended(const unsigned char *, struct object_info *, unsigned flags); -extern int packed_object_info(struct packed_git *pack, off_t offset, struct object_info *); /* Dumb servers support */ extern int update_server_info(int); diff --git a/packfile.c b/packfile.c index 3543b37b8..624cc109e 100644 --- a/packfile.c +++ b/packfile.c @@ -5,6 +5,8 @@ #include "mergesort.h" #include "packfile.h" #include "delta.h" +#include "list.h" +#include "streaming.h" char *odb_pack_name(struct strbuf *buf, const unsigned char *sha1, @@ -975,3 +977,662 @@ int unpack_object_header(struct packed_git *p, return type; } + +void mark_bad_packed_object(struct packed_git *p, const unsigned char *sha1) +{ + unsigned i; + for (i = 0; i < p->num_bad_objects; i++) + if (!hashcmp(sha1, p->bad_object_sha1 + GIT_SHA1_RAWSZ * i)) + return; + p->bad_object_sha1 = xrealloc(p->bad_object_sha1, + st_mult(GIT_MAX_RAWSZ, + st_add(p->num_bad_objects, 1))); + hashcpy(p->bad_object_sha1 + GIT_SHA1_RAWSZ * p->num_bad_objects, sha1); + p->num_bad_objects++; +} + +const struct packed_git *has_packed_and_bad(const unsigned char *sha1) +{ + struct packed_git *p; + unsigned i; + + for (p = packed_git; p; p = p->next) + for (i = 0; i < p->num_bad_objects; i++) + if (!hashcmp(sha1, p->bad_object_sha1 + 20 * i)) + return p; + return NULL; +} + +static off_t get_delta_base(struct packed_git *p, + struct pack_window **w_curs, + off_t *curpos, + enum object_type type, + off_t delta_obj_offset) +{ + unsigned char *base_info = use_pack(p, w_curs, *curpos, NULL); + off_t base_offset; + + /* use_pack() assured us we have [base_info, base_info + 20) +* as a range that we can look at without walking off the +* end of the mapped window. Its actually the hash size +* that is assured. An OFS_DELTA longer than the hash size +* is stupid, as then a REF_DELTA would be smaller to store. +*/ + if (type == OBJ_OFS_DELTA) { + unsigned used = 0; + unsigned char c = base_info[used++]; + base_offset = c & 127; + while (c & 128) { + base_offset += 1; + if (!base_offset || MSB(base_offset, 7)) + return 0; /* overflow */ + c = base_info[used++]; + base_offset = (base_offset << 7) + (c
[PATCH v3 22/23] pack: move has_pack_index()
Signed-off-by: Jonathan Tan--- cache.h | 2 -- packfile.c | 8 packfile.h | 2 ++ sha1_file.c | 8 4 files changed, 10 insertions(+), 10 deletions(-) diff --git a/cache.h b/cache.h index 286891df4..dcbe37a3f 100644 --- a/cache.h +++ b/cache.h @@ -1233,8 +1233,6 @@ extern int has_object_file_with_flags(const struct object_id *oid, int flags); */ extern int has_loose_object_nonlocal(const unsigned char *sha1); -extern int has_pack_index(const unsigned char *sha1); - extern void assert_sha1_type(const unsigned char *sha1, enum object_type expect); /* Helper to check and "touch" a file */ diff --git a/packfile.c b/packfile.c index 7472ab816..7e293761b 100644 --- a/packfile.c +++ b/packfile.c @@ -1846,3 +1846,11 @@ int has_sha1_pack(const unsigned char *sha1) struct pack_entry e; return find_pack_entry(sha1, ); } + +int has_pack_index(const unsigned char *sha1) +{ + struct stat st; + if (stat(sha1_pack_index_name(sha1), )) + return 0; + return 1; +} diff --git a/packfile.h b/packfile.h index 4945a1505..1b6ea832c 100644 --- a/packfile.h +++ b/packfile.h @@ -122,4 +122,6 @@ extern int find_pack_entry(const unsigned char *sha1, struct pack_entry *e); extern int has_sha1_pack(const unsigned char *sha1); +extern int has_pack_index(const unsigned char *sha1); + #endif diff --git a/sha1_file.c b/sha1_file.c index e4975e0ae..fa422435f 100644 --- a/sha1_file.c +++ b/sha1_file.c @@ -1622,14 +1622,6 @@ int force_object_loose(const unsigned char *sha1, time_t mtime) return ret; } -int has_pack_index(const unsigned char *sha1) -{ - struct stat st; - if (stat(sha1_pack_index_name(sha1), )) - return 0; - return 1; -} - int has_sha1_file_with_flags(const unsigned char *sha1, int flags) { if (!startup_info->have_repository) -- 2.14.1.480.gb18f417b89-goog
[PATCH v3 17/23] pack: move check_pack_index_ptr(), nth_packed_object_offset()
Signed-off-by: Jonathan Tan--- cache.h | 16 packfile.c | 33 + packfile.h | 16 sha1_file.c | 33 - 4 files changed, 49 insertions(+), 49 deletions(-) diff --git a/cache.h b/cache.h index 83aa3cc62..ee75a4949 100644 --- a/cache.h +++ b/cache.h @@ -1626,22 +1626,6 @@ extern int odb_mkstemp(struct strbuf *template, const char *pattern); */ extern int odb_pack_keep(const char *name); -/* - * Make sure that a pointer access into an mmap'd index file is within bounds, - * and can provide at least 8 bytes of data. - * - * Note that this is only necessary for variable-length segments of the file - * (like the 64-bit extended offset table), as we compare the size to the - * fixed-length parts when we open the file. - */ -extern void check_pack_index_ptr(const struct packed_git *p, const void *ptr); - -/* - * Return the offset of the nth object within the specified packfile. - * The index must already be opened. - */ -extern off_t nth_packed_object_offset(const struct packed_git *, uint32_t n); - /* * If the object named sha1 is present in the specified packfile, * return its offset within the packfile; otherwise, return 0. diff --git a/packfile.c b/packfile.c index e9b16da94..e914422e9 100644 --- a/packfile.c +++ b/packfile.c @@ -1667,3 +1667,36 @@ const struct object_id *nth_packed_object_oid(struct object_id *oid, hashcpy(oid->hash, hash); return oid; } + +void check_pack_index_ptr(const struct packed_git *p, const void *vptr) +{ + const unsigned char *ptr = vptr; + const unsigned char *start = p->index_data; + const unsigned char *end = start + p->index_size; + if (ptr < start) + die(_("offset before start of pack index for %s (corrupt index?)"), + p->pack_name); + /* No need to check for underflow; .idx files must be at least 8 bytes */ + if (ptr >= end - 8) + die(_("offset beyond end of pack index for %s (truncated index?)"), + p->pack_name); +} + +off_t nth_packed_object_offset(const struct packed_git *p, uint32_t n) +{ + const unsigned char *index = p->index_data; + index += 4 * 256; + if (p->index_version == 1) { + return ntohl(*((uint32_t *)(index + 24 * n))); + } else { + uint32_t off; + index += 8 + p->num_objects * (20 + 4); + off = ntohl(*((uint32_t *)(index + 4 * n))); + if (!(off & 0x8000)) + return off; + index += p->num_objects * 4 + (off & 0x7fff) * 8; + check_pack_index_ptr(p, index); + return (((uint64_t)ntohl(*((uint32_t *)(index + 0 << 32) | + ntohl(*((uint32_t *)(index + 4))); + } +} diff --git a/packfile.h b/packfile.h index 56d70caa0..8deb84bd1 100644 --- a/packfile.h +++ b/packfile.h @@ -63,6 +63,16 @@ extern void unuse_pack(struct pack_window **); extern void clear_delta_base_cache(void); extern struct packed_git *add_packed_git(const char *path, size_t path_len, int local); +/* + * Make sure that a pointer access into an mmap'd index file is within bounds, + * and can provide at least 8 bytes of data. + * + * Note that this is only necessary for variable-length segments of the file + * (like the 64-bit extended offset table), as we compare the size to the + * fixed-length parts when we open the file. + */ +extern void check_pack_index_ptr(const struct packed_git *p, const void *ptr); + /* * Return the SHA-1 of the nth object within the specified packfile. * Open the index if it is not already open. The return value points @@ -77,6 +87,11 @@ extern const unsigned char *nth_packed_object_sha1(struct packed_git *, uint32_t */ extern const struct object_id *nth_packed_object_oid(struct object_id *, struct packed_git *, uint32_t n); +/* + * Return the offset of the nth object within the specified packfile. + * The index must already be opened. + */ +extern off_t nth_packed_object_offset(const struct packed_git *, uint32_t n); extern void *unpack_entry(struct packed_git *, off_t, enum object_type *, unsigned long *); extern unsigned long unpack_object_header_buffer(const unsigned char *buf, unsigned long len, enum object_type *type, unsigned long *sizep); @@ -94,4 +109,5 @@ extern int packed_object_info(struct packed_git *pack, off_t offset, struct obje extern void mark_bad_packed_object(struct packed_git *p, const unsigned char *sha1); extern const struct packed_git *has_packed_and_bad(const unsigned char *sha1); + #endif diff --git a/sha1_file.c b/sha1_file.c index 34fbe8e51..2d22bc228 100644 --- a/sha1_file.c +++ b/sha1_file.c @@ -1074,39 +1074,6 @@ int parse_sha1_header(const char *hdr, unsigned long *sizep) return parse_sha1_header_extended(hdr, , 0); } -void
[PATCH v3 13/23] pack: move get_size_from_delta()
Signed-off-by: Jonathan Tan--- cache.h | 1 - packfile.c | 40 packfile.h | 1 + sha1_file.c | 39 --- 4 files changed, 41 insertions(+), 40 deletions(-) diff --git a/cache.h b/cache.h index 75cc0c497..87f65aeea 100644 --- a/cache.h +++ b/cache.h @@ -1669,7 +1669,6 @@ extern off_t find_pack_entry_one(const unsigned char *sha1, struct packed_git *) extern int is_pack_valid(struct packed_git *); extern void *unpack_entry(struct packed_git *, off_t, enum object_type *, unsigned long *); -extern unsigned long get_size_from_delta(struct packed_git *, struct pack_window **, off_t); extern int unpack_object_header(struct packed_git *, struct pack_window **, off_t *, unsigned long *); /* diff --git a/packfile.c b/packfile.c index 43b708812..fa90b643e 100644 --- a/packfile.c +++ b/packfile.c @@ -4,6 +4,7 @@ #include "dir.h" #include "mergesort.h" #include "packfile.h" +#include "delta.h" char *odb_pack_name(struct strbuf *buf, const unsigned char *sha1, @@ -909,3 +910,42 @@ unsigned long unpack_object_header_buffer(const unsigned char *buf, *sizep = size; return used; } + +unsigned long get_size_from_delta(struct packed_git *p, + struct pack_window **w_curs, + off_t curpos) +{ + const unsigned char *data; + unsigned char delta_head[20], *in; + git_zstream stream; + int st; + + memset(, 0, sizeof(stream)); + stream.next_out = delta_head; + stream.avail_out = sizeof(delta_head); + + git_inflate_init(); + do { + in = use_pack(p, w_curs, curpos, _in); + stream.next_in = in; + st = git_inflate(, Z_FINISH); + curpos += stream.next_in - in; + } while ((st == Z_OK || st == Z_BUF_ERROR) && +stream.total_out < sizeof(delta_head)); + git_inflate_end(); + if ((st != Z_STREAM_END) && stream.total_out != sizeof(delta_head)) { + error("delta data unpack-initial failed"); + return 0; + } + + /* Examine the initial part of the delta to figure out +* the result size. +*/ + data = delta_head; + + /* ignore base size */ + get_delta_hdr_size(, delta_head+sizeof(delta_head)); + + /* Read the result size */ + return get_delta_hdr_size(, delta_head+sizeof(delta_head)); +} diff --git a/packfile.h b/packfile.h index 9f36e0112..9c3bce6b2 100644 --- a/packfile.h +++ b/packfile.h @@ -63,6 +63,7 @@ extern void unuse_pack(struct pack_window **); extern struct packed_git *add_packed_git(const char *path, size_t path_len, int local); extern unsigned long unpack_object_header_buffer(const unsigned char *buf, unsigned long len, enum object_type *type, unsigned long *sizep); +extern unsigned long get_size_from_delta(struct packed_git *, struct pack_window **, off_t); extern void release_pack_memory(size_t); diff --git a/sha1_file.c b/sha1_file.c index b57b0..5d016ad6b 100644 --- a/sha1_file.c +++ b/sha1_file.c @@ -1100,45 +1100,6 @@ int parse_sha1_header(const char *hdr, unsigned long *sizep) return parse_sha1_header_extended(hdr, , 0); } -unsigned long get_size_from_delta(struct packed_git *p, - struct pack_window **w_curs, - off_t curpos) -{ - const unsigned char *data; - unsigned char delta_head[20], *in; - git_zstream stream; - int st; - - memset(, 0, sizeof(stream)); - stream.next_out = delta_head; - stream.avail_out = sizeof(delta_head); - - git_inflate_init(); - do { - in = use_pack(p, w_curs, curpos, _in); - stream.next_in = in; - st = git_inflate(, Z_FINISH); - curpos += stream.next_in - in; - } while ((st == Z_OK || st == Z_BUF_ERROR) && -stream.total_out < sizeof(delta_head)); - git_inflate_end(); - if ((st != Z_STREAM_END) && stream.total_out != sizeof(delta_head)) { - error("delta data unpack-initial failed"); - return 0; - } - - /* Examine the initial part of the delta to figure out -* the result size. -*/ - data = delta_head; - - /* ignore base size */ - get_delta_hdr_size(, delta_head+sizeof(delta_head)); - - /* Read the result size */ - return get_delta_hdr_size(, delta_head+sizeof(delta_head)); -} - static off_t get_delta_base(struct packed_git *p, struct pack_window **w_curs, off_t *curpos, -- 2.14.1.480.gb18f417b89-goog
[PATCH v3 16/23] pack: move nth_packed_object_{sha1,oid}
Signed-off-by: Jonathan Tan--- cache.h | 14 -- packfile.c | 31 +++ packfile.h | 16 +++- sha1_file.c | 31 --- 4 files changed, 46 insertions(+), 46 deletions(-) diff --git a/cache.h b/cache.h index 11aa18e6a..83aa3cc62 100644 --- a/cache.h +++ b/cache.h @@ -1636,20 +1636,6 @@ extern int odb_pack_keep(const char *name); */ extern void check_pack_index_ptr(const struct packed_git *p, const void *ptr); -/* - * Return the SHA-1 of the nth object within the specified packfile. - * Open the index if it is not already open. The return value points - * at the SHA-1 within the mmapped index. Return NULL if there is an - * error. - */ -extern const unsigned char *nth_packed_object_sha1(struct packed_git *, uint32_t n); -/* - * Like nth_packed_object_sha1, but write the data into the object specified by - * the the first argument. Returns the first argument on success, and NULL on - * error. - */ -extern const struct object_id *nth_packed_object_oid(struct object_id *, struct packed_git *, uint32_t n); - /* * Return the offset of the nth object within the specified packfile. * The index must already be opened. diff --git a/packfile.c b/packfile.c index 624cc109e..e9b16da94 100644 --- a/packfile.c +++ b/packfile.c @@ -1636,3 +1636,34 @@ void *unpack_entry(struct packed_git *p, off_t obj_offset, return data; } + +const unsigned char *nth_packed_object_sha1(struct packed_git *p, + uint32_t n) +{ + const unsigned char *index = p->index_data; + if (!index) { + if (open_pack_index(p)) + return NULL; + index = p->index_data; + } + if (n >= p->num_objects) + return NULL; + index += 4 * 256; + if (p->index_version == 1) { + return index + 24 * n + 4; + } else { + index += 8; + return index + 20 * n; + } +} + +const struct object_id *nth_packed_object_oid(struct object_id *oid, + struct packed_git *p, + uint32_t n) +{ + const unsigned char *hash = nth_packed_object_sha1(p, n); + if (!hash) + return NULL; + hashcpy(oid->hash, hash); + return oid; +} diff --git a/packfile.h b/packfile.h index c28eaccc6..56d70caa0 100644 --- a/packfile.h +++ b/packfile.h @@ -63,6 +63,21 @@ extern void unuse_pack(struct pack_window **); extern void clear_delta_base_cache(void); extern struct packed_git *add_packed_git(const char *path, size_t path_len, int local); +/* + * Return the SHA-1 of the nth object within the specified packfile. + * Open the index if it is not already open. The return value points + * at the SHA-1 within the mmapped index. Return NULL if there is an + * error. + */ +extern const unsigned char *nth_packed_object_sha1(struct packed_git *, uint32_t n); +/* + * Like nth_packed_object_sha1, but write the data into the object specified by + * the the first argument. Returns the first argument on success, and NULL on + * error. + */ +extern const struct object_id *nth_packed_object_oid(struct object_id *, struct packed_git *, uint32_t n); + + extern void *unpack_entry(struct packed_git *, off_t, enum object_type *, unsigned long *); extern unsigned long unpack_object_header_buffer(const unsigned char *buf, unsigned long len, enum object_type *type, unsigned long *sizep); extern unsigned long get_size_from_delta(struct packed_git *, struct pack_window **, off_t); @@ -79,5 +94,4 @@ extern int packed_object_info(struct packed_git *pack, off_t offset, struct obje extern void mark_bad_packed_object(struct packed_git *p, const unsigned char *sha1); extern const struct packed_git *has_packed_and_bad(const unsigned char *sha1); - #endif diff --git a/sha1_file.c b/sha1_file.c index e537ba089..34fbe8e51 100644 --- a/sha1_file.c +++ b/sha1_file.c @@ -1074,37 +1074,6 @@ int parse_sha1_header(const char *hdr, unsigned long *sizep) return parse_sha1_header_extended(hdr, , 0); } -const unsigned char *nth_packed_object_sha1(struct packed_git *p, - uint32_t n) -{ - const unsigned char *index = p->index_data; - if (!index) { - if (open_pack_index(p)) - return NULL; - index = p->index_data; - } - if (n >= p->num_objects) - return NULL; - index += 4 * 256; - if (p->index_version == 1) { - return index + 24 * n + 4; - } else { - index += 8; - return index + 20 * n; - } -} - -const struct object_id *nth_packed_object_oid(struct object_id *oid, - struct packed_git *p, - uint32_t
[PATCH v3 23/23] pack: move for_each_packed_object()
Signed-off-by: Jonathan Tan--- builtin/cat-file.c | 1 + cache.h| 7 +-- packfile.c | 40 packfile.h | 11 +++ reachable.c| 1 + sha1_file.c| 40 6 files changed, 54 insertions(+), 46 deletions(-) diff --git a/builtin/cat-file.c b/builtin/cat-file.c index 96b786e48..be5936017 100644 --- a/builtin/cat-file.c +++ b/builtin/cat-file.c @@ -12,6 +12,7 @@ #include "streaming.h" #include "tree-walk.h" #include "sha1-array.h" +#include "packfile.h" struct batch_options { int enabled; diff --git a/cache.h b/cache.h index dcbe37a3f..2eeb21b02 100644 --- a/cache.h +++ b/cache.h @@ -1668,17 +1668,12 @@ int for_each_loose_file_in_objdir_buf(struct strbuf *path, void *data); /* - * Iterate over loose and packed objects in both the local + * Iterate over loose objects in both the local * repository and any alternates repositories (unless the * LOCAL_ONLY flag is set). */ #define FOR_EACH_OBJECT_LOCAL_ONLY 0x1 -typedef int each_packed_object_fn(const struct object_id *oid, - struct packed_git *pack, - uint32_t pos, - void *data); extern int for_each_loose_object(each_loose_object_fn, void *, unsigned flags); -extern int for_each_packed_object(each_packed_object_fn, void *, unsigned flags); struct object_info { /* Request */ diff --git a/packfile.c b/packfile.c index 7e293761b..1f11ef5b8 100644 --- a/packfile.c +++ b/packfile.c @@ -1854,3 +1854,43 @@ int has_pack_index(const unsigned char *sha1) return 0; return 1; } + +static int for_each_object_in_pack(struct packed_git *p, each_packed_object_fn cb, void *data) +{ + uint32_t i; + int r = 0; + + for (i = 0; i < p->num_objects; i++) { + struct object_id oid; + + if (!nth_packed_object_oid(, p, i)) + return error("unable to get sha1 of object %u in %s", +i, p->pack_name); + + r = cb(, p, i, data); + if (r) + break; + } + return r; +} + +int for_each_packed_object(each_packed_object_fn cb, void *data, unsigned flags) +{ + struct packed_git *p; + int r = 0; + int pack_errors = 0; + + prepare_packed_git(); + for (p = packed_git; p; p = p->next) { + if ((flags & FOR_EACH_OBJECT_LOCAL_ONLY) && !p->pack_local) + continue; + if (open_pack_index(p)) { + pack_errors = 1; + continue; + } + r = for_each_object_in_pack(p, cb, data); + if (r) + break; + } + return r ? r : pack_errors; +} diff --git a/packfile.h b/packfile.h index 1b6ea832c..ca4cc3b97 100644 --- a/packfile.h +++ b/packfile.h @@ -124,4 +124,15 @@ extern int has_sha1_pack(const unsigned char *sha1); extern int has_pack_index(const unsigned char *sha1); +/* + * Iterate over packed objects in both the local + * repository and any alternates repositories (unless the + * FOR_EACH_OBJECT_LOCAL_ONLY flag, defined in cache.h, is set). + */ +typedef int each_packed_object_fn(const struct object_id *oid, + struct packed_git *pack, + uint32_t pos, + void *data); +extern int for_each_packed_object(each_packed_object_fn, void *, unsigned flags); + #endif diff --git a/reachable.c b/reachable.c index c62efbfd4..d1ac5d97e 100644 --- a/reachable.c +++ b/reachable.c @@ -9,6 +9,7 @@ #include "cache-tree.h" #include "progress.h" #include "list-objects.h" +#include "packfile.h" struct connectivity_progress { struct progress *progress; diff --git a/sha1_file.c b/sha1_file.c index fa422435f..0bb2343f8 100644 --- a/sha1_file.c +++ b/sha1_file.c @@ -2014,46 +2014,6 @@ int for_each_loose_object(each_loose_object_fn cb, void *data, unsigned flags) return foreach_alt_odb(loose_from_alt_odb, ); } -static int for_each_object_in_pack(struct packed_git *p, each_packed_object_fn cb, void *data) -{ - uint32_t i; - int r = 0; - - for (i = 0; i < p->num_objects; i++) { - struct object_id oid; - - if (!nth_packed_object_oid(, p, i)) - return error("unable to get sha1 of object %u in %s", -i, p->pack_name); - - r = cb(, p, i, data); - if (r) - break; - } - return r; -} - -int for_each_packed_object(each_packed_object_fn cb, void *data, unsigned flags) -{ - struct packed_git *p; - int r = 0; - int pack_errors = 0; - -
[PATCH v3 02/23] pack: move static state variables
sha1_file.c declares some static variables that store packfile-related state. Move them to packfile.c. They are temporarily made global, but subsequent commits will restore their scope back to static. Signed-off-by: Jonathan Tan--- packfile.c | 14 ++ packfile.h | 9 + sha1_file.c | 13 - 3 files changed, 23 insertions(+), 13 deletions(-) diff --git a/packfile.c b/packfile.c index 0d191dfd6..0f46e0617 100644 --- a/packfile.c +++ b/packfile.c @@ -1,4 +1,5 @@ #include "cache.h" +#include "mru.h" char *odb_pack_name(struct strbuf *buf, const unsigned char *sha1, @@ -21,3 +22,16 @@ char *sha1_pack_index_name(const unsigned char *sha1) static struct strbuf buf = STRBUF_INIT; return odb_pack_name(, sha1, "idx"); } + +unsigned int pack_used_ctr; +unsigned int pack_mmap_calls; +unsigned int peak_pack_open_windows; +unsigned int pack_open_windows; +unsigned int pack_open_fds; +unsigned int pack_max_fds; +size_t peak_pack_mapped; +size_t pack_mapped; +struct packed_git *packed_git; + +static struct mru packed_git_mru_storage; +struct mru *packed_git_mru = _git_mru_storage; diff --git a/packfile.h b/packfile.h index 3c4a0dbd7..a76bb7cec 100644 --- a/packfile.h +++ b/packfile.h @@ -24,4 +24,13 @@ extern char *sha1_pack_name(const unsigned char *sha1); */ extern char *sha1_pack_index_name(const unsigned char *sha1); +extern unsigned int pack_used_ctr; +extern unsigned int pack_mmap_calls; +extern unsigned int peak_pack_open_windows; +extern unsigned int pack_open_windows; +extern unsigned int pack_open_fds; +extern unsigned int pack_max_fds; +extern size_t peak_pack_mapped; +extern size_t pack_mapped; + #endif diff --git a/sha1_file.c b/sha1_file.c index 6e7a20b52..2b5ce9959 100644 --- a/sha1_file.c +++ b/sha1_file.c @@ -683,19 +683,6 @@ static int has_loose_object(const unsigned char *sha1) return check_and_freshen(sha1, 0); } -static unsigned int pack_used_ctr; -static unsigned int pack_mmap_calls; -static unsigned int peak_pack_open_windows; -static unsigned int pack_open_windows; -static unsigned int pack_open_fds; -static unsigned int pack_max_fds; -static size_t peak_pack_mapped; -static size_t pack_mapped; -struct packed_git *packed_git; - -static struct mru packed_git_mru_storage; -struct mru *packed_git_mru = _git_mru_storage; - void pack_report(void) { fprintf(stderr, -- 2.14.1.480.gb18f417b89-goog
[PATCH v3 06/23] pack: move pack-closing functions
The function close_pack_fd() needs to be temporarily made global. Its scope will be restored to static in a subsequent commit. Signed-off-by: Jonathan Tan--- builtin/am.c | 1 + builtin/clone.c| 1 + builtin/fetch.c| 1 + builtin/merge.c| 1 + builtin/receive-pack.c | 1 + cache.h| 8 packfile.c | 54 + packfile.h | 11 ++ sha1_file.c| 55 -- 9 files changed, 70 insertions(+), 63 deletions(-) diff --git a/builtin/am.c b/builtin/am.c index c973bd96d..b9d9ff203 100644 --- a/builtin/am.c +++ b/builtin/am.c @@ -31,6 +31,7 @@ #include "mailinfo.h" #include "apply.h" #include "string-list.h" +#include "packfile.h" /** * Returns 1 if the file is empty or does not exist, 0 otherwise. diff --git a/builtin/clone.c b/builtin/clone.c index 08b5cc433..13abb075a 100644 --- a/builtin/clone.c +++ b/builtin/clone.c @@ -25,6 +25,7 @@ #include "remote.h" #include "run-command.h" #include "connected.h" +#include "packfile.h" /* * Overall FIXMEs: diff --git a/builtin/fetch.c b/builtin/fetch.c index c87e59f3b..c86c36f37 100644 --- a/builtin/fetch.c +++ b/builtin/fetch.c @@ -17,6 +17,7 @@ #include "connected.h" #include "argv-array.h" #include "utf8.h" +#include "packfile.h" static const char * const builtin_fetch_usage[] = { N_("git fetch [] [ [...]]"), diff --git a/builtin/merge.c b/builtin/merge.c index 900bafdb4..45e673dcc 100644 --- a/builtin/merge.c +++ b/builtin/merge.c @@ -32,6 +32,7 @@ #include "gpg-interface.h" #include "sequencer.h" #include "string-list.h" +#include "packfile.h" #define DEFAULT_TWOHEAD (1<<0) #define DEFAULT_OCTOPUS (1<<1) diff --git a/builtin/receive-pack.c b/builtin/receive-pack.c index cabdc55e0..0019a484f 100644 --- a/builtin/receive-pack.c +++ b/builtin/receive-pack.c @@ -23,6 +23,7 @@ #include "fsck.h" #include "tmp-objdir.h" #include "oidset.h" +#include "packfile.h" static const char * const receive_pack_usage[] = { N_("git receive-pack "), diff --git a/cache.h b/cache.h index f271033db..bbc56566e 100644 --- a/cache.h +++ b/cache.h @@ -1645,15 +1645,7 @@ extern int odb_mkstemp(struct strbuf *template, const char *pattern); */ extern int odb_pack_keep(const char *name); -/* - * munmap the index file for the specified packfile (if it is - * currently mmapped). - */ -extern void close_pack_index(struct packed_git *); - extern unsigned char *use_pack(struct packed_git *, struct pack_window **, off_t, size_t *); -extern void close_pack_windows(struct packed_git *); -extern void close_all_packs(void); extern void unuse_pack(struct pack_window **); extern void clear_delta_base_cache(void); extern struct packed_git *add_packed_git(const char *path, size_t path_len, int local); diff --git a/packfile.c b/packfile.c index 8daa74ad1..c8e2dbdee 100644 --- a/packfile.c +++ b/packfile.c @@ -257,3 +257,57 @@ void release_pack_memory(size_t need) while (need >= (cur - pack_mapped) && unuse_one_window(NULL)) ; /* nothing */ } + +void close_pack_windows(struct packed_git *p) +{ + while (p->windows) { + struct pack_window *w = p->windows; + + if (w->inuse_cnt) + die("pack '%s' still has open windows to it", + p->pack_name); + munmap(w->base, w->len); + pack_mapped -= w->len; + pack_open_windows--; + p->windows = w->next; + free(w); + } +} + +int close_pack_fd(struct packed_git *p) +{ + if (p->pack_fd < 0) + return 0; + + close(p->pack_fd); + pack_open_fds--; + p->pack_fd = -1; + + return 1; +} + +void close_pack_index(struct packed_git *p) +{ + if (p->index_data) { + munmap((void *)p->index_data, p->index_size); + p->index_data = NULL; + } +} + +static void close_pack(struct packed_git *p) +{ + close_pack_windows(p); + close_pack_fd(p); + close_pack_index(p); +} + +void close_all_packs(void) +{ + struct packed_git *p; + + for (p = packed_git; p; p = p->next) + if (p->do_not_close) + die("BUG: want to close pack marked 'do-not-close'"); + else + close_pack(p); +} diff --git a/packfile.h b/packfile.h index f6fe1c741..c6a07de62 100644 --- a/packfile.h +++ b/packfile.h @@ -43,6 +43,17 @@ extern void pack_report(void); */ extern int open_pack_index(struct packed_git *); +/* + * munmap the index file for the specified packfile (if it is + * currently mmapped). + */ +extern void close_pack_index(struct packed_git *); + +extern void close_pack_windows(struct packed_git *); +extern void close_all_packs(void); + +extern int close_pack_fd(struct packed_git *); +
[PATCH v3 07/23] pack: move use_pack()
The function open_packed_git() needs to be temporarily made global. Its scope will be restored to static in a subsequent commit. Signed-off-by: Jonathan Tan--- cache.h | 1 - packfile.c | 303 ++-- packfile.h | 14 +-- sha1_file.c | 285 streaming.c | 1 + 5 files changed, 298 insertions(+), 306 deletions(-) diff --git a/cache.h b/cache.h index bbc56566e..a27018210 100644 --- a/cache.h +++ b/cache.h @@ -1645,7 +1645,6 @@ extern int odb_mkstemp(struct strbuf *template, const char *pattern); */ extern int odb_pack_keep(const char *name); -extern unsigned char *use_pack(struct packed_git *, struct pack_window **, off_t, size_t *); extern void unuse_pack(struct pack_window **); extern void clear_delta_base_cache(void); extern struct packed_git *add_packed_git(const char *path, size_t path_len, int local); diff --git a/packfile.c b/packfile.c index c8e2dbdee..ea451d27e 100644 --- a/packfile.c +++ b/packfile.c @@ -24,14 +24,14 @@ char *sha1_pack_index_name(const unsigned char *sha1) return odb_pack_name(, sha1, "idx"); } -unsigned int pack_used_ctr; -unsigned int pack_mmap_calls; -unsigned int peak_pack_open_windows; -unsigned int pack_open_windows; +static unsigned int pack_used_ctr; +static unsigned int pack_mmap_calls; +static unsigned int peak_pack_open_windows; +static unsigned int pack_open_windows; unsigned int pack_open_fds; -unsigned int pack_max_fds; -size_t peak_pack_mapped; -size_t pack_mapped; +static unsigned int pack_max_fds; +static size_t peak_pack_mapped; +static size_t pack_mapped; struct packed_git *packed_git; static struct mru packed_git_mru_storage; @@ -228,7 +228,7 @@ static void scan_windows(struct packed_git *p, } } -int unuse_one_window(struct packed_git *current) +static int unuse_one_window(struct packed_git *current) { struct packed_git *p, *lru_p = NULL; struct pack_window *lru_w = NULL, *lru_l = NULL; @@ -274,7 +274,7 @@ void close_pack_windows(struct packed_git *p) } } -int close_pack_fd(struct packed_git *p) +static int close_pack_fd(struct packed_git *p) { if (p->pack_fd < 0) return 0; @@ -311,3 +311,288 @@ void close_all_packs(void) else close_pack(p); } + +/* + * The LRU pack is the one with the oldest MRU window, preferring packs + * with no used windows, or the oldest mtime if it has no windows allocated. + */ +static void find_lru_pack(struct packed_git *p, struct packed_git **lru_p, struct pack_window **mru_w, int *accept_windows_inuse) +{ + struct pack_window *w, *this_mru_w; + int has_windows_inuse = 0; + + /* +* Reject this pack if it has windows and the previously selected +* one does not. If this pack does not have windows, reject +* it if the pack file is newer than the previously selected one. +*/ + if (*lru_p && !*mru_w && (p->windows || p->mtime > (*lru_p)->mtime)) + return; + + for (w = this_mru_w = p->windows; w; w = w->next) { + /* +* Reject this pack if any of its windows are in use, +* but the previously selected pack did not have any +* inuse windows. Otherwise, record that this pack +* has windows in use. +*/ + if (w->inuse_cnt) { + if (*accept_windows_inuse) + has_windows_inuse = 1; + else + return; + } + + if (w->last_used > this_mru_w->last_used) + this_mru_w = w; + + /* +* Reject this pack if it has windows that have been +* used more recently than the previously selected pack. +* If the previously selected pack had windows inuse and +* we have not encountered a window in this pack that is +* inuse, skip this check since we prefer a pack with no +* inuse windows to one that has inuse windows. +*/ + if (*mru_w && *accept_windows_inuse == has_windows_inuse && + this_mru_w->last_used > (*mru_w)->last_used) + return; + } + + /* +* Select this pack. +*/ + *mru_w = this_mru_w; + *lru_p = p; + *accept_windows_inuse = has_windows_inuse; +} + +static int close_one_pack(void) +{ + struct packed_git *p, *lru_p = NULL; + struct pack_window *mru_w = NULL; + int accept_windows_inuse = 1; + + for (p = packed_git; p; p = p->next) { + if (p->pack_fd == -1) + continue; + find_lru_pack(p, _p, _w, _windows_inuse); + } + + if (lru_p) +
[PATCH v3 01/23] pack: move pack name-related functions
Currently, sha1_file.c and cache.h contain many functions, both related to and unrelated to packfiles. This makes both files very large and causes an unclear separation of concerns. Create a new file, packfile.c, to hold all packfile-related functions currently in sha1_file.c. It has a corresponding header packfile.h. In this commit, the pack name-related functions are moved. Subsequent commits will move the other functions. Signed-off-by: Jonathan Tan--- Makefile | 1 + builtin/index-pack.c | 1 + builtin/pack-redundant.c | 1 + cache.h | 23 --- fast-import.c| 1 + http.c | 1 + outgoing/packfile.h | 0 packfile.c | 23 +++ packfile.h | 27 +++ sha1_file.c | 23 +-- 10 files changed, 56 insertions(+), 45 deletions(-) create mode 100644 outgoing/packfile.h create mode 100644 packfile.c create mode 100644 packfile.h diff --git a/Makefile b/Makefile index 461c845d3..5cdecaa17 100644 --- a/Makefile +++ b/Makefile @@ -816,6 +816,7 @@ LIB_OBJS += notes-merge.o LIB_OBJS += notes-utils.o LIB_OBJS += object.o LIB_OBJS += oidset.o +LIB_OBJS += packfile.o LIB_OBJS += pack-bitmap.o LIB_OBJS += pack-bitmap-write.o LIB_OBJS += pack-check.o diff --git a/builtin/index-pack.c b/builtin/index-pack.c index 26828c1d8..f2be145e1 100644 --- a/builtin/index-pack.c +++ b/builtin/index-pack.c @@ -12,6 +12,7 @@ #include "exec_cmd.h" #include "streaming.h" #include "thread-utils.h" +#include "packfile.h" static const char index_pack_usage[] = "git index-pack [-v] [-o ] [--keep | --keep=] [--verify] [--strict] ( | --stdin [--fix-thin] [])"; diff --git a/builtin/pack-redundant.c b/builtin/pack-redundant.c index cb1df1c76..aaa813632 100644 --- a/builtin/pack-redundant.c +++ b/builtin/pack-redundant.c @@ -7,6 +7,7 @@ */ #include "builtin.h" +#include "packfile.h" #define BLKSIZE 512 diff --git a/cache.h b/cache.h index fcba87a69..aa2b4d390 100644 --- a/cache.h +++ b/cache.h @@ -902,20 +902,6 @@ extern void check_repository_format(void); */ extern const char *sha1_file_name(const unsigned char *sha1); -/* - * Return the name of the (local) packfile with the specified sha1 in - * its name. The return value is a pointer to memory that is - * overwritten each time this function is called. - */ -extern char *sha1_pack_name(const unsigned char *sha1); - -/* - * Return the name of the (local) pack index file with the specified - * sha1 in its name. The return value is a pointer to memory that is - * overwritten each time this function is called. - */ -extern char *sha1_pack_index_name(const unsigned char *sha1); - /* * Return an abbreviated sha1 unique within this repository's object database. * The result will be at least `len` characters long, and will be NUL @@ -1656,15 +1642,6 @@ extern void pack_report(void); */ extern int odb_mkstemp(struct strbuf *template, const char *pattern); -/* - * Generate the filename to be used for a pack file with checksum "sha1" and - * extension "ext". The result is written into the strbuf "buf", overwriting - * any existing contents. A pointer to buf->buf is returned as a convenience. - * - * Example: odb_pack_name(out, sha1, "idx") => ".git/objects/pack/pack-1234..idx" - */ -extern char *odb_pack_name(struct strbuf *buf, const unsigned char *sha1, const char *ext); - /* * Create a pack .keep file named "name" (which should generally be the output * of odb_pack_name). Returns a file descriptor opened for writing, or -1 on diff --git a/fast-import.c b/fast-import.c index a959161b4..49516d60e 100644 --- a/fast-import.c +++ b/fast-import.c @@ -167,6 +167,7 @@ Format of STDIN stream: #include "quote.h" #include "dir.h" #include "run-command.h" +#include "packfile.h" #define PACK_ID_BITS 16 #define MAX_PACK_ID ((1< = 0x070a08 diff --git a/outgoing/packfile.h b/outgoing/packfile.h new file mode 100644 index 0..e69de29bb diff --git a/packfile.c b/packfile.c new file mode 100644 index 0..0d191dfd6 --- /dev/null +++ b/packfile.c @@ -0,0 +1,23 @@ +#include "cache.h" + +char *odb_pack_name(struct strbuf *buf, + const unsigned char *sha1, + const char *ext) +{ + strbuf_reset(buf); + strbuf_addf(buf, "%s/pack/pack-%s.%s", get_object_directory(), + sha1_to_hex(sha1), ext); + return buf->buf; +} + +char *sha1_pack_name(const unsigned char *sha1) +{ + static struct strbuf buf = STRBUF_INIT; + return odb_pack_name(, sha1, "pack"); +} +
[PATCH v3 00/23] Move exported packfile funcs to its own file
> You'd need to double check, but I think the topics that cause > trouble are rs/find-apck-entry-bisection and jk/drop-sha1-entry-pos; > you can start from v2.14.1 and merge these topics on top and then > build your change on top. That would allow you to start cooking > before both of them graduate to 'master', as I expect they are both > quick-to-next material. There might be other topics that interfere > with what you are doing, but you can easily find out what they are > if you do a trial merge to 'next' and 'pu' yourself. OK - in addition to the 2 you mentioned, I have found some others (likely added after you wrote that). The complete list is: - rs/find-pack-entry-bisection - jk/drop-sha1-entry-pos - jt/sha1-file-cleanup (formerly part of this set) - mk/use-size-t-in-zlib - rs/unpack-entry-leakfix I have merged all of these and rebased my patches on top. Other changes: - Used packfile.h instead of pack.h (following most people's preference) - Ensured that I added functions to packfile.h retaining the order they were originally in, so that if you run "git diff --color-moved --patience", there are much fewer zebra stripes The merge base commit can be accessed online [1], if you need it. [1] https://github.com/jonathantanmy/git/commits/packmigrate Jonathan Tan (23): pack: move pack name-related functions pack: move static state variables pack: move pack_report() pack: move open_pack_index(), parse_pack_index() pack: move release_pack_memory() pack: move pack-closing functions pack: move use_pack() pack: move unuse_pack() pack: move add_packed_git() pack: move install_packed_git() pack: move {,re}prepare_packed_git and approximate_object_count pack: move unpack_object_header_buffer() pack: move get_size_from_delta() pack: move unpack_object_header() pack: move clear_delta_base_cache(), packed_object_info(), unpack_entry() pack: move nth_packed_object_{sha1,oid} pack: move check_pack_index_ptr(), nth_packed_object_offset() pack: move find_pack_entry_one(), is_pack_valid() pack: move find_sha1_pack() pack: move find_pack_entry() and make it global pack: move has_sha1_pack() pack: move has_pack_index() pack: move for_each_packed_object() Makefile |1 + builtin/am.c |1 + builtin/cat-file.c |1 + builtin/clone.c |1 + builtin/count-objects.c |1 + builtin/fetch.c |1 + builtin/fsck.c |1 + builtin/gc.c |1 + builtin/index-pack.c |1 + builtin/merge.c |1 + builtin/pack-objects.c |1 + builtin/pack-redundant.c |1 + builtin/prune-packed.c |1 + builtin/receive-pack.c |1 + bulk-checkin.c |1 + cache.h | 122 +-- connected.c |1 + diff.c |1 + fast-import.c|1 + fetch-pack.c |1 + git-compat-util.h|2 - http-backend.c |1 + http-push.c |1 + http-walker.c|1 + http.c |1 + outgoing/packfile.h |0 pack-bitmap.c|1 + pack-check.c |1 + packfile.c | 1896 +++ packfile.h | 138 +++ path.c |1 + reachable.c |1 + revision.c |1 + server-info.c|1 + sha1_file.c | 2452 ++ sha1_name.c |1 + streaming.c |1 + 37 files changed, 2354 insertions(+), 2287 deletions(-) create mode 100644 outgoing/packfile.h create mode 100644 packfile.c create mode 100644 packfile.h -- 2.14.1.480.gb18f417b89-goog
Re: Revision resolution for remote-helpers?
On Fri, Aug 18, 2017 at 03:06:37PM -0700, Jonathan Nieder wrote: > Hi, > > Mike Hommey wrote: > > > My thought is that a string like :: could be used > > wherever a committish is expected. That would call some helper > > and request to resolve revision, and the helper would provide a git > > commit as a response. > > I like this idea. > > > The reason for the :: prefix is that it matches the :: > > prefix used for remote helpers. > > > > Now, there are a few caveats: > > - , for e.g. svn, pretty much would depend on the remote. > > OTOH, in mercurial, it doesn't. I think it would be fair for the > > helper to have to deal with making what appears after :: relevant > > to find the right revision, by possibly including a remote name. > > - msys likes to completely fuck up command lines when they contain ::. > > For remote helpers, the alternative that works is > > :///etc. > > Hm --- is there a bug already open about this (e.g. in the Git for > Windows project or in msys) where I can read more? It's entirely an msys problem. Msys has weird rules to translate between unix paths and windows paths on the command line, and botches everything as a consequence. That's by "design". http://www.mingw.org/wiki/Posix_path_conversion (Particularly, see the last two entries) That only happens when calling native Windows programs from a msys shell. > > Which leads me to think some "virtual" ref namespace could be a solution > > to the problem. So instead of ::, the prefix would be /. > > For e.g. svn, svn/$remote/$rev would be a natural way to specify the > > revision for a given remote. For mercurial, hg/$sha1. > > I see. My naive assumption would be that a string like r12345 would be > the most natural way for a user to want to specify a Subversion > revision, but you're right that those only have meaning scoped to a > particular server. That makes the svn/ prefix more tolerable. > > > Potentially, this could be a sort of pluggable ref stores, which could > > be used for extensions such as the currently discussed reftable. > > > > On the opposite end of the problem, I'm also thinking about git log > > --decorate= displaying the mercurial revisions where branch > > decorations would normally go. > > > > I have no patch to show for it. Those are ideas that I first want to > > discuss before implementing anything. > > One additional thought: unlike refs, these are not necessarily > enumerable (and I wouldn't expect "git show-ref" to show them) and > they do not affect "git prune"'s reachability computation. > > So internally I don't think refs is the right concept to map these to. > I think the right layer is revision syntax handling (revision.c). Makes sense. Mike
Re: Revision resolution for remote-helpers?
Hi, Mike Hommey wrote: > My thought is that a string like :: could be used > wherever a committish is expected. That would call some helper > and request to resolve revision, and the helper would provide a git > commit as a response. I like this idea. > The reason for the :: prefix is that it matches the :: > prefix used for remote helpers. > > Now, there are a few caveats: > - , for e.g. svn, pretty much would depend on the remote. > OTOH, in mercurial, it doesn't. I think it would be fair for the > helper to have to deal with making what appears after :: relevant > to find the right revision, by possibly including a remote name. > - msys likes to completely fuck up command lines when they contain ::. > For remote helpers, the alternative that works is > :///etc. Hm --- is there a bug already open about this (e.g. in the Git for Windows project or in msys) where I can read more? > Which leads me to think some "virtual" ref namespace could be a solution > to the problem. So instead of ::, the prefix would be /. > For e.g. svn, svn/$remote/$rev would be a natural way to specify the > revision for a given remote. For mercurial, hg/$sha1. I see. My naive assumption would be that a string like r12345 would be the most natural way for a user to want to specify a Subversion revision, but you're right that those only have meaning scoped to a particular server. That makes the svn/ prefix more tolerable. > Potentially, this could be a sort of pluggable ref stores, which could > be used for extensions such as the currently discussed reftable. > > On the opposite end of the problem, I'm also thinking about git log > --decorate= displaying the mercurial revisions where branch > decorations would normally go. > > I have no patch to show for it. Those are ideas that I first want to > discuss before implementing anything. One additional thought: unlike refs, these are not necessarily enumerable (and I wouldn't expect "git show-ref" to show them) and they do not affect "git prune"'s reachability computation. So internally I don't think refs is the right concept to map these to. I think the right layer is revision syntax handling (revision.c). Thanks, Jonathan
Re: [PATCH] pull: respect submodule update configuration
On Fri, Aug 18, 2017 at 3:04 PM, Stefan Bellerwrote: > From: Lars Schneider eh, that is what I get for amending to Lars patch.
[PATCH] pull: respect submodule update configuration
From: Lars SchneiderDo not override the submodule configuration in the call to update the submodules, but give a weaker default. Reported-by: Lars Schneider Signed-off-by: Stefan Beller --- Personally I dislike this patch, but I have no better idea for the time being. Thanks, Stefan builtin/pull.c | 6 -- git-submodule.sh | 7 ++- t/t7400-submodule-basic.sh | 22 ++ 3 files changed, 32 insertions(+), 3 deletions(-) diff --git a/builtin/pull.c b/builtin/pull.c index 9b86e519b1..be4f74d764 100644 --- a/builtin/pull.c +++ b/builtin/pull.c @@ -553,7 +553,8 @@ static int rebase_submodules(void) cp.git_cmd = 1; cp.no_stdin = 1; argv_array_pushl(, "submodule", "update", - "--recursive", "--rebase", NULL); + "--recursive", "--default-update", + "rebase", NULL); return run_command(); } @@ -565,7 +566,8 @@ static int update_submodules(void) cp.git_cmd = 1; cp.no_stdin = 1; argv_array_pushl(, "submodule", "update", - "--recursive", "--checkout", NULL); + "--recursive", "--default-update", + "checkout", NULL); return run_command(); } diff --git a/git-submodule.sh b/git-submodule.sh index e131760eec..6dbc32e686 100755 --- a/git-submodule.sh +++ b/git-submodule.sh @@ -511,6 +511,7 @@ fetch_in_submodule () ( cmd_update() { # parse $args after "submodule ... update". + default_update="checkout" while test $# -ne 0 do case "$1" in @@ -552,6 +553,10 @@ cmd_update() --checkout) update="checkout" ;; + --default-update) + default_update="$2" + shift + ;; --recommend-shallow) recommend_shallow="--recommend-shallow" ;; @@ -619,7 +624,7 @@ cmd_update() update_module=$(git config submodule."$name".update) if test -z "$update_module" then - update_module="checkout" + update_module="$default_update" fi fi diff --git a/t/t7400-submodule-basic.sh b/t/t7400-submodule-basic.sh index dcac364c5f..ff64bf8528 100755 --- a/t/t7400-submodule-basic.sh +++ b/t/t7400-submodule-basic.sh @@ -1289,4 +1289,26 @@ test_expect_success 'init properly sets the config' ' test_must_fail git -C multisuper_clone config --get submodule.sub1.active ' +test_expect_success 'submodule update and git pull with disabled submodule' ' + test_when_finished "rm -rf multisuper_clone" && + pwd=$(pwd) && + cat <<-\EOF >expect && + -sub0 +sub1 (test2) +sub2 (test2) +sub3 (test2) +sub4 (test2) +sub5 (test2) + EOF + git clone file://"$pwd"/multisuper multisuper_clone && + ( + cd multisuper_clone && + git config --local submodule.sub0.update none && + git submodule update --init --recursive && + git pull --recurse-submodules && + git submodule status | cut -c 1,43- >actual + ) && + test_cmp expect multisuper_clone/actual +' + test_done -- 2.14.0.rc0.3.g6c2e499285
Re: Revision resolution for remote-helpers?
On Fri, Aug 18, 2017 at 08:15:09AM -0400, Jeff King wrote: > On Fri, Aug 18, 2017 at 03:42:08PM +0900, Mike Hommey wrote: > > > I was thinking it could be useful to have a special syntax for revisions > > that would query a helper program. The helper program could use a > > similar protocol to that of the remote helpers. > > That sounds like a reasonable thing to want. > > > My thought is that a string like :: could be used > > wherever a committish is expected. That would call some helper > > and request to resolve revision, and the helper would provide a git > > commit as a response. > > So I'm guessing this would look something like: > > git log svn::12345 > > I think even without Git support, you could do something like: > > git log $(git svn map 12345) That's what I do, but subshells and all is extra cumbersome. > which is similarly complex in terms of concepts, and not too many more > characters. That would be a little more awkward outside of a shell, > though. > > But it did get me wondering if we could do _better_ with something built > into Git. For example, could we have an external resolution helper that > resolves names to object ids as a fallback after internal resolution has > failed. And then you could do: > > git log 12345 > > and it would just work. Efficiency shouldn't be a big problem, because > we'd hit the helper only in the error case. > > I'd be more concerned about awkward ambiguities, though. If mercurial is > also using sha1s, then there's nothing syntactic to differentiate the > two. For that matter, 12345 has the same problem, since it could be a > partial sha1. For something as short, the likelihood of hitting an actual existing abbreviated sha1 is quite high, too. > It might work to actually check if we have the object and then bail > to the remote resolver only if we don't. But that's actually conflating > name resolution with object lookup, which our internals typically keep > separate. > > So maybe this is a bad direction to go in. I'm mostly just thinking out > loud here. > > > Which leads me to think some "virtual" ref namespace could be a solution > > to the problem. So instead of ::, the prefix would be /. > > For e.g. svn, svn/$remote/$rev would be a natural way to specify the > > revision for a given remote. For mercurial, hg/$sha1. > > Interesting. I do like the general idea of having external helpers to > fill in bits of the virtual namespace. But it may also open many cans of > worms. :) > > > Potentially, this could be a sort of pluggable ref stores, which could > > be used for extensions such as the currently discussed reftable. > > The current pluggable code is definitely geared for build-time > pluggability, not runtime. But I think you could have a builtin > pluggable store that does the overlay, and then chains to another > backend. I.e., configure something like: > > [extensions] > refBackend = externalOverlay > > [externalOverlay "svn"] > namespace = refs/svn > command = my-svn-mapper > > [externalOverlay] > chain = reftable > > That would allow the externalOverlay thing to develop independent of the > core of Git's refs code. That's a lot of configuration, but it's definitely an interesting proposition. > > On the opposite end of the problem, I'm also thinking about git log > > --decorate= displaying the mercurial revisions where branch > > decorations would normally go. > > Interesting thought. I'm not sure if that would be a good thing or a bad > thing. But one of the virtual methods for pluggable backends is > "enumerate all refs". If you're mapping every mercurial revision, that's > going to be a lot of refs (and potentially a lot of overhead for certain > operations). > > I think the decorate code just looks at a few parts of the refs > namespace right now (so a "refs/svn" would probably get ignored by > default). I think decorate would need its own special entry to the "ref query" API to answer the question "what ref points to " instead of scanning the whole namespace. Mike
Re: [bug] Git submodule command interprets switch as argument and switch
On Fri, Aug 18, 2017 at 3:38 PM, Jonathan Niederwrote: > Hi, > > R0b0t1 wrote: > >> The issue is as follows: >> >> R0b0t1@host:~/devel/project$ git submodule add >> https://github.com/user/project -f >> Cloning into '/home/R0b0t1/devel/project/-f'... > > Thanks for reporting. Confusingly, I think this is intended behavior. > "git help submodule" explains: > > add [-b ] [-f|--force] [--name ] > [--reference ] [--depth ] [--] > [] > > Add the given repository as a submodule at the given > path [etc] > > Since the -f comes after , it is a . > Not to comment on every response to this bug, but I understand. What is confusing is that the command was failing without being forced, and without thinking I added "-f" at the end. The command succeeded as if I supplied the force flag, but per my experience with other tools, and with other Git commands, "-f" should not be a flag. It should be a path, as you say. However it appears to be both. To reproduce my situation exactly, add "*" to your .gitignore. The directory "gerrit" should then be ignored, and Git will warn you that the submodule will not be tracked (I may have another issue to report related to this, but I'm still trying to figure out what is going on). However, if you name the directory something else, like "-f", it will still be matched by the .gitignore rule and should not succeed. This is why I think the path is also being interpreted as a flag. Something else may be happening, but either way the behavior does not seem to be expected nor consistent with other parts of Git. > That said, there are a few related things wrong here. > > The usage string above says I can put "--" before the to > make things extra unambiguous. But when I try that, I get the following > result: > > $ git submodule add -- https://gerrit.googlesource.com/gerrit -f > Cloning into '/tmp/t/test/-f'... > [...] > Resolving deltas: 100% (215796/215796), done. > /usr/lib/git-core/git-submodule: line 261: cd: -f: invalid option > cd: usage: cd [-L|[-P [-e]] [-@]] [dir] > Unable to checkout submodule '-f' > > If I try to put the "--" between and , I get another > confusing result: > > $ git submodule add https://gerrit.googlesource.com/gerrit -- -f > '--' already exists in the index > > "git help cli" is supposed to give advice about this kind of thing as > well --- e.g., it gives some sound advice about what form of flags > scripts should use (e.g., to always use the 'stuck' form --name= > instead of --name name). But it doesn't mention this issue of flags > belonging before other arguments. > > Thoughts? > I have experienced issues with -- before. It has been long enough I have forgotten. Sorry for not being able to provide anything concrete.
What's cooking in git.git (Aug 2017, #04; Fri, 18)
Here are the topics that have been cooking. Commits prefixed with '-' are only in 'pu' (proposed updates) while commits prefixed with '+' are in 'next'. The ones marked with '.' do not appear in any of the integration branches, but I am still holding onto them. The top part of the draft release notes for the next cycle, which I tentatively called Git 2.15, declares that "git add ''" will still be supported up to this release but will become illegal after that. Given that a typical release cycle is 8-12 weeks, that will happen sometime early next year. You can find the changes described here in the integration branches of the repositories listed at http://git-blame.blogspot.com/p/git-public-repositories.html -- [New Topics] * cc/subprocess-handshake-missing-capabilities (2017-08-16) 1 commit - sub-process: print the cmd when a capability is unsupported When handshake with a subprocess filter notices that the process asked for an unknown capability, Git did not report what program the offending subprocess was running. This has been corrected. Will merge to 'next'. We may want a follow-up fix to tighten the error checking, though. * tb/apply-with-crlf (2017-08-17) 3 commits - SQUASH??? - apply: file commited with CRLF should roundtrip diff and apply - convert: add SAFE_CRLF_KEEP_CRLF (this branch is tangled with jc/apply-with-crlf.) "git apply" that is used as a better "patch -p1" failed to apply a taken from a file with CRLF line endings to a file with CRLF line endings. The root cause was because it misused convert_to_git() that tried to do "safe-crlf" processing by looking at the index entry at the same path, which is a nonsense---in that mode, "apply" is not working on the data in (or derived from) the index at all. This has been fixed. Will merge to 'next' after squashing the fix in. * rs/t1002-do-not-use-sum (2017-08-15) 1 commit - t1002: stop using sum(1) Test simplification. Will merge to 'next'. * sb/sha1-file-cleanup (2017-08-15) 1 commit - sha1_file: make read_info_alternates static Code clean-up. Will merge to 'next'. * as/grep-quiet-no-match-exit-code-fix (2017-08-17) 1 commit - git-grep: correct exit code with --quiet and -L "git grep -L" and "git grep --quiet -L" reported different exit codes; this has been corrected. Will merge to 'next'. * hv/t5526-andand-chain-fix (2017-08-17) 1 commit - t5526: fix some broken && chains Test fix. Will merge to 'next'. * jc/diff-sane-truncate-no-more (2017-08-17) 1 commit - diff: retire sane_truncate_fn Code clean-up. Will merge to 'next'. * ks/branch-set-upstream (2017-08-17) 3 commits - branch: quote branch/ref names to improve readability - builtin/branch: stop supporting the "--set-upstream" option - t3200: cleanup cruft of a test "branch --set-upstream" that has been deprecated in Git 1.8 has finally been retired. Will merge to 'next'. * mg/format-ref-doc-fix (2017-08-18) 2 commits - Documentation/git-for-each-ref: clarify peeling of tags for --format - Documentation: use proper wording for ref format strings Doc fix. Will merge to 'next'. * po/read-graft-line (2017-08-18) 4 commits - commit: rewrite read_graft_line - commit: allocate array using object_id size - commit: replace the raw buffer with strbuf in read_graft_line - sha1_file: fix definition of null_sha1 Conversion from uchar[20] to struct object_id continues; this is to ensure that we do not assume sizeof(struct object_id) is the same as the length of SHA-1 hash (or length of longest hash we support). Will merge to 'next'. * sb/submodule-parallel-update (2017-08-17) 1 commit - submodule.sh: remove unused variable Code clean-up. Will merge to 'next'. * jc/apply-with-crlf (2017-08-16) 6 commits . apply: clarify read_old_data() is about no-index case . apply: localize the CRLF business to read_old_data() . apply: only pay attention to CRLF in the preimage . apply: remove unused field apply_state.flags . apply: file commited with CRLF should roundtrip diff and apply - convert: add SAFE_CRLF_KEEP_CRLF (this branch is tangled with tb/apply-with-crlf.) Will discard as it now is part of the tb/apply-with-crlf topic. -- [Stalled] * mg/status-in-progress-info (2017-05-10) 2 commits - status --short --inprogress: spell it as --in-progress - status: show in-progress info for short status "git status" learns an option to report various operations (e.g. "merging") that the user is in the middle of. cf.* nd/worktree-move (2017-04-20) 6 commits - worktree remove: new command - worktree move: refuse to move worktrees with submodules - worktree move: accept destination as directory - worktree move: new command - worktree.c: add update_worktree_location() - worktree.c: add validate_worktree() "git worktree" learned move and remove
Re: [bug] Git submodule command interprets switch as argument and switch
Hi, R0b0t1 wrote: > The issue is as follows: > > R0b0t1@host:~/devel/project$ git submodule add > https://github.com/user/project -f > Cloning into '/home/R0b0t1/devel/project/-f'... Thanks for reporting. Confusingly, I think this is intended behavior. "git help submodule" explains: add [-b ] [-f|--force] [--name ] [--reference ] [--depth ] [--] [] Add the given repository as a submodule at the given path [etc] Since the -f comes after , it is a . That said, there are a few related things wrong here. The usage string above says I can put "--" before the to make things extra unambiguous. But when I try that, I get the following result: $ git submodule add -- https://gerrit.googlesource.com/gerrit -f Cloning into '/tmp/t/test/-f'... [...] Resolving deltas: 100% (215796/215796), done. /usr/lib/git-core/git-submodule: line 261: cd: -f: invalid option cd: usage: cd [-L|[-P [-e]] [-@]] [dir] Unable to checkout submodule '-f' If I try to put the "--" between and , I get another confusing result: $ git submodule add https://gerrit.googlesource.com/gerrit -- -f '--' already exists in the index "git help cli" is supposed to give advice about this kind of thing as well --- e.g., it gives some sound advice about what form of flags scripts should use (e.g., to always use the 'stuck' form --name= instead of --name name). But it doesn't mention this issue of flags belonging before other arguments. Thoughts? Thanks, Jonathan
Re: [PATCH v4 4/4] commit: rewrite read_graft_line
Patryk Obarawrites: > Actually, I don't think I needed to remove free(graft) line, but I don't > know if freeing NULL is considered ok in git code. Let me know if I > should bring it back, please. Either free(graft) or assert(!graft) is fine, but we should have one of them there. I'll add assert(!graft) there while queuing, at least for now. In the current code, when the control reaches the bad_graft_data label, 'graft' must be NULL, or there is a bug in our code. Because we are parsing exactly the same input using the same helper routines in both passes, we should see failure during the first pass before 'graft' points to an allocated piece of memory. So it may be a good idea to have assert(!graft) there than free(graft); the latter would sweep a potential bug under the carpet. If this were a part of the system whose design is still fluid (it is not), it is not implausible that we would later want to add new test that jumps to the bad_graft_data label. For example, after the "runs exactly twice" loop, we may add a new test that iterates over the graft->parents[] to ensure that there is no duplicate and jumps to bad_graft_data when we find one. If we add assert(!graft) there today, and if such an enhancement forgets to replace it with free(graft), the assert() will catch the mistake. If we have neither, it makes it more likely that such an enhancement leaves a possible memory leak in its error codepath. Thanks.
Re:
Hello, I wish to seek for your assistance in a deal that will be of mutual benefit for the both of us from Camp Stanley in Uijeongbu. Please contact me for details, God bless you.
Re: [PATCH v4 4/4] commit: rewrite read_graft_line
Ok, so that's an option - in this instance free is not actually needed because it can be triggered only in phase 0, but it would add a bit of robustness. -- | ← Ceci n'est pas une pipe Patryk Obara
Re: [bug] Git submodule command interprets switch as argument and switch
On Thu, Aug 17, 2017 at 10:16 PM, R0b0t1wrote: > The issue is as follows: > > R0b0t1@host:~/devel/project$ git submodule add > https://github.com/user/project -f > Cloning into '/home/R0b0t1/devel/project/-f'... > > My .gitignore's first line is *, and then I explicitly allow things. > Despite the presence of "project/" in the .gitignore the submodule > command says it is ignored. That might indicate that another submodule command doesn't cope with submodule names that look like a common flag. > The "force" flag is interpreted as a flag > and also as the destination directory. > > It is possible the argument parsing code for other commands exhibits this > error. Yes, though these other commands are in C, not in shell. Note that Prathamesh is currently porting the "git submodule" command to C, which would allow us to fix this bug easily. Also note that the -f is ambigious, what if the user meant to have the submodule at path "-f" ? (This issue comes up in many other commands, for example when a path and a branch name is accepted, the path of a potentially deleted file. To solve this git accepts a double dash, which signals git that anything after the double dash there are arguments not to be interpreted as a command line flag. > > R0b0t1.
Re: [PATCH v4 4/4] commit: rewrite read_graft_line
Patryk Obarawrites: > Actually, I don't think I needed to remove free(graft) line, but I don't > know if freeing NULL is considered ok in git code. Let me know if I > should bring it back, please. Calling free(var) when var may or may not be NULL is perfectly fine. We even discourage people from writing: if (var) free(var); because an unconditional call to free(var) is sufficient.
Re: Submodule regression in 2.14?
Stefan Bellerwrites: >> In the past "submodule..update=none" was an easy way >> to selectively disable certain Submodules. >> >> How would I do this with Git 2.14? > > submodule..active = false > >> My gut feeling is that all commands should respect the >> "submodule..update=none" setting. > > Well my gut feeling was that the "update" part of the name > reponds to the subcommand, not the generic action. > > For example when you set update=none, git-status, > recursive git-diff still reported the submodule. Both status and diff are read-only operations, so this smells like a bit bogus argument made by comparing apples and oranges. I think Lars is more interested in operations that actually affects the state of submodules by updating them---"submodule update" may be a prime example as it goes down to run fetch, pull and/or checkout. It may have been the only thing that affected the state of submodules before the "--recurse-submodules" option was added to commands that affect the state of the (super)project, but I would think that it is not so wrong to expect that these state-affecting operations running in the "recurse into submodules" mode to honor "do not update this submodule" that used to be honored only by "submodule update".
Re: Submodule regression in 2.14?
On Fri, Aug 18, 2017 at 9:50 AM, Junio C Hamanowrote: > I am not sure if I follow. Submodules are not trees and one of the > reasons people may want to separate things into different modules is > so that they can treat them differently. If submodules allow you > a richer set of operations than a tree that is part of a monolithic > project, is that necessarily a bad thing? It is not a bad thing on its own, but we have to consider which additional actions are useful. Jonathan brought up the following very long term vision: Eventually the everyday git commands do not treat submodules any special than trees, even the submodules git directory may be non existent (everything is absorbed into the superproject); so it really feels like a monorepo. When you want to work on a submodule individually, you have to make a new working tree.
Re: [PATCH] add test for bug in git-mv with nested submodules
> I just copied the shortcut that they were adding themselfes as submodule > in 'setup submodule'. The whole setup of submodules in this test is like > this. This way we already had a nested submodule structure which I could > just add. > > I agree that this is unrealistic so I can change that in the test I am > adding. But from what I have seen, this shortcut is taken in quite some > places when dealing with submodules. Please do not make it worse. Once upon a time (late '16 IIRC) I had a series floating on the list removing all occurrences, but there were issues with the series and it did not land.
Re: [PATCH v4 4/4] commit: rewrite read_graft_line
Actually, I don't think I needed to remove free(graft) line, but I don't know if freeing NULL is considered ok in git code. Let me know if I should bring it back, please. -- | ← Ceci n'est pas une pipe Patryk Obara
[PATCH v4 0/4] Modernize read_graft_line implementation
Changes since v3: - Commit replacing raw buffer does not store temporary pointer to strbuf internals any more. - Commit message of patch 4 explains all alternative approaches considered so far. - Patch 4 uses two-phases to parse graft line, without code repetition. I have my reservations about patch 4 from readability standpoint (it's not immediately clear why parsing code can skip freeing of graft in phase 2), but this implementation seems to address every issue raised in review so far. If you'll prefer me to go back to impementation from v3, I have it prepared ;) Patryk Obara (4): sha1_file: fix definition of null_sha1 commit: replace the raw buffer with strbuf in read_graft_line commit: allocate array using object_id size commit: rewrite read_graft_line builtin/blame.c | 2 +- commit.c| 45 + commit.h| 2 +- sha1_file.c | 2 +- 4 files changed, 28 insertions(+), 23 deletions(-) -- 2.9.5
[PATCH v4 4/4] commit: rewrite read_graft_line
Old implementation determined number of hashes by dividing length of line by length of hash, which works only if all hash representations have same length. New graft line parser works in two phases: 1. In first phase line is scanned to verify correctness and compute number of hashes, then graft struct is allocated. 2. In second phase line is scanned again to fill up already allocated graft struct. This way graft parsing code can support different sizes of hashes without any further code adaptations. A number of alternative implementations were considered and discarded: - Modifying graft structure to store oid_array instead of FLEXI_ARRAY indicates undesirable usage of struct to readers. - Parsing into temporary string_list or oid_array complicates code by adding more return paths, as these structures needs to be cleared before returning from function. - Determining number of hashes by counting separators might cause maintenance issues, if this function needs to be modified in future again. Signed-off-by: Patryk Obara--- commit.c | 35 --- 1 file changed, 20 insertions(+), 15 deletions(-) diff --git a/commit.c b/commit.c index 436eb34..3eefd9d 100644 --- a/commit.c +++ b/commit.c @@ -137,32 +137,37 @@ int register_commit_graft(struct commit_graft *graft, int ignore_dups) struct commit_graft *read_graft_line(struct strbuf *line) { /* The format is just "Commit Parent1 Parent2 ...\n" */ - int i; + int i, phase; + const char *tail = NULL; struct commit_graft *graft = NULL; - const int entry_size = GIT_SHA1_HEXSZ + 1; + struct object_id dummy_oid, *oid; strbuf_rtrim(line); if (!line->len || line->buf[0] == '#') return NULL; - if ((line->len + 1) % entry_size) - goto bad_graft_data; - i = (line->len + 1) / entry_size - 1; - graft = xmalloc(st_add(sizeof(*graft), - st_mult(sizeof(struct object_id), i))); - graft->nr_parent = i; - if (get_oid_hex(line->buf, >oid)) - goto bad_graft_data; - for (i = GIT_SHA1_HEXSZ; i < line->len; i += entry_size) { - if (line->buf[i] != ' ') - goto bad_graft_data; - if (get_sha1_hex(line->buf + i + 1, graft->parent[i/entry_size].hash)) + /* +* phase 0 verifies line, counts hashes in line and allocates graft +* phase 1 fills graft +*/ + for (phase = 0; phase < 2; phase++) { + oid = graft ? >oid : _oid; + if (parse_oid_hex(line->buf, oid, )) goto bad_graft_data; + for (i = 0; *tail != '\0'; i++) { + oid = graft ? >parent[i] : _oid; + if (!isspace(*tail++) || parse_oid_hex(tail, oid, )) + goto bad_graft_data; + } + if (!graft) { + graft = xmalloc(st_add(sizeof(*graft), + st_mult(sizeof(struct object_id), i))); + graft->nr_parent = i; + } } return graft; bad_graft_data: error("bad graft data: %s", line->buf); - free(graft); return NULL; } -- 2.9.5
[PATCH v4 1/4] sha1_file: fix definition of null_sha1
The array is declared in cache.h as: extern const unsigned char null_sha1[GIT_MAX_RAWSZ]; Definition in sha1_file.c must match. Signed-off-by: Patryk Obara--- sha1_file.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/sha1_file.c b/sha1_file.c index b60ae15..f5b5bec 100644 --- a/sha1_file.c +++ b/sha1_file.c @@ -32,7 +32,7 @@ #define SZ_FMT PRIuMAX static inline uintmax_t sz_fmt(size_t s) { return s; } -const unsigned char null_sha1[20]; +const unsigned char null_sha1[GIT_MAX_RAWSZ]; const struct object_id null_oid; const struct object_id empty_tree_oid = { EMPTY_TREE_SHA1_BIN_LITERAL -- 2.9.5
[PATCH v4 3/4] commit: allocate array using object_id size
struct commit_graft aggregates an array of object_id's, which have size >= GIT_MAX_RAWSZ bytes. This change prevents memory allocation error when size of object_id is larger than GIT_SHA1_RAWSZ. Signed-off-by: Patryk Obara--- commit.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/commit.c b/commit.c index 1a0a9f2..436eb34 100644 --- a/commit.c +++ b/commit.c @@ -147,7 +147,8 @@ struct commit_graft *read_graft_line(struct strbuf *line) if ((line->len + 1) % entry_size) goto bad_graft_data; i = (line->len + 1) / entry_size - 1; - graft = xmalloc(st_add(sizeof(*graft), st_mult(GIT_SHA1_RAWSZ, i))); + graft = xmalloc(st_add(sizeof(*graft), + st_mult(sizeof(struct object_id), i))); graft->nr_parent = i; if (get_oid_hex(line->buf, >oid)) goto bad_graft_data; -- 2.9.5
[PATCH v4 2/4] commit: replace the raw buffer with strbuf in read_graft_line
This simplifies function declaration and allows for use of strbuf_rtrim instead of modifying buffer directly. Signed-off-by: Patryk Obara--- builtin/blame.c | 2 +- commit.c| 23 +++ commit.h| 2 +- 3 files changed, 13 insertions(+), 14 deletions(-) diff --git a/builtin/blame.c b/builtin/blame.c index bda1a78..d4472e9 100644 --- a/builtin/blame.c +++ b/builtin/blame.c @@ -488,7 +488,7 @@ static int read_ancestry(const char *graft_file) return -1; while (!strbuf_getwholeline(, fp, '\n')) { /* The format is just "Commit Parent1 Parent2 ...\n" */ - struct commit_graft *graft = read_graft_line(buf.buf, buf.len); + struct commit_graft *graft = read_graft_line(); if (graft) register_commit_graft(graft, 0); } diff --git a/commit.c b/commit.c index 8b28415..1a0a9f2 100644 --- a/commit.c +++ b/commit.c @@ -134,34 +134,33 @@ int register_commit_graft(struct commit_graft *graft, int ignore_dups) return 0; } -struct commit_graft *read_graft_line(char *buf, int len) +struct commit_graft *read_graft_line(struct strbuf *line) { /* The format is just "Commit Parent1 Parent2 ...\n" */ int i; struct commit_graft *graft = NULL; const int entry_size = GIT_SHA1_HEXSZ + 1; - while (len && isspace(buf[len-1])) - buf[--len] = '\0'; - if (buf[0] == '#' || buf[0] == '\0') + strbuf_rtrim(line); + if (!line->len || line->buf[0] == '#') return NULL; - if ((len + 1) % entry_size) + if ((line->len + 1) % entry_size) goto bad_graft_data; - i = (len + 1) / entry_size - 1; + i = (line->len + 1) / entry_size - 1; graft = xmalloc(st_add(sizeof(*graft), st_mult(GIT_SHA1_RAWSZ, i))); graft->nr_parent = i; - if (get_oid_hex(buf, >oid)) + if (get_oid_hex(line->buf, >oid)) goto bad_graft_data; - for (i = GIT_SHA1_HEXSZ; i < len; i += entry_size) { - if (buf[i] != ' ') + for (i = GIT_SHA1_HEXSZ; i < line->len; i += entry_size) { + if (line->buf[i] != ' ') goto bad_graft_data; - if (get_sha1_hex(buf + i + 1, graft->parent[i/entry_size].hash)) + if (get_sha1_hex(line->buf + i + 1, graft->parent[i/entry_size].hash)) goto bad_graft_data; } return graft; bad_graft_data: - error("bad graft data: %s", buf); + error("bad graft data: %s", line->buf); free(graft); return NULL; } @@ -174,7 +173,7 @@ static int read_graft_file(const char *graft_file) return -1; while (!strbuf_getwholeline(, fp, '\n')) { /* The format is just "Commit Parent1 Parent2 ...\n" */ - struct commit_graft *graft = read_graft_line(buf.buf, buf.len); + struct commit_graft *graft = read_graft_line(); if (!graft) continue; if (register_commit_graft(graft, 1)) diff --git a/commit.h b/commit.h index 6d857f0..baecc0a 100644 --- a/commit.h +++ b/commit.h @@ -247,7 +247,7 @@ struct commit_graft { }; typedef int (*each_commit_graft_fn)(const struct commit_graft *, void *); -struct commit_graft *read_graft_line(char *buf, int len); +struct commit_graft *read_graft_line(struct strbuf *line); int register_commit_graft(struct commit_graft *, int); struct commit_graft *lookup_commit_graft(const struct object_id *oid); -- 2.9.5
Re: [PATCH v3 4/4] commit: rewrite read_graft_line
Patryk Obarawrites: > Ah! I presumed two separate loops, one throwing away oids and second > one actually filling a table - this makes more sense. I was just about > to send v4, but will rewrite the last patch and we'll see how it looks > like. Yeah, it is understandable if you missed my "a loop that runs exactly twice", as that pattern, while we do use it in a few places in our codebase, is of limited applicability in general---the cost of discarded computation in the first pass need to be low enough for the improved maintainability to make sense.
Re: [PATCH] add test for bug in git-mv with nested submodules
On Thu, Aug 17, 2017 at 12:05:56PM -0700, Stefan Beller wrote: > On Thu, Aug 17, 2017 at 3:34 AM, Heiko Voigtwrote: > > When using git-mv with a submodule it will detect that and update the > > paths for its configurations (.gitmodules, worktree and gitfile). This > > does not work for nested submodules where a user renames the root > > submodule. > > > > We discovered this fact when working on on-demand fetch for renamed > > submodules. Lets add a test to document. > > > > Signed-off-by: Heiko Voigt > > --- > > t/t7001-mv.sh | 9 + > > 1 file changed, 9 insertions(+) > > > > diff --git a/t/t7001-mv.sh b/t/t7001-mv.sh > > index e365d1f..39f8aed 100755 > > --- a/t/t7001-mv.sh > > +++ b/t/t7001-mv.sh > > @@ -491,4 +491,13 @@ test_expect_success 'moving a submodule in nested > > directories' ' > > test_cmp actual expect > > ' > > > > +test_expect_failure 'moving nested submodules' ' > > + git commit -am "cleanup commit" && > > + git submodule add ./. sub_nested && > > If possible, I would avoid adding the repo itself > as a submodule as it is unrealistic in the wild. > > While it may be ok for the test here, later down the road > other tests making use of it it may become an issue with > the URL of the submodule. I just copied the shortcut that they were adding themselfes as submodule in 'setup submodule'. The whole setup of submodules in this test is like this. This way we already had a nested submodule structure which I could just add. I agree that this is unrealistic so I can change that in the test I am adding. But from what I have seen, this shortcut is taken in quite some places when dealing with submodules. Cheers Heiko
Re: Submodule regression in 2.14?
> In the past "submodule..update=none" was an easy way > to selectively disable certain Submodules. > > How would I do this with Git 2.14? submodule..active = false > My gut feeling is that all commands should respect the > "submodule..update=none" setting. Well my gut feeling was that the "update" part of the name reponds to the subcommand, not the generic action. For example when you set update=none, git-status, recursive git-diff still reported the submodule. > > - Lars
Re: [PATCH v3 4/4] commit: rewrite read_graft_line
Ah! I presumed two separate loops, one throwing away oids and second one actually filling a table - this makes more sense. I was just about to send v4, but will rewrite the last patch and we'll see how it looks like. -- | ← Ceci n'est pas une pipe Patryk Obara
Re: [PATCH v4 2/3] builtin/branch: stop supporting the use of --set-upstream option
On 18 August 2017 at 18:30, Junio C Hamanowrote: > Kaartic Sivaraam writes: > >> On Friday 18 August 2017 01:25 AM, Junio C Hamano wrote: >>> Martin Ågren writes: >>> On 17 August 2017 at 04:54, Kaartic Sivaraam wrote: > Helped-by: Martin Ågren , Junio C Hamano > > Signed-off-by: Kaartic Sivaraam I didn't expect a "Helped-by", all I did was to give some random comments. :-) I'm not so sure about the comma-separation, that seems to be a first in the project. >>> I didn't either ;-) >>> >>> The line looks odd so I'll remove it while queuing. >>> >>> Thanks for noticing. >> I should have been better with my wordings :) How about converting that >> line into two 'Suggestions-by:' or 'Reviewed-by:' ? > > I personally do not think either is needed for those small things we > saw in the discussion. > > Unless Martin feels strongly about it, that is. No, no strong feelings. Thanks.
Re: [PATCH 2/2] Documentation/git-for-each-ref: clarify peeling of tags for --format
Michael J Gruberwrites: > `*` in format strings means peeling of tag objects so that object field > names refer to the object that the tag object points at, instead of the > tag object itself. > > Currently, this is documented using grammar that is clearly inspired by > classical latin, though missing more than an article in order to be > classical english. ;-) Thanks, both patches look good to me. > > Try and straighten that explanation out a bit. > > Signed-off-by: Michael J Gruber > --- > Documentation/git-for-each-ref.txt | 5 +++-- > 1 file changed, 3 insertions(+), 2 deletions(-) > > diff --git a/Documentation/git-for-each-ref.txt > b/Documentation/git-for-each-ref.txt > index dac9138fab..bb370c9c7b 100644 > --- a/Documentation/git-for-each-ref.txt > +++ b/Documentation/git-for-each-ref.txt > @@ -41,8 +41,9 @@ OPTIONS > A string that interpolates `%(fieldname)` from a ref being shown > and the object it points at. If `fieldname` > is prefixed with an asterisk (`*`) and the ref points > - at a tag object, the value for the field in the object > - tag refers is used. When unspecified, defaults to > + at a tag object, use the value for the field in the object > + which the tag object refers to (instead of the field in the tag object). > + When unspecified, `` defaults to > `%(objectname) SPC %(objecttype) TAB %(refname)`. > It also interpolates `%%` to `%`, and `%xx` where `xx` > are hex digits interpolates to character with hex code
Re: Submodule regression in 2.14?
Stefan Bellerwrites: > On Thu, Aug 17, 2017 at 7:13 PM, Junio C Hamano wrote: >> Stefan Beller writes: >> >>> Are you saying this might be a design mistake and >>> the .update ought to be respected by all the other >>> commands? For example >>> git reset --recurse-submodules >>> should ignore the .update= none? >> >> I have been under the impression that that has been the traditional >> desire of what .update ought to mean. I personally do not have a >> strong opinion---at least not yet. > > In this context note v2.14.0-rc1-34-g7463e2ec3 > (bw/submodule-config-cleanup~7, "unpack-trees: > don't respect submodule.update") that is going opposite of > your impression. Exactly. We are in agreement that recent developments seem to go against the traditional desire and it is understandable Lars sees this as a regression. I still do not have a strong opinion either way, if this is a regression or a progress. > Maybe, I'll think about it. However there is no such > equivalent for trees (and AFAICT never came up) to > treat a specific directory other than the rest in worktree > operations. I am not sure if I follow. Submodules are not trees and one of the reasons people may want to separate things into different modules is so that they can treat them differently. If submodules allow you a richer set of operations than a tree that is part of a monolithic project, is that necessarily a bad thing?
Re: [PATCH v3 4/4] commit: rewrite read_graft_line
Patryk Obarawrites: > Junio C Hamano wrote: >> >> If I were doing the two-pass thing, I'd probably write a for loop >> that runs exactly twice, where the first iteration parses into a >> single throw-away oid struct only to count, and the second iteration >> parses the same input into the allocated array of oid struct. That >> way, you do not have to worry about two phrases going out of sync. > > Two passes would still differ in error handling due to xmalloc between them… I am not sure if I follow. What I meant was something along these lines: struct commit_graft *graft = NULL; char *line = ... what you read from the file ...; int phase; /* phase #0 counts, phase #1 fills */ for (phase = 0; phase < 2; phase++) { int count; char *scan; strucut object_id dummy_oid, *oid; for (scan = line, count = 0; *scan; count++) { oid = graft ? >parent[count] : _oid; if (parse_oid_hex(scan, oid, )) return error(...); switch (*scan) { case ' ': scan++; continue; /* there are more */ case '\0': break; /* we are done */ default: return error(...); } } if (!graft) graft = xmalloc(... with 'count' parentes ...); } /* now we have graft with parent[count] all filled */ return graft; The inner for() loop will do the same parsing for both passes, leaving little chance for programming errors to make the two passes decide there are different number of fake parents. I suspect I may have botched some details in that loop, but both passes will even share the same buggy counting when the code is structured that way ;-) That is what I meant by "not have to worry about two phases going out of sync".
Re: [RFC PATCH 1/2] implement fetching of moved submodules
On Thu, Aug 17, 2017 at 10:20:13AM -0700, Stefan Beller wrote: > On Thu, Aug 17, 2017 at 3:53 AM, Heiko Voigtwrote: > > We store the changed submodules paths to calculate which submodule needs > > fetching. This does not work for moved submodules since their paths do > > not stay the same in case of a moved submodules. In case of new > > submodules we do not have a path in the current checkout, since they > > just appeared in this fetch. > > > > It is more general to collect the submodule names for changes instead of > > their paths to include the above cases. > > > > With the change described above we implement 'on-demand' fetching of > > changes in moved submodules. > > This sounds as if this would also enable fetching new submodules > eventually? Yes that was the goal when starting with these changes back then. But it took more time than I had back then. So instead of letting these changes sit bitrot again lets see if we can get them integrated. For new submodules we need to change the iteration somehow. Currently we are iterating through the index. But new submodules obviously do not have an index entry (otherwise they would not be new). So instead of the index we will need to create another list that contains "all" submodules. Maybe something like: all submodules from the index plus all submodules that changed / are new? We could also go further and inspect all submodules from all ref tips to handle submodules on other branches configured to 'yes'. But I think we should leave that for later if need arises. Some merge of index and additional submodules is needed, because for --recurse-submodules=yes or submodule..fetchRecurseSubmodules=yes we always need to run fetch inside the submodule. That would break if we only looked at submodules that are collected as changed. > > Note: This does only work when repositories have a .gitmodules file. In > > other words: It breaks if we do not get a name for a repository. > > IIRC, consensus was that this is a requirement to get nice submodule > > handling these days? > > I think that should have been the consensus since ~1.7.8 (since the > submodules git dir can live inside the superprojects > /module/). I agree but since we started without it, we kind of have a mixed state. > A gitlink entry without corresponding .gitmodules entry is just a gitlink. > If we happen to have a repository at that path of the gitlink, we can > be nice and pretend like it is a functional submodule, but it really is > not. It's just another repo inside the superproject that happen to live > at the path of a gitlink. Yeah but at the moment we are handling 'on-demand' fetches and stuff for such just gitlink submodules. If we were firm on that requirement we would just skip those but that is not the case with the current implementation. > > Signed-off-by: Heiko Voigt > > --- > > > > I updated the leftover code from my series implementing recursive fetch > > for moved submodules[1] to the current master. > > > > This breaks t5531 and t5545 because they do not use a .gitmodules file. > > > > I also have some code leftover that does fallback on paths in case no > > submodule names can be found. But I do not really like it. The question > > here is how far do we support not using .gitmodules. Is it e.g. > > reasonable to say: "For --recurse-submodules=on-demand you need a > > .gitmodules file?" > > I would not intentionally break users here, but any new functionality can > safely assume (a) we have a proper .gitmodules entry or (b) it is not a > submodule, so do nothing/be extra careful. > > For example in recursive diff sort of makes sense to also handle > non-submodule gitlinks, but fetch is harder to tell. Well we have a few different cases for gitlinks without .gitmodule entry: 1. New gitlink: We can not handle since we do not know where to clone from. 2. Removed gitlink: No need to do anything in fetch 3. Changed (but same name) gitlink: We can / and currently do run fetch in it 4. Renamed: We currently skip those. We could probably do something to track the rename and run fetch in case of gitlink changes. In my current approach only the ones with a name are handled. So I guess I will add a fallback to paths for 3. so we do not unnecessarily break users using the current implementation. > (just last night I was rereading > https://public-inbox.org/git/CAJo=hjvnapnaddcaawavu9c4rveqdos3ev9wtguhx4fd0v_...@mail.gmail.com/ > which I think is a super cute application of gitlinks. If you happen > to checkout such > a tree, you don't want to fetch all of the fake submodules) > > > > > [1] > > https://public-inbox.org/git/f5baa2acc09531a16f4f693eebbe60706bb8ed1e.1361751905.git.hvo...@hvoigt.net/ > > Oha, that is from way back in the time. :) Yeah this code did go through some proper bitrotting :) > > submodule.c | 92 >
Re: [PATCH v4 2/3] builtin/branch: stop supporting the use of --set-upstream option
Kaartic Sivaraamwrites: > On Monday 14 August 2017 02:20 PM, Kaartic Sivaraam wrote: >> >> On Wednesday 09 August 2017 12:03 AM, Martin Ågren wrote: >>> >>> Maybe the final note could be removed? Someone who is looking up >>> --set-upstream because Git just "crashed" on them will only want to know >>> what they should do instead. Our thoughts about the future are perhaps >>> not that interesting. >> >> I thought it's better to document it to avoid people from getting > surprised >> when the options *starts working* again. > > I hope that explains the reason. That is something we can say we _actually_ repurpose the option. Until then, it is merely noise that distracts the users.
Re: [PATCH v4 2/3] builtin/branch: stop supporting the use of --set-upstream option
Kaartic Sivaraamwrites: > On Friday 18 August 2017 01:25 AM, Junio C Hamano wrote: >> Martin Ågren writes: >> >>> On 17 August 2017 at 04:54, Kaartic Sivaraam >>> wrote: Helped-by: Martin Ågren , Junio C Hamano Signed-off-by: Kaartic Sivaraam >>> I didn't expect a "Helped-by", all I did was to give some random >>> comments. :-) I'm not so sure about the comma-separation, that seems to >>> be a first in the project. >> I didn't either ;-) >> >> The line looks odd so I'll remove it while queuing. >> >> Thanks for noticing. > I should have been better with my wordings :) How about converting that > line into two 'Suggestions-by:' or 'Reviewed-by:' ? I personally do not think either is needed for those small things we saw in the discussion. Unless Martin feels strongly about it, that is. Thanks.
Re: [RFC PATCH 2/2] submodule: simplify decision tree whether to or not to fetch
On Thu, Aug 17, 2017 at 10:50:07AM -0700, Brandon Williams wrote: > On 08/17, Stefan Beller wrote: > > On Thu, Aug 17, 2017 at 4:00 AM, Heiko Voigtwrote: > > > To make extending this logic later easier. > > > > > > Signed-off-by: Heiko Voigt > > > --- > > > I am quite sure I replicated the same logic but a few more eyes would be > > > appreciated. > > > > A code cleanup is appreciated! > > > > I thought Brandon had a series in flight doing a very similar cleanup here, > > but in master..pu there is nothing to be found. > > Yeah there are 2 series in flight which will probably conflict here. > bw/grep-recurse-submodules and bw/submodule-config-cleanup Ok then I will wait until those are in and then see if I can base the cleanup on top. I think it is only necessary as a preparation for the fully fledged fetch configuration logic mess we will get into once we get to the full recursive submodule fetch implementation. Not necessarily needed for the moved submodules. > > > > The code looks good to me. Thanks. Cheers Heiko
[PATCH 2/2] Documentation/git-for-each-ref: clarify peeling of tags for --format
`*` in format strings means peeling of tag objects so that object field names refer to the object that the tag object points at, instead of the tag object itself. Currently, this is documented using grammar that is clearly inspired by classical latin, though missing more than an article in order to be classical english. Try and straighten that explanation out a bit. Signed-off-by: Michael J Gruber--- Documentation/git-for-each-ref.txt | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/Documentation/git-for-each-ref.txt b/Documentation/git-for-each-ref.txt index dac9138fab..bb370c9c7b 100644 --- a/Documentation/git-for-each-ref.txt +++ b/Documentation/git-for-each-ref.txt @@ -41,8 +41,9 @@ OPTIONS A string that interpolates `%(fieldname)` from a ref being shown and the object it points at. If `fieldname` is prefixed with an asterisk (`*`) and the ref points - at a tag object, the value for the field in the object - tag refers is used. When unspecified, defaults to + at a tag object, use the value for the field in the object + which the tag object refers to (instead of the field in the tag object). + When unspecified, `` defaults to `%(objectname) SPC %(objecttype) TAB %(refname)`. It also interpolates `%%` to `%`, and `%xx` where `xx` are hex digits interpolates to character with hex code -- 2.14.1.224.g15ee91971c
[PATCH 1/2] Documentation: use proper wording for ref format strings
Various commands list refs and allow to use a format string for the output that interpolates from the ref as well as the object it points at (for-each-ref; branch and tag in list mode). Currently, the documentation talks about interpolating from the object. This is confusing because a ref points to an object but not vice versa, so the object cannot possible know %(refname), for example. Thus, this is wrong independent of refs being objects (one day, maybe) or not. Change the wording to make this clearer (and distinguish it from formats for the log family). Signed-off-by: Michael J Gruber--- Documentation/git-branch.txt | 4 ++-- Documentation/git-for-each-ref.txt | 4 ++-- Documentation/git-tag.txt | 4 ++-- 3 files changed, 6 insertions(+), 6 deletions(-) diff --git a/Documentation/git-branch.txt b/Documentation/git-branch.txt index 81bd0a7b77..d0b3358771 100644 --- a/Documentation/git-branch.txt +++ b/Documentation/git-branch.txt @@ -267,8 +267,8 @@ start-point is either a local or remote-tracking branch. Only list branches of the given object. --format :: - A string that interpolates `%(fieldname)` from the object - pointed at by a ref being shown. The format is the same as + A string that interpolates `%(fieldname)` from a branch ref being shown + and the object it points at. The format is the same as that of linkgit:git-for-each-ref[1]. Examples diff --git a/Documentation/git-for-each-ref.txt b/Documentation/git-for-each-ref.txt index cc42c12832..dac9138fab 100644 --- a/Documentation/git-for-each-ref.txt +++ b/Documentation/git-for-each-ref.txt @@ -38,8 +38,8 @@ OPTIONS key. :: - A string that interpolates `%(fieldname)` from the - object pointed at by a ref being shown. If `fieldname` + A string that interpolates `%(fieldname)` from a ref being shown + and the object it points at. If `fieldname` is prefixed with an asterisk (`*`) and the ref points at a tag object, the value for the field in the object tag refers is used. When unspecified, defaults to diff --git a/Documentation/git-tag.txt b/Documentation/git-tag.txt index d97aad3439..543fb425ee 100644 --- a/Documentation/git-tag.txt +++ b/Documentation/git-tag.txt @@ -188,8 +188,8 @@ This option is only applicable when listing tags without annotation lines. Defaults to HEAD. :: - A string that interpolates `%(fieldname)` from the object - pointed at by a ref being shown. The format is the same as + A string that interpolates `%(fieldname)` from a tag ref being shown + and the object it points at. The format is the same as that of linkgit:git-for-each-ref[1]. When unspecified, defaults to `%(refname:strip=2)`. -- 2.14.1.224.g15ee91971c
Re: [RFC PATCH] Updated "imported object" design
On 8/17/2017 5:39 PM, Jonathan Tan wrote: Thanks for your comments. I'll reply to both your e-mails in this one e-mail. This illustrates another place we need to resolve the naming/vocabulary. We should at least be consistent to make it easier to discuss/explain. We obviously went with "virtual" when building GVFS but I'm OK with "lazy" as long as we're consistent. Some examples of how the naming can clarify or confuse: 'Promise-enable your repo by setting the "extensions.lazyObject" flag' 'Enable your repo to lazily fetch objects by setting the "extensions.lazyObject"' 'Virtualize your repo by setting the "extensions.virtualize" flag' We may want to carry the same name into the filename we use to mark the (virtualized/lazy/promised/imported) objects. (This reminds me that there are only 2 hard problems in computer science...) ;) Good point about the name. Maybe the 2nd one is the best? (Mainly because I would expect a "virtualized" repo to have virtual refs too.) But if there was a good way to refer to the "anti-projection" in a virtualized system (that is, the "real" thing or "object" behind the "virtual" thing or "image"), then maybe the "virtualized" language is the best. (And I would gladly change - I'm having a hard time coming up with a name for the "anti-projection" in the "lazy" language.) The most common "anti-virtual" language I'm familiar with is "physical." Virtual machine <-> physical machine. Virtual world <-> physical world. Virtual repo, commit, tree, blob - physical repo, commit, tree, blob. I'm not thrilled but I think it works... Also, I should probably standardize on "lazily fetch" instead of "lazily load". I didn't want to overlap with the existing fetching, but after some thought, it's probably better to do that. The explanation would thus be that you can either use the built-in Git fetcher (to be built, although I have an old version here [1]) or supply a custom fetcher. [1] https://github.com/jonathantanmy/git/commits/partialclone I think this all works and would meet the requirements we've been discussing. The big trade off here vs what we first discussed with promises is that we are generating the list of promises on the fly when they are needed rather than downloading and maintaining a list locally. My biggest concern with this model is the cost of opening and parsing every imported object (loose and pack for local and alternates) to build the oidset of promises. In fsck this probably won't be an issue as it already focuses on correctness at the expense of speed. I'm more worried about when we add the same/similar logic into check_connected. That impacts fetch, clone, and receive_pack. I guess the only way we can know for sure it to do a perf test and measure the impact. As for fetching from the main repo, the connectivity check does not need to be performed at all because all objects are "imported", so the performance of the connectivity check does not matter. Same for cloning. Very good point! I got stuck on connectivity check in general forgetting that we really only need to prevent sharing a corrupt repo. This is not true if you're fetching from another repo This isn't a case we've explicitly dealt with (multiple remotes into a virtualized repo). Our behavior today would be that once you set the "virtual repo" flag on the repo (this happens at clone for us), all remotes are treated as virtual as well (ie we don't differentiate behavior based on which remote was used). Our "custom fetcher" always uses "origin" and some custom settings for a cache-server saved in the .git/config file when asked to fetch missing objects. This is probably a good model to stick with at least initially as trying to solve multiple possible "virtual" remotes as well as mingling virtualized and non-virtualized remotes and all the mixed cases that can come up makes my head hurt. We should probably address that in a different thread. :) or if you're using receive-pack, but (1) I think these are not used as much in such a situation, and (2) if you do use them, the slowness only "kicks in" if you do not have the objects referred to (whether non-"imported" or "imported") and thus have to check the references in all "imported" objects. Is there any case where receive-pack is used on the client side? I'm only aware of it being used on the server side to receive packs pushed from the client. If it is not used in a virtualized client, then we would not need to do anything different for receive-pack. I think this topic should continue to move forward so that we can provide reasonable connectivity tests for fsck and check_connected in the face of partial clones. I'm not sure the prototype implementation of reading/parsing all imported objects to build the promised oidset is the most performant model but we can continue to investigate the best options. Agreed - I think the most important thing here is settling on the API (name of
Re: Submodule regression in 2.14?
> On 17 Aug 2017, at 23:55, Stefan Bellerwrote: > > On Thu, Aug 17, 2017 at 2:21 PM, Lars Schneider > wrote: >> >>> Oh, wait. >>> $ git log --oneline v2.13.0..v2.14.1 -- builtin/pull.c >>> c9c63ee558 Merge branch 'sb/pull-rebase-submodule' >>> a6d7eb2c7a pull: optionally rebase submodules (remote submodule changes >>> only) >>> could also be a culprit. Do you have pull.rebase set? >> >> I bisected the problem today and "a6d7eb2c7a pull: optionally rebase >> submodules >> (remote submodule changes only)" is indeed the culprit. >> >> The commit seems to break the following test case. >> >> diff --git a/t/t7400-submodule-basic.sh b/t/t7400-submodule-basic.sh >> index dcac364c5f..24f9729015 100755 >> --- a/t/t7400-submodule-basic.sh >> +++ b/t/t7400-submodule-basic.sh >> @@ -1289,4 +1289,19 @@ test_expect_success 'init properly sets the config' ' >>test_must_fail git -C multisuper_clone config --get >> submodule.sub1.active >> ' >> >> +test_expect_success 'submodule update and git pull with disabled submodule' >> ' >> + test_when_finished "rm -rf multisuper_clone" && >> + pwd=$(pwd) && >> + git clone file://"$pwd"/multisuper multisuper_clone && >> + ( >> + cd multisuper_clone && >> + git config --local submodule.sub0.update none && >> + git submodule update --init --recursive && >> + git pull --recurse-submodules && >> + git submodule status | cut -c 1,43- >actual >> + ) && >> + ls && >> + test_cmp expect multisuper_clone/actual >> +' > > Thanks for providing this test. > > cd trash directory.t7400-submodule-basic/multisuper_clone > cat .git/config > [submodule "sub0"] > update = none > active = true > url = file:///.../t/trash directory.t7400-submodule-basic/sub1 > > > submodule..update >The default update procedure for a submodule. >This variable is populated by git submodule init >from the gitmodules(5) file. See description of >update command in git-submodule(1). > > The first sentence of .update is misleading IMHO as the > these settings should strictly apply to the "submodule update" > command. So "pull --recurse-submodules" ought to ignore it, > instead the pull can do whatever it wants, namely treat the > submodule roughly like a tree and either merge/rebase > inside the submodule as well. The user *asked* for recursive > pull after all. > > Are you saying this might be a design mistake and > the .update ought to be respected by all the other > commands? For example >git reset --recurse-submodules > should ignore the .update= none? > > When designing these new recursive submodule functionality > outside the "submodule" command, I'd want submodules > to behave as much as possible like trees. In the past "submodule..update=none" was an easy way to selectively disable certain Submodules. How would I do this with Git 2.14? My gut feeling is that all commands should respect the "submodule..update=none" setting. - Lars
Re: Revision resolution for remote-helpers?
On Fri, Aug 18, 2017 at 03:42:08PM +0900, Mike Hommey wrote: > I was thinking it could be useful to have a special syntax for revisions > that would query a helper program. The helper program could use a > similar protocol to that of the remote helpers. That sounds like a reasonable thing to want. > My thought is that a string like :: could be used > wherever a committish is expected. That would call some helper > and request to resolve revision, and the helper would provide a git > commit as a response. So I'm guessing this would look something like: git log svn::12345 I think even without Git support, you could do something like: git log $(git svn map 12345) which is similarly complex in terms of concepts, and not too many more characters. That would be a little more awkward outside of a shell, though. But it did get me wondering if we could do _better_ with something built into Git. For example, could we have an external resolution helper that resolves names to object ids as a fallback after internal resolution has failed. And then you could do: git log 12345 and it would just work. Efficiency shouldn't be a big problem, because we'd hit the helper only in the error case. I'd be more concerned about awkward ambiguities, though. If mercurial is also using sha1s, then there's nothing syntactic to differentiate the two. For that matter, 12345 has the same problem, since it could be a partial sha1. It might work to actually check if we have the object and then bail to the remote resolver only if we don't. But that's actually conflating name resolution with object lookup, which our internals typically keep separate. So maybe this is a bad direction to go in. I'm mostly just thinking out loud here. > Which leads me to think some "virtual" ref namespace could be a solution > to the problem. So instead of ::, the prefix would be /. > For e.g. svn, svn/$remote/$rev would be a natural way to specify the > revision for a given remote. For mercurial, hg/$sha1. Interesting. I do like the general idea of having external helpers to fill in bits of the virtual namespace. But it may also open many cans of worms. :) > Potentially, this could be a sort of pluggable ref stores, which could > be used for extensions such as the currently discussed reftable. The current pluggable code is definitely geared for build-time pluggability, not runtime. But I think you could have a builtin pluggable store that does the overlay, and then chains to another backend. I.e., configure something like: [extensions] refBackend = externalOverlay [externalOverlay "svn"] namespace = refs/svn command = my-svn-mapper [externalOverlay] chain = reftable That would allow the externalOverlay thing to develop independent of the core of Git's refs code. > On the opposite end of the problem, I'm also thinking about git log > --decorate= displaying the mercurial revisions where branch > decorations would normally go. Interesting thought. I'm not sure if that would be a good thing or a bad thing. But one of the virtual methods for pluggable backends is "enumerate all refs". If you're mapping every mercurial revision, that's going to be a lot of refs (and potentially a lot of overhead for certain operations). I think the decorate code just looks at a few parts of the refs namespace right now (so a "refs/svn" would probably get ignored by default). -Peff
Re: Weirdness with git change detection
On Fri, Aug 18, 2017 at 08:56:03AM +0200, Michael J Gruber wrote: > > The idea being that users could run "git lint" if they suspect something > > funny is going on. I dunno. It may be a dead-end. Most such > > oddities are better detected and handled during actual git operations if > > we can. So this would really just be for things that are too expensive > > to detect in normal operations. > > Typically, that problem arises when you turn a filter on or off at some > point in your history. Since "attributes" can come from various sources, > especially the versioned ".gitattributes" file, unversioned per-repo > .git/info/attributes, and global attributes, "git diff" may apply > different attributes depending on what you diff (versioned blob, workdir > file, out-of-tree file). > [...] Yeah, I agree that we cannot catch every problematic case (for all the reason you give here). It does seem like a large number of problems people report have to do with checkout of in-tree .gitattributes, though. So my thinking was if we could cover that case, it might help people (even if we leave many problems unnoticed). But... > I've found that when I decide to use a filter like that, the best > approach is to either apply it retroactively (filter-branch, > unversionsed attributes, that is clean all stored blobs) or make a > commit where I specifically note the switch (versioned .gitattributes > plus affected blob changes) and what config should go along with it. One problem is that people need to know to run the lint command. And if they know enough that this is a problem worthy of checking via a linter, then they could perhaps just as easily do the in-tree blob changes. I say "perhaps" because I don't think it's as easy as running a single "git fix-my-stale-blobs". I wonder if rather than a linter, we ought to just have an option to "git checkout" or something to ignore stat data and re-checkout any entries for which convert_to_working_tree() isn't a noop. That serves both as a repair tool and as a linter (since running "git diff" on the result would show you what needs to be fixed). It wouldn't solve the user-education problem, but at least it would give a simple solution that could be passed along to users. I dunno. I don't do line ending conversion, so I don't really run into this issue myself. -Peff
Re: [PATCH v3 2/4] commit: replace the raw buffer with strbuf in read_graft_line
On Fri, Aug 18, 2017 at 12:12:37PM +0200, Patryk Obara wrote: > Jeff Kingwrote: > > > AFAICT this is only here to avoid having to s/buf/line->buf/ in the rest > > of the function. But I think we should just make that change (you > > already did in some of the spots). And IMHO we should do the same for > > line->len. When there are two names for the same value, it increases the > > chances of a bug where the two end up diverging. > > My motivation was rather to keep patch(es) as small as possible because every > line using buf will be replaced in a later patch in series. But it will make > commit better (it will stand on its own), so why not to do it? :) Ah, I didn't notice those lines went away. That does make it less bad, but I do think it's easier to review if each commit stands on its own. In some cases, if it's really painful to do the intermediate cleanup, I might say something in the commit message like "this leaves X that is not ideal, but we'll be getting rid of it soon anyway". But in this case I think just creating that intermediate state is simple enough. > Ah, I only replaced comparison to NULL terminator with length check because > I thought it better shows intention of the code and I didn't notice, that > reversing order will result in better code overall. > > I will include both changes in v4. Thanks. -Peff
Re: [PATCH v3 4/4] commit: rewrite read_graft_line
On Fri, Aug 18, 2017 at 01:30:23PM +0200, Patryk Obara wrote: > > We'd reject such an input totally (though as an interesting side effect, > > you can convince the parser to allocate 20x as much RAM as you send it; > > one oid for each space). > > Grafts are not populated during clone operation, so it really would be user > making his life miserable. I could allocate FLEXI_ARRAY of size > min(n, line->len / (GIT_*MIN*_HEXSZ+1)) instead… but I think it's not even > worth the cost of making the code more complicated (and I don't want > to reintroduce these size macros in here. > > We _could_ put an artificial limit on graft parents, though (e.g. 10) and > display an error message urging the user to stop using grafts? Yeah, sorry, I should have made more clear that this is fine. I always try to read parsing code with my paranoid hat on, but I agree that grafts aren't really exposed to untrusted entities. In general I'd prefer to avoid artificial limits unless there's a need for them. There are already spots (like receive-pack) where you can ask Git to store bytes in RAM as fast as you can send them. What I found interesting about this one was the 20:1 amplification. :) > Before sending v3 I tried two other alternative implementations (perhaps I > should've listed them in the v3 cover letter): It might even be worth listing them in the commit message. Somebody finding your commit 3 years from via "git log -S" or "git blame" might say "yes, but why didn't they just do it like...". You can respond to them preemptively. :) -Peff
Re: [PATCH v3 4/4] commit: rewrite read_graft_line
Jeff Kingwrote: > > So we're probably fine. The two parsing passes are right next to each > other and are sufficiently simple and strict that we don't have to > worry about them diverging. That was my conclusion as well. I added comment before the first pass and avoided any "cleverness" to make it perfectly clear to a reader. > We'd reject such an input totally (though as an interesting side effect, > you can convince the parser to allocate 20x as much RAM as you send it; > one oid for each space). Grafts are not populated during clone operation, so it really would be user making his life miserable. I could allocate FLEXI_ARRAY of size min(n, line->len / (GIT_*MIN*_HEXSZ+1)) instead… but I think it's not even worth the cost of making the code more complicated (and I don't want to reintroduce these size macros in here. We _could_ put an artificial limit on graft parents, though (e.g. 10) and display an error message urging the user to stop using grafts? > The single-pass alternative would probably be to read into a dynamic > structure like an oid_array, and then copy the result into the flex > structure. Before sending v3 I tried two other alternative implementations (perhaps I should've listed them in the v3 cover letter): 1. Using string_list_split_in_place. I resigned from this approach as soon as I noticed, that line->buf needs to be preserved for possible error message. string_list_split would have no benefits over using oid_array. 2. Parsing into temporary oid_array and then copying memory to FLEXI_ARRAY. Throw-away oid_array still needs to be cleaned, which means we have new/different return path (one before xmalloc and one after xmalloc), which means "bad_graft_data" label needs to be changed into "cleanup" label (or removed), which means error description needs be conditionally put in earlier code… and at this point, I decided these changes are not making code cleaner nor more readable at all :) Junio C Hamano wrote: > > If I were doing the two-pass thing, I'd probably write a for loop > that runs exactly twice, where the first iteration parses into a > single throw-away oid struct only to count, and the second iteration > parses the same input into the allocated array of oid struct. That > way, you do not have to worry about two phrases going out of sync. Two passes would still differ in error handling due to xmalloc between them… -- | ← Ceci n'est pas une pipe Patryk Obara
Re: [PATCH v3 2/4] commit: replace the raw buffer with strbuf in read_graft_line
Jeff Kingwrote: > AFAICT this is only here to avoid having to s/buf/line->buf/ in the rest > of the function. But I think we should just make that change (you > already did in some of the spots). And IMHO we should do the same for > line->len. When there are two names for the same value, it increases the > chances of a bug where the two end up diverging. My motivation was rather to keep patch(es) as small as possible because every line using buf will be replaced in a later patch in series. But it will make commit better (it will stand on its own), so why not to do it? :) > (…) I think short-circuiting like: > > if (!line->len || line->buf[0] == '#') > > is better (I also think "!" instead of "== 0" is our usual style, but > that's much less important). Ah, I only replaced comparison to NULL terminator with length check because I thought it better shows intention of the code and I didn't notice, that reversing order will result in better code overall. I will include both changes in v4. -- | ← Ceci n'est pas une pipe Patryk Obara
Re: reftable [v6]: new ref storage format
On Wed, Aug 16, 2017 at 12:47 AM, Shawn Pearcewrote: > On Mon, Aug 14, 2017 at 5:13 AM, Michael Haggerty > wrote: >> On 08/07/2017 03:47 AM, Shawn Pearce wrote: >>> 6th iteration of the reftable storage format. > [...] >>> index record >>> >>> An index record describes the last entry in another block. >>> Index records are written as: >>> >>> varint( prefix_length ) >>> varint( (suffix_length << 3) | 0 ) >>> suffix >>> varint( block_position ) >>> >>> Index records use prefix compression exactly like `ref_record`. >>> >>> Index records store `block_position` after the suffix, specifying the >>> absolute position in bytes (from the start of the file) of the block >>> that ends with this reference. >> >> Is there a reason that the index lists the *last* refname that is >> contained in a block rather than the *first* refname? I can't think of a >> reason to choose one vs. the other, but your choice was initially >> surprising. I don't think it matters either way; I was just curious. > > Yes, there is a reason. When a reader is searching the index block and > discovers a key that is greater than their search needle, they are now > sitting on a record with the block_position for that greater key. By > using the *last* refname the current block_position is the one to seek > to. > > If instead we used *first* refname, the reader would now have to > backtrack to the prior index record to get the block_position out of > that record. Or it has to keep a running "prior_position" local > variable. > > Using last simplifies the reader's code. Ah, OK. I was thinking of this as being a binary search, in which case you *must* see both bracketing records before you are done, and the chances are 50-50 which one you see first. But this search is a little bit different, because the index records within a restart block have to be scanned linearly. So it is much more likely that you see the "before" record followed by the "after" record. Thanks for the explanation. Michael
Re: [PATCH] check-ref-format: require a repository for --branch
Jeff Kingwrites: > I _do_ think it's a misfeature to put it in check-ref-format. It should > be part of rev-parse. Which admittedly is a kitchen sink, but this kind > of resolving is one of the main things it should be doing. And in fact > you can already do: > > git rev-parse --symbolic-full-name @{-1} > > But I stopped short of suggesting we remove the feature entirely. It > would obviously require a deprecation period. Yeah, I realize that "nonsense" was a bit too strong. I do agree that it was a misfeature to place in "check" ref-format, though. Thanks.
Re: [PATCH v3 4/4] commit: rewrite read_graft_line
Jeff Kingwrites: > So we're probably fine. The two parsing passes are right next to each > other and are sufficiently simple and strict that we don't have to > worry about them diverging. If I were doing the two-pass thing, I'd probably write a for loop that runs exactly twice, where the first iteration parses into a single throw-away oid struct only to count, and the second iteration parses the same input into the allocated array of oid struct. That way, you do not have to worry about two phrases going out of sync.
Re: [Patch size_t V3 00/19] use size_t
On Thu, Aug 17, 2017 at 10:35:54PM +0200, Torsten Bögershausen wrote: > On Wed, Aug 16, 2017 at 10:16:12PM +0200, Martin Koegler wrote: > > From: Martin Koegler> > > > This patchset is for next [24db08a6e8fed761d3bace7f2d5997806e20b9f7]. > > I applied it succesfully, and run the test suite on a 32 bit system. > The Laptop reported one brekage in t0040: > t0040-parse-options.sh not ok 19 - OPT_MAGNITUDE() 3giga I missed test-parse-options.c - I'll post an updated versions considering all comments. parse-options takes the variable as void* - so the compilers also fails to recognize the usage of incompatible types. > Beside some t5561-http-backend.sh (which are most probably not caused > by this patch. > > The raspi had 2 deadlocks, like "git push --signed dst noop ff +noff" > or > trash directory.t5541-http-push-smart/gpghome --sign -u commit...@example.com > Even this most probably not caused by the patch. (and the test is still > running) > > Well done, and on which platforms did you test ? Linux 64. Regards, Martin
Revision resolution for remote-helpers?
Hi, As you might remember, I'm maintaining a remote helper that allows to talk directly to mercurial servers with git: git-cinnabar. When dealing with "foreign (non-git) repositories", it is often necessary to refer to revisions with their native name. With mercurial, that's a sha1, with svn it's a revision number, etc. Git-cinnabar does provide a helper command that gives back the git commit from the mercurial revision (and vice versa), but that's cumbersome to use in commands. I was thinking it could be useful to have a special syntax for revisions that would query a helper program. The helper program could use a similar protocol to that of the remote helpers. My thought is that a string like :: could be used wherever a committish is expected. That would call some helper and request to resolve revision, and the helper would provide a git commit as a response. The reason for the :: prefix is that it matches the :: prefix used for remote helpers. Now, there are a few caveats: - , for e.g. svn, pretty much would depend on the remote. OTOH, in mercurial, it doesn't. I think it would be fair for the helper to have to deal with making what appears after :: relevant to find the right revision, by possibly including a remote name. - msys likes to completely fuck up command lines when they contain ::. For remote helpers, the alternative that works is :///etc. Which leads me to think some "virtual" ref namespace could be a solution to the problem. So instead of ::, the prefix would be /. For e.g. svn, svn/$remote/$rev would be a natural way to specify the revision for a given remote. For mercurial, hg/$sha1. Potentially, this could be a sort of pluggable ref stores, which could be used for extensions such as the currently discussed reftable. On the opposite end of the problem, I'm also thinking about git log --decorate= displaying the mercurial revisions where branch decorations would normally go. I have no patch to show for it. Those are ideas that I first want to discuss before implementing anything. Thoughts? Mike
Re: Weirdness with git change detection
Jeff King venit, vidit, dixit 11.07.2017 10:24: > On Tue, Jul 11, 2017 at 10:20:43AM +0200, Torsten Bögershausen wrote: > >>> No problem. I actually think it would be interesting if Git could >>> somehow detect and warn about this situation. But the obvious way to do >>> that would be to re-run the clean filter directly after checkout. And >>> doing that all the time is expensive. >> >> Would it be possible to limit the round-trip-check to "git reset --hard" ? >> If yes, possibly many users are willing to pay the price, if Git tells >> the "your filters don't round-trip". (Including CRLF conversions) > > Anything's possible, I suppose. But I don't think I'd want that feature > turned on myself. > >>> Perhaps some kind of "lint" program would be interesting to warn of >>> possible misconfigurations. Of course people would have to run it for it >>> to be useful. :) >> >> What do you have in mind here ? >> Don't we need to run some content through the filter(s)? > > I was thinking of a tool that could run a series of checks on the > repository and nag about potential problems. One of them could be doing > a round-trip repo->clean->smudge for each file. > > Another one might be warning about files that differ only in case. > > The idea being that users could run "git lint" if they suspect something > funny is going on. I dunno. It may be a dead-end. Most such > oddities are better detected and handled during actual git operations if > we can. So this would really just be for things that are too expensive > to detect in normal operations. > > -Peff > Typically, that problem arises when you turn a filter on or off at some point in your history. Since "attributes" can come from various sources, especially the versioned ".gitattributes" file, unversioned per-repo .git/info/attributes, and global attributes, "git diff" may apply different attributes depending on what you diff (versioned blob, workdir file, out-of-tree file). This is not made easier by the fact that unversioned config (per repo, per user, global) defines the filter action, and that even upgrades of your filter tools may change the output. So, "filter off/on" is by no means the only possible source of discrepancies. I've found that when I decide to use a filter like that, the best approach is to either apply it retroactively (filter-branch, unversionsed attributes, that is clean all stored blobs) or make a commit where I specifically note the switch (versioned .gitattributes plus affected blob changes) and what config should go along with it. All of this is difficult to check or correct automatically, since it depends on user decisions. About the only thing we could do is checking that "clean(smudge(foo))=clean(foo)" at a specific "point in time" (attributes, config) for specific foo, but that wouldn't catch the case above, even if we iterated over all commits which affect files that the filter (currently) applies to. Keep in mind that filters are a killer feature, so if you shoot yourself in the foot: it could have come worse ;) Michael
Re: ignoring extra bitmap file?
On Thu, Aug 17, 2017 at 09:24:36PM +0200, Andreas Krey wrote: > I'm seeing the message > >remote: warning: ignoring extra bitmap file: > ./objects/pack/pack-2943dc24pack > > and indeed, there is such a thing (two, actually): Only one is the extra. :) The other is doing something useful. Basically, the bitmap code was written to a handle a single bitmap file. It would be possible to handle multiple, but it simplified the implementation greatly to only handle one. And in practice, since a bitmap can only be made for a pack which contains all of the reachable objects, you'd have only one bitmap per repo, for the one big "main" pack. > But it looks like something went wrong in that repack cycle (that > pack-2943dc247702 is the full repo), and it won't get removed later > in the next repack in the evening. Yes, it looks like you got a full repack that failed to remove the old pack. Or more likely you had two full repacks racing with each other, each creating a new big pack. So the extra bitmap here is harmless. Both of them contain more or less the same data, and whichever one we use will be fine (and remember that the .bitmap files are purely an optimization, so that "more or less" will only make a minor impact on the speed of operations, not on the output). > Question: Can I safely remove the .bitmap file, and repack will then > clean up the .pack and .idx files as will? Yes, it's always safe to remove a .bitmap file (though if you remove the last one, you may expect performance to drop for some operations). Whether there's a .bitmap doesn't impact whether .pack and .idx files are deleted. The next full repack would pack everything into a new big pack, and then delete any existing files, including .pack, .idx, and .bitmap. -Peff
Re: [PATCH v3 4/4] commit: rewrite read_graft_line
On Fri, Aug 18, 2017 at 03:59:38AM +0200, Patryk Obara wrote: > Determine the number of object_id's to parse in a single graft line by > counting separators (whitespace characters) instead of dividing by > length of hash representation. > > This way graft parsing code can support different sizes of hashes > without any further code adaptations. Sounds like a reasonable approach, though I wonder what happens if our counting pass differs in its behavior from the actual parse. E.g., here: > + /* count number of blanks to determine size of array to allocate */ > + for (i = 0, n = 0; i < line->len; i++) > + if (isspace(line->buf[i])) > + n++; If we see multiple spaces like "1234abcd 5678abcd" we'll allocate a slot for each. I think that's OK because here: > + for (i = 0; i < graft->nr_parent; i++) > + if (!isspace(*tail++) || parse_oid_hex(tail, >parent[i], > )) > goto bad_graft_data; We'd reject such an input totally (though as an interesting side effect, you can convince the parser to allocate 20x as much RAM as you send it; one oid for each space). So we're probably fine. The two parsing passes are right next to each other and are sufficiently simple and strict that we don't have to worry about them diverging. The single-pass alternative would probably be to read into a dynamic structure like an oid_array, and then copy the result into the flex structure. Or of course to stop using a flex structure, as your original pass did. I agree with Junio that the use of object_id's is orthogonal to using a FLEX_ARRAY. But I could also see an argument that the complexity the flex array adds here isn't worth the savings. The main benefits of a flex array are: 1. Less memory used. But we don't expect to see a large enough number of grafts for this to matter. 2. A more compact memory representation, which can be faster. But accessing the parent list of a graft isn't going to be the hot code path. It probably only happens once in a program run when we rewrite the parents (versus checking the grafted commit, which is looked up once per commit we access). 3. It's easier to free the struct and its associated resources in a single free(). But we never free the graft list. Reading your original, my thought was "why _not_ keep doing it as a FLEX_ARRAY, as it saves a little memory, which can't hurt". But seeing this pre-counting phase, I think it does make the code a little more complicated. But I'd be OK with doing it either way. -Peff
Dear Talented
Dear Talented, I am Talent Scout For BLUE SKY FILM STUDIO, Present Blue sky Studio a Film Corporation Located in the United State, is Soliciting for the Right to use Your Photo/Face and Personality as One of the Semi -Major Role/ Character in our Upcoming ANIMATED Stereoscope 3D Movie-The Story of Anubis (Anubis 2018) The Movie is Currently Filming (In Production) Please Note That There Will Be No Auditions, Traveling or Any Special / Professional Acting Skills, Since the Production of This Movie Will Be Done with our State of Art Computer -Generating Imagery Equipment. We Are Prepared to Pay the Total Sum of $620,000.00 USD. For More Information/Understanding, Please Write us on the E-Mail Below. CONTACT EMAIL: bluesky.filmstu...@usa.com All Reply to: bluesky.filmstu...@usa.com Note: Only the Response send to this mail will be Given a Prior Consideration. Talent Scout Kim Sharma
Re: [PATCH v3 2/4] commit: replace the raw buffer with strbuf in read_graft_line
On Fri, Aug 18, 2017 at 03:59:36AM +0200, Patryk Obara wrote: > diff --git a/commit.c b/commit.c > index 8b28415..019e733 100644 > --- a/commit.c > +++ b/commit.c > @@ -134,17 +134,18 @@ int register_commit_graft(struct commit_graft *graft, > int ignore_dups) > return 0; > } > > -struct commit_graft *read_graft_line(char *buf, int len) > +struct commit_graft *read_graft_line(struct strbuf *line) > { > /* The format is just "Commit Parent1 Parent2 ...\n" */ > - int i; > + int i, len; > + char *buf = line->buf; Copying a pointer to a strbuf's buffer is a dangerous habit. The strbuf is free to re-allocate the buffer under the hood during any operation it likes, potentially leaving you pointing to freed memory. In this case it's OK because the only function you call is strbuf_rtrim(), which never reallocates. But I feel like this is setting up a maintenance trap for the next person to touch the function. AFAICT this is only here to avoid having to s/buf/line->buf/ in the rest of the function. But I think we should just make that change (you already did in some of the spots). And IMHO we should do the same for line->len. When there are two names for the same value, it increases the chances of a bug where the two end up diverging. > - while (len && isspace(buf[len-1])) > - buf[--len] = '\0'; > - if (buf[0] == '#' || buf[0] == '\0') > + strbuf_rtrim(line); > + if (line->buf[0] == '#' || line->len == 0) > return NULL; I find it funny to look at line->buf[0] before line->len, because it means we're reading pas the end of the buffer. It's OK here because we know there's a NUL terminator, but I think short-circuiting like: if (!line->len || line->buf[0] == '#') is better (I also think "!" instead of "== 0" is our usual style, but that's much less important). -Peff
Re: [PATCH] check-ref-format: require a repository for --branch
On Thu, Aug 17, 2017 at 02:30:53PM -0700, Junio C Hamano wrote: > > I'm not sure I buy that. What does "turning it into a branch name" even > > mean when you are not in a repository? Clearly @{-1} must fail. And > > everything else is just going to output the exact input you provided. > > This "just going to output the exact input" is not entirely correct; > there is just one use case for it. > > "git check-ref-format --branch a..b" would fail with a helpful error > message, while the command run with "a.b" would tell you the name is > safe. Well, yes. It's checking the syntax, as well. But you don't need --branch for that. > Use of 'check-ref-format --branch "@{-1}"' *IS* a nonsense, whether > it is inside or outside a repository; it is OK to fail it outside a > repository and I would say it is even OK to fail it inside a > repository. After all "check-ref-format" is about checking if the > string is a sensible name to use. > > I think calling interpret_branch_name() in the codepath is a mistake > and we should instead report if "refs/heads/@{-1}" literally is a > valid refname we could create instead. I don't think it's nonsense. It's the reason the feature was added in the first place. See your own a31dca0393 (check-ref-format --branch: give Porcelain a way to grok branch shorthand, 2009-03-21). Without that interpretation, it does nothing that you could not equally well do as: git check-ref-format refs/heads/$name I _do_ think it's a misfeature to put it in check-ref-format. It should be part of rev-parse. Which admittedly is a kitchen sink, but this kind of resolving is one of the main things it should be doing. And in fact you can already do: git rev-parse --symbolic-full-name @{-1} But I stopped short of suggesting we remove the feature entirely. It would obviously require a deprecation period. -Peff