Re: [PATCH] git-stash.txt: document show options
Hi Mike,

there is a GSoC project going on to make `git stash` a builtin, and I
think that this change makes sense and at the same time needs to be
coordinated with Paul, who is performing that builtin work, so I Cc:ed
him for visibility.

Ciao,
Johannes

On Wed, 20 Jun 2018, Mike Frysinger wrote:

> Signed-off-by: Mike Frysinger
> ---
>  Documentation/git-stash.txt | 5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/Documentation/git-stash.txt b/Documentation/git-stash.txt
> index 7ef8c4791177..76e4ca788102 100644
> --- a/Documentation/git-stash.txt
> +++ b/Documentation/git-stash.txt
> @@ -106,7 +106,7 @@ stash@{1}: On master: 9cc0589... Add git-stash
>  The command takes options applicable to the 'git log'
>  command to control what is shown and how. See linkgit:git-log[1].
>
> -show [<stash>]::
> +show [<options>] [<stash>]::
>
> 	Show the changes recorded in the stash entry as a diff between the
> 	stashed contents and the commit back when the stash entry was first
> @@ -116,6 +116,9 @@ show [<stash>]::
> 	to view the second most recent entry in patch form).
> 	You can use stash.showStat and/or stash.showPatch config variables
> 	to change the default behavior.
> ++
> +The command takes options applicable to the 'git diff'
> +command to control what is shown and how. See linkgit:git-diff[1].
>
> pop [--index] [-q|--quiet] [<stash>]::
>
> --
> 2.17.1
[PATCH] git-stash.txt: document show options
Signed-off-by: Mike Frysinger
---
 Documentation/git-stash.txt | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/Documentation/git-stash.txt b/Documentation/git-stash.txt
index 7ef8c4791177..76e4ca788102 100644
--- a/Documentation/git-stash.txt
+++ b/Documentation/git-stash.txt
@@ -106,7 +106,7 @@ stash@{1}: On master: 9cc0589... Add git-stash
 The command takes options applicable to the 'git log'
 command to control what is shown and how. See linkgit:git-log[1].

-show [<stash>]::
+show [<options>] [<stash>]::

	Show the changes recorded in the stash entry as a diff between the
	stashed contents and the commit back when the stash entry was first
@@ -116,6 +116,9 @@ show [<stash>]::
	to view the second most recent entry in patch form).
	You can use stash.showStat and/or stash.showPatch config variables
	to change the default behavior.
++
+The command takes options applicable to the 'git diff'
+command to control what is shown and how. See linkgit:git-diff[1].

 pop [--index] [-q|--quiet] [<stash>]::
--
2.17.1
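To see the behavior this patch documents, here is a minimal sketch of `git stash show` forwarding `git diff` options. The throwaway repository, file name, and identity settings are illustrative only; it assumes a reasonably recent `git` on `PATH`.

```shell
# Sketch: 'git stash show' accepts 'git diff' options, as documented above.
set -e
dir=$(mktemp -d)
cd "$dir"
git init -q demo && cd demo
git config user.email you@example.com
git config user.name you
echo one >file.txt
git add file.txt
git commit -qm init
echo two >>file.txt          # modify a tracked file...
git stash -q                 # ...and stash the modification
git stash show               # default: diffstat
git stash show -p            # a 'git diff' option: full patch
git stash show --name-only >names.txt   # another 'git diff' option
cat names.txt
```

Any other `git diff` option (`--stat`, `--numstat`, and so on) can be substituted on the last lines in the same way.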
Re: [msysGit] Possible git status problem at case insensitive file system
Hi Frank,

Your system clock looks to be providing the wrong date for your emails.

The last XP version was
https://github.com/git-for-windows/git/releases/tag/v2.10.0.windows.1
so you may want to upgrade to that (see FAQs
https://github.com/git-for-windows/git/wiki/FAQ).

It won't solve the capitalisation problem - that is a Windows FS issue.
Git assumes case matters, but the FS will fetch directories and branches
case insensitively.

Philip

----- Original Message -----
From: "Frank Li"
To: "Git List"; "msysGit"
Sent: Monday, August 09, 2010 5:22 AM
Subject: [msysGit] Possible git status problem at case insensitive file system

All:

I use msysgit 1.7.0.2 at windows xp.

Problem:
git status will list tracked directory as untracked dir.

Duplicate:
1. mkdir test, cd test
2. git init-db
3. mkdir d, cd d
4. touch a.c
5. git add a.c
6. git commit -a -m "test"
7. cd ..
8. mv d d1
9. mv d1 D
10. git status

# On branch master
# Untracked files:
#   (use "git add ..." to include in what will be committed)
#
#	D/
nothing added to commit but untracked files present (use "git add" to track)

D/ should be the same as d/ on a case insensitive file system. D/ should
not be listed by git status.

best regards
Frank Li
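Following up on Philip's note: a common workaround for case-only renames on case-insensitive filesystems is to rename through a temporary name, so that Git records the change in its index at every step. A hedged sketch, reusing the directory names from Frank's reproduction (the temporary name `d1` and identity settings are arbitrary):

```shell
# Case-only directory rename via an intermediate name.
set -e
dir=$(mktemp -d)
cd "$dir"
git init -q test && cd test
git config user.email you@example.com
git config user.name you
mkdir d
echo code >d/a.c
git add d/a.c
git commit -qm "test"
# A bare 'mv d D' confuses Git's index on a case-insensitive FS;
# 'git mv' through an intermediate name keeps the index consistent:
git mv d d1
git mv d1 D
git commit -qm "rename d to D"
git ls-files >tracked.txt
cat tracked.txt
```

After this, `git status` sees `D/a.c` as tracked rather than reporting `D/` as untracked.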
[PATCH] submodule.c: report the submodule that an error occurs in
When an error occurs in updating the working tree of a submodule in
submodule_move_head, tell the user which submodule the error occurred in.

The call to read-tree contains a super-prefix, such that the read-tree
will correctly report any path related issues, but some error messages
do not contain a path, for example:

  ~/gerrit$ git checkout --recurse-submodules origin/master
  ~/gerrit$ fatal: failed to unpack tree object 07672f31880ba80300b38492df9d0acfcd6ee00a

Give the hint which submodule has a problem.

Signed-off-by: Stefan Beller
---
 submodule.c               | 2 +-
 t/lib-submodule-update.sh | 3 ++-
 2 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/submodule.c b/submodule.c
index 939d6870ecd..ebd092a14fd 100644
--- a/submodule.c
+++ b/submodule.c
@@ -1668,7 +1668,7 @@ int submodule_move_head(const char *path,
	argv_array_push(&cp.args, new_head ? new_head : empty_tree_oid_hex());

	if (run_command(&cp)) {
-		ret = -1;
+		ret = error(_("Submodule '%s' could not be updated."), path);
		goto out;
	}

diff --git a/t/lib-submodule-update.sh b/t/lib-submodule-update.sh
index 1f38a85371a..e27f5d8541d 100755
--- a/t/lib-submodule-update.sh
+++ b/t/lib-submodule-update.sh
@@ -781,7 +781,8 @@ test_submodule_recursing_with_args_common() {
		(
			cd submodule_update &&
			git branch -t invalid_sub1 origin/invalid_sub1 &&
-			test_must_fail $command invalid_sub1 &&
+			test_must_fail $command invalid_sub1 2>err &&
+			grep sub1 err &&
			test_superproject_content origin/add_sub1 &&
			test_submodule_content sub1 origin/add_sub1
		)
--
2.18.0.rc2.346.g013aa6912e-goog
[PATCH v2] docs: link to gitsubmodules
Add a link to gitsubmodules(7) under the `submodule.active` entry in
git-config(1).

Signed-off-by: Brandon Williams
---
 Documentation/config.txt | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index ab641bf5a..340eb1f3c 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -3327,12 +3327,13 @@ submodule.<name>.ignore::
 submodule.<name>.active::
	Boolean value indicating if the submodule is of interest to git
	commands.  This config option takes precedence over the
-	submodule.active config option.
+	submodule.active config option. See linkgit:gitsubmodules[7] for
+	details.

 submodule.active::
	A repeated field which contains a pathspec used to match against a
	submodule's path to determine if the submodule is of interest to git
-	commands.
+	commands. See linkgit:gitsubmodules[7] for details.

 submodule.recurse::
	Specifies if commands recurse into submodules by default. This
--
2.18.0.rc1.244.gcf134e6275-goog
[PATCH v3 8/8] fetch-pack: implement ref-in-want
Implement ref-in-want on the client side so that when a server supports the "ref-in-want" feature, a client will send "want-ref" lines for each reference the client wants to fetch. Signed-off-by: Brandon Williams --- fetch-pack.c | 35 +++--- remote.c | 1 + remote.h | 1 + t/t5703-upload-pack-ref-in-want.sh | 4 ++-- 4 files changed, 36 insertions(+), 5 deletions(-) diff --git a/fetch-pack.c b/fetch-pack.c index f1709e816..681b44061 100644 --- a/fetch-pack.c +++ b/fetch-pack.c @@ -1102,9 +1102,10 @@ static void add_shallow_requests(struct strbuf *req_buf, static void add_wants(const struct ref *wants, struct strbuf *req_buf) { + int use_ref_in_want = server_supports_feature("fetch", "ref-in-want", 0); + for ( ; wants ; wants = wants->next) { const struct object_id *remote = &wants->old_oid; - const char *remote_hex; struct object *o; /* @@ -1122,8 +1123,10 @@ static void add_wants(const struct ref *wants, struct strbuf *req_buf) continue; } - remote_hex = oid_to_hex(remote); - packet_buf_write(req_buf, "want %s\n", remote_hex); + if (!use_ref_in_want || wants->exact_oid) + packet_buf_write(req_buf, "want %s\n", oid_to_hex(remote)); + else + packet_buf_write(req_buf, "want-ref %s\n", wants->name); } } @@ -1334,6 +1337,29 @@ static void receive_shallow_info(struct fetch_pack_args *args, args->deepen = 1; } +static void receive_wanted_refs(struct packet_reader *reader, struct ref *refs) +{ + process_section_header(reader, "wanted-refs", 0); + while (packet_reader_read(reader) == PACKET_READ_NORMAL) { + struct object_id oid; + const char *end; + struct ref *r = NULL; + + if (parse_oid_hex(reader->line, &oid, &end) || *end++ != ' ') + die("expected wanted-ref, got '%s'", reader->line); + + for (r = refs; r; r = r->next) { + if (!strcmp(end, r->name)) { + oidcpy(&r->old_oid, &oid); + break; + } + } + } + + if (reader->status != PACKET_READ_DELIM) + die("error processing wanted refs: %d", reader->status); +} + enum fetch_state { FETCH_CHECK_LOCAL = 0, FETCH_SEND_REQUEST, @@ 
-1408,6 +1434,9 @@ static struct ref *do_fetch_pack_v2(struct fetch_pack_args *args, if (process_section_header(&reader, "shallow-info", 1)) receive_shallow_info(args, &reader); + if (process_section_header(&reader, "wanted-refs", 1)) + receive_wanted_refs(&reader, ref); + /* get the pack */ process_section_header(&reader, "packfile", 0); if (get_pack(args, fd, pack_lockfile)) diff --git a/remote.c b/remote.c index abe80c139..2c2376fff 100644 --- a/remote.c +++ b/remote.c @@ -1735,6 +1735,7 @@ int get_fetch_map(const struct ref *remote_refs, if (refspec->exact_sha1) { ref_map = alloc_ref(name); get_oid_hex(name, &ref_map->old_oid); + ref_map->exact_oid = 1; } else { ref_map = get_remote_ref(remote_refs, name); } diff --git a/remote.h b/remote.h index 45ecc6cef..976292152 100644 --- a/remote.h +++ b/remote.h @@ -73,6 +73,7 @@ struct ref { force:1, forced_update:1, expect_old_sha1:1, + exact_oid:1, deletion:1; enum { diff --git a/t/t5703-upload-pack-ref-in-want.sh b/t/t5703-upload-pack-ref-in-want.sh index 979ab6d03..b94a51380 100755 --- a/t/t5703-upload-pack-ref-in-want.sh +++ b/t/t5703-upload-pack-ref-in-want.sh @@ -204,7 +204,7 @@ test_expect_success 'server is initially ahead - no ref in want' ' grep "ERR upload-pack: not our ref" err ' -test_expect_failure 'server is initially ahead - ref in want' ' +test_expect_success 'server is initially ahead - ref in want' ' git -C "$REPO" config uploadpack.allowRefInWant true && rm -rf local && cp -r "$LOCAL_PRISTINE" local && @@ -228,7 +228,7 @@ test_expect_success 'server is initially behind - no ref in want' ' test_cmp expected actual ' -test_expect_failure 'server is initially behind - ref in want' ' +test_expect_success 'server is initially behind - ref in want' ' git -C "$REPO" config uploadpack.allowRefInWant true && rm -rf local && cp -r "$LOCAL_PRISTINE" local && -- 2.18.0.rc1.244.gcf134e6275-goog
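To make the wire change in this patch concrete: the `want` and `want-ref` lines that `packet_buf_write()` emits are framed as pkt-lines, i.e. a 4-hex-digit length (which counts the 4 length bytes themselves) followed by the payload. A hedged sketch of that framing in shell, with an example ref name:

```shell
# Build the "want-ref <ref>" request line as a pkt-line:
# 4 hex digits of total length, then the payload plus a trailing LF.
pkt_line () {
	payload="$1"
	# length = 4 (the length prefix itself) + payload + 1 (trailing LF)
	printf '%04x%s\n' $(( ${#payload} + 5 )) "$payload"
}
pkt_line "want-ref refs/heads/master" >request.txt
cat request.txt   # 001fwant-ref refs/heads/master
```

Compare with the old-style `want <oid>` line, which is framed identically but carries a 40-hex object id instead of a ref name.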
[PATCH v3 7/8] fetch-pack: put shallow info in output parameter
Expand the transport fetch method signature, by adding an output parameter, to allow transports to return information about the refs they have fetched. Then communicate shallow status information through this mechanism instead of by modifying the input list of refs. This does require clients to sometimes generate the ref map twice: once from the list of refs provided by the remote (as is currently done) and potentially once from the new list of refs that the fetch mechanism provides. Signed-off-by: Brandon Williams --- builtin/clone.c | 4 ++-- builtin/fetch.c | 28 fetch-object.c | 2 +- fetch-pack.c | 17 + transport-helper.c | 6 -- transport-internal.h | 9 - transport.c | 34 -- transport.h | 3 ++- 8 files changed, 78 insertions(+), 25 deletions(-) diff --git a/builtin/clone.c b/builtin/clone.c index 99e73dae8..8f86d99c5 100644 --- a/builtin/clone.c +++ b/builtin/clone.c @@ -1155,7 +1155,7 @@ int cmd_clone(int argc, const char **argv, const char *prefix) } if (!is_local && !complete_refs_before_fetch) - transport_fetch_refs(transport, mapped_refs); + transport_fetch_refs(transport, mapped_refs, NULL); remote_head = find_ref_by_name(refs, "HEAD"); remote_head_points_at = @@ -1197,7 +1197,7 @@ int cmd_clone(int argc, const char **argv, const char *prefix) if (is_local) clone_local(path, git_dir); else if (refs && complete_refs_before_fetch) - transport_fetch_refs(transport, mapped_refs); + transport_fetch_refs(transport, mapped_refs, NULL); update_remote_refs(refs, mapped_refs, remote_head_points_at, branch_top.buf, reflog_msg.buf, transport, diff --git a/builtin/fetch.c b/builtin/fetch.c index b600e1f10..8362a97a2 100644 --- a/builtin/fetch.c +++ b/builtin/fetch.c @@ -946,11 +946,13 @@ static int quickfetch(struct ref *ref_map) return check_connected(iterate_ref_map, &rm, &opt); } -static int fetch_refs(struct transport *transport, struct ref *ref_map) +static int fetch_refs(struct transport *transport, struct ref *ref_map, + struct ref **updated_remote_refs) { int 
ret = quickfetch(ref_map); if (ret) - ret = transport_fetch_refs(transport, ref_map); + ret = transport_fetch_refs(transport, ref_map, + updated_remote_refs); if (ret) transport_unlock_pack(transport); return ret; @@ -1106,7 +1108,7 @@ static void backfill_tags(struct transport *transport, struct ref *ref_map) transport_set_option(transport, TRANS_OPT_FOLLOWTAGS, NULL); transport_set_option(transport, TRANS_OPT_DEPTH, "0"); transport_set_option(transport, TRANS_OPT_DEEPEN_RELATIVE, NULL); - if (!fetch_refs(transport, ref_map)) + if (!fetch_refs(transport, ref_map, NULL)) consume_refs(transport, ref_map); if (gsecondary) { @@ -1122,6 +1124,7 @@ static int do_fetch(struct transport *transport, int autotags = (transport->remote->fetch_tags == 1); int retcode = 0; const struct ref *remote_refs; + struct ref *updated_remote_refs = NULL; struct argv_array ref_prefixes = ARGV_ARRAY_INIT; if (tags == TAGS_DEFAULT) { @@ -1172,7 +1175,24 @@ static int do_fetch(struct transport *transport, transport->url); } } - if (fetch_refs(transport, ref_map) || consume_refs(transport, ref_map)) { + + if (fetch_refs(transport, ref_map, &updated_remote_refs)) { + free_refs(ref_map); + retcode = 1; + goto cleanup; + } + if (updated_remote_refs) { + /* +* Regenerate ref_map using the updated remote refs. This is +* to account for additional information which may be provided +* by the transport (e.g. shallow info). 
+*/ + free_refs(ref_map); + ref_map = get_ref_map(transport->remote, updated_remote_refs, rs, + tags, &autotags); + free_refs(updated_remote_refs); + } + if (consume_refs(transport, ref_map)) { free_refs(ref_map); retcode = 1; goto cleanup; diff --git a/fetch-object.c b/fetch-object.c index 853624f81..48fe63dd6 100644 --- a/fetch-object.c +++ b/fetch-object.c @@ -19,7 +19,7 @@ static void fetch_refs(const char *remote_name, struct ref *ref) transport_set_option(transport, TRANS_OPT_FROM_PROMISOR, "1"); transport_set_option(transport, TRANS_OPT_NO_DEPENDENTS, "1"); - transport_fetch_re
[PATCH v3 6/8] fetch: refactor to make function args narrower
Refactor find_non_local_tags and get_ref_map to only take the information they need instead of the entire transport struct. Besides improving code clarity, this also improves their flexibility, allowing for a different set of refs to be used instead of relying on the ones stored in the transport struct. Signed-off-by: Brandon Williams --- builtin/fetch.c | 52 - 1 file changed, 26 insertions(+), 26 deletions(-) diff --git a/builtin/fetch.c b/builtin/fetch.c index ee8b87c78..b600e1f10 100644 --- a/builtin/fetch.c +++ b/builtin/fetch.c @@ -254,9 +254,9 @@ static int will_fetch(struct ref **head, const unsigned char *sha1) return 0; } -static void find_non_local_tags(struct transport *transport, - struct ref **head, - struct ref ***tail) +static void find_non_local_tags(const struct ref *refs, + struct ref **head, + struct ref ***tail) { struct string_list existing_refs = STRING_LIST_INIT_DUP; struct string_list remote_refs = STRING_LIST_INIT_NODUP; @@ -264,7 +264,7 @@ static void find_non_local_tags(struct transport *transport, struct string_list_item *item = NULL; for_each_ref(add_existing, &existing_refs); - for (ref = transport_get_remote_refs(transport, NULL); ref; ref = ref->next) { + for (ref = refs; ref; ref = ref->next) { if (!starts_with(ref->name, "refs/tags/")) continue; @@ -338,7 +338,8 @@ static void find_non_local_tags(struct transport *transport, string_list_clear(&remote_refs, 0); } -static struct ref *get_ref_map(struct transport *transport, +static struct ref *get_ref_map(struct remote *remote, + const struct ref *remote_refs, struct refspec *rs, int tags, int *autotags) { @@ -346,27 +347,11 @@ static struct ref *get_ref_map(struct transport *transport, struct ref *rm; struct ref *ref_map = NULL; struct ref **tail = &ref_map; - struct argv_array ref_prefixes = ARGV_ARRAY_INIT; /* opportunistically-updated references: */ struct ref *orefs = NULL, **oref_tail = &orefs; struct string_list existing_refs = STRING_LIST_INIT_DUP; - const struct ref 
*remote_refs; - - if (rs->nr) - refspec_ref_prefixes(rs, &ref_prefixes); - else if (transport->remote && transport->remote->fetch.nr) - refspec_ref_prefixes(&transport->remote->fetch, &ref_prefixes); - - if (ref_prefixes.argc && - (tags == TAGS_SET || (tags == TAGS_DEFAULT && !rs->nr))) { - argv_array_push(&ref_prefixes, "refs/tags/"); - } - - remote_refs = transport_get_remote_refs(transport, &ref_prefixes); - - argv_array_clear(&ref_prefixes); if (rs->nr) { struct refspec *fetch_refspec; @@ -403,7 +388,7 @@ static struct ref *get_ref_map(struct transport *transport, if (refmap.nr) fetch_refspec = &refmap; else - fetch_refspec = &transport->remote->fetch; + fetch_refspec = &remote->fetch; for (i = 0; i < fetch_refspec->nr; i++) get_fetch_map(ref_map, &fetch_refspec->items[i], &oref_tail, 1); @@ -411,7 +396,6 @@ static struct ref *get_ref_map(struct transport *transport, die("--refmap option is only meaningful with command-line refspec(s)."); } else { /* Use the defaults */ - struct remote *remote = transport->remote; struct branch *branch = branch_get(NULL); int has_merge = branch_has_merge_config(branch); if (remote && @@ -450,7 +434,7 @@ static struct ref *get_ref_map(struct transport *transport, /* also fetch all tags */ get_fetch_map(remote_refs, tag_refspec, &tail, 0); else if (tags == TAGS_DEFAULT && *autotags) - find_non_local_tags(transport, &ref_map, &tail); + find_non_local_tags(remote_refs, &ref_map, &tail); /* Now append any refs to be updated opportunistically: */ *tail = orefs; @@ -1137,6 +1121,8 @@ static int do_fetch(struct transport *transport, struct ref *ref_map; int autotags = (transport->remote->fetch_tags == 1); int retcode = 0; + const struct ref *remote_refs; + struct argv_array ref_prefixes = ARGV_ARRAY_INIT; if (tags == TAGS_DEFAULT) { if (transport->remote->fetch_tags == 2) @@ -1152,7 +1138,21 @@ static int do_fetch(struct transport *transport, goto cleanup; } - ref_map = get_ref_map(transport, rs, tags, &autotags); + if (rs->nr) + 
refspec_ref_prefixe
[PATCH v3 0/8] ref-in-want
Changes in v3:
 * Discussion seemed to settle on keeping the simplified version of
   ref-in-want where the "want-ref" line only accepts full ref names.
   If we want to we can add patterns at a later time.
 * Reverted back to v1's behavior where requesting a ref that doesn't
   exist is a hard error on the server.  I went back and forth many
   times on what the right thing to do here is and decided that a hard
   error works much cleaner for the time being.
 * Some typos.

Brandon Williams (8):
  test-pkt-line: add unpack-sideband subcommand
  upload-pack: implement ref-in-want
  upload-pack: test negotiation with changing repository
  fetch: refactor the population of peer ref OIDs
  fetch: refactor fetch_refs into two functions
  fetch: refactor to make function args narrower
  fetch-pack: put shallow info in output parameter
  fetch-pack: implement ref-in-want

 Documentation/config.txt                |   7 +
 Documentation/technical/protocol-v2.txt |  28 ++-
 builtin/clone.c                         |   4 +-
 builtin/fetch.c                         | 131 +++++++----
 fetch-object.c                          |   2 +-
 fetch-pack.c                            |  52 +++--
 remote.c                                |   1 +
 remote.h                                |   1 +
 t/helper/test-pkt-line.c                |  33 +++
 t/lib-httpd.sh                          |   1 +
 t/lib-httpd/apache.conf                 |   8 +
 t/lib-httpd/one-time-sed.sh             |  16 ++
 t/t5703-upload-pack-ref-in-want.sh      | 245 ++++++++++++++++++++
 transport-helper.c                      |   6 +-
 transport-internal.h                    |   9 +-
 transport.c                             |  34 +++-
 transport.h                             |   3 +-
 upload-pack.c                           |  64 ++++++
 18 files changed, 568 insertions(+), 77 deletions(-)
 create mode 100644 t/lib-httpd/one-time-sed.sh
 create mode 100755 t/t5703-upload-pack-ref-in-want.sh

--
2.18.0.rc1.244.gcf134e6275-goog
[PATCH v3 2/8] upload-pack: implement ref-in-want
Currently, while performing packfile negotiation, clients are only allowed to specify their desired objects using object ids. This causes a vulnerability to failure when an object turns non-existent during negotiation, which may happen if, for example, the desired repository is provided by multiple Git servers in a load-balancing arrangement and there exists replication delay. In order to eliminate this vulnerability, implement the ref-in-want feature for the 'fetch' command in protocol version 2. This feature enables the 'fetch' command to support requests in the form of ref names through a new "want-ref " parameter. At the conclusion of negotiation, the server will send a list of all of the wanted references (as provided by "want-ref" lines) in addition to the generated packfile. Signed-off-by: Brandon Williams --- Documentation/config.txt| 7 ++ Documentation/technical/protocol-v2.txt | 28 - t/t5703-upload-pack-ref-in-want.sh | 153 upload-pack.c | 64 ++ 4 files changed, 251 insertions(+), 1 deletion(-) create mode 100755 t/t5703-upload-pack-ref-in-want.sh diff --git a/Documentation/config.txt b/Documentation/config.txt index ab641bf5a..fb1dd7428 100644 --- a/Documentation/config.txt +++ b/Documentation/config.txt @@ -3479,6 +3479,13 @@ Note that this configuration variable is ignored if it is seen in the repository-level config (this is a safety measure against fetching from untrusted repositories). +uploadpack.allowRefInWant:: + If this option is set, `upload-pack` will support the `ref-in-want` + feature of the protocol version 2 `fetch` command. This feature + is intended for the benefit of load-balanced servers which may + not have the same view of what OIDs their refs point to due to + replication delay. + url..insteadOf:: Any URL that starts with this value will be rewritten to start, instead, with . 
In cases where some site serves a diff --git a/Documentation/technical/protocol-v2.txt b/Documentation/technical/protocol-v2.txt index 49bda76d2..9dee75d45 100644 --- a/Documentation/technical/protocol-v2.txt +++ b/Documentation/technical/protocol-v2.txt @@ -299,12 +299,21 @@ included in the client's request: for use with partial clone and partial fetch operations. See `rev-list` for possible "filter-spec" values. +If the 'ref-in-want' feature is advertised, the following argument can +be included in the client's request as well as the potential addition of +the 'wanted-refs' section in the server's response as explained below. + +want-ref + Indicates to the server that the client wants to retrieve a + particular ref, where is the full name of a ref on the + server. + The response of `fetch` is broken into a number of sections separated by delimiter packets (0001), with each section beginning with its section header. output = *section -section = (acknowledgments | shallow-info | packfile) +section = (acknowledgments | shallow-info | wanted-refs | packfile) (flush-pkt | delim-pkt) acknowledgments = PKT-LINE("acknowledgments" LF) @@ -319,6 +328,10 @@ header. shallow = "shallow" SP obj-id unshallow = "unshallow" SP obj-id +wanted-refs = PKT-LINE("wanted-refs" LF) + *PKT-LINE(wanted-ref LF) +wanted-ref = obj-id SP refname + packfile = PKT-LINE("packfile" LF) *PKT-LINE(%x01-03 *%x00-ff) @@ -379,6 +392,19 @@ header. * This section is only included if a packfile section is also included in the response. +wanted-refs section + * This section is only included if the client has requested a + ref using a 'want-ref' line and if a packfile section is also + included in the response. + + * Always begins with the section header "wanted-refs" + + * The server will send a ref listing (" ") for + each reference requested using 'want-ref' lines. + + * The server MUST NOT send any refs which were not requested + using 'want-ref' lines. 
+ packfile section * This section is only included if the client has sent 'want' lines in its request and either requested that no more diff --git a/t/t5703-upload-pack-ref-in-want.sh b/t/t5703-upload-pack-ref-in-want.sh new file mode 100755 index 0..0ef182970 --- /dev/null +++ b/t/t5703-upload-pack-ref-in-want.sh @@ -0,0 +1,153 @@ +#!/bin/sh + +test_description='upload-pack ref-in-want' + +. ./test-lib.sh + +get_actual_refs() { + sed -n '/wanted-refs/,/0001/p' actual_refs +} + +get_actual_commits() { + sed -n '/packfile/,//p' o.pack && + git index-pack o.pack && + git verify-pack -v o.idx | grep commit | cut -c-40 | sort >actual_commits +} + +check_output() { + get_actual_refs && + test_cmp expected_refs actual_refs && + get_actual_commits && +
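The `wanted-refs` section that the protocol-v2 documentation in this patch defines can be sketched as pkt-lines in the same framing as the rest of the protocol (4 hex digits of length including the prefix, then the payload). The 40-hex object id below is a made-up placeholder, not a real object:

```shell
# Sketch of a server's wanted-refs response section.
pkt_line () {
	# length = 4 (the prefix) + payload + 1 (trailing LF)
	printf '%04x%s\n' $(( ${#1} + 5 )) "$1"
}
oid=0123456789012345678901234567890123456789
{
	pkt_line "wanted-refs"                 # section header
	pkt_line "$oid refs/heads/master"      # wanted-ref: <oid> SP <refname>
	printf '0001'                          # delim-pkt ends the section
} >response.txt
cat response.txt
```

Per the spec above, only refs that the client asked for with `want-ref` lines may appear here, and the section is only sent when a packfile section follows.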
[PATCH v3 4/8] fetch: refactor the population of peer ref OIDs
Populate peer ref OIDs in get_ref_map instead of do_fetch. Besides tightening scopes of variables in the code, this also prepares for get_ref_map being able to be called multiple times within do_fetch. Signed-off-by: Brandon Williams --- builtin/fetch.c | 36 ++-- 1 file changed, 18 insertions(+), 18 deletions(-) diff --git a/builtin/fetch.c b/builtin/fetch.c index ea5b9669a..545635448 100644 --- a/builtin/fetch.c +++ b/builtin/fetch.c @@ -351,6 +351,7 @@ static struct ref *get_ref_map(struct transport *transport, /* opportunistically-updated references: */ struct ref *orefs = NULL, **oref_tail = &orefs; + struct string_list existing_refs = STRING_LIST_INIT_DUP; const struct ref *remote_refs; if (rs->nr) @@ -458,7 +459,23 @@ static struct ref *get_ref_map(struct transport *transport, tail = &rm->next; } - return ref_remove_duplicates(ref_map); + ref_map = ref_remove_duplicates(ref_map); + + for_each_ref(add_existing, &existing_refs); + for (rm = ref_map; rm; rm = rm->next) { + if (rm->peer_ref) { + struct string_list_item *peer_item = + string_list_lookup(&existing_refs, + rm->peer_ref->name); + if (peer_item) { + struct object_id *old_oid = peer_item->util; + oidcpy(&rm->peer_ref->old_oid, old_oid); + } + } + } + string_list_clear(&existing_refs, 1); + + return ref_map; } #define STORE_REF_ERROR_OTHER 1 @@ -1110,14 +1127,10 @@ static void backfill_tags(struct transport *transport, struct ref *ref_map) static int do_fetch(struct transport *transport, struct refspec *rs) { - struct string_list existing_refs = STRING_LIST_INIT_DUP; struct ref *ref_map; - struct ref *rm; int autotags = (transport->remote->fetch_tags == 1); int retcode = 0; - for_each_ref(add_existing, &existing_refs); - if (tags == TAGS_DEFAULT) { if (transport->remote->fetch_tags == 2) tags = TAGS_SET; @@ -1136,18 +1149,6 @@ static int do_fetch(struct transport *transport, if (!update_head_ok) check_not_current_branch(ref_map); - for (rm = ref_map; rm; rm = rm->next) { - if (rm->peer_ref) { - struct 
string_list_item *peer_item = - string_list_lookup(&existing_refs, - rm->peer_ref->name); - if (peer_item) { - struct object_id *old_oid = peer_item->util; - oidcpy(&rm->peer_ref->old_oid, old_oid); - } - } - } - if (tags == TAGS_DEFAULT && autotags) transport_set_option(transport, TRANS_OPT_FOLLOWTAGS, "1"); if (prune) { @@ -1183,7 +1184,6 @@ static int do_fetch(struct transport *transport, } cleanup: - string_list_clear(&existing_refs, 1); return retcode; } -- 2.18.0.rc1.244.gcf134e6275-goog
[PATCH v3 1/8] test-pkt-line: add unpack-sideband subcommand
Add an 'unpack-sideband' subcommand to the test-pkt-line helper to
enable unpacking packet line data sent multiplexed using a sideband.

Signed-off-by: Brandon Williams
---
 t/helper/test-pkt-line.c | 33 +++++++++++++++++++++++++++++++++
 1 file changed, 33 insertions(+)

diff --git a/t/helper/test-pkt-line.c b/t/helper/test-pkt-line.c
index 0f19e53c7..30775f986 100644
--- a/t/helper/test-pkt-line.c
+++ b/t/helper/test-pkt-line.c
@@ -1,3 +1,4 @@
+#include "cache.h"
 #include "pkt-line.h"

 static void pack_line(const char *line)
@@ -48,6 +49,36 @@ static void unpack(void)
	}
 }

+static void unpack_sideband(void)
+{
+	struct packet_reader reader;
+	packet_reader_init(&reader, 0, NULL, 0,
+			   PACKET_READ_GENTLE_ON_EOF |
+			   PACKET_READ_CHOMP_NEWLINE);
+
+	while (packet_reader_read(&reader) != PACKET_READ_EOF) {
+		int band;
+		int fd;
+
+		switch (reader.status) {
+		case PACKET_READ_EOF:
+			break;
+		case PACKET_READ_NORMAL:
+			band = reader.line[0] & 0xff;
+			if (band < 1 || band > 2)
+				die("unexpected side band %d", band);
+			fd = band;
+
+			write_or_die(fd, reader.line + 1, reader.pktlen - 1);
+			break;
+		case PACKET_READ_FLUSH:
+			return;
+		case PACKET_READ_DELIM:
+			break;
+		}
+	}
+}
+
 int cmd_main(int argc, const char **argv)
 {
	if (argc < 2)
@@ -57,6 +88,8 @@ int cmd_main(int argc, const char **argv)
		pack(argc - 2, argv + 2);
	else if (!strcmp(argv[1], "unpack"))
		unpack();
+	else if (!strcmp(argv[1], "unpack-sideband"))
+		unpack_sideband();
	else
		die("invalid argument '%s'", argv[1]);
--
2.18.0.rc1.244.gcf134e6275-goog
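For readers unfamiliar with sideband multiplexing: a sideband frame is an ordinary pkt-line whose first payload byte is the band number (1 = pack data, 2 = progress; the helper above rejects anything else). A rough shell analogue of what `unpack_sideband()` does, building one frame and stripping the length prefix and band byte; the payload text is arbitrary:

```shell
# Build one sideband frame and demultiplex it by hand.
payload="hello"
band=$(printf '\001')   # band 1: pack data
# pkt-line length = 4 (prefix) + 1 (band byte) + payload (no LF here)
frame=$(printf '%04x%s%s' $(( ${#payload} + 5 )) "$band" "$payload")
# Drop the 4 length characters and the band byte to recover the payload,
# as unpack_sideband() does with reader.line + 1:
printf '%s\n' "$frame" | cut -c6- >payload.txt
cat payload.txt
```

In the real helper, the band number also selects the output file descriptor (stdout for data, stderr for progress), which is what makes the subcommand useful for inspecting multiplexed server responses in tests.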
[PATCH v3 5/8] fetch: refactor fetch_refs into two functions
Refactor the fetch_refs function into a function that does the fetching
of refs and another function that stores them.

Signed-off-by: Brandon Williams
---
 builtin/fetch.c | 19 +++++++++++++------
 1 file changed, 13 insertions(+), 6 deletions(-)

diff --git a/builtin/fetch.c b/builtin/fetch.c
index 545635448..ee8b87c78 100644
--- a/builtin/fetch.c
+++ b/builtin/fetch.c
@@ -967,10 +967,16 @@ static int fetch_refs(struct transport *transport, struct ref *ref_map)
	int ret = quickfetch(ref_map);
	if (ret)
		ret = transport_fetch_refs(transport, ref_map);
-	if (!ret)
-		ret |= store_updated_refs(transport->url,
-					  transport->remote->name,
-					  ref_map);
+	if (ret)
+		transport_unlock_pack(transport);
+	return ret;
+}
+
+static int consume_refs(struct transport *transport, struct ref *ref_map)
+{
+	int ret = store_updated_refs(transport->url,
+				     transport->remote->name,
+				     ref_map);
	transport_unlock_pack(transport);
	return ret;
 }
@@ -1116,7 +1122,8 @@ static void backfill_tags(struct transport *transport, struct ref *ref_map)
	transport_set_option(transport, TRANS_OPT_FOLLOWTAGS, NULL);
	transport_set_option(transport, TRANS_OPT_DEPTH, "0");
	transport_set_option(transport, TRANS_OPT_DEEPEN_RELATIVE, NULL);
-	fetch_refs(transport, ref_map);
+	if (!fetch_refs(transport, ref_map))
+		consume_refs(transport, ref_map);

	if (gsecondary) {
		transport_disconnect(gsecondary);
@@ -1165,7 +1172,7 @@ static int do_fetch(struct transport *transport,
			    transport->url);
		}
	}
-	if (fetch_refs(transport, ref_map)) {
+	if (fetch_refs(transport, ref_map) || consume_refs(transport, ref_map)) {
		free_refs(ref_map);
		retcode = 1;
		goto cleanup;
--
2.18.0.rc1.244.gcf134e6275-goog
Re: [RFC PATCH 04/10] submodule--helper: add a new 'config' subcommand
On Mon, 14 May 2018 18:33:44 -0700 Stefan Beller wrote: > On Mon, May 14, 2018 at 3:58 AM, Antonio Ospite wrote: > > Add a new 'config' subcommand to 'submodule--helper', this extra level > > of indirection makes it possible to add some flexibility to how the > > submodules configuration is handled. > > > > Signed-off-by: Antonio Ospite > > --- > > builtin/submodule--helper.c | 39 + > > t/t7411-submodule-config.sh | 26 + > > 2 files changed, 65 insertions(+) > > > > diff --git a/builtin/submodule--helper.c b/builtin/submodule--helper.c > > index 9e8f2acd5..b32110e3b 100644 > > --- a/builtin/submodule--helper.c > > +++ b/builtin/submodule--helper.c > > @@ -1825,6 +1825,44 @@ static int is_active(int argc, const char **argv, > > const char *prefix) > > return !is_submodule_active(the_repository, argv[1]); > > } > > > > +static int config_print_callback(const char *key_, const char *value_, > > void *cb_data) > > +{ > > + char *key = cb_data; > > + > > + if (!strcmp(key, key_)) > > + printf("%s\n", value_); > > + > > + return 0; > > +} > > + > > +static int module_config(int argc, const char **argv, const char *prefix) > > +{ > > + int ret; > > + > > + if (argc < 2 || argc > 3) > > + die("submodule--helper config takes 1 or 2 arguments: name > > [value]"); > > + > > + /* Equivalent to ACTION_GET in builtin/config.c */ > > + if (argc == 2) { > > + char *key; > > + > > + ret = git_config_parse_key(argv[1], &key, NULL); > > + if (ret < 0) > > + return CONFIG_INVALID_KEY; > > + > > + config_from_gitmodules(config_print_callback, > > the_repository, key); > > + > > + free(key); > > + return 0; > > + } > > + > > + /* Equivalent to ACTION_SET in builtin/config.c */ > > + if (argc == 3) > > + return config_gitmodules_set(argv[1], argv[2]); > > Ah, here we definitely want to set it in the .gitmodules file? > (Or does that change later in this series?) 
> > > + > > + return 0; > > +} > > + > > #define SUPPORT_SUPER_PREFIX (1<<0) > > > > struct cmd_struct { > > @@ -1850,6 +1888,7 @@ static struct cmd_struct commands[] = { > > {"push-check", push_check, 0}, > > {"absorb-git-dirs", absorb_git_dirs, SUPPORT_SUPER_PREFIX}, > > {"is-active", is_active, 0}, > > + {"config", module_config, 0}, > > }; > > > > int cmd_submodule__helper(int argc, const char **argv, const char *prefix) > > diff --git a/t/t7411-submodule-config.sh b/t/t7411-submodule-config.sh > > index a648de6a9..dfe019f05 100755 > > --- a/t/t7411-submodule-config.sh > > +++ b/t/t7411-submodule-config.sh > > @@ -139,4 +139,30 @@ test_expect_success 'error in history in > > fetchrecursesubmodule lets continue' ' > > ) > > ' > > > > +test_expect_success 'reading submodules config with "submodule--helper > > config"' ' > > + (cd super && > > I think the project prefers a style > of the cd at the same level of the echo and the following commands. > There is mixed style about that, so for new tests in existing files I'd stick to the predominant style in the file. For new test files I'll use the recommended style of cd on the same level of the following commands. > However we might not need the (cd super && ...) via > > echo "../submodule" >expected > git -C super ubmodule--helper config submodule.submodule.url >../actual > test_cmp expected actual > > Our friends developing git on Windows will thank us for saving > to spawn a shell as spawning processes is expensive on Windows. :) > Also we have fewer lines of code. > I'll see how that looks, thanks for the suggestion. Again, I'd use a subshell if that's what the other tests in a file do, and use "git -C" in new files. Thanks, Antonio -- Antonio Ospite https://ao2.it https://twitter.com/ao2it A: Because it messes up the order in which people normally read text. See http://en.wikipedia.org/wiki/Posting_style Q: Why is top-posting such a bad thing?
[PATCH v3 3/8] upload-pack: test negotiation with changing repository
Add tests to check the behavior of fetching from a repository which changes between rounds of negotiation (for example, when different servers in a load-balancing agreement participate in the same stateless RPC negotiation). This forms a baseline of comparison to the ref-in-want functionality (which will be introduced to the client in subsequent commits), and ensures that subsequent commits do not change existing behavior. As part of this effort, a mechanism to substitute strings in a single HTTP response is added. Signed-off-by: Brandon Williams --- t/lib-httpd.sh | 1 + t/lib-httpd/apache.conf| 8 +++ t/lib-httpd/one-time-sed.sh| 16 ++ t/t5703-upload-pack-ref-in-want.sh | 92 ++ 4 files changed, 117 insertions(+) create mode 100644 t/lib-httpd/one-time-sed.sh diff --git a/t/lib-httpd.sh b/t/lib-httpd.sh index 435a37465..84f8efdd4 100644 --- a/t/lib-httpd.sh +++ b/t/lib-httpd.sh @@ -132,6 +132,7 @@ prepare_httpd() { cp "$TEST_PATH"/passwd "$HTTPD_ROOT_PATH" install_script broken-smart-http.sh install_script error.sh + install_script one-time-sed.sh ln -s "$LIB_HTTPD_MODULE_PATH" "$HTTPD_ROOT_PATH/modules" diff --git a/t/lib-httpd/apache.conf b/t/lib-httpd/apache.conf index 724d9ae46..fe68d37bb 100644 --- a/t/lib-httpd/apache.conf +++ b/t/lib-httpd/apache.conf @@ -111,9 +111,14 @@ Alias /auth/dumb/ www/auth/dumb/ SetEnv GIT_EXEC_PATH ${GIT_EXEC_PATH} SetEnv GIT_HTTP_EXPORT_ALL + + SetEnv GIT_EXEC_PATH ${GIT_EXEC_PATH} + SetEnv GIT_HTTP_EXPORT_ALL + ScriptAliasMatch /smart_*[^/]*/(.*) ${GIT_EXEC_PATH}/git-http-backend/$1 ScriptAlias /broken_smart/ broken-smart-http.sh/ ScriptAlias /error/ error.sh/ +ScriptAliasMatch /one_time_sed/(.*) one-time-sed.sh/$1 Options FollowSymlinks @@ -123,6 +128,9 @@ ScriptAlias /error/ error.sh/ Options ExecCGI + + Options ExecCGI + Options ExecCGI diff --git a/t/lib-httpd/one-time-sed.sh b/t/lib-httpd/one-time-sed.sh new file mode 100644 index 0..a9c4aa5f4 --- /dev/null +++ b/t/lib-httpd/one-time-sed.sh @@ -0,0 +1,16 @@ +#!/bin/sh + +if [ 
-e one-time-sed ]; then + "$GIT_EXEC_PATH/git-http-backend" >out + + sed "$(cat one-time-sed)" out_modified + + if diff out out_modified >/dev/null; then + cat out + else + cat out_modified + rm one-time-sed + fi +else + "$GIT_EXEC_PATH/git-http-backend" +fi diff --git a/t/t5703-upload-pack-ref-in-want.sh b/t/t5703-upload-pack-ref-in-want.sh index 0ef182970..979ab6d03 100755 --- a/t/t5703-upload-pack-ref-in-want.sh +++ b/t/t5703-upload-pack-ref-in-want.sh @@ -150,4 +150,96 @@ test_expect_success 'want-ref with ref we already have commit for' ' check_output ' +. "$TEST_DIRECTORY"/lib-httpd.sh +start_httpd + +REPO="$HTTPD_DOCUMENT_ROOT_PATH/repo" +LOCAL_PRISTINE="$(pwd)/local_pristine" + +test_expect_success 'setup repos for change-while-negotiating test' ' + ( + git init "$REPO" && + cd "$REPO" && + >.git/git-daemon-export-ok && + test_commit m1 && + git tag -d m1 && + + # Local repo with many commits (so that negotiation will take + # more than 1 request/response pair) + git clone "http://127.0.0.1:$LIB_HTTPD_PORT/smart/repo"; "$LOCAL_PRISTINE" && + cd "$LOCAL_PRISTINE" && + git checkout -b side && + for i in $(seq 1 33); do test_commit s$i; done && + + # Add novel commits to upstream + git checkout master && + cd "$REPO" && + test_commit m2 && + test_commit m3 && + git tag -d m2 m3 + ) && + git -C "$LOCAL_PRISTINE" remote set-url origin "http://127.0.0.1:$LIB_HTTPD_PORT/one_time_sed/repo"; && + git -C "$LOCAL_PRISTINE" config protocol.version 2 +' + +inconsistency() { + # Simulate that the server initially reports $2 as the ref + # corresponding to $1, and after that, $1 as the ref corresponding to + # $1. This corresponds to the real-life situation where the server's + # repository appears to change during negotiation, for example, when + # different servers in a load-balancing arrangement serve (stateless) + # RPCs during a single negotiation. 
+ printf "s/%s/%s/" \ + $(git -C "$REPO" rev-parse $1 | tr -d "\n") \ + $(git -C "$REPO" rev-parse $2 | tr -d "\n") \ + >"$HTTPD_ROOT_PATH/one-time-sed" +} + +test_expect_success 'server is initially ahead - no ref in want' ' + git -C "$REPO" config uploadpack.allowRefInWant false && + rm -rf local && + cp -r "$LOCAL_PRISTINE" local && + inconsistency master 1234567890123456789012345678901234567890 && + test_must_fail git -C local fetch 2>err &&
Re: [RFC PATCH 03/10] t7411: be nicer to other tests and really clean things up
On Mon, 14 May 2018 18:23:22 -0700 Stefan Beller wrote: > On Mon, May 14, 2018 at 3:58 AM, Antonio Ospite wrote: > > Tests 5 and 8 in t/t7411-submodule-config.sh add two commits with > > invalid lines in .gitmodules but then only the second commit is removed. > > > > This may affect subsequent tests if they assume that the .gitmodules > > file has no errors. > > > > Since those commits are not needed anymore remove both of them. > > > > Signed-off-by: Antonio Ospite > > --- > > > > I am putting these fixups to the test-suite before the patch that actually > > needs them so that the test-suite passes after each commit. > > > > t/t7411-submodule-config.sh | 2 +- > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > diff --git a/t/t7411-submodule-config.sh b/t/t7411-submodule-config.sh > > index 0bde5850a..a648de6a9 100755 > > --- a/t/t7411-submodule-config.sh > > +++ b/t/t7411-submodule-config.sh > > @@ -135,7 +135,7 @@ test_expect_success 'error in history in > > fetchrecursesubmodule lets continue' ' > > HEAD submodule \ > > >actual && > > test_cmp expect_error actual && > > - git reset --hard HEAD^ > > + git reset --hard HEAD~2 > > ) > > ' > > As this is the last test in this file, we do not change any subsequent > tests in a subtle way. > Good! > > This is > Reviewed-by: Stefan Beller > > FYI: > This test -- of course -- doesn't quite follow the latest coding guidelines, > as usually we'd prefer a test_when_finished "" > at the beginning of a test. I'll keep that in mind for new tests, trying to remember that 'test_when_finished' does not work in a subshell. BTW I can use 'test_when_finished' here as well, maybe adding a comment to clarify that the command also cleans up something from a previous test. Thanks, Antonio -- Antonio Ospite https://ao2.it https://twitter.com/ao2it A: Because it messes up the order in which people normally read text. See http://en.wikipedia.org/wiki/Posting_style Q: Why is top-posting such a bad thing?
RE: [PATCH v3 0/7] allocate cache entries from memory pool
> > > We debated several approaches for what to do here
> >
> > it would be awesome if the list could participate in the discussion even
> > if only read-only.

A bit of delay in my response here, but I like the suggestion. Here is a summary of some approaches I considered:

1) Do not include any information about where the cache_entry was allocated.

Pros: No extra memory overhead.
Cons: The caller is responsible for freeing the cache entry correctly.

This was our initial approach. We were hoping that all cache entries could be allocated from a memory pool, and we would not have to worry about non-pool allocated entries. There are still a couple of places that need "temporary" cache entries at the moment, so we couldn't move completely to only memory pool allocated cache_entries. This would have resulted in the code allocating many "temporary" cache_entries from a pool, and the memory would not be eligible to be reclaimed until the entire memory pool was freed - a tradeoff we didn't want to make.

2) Include an extra field encoding whether the cache_entry was allocated from a memory pool.

Pro: The discard function can now make a decision regarding how to free the cache_entry.
Con: Each cache entry grows by an extra int field. The bits in the existing ce_flags field are all claimed, so we need an extra field to track this bit of information. We could claim just a bit in the new field now, which would result in the cache_entry still growing by the same amount, but any future bits would not require extra space. This pushes the work for an actual bit field off into the future.

3) Encode whether the cache_entry was allocated from a memory pool as a bit in a field.

Pro: Only a single bit is required to encode this information.
Con: All the existing bits in the ce_flags field are claimed. We need to add an extra bit field and take the same memory growth.
I considered this approach (and am still open to it), but I went for a simpler initial approach to make sure the overall change is acceptable. There is no difference in the memory footprint with this change, so it is only to enable future changes more easily. 4) Include pointer to the memory pool from which this cache_entry was created from Pro: Could (potentially) do some extra bookkeeping, such as automatically cleaning up the memory_pool when all allocated cache_entries are freed. Con: extra complexity, larger growth to cache_entry struct to accommodate mem_pool pointer In the end, we didn't see a tangible benefit to this option at this point. Given the tradeoffs, I went with option #2 for now.
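Option #2 — the approach chosen above — can be sketched as follows. `struct entry` and the function names here are illustrative stand-ins, not the actual git definitions; the point is only that a single extra field lets one discard function decide between free() and "leave it to the pool":

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical sketch of option #2: an extra field records whether the
 * entry came from a memory pool, so the discard function can decide
 * how (or whether) to free it. */
struct entry {
	unsigned int mem_pool_allocated; /* the extra int field of option #2 */
	char name[32];
};

static struct entry *make_transient_entry(const char *name)
{
	struct entry *e = calloc(1, sizeof(*e));
	e->mem_pool_allocated = 0; /* heap-allocated: discard must free() */
	strncpy(e->name, name, sizeof(e->name) - 1);
	return e;
}

/* Returns 1 if the entry was individually freed, 0 if it is left for
 * its pool to reclaim in bulk. */
static int discard_entry(struct entry *e)
{
	if (e->mem_pool_allocated)
		return 0; /* reclaimed when the whole pool is discarded */
	free(e);
	return 1;
}
```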
[PATCH v4 2/8] block alloc: add lifecycle APIs for cache_entry structs
Add an API around managing the lifetime of cache_entry structs. Abstracting memory management details behind this API will allow for alternative memory management strategies without affecting all the call sites. This commit does not change how memory is allocated or freed. A later commit in this series will allocate cache entries from memory pools as appropriate. Motivation: It has been observed that the time spent loading an index with a large number of entries is partly dominated by malloc() calls. This change is in preparation for using memory pools to reduce the number of malloc() calls made when loading an index. This API makes a distinction between cache entries that are intended for use with a particular index and cache entries that are not. This enables us to use the knowledge about how a cache entry will be used to make informed decisions about how to handle the corresponding memory. Signed-off-by: Jameson Miller --- apply.c| 24 ++--- blame.c| 5 ++- builtin/checkout.c | 8 ++--- builtin/difftool.c | 6 ++-- builtin/reset.c| 2 +- builtin/update-index.c | 26 ++ cache.h| 24 - merge-recursive.c | 2 +- read-cache.c | 96 ++ resolve-undo.c | 4 ++- split-index.c | 8 ++--- tree.c | 4 +-- unpack-trees.c | 35 -- 13 files changed, 155 insertions(+), 89 deletions(-) diff --git a/apply.c b/apply.c index d79e61591b..1b5d923f4e 100644 --- a/apply.c +++ b/apply.c @@ -4090,12 +4090,12 @@ static int build_fake_ancestor(struct apply_state *state, struct patch *list) return error(_("sha1 information is lacking or useless " "(%s)."), name); - ce = make_cache_entry(patch->old_mode, oid.hash, name, 0, 0); + ce = make_cache_entry(&result, patch->old_mode, oid.hash, name, 0, 0); if (!ce) return error(_("make_cache_entry failed for path '%s'"), name); if (add_index_entry(&result, ce, ADD_CACHE_OK_TO_ADD)) { - free(ce); + discard_cache_entry(ce); return error(_("could not add %s to temporary index"), name); } @@ -4263,12 +4263,11 @@ static int add_index_file(struct apply_state *state, 
struct stat st; struct cache_entry *ce; int namelen = strlen(path); - unsigned ce_size = cache_entry_size(namelen); if (!state->update_index) return 0; - ce = xcalloc(1, ce_size); + ce = make_empty_cache_entry(&the_index, namelen); memcpy(ce->name, path, namelen); ce->ce_mode = create_ce_mode(mode); ce->ce_flags = create_ce_flags(0); @@ -4278,13 +4277,13 @@ static int add_index_file(struct apply_state *state, if (!skip_prefix(buf, "Subproject commit ", &s) || get_oid_hex(s, &ce->oid)) { - free(ce); - return error(_("corrupt patch for submodule %s"), path); + discard_cache_entry(ce); + return error(_("corrupt patch for submodule %s"), path); } } else { if (!state->cached) { if (lstat(path, &st) < 0) { - free(ce); + discard_cache_entry(ce); return error_errno(_("unable to stat newly " "created file '%s'"), path); @@ -4292,13 +4291,13 @@ static int add_index_file(struct apply_state *state, fill_stat_cache_info(ce, &st); } if (write_object_file(buf, size, blob_type, &ce->oid) < 0) { - free(ce); + discard_cache_entry(ce); return error(_("unable to create backing store " "for newly created file %s"), path); } } if (add_cache_entry(ce, ADD_CACHE_OK_TO_ADD) < 0) { - free(ce); + discard_cache_entry(ce); return error(_("unable to add cache entry for %s"), path); } @@ -4422,27 +4421,26 @@ static int add_conflicted_stages_file(struct apply_state *state, struct patch *patch) { int stage, namelen; - unsigned ce_size, mode; + unsigned mode; struct cache_entry *ce; if (!state->update_index) return 0; namelen = strlen(patch->new_name); - ce_size = cache_entry_size(namelen); mode = patch->new_
[PATCH v4 4/8] mem-pool: tweak math on mp_block allocation size
The amount of memory to allocate for mp_blocks was off by a sizeof(uintmax_t) on platforms that do not support flexible arrays. To account for this, use the offset of the space field when calculating the total amount of memory to allocate for a mp_block. Signed-off-by: Jameson Miller --- mem-pool.c | 6 -- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/mem-pool.c b/mem-pool.c index c80124f1fe..a5d5eed923 100644 --- a/mem-pool.c +++ b/mem-pool.c @@ -8,9 +8,11 @@ static struct mp_block *mem_pool_alloc_block(struct mem_pool *mem_pool, size_t block_alloc) { struct mp_block *p; + size_t total_alloc = st_add(offsetof(struct mp_block, space), block_alloc); + + mem_pool->pool_alloc = st_add(mem_pool->pool_alloc, total_alloc); + p = xmalloc(total_alloc); - mem_pool->pool_alloc += sizeof(struct mp_block) + block_alloc; - p = xmalloc(st_add(sizeof(struct mp_block), block_alloc)); p->next_block = mem_pool->mp_block; p->next_free = (char *)p->space; p->end = p->next_free + block_alloc; -- 2.14.3
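The off-by-sizeof(uintmax_t) issue is easiest to see by comparing `sizeof` with `offsetof` for the two struct layouts. The structs below are simplified stand-ins for mp_block (git's `st_add` is its overflow-checked addition helper; plain addition is used here for brevity):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* With a true C99 flexible array member, the trailing array
 * contributes nothing to sizeof. */
struct blk_flex {
	struct blk_flex *next;
	char *end;
	uintmax_t space[];   /* flexible array member */
};

/* On platforms without flexible arrays, git emulates them with a
 * one-element array - which *does* contribute to sizeof. Computing
 * sizeof(struct) + len then over-allocates by sizeof(uintmax_t),
 * while offsetof(struct, space) + len is exact in both layouts. */
struct blk_compat {
	struct blk_compat *next;
	char *end;
	uintmax_t space[1];  /* pre-C99 emulation */
};
```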
[PATCH v4 6/8] mem-pool: fill out functionality
Add functions for: - combining two memory pools - determining if a memory address is within the range managed by a memory pool These functions will be used by future commits. Signed-off-by: Jameson Miller --- mem-pool.c | 42 ++ mem-pool.h | 13 + 2 files changed, 55 insertions(+) diff --git a/mem-pool.c b/mem-pool.c index 4e544459e9..81fda969d3 100644 --- a/mem-pool.c +++ b/mem-pool.c @@ -94,3 +94,45 @@ void *mem_pool_calloc(struct mem_pool *mem_pool, size_t count, size_t size) memset(r, 0, len); return r; } + +int mem_pool_contains(struct mem_pool *mem_pool, void *mem) +{ + struct mp_block *p; + + /* Check if memory is allocated in a block */ + for (p = mem_pool->mp_block; p; p = p->next_block) + if ((mem >= ((void *)p->space)) && + (mem < ((void *)p->end))) + return 1; + + return 0; +} + +void mem_pool_combine(struct mem_pool *dst, struct mem_pool *src) +{ + struct mp_block *p; + + /* Append the blocks from src to dst */ + if (dst->mp_block && src->mp_block) { + /* +* src and dst have blocks, append +* blocks from src to dst. +*/ + p = dst->mp_block; + while (p->next_block) + p = p->next_block; + + p->next_block = src->mp_block; + } else if (src->mp_block) { + /* +* src has blocks, dst is empty. +*/ + dst->mp_block = src->mp_block; + } else { + /* src is empty, nothing to do. */ + } + + dst->pool_alloc += src->pool_alloc; + src->pool_alloc = 0; + src->mp_block = NULL; +} diff --git a/mem-pool.h b/mem-pool.h index f75b3365d5..adeefdcb28 100644 --- a/mem-pool.h +++ b/mem-pool.h @@ -41,4 +41,17 @@ void *mem_pool_alloc(struct mem_pool *pool, size_t len); */ void *mem_pool_calloc(struct mem_pool *pool, size_t count, size_t size); +/* + * Move the memory associated with the 'src' pool to the 'dst' pool. The 'src' + * pool will be empty and not contain any memory. It still needs to be free'd + * with a call to `mem_pool_discard`. 
+ */ +void mem_pool_combine(struct mem_pool *dst, struct mem_pool *src); + +/* + * Check if a memory pointed at by 'mem' is part of the range of + * memory managed by the specified mem_pool. + */ +int mem_pool_contains(struct mem_pool *mem_pool, void *mem); + #endif -- 2.14.3
[PATCH v4 7/8] block alloc: allocate cache entries from mem_pool
When reading large indexes from disk, a portion of the time is dominated by malloc() calls. This can be mitigated by allocating a large block of memory up front and managing it ourselves via memory pools. This change moves the cache entry allocation to be on top of memory pools.

Design: The index_state struct will gain a notion of an associated memory_pool from which cache_entries will be allocated. When reading in the index from disk, we have information on the number of entries and their size, which can guide us in deciding how large our initial memory allocation should be. When an index is discarded, the associated memory_pool will be discarded as well - so the lifetime of a cache_entry is tied to the lifetime of the index_state that it was allocated for.

In the case of a split index, the following rules are followed. First, some terminology:

- 'the_index': represents the logical view of the index
- 'split_index': represents the "base" cache entries, read from the split index file

'the_index' can reference a single split_index, as well as cache_entries from the split_index. `the_index` will be discarded before the `split_index` is. This means that when we are allocating cache_entries in the presence of a split index, we need to allocate the entries from the `split_index`'s memory pool. This allows us to follow the pattern that `the_index` can reference cache_entries from the `split_index`, and that the cache_entries will not be freed while they are still being referenced.
Signed-off-by: Jameson Miller --- cache.h| 21 ++ mem-pool.c | 3 +- read-cache.c | 119 - split-index.c | 50 unpack-trees.c | 13 +-- 5 files changed, 167 insertions(+), 39 deletions(-) diff --git a/cache.h b/cache.h index abcc27ff87..74d3ebebc2 100644 --- a/cache.h +++ b/cache.h @@ -15,6 +15,7 @@ #include "path.h" #include "sha1-array.h" #include "repository.h" +#include "mem-pool.h" #include typedef struct git_zstream { @@ -156,6 +157,7 @@ struct cache_entry { struct stat_data ce_stat_data; unsigned int ce_mode; unsigned int ce_flags; + unsigned int mem_pool_allocated; unsigned int ce_namelen; unsigned int index; /* for link extension */ struct object_id oid; @@ -227,6 +229,7 @@ static inline void copy_cache_entry(struct cache_entry *dst, const struct cache_entry *src) { unsigned int state = dst->ce_flags & CE_HASHED; + int mem_pool_allocated = dst->mem_pool_allocated; /* Don't copy hash chain and name */ memcpy(&dst->ce_stat_data, &src->ce_stat_data, @@ -235,6 +238,9 @@ static inline void copy_cache_entry(struct cache_entry *dst, /* Restore the hash state */ dst->ce_flags = (dst->ce_flags & ~CE_HASHED) | state; + + /* Restore the mem_pool_allocated flag */ + dst->mem_pool_allocated = mem_pool_allocated; } static inline unsigned create_ce_flags(unsigned stage) @@ -328,6 +334,7 @@ struct index_state { struct untracked_cache *untracked; uint64_t fsmonitor_last_update; struct ewah_bitmap *fsmonitor_dirty; + struct mem_pool *ce_mem_pool; }; extern struct index_state the_index; @@ -362,6 +369,20 @@ extern struct cache_entry *make_empty_transient_cache_entry(size_t name_len); */ void discard_cache_entry(struct cache_entry *ce); +/* + * Duplicate a cache_entry. Allocate memory for the new entry from a + * memory_pool. Takes into account cache_entry fields that are meant + * for managing the underlying memory allocation of the cache_entry. 
+ */ +struct cache_entry *dup_cache_entry(const struct cache_entry *ce, struct index_state *istate); + +/* + * Validate the cache entries in the index. This is an internal + * consistency check that the cache_entry structs are allocated from + * the expected memory pool. + */ +void validate_cache_entries(const struct index_state *istate); + #ifndef NO_THE_INDEX_COMPATIBILITY_MACROS #define active_cache (the_index.cache) #define active_nr (the_index.cache_nr) diff --git a/mem-pool.c b/mem-pool.c index 81fda969d3..0f19cc01a9 100644 --- a/mem-pool.c +++ b/mem-pool.c @@ -55,7 +55,8 @@ void mem_pool_discard(struct mem_pool *mem_pool) { struct mp_block *block, *block_to_free; - while ((block = mem_pool->mp_block)) + block = mem_pool->mp_block; + while (block) { block_to_free = block; block = block->next_block; diff --git a/read-cache.c b/read-cache.c index 6396e04e45..a8932ce2a6 100644 --- a/read-cache.c +++ b/read-cache.c @@ -46,6 +46,48 @@ CE_ENTRY_ADDED | CE_ENTRY_REMOVED | CE_ENTRY_CHANGED | \ SPLIT_INDEX_ORDERED | UNTRACKED_CHANGED | FSMONITOR_CHANGED) + +/* + * This is an estimate of the pathname length in the index. We use + * this for V4 in
[PATCH v4 8/8] block alloc: add validations around cache_entry lifecycle
Add an option (controlled by an environment variable) to perform extra validations on mem_pool allocated cache entries. When set:

1) Invalidate cache_entry memory when discarding cache_entry.
2) When discarding index_state struct, verify that all cache_entries were allocated from the expected mem_pool.
3) When discarding mem_pools, invalidate mem_pool memory.

This should provide extra checks that mem_pools and their allocated cache_entries are being used as expected. Signed-off-by: Jameson Miller --- cache.h | 6 ++ git.c| 3 +++ mem-pool.c | 6 +- mem-pool.h | 2 +- read-cache.c | 55 +-- 5 files changed, 68 insertions(+), 4 deletions(-) diff --git a/cache.h b/cache.h index 74d3ebebc2..c0d6b976f5 100644 --- a/cache.h +++ b/cache.h @@ -369,6 +369,12 @@ extern struct cache_entry *make_empty_transient_cache_entry(size_t name_len); */ void discard_cache_entry(struct cache_entry *ce); +/* + * Check configuration if we should perform extra validation on cache + * entries. + */ +int should_validate_cache_entries(void); + /* * Duplicate a cache_entry. Allocate memory for the new entry from a + * memory_pool.
Takes into account cache_entry fields that are meant diff --git a/git.c b/git.c index c2f48d53dd..010898ba6d 100644 --- a/git.c +++ b/git.c @@ -414,7 +414,10 @@ static int run_builtin(struct cmd_struct *p, int argc, const char **argv) trace_argv_printf(argv, "trace: built-in: git"); + validate_cache_entries(&the_index); status = p->fn(argc, argv, prefix); + validate_cache_entries(&the_index); + if (status) return status; diff --git a/mem-pool.c b/mem-pool.c index 0f19cc01a9..92d106a637 100644 --- a/mem-pool.c +++ b/mem-pool.c @@ -51,7 +51,7 @@ void mem_pool_init(struct mem_pool **mem_pool, size_t initial_size) *mem_pool = pool; } -void mem_pool_discard(struct mem_pool *mem_pool) +void mem_pool_discard(struct mem_pool *mem_pool, int invalidate_memory) { struct mp_block *block, *block_to_free; @@ -60,6 +60,10 @@ void mem_pool_discard(struct mem_pool *mem_pool) { block_to_free = block; block = block->next_block; + + if (invalidate_memory) + memset(block_to_free->space, 0xDD, ((char *)block_to_free->end) - ((char *)block_to_free->space)); + free(block_to_free); } diff --git a/mem-pool.h b/mem-pool.h index adeefdcb28..999d3c3a52 100644 --- a/mem-pool.h +++ b/mem-pool.h @@ -29,7 +29,7 @@ void mem_pool_init(struct mem_pool **mem_pool, size_t initial_size); /* * Discard a memory pool and free all the memory it is responsible for. */ -void mem_pool_discard(struct mem_pool *mem_pool); +void mem_pool_discard(struct mem_pool *mem_pool, int invalidate_memory); /* * Alloc memory from the mem_pool. diff --git a/read-cache.c b/read-cache.c index a8932ce2a6..2652f2aeb0 100644 --- a/read-cache.c +++ b/read-cache.c @@ -2047,8 +2047,10 @@ int discard_index(struct index_state *istate) * Cache entries in istate->cache[] should have been allocated * from the memory pool associated with this index, or from an * associated split_index. There is no need to free individual -* cache entries. +* cache entries. validate_cache_entries can detect when this +* assertion does not hold. 
*/ + validate_cache_entries(istate); resolve_undo_clear_index(istate); istate->cache_nr = 0; @@ -2065,13 +2067,45 @@ int discard_index(struct index_state *istate) istate->untracked = NULL; if (istate->ce_mem_pool) { - mem_pool_discard(istate->ce_mem_pool); + mem_pool_discard(istate->ce_mem_pool, should_validate_cache_entries()); istate->ce_mem_pool = NULL; } return 0; } +/* + * Validate the cache entries of this index. + * All cache entries associated with this index + * should have been allocated by the memory pool + * associated with this index, or by a referenced + * split index. + */ +void validate_cache_entries(const struct index_state *istate) +{ + int i; + + if (!should_validate_cache_entries() ||!istate || !istate->initialized) + return; + + for (i = 0; i < istate->cache_nr; i++) { + if (!istate) { + die("internal error: cache entry is not allocated from expected memory pool"); + } else if (!istate->ce_mem_pool || + !mem_pool_contains(istate->ce_mem_pool, istate->cache[i])) { + if (!istate->split_index || + !istate->split_index->base || + !istate->split_index->base->ce_mem_pool || + !mem_pool_contains(istate->split_index->base->ce_mem_pool, istate->cache[i])) { +
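The invalidation done in mem_pool_discard() above is a classic memory-poisoning technique: overwrite freed pool memory with a recognizable byte pattern (0xDD in the patch) so that any use-after-free reads obvious garbage instead of plausible stale data. A minimal sketch of the idea, with a stand-in function name:

```c
#include <assert.h>
#include <string.h>

/* Hedged sketch: when extra validation is enabled, poison the region
 * before freeing it; otherwise leave it untouched (the fast path). */
static void invalidate_region(void *mem, size_t len, int validate)
{
	if (validate)
		memset(mem, 0xDD, len); /* same poison byte as the patch */
}
```

Because the check is gated on an environment variable, the memset cost is paid only in debugging/CI runs, not in normal operation.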
[PATCH v4 1/8] read-cache: teach refresh_cache_entry() to take istate
Refactor refresh_cache_entry() to work on a specific index, instead of implicitly using the_index. This is in preparation for making the make_cache_entry function apply to a specific index. Signed-off-by: Jameson Miller --- cache.h | 2 +- merge-recursive.c | 2 +- read-cache.c | 9 + 3 files changed, 7 insertions(+), 6 deletions(-) diff --git a/cache.h b/cache.h index 89a107a7f7..9538511d9f 100644 --- a/cache.h +++ b/cache.h @@ -751,7 +751,7 @@ extern void fill_stat_cache_info(struct cache_entry *ce, struct stat *st); #define REFRESH_IGNORE_SUBMODULES 0x0010 /* ignore submodules */ #define REFRESH_IN_PORCELAIN 0x0020 /* user friendly output, not "needs update" */ extern int refresh_index(struct index_state *, unsigned int flags, const struct pathspec *pathspec, char *seen, const char *header_msg); -extern struct cache_entry *refresh_cache_entry(struct cache_entry *, unsigned int); +extern struct cache_entry *refresh_cache_entry(struct index_state *, struct cache_entry *, unsigned int); /* * Opportunistically update the index but do not complain if we can't. 
diff --git a/merge-recursive.c b/merge-recursive.c index f110e1c5ec..11a767cc72 100644 --- a/merge-recursive.c +++ b/merge-recursive.c @@ -323,7 +323,7 @@ static int add_cacheinfo(struct merge_options *o, if (refresh) { struct cache_entry *nce; - nce = refresh_cache_entry(ce, CE_MATCH_REFRESH | CE_MATCH_IGNORE_MISSING); + nce = refresh_cache_entry(&the_index, ce, CE_MATCH_REFRESH | CE_MATCH_IGNORE_MISSING); if (!nce) return err(o, _("add_cacheinfo failed to refresh for path '%s'; merge aborting."), path); if (nce != ce) diff --git a/read-cache.c b/read-cache.c index 372588260e..fa8366ecab 100644 --- a/read-cache.c +++ b/read-cache.c @@ -767,7 +767,7 @@ struct cache_entry *make_cache_entry(unsigned int mode, ce->ce_namelen = len; ce->ce_mode = create_ce_mode(mode); - ret = refresh_cache_entry(ce, refresh_options); + ret = refresh_cache_entry(&the_index, ce, refresh_options); if (ret != ce) free(ce); return ret; @@ -1473,10 +1473,11 @@ int refresh_index(struct index_state *istate, unsigned int flags, return has_errors; } -struct cache_entry *refresh_cache_entry(struct cache_entry *ce, - unsigned int options) +struct cache_entry *refresh_cache_entry(struct index_state *istate, + struct cache_entry *ce, + unsigned int options) { - return refresh_cache_ent(&the_index, ce, options, NULL, NULL); + return refresh_cache_ent(istate, ce, options, NULL, NULL); } -- 2.14.3
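The refactoring pattern in this patch — replacing an implicit dependency on the_index with an explicit index_state parameter, while existing call sites simply pass the_index — can be sketched like this (the types and names are simplified stand-ins, not git's actual definitions):

```c
#include <assert.h>

/* Illustrative stand-in for struct index_state. */
struct index_state_sketch { int id; };

static struct index_state_sketch the_index_sketch = { 1 };

/* After the refactoring: the function operates on whichever index it
 * is given, so callers can target an arbitrary index. */
static int refresh_in(struct index_state_sketch *istate)
{
	return istate->id;
}

/* Existing call sites preserve the old behavior by passing the
 * (formerly implicit) global index explicitly. */
static int refresh_default(void)
{
	return refresh_in(&the_index_sketch);
}
```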
[PATCH v4 0/8] Allocate cache entries from mem_pool
Changes from V3: Mainly changes from the last round of feedback:

- Rename make_index_cache_entry -> make_cache_entry
- Rename make_empty_index_cache_entry -> make_empty_cache_entry
- Remove tail pointer in mem_pool
- Small code tweaks
- More accurately calculate mp_block size for platforms that do not support flexible arrays

One thing that came up in my testing is that the current automated tests do not fully cover the code path of "large" allocations from a memory pool. I was able to force this condition by manually tweaking some variables and then running the automated tests, but this is not ideal for preventing regressions in the future. One way I can think of to test this is to add a test-helper and directly test the memory pool struct. This would allow me to control the parameters and exercise the different conditions. I was hoping for some guidance before I actually implemented these tests. Either way, I would like to do the additional tests in a separate patch series to have a more focused discussion. I am not sure whether these tests would prevent inclusion of this patch series - I am open to guidance here.
Base Ref: master
Web-Diff: https://github.com/jamill/git/compare/242ba98e44...667b8de06c

Jameson Miller (8):
  read-cache: teach refresh_cache_entry() to take istate
  block alloc: add lifecycle APIs for cache_entry structs
  mem-pool: only search head block for available space
  mem-pool: tweak math on mp_block allocation size
  mem-pool: add lifecycle management functions
  mem-pool: fill out functionality
  block alloc: allocate cache entries from mem_pool
  block alloc: add validations around cache_entry lifecycle

 apply.c | 24 +++--
 blame.c | 5 +-
 builtin/checkout.c | 8 +-
 builtin/difftool.c | 6 +-
 builtin/reset.c | 2 +-
 builtin/update-index.c | 26 +++--
 cache.h | 53 +-
 git.c | 3 +
 mem-pool.c | 124 +++
 mem-pool.h | 23 +
 merge-recursive.c | 4 +-
 read-cache.c | 259 -
 resolve-undo.c | 4 +-
 split-index.c | 58 ---
 tree.c | 4 +--
 unpack-trees.c | 40
 16 files changed, 504 insertions(+), 139 deletions(-)

base-commit: 242ba98e44d8314fb184d240939614a3c9b424db -- 2.14.3
[PATCH v4 5/8] mem-pool: add lifecycle management functions
Add initialization and discard functions to the mem_pool type. As the memory allocated by a mem_pool can now be freed, we also track the large memory allocations so they can be freed as well. If there are existing mp_blocks in the mem_pool's linked list of mp_blocks, then the mp_block for a large allocation is inserted after the head block. This is because only the head mp_block is considered when searching for available space. This results in the following desirable properties:

1) The mp_block allocated for the large request will not be included in the search for available space in future requests, as the large mp_block is sized for the specific request and does not contain any spare space.

2) The head mp_block will not be bumped from consideration for future memory requests just because a request for a large chunk of memory came in.

These changes are in preparation for a future commit that will utilize creating and discarding memory pools. Signed-off-by: Jameson Miller --- mem-pool.c | 63 ++ mem-pool.h | 10 ++ 2 files changed, 61 insertions(+), 12 deletions(-) diff --git a/mem-pool.c b/mem-pool.c index a5d5eed923..4e544459e9 100644 --- a/mem-pool.c +++ b/mem-pool.c @@ -5,7 +5,14 @@ #include "cache.h" #include "mem-pool.h" -static struct mp_block *mem_pool_alloc_block(struct mem_pool *mem_pool, size_t block_alloc) +#define BLOCK_GROWTH_SIZE 1024*1024 - sizeof(struct mp_block); +/* + * Allocate a new mp_block and insert it after the block specified in + * `insert_after`. If `insert_after` is NULL, then insert block at the + * head of the linked list.
+ */ +static struct mp_block *mem_pool_alloc_block(struct mem_pool *mem_pool, size_t block_alloc, struct mp_block *insert_after) { struct mp_block *p; size_t total_alloc = st_add(offsetof(struct mp_block, space), block_alloc); @@ -13,14 +20,51 @@ static struct mp_block *mem_pool_alloc_block(struct mem_pool *mem_pool, size_t b mem_pool->pool_alloc = st_add(mem_pool->pool_alloc, total_alloc); p = xmalloc(total_alloc); - p->next_block = mem_pool->mp_block; p->next_free = (char *)p->space; p->end = p->next_free + block_alloc; - mem_pool->mp_block = p; + + if (insert_after) { + p->next_block = insert_after->next_block; + insert_after->next_block = p; + } else { + p->next_block = mem_pool->mp_block; + mem_pool->mp_block = p; + } return p; } +void mem_pool_init(struct mem_pool **mem_pool, size_t initial_size) +{ + struct mem_pool *pool; + + if (*mem_pool) + return; + + pool = xcalloc(1, sizeof(*pool)); + + pool->block_alloc = BLOCK_GROWTH_SIZE; + + if (initial_size > 0) + mem_pool_alloc_block(pool, initial_size, NULL); + + *mem_pool = pool; +} + +void mem_pool_discard(struct mem_pool *mem_pool) +{ + struct mp_block *block, *block_to_free; + + while ((block = mem_pool->mp_block)) + { + block_to_free = block; + mem_pool->mp_block = block->next_block; + free(block_to_free); + } + + free(mem_pool); +} + void *mem_pool_alloc(struct mem_pool *mem_pool, size_t len) { struct mp_block *p = NULL; @@ -33,15 +77,10 @@ void *mem_pool_alloc(struct mem_pool *mem_pool, size_t len) if (mem_pool->mp_block && mem_pool->mp_block->end - mem_pool->mp_block->next_free >= len) p = mem_pool->mp_block; - - if (!p) { - if (len >= (mem_pool->block_alloc / 2)) { - mem_pool->pool_alloc += len; - return xmalloc(len); - } - - p = mem_pool_alloc_block(mem_pool, mem_pool->block_alloc); - } + else if (len >= (mem_pool->block_alloc / 2)) + p = mem_pool_alloc_block(mem_pool, len, mem_pool->mp_block); + else + p = mem_pool_alloc_block(mem_pool, mem_pool->block_alloc, NULL); r = p->next_free; p->next_free += len; diff 
--git a/mem-pool.h b/mem-pool.h index 829ad58ecf..f75b3365d5 100644 --- a/mem-pool.h +++ b/mem-pool.h @@ -21,6 +21,16 @@ struct mem_pool { size_t pool_alloc; }; +/* + * Initialize mem_pool with specified initial size. + */ +void mem_pool_init(struct mem_pool **mem_pool, size_t initial_size); + +/* + * Discard a memory pool and free all the memory it is responsible for. + */ +void mem_pool_discard(struct mem_pool *mem_pool); + /* * Alloc memory from the mem_pool. */ -- 2.14.3
[PATCH v4 3/8] mem-pool: only search head block for available space
Instead of searching all memory blocks for available space to fulfill a memory request, only search the head block. If the head block does not have space, assume that the previous blocks would most likely not be able to fulfill the request either. This could potentially lead to more memory fragmentation, but also avoids searching memory blocks that probably will not be able to fulfill the request. This pattern will benefit consumers that are able to generate a good estimate for how much memory will be needed, or if they are performing fixed-size allocations, so that once a block is exhausted it will never be able to fulfill a future request. Signed-off-by: Jameson Miller --- mem-pool.c | 8 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/mem-pool.c b/mem-pool.c index 389d7af447..c80124f1fe 100644 --- a/mem-pool.c +++ b/mem-pool.c @@ -21,16 +21,16 @@ static struct mp_block *mem_pool_alloc_block(struct mem_pool *mem_pool, size_t b void *mem_pool_alloc(struct mem_pool *mem_pool, size_t len) { - struct mp_block *p; + struct mp_block *p = NULL; void *r; /* round up to a 'uintmax_t' alignment */ if (len & (sizeof(uintmax_t) - 1)) len += sizeof(uintmax_t) - (len & (sizeof(uintmax_t) - 1)); - for (p = mem_pool->mp_block; p; p = p->next_block) - if (p->end - p->next_free >= len) - break; + if (mem_pool->mp_block && + mem_pool->mp_block->end - mem_pool->mp_block->next_free >= len) + p = mem_pool->mp_block; if (!p) { if (len >= (mem_pool->block_alloc / 2)) { -- 2.14.3
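[Editor's note] The `uintmax_t` round-up visible in the context lines above is a standard bit trick that relies on `sizeof(uintmax_t)` being a power of two (true on all platforms git supports). As a standalone sketch with a hypothetical helper name:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical standalone version of the round-up in mem_pool_alloc():
 * pad 'len' to the next multiple of sizeof(uintmax_t) so that every
 * pointer handed out by the pool stays suitably aligned.  The mask
 * trick requires the alignment to be a power of two.
 */
static size_t round_up_alignment(size_t len)
{
	if (len & (sizeof(uintmax_t) - 1))
		len += sizeof(uintmax_t) - (len & (sizeof(uintmax_t) - 1));
	return len;
}
```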
[PATCH v2 5/7] builtin/grep.c: add '--column' option to 'git-grep(1)'
Teach 'git-grep(1)' a new option, '--column', to show the column number of the first match on a non-context line. This makes it possible to teach 'contrib/git-jump/git-jump' how to seek to the first matching position of a grep match in your editor, and allows similar additional scripting capabilities. For example: $ git grep -n --column foo | head -n3 .clang-format:51:14:# myFunction(foo, bar, baz); .clang-format:64:7:# int foo(); .clang-format:75:8:# void foo() Signed-off-by: Taylor Blau --- Documentation/git-grep.txt | 6 ++- builtin/grep.c | 1 + t/t7810-grep.sh| 84 ++ 3 files changed, 90 insertions(+), 1 deletion(-) diff --git a/Documentation/git-grep.txt b/Documentation/git-grep.txt index 312409a607..31dc0392a6 100644 --- a/Documentation/git-grep.txt +++ b/Documentation/git-grep.txt @@ -13,7 +13,7 @@ SYNOPSIS [-v | --invert-match] [-h|-H] [--full-name] [-E | --extended-regexp] [-G | --basic-regexp] [-P | --perl-regexp] - [-F | --fixed-strings] [-n | --line-number] + [-F | --fixed-strings] [-n | --line-number] [--column] [-l | --files-with-matches] [-L | --files-without-match] [(-O | --open-files-in-pager) []] [-z | --null] @@ -169,6 +169,10 @@ providing this option will cause it to die. --line-number:: Prefix the line number to matching lines. +--column:: + Prefix the 1-indexed byte-offset of the first match from the start of the + matching line. 
+ -l:: --files-with-matches:: --name-only:: diff --git a/builtin/grep.c b/builtin/grep.c index ee753a403e..61bcaf6e58 100644 --- a/builtin/grep.c +++ b/builtin/grep.c @@ -828,6 +828,7 @@ int cmd_grep(int argc, const char **argv, const char *prefix) GREP_PATTERN_TYPE_PCRE), OPT_GROUP(""), OPT_BOOL('n', "line-number", &opt.linenum, N_("show line numbers")), + OPT_BOOL(0, "column", &opt.columnnum, N_("show column number of first match")), OPT_NEGBIT('h', NULL, &opt.pathname, N_("don't show filenames"), 1), OPT_BIT('H', NULL, &opt.pathname, N_("show filenames"), 1), OPT_NEGBIT(0, "full-name", &opt.relative, diff --git a/t/t7810-grep.sh b/t/t7810-grep.sh index 1797f632a3..bf0b572dab 100755 --- a/t/t7810-grep.sh +++ b/t/t7810-grep.sh @@ -99,6 +99,90 @@ do test_cmp expected actual ' + test_expect_success "grep -w $L (with --column)" ' + { + echo ${HC}file:5:foo mmap bar + echo ${HC}file:14:foo_mmap bar mmap + echo ${HC}file:5:foo mmap bar_mmap + echo ${HC}file:14:foo_mmap bar mmap baz + } >expected && + git grep --column -w -e mmap $H >actual && + test_cmp expected actual + ' + + test_expect_success "grep -w $L (with --column, extended)" ' + { + echo ${HC}file:14:foo_mmap bar mmap + echo ${HC}file:19:foo_mmap bar mmap baz + } >expected && + git grep --column -w -e mmap$ --or -e baz $H >actual && + test_cmp expected actual + ' + + test_expect_success "grep -w $L (with --column, --invert)" ' + { + echo ${HC}file:1:foo mmap bar + echo ${HC}file:1:foo_mmap bar + echo ${HC}file:1:foo_mmap bar mmap + echo ${HC}file:1:foo mmap bar_mmap + } >expected && + git grep --column --invert -w -e baz $H -- file >actual && + test_cmp expected actual + ' + + test_expect_success "grep $L (with --column, --invert, extended)" ' + { + echo ${HC}hello_world:6:HeLLo_world + } >expected && + git grep --column --invert -e ll --or --not -e _ $H -- hello_world \ + >actual && + test_cmp expected actual + ' + + test_expect_success "grep $L (with --column, double-negation)" ' + { + echo 
${HC}file:1:foo_mmap bar mmap baz + } >expected && + git grep --column --not \( --not -e foo --or --not -e baz \) $H -- file \ + >actual && + test_cmp expected actual + ' + + test_expect_success "grep -w $L (with --column, -C)" ' + { + echo ${HC}file:5:foo mmap bar + echo ${HC}file-foo_mmap bar + echo ${HC}file:14:foo_mmap bar mmap + echo ${HC}file:5:foo mmap bar_mmap + echo ${HC}file:14:foo_mmap bar mmap baz + } >expected && + git grep --column -w -C1 -e mmap $H >actual && +
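[Editor's note] The column that `--column` prints is simply the regex engine's match start offset plus one. A minimal sketch of that computation using POSIX regex (a hypothetical helper for illustration; git-grep's real code path also handles fixed strings, PCRE, and multiple patterns):

```c
#include <assert.h>
#include <regex.h>

/*
 * Hypothetical helper mirroring what --column prints: the 1-indexed
 * byte offset of the first match of 'pattern' (POSIX ERE) on 'line',
 * or 0 when the line does not match.
 */
static int first_match_column(const char *pattern, const char *line)
{
	regex_t re;
	regmatch_t m;
	int col = 0;

	if (regcomp(&re, pattern, REG_EXTENDED))
		return 0;
	if (!regexec(&re, line, 1, &m, 0))
		col = (int)m.rm_so + 1;	/* rm_so is 0-indexed */
	regfree(&re);
	return col;
}
```

This lines up with the t7810 expectations above: "mmap" starts at byte 5 of "foo mmap bar", hence `file:5:foo mmap bar`.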
[PATCH v2 2/7] grep.c: expose {,inverted} match column in match_line()
When calling match_line(), callers presently cannot determine the relative offset of the match because match_line() discards the 'regmatch_t' that contains this information. Instead, teach match_line() to take in two 'ssize_t's. Fill the first with the offset of the match produced by the given expression. If extended, fill the latter with the offset of the match produced as if --invert were given. For instance, matching "--not -e x" on this line produces a columnar offset of 0, (i.e., the whole line does not contain an x), but "--invert --not -e x" will fill the latter ssize_t with the column containing an "x", because this expression is semantically equivalent to "-e x". To determine the column for the inverted and non-inverted case, do the following: - If matching an atom, the non-inverted column is as given from match_one_pattern(), and the inverted column is unset. - If matching a --not, the inverted column and non-inverted column swap. - If matching an --and, or --or, the non-inverted column is the minimum of the two children, with the exception that --or is short-circuiting. For instance, if we match "-e a --or -e b" on a line that contains both "a" and "b" (and "b" comes first), the match column will hold the offset of "a", since we inspected the left child first, and short-circuited over the right child. This patch will become useful when we later pick between the two new results in order to display the column number of the first match on a line with --column. 
Co-authored-by: Jeff King Signed-off-by: Taylor Blau --- grep.c | 58 +++--- 1 file changed, 39 insertions(+), 19 deletions(-) diff --git a/grep.c b/grep.c index 45ec7e636c..dedfe17f93 100644 --- a/grep.c +++ b/grep.c @@ -1248,11 +1248,11 @@ static int match_one_pattern(struct grep_pat *p, char *bol, char *eol, return hit; } -static int match_expr_eval(struct grep_expr *x, char *bol, char *eol, - enum grep_context ctx, int collect_hits) +static int match_expr_eval(struct grep_opt *opt, struct grep_expr *x, char *bol, + char *eol, enum grep_context ctx, ssize_t *col, + ssize_t *icol, int collect_hits) { int h = 0; - regmatch_t match; if (!x) die("Not a valid grep expression"); @@ -1261,25 +1261,39 @@ static int match_expr_eval(struct grep_expr *x, char *bol, char *eol, h = 1; break; case GREP_NODE_ATOM: - h = match_one_pattern(x->u.atom, bol, eol, ctx, &match, 0); + { + regmatch_t tmp; + h = match_one_pattern(x->u.atom, bol, eol, ctx, + &tmp, 0); + if (h && (*col < 0 || tmp.rm_so < *col)) + *col = tmp.rm_so; + } break; case GREP_NODE_NOT: - h = !match_expr_eval(x->u.unary, bol, eol, ctx, 0); + /* +* Upon visiting a GREP_NODE_NOT, col and icol become swapped. 
+*/ + h = !match_expr_eval(opt, x->u.unary, bol, eol, ctx, icol, col, +0); break; case GREP_NODE_AND: - if (!match_expr_eval(x->u.binary.left, bol, eol, ctx, 0)) + if (!match_expr_eval(opt, x->u.binary.left, bol, eol, ctx, col, +icol, 0)) return 0; - h = match_expr_eval(x->u.binary.right, bol, eol, ctx, 0); + h = match_expr_eval(opt, x->u.binary.right, bol, eol, ctx, col, + icol, 0); break; case GREP_NODE_OR: if (!collect_hits) - return (match_expr_eval(x->u.binary.left, - bol, eol, ctx, 0) || - match_expr_eval(x->u.binary.right, - bol, eol, ctx, 0)); - h = match_expr_eval(x->u.binary.left, bol, eol, ctx, 0); + return (match_expr_eval(opt, x->u.binary.left, bol, eol, + ctx, col, icol, 0) || + match_expr_eval(opt, x->u.binary.right, bol, + eol, ctx, col, icol, 0)); + h = match_expr_eval(opt, x->u.binary.left, bol, eol, ctx, col, + icol, 0); x->u.binary.left->hit |= h; - h |= match_expr_eval(x->u.binary.right, bol, eol, ctx, 1); + h |= match_expr_eval(opt, x->u.binary.right, bol, eol, ctx, col, +icol, 1); break; default: die("Unexpected node type (internal error) %
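[Editor's note] The col/icol bookkeeping described in the commit message can be modeled with a toy expression evaluator. This is a hypothetical, greatly simplified sketch (no AND/TRUE nodes, no hit collection); note that it always visits both OR children, which corresponds to the no-short-circuit behavior the next patch in this series adds for --column:

```c
#include <assert.h>
#include <stddef.h>

/*
 * Toy model (hypothetical) of the col/icol bookkeeping: ATOM fills the
 * non-inverted column, NOT swaps the two slots for its subtree, and OR
 * visits both children so the earliest offset wins.
 */
enum node_type { NODE_ATOM, NODE_NOT, NODE_OR };

struct expr {
	enum node_type type;
	long atom_col;			/* ATOM: match offset, or -1 for no match */
	struct expr *left, *right;	/* NOT uses only 'left' */
};

static int eval(const struct expr *x, long *col, long *icol)
{
	int h = 0;

	switch (x->type) {
	case NODE_ATOM:
		h = x->atom_col >= 0;
		if (h && (*col < 0 || x->atom_col < *col))
			*col = x->atom_col;
		break;
	case NODE_NOT:
		/* inverting the subtree swaps which slot gets filled */
		h = !eval(x->left, icol, col);
		break;
	case NODE_OR:
		/* visit both children: the right one may match earlier */
		h = eval(x->left, col, icol);
		h |= eval(x->right, col, icol);
		break;
	}
	return h;
}
```

Negating a matching atom reports no hit, but the atom's offset lands in the inverted slot, which is exactly the value --invert later wants to display.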
[PATCH v2 1/7] Documentation/config.txt: camel-case lineNumber for consistency
lineNumber has casing that is inconsistent with surrounding options, like color.grep.matchContext, and color.grep.matchSelected. Re-case this documentation in order to be consistent with the text around it, and to ensure that new entries are consistent, too. Signed-off-by: Taylor Blau --- Documentation/config.txt | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/Documentation/config.txt b/Documentation/config.txt index ab641bf5a9..58fde4daea 100644 --- a/Documentation/config.txt +++ b/Documentation/config.txt @@ -1181,7 +1181,7 @@ color.grep.:: filename prefix (when not using `-h`) `function`;; function name lines (when using `-p`) -`linenumber`;; +`lineNumber`;; line number prefix (when using `-n`) `match`;; matching text (same as setting `matchContext` and `matchSelected`) -- 2.17.0.582.gccdcbd54c
[PATCH v2 6/7] grep.c: add configuration variables to show matched option
To support git-grep(1)'s new option, '--column', document and teach grep.c how to interpret relevant configuration options, similar to those associated with '--line-number'. Signed-off-by: Taylor Blau --- Documentation/config.txt | 5 + Documentation/git-grep.txt | 3 +++ grep.c | 6 ++ 3 files changed, 14 insertions(+) diff --git a/Documentation/config.txt b/Documentation/config.txt index 58fde4daea..e4cbed3078 100644 --- a/Documentation/config.txt +++ b/Documentation/config.txt @@ -1183,6 +1183,8 @@ color.grep.:: function name lines (when using `-p`) `lineNumber`;; line number prefix (when using `-n`) +`column`;; + column number prefix (when using `--column`) `match`;; matching text (same as setting `matchContext` and `matchSelected`) `matchContext`;; @@ -1797,6 +1799,9 @@ gitweb.snapshot:: grep.lineNumber:: If set to true, enable `-n` option by default. +grep.column:: + If set to true, enable the `--column` option by default. + grep.patternType:: Set the default matching behavior. Using a value of 'basic', 'extended', 'fixed', or 'perl' will enable the `--basic-regexp`, `--extended-regexp`, diff --git a/Documentation/git-grep.txt b/Documentation/git-grep.txt index 31dc0392a6..0de3493b80 100644 --- a/Documentation/git-grep.txt +++ b/Documentation/git-grep.txt @@ -44,6 +44,9 @@ CONFIGURATION grep.lineNumber:: If set to true, enable `-n` option by default. +grep.column:: + If set to true, enable the `--column` option by default. + grep.patternType:: Set the default matching behavior. 
Using a value of 'basic', 'extended', 'fixed', or 'perl' will enable the `--basic-regexp`, `--extended-regexp`, diff --git a/grep.c b/grep.c index d353d5d976..08d3df2855 100644 --- a/grep.c +++ b/grep.c @@ -96,6 +96,10 @@ int grep_config(const char *var, const char *value, void *cb) opt->linenum = git_config_bool(var, value); return 0; } + if (!strcmp(var, "grep.column")) { + opt->columnnum = git_config_bool(var, value); + return 0; + } if (!strcmp(var, "grep.fullname")) { opt->relative = !git_config_bool(var, value); @@ -112,6 +116,8 @@ int grep_config(const char *var, const char *value, void *cb) color = opt->color_function; else if (!strcmp(var, "color.grep.linenumber")) color = opt->color_lineno; + else if (!strcmp(var, "color.grep.column")) + color = opt->color_columnno; else if (!strcmp(var, "color.grep.matchcontext")) color = opt->color_match_context; else if (!strcmp(var, "color.grep.matchselected")) -- 2.17.0.582.gccdcbd54c
[PATCH v2 3/7] grep.[ch]: extend grep_opt to allow showing matched column
To support showing the matched column when calling 'git-grep(1)', teach 'grep_opt' the normal set of options to configure the default behavior and colorization of this feature. Now that we have opt->columnnum, use it to disable short-circuiting over ORs so that col and icol are always filled with the earliest matches on each line. In addition, don't return the first match from match_line(), for the same reason. Signed-off-by: Taylor Blau --- grep.c | 33 +++-- grep.h | 2 ++ 2 files changed, 29 insertions(+), 6 deletions(-) diff --git a/grep.c b/grep.c index dedfe17f93..d56d4a3a37 100644 --- a/grep.c +++ b/grep.c @@ -46,6 +46,7 @@ void init_grep_defaults(void) color_set(opt->color_filename, ""); color_set(opt->color_function, ""); color_set(opt->color_lineno, ""); + color_set(opt->color_columnno, ""); color_set(opt->color_match_context, GIT_COLOR_BOLD_RED); color_set(opt->color_match_selected, GIT_COLOR_BOLD_RED); color_set(opt->color_selected, ""); @@ -155,6 +156,7 @@ void grep_init(struct grep_opt *opt, const char *prefix) opt->extended_regexp_option = def->extended_regexp_option; opt->pattern_type_option = def->pattern_type_option; opt->linenum = def->linenum; + opt->columnnum = def->columnnum; opt->max_depth = def->max_depth; opt->pathname = def->pathname; opt->relative = def->relative; @@ -164,6 +166,7 @@ void grep_init(struct grep_opt *opt, const char *prefix) color_set(opt->color_filename, def->color_filename); color_set(opt->color_function, def->color_function); color_set(opt->color_lineno, def->color_lineno); + color_set(opt->color_columnno, def->color_columnno); color_set(opt->color_match_context, def->color_match_context); color_set(opt->color_match_selected, def->color_match_selected); color_set(opt->color_selected, def->color_selected); @@ -1284,16 +1287,23 @@ static int match_expr_eval(struct grep_opt *opt, struct grep_expr *x, char *bol, icol, 0); break; case GREP_NODE_OR: - if (!collect_hits) + if (!(collect_hits || opt->columnnum)) { + /* +* Don't 
short-circuit OR when given --column (or +* collecting hits) to ensure we don't skip a later +* child that would produce an earlier match. +*/ return (match_expr_eval(opt, x->u.binary.left, bol, eol, ctx, col, icol, 0) || match_expr_eval(opt, x->u.binary.right, bol, eol, ctx, col, icol, 0)); + } h = match_expr_eval(opt, x->u.binary.left, bol, eol, ctx, col, icol, 0); - x->u.binary.left->hit |= h; + if (collect_hits) + x->u.binary.left->hit |= h; h |= match_expr_eval(opt, x->u.binary.right, bol, eol, ctx, col, -icol, 1); +icol, collect_hits); break; default: die("Unexpected node type (internal error) %d", x->node); @@ -1316,6 +1326,7 @@ static int match_line(struct grep_opt *opt, char *bol, char *eol, enum grep_context ctx, int collect_hits) { struct grep_pat *p; + int hit = 0; if (opt->extended) return match_expr(opt, bol, eol, ctx, col, icol, @@ -1325,11 +1336,21 @@ static int match_line(struct grep_opt *opt, char *bol, char *eol, for (p = opt->pattern_list; p; p = p->next) { regmatch_t tmp; if (match_one_pattern(p, bol, eol, ctx, &tmp, 0)) { - *col = tmp.rm_so; - return 1; + hit |= 1; + if (!opt->columnnum) { + /* +* Without --column, any single match on a line +* is enough to know that it needs to be +* printed. With --column, scan _all_ patterns +* to find the earliest. +*/ + break; + } + if (*col < 0 || tmp.rm_so < *col) + *col = tmp.rm_so; } } - return 0; + return hit; } static int match_next_pattern(struct grep_pat *p, char *bol, char *eol, diff --git a/grep.h b/grep.h index 399381c908..08a0b391c5 100644 --- a/grep.h +++ b/grep.h @@ -127,6 +127,7 @@ struct grep_opt { int prefix_length; regex_t regexp; int linenum; +
[PATCH v2 4/7] grep.c: display column number of first match
To prepare for 'git grep' learning '--column', teach grep.c's show_line() how to show the column of the first match on non-context lines. Signed-off-by: Taylor Blau --- grep.c | 33 - 1 file changed, 28 insertions(+), 5 deletions(-) diff --git a/grep.c b/grep.c index d56d4a3a37..d353d5d976 100644 --- a/grep.c +++ b/grep.c @@ -1399,7 +1399,7 @@ static int next_match(struct grep_opt *opt, char *bol, char *eol, } static void show_line(struct grep_opt *opt, char *bol, char *eol, - const char *name, unsigned lno, char sign) + const char *name, unsigned lno, ssize_t cno, char sign) { int rest = eol - bol; const char *match_color, *line_color = NULL; @@ -1434,6 +1434,17 @@ static void show_line(struct grep_opt *opt, char *bol, char *eol, output_color(opt, buf, strlen(buf), opt->color_lineno); output_sep(opt, sign); } + /* +* Treat 'cno' as the 1-indexed offset from the start of a non-context +* line to its first match. Otherwise, 'cno' is 0 indicating that we are +* being called with a context line. 
+*/ + if (opt->columnnum && cno) { + char buf[32]; + xsnprintf(buf, sizeof(buf), "%zu", cno); + output_color(opt, buf, strlen(buf), opt->color_columnno); + output_sep(opt, sign); + } if (opt->color) { regmatch_t match; enum grep_context ctx = GREP_CONTEXT_BODY; @@ -1539,7 +1550,7 @@ static void show_funcname_line(struct grep_opt *opt, struct grep_source *gs, break; if (match_funcname(opt, gs, bol, eol)) { - show_line(opt, bol, eol, gs->name, lno, '='); + show_line(opt, bol, eol, gs->name, lno, 0, '='); break; } } @@ -1604,7 +1615,7 @@ static void show_pre_context(struct grep_opt *opt, struct grep_source *gs, while (*eol != '\n') eol++; - show_line(opt, bol, eol, gs->name, cur, sign); + show_line(opt, bol, eol, gs->name, cur, 0, sign); bol = eol + 1; cur++; } @@ -1803,6 +1814,7 @@ static int grep_source_1(struct grep_opt *opt, struct grep_source *gs, int colle while (left) { char *eol, ch; int hit; + ssize_t cno; ssize_t col = -1, icol = -1; /* @@ -1868,7 +1880,18 @@ static int grep_source_1(struct grep_opt *opt, struct grep_source *gs, int colle show_pre_context(opt, gs, bol, eol, lno); else if (opt->funcname) show_funcname_line(opt, gs, bol, lno); - show_line(opt, bol, eol, gs->name, lno, ':'); + cno = opt->invert ? icol : col; + if (cno < 0) { + /* +* A negative cno indicates that there was no +* match on the line. We are thus inverted and +* being asked to show all lines that _don't_ +* match a given expression. Therefore, set cno +* to 0 to suggest the whole line matches. +*/ + cno = 0; + } + show_line(opt, bol, eol, gs->name, lno, cno + 1, ':'); last_hit = lno; if (opt->funcbody) show_function = 1; @@ -1897,7 +1920,7 @@ static int grep_source_1(struct grep_opt *opt, struct grep_source *gs, int colle /* If the last hit is within the post context, * we need to show this line. */ - show_line(opt, bol, eol, gs->name, lno, '-'); + show_line(opt, bol, eol, gs->name, lno, col + 1, '-'); } next_line: -- 2.17.0.582.gccdcbd54c
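[Editor's note] The output convention in show_line() above can be condensed into a hypothetical helper: `cno == 0` marks a context line (no column field), otherwise `cno` is the 1-indexed column, and `sign` is ':' for matching lines versus '-' for context lines. This sketch only models the prefix formatting, not the colorization:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical condensation of the prefix logic in show_line(): print
 * "name<sign>lno<sign>" and, for a matching line with --column, append
 * "cno<sign>".  cno == 0 means a context line, so no column is shown.
 */
static void format_prefix(char *buf, size_t n, const char *name,
			  unsigned lno, size_t cno, char sign)
{
	int len = snprintf(buf, n, "%s%c%u%c", name, sign, lno, sign);

	if (cno && len >= 0 && (size_t)len < n)
		snprintf(buf + len, n - len, "%zu%c", cno, sign);
}
```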
[PATCH v2 7/7] contrib/git-jump/git-jump: jump to exact location
Take advantage of 'git-grep(1)''s new option, '--column' in order to teach Peff's 'git-jump' script how to jump to the correct column for any given match. 'git-grep(1)''s output is in the correct format for Vim's jump list, so no additional cleanup is necessary. Signed-off-by: Taylor Blau --- contrib/git-jump/README | 12 ++-- contrib/git-jump/git-jump | 2 +- 2 files changed, 11 insertions(+), 3 deletions(-) diff --git a/contrib/git-jump/README b/contrib/git-jump/README index 4484bda410..2f618a7f97 100644 --- a/contrib/git-jump/README +++ b/contrib/git-jump/README @@ -25,6 +25,13 @@ git-jump will feed this to the editor: foo.c:2: printf("hello word!\n"); --- +Or, when running 'git jump grep', column numbers will also be emitted, +e.g. `git jump grep "hello"` would return: + +--- +foo.c:2:9: printf("hello word!\n"); +--- + Obviously this trivial case isn't that interesting; you could just open `foo.c` yourself. But when you have many changes scattered across a project, you can use the editor's support to "jump" from point to point. @@ -35,7 +42,8 @@ Git-jump can generate four types of interesting lists: 2. The beginning of any merge conflict markers. - 3. Any grep matches. + 3. Any grep matches, including the column of the first match on a + line. 4. Any whitespace errors detected by `git diff --check`. @@ -82,7 +90,7 @@ which does something similar to `git jump grep`. However, it is limited to positioning the cursor to the correct line in only the first file, leaving you to locate subsequent hits in that file or other files using the editor or pager. By contrast, git-jump provides the editor with a -complete list of files and line numbers for each match. +complete list of files, lines, and a column number for each match. 
Limitations diff --git a/contrib/git-jump/git-jump b/contrib/git-jump/git-jump index 80ab0590bc..931b0fe3a9 100755 --- a/contrib/git-jump/git-jump +++ b/contrib/git-jump/git-jump @@ -52,7 +52,7 @@ mode_merge() { # editor shows them to us in the status bar. mode_grep() { cmd=$(git config jump.grepCmd) - test -n "$cmd" || cmd="git grep -n" + test -n "$cmd" || cmd="git grep -n --column" $cmd "$@" | perl -pe ' s/[ \t]+/ /g; -- 2.17.0.582.gccdcbd54c
[PATCH v2 0/7] grep.c: teach --column to 'git-grep(1)'
Hi, Here is a re-roll of my series to add --column to 'git-grep(1)'. Since last time, not much has changed other than the following: - Fix a typo where 'col', 'icol' were spelled as 'match', 'imatch' [1]. - Disable short-circuiting OR when --column is given [2]. - Disable early-return in match_line() when multiple patterns are given and --column is, too [3]. - Add some more tests in t7810. Thanks again for your kind and thorough review; hopefully this re-roll should be OK to queue as-is. Thanks, Taylor [1]: https://public-inbox.org/git/xmqqwouuvi0e@gitster-ct.c.googlers.com/ [2]: https://public-inbox.org/git/20180619174452.ga47...@syl.attlocal.net/ [3]: https://public-inbox.org/git/80b9a0b1-3849-7097-fe1a-dd80835d6...@web.de/ Taylor Blau (7): Documentation/config.txt: camel-case lineNumber for consistency grep.c: expose {,inverted} match column in match_line() grep.[ch]: extend grep_opt to allow showing matched column grep.c: display column number of first match builtin/grep.c: add '--column' option to 'git-grep(1)' grep.c: add configuration variables to show matched option contrib/git-jump/git-jump: jump to exact location Documentation/config.txt | 7 ++- Documentation/git-grep.txt | 9 ++- builtin/grep.c | 1 + contrib/git-jump/README| 12 +++- contrib/git-jump/git-jump | 2 +- grep.c | 126 - grep.h | 2 + t/t7810-grep.sh| 84 + 8 files changed, 210 insertions(+), 33 deletions(-) Inter-diff (since v1): diff --git a/grep.c b/grep.c index 8ffa94c791..08d3df2855 100644 --- a/grep.c +++ b/grep.c @@ -1257,8 +1257,8 @@ static int match_one_pattern(struct grep_pat *p, char *bol, char *eol, return hit; } -static int match_expr_eval(struct grep_expr *x, char *bol, char *eol, - enum grep_context ctx, ssize_t *col, +static int match_expr_eval(struct grep_opt *opt, struct grep_expr *x, char *bol, + char *eol, enum grep_context ctx, ssize_t *col, ssize_t *icol, int collect_hits) { int h = 0; @@ -1280,29 +1280,36 @@ static int match_expr_eval(struct grep_expr *x, char *bol, char *eol, 
break; case GREP_NODE_NOT: /* -* Upon visiting a GREP_NODE_NOT, imatch and match become -* swapped. +* Upon visiting a GREP_NODE_NOT, col and icol become swapped. */ - h = !match_expr_eval(x->u.unary, bol, eol, ctx, icol, col, 0); + h = !match_expr_eval(opt, x->u.unary, bol, eol, ctx, icol, col, +0); break; case GREP_NODE_AND: - if (!match_expr_eval(x->u.binary.left, bol, eol, ctx, col, + if (!match_expr_eval(opt, x->u.binary.left, bol, eol, ctx, col, icol, 0)) return 0; - h = match_expr_eval(x->u.binary.right, bol, eol, ctx, col, + h = match_expr_eval(opt, x->u.binary.right, bol, eol, ctx, col, icol, 0); break; case GREP_NODE_OR: - if (!collect_hits) - return (match_expr_eval(x->u.binary.left, bol, eol, ctx, - col, icol, 0) || - match_expr_eval(x->u.binary.right, bol, eol, - ctx, col, icol, 0)); - h = match_expr_eval(x->u.binary.left, bol, eol, ctx, col, + if (!(collect_hits || opt->columnnum)) { + /* +* Don't short-circuit OR when given --column (or +* collecting hits) to ensure we don't skip a later +* child that would produce an earlier match. +*/ + return (match_expr_eval(opt, x->u.binary.left, bol, eol, + ctx, col, icol, 0) || + match_expr_eval(opt, x->u.binary.right, bol, + eol, ctx, col, icol, 0)); + } + h = match_expr_eval(opt, x->u.binary.left, bol, eol, ctx, col, icol, 0); - x->u.binary.left->hit |= h; - h |= match_expr_eval(x->u.binary.right, bol, eol, ctx, col, -icol, 1); + if (collect_hits) + x->u.binary.left->hit |= h; + h |= match_expr_eval(opt, x->u.binary.right, bol, eol, ctx, col, +icol, collect_hits); break; default: die("Unexpected node type (internal error) %d", x->node); @@ -1317
Re: [RFC PATCH 01/10] config: make config_from_gitmodules generally useful
Hi Antonio! On Wed, Jun 20, 2018 at 11:06 AM Antonio Ospite wrote: > I get that the _content_ of .gitmodules is not meant to be generic > config, but I still see some value in having a single point when its > _location_ is decided. I agree that a single point for deciding the _location_ as well as the parsing _order_ is useful (in case there will be multiple files; as of now we have the layering of .gitmodules overlayed by .git/config; IIRC I explained in our conversation that having an intermediate layer would be useful; see one of the latest bug reports[1], where an intermediate layer outside a single branch would prove to be useful). [1] https://public-inbox.org/git/db6pr0101mb2344d682511891e4e9528598d9...@db6pr0101mb2344.eurprd01.prod.exchangelabs.com/ Sorry for not explaining my point of view well enough, let me try again: Historically we did not store any config in the repository itself. There are some exceptions, but these configurations are content related, i.e. .gitmodules or .gitattributes can tell you metadata about the content itself. However other config was kept out: One could have a .gitconfig that pre-populates the .git/config file, right? That was intentionally avoided as there are many configurations that are easy to abuse security-wise, e.g. setting core.pager = "rm -rf /" And then submodules entered the big picture. Submodules need quite a lot of configuration, and hence the .gitmodules file was born. Initially IIRC there was only a very simple config like url store: [submodule.] url = and that was it as it was just like the .gitignore and .gitattributes related to the content and transporting this configuration with the content itself seemed so natural. However then a lot of settings were invented for submodules and some made it into the .gitmodules file; only recently there was a discussion on list how these settings may or may not pose a security issue. 
It turns out we had a CVE (with submodule names) and that got fixed but we really want to keep the .gitmodules file simple and ignore any additional things in there as they may pose security issues later. With that said, how would you write the code while keeping this in mind? If you look at builtin/submodule--helper.c and builtin/fetch.c, you'll see that they use config_from_gitmodules(their_parse_function); git_config(their_parse_function); and config_from_gitmodules is documented to not be expanded; the config_from_gitmodules is singled out to treat those settings that snuck in and are kept around as we don't want to break existing users. But we'd really not want to expand the use of that function again for its implications on security. Right now it is nailed down beautifully and it is easy to tell which commands actually look at config from the .gitmodules file (not to be confused with the submodule parsing, that is in submodule-config.{h, c}). That is actually about finding submodule specific name/path/url etc instead of the generic "submodule.fetchjobs" or "fetch.recursesubmodules". So far about the background story. I would rather replicate the code in repo_read_gitmodules in the submodule-config.c as to mix those two lines (reading generic config vs. reading submodule specific config, like name/url/path). And to not mix them, I would not reuse that function but rather replicate (or have a static helper) in submodule helper, as then we cannot pass in an arbitrary config parsing callback to that, but are bound to the submodule helper code. > > I think extending 'repo_read_gitmodules' makes sense, as that is > > used everywhere for the loading of submodule configuration. > > I would follow Brandon's suggestion here and still use > 'config_from_gitmodules' from 'repo_read_gitmodules' to avoid code > duplication. > > I will try to be clear in the comments and in commit message when > motivating the decision. 
Rereading what Brandon said, I agree with this approach, sorry for writing out the story above in such lengthy detail. Thanks for picking up this series again! Stefan
Re: [RFC PATCH 02/10] submodule: factor out a config_gitmodules_set function
On Mon, 14 May 2018 18:20:21 -0700 Stefan Beller wrote:

> On Mon, May 14, 2018 at 3:58 AM, Antonio Ospite wrote:
> > Introduce a new config_gitmodules_set function to write config values to the
> > .gitmodules file.
> >
> > This is in preparation for a future change which will use the function
> > to write to the .gitmodules file in a more controlled way instead of
> > using "git config -f .gitmodules".
> >
> > Signed-off-by: Antonio Ospite
> > ---
> >
> > Not sure about the name, and maybe it can go in config.c for symmetry with
> > config_from_gitmodules?
>
> What is the function about (in the end state and now)?
> Is it more of a
> * configure_submodules_config()
>   which would convey it is a generic function to configure submodules
>   (i.e. it may not even write to *the* .gitmodules file but somewhere else,
>   such as a helper ref)
>   This doesn't sound like it as we make use of the function in
>   update_path_in_gitmodules() that is used from git-mv, which would want
>   to ensure that the specific .gitmodules file is changed.
> * gitmodules_file_set()
>   that focuses on the specific file that we want to modify?
> * ...
>
> Let's continue reading the series to see the end state for
> a good name.

OK, that helped to clarify one point: eventually there will be some asymmetry between reading the submodule config and writing it:

1. reading will be either from .gitmodules or HEAD:.gitmodules;
2. writing will be only to .gitmodules.

I'll try to consider that when naming the functions. If a more generic mechanism to write the submodules configuration is added in the future (e.g. the special ref you were mentioning) the functions can always be renamed accordingly.

Thanks,
   Antonio

--
Antonio Ospite
https://ao2.it
https://twitter.com/ao2it

A: Because it messes up the order in which people normally read text.
See http://en.wikipedia.org/wiki/Posting_style
Q: Why is top-posting such a bad thing?
Re: [RFC PATCH 01/10] config: make config_from_gitmodules generally useful
On Mon, 14 May 2018 11:19:28 -0700 Brandon Williams wrote: Hi Brandon, sorry for the delay, some comments below. > On 05/14, Antonio Ospite wrote: > > The config_from_gitmodules() function is a good candidate for > > a centralized point where to read the gitmodules configuration file. > > > > Add a repo argument to it to make the function more general, and adjust > > the current callers in cmd_fetch and update-clone. > > > > As a proof of the utility of the change, start using the function also > > in repo_read_gitmodules which was basically doing the same operations. > > > > Signed-off-by: Antonio Ospite > > --- > > builtin/fetch.c | 2 +- > > builtin/submodule--helper.c | 2 +- > > config.c| 13 +++-- > > config.h| 10 +- > > submodule-config.c | 16 > > 5 files changed, 14 insertions(+), 29 deletions(-) > > > > diff --git a/builtin/fetch.c b/builtin/fetch.c > > index 7ee83ac0f..a67ee7c39 100644 > > --- a/builtin/fetch.c > > +++ b/builtin/fetch.c > > @@ -1445,7 +1445,7 @@ int cmd_fetch(int argc, const char **argv, const char > > *prefix) > > for (i = 1; i < argc; i++) > > strbuf_addf(&default_rla, " %s", argv[i]); > > > > - config_from_gitmodules(gitmodules_fetch_config, NULL); > > + config_from_gitmodules(gitmodules_fetch_config, the_repository, NULL); > > git_config(git_fetch_config, NULL); > > > > argc = parse_options(argc, argv, prefix, > > diff --git a/builtin/submodule--helper.c b/builtin/submodule--helper.c > > index c2403a915..9e8f2acd5 100644 > > --- a/builtin/submodule--helper.c > > +++ b/builtin/submodule--helper.c > > @@ -1602,7 +1602,7 @@ static int update_clone(int argc, const char **argv, > > const char *prefix) > > }; > > suc.prefix = prefix; > > > > - config_from_gitmodules(gitmodules_update_clone_config, &max_jobs); > > + config_from_gitmodules(gitmodules_update_clone_config, the_repository, > > &max_jobs); > > git_config(gitmodules_update_clone_config, &max_jobs); > > > > argc = parse_options(argc, argv, prefix, module_update_clone_options, > > diff 
--git a/config.c b/config.c > > index 6f8f1d8c1..8ffe29330 100644 > > --- a/config.c > > +++ b/config.c > > @@ -2173,17 +2173,18 @@ int git_config_get_pathname(const char *key, const > > char **dest) > > } > > > > /* > > - * Note: This function exists solely to maintain backward compatibility > > with > > - * 'fetch' and 'update_clone' storing configuration in '.gitmodules' and > > should > > - * NOT be used anywhere else. > > + * Note: Initially this function existed solely to maintain backward > > + * compatibility with 'fetch' and 'update_clone' storing configuration in > > + * '.gitmodules' but it turns out it can be useful as a centralized point > > to > > + * read the gitmodules config file. > > I'm all for what you're trying to accomplish in this patch series but I > think a little more care needs to be taken here. Maybe about a year ago > I did some refactoring with how the gitmodules file was loaded and it > was decided that allowing arbitrary configuration in the .gitmodules > file was a bad thing, so I tried to make sure that it was very difficult > to sneak in more of that and limiting it to the places where it was > already done (fetch and update_clone). Now this patch is explicitly > changing the comment on this function to loosen the requirements I made > when it was introduced, which could be problematic in the future. > Yes, it was a little inconsiderate of me, sorry. > So here's what I suggest doing: > 1. Move this function from config.{c,h} to submodule-config.{c,h} to > make it clear who owns this. > 2. Move the gitmodules_set function you created to live in submodule-config. > 3. 
> You can still use this function as the main driver for reading the
> submodule config BUT the comment above the function in both the .c and
> .h files should be adapted so that it clearly spells out that the
> function is to be used only by the submodule config code (reading it
> in repo_read_gitmodules and reading/writing it in the
> submodule-helper config function you've added) and that the only
> exceptions to this are to maintain backwards compatibility with the
> existing callers and that new callers shouldn't be added.

I fully agree, I am going to send a new RFC soon.

> This is just a long way to say that if new callers to this function are
> added in the future that it is made very clear in the code that the
> .gitmodules file exists for a specific purpose and that it shouldn't be
> exploited to ship config with a repository. (If we end up allowing
> config to be shipped with a repository at a later date that will need to
> be handled with great care due to security concerns).

Got it.

> Thanks for working on this, the end result is definitely a step in the
> direction I've wanted the submodule config to head.
Re: [RFC PATCH 01/10] config: make config_from_gitmodules generally useful
On Mon, 14 May 2018 18:05:19 -0700 Stefan Beller wrote:

Hi again Stefan,

> On Mon, May 14, 2018 at 3:58 AM, Antonio Ospite wrote:
> > The config_from_gitmodules() function is a good candidate for
> > a centralized point where to read the gitmodules configuration file.
>
> It is very tempting to use that function. However it was introduced
> specifically to not do that. ;)
>
> See the series that was merged at 5aa0b6c506c (Merge branch
> 'bw/grep-recurse-submodules', 2017-08-22), specifically
> f20e7c1ea24 (submodule: remove submodule.fetchjobs from
> submodule-config parsing, 2017-08-02), where both
> builtin/fetch as well as the submodule helper use the pattern to
> read from the .gitmodules file via this function and then overlay it
> with the read from config.

I get that the _content_ of .gitmodules is not meant to be generic config, but I still see some value in having a single point where its _location_ is decided.

> > Add a repo argument to it to make the function more general, and adjust
> > the current callers in cmd_fetch and update-clone.
>
> This could be a preparatory patch, but including it here is fine, too.

I agree, having this as a preparatory change will help focus the review, thanks.

> > As a proof of the utility of the change, start using the function also
> > in repo_read_gitmodules which was basically doing the same operations.
>
> I think they were separated for the reason outlined above, or what Brandon
> said in his reply.
>
> I think extending 'repo_read_gitmodules' makes sense, as that is
> used everywhere for the loading of submodule configuration.

I would follow Brandon's suggestion here and still use 'config_from_gitmodules' from 'repo_read_gitmodules' to avoid code duplication.

I will try to be clear in the comments and in commit message when motivating the decision.

Thanks for the review Stefan.
Ciao, Antonio -- Antonio Ospite https://ao2.it https://twitter.com/ao2it A: Because it messes up the order in which people normally read text. See http://en.wikipedia.org/wiki/Posting_style Q: Why is top-posting such a bad thing?
Re: [RFC PATCH v2 1/7] git-rebase.txt: document incompatible options
Hi Eric, On Sun, Jun 17, 2018 at 10:15 AM, Eric Sunshine wrote: > On Sun, Jun 17, 2018 at 1:59 AM Elijah Newren wrote: >> git rebase has many options that only work with one of its three backends. >> It also has a few other pairs of incompatible options. Document these. >> >> Signed-off-by: Elijah Newren >> --- >> diff --git a/Documentation/git-rebase.txt b/Documentation/git-rebase.txt >> @@ -487,6 +510,51 @@ recreates the topic branch with fresh commits so it can >> be remerged >> +Flags only understood by the interactive backend: >> + [...] >> + * --edit-todo >> + * --root + --onto > > What does "+" mean? Is it "and"? Yes, it means "and", though perhaps it'd be even clearer to write "in combination with" or "when used in combination with" >> +Other incompatible flag pairs: >> + >> + * --preserve-merges && --interactive >> + * --preserve-merges && --signoff >> + * --preserve-merges && --rebase-merges >> + * --rebase-merges && --strategy > > It's somewhat odd seeing "programmer" notation in end-user > documentation. Perhaps replace "&&" with "and"? Yes, good point. I'll make that fix.
Re: [RFC PATCH v2 1/7] git-rebase.txt: document incompatible options
On Sun, Jun 17, 2018 at 8:38 AM, Phillip Wood wrote: >> +Other incompatible flag pairs: >> + >> + * --preserve-merges && --interactive >> + * --preserve-merges && --signoff >> + * --preserve-merges && --rebase-merges >> + * --rebase-merges && --strategy > > Does --rebase-merges support --strategy-options? Good catch. I hadn't yet tested --rebase-merges or --preserve-merges myself, so I was just going off notes in the documentation or code for those ones. I created a simple case needing -Xignore-space-change to merge cleanly, and found that any --strategy-option argument will be silently ignored by --rebase-merges currently. So, we should also document and warn the user that those options are currently incompatible. >> + >> include::merge-strategies.txt[] >> NOTES >> > This is a great step forward. In the long run it would be nice not to have > to mention the backends. Once git-rebase--merge.sh is gone, the situation is > simpler and it would be nice to reword this to say something like > The --committer-date-is-author-date, --ignore-date, --whitespace, > --ignore-whitespace and -C options are incompatible with the following > --exec ... Also note that --rebase-merges is incompatible with > --strategy; and --preserve-merges is incompatible with --interactive, > --signoff, --rebase-merges. > rather than talking about am options etc. That sounds like it would be simpler for the incompatible options, but what about discussing inconsistent behavior between different backends (empty commit handling, empty commit message handling, and directory rename detection)? Do we first need to remove all those inconsistencies before rewording the documentation here, or is there a clever way to discuss those?
Re: [RFC PATCH v2 7/7] git-rebase: make --allow-empty-message the default
Hi Phillip,

On Sun, Jun 17, 2018 at 8:30 AM, Phillip Wood wrote:
> Looking at the option parsing in git-rebase.sh it appears that it does not
> check for --no-allow-empty-message so there's no way to turn off the default
> for those modes that support it. I'm not sure what to think of this change;
> I'm slightly uneasy with changing the default to be different from
> cherry-pick, though one could argue rebasing is a different operation and
> just keeping the existing messages even if they are empty is more
> appropriate, and having all the rebase modes default to the same
> behaviour is definitely an improvement.

Here's my extended rationale for why to change the default for rebase -i/-m (and _maybe_ also for cherry-pick):

First, commits with an empty commit message are fairly rare, so it's hard to gather that much data on; we're dealing with outliers. That also means that folks who have touched the area in the past probably weren't dealing with full information either; they just tweaked what was already there.

It makes sense that git-commit would default to not allowing empty commit messages without a flag. We'd like to warn against that kind of poor practice. However, that implicitly means that commands which build on top of git-commit might copy that default even when it doesn't make sense. So it's not at all clear to me that cherry-pick or rebase -i/-m should have the same default. They do, but that may have been entirely accidental. And when someone comes along and notices the problematic default of stopping with empty commit messages, they assume it was intended and add a flag to change it, as happened for both commands.

On the other hand git-am was designed from the beginning around the workflow of applying many patches, and has a different default. am-based rebases copy that default of silently applying commits with empty commit messages; it may not be clear that rebase should copy the default from am, but it does.
Yet, despite the fact that the backend which ignores empty commit messages is the default backend for rebases, and thus is likely used more than the other rebase backends, no one has ever submitted a patch to add a flag to rebase to get the opposite behavior. To me, this argues pretty strongly that not only is ignoring empty messages the right default, but that it's a waste of time to implement a flag (e.g. --no-allow-empty-message) providing the opposite behavior for rebases.

It is obvious that one of the two rebase backends has the wrong default, because they should have the same default as each other. I am inclined to believe, based on the above logic, that the am-based rebase has the correct default. I also suspect that cherry-pick has the wrong default, though it's not as cut-and-dry since it doesn't have multiple backends with conflicting defaults.

Does that make sense? Am I overlooking anything important?
Re: [PATCH 06/23] midx: struct midxed_git and 'read' subcommand
On 6/20/2018 11:07 AM, Duy Nguyen wrote: On Wed, Jun 20, 2018 at 3:33 PM Derrick Stolee wrote: On 6/7/2018 2:31 PM, Duy Nguyen wrote: On Thu, Jun 7, 2018 at 4:03 PM, Derrick Stolee wrote: diff --git a/Documentation/git-midx.txt b/Documentation/git-midx.txt index dcaeb1a91b..919283fdd8 100644 --- a/Documentation/git-midx.txt +++ b/Documentation/git-midx.txt @@ -23,6 +23,11 @@ OPTIONS /packs/multi-pack-index for the current MIDX file, and /packs for the pack-files to index. +read:: + When given as the verb, read the current MIDX file and output + basic information about its contents. Used for debugging + purposes only. On second thought. If you just need a temporary debugging interface, adding a program in t/helper may be a better option. In the end we might still need 'read' to dump a file out, but we should have some stable output format (and json might be a good choice). My intention with this 'read' pattern in the MIDX (and commit-graph) is two-fold: 1. We can test that we are writing the correct data in our test suite. A test-tool builtin would suffice for this purpose. 2. We can help trouble-shoot users who may be having trouble with their MIDX files. Having the subcommand in a plumbing command allows us to do this in the shipped versions of Git. Maybe this second purpose isn't enough to justify the feature in Git and we should move this to the test-tool, especially with the 'verify' mode coming in a second series. Note that a 'verify' mode doesn't satisfy item (1). Yeah I think normally we just have some "fsck" thing to verify when things go bad. If you need more than that I think you just ask the user to send the .midx to you (with full understanding of potentially revealing confidential info and stuff). 
It'll be faster than instructing them to "run this command", "ok, run another command".

I thought of suggesting a command to dump the midx file in readable form (like json), but I think if fsck fails then the chances of that command successfully dumping may be very low.

Either way, if the command is meant for troubleshooting, I think it should be added at the end when the whole midx file is implemented and understood and we see what we need to troubleshoot. Adding small pieces of changes from patch to patch makes it really hard to see if it helps troubleshooting at all; it just helps the first purpose.

I'll abandon point (2) for the later 'verify' patch series. Adding the test helper early allows tests to demonstrate that each patch does the right thing, and that we don't miss testing something.

Thanks,
-Stolee
[PATCH] packfile: generalize pack directory list
In anticipation of sharing the pack directory listing with the multi-pack-index, generalize prepare_packed_git_one() into for_each_file_in_pack_dir(). Signed-off-by: Derrick Stolee --- Duy, I think this is what you mean by sharing code between packfile.c and midx.c for reading the packfiles from a pack directory. This does make the code in midx.c much simpler. Is this change worth it? This patch could stand on its own, or can be incorporated into the next version of the MIDX series. Thanks, -Stolee packfile.c | 103 + packfile.h | 6 2 files changed, 71 insertions(+), 38 deletions(-) diff --git a/packfile.c b/packfile.c index 7cd45aa4b2..db61c8813b 100644 --- a/packfile.c +++ b/packfile.c @@ -738,13 +738,14 @@ static void report_pack_garbage(struct string_list *list) report_helper(list, seen_bits, first, list->nr); } -static void prepare_packed_git_one(struct repository *r, char *objdir, int local) +void for_each_file_in_pack_dir(const char *objdir, + each_file_in_pack_dir_fn fn, + void *data) { struct strbuf path = STRBUF_INIT; size_t dirnamelen; DIR *dir; struct dirent *de; - struct string_list garbage = STRING_LIST_INIT_DUP; strbuf_addstr(&path, objdir); strbuf_addstr(&path, "/pack"); @@ -759,53 +760,79 @@ static void prepare_packed_git_one(struct repository *r, char *objdir, int local strbuf_addch(&path, '/'); dirnamelen = path.len; while ((de = readdir(dir)) != NULL) { - struct packed_git *p; - size_t base_len; - if (is_dot_or_dotdot(de->d_name)) continue; strbuf_setlen(&path, dirnamelen); strbuf_addstr(&path, de->d_name); - base_len = path.len; - if (strip_suffix_mem(path.buf, &base_len, ".idx")) { - /* Don't reopen a pack we already have. */ - for (p = r->objects->packed_git; p; -p = p->next) { - size_t len; - if (strip_suffix(p->pack_name, ".pack", &len) && - len == base_len && - !memcmp(p->pack_name, path.buf, len)) - break; - } - if (p == NULL && - /* -* See if it really is a valid .idx file with -* corresponding .pack file that we can map. 
-*/ - (p = add_packed_git(path.buf, path.len, local)) != NULL) - install_packed_git(r, p); - } - - if (!report_garbage) - continue; - - if (ends_with(de->d_name, ".idx") || - ends_with(de->d_name, ".pack") || - ends_with(de->d_name, ".bitmap") || - ends_with(de->d_name, ".keep") || - ends_with(de->d_name, ".promisor")) - string_list_append(&garbage, path.buf); - else - report_garbage(PACKDIR_FILE_GARBAGE, path.buf); + fn(path.buf, path.len, de->d_name, data); } + closedir(dir); - report_pack_garbage(&garbage); - string_list_clear(&garbage, 0); strbuf_release(&path); } +struct prepare_pack_data +{ + struct repository *r; + struct string_list *garbage; + int local; +}; + +static void prepare_pack(const char *full_name, size_t full_name_len, const char *file_name, void *_data) +{ + struct prepare_pack_data *data = (struct prepare_pack_data *)_data; + struct packed_git *p; + size_t base_len = full_name_len; + + if (strip_suffix_mem(full_name, &base_len, ".idx")) { + /* Don't reopen a pack we already have. */ + for (p = data->r->objects->packed_git; p; p = p->next) { + size_t len; + if (strip_suffix(p->pack_name, ".pack", &len) && + len == base_len && + !memcmp(p->pack_name, full_name, len)) + break; + } + + if (p == NULL && + /* +* See if it really is a valid .idx file with +* corresponding .pack file that we can map. +*/ + (p = add_packed_git(full_name, full_name_len, data->local)) != NULL) + install_packed_git(data->r, p); + } + + if (!report_garbage) + return; + + if (ends_with(file_name, ".idx") || + ends_with(file_name, ".pack") || + ends_with(
Re: [RFC PATCH 3/3] git-rebase.sh: make git-rebase--interactive the default
Hi Dscho, On Sun, Jun 17, 2018 at 2:44 PM, Johannes Schindelin wrote: > I was really referring to speed. But I have to admit that I do not have > any current numbers. > > Another issue just hit me, though: rebase --am does not need to look at as > many Git objects as rebase --merge or rebase -i. Therefore, GVFS users > will still want to use --am wherever possible, to avoid "hydrating" > many objects during their rebase. What is it that makes rebase --am need fewer Git objects than rebase --merge or rebase -i? I have one idea which isn't intrinsic to the algorithm, so I'm curious if there's something else I'm unaware of. My guess at what objects are needed by each type: At a high level, rebase --am for each commit will need to compare the commit to its parent to generate a diff (which thus involves walking over the objects in both the commit and its parent, though it should be able to skip over subtrees that are equal), and then will need to look at all the objects in the target commit on which it needs to apply the patch (in order to properly fill the index for a starting point, and used later when creating a new commit). If the application of the diff fails, it falls back to a three-way merge, though the three-way merge shouldn't need any additional objects. So, to summarize, rebase--am needs objects from the commit being rebased, its parent, and the target commit onto which it is applying, though it can short circuit some objects when the commit and its parent have matching subtree(s). rebase -i, if I understand correctly, does a three-way merge between the commit, its parent, and the target commit. Thus, we again walk over objects in those three commits; I think unpack_trees() does not take advantage of matching trees to avoid descending into subtrees, but if so that's an optimization that we may be able to implement (though it would require diving into unpack_trees() code, which is never easy...). 
(Side notes: (1) rebase --merge is basically the same as rebase -i here; its path to reaching the recursive merge machinery is a bit different but the resulting arguments are the same; (2) a real merge between branches would require more objects because it would have to do some revision walking to find a merge base, and a real merge base is likely to differ more than just the parent commit. But finding merge bases isn't relevant to rebase -m or rebase -i.)

Is there something else I'm missing that fundamentally makes rebase -i need more objects?

> As to speed: that might be harder. But then, the performance might already
> be good enough. I do not have numbers (nor the time to generate them) to
> back up my hunch that --am is substantially faster than --merge.

I too have a hunch that --am is faster than --merge, on big enough repos or repos with enough renames. I can partially back it up with an indirect number: at [1], it was reported that cherry-picks could be sped up by a factor of 20-30 on some repos with lots of renames. I believe there are other performance improvements possible too, for the --merge or -i cases.

I'm also curious now whether your comment on hydrating objects might uncover additional areas where performance improvements could be made for non-am-based rebases of large-enough repos.

Elijah

[1] https://public-inbox.org/git/cabpp-bh4llzejje5cvwwqj8xtj3m9oc-41tu8bm8c7r0gqt...@mail.gmail.com/
(see also Peter's last reply in that thread, and compare to his first post)
Re: Adding nested repository with slash adds files instead of gitlink
On Tue, Jun 19, 2018 at 11:28 AM Heiko Voigt wrote:
> Interesting and nobody complained to the mailing list?

On Wed, Jun 20, 2018 at 3:57 PM Duy Nguyen wrote:
> Abusing a long standing bug does not make it a feature.

To make things clear, I wasn't defending whether this should be considered a bug or a feature. I was just clarifying that this isn't a newly discovered thing and was reported/discussed in the past.

--
Cheers,
Rafael
Re: t5562: gzip -k is not portable
On Wed, Jun 20, 2018 at 08:40:06AM -0400, Jeff King wrote:
> On Wed, Jun 20, 2018 at 08:13:06AM +0200, Torsten Bögershausen wrote:
>
> > Good eyes, thanks.
> > The "-f -c" combo works:
> >
> > - gzip -k fetch_body &&
> > + gzip -f -c fetch_body >fetch_body.gz &&
> > test_copy_bytes 10 <fetch_body.gz >fetch_body.gz.trunc &&
> > - gzip -k push_body &&
> > + gzip -f -c push_body >push_body.gz &&
>
> Do we still need "-f"? With "-c" gzip is only writing to stdout, so we
> should not need to force anything.
>
> -Peff

You are right, -f is not needed. I was confusing stdout with the terminal :-(
Most often stdout is the terminal, but not here of course.
Re: What's cooking in git.git (Jun 2018, #05; Mon, 18)
Jonathan Tan writes:

>> Here are the topics that have been cooking. Commits prefixed with
>> '-' are only in 'pu' (proposed updates) while commits prefixed with
>> '+' are in 'next'. The ones marked with '.' do not appear in any of
>> the integration branches, but I am still holding onto them.
>
> Would it be possible to have my patches that make bitmap_git not global
> [1] in this list? Peff seems OK with it. Let me know if you'd like to
> see anything else.

That one did fall through the cracks. I'm (unfortunately) offline this morning but hopefully can tag the 2.18 final by the end of the day, and then I'll go into the usual "pick up the patches and discussions missed during the pre-release freeze" mode for a few days, during which time a reminder like this is greatly appreciated.

Thanks.

> The original patch should contain an extra paragraph that I've provided
> here [2] in the commit message - let me know if you want a reroll with
> that extra paragraph included.
>
> [1] https://public-inbox.org/git/cover.1528397984.git.jonathanta...@google.com/
>
> [2] https://public-inbox.org/git/2018065046.d03f8093347dc6c0e9b11...@google.com/
Re: The state of the object store series
On Wed, Jun 20, 2018 at 12:09 AM Stefan Beller wrote: > "xx/convert-revision-walking" > This series aims to convert get_merge_bases(), in_merge_bases() and all its > revision walking code to take a repository argument. We'll see who gets there first [1] ;-). It does not really matter (and I suspect you may beat me to it) but I hope we all agree to pass 'struct repository *' in init_revisions(), or I should start reworking that patch now. [1] https://gitlab.com/pclouds/git/commit/a4a91e3815cf0e594d5ce0c862f5060e2c26 -- Duy
Re: [PATCH 06/23] midx: struct midxed_git and 'read' subcommand
On Wed, Jun 20, 2018 at 3:33 PM Derrick Stolee wrote: > > On 6/7/2018 2:31 PM, Duy Nguyen wrote: > > On Thu, Jun 7, 2018 at 4:03 PM, Derrick Stolee wrote: > >> diff --git a/Documentation/git-midx.txt b/Documentation/git-midx.txt > >> index dcaeb1a91b..919283fdd8 100644 > >> --- a/Documentation/git-midx.txt > >> +++ b/Documentation/git-midx.txt > >> @@ -23,6 +23,11 @@ OPTIONS > >> /packs/multi-pack-index for the current MIDX file, and > >> /packs for the pack-files to index. > >> > >> +read:: > >> + When given as the verb, read the current MIDX file and output > >> + basic information about its contents. Used for debugging > >> + purposes only. > > On second thought. If you just need a temporary debugging interface, > > adding a program in t/helper may be a better option. In the end we > > might still need 'read' to dump a file out, but we should have some > > stable output format (and json might be a good choice). > > My intention with this 'read' pattern in the MIDX (and commit-graph) is > two-fold: > > 1. We can test that we are writing the correct data in our test suite. A > test-tool builtin would suffice for this purpose. > > 2. We can help trouble-shoot users who may be having trouble with their > MIDX files. Having the subcommand in a plumbing command allows us to do > this in the shipped versions of Git. > > Maybe this second purpose isn't enough to justify the feature in Git and > we should move this to the test-tool, especially with the 'verify' mode > coming in a second series. Note that a 'verify' mode doesn't satisfy > item (1). Yeah I think normally we just have some "fsck" thing to verify when things go bad. If you need more than that I think you just ask the user to send the .midx to you (with full understanding of potentially revealing confidential info and stuff). 
It'll be faster than instructing them to "run this command", "ok, run another command".

I thought of suggesting a command to dump the midx file in readable form (like json), but I think if fsck fails then the chances of that command successfully dumping may be very low.

Either way, if the command is meant for troubleshooting, I think it should be added at the end when the whole midx file is implemented and understood and we see what we need to troubleshoot. Adding small pieces of changes from patch to patch makes it really hard to see if it helps troubleshooting at all; it just helps the first purpose.
--
Duy
Re: Adding nested repository with slash adds files instead of gitlink
On Wed, Jun 20, 2018 at 1:55 PM Rafael Ascensão wrote: > > On Wed, Jun 20, 2018 at 5:39 AM Kevin Daudt wrote: > > > > What this is about that when doing `git add path/` (with trailing /), > > > > This is what I was referring to. If you search for 'Fake Submodules', > you'll see that some people were/are intentionally using this instead of > subtrees or submodules. Unfortunately the original article [1] seems to > be dead, but searching url in the mailing list archives leads to some > additional discussion on the subject [2,3]. Abusing a long standing bug does not make it a feature. I'm not opposed to having a new option to keep that behavior, but it should not be the default. If you use it that way, you're on your own. > [1]:http://debuggable.com/posts/git-fake-submodules:4b563ee4-f3cc-4061-967e-0e48cbdd56cb > [2]:https://public-inbox.org/git/xmqqy47o6q71@gitster.mtv.corp.google.com/ > [3]:https://public-inbox.org/git/CAGZ79kZofg3jS+g0weTdco+PGo_p-_Hd-NScZ=q2ufb7tf2...@mail.gmail.com/ -- Duy
Re: Excessive mmap [was Git server eats all memory]
On Tue, Jun 19, 2018 at 10:27 PM Ivan Kanis wrote:
>
> Dmitry Potapov wrote:
>
> > On Fri, Aug 06, 2010 at 07:23:17PM +0200, Ivan Kanis wrote:
> >>
> >> I expected the malloc to take 4G but was surprised it didn't. It seems
> >> to be mmap taking all the memory. I am not familiar with that function,
> >> it looks like it's mapping memory to a file... Is it reasonable to mmap
> >> so much memory?
> >
> > AFAIK, Git does not need to mmap the whole pack to memory, but it
> > is more efficient to mmap the whole pack wherever possible, because
> > it has a completely random access, so if you store only one sliding
> > window, you will have to re-read it many times. Besides, mmap size
> > does not mean that so much physical memory is used. Pages should
> > be loaded when they are necessary, and if you have more than one
> > client cloning the same repo, this memory should be shared by them.
>
> I have cloned identical repositories and the system starts to swap. I
> think it shows that cloning two repositories doesn't share mmap.

I doubt it (assuming you're on linux). If you suspect this, configure core.packedGitWindowSize to reduce the mmap size. There are lots of other things in a cloning process that do not share (is this client or server btw?) and things could add up.
--
Duy
Re: [PATCH 06/23] midx: struct midxed_git and 'read' subcommand
On 6/7/2018 2:31 PM, Duy Nguyen wrote: On Thu, Jun 7, 2018 at 4:03 PM, Derrick Stolee wrote: diff --git a/Documentation/git-midx.txt b/Documentation/git-midx.txt index dcaeb1a91b..919283fdd8 100644 --- a/Documentation/git-midx.txt +++ b/Documentation/git-midx.txt @@ -23,6 +23,11 @@ OPTIONS /packs/multi-pack-index for the current MIDX file, and /packs for the pack-files to index. +read:: + When given as the verb, read the current MIDX file and output + basic information about its contents. Used for debugging + purposes only. On second thought. If you just need a temporary debugging interface, adding a program in t/helper may be a better option. In the end we might still need 'read' to dump a file out, but we should have some stable output format (and json might be a good choice). My intention with this 'read' pattern in the MIDX (and commit-graph) is two-fold: 1. We can test that we are writing the correct data in our test suite. A test-tool builtin would suffice for this purpose. 2. We can help trouble-shoot users who may be having trouble with their MIDX files. Having the subcommand in a plumbing command allows us to do this in the shipped versions of Git. Maybe this second purpose isn't enough to justify the feature in Git and we should move this to the test-tool, especially with the 'verify' mode coming in a second series. Note that a 'verify' mode doesn't satisfy item (1). Thanks, -Stolee
Re: [PATCH 06/23] midx: struct midxed_git and 'read' subcommand
On 6/7/2018 1:54 PM, Duy Nguyen wrote: On Thu, Jun 7, 2018 at 4:03 PM, Derrick Stolee wrote: As we build the multi-pack-index feature by adding chunks at a time, we want to test that the data is being written correctly. Create struct midxed_git to store an in-memory representation of a A word play on 'packed_git'? Amusing. Some more descriptive name would be better though. midxed looks almost like random letters thrown together. I'll use 'struct multi_pack_index'. multi-pack-index and a memory-map of the binary file. Initialize this struct in load_midxed_git(object_dir). +static int read_midx_file(const char *object_dir) +{ + struct midxed_git *m = load_midxed_git(object_dir); + + if (!m) + return 0; This looks like an error case, please don't just return zero, typically used to say "success". I don't know if this command stays "for debugging purposes" until the end. Of course in that case it does not really matter. It is intended for debugging and testing. Generally, it is not an error to not have a MIDX in an object directory. +struct midxed_git *load_midxed_git(const char *object_dir) +{ + struct midxed_git *m; + int fd; + struct stat st; + size_t midx_size; + void *midx_map; + const char *midx_name = get_midx_filename(object_dir); mem leak? This function returns allocated memory if I remember correctly. + + fd = git_open(midx_name); + if (fd < 0) + return NULL; do an error_errno() so we know what went wrong at least. + if (fstat(fd, &st)) { + close(fd); + return NULL; same here, we should know why fstat() fails. + } + midx_size = xsize_t(st.st_size); + + if (midx_size < MIDX_MIN_SIZE) { + close(fd); + die("multi-pack-index file %s is too small", midx_name); _() The use of die() should be discouraged though. Many people still try (or wish) to libify code and new die() does not help. I think error() here would be enough then you can return NULL. Or you can go fancier and store the error string in a strbuf like refs code. 
+ } + + midx_map = xmmap(NULL, midx_size, PROT_READ, MAP_PRIVATE, fd, 0); + + m = xcalloc(1, sizeof(*m) + strlen(object_dir) + 1); + strcpy(m->object_dir, object_dir); + m->data = midx_map; + + m->signature = get_be32(m->data); + if (m->signature != MIDX_SIGNATURE) { + error("multi-pack-index signature %X does not match signature %X", + m->signature, MIDX_SIGNATURE); _(). Maybe 0x%08x instead of %x + goto cleanup_fail; + } + + m->version = *(m->data + 4); m->data[4] instead? shorter and easier to understand. Same comment on "*(m->data + x)" and error() without _() for the rest. + if (m->version != MIDX_VERSION) { + error("multi-pack-index version %d not recognized", + m->version); _() + goto cleanup_fail; + } + + m->hash_version = *(m->data + 5); m->data[5] +cleanup_fail: + FREE_AND_NULL(m); + munmap(midx_map, midx_size); + close(fd); + exit(1); It's bad enough that you die() but exit() in this code seems too much. Please just return NULL and let the caller handle the error. Will do. diff --git a/midx.h b/midx.h index 3a63673952..a1d18ed991 100644 --- a/midx.h +++ b/midx.h @@ -1,4 +1,13 @@ +#ifndef MIDX_H +#define MIDX_H + +#include "git-compat-util.h" #include "cache.h" +#include "object-store.h" I don't really think you need object-store here (git-compat-util.h too). "struct mixed_git;" would be enough for load_midxed_git declaration below. #include "packfile.h" +struct midxed_git *load_midxed_git(const char *object_dir); + int write_midx_file(const char *object_dir); + +#endif diff --git a/object-store.h b/object-store.h index d683112fd7..77cb82621a 100644 --- a/object-store.h +++ b/object-store.h @@ -84,6 +84,25 @@ struct packed_git { char pack_name[FLEX_ARRAY]; /* more */ }; +struct midxed_git { + struct midxed_git *next; Do we really have multiple midx files? There is one per object directory currently, but you may have one locally and one in each of your alternates. I do need to double-check that we populate this list later in the series. 
(And I'll remove it from this commit and save it for when it is needed.) + + int fd; + + const unsigned char *data; + size_t data_len; + + uint32_t signature; + unsigned char version; + unsigned char hash_version; + unsigned char hash_len; + unsigned char num_chunks; + uint32_t num_packs; + uint32_t num_objects; + + char object_dir[FLEX_ARRAY]; Why do you need to keep object_dir when it could be easily retrieved when the repo is available? +}; + struct raw_object_store { /* * Path to the repository's object store.
Re: t5562: gzip -k is not portable
On Wed, Jun 20, 2018 at 08:13:06AM +0200, Torsten Bögershausen wrote: > Good eyes, thanks. > The "-f -c" combo works: > > - gzip -k fetch_body && > + gzip -f -c fetch_body >fetch_body.gz && > test_copy_bytes 10 fetch_body.gz.trunc && > - gzip -k push_body && > + gzip -f -c push_body >push_body.gz && Do we still need "-f"? With "-c" gzip is only writing to stdout, so we should not need to force anything. -Peff
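For the record, the portable idiom being settled on can be sketched like this (the file name is a placeholder, not the one from t5562):

```shell
# `gzip -k` (keep the input file) is not portable; `gzip -c` writes
# the compressed stream to stdout instead, which keeps the original
# as a side effect. No -f is needed: with -c, gzip never replaces or
# overwrites the input file, so there is nothing to force.
printf 'some request body\n' >body
gzip -c body >body.gz
test -f body && echo "original kept"
gunzip -c body.gz
```

Running this prints "original kept" followed by the original contents, and `body` is still present afterwards.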
Re: Adding nested repository with slash adds files instead of gitlink
On Wed, Jun 20, 2018 at 5:39 AM Kevin Daudt wrote: > > What this is about is that when doing `git add path/` (with trailing /), > This is what I was referring to. If you search for 'Fake Submodules', you'll see that some people were/are intentionally using this instead of subtrees or submodules. Unfortunately the original article [1] seems to be dead, but searching for the URL in the mailing list archives leads to some additional discussion on the subject [2,3]. [1]:http://debuggable.com/posts/git-fake-submodules:4b563ee4-f3cc-4061-967e-0e48cbdd56cb [2]:https://public-inbox.org/git/xmqqy47o6q71@gitster.mtv.corp.google.com/ [3]:https://public-inbox.org/git/CAGZ79kZofg3jS+g0weTdco+PGo_p-_Hd-NScZ=q2ufb7tf2...@mail.gmail.com/
Re: [msysGit] Possible git status problem at case insensitive file system
Hi Frank, On Mon, 9 Aug 2010, Frank Li wrote: > All: > I use msysgit 1.7.0.2 at windows xp. > Problem: git status will list tracked directory as untracked dir. That is the *least* problem you have with that version: you are vulnerable to several security issues. Sadly, there were literally zero contributions from community members using Windows XP to uphold XP support, so the latest Git for Windows you can use (unless you build it yourself, fixing several compile issues by now) is v2.10. See https://gitforwindows.org/requirements.html for details. Ciao, Johannes
Re: [PATCH] t3404: check root commit in 'rebase -i --root reword root commit'
Hi Junio, On Tue, 19 Jun 2018, Junio C Hamano wrote: > Todd Zullinger writes: > > > index e500d7c320..352a52e59d 100755 > > --- a/t/t3404-rebase-interactive.sh > > +++ b/t/t3404-rebase-interactive.sh > > @@ -977,7 +977,8 @@ test_expect_success 'rebase -i --root reword root > > commit' ' > > set_fake_editor && > > FAKE_LINES="reword 1 2" FAKE_COMMIT_MESSAGE="A changed" \ > > git rebase -i --root && > > - git show HEAD^ | grep "A changed" > > + git show HEAD^ | grep "A changed" && > > + test -z "$(git show -s --format=%p HEAD^)" > > ' > > The additional test probably will pass when HEAD is a root commit by > failing to refer to HEAD^, resulting an empty output from show. The > previous step would also give an error and won't emit anything that > would match "A changed", so it probably is OK, though. That matches my thinking. Otherwise, we would have had to do the git show -s --format=%p HEAD^ >out && test_must_be_empty out dance. Ciao, Dscho
[ANNOUNCE] Git Rev News edition 40
Hi everyone, The 40th edition of Git Rev News is now published: https://git.github.io/rev_news/2018/06/20/edition-40/ Thanks a lot to all the contributors: Adam Spiers, Bryan Turner and Nicolas Pitre! Enjoy, Christian, Jakub, Markus and Gabriel.
Re: [PATCH] t3404: check root commit in 'rebase -i --root reword root commit'
Hi Junio, On Tue, 19 Jun 2018, Junio C Hamano wrote: > Todd Zullinger writes: > > > With luck, this will save you a few minutes, assuming the > > commit message is reasonable (or can be improved with help > > from Phillip and others). :) > > OK. > > > Or Junio may just squash this onto js/rebase-i-root-fix. > > Nah, not for a hotfix on the last couple of days before the final. > We'd need to build on top, not "squash". Right. Can we take this on top, at a leisurely pace? I mean: we verified that this works in the upcoming v2.18.0, and it would be nice to have that extra regression test safety in the future, but it is not crucial to include it in v2.18.0 itself. Ciao, Dscho
Re: want option to git-rebase
Jonathan Nieder writes ("Re: want option to git-rebase"): > Oh, good catch. "git rebase" already generally does the right thing > when GIT_REFLOG_ACTION is set (by only appending to it and never > replacing it). Great. I indeed did not know about this. > Ian, does that work well for you? If so, any ideas where it should go > in the documentation to be more discoverable for next time? Thanks for asking exactly the right question :-). I didn't make a record of exactly where I looked but I'm pretty sure I looked at the manpages for git-reflog and git-rebase. I think I probably also looked at git-update-ref; I have read git-update-ref a number of times. Right now in Debian unstable I see that none of these places document this convention. I think git-reflog ought to mention it, so that it says where the information it provides comes from. It also ought to be mentioned in git-update-ref, because all callers of git-update-ref need to implement it! Indeed, because I didn't know about this convention, dgit and git-debrebase do not honour it. At least in my case, if it had been in git-update-ref I would have implemented it myself and then I would obviously have thought of making use of it myself. Also, I have to say, the documentation for GIT_REFLOG_ACTION in git(1) is very obscure. It sort of generally waffles around what it is for, but it does not say: * what does this variable contain * who can and should set it * who should consume it * what the rules are for modifying it I don't think simply adding a cross-reference to GIT_REFLOG_ACTION in git(1) would be sufficient, without also improving this part. The explanations provided by you and Johannes, here in these emails, are much much better: > >> "git rebase" sets this itself, so it doesn't solve your problem. > > > > If it does so unconditionally, then that is a bug. If a script > > wants to set GIT_REFLOG_ACTION, but finds that it is already set, > > then it must not change the value. 
> > set_reflog_action in git-sh-setup does the right thing. > > > > So, if there is another script or application around git-rebase, then it > should just set GIT_REFLOG_ACTION (if it is not already set) and export the > environment variable to git-rebase. > > Oh, good catch. "git rebase" already generally does the right thing > when GIT_REFLOG_ACTION is set (by only appending to it and never > replacing it). Maybe some of this prose, which explains things quite well, could be reworked into a form suitable for the git docs. (Even though there seems to be disagreement about whether a subcommand may *append* to GIT_REFLOG_ACTION; which, ISTM, is a practice which ought to be encouraged rather than discouraged.) Regards, Ian. -- Ian Jackson. These opinions are my own. If I emailed you from an address @fyvzl.net or @evade.org.uk, that is a private address which bypasses my fierce spamfilter.
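To make the convention under discussion concrete, here is the pattern from set_reflog_action in git-sh-setup: the helper assigns GIT_REFLOG_ACTION only when it is not already set, so a wrapping tool's value survives into the inner command's reflog entries. The "my-wrapper" name below is just a stand-in for an outer tool such as dgit:

```shell
# The git-sh-setup convention: set GIT_REFLOG_ACTION only if the
# caller (e.g. a wrapper script) has not already provided one.
set_reflog_action () {
	if test -z "${GIT_REFLOG_ACTION:+set}"
	then
		GIT_REFLOG_ACTION="$*"
		export GIT_REFLOG_ACTION
	fi
}

GIT_REFLOG_ACTION="my-wrapper"   # outer tool claims the action first
set_reflog_action rebase         # inner command must leave it alone
echo "$GIT_REFLOG_ACTION"
```

This prints "my-wrapper": because the variable was already set, the inner set_reflog_action call is a no-op, which is exactly the behaviour a wrapper around git-rebase relies on.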
Re: [GSoC] GSoC with git, week 7
On Tue, Jun 19, 2018 at 12:33 AM, Paul-Sebastian Ungureanu wrote: > Hello! > > > On 18.06.2018 20:34, Alban Gruin wrote: >> >> Hi, >> >> I published a new blog post here: >> >>https://blog.pa1ch.fr/posts/2018/06/18/en/gsoc2018-week-7.html >> >> It’s shorter than the last one, but feel free to tell me what you think >> about it! :) >> >> Cheers, >> Alban >> > > Good job! My blog is also up: > > https://ungps.github.io/ > Awesome! My blog post is also up: https://prertik.github.io/post/week-07 Cheers, Pratik