Re: [PATCH v6 04/19] fsck: Offer a function to demote fsck errors to warnings
Hi Junio, On 2015-06-19 21:26, Junio C Hamano wrote: Johannes Schindelin johannes.schinde...@gmx.de writes: +static inline int substrcmp(const char *string, int len, const char *match) +{ +int match_len = strlen(match); +if (match_len != len) +return -1; +return memcmp(string, match, len); +} What this does looks suspiciously like starts_with(), but its name substrcmp() does not give any hint that this is about the beginnig part of string; if anything, it gives a wrong hint that it may be any substring. prefixcmp() might be a better name but that was the old name for !starts_with() so we cannot use it here. It is a mouthful, but starts_with_counted() may be. But the whole thing may be moot. If we take the why not upcase the end-user string upfront suggestion from the previous review, fsck_set_msg_types() would have an upcased copy of the end-user string that it can muck with; it can turn badfoo=error,poorbar=... into BADFOO=error,POORBAR=... that is stored in its own writable memory (possibly a strbuf), and at that point it can afford to NUL-terminate BADFOO=error after finding where one specification ends with strcspn() before calling fsck_set_msg_type(), which in turn calls parse_msg_type(). So all parse_msg_type() needs to do is just !strcmp(). Turns out that the diffstat says it saves 10 lines. So I changed it according to your suggestion. Part of v7. Ciao, Dscho -- To unsubscribe from this list: send the line unsubscribe git in
Re: [PATCH v4 08/19] parse-option: rename parse_opt_with_commit()
On Mon, Jun 22, 2015 at 8:38 AM, Karthik Nayak karthik@gmail.com wrote: On Mon, Jun 22, 2015 at 6:34 AM, Junio C Hamano gits...@pobox.com wrote: On Sun, Jun 21, 2015 at 1:48 PM, Karthik Nayak karthik@gmail.com wrote: Rename parse_opt_with_commit() to parse_opt_commit_object_name() to show that it can be used to obtain a list of commits and is not constricted to usage of '--contains' option. I think that is a brilliant idea, but unlike the other function you added earlier that can do only one object and adopts last one wins rule, this is cumulative, and that fact should be made clear to the developers in some way, no? Will add a comment I didn't mean that. Can't we indicate this with plural somewhere in the name? parse_opt_commits(), parse_opt_commit_into_list(), etc.? -- To unsubscribe from this list: send the line unsubscribe git in
[PATCH v7 18/19] fsck: git receive-pack: support excluding objects from fsck'ing
The optional new config option `receive.fsck.skipList` specifies the path to a file listing the names, i.e. SHA-1s, one per line, of objects that are to be ignored by `git receive-pack` when `receive.fsckObjects = true`. This is extremely handy in case of legacy repositories where it would cause more pain to change incorrect objects than to live with them (e.g. a duplicate 'author' line in an early commit object). The intended use case is for server administrators to inspect objects that are reported by `git push` as being too problematic to enter the repository, and to add the objects' SHA-1 to a (preferably sorted) file when the objects are legitimate, i.e. when it is determined that those problematic objects should be allowed to enter the server. Signed-off-by: Johannes Schindelin johannes.schinde...@gmx.de --- Documentation/config.txt| 8 +++ builtin/receive-pack.c | 11 + fsck.c | 50 + fsck.h | 1 + t/t5504-fetch-receive-strict.sh | 12 ++ 5 files changed, 82 insertions(+) diff --git a/Documentation/config.txt b/Documentation/config.txt index bfccd2b..ed7f37f 100644 --- a/Documentation/config.txt +++ b/Documentation/config.txt @@ -2230,6 +2230,14 @@ which would not pass pushing when `receive.fsckObjects = true`, allowing the host to accept repositories with certain known issues but still catch other issues. +receive.fsck.skipList:: + The path to a sorted list of object names (i.e. one SHA-1 per + line) that are known to be broken in a non-fatal way and should + be ignored. This feature is useful when an established project + should be accepted despite early commits containing errors that + can be safely ignored such as invalid committer email addresses. + Note: corrupt objects cannot be skipped with this setting. + receive.unpackLimit:: If the number of objects received in a push is below this limit then the objects will be unpacked into loose object diff --git a/builtin/receive-pack.c b/builtin/receive-pack.c index 3afe8f8..3fbed23 100644 --- a/builtin/receive-pack.c +++ b/builtin/receive-pack.c @@ -117,6 +117,17 @@ static int receive_pack_config(const char *var, const char *value, void *cb) return 0; } + if (strcmp(var, receive.fsck.skiplist) == 0) { + const char *path; + + if (git_config_pathname(path, var, value)) + return 1; + strbuf_addf(fsck_msg_types, %cskiplist=%s, + fsck_msg_types.len ? ',' : '=', path); + free((char *) path); + return 0; + } + if (skip_prefix(var, receive.fsck., var)) { if (is_valid_msg_type(var, value)) strbuf_addf(fsck_msg_types, %c%s=%s, diff --git a/fsck.c b/fsck.c index f6fc384..a677b50 100644 --- a/fsck.c +++ b/fsck.c @@ -8,6 +8,7 @@ #include fsck.h #include refs.h #include utf8.h +#include sha1-array.h #define FSCK_FATAL -1 #define FSCK_INFO -2 @@ -127,6 +128,43 @@ static int fsck_msg_type(enum fsck_msg_id msg_id, return msg_type; } +static void init_skiplist(struct fsck_options *options, const char *path) +{ + static struct sha1_array skiplist = SHA1_ARRAY_INIT; + int sorted, fd; + char buffer[41]; + unsigned char sha1[20]; + + if (options-skiplist) + sorted = options-skiplist-sorted; + else { + sorted = 1; + options-skiplist = skiplist; + } + + fd = open(path, O_RDONLY); + if (fd 0) + die(Could not open skip list: %s, path); + for (;;) { + int result = read_in_full(fd, buffer, sizeof(buffer)); + if (result 0) + die_errno(Could not read '%s', path); + if (!result) + break; + if (get_sha1_hex(buffer, sha1) || buffer[40] != '\n') + die(Invalid SHA-1: %s, buffer); + sha1_array_append(skiplist, sha1); + if (sorted skiplist.nr 1 + hashcmp(skiplist.sha1[skiplist.nr - 2], + sha1) 0) + sorted = 0; + } + close(fd); + + if (sorted) + skiplist.sorted = 1; +} + static int parse_msg_type(const char *str) { if (!strcmp(str, error)) @@ -190,6 +228,14 @@ void fsck_set_msg_types(struct fsck_options *options, const char *values) buf[equal] = tolower(buf[equal]); buf[equal] = '\0'; + if (!strcmp(buf, skiplist)) { + if (equal == len) + die(skiplist requires a path); + init_skiplist(options, buf + equal + 1); + buf += len + 1; +
[PATCH v7 09/19] fsck: Handle multiple authors in commits specially
This problem has been detected in the wild, and is the primary reason to introduce an option to demote certain fsck errors to warnings. Let's offer to ignore this particular problem specifically. Technically, we could handle such repositories by setting receive.fsck.msg-id to missingCommitter=warn, but that could hide missing tree objects in the same commit because we cannot continue verifying any commit object after encountering a missing committer line, while we can continue in the case of multiple author lines. Signed-off-by: Johannes Schindelin johannes.schinde...@gmx.de --- fsck.c | 17 + 1 file changed, 13 insertions(+), 4 deletions(-) diff --git a/fsck.c b/fsck.c index ef3bf68..daa07ad 100644 --- a/fsck.c +++ b/fsck.c @@ -38,6 +38,7 @@ FUNC(MISSING_TREE, ERROR) \ FUNC(MISSING_TYPE, ERROR) \ FUNC(MISSING_TYPE_ENTRY, ERROR) \ + FUNC(MULTIPLE_AUTHORS, ERROR) \ FUNC(NUL_IN_HEADER, ERROR) \ FUNC(TAG_OBJECT_NOT_TAG, ERROR) \ FUNC(TREE_NOT_SORTED, ERROR) \ @@ -528,7 +529,7 @@ static int fsck_commit_buffer(struct commit *commit, const char *buffer, { unsigned char tree_sha1[20], sha1[20]; struct commit_graft *graft; - unsigned parent_count, parent_line_count = 0; + unsigned parent_count, parent_line_count = 0, author_count; int err; if (require_end_of_header(buffer, size, commit-object, options)) @@ -568,9 +569,17 @@ static int fsck_commit_buffer(struct commit *commit, const char *buffer, return err; } } - if (!skip_prefix(buffer, author , buffer)) - return report(options, commit-object, FSCK_MSG_MISSING_AUTHOR, invalid format - expected 'author' line); - err = fsck_ident(buffer, commit-object, options); + author_count = 0; + while (skip_prefix(buffer, author , buffer)) { + author_count++; + err = fsck_ident(buffer, commit-object, options); + if (err) + return err; + } + if (author_count 1) + err = report(options, commit-object, FSCK_MSG_MISSING_AUTHOR, invalid format - expected 'author' line); + else if (author_count 1) + err = report(options, commit-object, FSCK_MSG_MULTIPLE_AUTHORS, invalid format - multiple 'author' lines); if (err) return err; if (!skip_prefix(buffer, committer , buffer)) -- 2.3.1.windows.1.9.g8c01ab4 -- To unsubscribe from this list: send the line unsubscribe git in
[PATCH v7 19/19] fsck: support ignoring objects in `git fsck` via fsck.skiplist
Identical to support in `git receive-pack for the config option `receive.fsck.skiplist`, we now support ignoring given objects in `git fsck` via `fsck.skiplist` altogether. This is extremely handy in case of legacy repositories where it would cause more pain to change incorrect objects than to live with them (e.g. a duplicate 'author' line in an early commit object). Signed-off-by: Johannes Schindelin johannes.schinde...@gmx.de --- Documentation/config.txt | 8 builtin/fsck.c | 13 + 2 files changed, 21 insertions(+) diff --git a/Documentation/config.txt b/Documentation/config.txt index ed7f37f..69dda93 100644 --- a/Documentation/config.txt +++ b/Documentation/config.txt @@ -1261,6 +1261,14 @@ that setting `fsck.missingEmail = ignore` will hide that issue. This feature is intended to support working with legacy repositories which cannot be repaired without disruptive changes. +fsck.skipList:: + The path to a sorted list of object names (i.e. one SHA-1 per + line) that are known to be broken in a non-fatal way and should + be ignored. This feature is useful when an established project + should be accepted despite early commits containing errors that + can be safely ignored such as invalid committer email addresses. + Note: corrupt objects cannot be skipped with this setting. + gc.aggressiveDepth:: The depth parameter used in the delta compression algorithm used by 'git gc --aggressive'. This defaults diff --git a/builtin/fsck.c b/builtin/fsck.c index 2d14298..7e3df20 100644 --- a/builtin/fsck.c +++ b/builtin/fsck.c @@ -49,6 +49,19 @@ static int show_dangling = 1; static int fsck_config(const char *var, const char *value, void *cb) { + if (strcmp(var, fsck.skiplist) == 0) { + const char *path; + struct strbuf sb = STRBUF_INIT; + + if (git_config_pathname(path, var, value)) + return 1; + strbuf_addf(sb, skiplist=%s, path); + free((char *) path); + fsck_set_msg_types(fsck_obj_options, sb.buf); + strbuf_release(sb); + return 0; + } + if (skip_prefix(var, fsck., var)) { fsck_set_msg_type(fsck_obj_options, var, value); return 0; -- 2.3.1.windows.1.9.g8c01ab4 -- To unsubscribe from this list: send the line unsubscribe git in
[PATCH v7 17/19] fsck: Introduce `git fsck --connectivity-only`
This option avoids unpacking each and all blob objects, and just verifies the connectivity. In particular with large repositories, this speeds up the operation, at the expense of missing corrupt blobs, ignoring unreachable objects and other fsck issues, if any. Signed-off-by: Johannes Schindelin johannes.schinde...@gmx.de --- Documentation/git-fsck.txt | 7 ++- builtin/fsck.c | 7 ++- t/t1450-fsck.sh| 22 ++ 3 files changed, 34 insertions(+), 2 deletions(-) diff --git a/Documentation/git-fsck.txt b/Documentation/git-fsck.txt index 25c431d..84ee92e 100644 --- a/Documentation/git-fsck.txt +++ b/Documentation/git-fsck.txt @@ -11,7 +11,7 @@ SYNOPSIS [verse] 'git fsck' [--tags] [--root] [--unreachable] [--cache] [--no-reflogs] [--[no-]full] [--strict] [--verbose] [--lost-found] -[--[no-]dangling] [--[no-]progress] [object*] +[--[no-]dangling] [--[no-]progress] [--connectivity-only] [object*] DESCRIPTION --- @@ -60,6 +60,11 @@ index file, all SHA-1 references in `refs` namespace, and all reflogs object pools. This is now default; you can turn it off with --no-full. +--connectivity-only:: + Check only the connectivity of tags, commits and tree objects. By + avoiding to unpack blobs, this speeds up the operation, at the + expense of missing corrupt objects or other problematic issues. + --strict:: Enable more strict checking, namely to catch a file mode recorded with g+w bit set, which was created by older diff --git a/builtin/fsck.c b/builtin/fsck.c index adaa802..2d14298 100644 --- a/builtin/fsck.c +++ b/builtin/fsck.c @@ -23,6 +23,7 @@ static int show_tags; static int show_unreachable; static int include_reflogs = 1; static int check_full = 1; +static int connectivity_only; static int check_strict; static int keep_cache_objects; static struct fsck_options fsck_walk_options = FSCK_OPTIONS_DEFAULT; @@ -181,6 +182,8 @@ static void check_reachable_object(struct object *obj) if (!(obj-flags HAS_OBJ)) { if (has_sha1_pack(obj-sha1)) return; /* it is in pack - forget about it */ + if (connectivity_only has_sha1_file(obj-sha1)) + return; printf(missing %s %s\n, typename(obj-type), sha1_to_hex(obj-sha1)); errors_found |= ERROR_REACHABLE; return; @@ -623,6 +626,7 @@ static struct option fsck_opts[] = { OPT_BOOL(0, cache, keep_cache_objects, N_(make index objects head nodes)), OPT_BOOL(0, reflogs, include_reflogs, N_(make reflogs head nodes (default))), OPT_BOOL(0, full, check_full, N_(also consider packs and alternate objects)), + OPT_BOOL(0, connectivity-only, connectivity_only, N_(check only connectivity)), OPT_BOOL(0, strict, check_strict, N_(enable more strict checking)), OPT_BOOL(0, lost-found, write_lost_and_found, N_(write dangling objects in .git/lost-found)), @@ -659,7 +663,8 @@ int cmd_fsck(int argc, const char **argv, const char *prefix) git_config(fsck_config, NULL); fsck_head_link(); - fsck_object_dir(get_object_directory()); + if (!connectivity_only) + fsck_object_dir(get_object_directory()); prepare_alt_odb(); for (alt = alt_odb_list; alt; alt = alt-next) { diff --git a/t/t1450-fsck.sh b/t/t1450-fsck.sh index 1727129..956673b 100755 --- a/t/t1450-fsck.sh +++ b/t/t1450-fsck.sh @@ -431,4 +431,26 @@ test_expect_success 'fsck notices ref pointing to missing tag' ' test_must_fail git -C missing fsck ' +test_expect_success 'fsck --connectivity-only' ' + rm -rf connectivity-only + git init connectivity-only + ( + cd connectivity-only + touch empty + git add empty + test_commit empty + empty=.git/objects/e6/9de29bb2d1d6434b8b29ae775ad8c2e48c5391 + rm -f $empty + echo invalid $empty + test_must_fail git fsck --strict + git fsck --strict --connectivity-only + tree=$(git rev-parse HEAD:) + suffix=${tree#??} + tree=.git/objects/${tree%$suffix}/$suffix + rm -f $tree + echo invalid $tree + test_must_fail git fsck --strict --connectivity-only + ) +' + test_done -- 2.3.1.windows.1.9.g8c01ab4 -- To unsubscribe from this list: send the line unsubscribe git in
[PATCH v7 11/19] fsck: Add a simple test for receive.fsck.msg-id
Signed-off-by: Johannes Schindelin johannes.schinde...@gmx.de --- t/t5504-fetch-receive-strict.sh | 21 + 1 file changed, 21 insertions(+) diff --git a/t/t5504-fetch-receive-strict.sh b/t/t5504-fetch-receive-strict.sh index 69ee13c..36024fc 100755 --- a/t/t5504-fetch-receive-strict.sh +++ b/t/t5504-fetch-receive-strict.sh @@ -115,4 +115,25 @@ test_expect_success 'push with transfer.fsckobjects' ' test_cmp exp act ' +cat bogus-commit \EOF +tree 4b825dc642cb6eb9a060e54bf8d69288fbee4904 +author Bugs Bunny 1234567890 + +committer Bugs Bunny b...@bun.ni 1234567890 + + +This commit object intentionally broken +EOF + +test_expect_success 'push with receive.fsck.missingEmail=warn' ' + commit=$(git hash-object -t commit -w --stdin bogus-commit) + git push . $commit:refs/heads/bogus + rm -rf dst + git init dst + git --git-dir=dst/.git config receive.fsckobjects true + test_must_fail git push --porcelain dst bogus + git --git-dir=dst/.git config \ + receive.fsck.missingEmail warn + git push --porcelain dst bogus act 21 + grep missingEmail act +' + test_done -- 2.3.1.windows.1.9.g8c01ab4 -- To unsubscribe from this list: send the line unsubscribe git in
[PATCH v7 08/19] fsck: Make fsck_commit() warn-friendly
When fsck_commit() identifies a problem with the commit, it should try to make it possible to continue checking the commit object, in case the user wants to demote the detected errors to mere warnings. Note that some problems are too problematic to simply ignore. For example, when the header lines are mixed up, we punt after encountering an incorrect line. Therefore, demoting certain warnings to errors can hide other problems. Example: demoting the missingauthor error to a warning would hide a problematic committer line. Signed-off-by: Johannes Schindelin johannes.schinde...@gmx.de --- fsck.c | 28 1 file changed, 20 insertions(+), 8 deletions(-) diff --git a/fsck.c b/fsck.c index d0a7282..ef3bf68 100644 --- a/fsck.c +++ b/fsck.c @@ -536,12 +536,18 @@ static int fsck_commit_buffer(struct commit *commit, const char *buffer, if (!skip_prefix(buffer, tree , buffer)) return report(options, commit-object, FSCK_MSG_MISSING_TREE, invalid format - expected 'tree' line); - if (get_sha1_hex(buffer, tree_sha1) || buffer[40] != '\n') - return report(options, commit-object, FSCK_MSG_BAD_TREE_SHA1, invalid 'tree' line format - bad sha1); + if (get_sha1_hex(buffer, tree_sha1) || buffer[40] != '\n') { + err = report(options, commit-object, FSCK_MSG_BAD_TREE_SHA1, invalid 'tree' line format - bad sha1); + if (err) + return err; + } buffer += 41; while (skip_prefix(buffer, parent , buffer)) { - if (get_sha1_hex(buffer, sha1) || buffer[40] != '\n') - return report(options, commit-object, FSCK_MSG_BAD_PARENT_SHA1, invalid 'parent' line format - bad sha1); + if (get_sha1_hex(buffer, sha1) || buffer[40] != '\n') { + err = report(options, commit-object, FSCK_MSG_BAD_PARENT_SHA1, invalid 'parent' line format - bad sha1); + if (err) + return err; + } buffer += 41; parent_line_count++; } @@ -550,11 +556,17 @@ static int fsck_commit_buffer(struct commit *commit, const char *buffer, if (graft) { if (graft-nr_parent == -1 !parent_count) ; /* shallow commit */ - else if (graft-nr_parent != parent_count) - return report(options, commit-object, FSCK_MSG_MISSING_GRAFT, graft objects missing); + else if (graft-nr_parent != parent_count) { + err = report(options, commit-object, FSCK_MSG_MISSING_GRAFT, graft objects missing); + if (err) + return err; + } } else { - if (parent_count != parent_line_count) - return report(options, commit-object, FSCK_MSG_MISSING_PARENT, parent objects missing); + if (parent_count != parent_line_count) { + err = report(options, commit-object, FSCK_MSG_MISSING_PARENT, parent objects missing); + if (err) + return err; + } } if (!skip_prefix(buffer, author , buffer)) return report(options, commit-object, FSCK_MSG_MISSING_AUTHOR, invalid format - expected 'author' line); -- 2.3.1.windows.1.9.g8c01ab4 -- To unsubscribe from this list: send the line unsubscribe git in
[PATCH v7 07/19] fsck: Make fsck_ident() warn-friendly
When fsck_ident() identifies a problem with the ident, it should still advance the pointer to the next line so that fsck can continue in the case of a mere warning. Signed-off-by: Johannes Schindelin johannes.schinde...@gmx.de --- fsck.c | 49 +++-- 1 file changed, 27 insertions(+), 22 deletions(-) diff --git a/fsck.c b/fsck.c index 0ec02b2..d0a7282 100644 --- a/fsck.c +++ b/fsck.c @@ -481,40 +481,45 @@ static int require_end_of_header(const void *data, unsigned long size, static int fsck_ident(const char **ident, struct object *obj, struct fsck_options *options) { + const char *p = *ident; char *end; - if (**ident == '') + *ident = strchrnul(*ident, '\n'); + if (**ident == '\n') + (*ident)++; + + if (*p == '') return report(options, obj, FSCK_MSG_MISSING_NAME_BEFORE_EMAIL, invalid author/committer line - missing space before email); - *ident += strcspn(*ident, \n); - if (**ident == '') + p += strcspn(p, \n); + if (*p == '') return report(options, obj, FSCK_MSG_BAD_NAME, invalid author/committer line - bad name); - if (**ident != '') + if (*p != '') return report(options, obj, FSCK_MSG_MISSING_EMAIL, invalid author/committer line - missing email); - if ((*ident)[-1] != ' ') + if (p[-1] != ' ') return report(options, obj, FSCK_MSG_MISSING_SPACE_BEFORE_EMAIL, invalid author/committer line - missing space before email); - (*ident)++; - *ident += strcspn(*ident, \n); - if (**ident != '') + p++; + p += strcspn(p, \n); + if (*p != '') return report(options, obj, FSCK_MSG_BAD_EMAIL, invalid author/committer line - bad email); - (*ident)++; - if (**ident != ' ') + p++; + if (*p != ' ') return report(options, obj, FSCK_MSG_MISSING_SPACE_BEFORE_DATE, invalid author/committer line - missing space before date); - (*ident)++; - if (**ident == '0' (*ident)[1] != ' ') + p++; + if (*p == '0' p[1] != ' ') return report(options, obj, FSCK_MSG_ZERO_PADDED_DATE, invalid author/committer line - zero-padded date); - if (date_overflows(strtoul(*ident, end, 10))) + if (date_overflows(strtoul(p, end, 10))) return report(options, obj, FSCK_MSG_BAD_DATE_OVERFLOW, invalid author/committer line - date causes integer overflow); - if (end == *ident || *end != ' ') + if ((end == p || *end != ' ')) return report(options, obj, FSCK_MSG_BAD_DATE, invalid author/committer line - bad date); - *ident = end + 1; - if ((**ident != '+' **ident != '-') || - !isdigit((*ident)[1]) || - !isdigit((*ident)[2]) || - !isdigit((*ident)[3]) || - !isdigit((*ident)[4]) || - ((*ident)[5] != '\n')) + p = end + 1; + if ((*p != '+' *p != '-') || + !isdigit(p[1]) || + !isdigit(p[2]) || + !isdigit(p[3]) || + !isdigit(p[4]) || + (p[5] != '\n')) return report(options, obj, FSCK_MSG_BAD_TIMEZONE, invalid author/committer line - bad time zone); - (*ident) += 6; + p += 6; return 0; } -- 2.3.1.windows.1.9.g8c01ab4 -- To unsubscribe from this list: send the line unsubscribe git in
[PATCH v7 10/19] fsck: Make fsck_tag() warn-friendly
When fsck_tag() identifies a problem with the commit, it should try to make it possible to continue checking the commit object, in case the user wants to demote the detected errors to mere warnings. Just like fsck_commit(), there are certain problems that could hide other issues with the same tag object. For example, if the 'type' line is not encountered in the correct position, the 'tag' line – if there is any – would not be handled at all. Signed-off-by: Johannes Schindelin johannes.schinde...@gmx.de --- fsck.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/fsck.c b/fsck.c index daa07ad..3f76f99 100644 --- a/fsck.c +++ b/fsck.c @@ -642,7 +642,8 @@ static int fsck_tag_buffer(struct tag *tag, const char *data, } if (get_sha1_hex(buffer, sha1) || buffer[40] != '\n') { ret = report(options, tag-object, FSCK_MSG_BAD_OBJECT_SHA1, invalid 'object' line format - bad sha1); - goto done; + if (ret) + goto done; } buffer += 41; -- 2.3.1.windows.1.9.g8c01ab4 -- To unsubscribe from this list: send the line unsubscribe git in
[PATCH v7 15/19] fsck: Document the new receive.fsck.msg-id options
Signed-off-by: Johannes Schindelin johannes.schinde...@gmx.de --- Documentation/config.txt | 14 ++ 1 file changed, 14 insertions(+) diff --git a/Documentation/config.txt b/Documentation/config.txt index 3e37b93..4e5fbea 100644 --- a/Documentation/config.txt +++ b/Documentation/config.txt @@ -2205,6 +2205,20 @@ receive.fsckObjects:: Defaults to false. If not set, the value of `transfer.fsckObjects` is used instead. +receive.fsck.msg-id:: + When `receive.fsckObjects` is set to true, errors can be switched + to warnings and vice versa by configuring the `receive.fsck.msg-id` + setting where the `msg-id` is the fsck message ID and the value + is one of `error`, `warn` or `ignore`. For convenience, fsck prefixes + the error/warning with the message ID, e.g. missingEmail: invalid + author/committer line - missing email means that setting + `receive.fsck.missingEmail = ignore` will hide that issue. ++ +This feature is intended to support working with legacy repositories +which would not pass pushing when `receive.fsckObjects = true`, allowing +the host to accept repositories with certain known issues but still catch +other issues. + receive.unpackLimit:: If the number of objects received in a push is below this limit then the objects will be unpacked into loose object -- 2.3.1.windows.1.9.g8c01ab4 -- To unsubscribe from this list: send the line unsubscribe git in
[PATCH v7 12/19] fsck: Disallow demoting grave fsck errors to warnings
Some kinds of errors are intrinsically unrecoverable (e.g. errors while uncompressing objects). It does not make sense to allow demoting them to mere warnings. Signed-off-by: Johannes Schindelin johannes.schinde...@gmx.de --- fsck.c | 13 +++-- t/t5504-fetch-receive-strict.sh | 11 +++ 2 files changed, 22 insertions(+), 2 deletions(-) diff --git a/fsck.c b/fsck.c index 3f76f99..85535b1 100644 --- a/fsck.c +++ b/fsck.c @@ -9,7 +9,12 @@ #include refs.h #include utf8.h +#define FSCK_FATAL -1 + #define FOREACH_MSG_ID(FUNC) \ + /* fatal errors */ \ + FUNC(NUL_IN_HEADER, FATAL) \ + FUNC(UNTERMINATED_HEADER, FATAL) \ /* errors */ \ FUNC(BAD_DATE, ERROR) \ FUNC(BAD_DATE_OVERFLOW, ERROR) \ @@ -39,11 +44,9 @@ FUNC(MISSING_TYPE, ERROR) \ FUNC(MISSING_TYPE_ENTRY, ERROR) \ FUNC(MULTIPLE_AUTHORS, ERROR) \ - FUNC(NUL_IN_HEADER, ERROR) \ FUNC(TAG_OBJECT_NOT_TAG, ERROR) \ FUNC(TREE_NOT_SORTED, ERROR) \ FUNC(UNKNOWN_TYPE, ERROR) \ - FUNC(UNTERMINATED_HEADER, ERROR) \ FUNC(ZERO_PADDED_DATE, ERROR) \ /* warnings */ \ FUNC(BAD_FILEMODE, WARN) \ @@ -149,6 +152,9 @@ void fsck_set_msg_type(struct fsck_options *options, die(Unhandled message id: %s, msg_id); type = parse_msg_type(msg_type); + if (type != FSCK_ERROR msg_id_info[id].msg_type == FSCK_FATAL) + die(Cannot demote %s to %s, msg_id, msg_type); + if (!options-msg_type) { int i; int *msg_type = xmalloc(sizeof(int) * FSCK_MSG_MAX); @@ -215,6 +221,9 @@ static int report(struct fsck_options *options, struct object *object, struct strbuf sb = STRBUF_INIT; int msg_type = fsck_msg_type(id, options), result; + if (msg_type == FSCK_FATAL) + msg_type = FSCK_ERROR; + append_msg_id(sb, msg_id_info[id].id_string); va_start(ap, fmt); diff --git a/t/t5504-fetch-receive-strict.sh b/t/t5504-fetch-receive-strict.sh index 36024fc..f5d6d0d 100755 --- a/t/t5504-fetch-receive-strict.sh +++ b/t/t5504-fetch-receive-strict.sh @@ -136,4 +136,15 @@ test_expect_success 'push with receive.fsck.missingEmail=warn' ' grep missingEmail act ' +test_expect_success \ + 'receive.fsck.unterminatedHeader=warn triggers error' ' + rm -rf dst + git init dst + git --git-dir=dst/.git config receive.fsckobjects true + git --git-dir=dst/.git config \ + receive.fsck.unterminatedheader warn + test_must_fail git push --porcelain dst HEAD act 21 + grep Cannot demote unterminatedheader act +' + test_done -- 2.3.1.windows.1.9.g8c01ab4 -- To unsubscribe from this list: send the line unsubscribe git in
[PATCH v7 06/19] fsck: Report the ID of the error/warning
Some legacy code has objects with non-fatal fsck issues; To enable the user to ignore those issues, let's print out the ID (e.g. when encountering missingEmail, the user might want to call `git config --add receive.fsck.missingEmail=warn`). Signed-off-by: Johannes Schindelin johannes.schinde...@gmx.de --- fsck.c | 20 t/t1450-fsck.sh | 4 ++-- 2 files changed, 22 insertions(+), 2 deletions(-) diff --git a/fsck.c b/fsck.c index 02af3ed..0ec02b2 100644 --- a/fsck.c +++ b/fsck.c @@ -188,6 +188,24 @@ void fsck_set_msg_types(struct fsck_options *options, const char *values) free(to_free); } +static void append_msg_id(struct strbuf *sb, const char *msg_id) +{ + for (;;) { + char c = *(msg_id)++; + + if (!c) + break; + if (c != '_') + strbuf_addch(sb, tolower(c)); + else { + assert(*msg_id); + strbuf_addch(sb, *(msg_id)++); + } + } + + strbuf_addstr(sb, : ); +} + __attribute__((format (printf, 4, 5))) static int report(struct fsck_options *options, struct object *object, enum fsck_msg_id id, const char *fmt, ...) @@ -196,6 +214,8 @@ static int report(struct fsck_options *options, struct object *object, struct strbuf sb = STRBUF_INIT; int msg_type = fsck_msg_type(id, options), result; + append_msg_id(sb, msg_id_info[id].id_string); + va_start(ap, fmt); strbuf_vaddf(sb, fmt, ap); result = options-error_func(object, msg_type, sb.buf); diff --git a/t/t1450-fsck.sh b/t/t1450-fsck.sh index cfb32b6..7c5b3d5 100755 --- a/t/t1450-fsck.sh +++ b/t/t1450-fsck.sh @@ -231,8 +231,8 @@ test_expect_success 'tag with incorrect tag name missing tagger' ' git fsck --tags 2out cat expect -EOF - warning in tag $tag: invalid '\''tag'\'' name: wrong name format - warning in tag $tag: invalid format - expected '\''tagger'\'' line + warning in tag $tag: badTagName: invalid '\''tag'\'' name: wrong name format + warning in tag $tag: missingTaggerEntry: invalid format - expected '\''tagger'\'' line EOF test_cmp expect out ' -- 2.3.1.windows.1.9.g8c01ab4 -- To unsubscribe from this list: send the line unsubscribe git in
[PATCH v7 16/19] fsck: Support demoting errors to warnings
We already have support in `git receive-pack` to deal with some legacy repositories which have non-fatal issues. Let's make `git fsck` itself useful with such repositories, too, by allowing users to ignore known issues, or at least demote those issues to mere warnings. Example: `git -c fsck.missingEmail=ignore fsck` would hide problems with missing emails in author, committer and tagger lines. In the same spirit that `git receive-pack`'s usage of the fsck machinery differs from `git fsck`'s – some of the non-fatal warnings in `git fsck` are fatal with `git receive-pack` when receive.fsckObjects = true, for example – we strictly separate the fsck.msg-id from the receive.fsck.msg-id settings. Signed-off-by: Johannes Schindelin johannes.schinde...@gmx.de --- Documentation/config.txt | 11 +++ builtin/fsck.c | 12 t/t1450-fsck.sh | 11 +++ 3 files changed, 34 insertions(+) diff --git a/Documentation/config.txt b/Documentation/config.txt index 4e5fbea..bfccd2b 100644 --- a/Documentation/config.txt +++ b/Documentation/config.txt @@ -1250,6 +1250,17 @@ filter.driver.smudge:: object to a worktree file upon checkout. See linkgit:gitattributes[5] for details. +fsck.msg-id:: + Allows overriding the message type (error, warn or ignore) of a + specific message ID such as `missingEmail`. ++ +For convenience, fsck prefixes the error/warning with the message ID, +e.g. missingEmail: invalid author/committer line - missing email means +that setting `fsck.missingEmail = ignore` will hide that issue. ++ +This feature is intended to support working with legacy repositories +which cannot be repaired without disruptive changes. + gc.aggressiveDepth:: The depth parameter used in the delta compression algorithm used by 'git gc --aggressive'. This defaults diff --git a/builtin/fsck.c b/builtin/fsck.c index fff38fe..adaa802 100644 --- a/builtin/fsck.c +++ b/builtin/fsck.c @@ -46,6 +46,16 @@ static int show_dangling = 1; #define DIRENT_SORT_HINT(de) ((de)-d_ino) #endif +static int fsck_config(const char *var, const char *value, void *cb) +{ + if (skip_prefix(var, fsck., var)) { + fsck_set_msg_type(fsck_obj_options, var, value); + return 0; + } + + return git_default_config(var, value, cb); +} + static void objreport(struct object *obj, const char *msg_type, const char *err) { @@ -646,6 +656,8 @@ int cmd_fsck(int argc, const char **argv, const char *prefix) include_reflogs = 0; } + git_config(fsck_config, NULL); + fsck_head_link(); fsck_object_dir(get_object_directory()); diff --git a/t/t1450-fsck.sh b/t/t1450-fsck.sh index 7c5b3d5..1727129 100755 --- a/t/t1450-fsck.sh +++ b/t/t1450-fsck.sh @@ -287,6 +287,17 @@ test_expect_success 'rev-list --verify-objects with bad sha1' ' grep -q error: sha1 mismatch 63ff out ' +test_expect_success 'force fsck to ignore double author' ' + git cat-file commit HEAD basis + sed s/^author .*/,/ basis | tr , \\n multiple-authors + new=$(git hash-object -t commit -w --stdin multiple-authors) + test_when_finished remove_object $new + git update-ref refs/heads/bogus $new + test_when_finished git update-ref -d refs/heads/bogus + test_must_fail git fsck + git -c fsck.multipleAuthors=ignore fsck +' + _bz='\0' _bz5=$_bz$_bz$_bz$_bz$_bz _bz20=$_bz5$_bz5$_bz5$_bz5 -- 2.3.1.windows.1.9.g8c01ab4 -- To unsubscribe from this list: send the line unsubscribe git in
Re: [PATCH v7 04/19] fsck: Offer a function to demote fsck errors to warnings
Johannes Schindelin johannes.schinde...@gmx.de writes: diff --git a/fsck.c b/fsck.c index 1a3f7ce..e81a342 100644 --- a/fsck.c +++ b/fsck.c @@ -64,30 +64,29 @@ enum fsck_msg_id { #undef MSG_ID #define STR(x) #x -#define MSG_ID(id, msg_type) { STR(id), FSCK_##msg_type }, +#define MSG_ID(id, msg_type) { STR(id), NULL, FSCK_##msg_type }, static struct { const char *id_string; + const char *lowercased; int msg_type; } msg_id_info[FSCK_MSG_MAX + 1] = { FOREACH_MSG_ID(MSG_ID) - { NULL, -1 } + { NULL, NULL, -1 } }; #undef MSG_ID static int parse_msg_id(const char *text) { - static char **lowercased; int i; - if (!lowercased) { + if (!msg_id_info[0].lowercased) { /* convert id_string to lower case, without underscores. */ - lowercased = xmalloc(FSCK_MSG_MAX * sizeof(*lowercased)); for (i = 0; i FSCK_MSG_MAX; i++) { const char *p = msg_id_info[i].id_string; int len = strlen(p); char *q = xmalloc(len); - lowercased[i] = q; + msg_id_info[i].lowercased = q; while (*p) if (*p == '_') p++; @@ -98,7 +97,7 @@ static int parse_msg_id(const char *text) } for (i = 0; i FSCK_MSG_MAX; i++) - if (!strcmp(text, lowercased[i])) + if (!strcmp(text, msg_id_info[i].lowercased)) return i; return -1; Heh, this was the first thing that came to my mind when I saw 03/19 that lazily prepares downcased version (which is good) but do so in a separately allocated buffer (which is improved by this change) ;-) IOW, I think all of the above should have been part of 03/19, not oops I belatedly realized that this way is better fixup here. The end result looks good, so let's keep reading. +void fsck_set_msg_types(struct fsck_options *options, const char *values) +{ + char *buf = xstrdup(values), *to_free = buf; + int done = 0; + + while (!done) { + int len = strcspn(buf, ,|), equal; + + done = !buf[len]; + if (!len) { + buf++; + continue; + } + buf[len] = '\0'; + + for (equal = 0; equal len + buf[equal] != '=' buf[equal] != ':'; equal++) Style. I'd format this more like so: for (equal = 0; equal len buf[equal] != '=' buf[equal] != ':'; equal++) + buf[equal] = tolower(buf[equal]); + buf[equal] = '\0'; + + if (equal == len) + die(Missing '=': '%s', buf); + + fsck_set_msg_type(options, buf, buf + equal + 1); + buf += len + 1; + } + free(to_free); +} Overall, the change is good (and it was good in v6, too), and I think it has become simpler to follow the logic with the upfront downcasing. Thanks. -- To unsubscribe from this list: send the line unsubscribe git in
[PATCH v7 04/19] fsck: Offer a function to demote fsck errors to warnings
There are legacy repositories out there whose older commits and tags have issues that prevent pushing them when 'receive.fsckObjects' is set. One real-life example is a commit object that has been hand-crafted to list two authors. Often, it is not possible to fix those issues without disrupting the work with said repositories, yet it is still desirable to perform checks by setting `receive.fsckObjects = true`. This commit is the first step to allow demoting specific fsck issues to mere warnings. The `fsck_set_msg_types()` function added by this commit parses a list of settings in the form: missingemail=warn,badname=warn,... Unfortunately, the FSCK_WARN/FSCK_ERROR flag is only really heeded by git fsck so far, but other call paths (e.g. git index-pack --strict) error out *always* no matter what type was specified. Therefore, we need to take extra care to set all message types to FSCK_ERROR by default in those cases. Signed-off-by: Johannes Schindelin johannes.schinde...@gmx.de --- fsck.c | 88 ++ fsck.h | 9 +-- 2 files changed, 85 insertions(+), 12 deletions(-) diff --git a/fsck.c b/fsck.c index 1a3f7ce..e81a342 100644 --- a/fsck.c +++ b/fsck.c @@ -64,30 +64,29 @@ enum fsck_msg_id { #undef MSG_ID #define STR(x) #x -#define MSG_ID(id, msg_type) { STR(id), FSCK_##msg_type }, +#define MSG_ID(id, msg_type) { STR(id), NULL, FSCK_##msg_type }, static struct { const char *id_string; + const char *lowercased; int msg_type; } msg_id_info[FSCK_MSG_MAX + 1] = { FOREACH_MSG_ID(MSG_ID) - { NULL, -1 } + { NULL, NULL, -1 } }; #undef MSG_ID static int parse_msg_id(const char *text) { - static char **lowercased; int i; - if (!lowercased) { + if (!msg_id_info[0].lowercased) { /* convert id_string to lower case, without underscores. */ - lowercased = xmalloc(FSCK_MSG_MAX * sizeof(*lowercased)); for (i = 0; i FSCK_MSG_MAX; i++) { const char *p = msg_id_info[i].id_string; int len = strlen(p); char *q = xmalloc(len); - lowercased[i] = q; + msg_id_info[i].lowercased = q; while (*p) if (*p == '_') p++; @@ -98,7 +97,7 @@ static int parse_msg_id(const char *text) } for (i = 0; i FSCK_MSG_MAX; i++) - if (!strcmp(text, lowercased[i])) + if (!strcmp(text, msg_id_info[i].lowercased)) return i; return -1; @@ -109,13 +108,78 @@ static int fsck_msg_type(enum fsck_msg_id msg_id, { int msg_type; - msg_type = msg_id_info[msg_id].msg_type; - if (options-strict msg_type == FSCK_WARN) - msg_type = FSCK_ERROR; + assert(msg_id = 0 msg_id FSCK_MSG_MAX); + + if (options-msg_type) + msg_type = options-msg_type[msg_id]; + else { + msg_type = msg_id_info[msg_id].msg_type; + if (options-strict msg_type == FSCK_WARN) + msg_type = FSCK_ERROR; + } return msg_type; } +static int parse_msg_type(const char *str) +{ + if (!strcmp(str, error)) + return FSCK_ERROR; + else if (!strcmp(str, warn)) + return FSCK_WARN; + else + die(Unknown fsck message type: '%s', str); +} + +void fsck_set_msg_type(struct fsck_options *options, + const char *msg_id, const char *msg_type) +{ + int id = parse_msg_id(msg_id), type; + + if (id 0) + die(Unhandled message id: %s, msg_id); + type = parse_msg_type(msg_type); + + if (!options-msg_type) { + int i; + int *msg_type = xmalloc(sizeof(int) * FSCK_MSG_MAX); + for (i = 0; i FSCK_MSG_MAX; i++) + msg_type[i] = fsck_msg_type(i, options); + options-msg_type = msg_type; + } + + options-msg_type[id] = type; +} + +void fsck_set_msg_types(struct fsck_options *options, const char *values) +{ + char *buf = xstrdup(values), *to_free = buf; + int done = 0; + + while (!done) { + int len = strcspn(buf, ,|), equal; + + done = !buf[len]; + if (!len) { + buf++; + continue; + } + buf[len] = '\0'; + + for (equal = 0; equal len + buf[equal] != '=' buf[equal] != ':'; equal++) + buf[equal] = tolower(buf[equal]); + buf[equal] = '\0'; + + if (equal == len) + die(Missing '=': '%s', buf); + + fsck_set_msg_type(options, buf, buf + equal + 1); +
[PATCH v7 00/19] Introduce an internal API to interact with the fsck machinery
At the moment, the git-fsck's integrity checks are targeted toward the end user, i.e. the error messages are really just messages, intended for human consumption. Under certain circumstances, some of those errors should be allowed to be turned into mere warnings, though, because the cost of fixing the issues might well be larger than the cost of carrying those flawed objects. For example, when an already-public repository contains a commit object with two authors for years, it does not make sense to force the maintainer to rewrite the history, affecting all contributors negatively by forcing them to update. This branch introduces an internal fsck API to be able to turn some of the errors into warnings, and to make it easier to call the fsck machinery from elsewhere in general. I am proud to report that this work has been sponsored by GitHub. Changes since v6: - camelCased message IDs - multiple author checking now as suggested by Junio - renamed `--quick` to `--connectivity-only`, better commit message - `fsck.skipList` is now handled correctly (and not mistaken for a message type setting) - `fsck.skipList` can handle user paths now - index-pack configures the walk function in a more logical place now - simplified code by avoiding working on partial strings (i.e. removed `substrcmp()`). This saves 10 lines. To accomodate parsing config variables directly, we now work on lowercased message IDs; unfortunately this means that we cannot use them in append_msg_id() because that function wants to append camelCased message IDs. Interdiff below diffstat. Johannes Schindelin (19): fsck: Introduce fsck options fsck: Introduce identifiers for fsck messages fsck: Provide a function to parse fsck message IDs fsck: Offer a function to demote fsck errors to warnings fsck (receive-pack): Allow demoting errors to warnings fsck: Report the ID of the error/warning fsck: Make fsck_ident() warn-friendly fsck: Make fsck_commit() warn-friendly fsck: Handle multiple authors in commits specially fsck: Make fsck_tag() warn-friendly fsck: Add a simple test for receive.fsck.msg-id fsck: Disallow demoting grave fsck errors to warnings fsck: Optionally ignore specific fsck issues completely fsck: Allow upgrading fsck warnings to errors fsck: Document the new receive.fsck.msg-id options fsck: Support demoting errors to warnings fsck: Introduce `git fsck --connectivity-only` fsck: git receive-pack: support excluding objects from fsck'ing fsck: support ignoring objects in `git fsck` via fsck.skiplist Documentation/config.txt| 41 +++ Documentation/git-fsck.txt | 7 +- builtin/fsck.c | 78 -- builtin/index-pack.c| 13 +- builtin/receive-pack.c | 28 +- builtin/unpack-objects.c| 16 +- fsck.c | 554 +++- fsck.h | 30 ++- t/t1450-fsck.sh | 37 ++- t/t5302-pack-index.sh | 2 +- t/t5504-fetch-receive-strict.sh | 51 11 files changed, 692 insertions(+), 165 deletions(-) diff --git a/Documentation/config.txt b/Documentation/config.txt index 5aba63a..69dda93 100644 --- a/Documentation/config.txt +++ b/Documentation/config.txt @@ -1252,11 +1252,11 @@ filter.driver.smudge:: fsck.msg-id:: Allows overriding the message type (error, warn or ignore) of a - specific message ID such as `missingemail`. + specific message ID such as `missingEmail`. + For convenience, fsck prefixes the error/warning with the message ID, -e.g. missingemail: invalid author/committer line - missing email means -that setting `fsck.missingemail = ignore` will hide that issue. +e.g. missingEmail: invalid author/committer line - missing email means +that setting `fsck.missingEmail = ignore` will hide that issue. + This feature is intended to support working with legacy repositories which cannot be repaired without disruptive changes. @@ -1267,6 +1267,7 @@ fsck.skipList:: be ignored. This feature is useful when an established project should be accepted despite early commits containing errors that can be safely ignored such as invalid committer email addresses. + Note: corrupt objects cannot be skipped with this setting. gc.aggressiveDepth:: The depth parameter used in the delta compression @@ -2228,9 +2229,9 @@ receive.fsck.msg-id:: to warnings and vice versa by configuring the `receive.fsck.msg-id` setting where the `msg-id` is the fsck message ID and the value is one of `error`, `warn` or `ignore`. For convenience, fsck prefixes - the error/warning with the message ID, e.g. missingemail: invalid + the error/warning with the message ID, e.g. missingEmail: invalid author/committer line - missing email means that setting - `receive.fsck.missingemail = ignore` will hide that issue. +
[PATCH v7 03/19] fsck: Provide a function to parse fsck message IDs
These functions will be used in the next commits to allow the user to ask fsck to handle specific problems differently, e.g. demoting certain errors to warnings. The upcoming `fsck_set_msg_types()` function has to handle partial strings because we would like to be able to parse, say, 'missingemail=warn,missingtaggerentry=warn' command line parameters (which will be passed by receive-pack to index-pack and unpack-objects). To make the parsing robust, we generate strings from the enum keys, and using these keys, we match up strings without dashes case-insensitively to the corresponding enum values. Signed-off-by: Johannes Schindelin johannes.schinde...@gmx.de --- fsck.c | 36 ++-- 1 file changed, 34 insertions(+), 2 deletions(-) diff --git a/fsck.c b/fsck.c index ab24618..1a3f7ce 100644 --- a/fsck.c +++ b/fsck.c @@ -63,15 +63,47 @@ enum fsck_msg_id { }; #undef MSG_ID -#define MSG_ID(id, msg_type) { FSCK_##msg_type }, +#define STR(x) #x +#define MSG_ID(id, msg_type) { STR(id), FSCK_##msg_type }, static struct { + const char *id_string; int msg_type; } msg_id_info[FSCK_MSG_MAX + 1] = { FOREACH_MSG_ID(MSG_ID) - { -1 } + { NULL, -1 } }; #undef MSG_ID +static int parse_msg_id(const char *text) +{ + static char **lowercased; + int i; + + if (!lowercased) { + /* convert id_string to lower case, without underscores. */ + lowercased = xmalloc(FSCK_MSG_MAX * sizeof(*lowercased)); + for (i = 0; i FSCK_MSG_MAX; i++) { + const char *p = msg_id_info[i].id_string; + int len = strlen(p); + char *q = xmalloc(len); + + lowercased[i] = q; + while (*p) + if (*p == '_') + p++; + else + *(q)++ = tolower(*(p)++); + *q = '\0'; + } + } + + for (i = 0; i FSCK_MSG_MAX; i++) + if (!strcmp(text, lowercased[i])) + return i; + + return -1; +} + static int fsck_msg_type(enum fsck_msg_id msg_id, struct fsck_options *options) { -- 2.3.1.windows.1.9.g8c01ab4 -- To unsubscribe from this list: send the line unsubscribe git in
[PATCH v7 05/19] fsck (receive-pack): Allow demoting errors to warnings
For example, missing emails in commit and tag objects can be demoted to mere warnings with git config receive.fsck.missingemail=warn The value is actually a comma-separated list. In case that the same key is listed in multiple receive.fsck.msg-id lines in the config, the latter configuration wins (this can happen for example when both $HOME/.gitconfig and .git/config contain message type settings). As git receive-pack does not actually perform the checks, it hands off the setting to index-pack or unpack-objects in the form of an optional argument to the --strict option. Signed-off-by: Johannes Schindelin johannes.schinde...@gmx.de --- builtin/index-pack.c | 4 builtin/receive-pack.c | 17 +++-- builtin/unpack-objects.c | 5 + fsck.c | 8 fsck.h | 1 + 5 files changed, 33 insertions(+), 2 deletions(-) diff --git a/builtin/index-pack.c b/builtin/index-pack.c index f8b0c64..f0d283b 100644 --- a/builtin/index-pack.c +++ b/builtin/index-pack.c @@ -1633,6 +1633,10 @@ int cmd_index_pack(int argc, const char **argv, const char *prefix) } else if (!strcmp(arg, --strict)) { strict = 1; do_fsck_object = 1; + } else if (skip_prefix(arg, --strict=, arg)) { + strict = 1; + do_fsck_object = 1; + fsck_set_msg_types(fsck_options, arg); } else if (!strcmp(arg, --check-self-contained-and-connected)) { strict = 1; check_self_contained_and_connected = 1; diff --git a/builtin/receive-pack.c b/builtin/receive-pack.c index 94d0571..3afe8f8 100644 --- a/builtin/receive-pack.c +++ b/builtin/receive-pack.c @@ -19,6 +19,7 @@ #include tag.h #include gpg-interface.h #include sigchain.h +#include fsck.h static const char receive_pack_usage[] = git receive-pack git-dir; @@ -36,6 +37,7 @@ static enum deny_action deny_current_branch = DENY_UNCONFIGURED; static enum deny_action deny_delete_current = DENY_UNCONFIGURED; static int receive_fsck_objects = -1; static int transfer_fsck_objects = -1; +static struct strbuf fsck_msg_types = STRBUF_INIT; static int receive_unpack_limit = -1; static int transfer_unpack_limit = -1; static int advertise_atomic_push = 1; @@ -115,6 +117,15 @@ static int receive_pack_config(const char *var, const char *value, void *cb) return 0; } + if (skip_prefix(var, receive.fsck., var)) { + if (is_valid_msg_type(var, value)) + strbuf_addf(fsck_msg_types, %c%s=%s, + fsck_msg_types.len ? ',' : '=', var, value); + else + warning(Skipping unknown msg id '%s', var); + return 0; + } + if (strcmp(var, receive.fsckobjects) == 0) { receive_fsck_objects = git_config_bool(var, value); return 0; @@ -1490,7 +1501,8 @@ static const char *unpack(int err_fd, struct shallow_info *si) if (quiet) argv_array_push(child.args, -q); if (fsck_objects) - argv_array_push(child.args, --strict); + argv_array_pushf(child.args, --strict%s, + fsck_msg_types.buf); child.no_stdout = 1; child.err = err_fd; child.git_cmd = 1; @@ -1508,7 +1520,8 @@ static const char *unpack(int err_fd, struct shallow_info *si) argv_array_pushl(child.args, index-pack, --stdin, hdr_arg, keep_arg, NULL); if (fsck_objects) - argv_array_push(child.args, --strict); + argv_array_pushf(child.args, --strict%s, + fsck_msg_types.buf); if (fix_thin) argv_array_push(child.args, --fix-thin); child.out = -1; diff --git a/builtin/unpack-objects.c b/builtin/unpack-objects.c index 6d17040..7cc086f 100644 --- a/builtin/unpack-objects.c +++ b/builtin/unpack-objects.c @@ -530,6 +530,11 @@ int cmd_unpack_objects(int argc, const char **argv, const char *prefix) strict = 1; continue; } + if (skip_prefix(arg, --strict=, arg)) { + strict = 1; + fsck_set_msg_types(fsck_options, arg); + continue; + } if (starts_with(arg, --pack_header=)) { struct pack_header *hdr; char *c; diff --git a/fsck.c b/fsck.c index e81a342..02af3ed 100644 --- a/fsck.c
[PATCH v7 01/19] fsck: Introduce fsck options
Just like the diff machinery, we are about to introduce more settings, therefore it makes sense to carry them around as a (pointer to a) struct containing all of them. Signed-off-by: Johannes Schindelin johannes.schinde...@gmx.de --- builtin/fsck.c | 20 +-- builtin/index-pack.c | 9 +-- builtin/unpack-objects.c | 11 ++-- fsck.c | 150 +++ fsck.h | 17 +- 5 files changed, 114 insertions(+), 93 deletions(-) diff --git a/builtin/fsck.c b/builtin/fsck.c index 2679793..981dca5 100644 --- a/builtin/fsck.c +++ b/builtin/fsck.c @@ -25,6 +25,8 @@ static int include_reflogs = 1; static int check_full = 1; static int check_strict; static int keep_cache_objects; +static struct fsck_options fsck_walk_options = FSCK_OPTIONS_DEFAULT; +static struct fsck_options fsck_obj_options = FSCK_OPTIONS_DEFAULT; static struct object_id head_oid; static const char *head_points_at; static int errors_found; @@ -76,7 +78,7 @@ static int fsck_error_func(struct object *obj, int type, const char *err, ...) static struct object_array pending; -static int mark_object(struct object *obj, int type, void *data) +static int mark_object(struct object *obj, int type, void *data, struct fsck_options *options) { struct object *parent = data; @@ -119,7 +121,7 @@ static int mark_object(struct object *obj, int type, void *data) static void mark_object_reachable(struct object *obj) { - mark_object(obj, OBJ_ANY, NULL); + mark_object(obj, OBJ_ANY, NULL, NULL); } static int traverse_one_object(struct object *obj) @@ -132,7 +134,7 @@ static int traverse_one_object(struct object *obj) if (parse_tree(tree) 0) return 1; /* error already displayed */ } - result = fsck_walk(obj, mark_object, obj); + result = fsck_walk(obj, obj, fsck_walk_options); if (tree) free_tree_buffer(tree); return result; @@ -158,7 +160,7 @@ static int traverse_reachable(void) return !!result; } -static int mark_used(struct object *obj, int type, void *data) +static int mark_used(struct object *obj, int type, void *data, struct fsck_options *options) { if (!obj) return 1; @@ -296,9 +298,9 @@ static int fsck_obj(struct object *obj) fprintf(stderr, Checking %s %s\n, typename(obj-type), sha1_to_hex(obj-sha1)); - if (fsck_walk(obj, mark_used, NULL)) + if (fsck_walk(obj, NULL, fsck_obj_options)) objerror(obj, broken links); - if (fsck_object(obj, NULL, 0, check_strict, fsck_error_func)) + if (fsck_object(obj, NULL, 0, fsck_obj_options)) return -1; if (obj-type == OBJ_TREE) { @@ -638,6 +640,12 @@ int cmd_fsck(int argc, const char **argv, const char *prefix) argc = parse_options(argc, argv, prefix, fsck_opts, fsck_usage, 0); + fsck_walk_options.walk = mark_object; + fsck_obj_options.walk = mark_used; + fsck_obj_options.error_func = fsck_error_func; + if (check_strict) + fsck_obj_options.strict = 1; + if (show_progress == -1) show_progress = isatty(2); if (verbose) diff --git a/builtin/index-pack.c b/builtin/index-pack.c index 48fa472..f8b0c64 100644 --- a/builtin/index-pack.c +++ b/builtin/index-pack.c @@ -75,6 +75,7 @@ static int nr_threads; static int from_stdin; static int strict; static int do_fsck_object; +static struct fsck_options fsck_options = FSCK_OPTIONS_STRICT; static int verbose; static int show_stat; static int check_self_contained_and_connected; @@ -192,7 +193,7 @@ static void cleanup_thread(void) #endif -static int mark_link(struct object *obj, int type, void *data) +static int mark_link(struct object *obj, int type, void *data, struct fsck_options *options) { if (!obj) return -1; @@ -838,10 +839,9 @@ static void sha1_object(const void *data, struct object_entry *obj_entry, if (!obj) die(_(invalid %s), typename(type)); if (do_fsck_object - fsck_object(obj, buf, size, 1, - fsck_error_function)) + fsck_object(obj, buf, size, fsck_options)) die(_(Error in object)); - if (fsck_walk(obj, mark_link, NULL)) + if (fsck_walk(obj, NULL, fsck_options)) die(_(Not all child objects of %s are reachable), sha1_to_hex(obj-sha1)); if (obj-type == OBJ_TREE) { @@ -1615,6 +1615,7 @@ int cmd_index_pack(int argc, const char **argv, const char *prefix) usage(index_pack_usage); check_replace_refs = 0; + fsck_options.walk = mark_link;
[PATCH v7 02/19] fsck: Introduce identifiers for fsck messages
Instead of specifying whether a message by the fsck machinery constitutes an error or a warning, let's specify an identifier relating to the concrete problem that was encountered. This is necessary for upcoming support to be able to demote certain errors to warnings. In the process, simplify the requirements on the calling code: instead of having to handle full-blown varargs in every callback, we now send a string buffer ready to be used by the callback. We could use a simple enum for the message IDs here, but we want to guarantee that the enum values are associated with the appropriate message types (i.e. error or warning?). Besides, we want to introduce a parser in the next commit that maps the string representation to the enum value, hence we use the slightly ugly preprocessor construct that is extensible for use with said parser. Signed-off-by: Johannes Schindelin johannes.schinde...@gmx.de --- builtin/fsck.c | 26 +++- fsck.c | 201 + fsck.h | 5 +- 3 files changed, 154 insertions(+), 78 deletions(-) diff --git a/builtin/fsck.c b/builtin/fsck.c index 981dca5..fff38fe 100644 --- a/builtin/fsck.c +++ b/builtin/fsck.c @@ -46,33 +46,23 @@ static int show_dangling = 1; #define DIRENT_SORT_HINT(de) ((de)-d_ino) #endif -static void objreport(struct object *obj, const char *severity, - const char *err, va_list params) +static void objreport(struct object *obj, const char *msg_type, + const char *err) { - fprintf(stderr, %s in %s %s: , - severity, typename(obj-type), sha1_to_hex(obj-sha1)); - vfprintf(stderr, err, params); - fputs(\n, stderr); + fprintf(stderr, %s in %s %s: %s\n, + msg_type, typename(obj-type), sha1_to_hex(obj-sha1), err); } -__attribute__((format (printf, 2, 3))) -static int objerror(struct object *obj, const char *err, ...) +static int objerror(struct object *obj, const char *err) { - va_list params; - va_start(params, err); errors_found |= ERROR_OBJECT; - objreport(obj, error, err, params); - va_end(params); + objreport(obj, error, err); return -1; } -__attribute__((format (printf, 3, 4))) -static int fsck_error_func(struct object *obj, int type, const char *err, ...) +static int fsck_error_func(struct object *obj, int type, const char *message) { - va_list params; - va_start(params, err); - objreport(obj, (type == FSCK_WARN) ? warning : error, err, params); - va_end(params); + objreport(obj, (type == FSCK_WARN) ? warning : error, message); return (type == FSCK_WARN) ? 0 : 1; } diff --git a/fsck.c b/fsck.c index d83b811..ab24618 100644 --- a/fsck.c +++ b/fsck.c @@ -9,6 +9,98 @@ #include refs.h #include utf8.h +#define FOREACH_MSG_ID(FUNC) \ + /* errors */ \ + FUNC(BAD_DATE, ERROR) \ + FUNC(BAD_DATE_OVERFLOW, ERROR) \ + FUNC(BAD_EMAIL, ERROR) \ + FUNC(BAD_NAME, ERROR) \ + FUNC(BAD_OBJECT_SHA1, ERROR) \ + FUNC(BAD_PARENT_SHA1, ERROR) \ + FUNC(BAD_TAG_OBJECT, ERROR) \ + FUNC(BAD_TIMEZONE, ERROR) \ + FUNC(BAD_TREE, ERROR) \ + FUNC(BAD_TREE_SHA1, ERROR) \ + FUNC(BAD_TYPE, ERROR) \ + FUNC(DUPLICATE_ENTRIES, ERROR) \ + FUNC(MISSING_AUTHOR, ERROR) \ + FUNC(MISSING_COMMITTER, ERROR) \ + FUNC(MISSING_EMAIL, ERROR) \ + FUNC(MISSING_GRAFT, ERROR) \ + FUNC(MISSING_NAME_BEFORE_EMAIL, ERROR) \ + FUNC(MISSING_OBJECT, ERROR) \ + FUNC(MISSING_PARENT, ERROR) \ + FUNC(MISSING_SPACE_BEFORE_DATE, ERROR) \ + FUNC(MISSING_SPACE_BEFORE_EMAIL, ERROR) \ + FUNC(MISSING_TAG, ERROR) \ + FUNC(MISSING_TAG_ENTRY, ERROR) \ + FUNC(MISSING_TAG_OBJECT, ERROR) \ + FUNC(MISSING_TREE, ERROR) \ + FUNC(MISSING_TYPE, ERROR) \ + FUNC(MISSING_TYPE_ENTRY, ERROR) \ + FUNC(NUL_IN_HEADER, ERROR) \ + FUNC(TAG_OBJECT_NOT_TAG, ERROR) \ + FUNC(TREE_NOT_SORTED, ERROR) \ + FUNC(UNKNOWN_TYPE, ERROR) \ + FUNC(UNTERMINATED_HEADER, ERROR) \ + FUNC(ZERO_PADDED_DATE, ERROR) \ + /* warnings */ \ + FUNC(BAD_FILEMODE, WARN) \ + FUNC(BAD_TAG_NAME, WARN) \ + FUNC(EMPTY_NAME, WARN) \ + FUNC(FULL_PATHNAME, WARN) \ + FUNC(HAS_DOT, WARN) \ + FUNC(HAS_DOTDOT, WARN) \ + FUNC(HAS_DOTGIT, WARN) \ + FUNC(MISSING_TAGGER_ENTRY, WARN) \ + FUNC(NULL_SHA1, WARN) \ + FUNC(ZERO_PADDED_FILEMODE, WARN) + +#define MSG_ID(id, msg_type) FSCK_MSG_##id, +enum fsck_msg_id { + FOREACH_MSG_ID(MSG_ID) + FSCK_MSG_MAX +}; +#undef MSG_ID + +#define MSG_ID(id, msg_type) { FSCK_##msg_type }, +static struct { + int msg_type; +} msg_id_info[FSCK_MSG_MAX + 1] = { + FOREACH_MSG_ID(MSG_ID) + { -1 } +}; +#undef MSG_ID + +static int fsck_msg_type(enum fsck_msg_id msg_id, + struct fsck_options *options) +{ +
Re: [PATCH v4 05/19] ref-filter: add parse_opt_merge_filter()
Matthieu Moy matthieu@grenoble-inp.fr writes: Junio C Hamano gits...@pobox.com writes: This is copied from 'builtin/branch.c' which will eventually be removed when we port 'branch.c' to use ref-filter APIs. Hmph. I somehow thought Matthieu's instruction was to finish tag.c side first I would call in advice rather than instruction. Hmph, the choice of the word came from my understanding that a mentor is like a teacher, but advice is fine by me ;-) I still think we should prioritize the tag.c side, but if this patch is ready, it makes sense to keep it in the series. OK. -- To unsubscribe from this list: send the line unsubscribe git in
[PATCH v7 13/19] fsck: Optionally ignore specific fsck issues completely
An fsck issue in a legacy repository might be so common that one would like not to bother the user with mentioning it at all. With this change, that is possible by setting the respective message type to ignore. This change abuses the missingEmail=warn test to verify that ignore is also accepted and works correctly. And while at it, it makes sure that multiple options work, too (they are passed to unpack-objects or index-pack as a comma-separated list via the --strict=... command-line option). Signed-off-by: Johannes Schindelin johannes.schinde...@gmx.de --- fsck.c | 5 + fsck.h | 1 + t/t5504-fetch-receive-strict.sh | 9 - 3 files changed, 14 insertions(+), 1 deletion(-) diff --git a/fsck.c b/fsck.c index 85535b1..cbfff1f 100644 --- a/fsck.c +++ b/fsck.c @@ -131,6 +131,8 @@ static int parse_msg_type(const char *str) return FSCK_ERROR; else if (!strcmp(str, warn)) return FSCK_WARN; + else if (!strcmp(str, ignore)) + return FSCK_IGNORE; else die(Unknown fsck message type: '%s', str); } @@ -221,6 +223,9 @@ static int report(struct fsck_options *options, struct object *object, struct strbuf sb = STRBUF_INIT; int msg_type = fsck_msg_type(id, options), result; + if (msg_type == FSCK_IGNORE) + return 0; + if (msg_type == FSCK_FATAL) msg_type = FSCK_ERROR; diff --git a/fsck.h b/fsck.h index 3ef92a3..1dab276 100644 --- a/fsck.h +++ b/fsck.h @@ -3,6 +3,7 @@ #define FSCK_ERROR 1 #define FSCK_WARN 2 +#define FSCK_IGNORE 3 struct fsck_options; diff --git a/t/t5504-fetch-receive-strict.sh b/t/t5504-fetch-receive-strict.sh index f5d6d0d..af373ba 100755 --- a/t/t5504-fetch-receive-strict.sh +++ b/t/t5504-fetch-receive-strict.sh @@ -133,7 +133,14 @@ test_expect_success 'push with receive.fsck.missingEmail=warn' ' git --git-dir=dst/.git config \ receive.fsck.missingEmail warn git push --porcelain dst bogus act 21 - grep missingEmail act + grep missingEmail act + git --git-dir=dst/.git branch -D bogus + git --git-dir=dst/.git config --add \ + receive.fsck.missingEmail ignore + git --git-dir=dst/.git config --add \ + receive.fsck.badDate warn + git push --porcelain dst bogus act 21 + test_must_fail grep missingEmail act ' test_expect_success \ -- 2.3.1.windows.1.9.g8c01ab4 -- To unsubscribe from this list: send the line unsubscribe git in
[PATCH v7 14/19] fsck: Allow upgrading fsck warnings to errors
The 'invalid tag name' and 'missing tagger entry' warnings can now be upgraded to errors by specifying `invalidTagName` and `missingTaggerEntry` in the receive.fsck.msg-id config setting. Incidentally, the missing tagger warning is now really shown as a warning (as opposed to being reported with the error: prefix, as it used to be the case before this commit). Signed-off-by: Johannes Schindelin johannes.schinde...@gmx.de --- fsck.c| 24 +--- t/t5302-pack-index.sh | 2 +- 2 files changed, 18 insertions(+), 8 deletions(-) diff --git a/fsck.c b/fsck.c index cbfff1f..f6fc384 100644 --- a/fsck.c +++ b/fsck.c @@ -10,6 +10,7 @@ #include utf8.h #define FSCK_FATAL -1 +#define FSCK_INFO -2 #define FOREACH_MSG_ID(FUNC) \ /* fatal errors */ \ @@ -50,15 +51,16 @@ FUNC(ZERO_PADDED_DATE, ERROR) \ /* warnings */ \ FUNC(BAD_FILEMODE, WARN) \ - FUNC(BAD_TAG_NAME, WARN) \ FUNC(EMPTY_NAME, WARN) \ FUNC(FULL_PATHNAME, WARN) \ FUNC(HAS_DOT, WARN) \ FUNC(HAS_DOTDOT, WARN) \ FUNC(HAS_DOTGIT, WARN) \ - FUNC(MISSING_TAGGER_ENTRY, WARN) \ FUNC(NULL_SHA1, WARN) \ - FUNC(ZERO_PADDED_FILEMODE, WARN) + FUNC(ZERO_PADDED_FILEMODE, WARN) \ + /* infos (reported as warnings, but ignored by default) */ \ + FUNC(BAD_TAG_NAME, INFO) \ + FUNC(MISSING_TAGGER_ENTRY, INFO) #define MSG_ID(id, msg_type) FSCK_MSG_##id, enum fsck_msg_id { @@ -228,6 +230,8 @@ static int report(struct fsck_options *options, struct object *object, if (msg_type == FSCK_FATAL) msg_type = FSCK_ERROR; + else if (msg_type == FSCK_INFO) + msg_type = FSCK_WARN; append_msg_id(sb, msg_id_info[id].id_string); @@ -686,15 +690,21 @@ static int fsck_tag_buffer(struct tag *tag, const char *data, goto done; } strbuf_addf(sb, refs/tags/%.*s, (int)(eol - buffer), buffer); - if (check_refname_format(sb.buf, 0)) - report(options, tag-object, FSCK_MSG_BAD_TAG_NAME, + if (check_refname_format(sb.buf, 0)) { + ret = report(options, tag-object, FSCK_MSG_BAD_TAG_NAME, invalid 'tag' name: %.*s, (int)(eol - buffer), buffer); + if (ret) + goto done; + } buffer = eol + 1; - if (!skip_prefix(buffer, tagger , buffer)) + if (!skip_prefix(buffer, tagger , buffer)) { /* early tags do not contain 'tagger' lines; warn only */ - report(options, tag-object, FSCK_MSG_MISSING_TAGGER_ENTRY, invalid format - expected 'tagger' line); + ret = report(options, tag-object, FSCK_MSG_MISSING_TAGGER_ENTRY, invalid format - expected 'tagger' line); + if (ret) + goto done; + } else ret = fsck_ident(buffer, tag-object, options); diff --git a/t/t5302-pack-index.sh b/t/t5302-pack-index.sh index 61bc8da..3dc5ec4 100755 --- a/t/t5302-pack-index.sh +++ b/t/t5302-pack-index.sh @@ -259,7 +259,7 @@ EOF thirtyeight=${tag#??} rm -f .git/objects/${tag%$thirtyeight}/$thirtyeight git index-pack --strict tag-test-${pack1}.pack 2err -grep ^error:.* expected .tagger. line err +grep ^warning:.* expected .tagger. line err ' test_done -- 2.3.1.windows.1.9.g8c01ab4 -- To unsubscribe from this list: send the line unsubscribe git in
Re: 'eol' documentation confusion
Torsten Bögershausen tbo...@web.de writes: eol=lf or eol=crlf are the only useful settings. Everything else is ignored because it does not make sense. See convert.c: static enum eol git_path_check_eol() That makes me wonder... The original reasoning behind the current behaviour that we ignore unknown values given to configuration variables and attributes is so that people can use the same file that has values that are understood by newer versions of Git with older versions of Git. You may be trying the eol=cleverLF setting introduced in Git version 47-prerelease by adding it to .git/info/attributes, and may have found it useful. But you may also have to use the same repository on another machine that you didn't install that future version of Git over the network filesystem. Barfing and not proceeding when we see unknown eol=cleverLF does not sound like a nice thing to do, which is why we just ignore and behave as if the setting was not there. Ideally, however, I think we should ignore an unknown setting as long as it does not matter (i.e. we do not come to the codepath that wants to know eol settings for the path, e.g. running git log to show only the commit log messages and the topology of the history), but we should error out when the unknown setting possibly matters (i.e. we do need the eol setting for the path in order to correctly convert the contents to end-user's liking). Thoughts (and patches ;-)? -- To unsubscribe from this list: send the line unsubscribe git in
Re: [Question] Is it normal for accented characters to be shown as decomposed Unicode on GNU/Linux?
Le 22/06/2015 17:04, Charles Bailey a écrit : Note that these aren't decomposed (in the unicode decomposition sense) but are merely octal escaped representations of the utf-8 encoded file names. Thanks, I had read that term in similar context (German umlaut) and thought it was correctly describing the phenomenon. Key words octal escape return more precise results :) My understanding that this is normal and probably dates back (at least for status as far as: commit a734d0b10bd0f5554abb3acdf11426040cfc4df0 Author: Dmitry Potapov dpota...@gmail.com Date: Fri Mar 7 05:30:58 2008 +0300 Make private quote_path() in wt-status.c available as quote_path_relative() [...] The behaviour can be changed by setting the git config variable core.quotePath to false. This is awesome, thank you. Indeed I just tried my test case with this config option set to false and accented characters appear normally. Thank you! -- To unsubscribe from this list: send the line unsubscribe git in
Re: Submodule and proxy server.
On 22/06, Jamie Archibald wrote: fatal: unable to access 'http://http://path/to/submodule/MySubmodule.git/': The requested URL returned error: 502 Did you copy this error verbatim? -- Sincerely, Johannes Löthberg PGP Key ID: 0x50FB9B273A9D0BB5 https://theos.kyriasis.com/~kyrias/ signature.asc Description: PGP signature
Re: apply --cached --whitespace=fix now failing on items added with add -N
Duy Nguyen pclo...@gmail.com writes: On Mon, Jun 22, 2015 at 9:29 PM, Patrick Higgins phigg...@google.com wrote: I like to use git to remove trailing whitespace from my files. I use the following ~/.gitconfig to make this convenient: [alias] wsadd = !sh -c 'git diff -- \$@\ | git apply --cached --whitespace=fix;\ git checkout -- ${1-.} \$@\' - The wsadd alias doesn't work with new files, so I have to use git add -N on them first. As of a week or two ago, the apply --cached step now fails with the following, assuming a new file named bar containing foo has been added with add -N: $ git diff -- $@ | git apply --cached --whitespace=fix stdin:7: trailing whitespace. foo error: bar: already exists in index The final git checkout at the end of wsadd truncates my file. I've changed the ; to a to fix the truncation. Were there any recent changes to git apply that might have caused this? Probably fallout from this one, merged to 'master' about 5 weeks ago. I'll have a look at it tomorrow unless somebody does it first d95d728 (diff-lib.c: adjust position of i-t-a entries in diff - 2015-03-16) Yeah, the world order has changed by that commit, and I would expect to see some more fallouts. After add -N, git diff used to pretend that an empty blob was in the index, and showed a modification between an existing empty and the full file contents, and git apply --cached was happy to take that modification. In the new world order, git diff is instructed to pretend as if the path that was added by add -N does not exist yet in the index at all, but the index still knows about the path, which is a strange half-born state. It shows an addition of the full file contents as a new file. git apply --cached sees an addition of a new path, which requires that the path does not exist in the index. In the new world order, it should be taught to pretend that these add -N paths do not exist in the index, but that was not done. Something like the attached (totally untested) may be a good starting point. For another potential fallout, try this: $ cp COPYING RENAMING $ cp COPYING UNTRACKED $ EMPTY $ git add -N RENAMING EMPTY $ git rm UNTRACKED fatal: pathspec 'UNTRACKED' did not match any files $ git rm EMPTY RENAMING error: the following file has staged content different from both the file and the HEAD: EMPTY RENAMING (use -f to force removal) One could argue that these three should behave the same way, if the new world order is path added by 'add -N' does not exist in the index. I however think the new world order should be slightly different from does not exist in the index, but should be more like the index is aware of its presence but has not been told about its contents yet. The consequences of this are: - git rm RENAMING shouldn't say 'did not match any files'. - git rm RENAMING does not know about 'staged content', so complaining about staged contents different from file and HEAD feels wrong. Having said that, I do think erroring out and requiring -f is the right thing when remiving RENAMING and EMPTY. Unlike contents added by git add without -N, we do not know what is in the working tree file at all. Given that we check and refuse when the contents are different between the working tree file, the index and the HEAD even when we know what was in the working tree when git add without -N was done, we should keep the safety to prevent accidental removal of the working tree file, which has the only existing copy of the user content. On the other hand, I am also OK if the behaviour was like this: $ git rm --cached RENAMING ... removed without complaints ... $ git rm RENAMING error: file 'RENAMING' does not have staged content yet. (use -f to force removal) In any case, here is a how about this weather-balloon patch for git apply builtin/apply.c | 13 - 1 file changed, 12 insertions(+), 1 deletion(-) diff --git a/builtin/apply.c b/builtin/apply.c index 146be97..653191e 100644 --- a/builtin/apply.c +++ b/builtin/apply.c @@ -3546,12 +3546,23 @@ static int check_preimage(struct patch *patch, struct cache_entry **ce, struct s #define EXISTS_IN_INDEX 1 #define EXISTS_IN_WORKTREE 2 +static int exists_in_index(const char *new_name) +{ + int pos = cache_name_pos(new_name, strlen(new_name)); + + if (pos 0) + return 0; + if (active_cache[pos]-ce_flags CE_INTENT_TO_ADD) + return 0; + return 1; +} + static int check_to_create(const char *new_name, int ok_if_exists) { struct stat nst; if (check_index - cache_name_pos(new_name, strlen(new_name)) = 0 + exists_in_index(new_name) !ok_if_exists) return EXISTS_IN_INDEX; if (cached) -- To unsubscribe from this list: send the line unsubscribe git in
Re: [PATCH v4 08/19] parse-option: rename parse_opt_with_commit()
On Mon, Jun 22, 2015 at 6:34 AM, Junio C Hamano gits...@pobox.com wrote: On Sun, Jun 21, 2015 at 1:48 PM, Karthik Nayak karthik@gmail.com wrote: Rename parse_opt_with_commit() to parse_opt_commit_object_name() to show that it can be used to obtain a list of commits and is not constricted to usage of '--contains' option. I think that is a brilliant idea, but unlike the other function you added earlier that can do only one object and adopts last one wins rule, this is cumulative, and that fact should be made clear to the developers in some way, no? Will add a comment :) Thanks! -- Regards, Karthik Nayak -- To unsubscribe from this list: send the line unsubscribe git in
Re: [PATCH v4 08/19] parse-option: rename parse_opt_with_commit()
On Mon, Jun 22, 2015 at 9:57 PM, Junio C Hamano gits...@pobox.com wrote: On Mon, Jun 22, 2015 at 8:38 AM, Karthik Nayak karthik@gmail.com wrote: On Mon, Jun 22, 2015 at 6:34 AM, Junio C Hamano gits...@pobox.com wrote: On Sun, Jun 21, 2015 at 1:48 PM, Karthik Nayak karthik@gmail.com wrote: Rename parse_opt_with_commit() to parse_opt_commit_object_name() to show that it can be used to obtain a list of commits and is not constricted to usage of '--contains' option. I think that is a brilliant idea, but unlike the other function you added earlier that can do only one object and adopts last one wins rule, this is cumulative, and that fact should be made clear to the developers in some way, no? Will add a comment I didn't mean that. Can't we indicate this with plural somewhere in the name? parse_opt_commits(), parse_opt_commit_into_list(), etc.? Definitely! thanks :) -- Regards, Karthik Nayak -- To unsubscribe from this list: send the line unsubscribe git in
Re: [PATCH v4 00/11] add options to for-each-ref
Just FYI, you can git format-patch -11 my-work~8 or something like that and get 01/11 to 11/11 even if you have more commits that are not yet ready near the tip. I usually do a `git format-patch a..b` but I missed out the b it seems ;-) Thanks! -- Regards, Karthik Nayak -- To unsubscribe from this list: send the line unsubscribe git in
Re: [PATCH v7 13/19] fsck: Optionally ignore specific fsck issues completely
Johannes Schindelin johannes.schinde...@gmx.de writes: + git --git-dir=dst/.git branch -D bogus + git --git-dir=dst/.git config --add \ + receive.fsck.missingEmail ignore + git --git-dir=dst/.git config --add \ + receive.fsck.badDate warn Funny double-SP (will locally fix). There are a few other minor style nits (not in this patch but in other patches in the series) like s/free((char *) var)/free((char *)var)/; that I locally added SQUASH, and I may also have tweaked some log messages. Please check what you will see in 'pu' when I push the day's integration out later. Thanks. -- To unsubscribe from this list: send the line unsubscribe git in
Submodule and proxy server.
I am behind a proxy at work and I've setup a git repo with a submodule. Here are the commands I'm executing. $ mkdir MyProject $ cd MyProject $ git init $ git remote add origin http://path/to/repo/MyProject.git $ git config —add remote.origin.proxy $ git pull origin master MyProject has a submodule called MySubmodule The above commands work and have no problem pulling down the files to local. The issue occurs when I attempt to update the submodule $ git submodule init Submodule 'MySubmodule' (http://path/to/submodule/MySubmodule.git) registered for path 'MySubmodule' $ git submodule update Cloning into 'MySubmodule'... fatal: unable to access 'http://http://path/to/submodule/MySubmodule.git/': The requested URL returned error: 502 Clone of 'http://path/to/submodule/MySubmodule' into sub module path 'MySubmodule' failed It seems I am getting a 502. I have a funny feeling that perhaps the submodule command is not inheriting the proxy settings I've configured above. Any help would be appreciated. GIT rocks! -- Jamie -- To unsubscribe from this list: send the line unsubscribe git in
Re: apply --cached --whitespace=fix now failing on items added with add -N
On Mon, Jun 22, 2015 at 9:29 PM, Patrick Higgins phigg...@google.com wrote: I like to use git to remove trailing whitespace from my files. I use the following ~/.gitconfig to make this convenient: [alias] wsadd = !sh -c 'git diff -- \$@\ | git apply --cached --whitespace=fix;\ git checkout -- ${1-.} \$@\' - The wsadd alias doesn't work with new files, so I have to use git add -N on them first. As of a week or two ago, the apply --cached step now fails with the following, assuming a new file named bar containing foo has been added with add -N: $ git diff -- $@ | git apply --cached --whitespace=fix stdin:7: trailing whitespace. foo error: bar: already exists in index The final git checkout at the end of wsadd truncates my file. I've changed the ; to a to fix the truncation. Were there any recent changes to git apply that might have caused this? Probably fallout from this one, merged to 'master' about 5 weeks ago. I'll have a look at it tomorrow unless somebody does it first d95d728 (diff-lib.c: adjust position of i-t-a entries in diff - 2015-03-16) -- Duy -- To unsubscribe from this list: send the line unsubscribe git in
Re: [Question] Is it normal for accented characters to be shown as decomposed Unicode on GNU/Linux?
On Mon, Jun 22, 2015 at 03:17:40PM +0200, Bastien Traverse wrote: test case: $ mkdir accent-test cd !$ $ git init $ touch rêve réunion $ git status On branch master Initial commit Untracked files: (use git add file... to include in what will be committed) r\303\251union r\303\252ve Note that these aren't decomposed (in the unicode decomposition sense) but are merely octal escaped representations of the utf-8 encoded file names. My understanding that this is normal and probably dates back (at least for status as far as: commit a734d0b10bd0f5554abb3acdf11426040cfc4df0 Author: Dmitry Potapov dpota...@gmail.com Date: Fri Mar 7 05:30:58 2008 +0300 Make private quote_path() in wt-status.c available as quote_path_relative() [...] The behaviour can be changed by setting the git config variable core.quotePath to false. -- To unsubscribe from this list: send the line unsubscribe git in
Re: [PATCH v4 06/19] ref-filter: implement '--merged' and '--no-merged' options
On Mon, Jun 22, 2015 at 6:30 AM, Junio C Hamano gits...@pobox.com wrote: On Sun, Jun 21, 2015 at 1:48 PM, Karthik Nayak karthik@gmail.com wrote: +static void do_merge_filter(struct ref_filter_cbdata *ref_cbdata) +{ + struct rev_info revs; + int i, old_nr; + struct ref_filter *filter = ref_cbdata-filter; + struct ref_array *array = ref_cbdata-array; + struct commit_list *p, *to_clear = NULL; + + init_revisions(revs, NULL); + + for (i = 0; i array-nr; i++) { + struct ref_array_item *item = array-items[i]; + add_pending_object(revs, item-commit-object, item-refname); + commit_list_insert(item-commit, to_clear); Use of commit_list for an array of known number of commits feels somewhat wasteful. Couldn't to_clear be struct commit **to_clear = xcalloc(sizeof(struct commit *), array-nr); instread? Awesome! you're right, will drop the commit_list. -- To unsubscribe from this list: send the line unsubscribe git in
Re: [PATCH v4 04/19] for-each-ref: add '--points-at' option
On Tue, Jun 23, 2015 at 12:54 AM, Junio C Hamano gits...@pobox.com wrote: Matthieu Moy matthieu@grenoble-inp.fr writes: Karthik Nayak karthik@gmail.com writes: On Mon, Jun 22, 2015 at 6:15 AM, Junio C Hamano gits...@pobox.com wrote: 3 4 as a single patch may make more sense, if we were to tolerate the let's copy paste first and then later remove the duplicate as a way to postpone touching tag.c side in order to first concentrate on for-each-ref. I have not formed a firm opinion on what the right split of the series is, but so far (assuming that the temporary duplication is the best we can do) what I am seeing in this series makes sense to me. Thanks. That would mean squashing 34, 67 and 1011 also on similar lines. I have a slight preference for keeping the pairs not squashed. This way, we have a clear separation write reusable library code / use it. But I'm fine with squashing if others prefer. As I cannot firmly say that copy paste first and then later clean-up is bad and we should split them in different way, I am fine with leaving them separate as they are. Even I think it's better to leave them separate, on the lines of what Matthieu said. -- Regards, Karthik Nayak -- To unsubscribe from this list: send the line unsubscribe git in
Re: Fast enumeration of objects
Jeff King p...@peff.net writes: ... So my conclusions are: 1. Yes, the pipe/parsing overhead of a separate processor really is measurable. That's hidden in the wall-clock time if you have multiple cores, but you may care more about CPU time. I still think the flexibility is worth it. 2. Cutting out the pipe to cat-file is worth doing, as it saves a few seconds. Cutting out %(objecttype) saves a lot, too, and is worth doing. We should teach list-all-objects -v to use cat-file's custom formatters (alternatively, we could just teach cat-file a --batch-all-objects option rather than add a new command). 3. We should teach cat-file a --buffer option to use fwrite. Even if we end up with list-all-objects --format='%(objectsize)' for this task, it would help all the other uses of cat-file. All sounds very sensible. -- To unsubscribe from this list: send the line unsubscribe git in
Re: [PATCH] Add list-all-objects command
On Mon, Jun 22, 2015 at 06:33:21AM -0400, Jeff King wrote: On Mon, Jun 22, 2015 at 04:38:22AM -0400, Jeff King wrote: + prepare_packed_git(); + for (p = packed_git; p; p = p-next) { + open_pack_index(p); + } Yikes. The fact that you need to do this means that for_each_packed_object is buggy, IMHO. I'll send a patch. Here's that patch. And since I did not want to pile work on Charles, I went ahead and just implemented the patches I suggested in the other email. I have to say that I think that adding this functionality to cat-file makes a lot of sense. If it only catted files it might be a stretch but as it's already grown --batch-check functionality, it now seems a reasonable extension. I'm not particularly attached to having a standalone list-all-objects command per se. -- To unsubscribe from this list: send the line unsubscribe git in
Re: [PATCH v3 12/19] initial_ref_transaction_commit(): check for duplicate refs
Michael Haggerty mhag...@alum.mit.edu writes: Error out if the ref_transaction includes more than one update for any refname. Signed-off-by: Michael Haggerty mhag...@alum.mit.edu --- refs.c | 11 +++ 1 file changed, 11 insertions(+) This somehow feels like ehh, I now know better and this function should have been like this from the beginning to me. But that is OK. Is the initial creation logic too fragile to deserve its own function to force callers to think about it, by the way? What I am wondering is if we could turn the safety logic that appear here (i.e. no existing refs must be assumed by the set of updates, etc.) into an optimization cue and implement this as a special case helper to ref_transaction_commit(), i.e. ref_transaction_commit(...) { if (updates are all initial creation no existing refs in repository) return initial_ref_transaction_commit(...); /* otherwise we do the usual thing */ ... } and have clone call ref_transaction_commit() as usual. -- To unsubscribe from this list: send the line unsubscribe git in
Re: [PATCH v3 19/19] delete_ref(): use the usual convention for old_sha1
Michael Haggerty mhag...@alum.mit.edu writes: The ref_transaction_update() family of functions use the following convention for their old_sha1 parameters: * old_sha1 == NULL: Don't check the old value at all. * is_null_sha1(old_sha1): Ensure that the reference didn't exist before the transaction. * otherwise: Ensure that the reference had the specified value before the transaction. delete_ref() had a different convention, namely treating is_null_sha1(old_sha1) as don't care. Change it to adhere to the standard convention to reduce the scope for confusion. Please note that it is now a bug to pass old_sha1=NULL_SHA1 to delete_ref() (because it doesn't make sense to delete a reference that you already know doesn't exist). This is consistent with the behavior of ref_transaction_delete(). Nice. Everything I read in this round of changes makes sense (except for the ones I had minor comments on, which were separately sent). Thanks. -- To unsubscribe from this list: send the line unsubscribe git in
Re: [PATCH v7 13/19] fsck: Optionally ignore specific fsck issues completely
Hi Junio, On 2015-06-22 20:04, Junio C Hamano wrote: Johannes Schindelin johannes.schinde...@gmx.de writes: +git --git-dir=dst/.git branch -D bogus +git --git-dir=dst/.git config --add \ +receive.fsck.missingEmail ignore +git --git-dir=dst/.git config --add \ +receive.fsck.badDate warn Funny double-SP (will locally fix). There are a few other minor style nits (not in this patch but in other patches in the series) like s/free((char *) var)/free((char *)var)/; that I locally added SQUASH, I fixed all of this locally, ready to push out v8, but... and I may also have tweaked some log messages. Please check what you will see in 'pu' when I push the day's integration out later. ... I did not see that yet. For the record, my current state is available at https://github.com/dscho/git/compare/fsck-api~19...fsck-api I could wait for tomorrow to adjust my commit messages... or what do you want me to do? Ciao, Dscho -- To unsubscribe from this list: send the line unsubscribe git in
Re: [PATCH v7 00/19] Introduce an internal API to interact with the fsck machinery
Hi Junio, On 2015-06-22 20:02, Junio C Hamano wrote: Johannes Schindelin johannes.schinde...@gmx.de writes: Changes since v6: - camelCased message IDs - multiple author checking now as suggested by Junio - renamed `--quick` to `--connectivity-only`, better commit message - `fsck.skipList` is now handled correctly (and not mistaken for a message type setting) - `fsck.skipList` can handle user paths now - index-pack configures the walk function in a more logical place now - simplified code by avoiding working on partial strings (i.e. removed `substrcmp()`). This saves 10 lines. To accomodate parsing config variables directly, we now work on lowercased message IDs; unfortunately this means that we cannot use them in append_msg_id() because that function wants to append camelCased message IDs. Interdiff below diffstat. Except for minor nits I sent separate messages, this round looks very nicely done (I however admit that I haven't read the skiplist parsing code carefully at all, expecting that you wouldn't screw up with something simple like that ;-)) Thanks, will replace what is queued. Let's start thinking about moving it down to 'next' (meaning: we _could_ still accept a reroll, but I think we are in a good shape and minor incremental refinements would suffice), cooking it for the remainder of the cycle and having it graduate to 'master' at the beginning of the next cycle. Let me submit a v8 with the borked fixup fixed (i.e. part of 04/19 moved to 03/19, where it really belongs), the `for` style fix, the fixed double space and the cast style, too. Ciao, Dscho -- To unsubscribe from this list: send the line unsubscribe git in
Re: [PATCH] Add list-all-objects command
Jeff King p...@peff.net writes: On Mon, Jun 22, 2015 at 04:38:22AM -0400, Jeff King wrote: + prepare_packed_git(); + for (p = packed_git; p; p = p-next) { + open_pack_index(p); + } Yikes. The fact that you need to do this means that for_each_packed_object is buggy, IMHO. I'll send a patch. Here's that patch. And since I did not want to pile work on Charles, I went ahead and just implemented the patches I suggested in the other email. We may want to take patch 1 separately for the maint-track, as it is really a bug-fix (albeit one that I do not think actually affects anyone in practice right now). Hmph, add_unseen_recent_objects_to_traversal() is the only existing user, and before d3038d22 (prune: keep objects reachable from recent objects, 2014-10-15) added that function, for-each-packed-object existed but had no callers. And the objects not beeing seen by that function (due to lack of open) would matter only for pruning purposes, which would mean you have to be calling into the codepath when running a full repack, so you would have opened all the packs that matter anyway (if you have a old cruft archive pack that contains only objects that are unreachable, you may not have opened that pack, though, and you may prune the thing away prematurely). So, I think I can agree that this would unlikely affect anybody in practice. Patches 2-5 are useful even if we go with Charles' command, as they make cat-file better (cleanups and he new buffer option). Patches 6-7 implement the cat-file option that would be redundant with list-all-objects. By the way, in addition to not showing objects in order, list-all-objects (and my cat-file option) may show duplicates. Do we want to sort -u for the user? It might be nice for them to always get a de-duped and sorted list. Aside from the CPU cost of sorting, it does mean we'll allocate ~80MB for the kernel to store the sha1s. I guess that's not too much when you are talking about the kernel repo. I took the coward's way out and just mentioned the limitation in the documentation, but I'm happy to be persuaded. [1/7]: for_each_packed_object: automatically open pack index [2/7]: cat-file: minor style fix in options list [3/7]: cat-file: move batch_options definition to top of file [4/7]: cat-file: add --buffer option [5/7]: cat-file: stop returning value from batch_one_object [6/7]: cat-file: split batch_one_object into two stages [7/7]: cat-file: add --batch-all-objects option -Peff -- To unsubscribe from this list: send the line unsubscribe git in
[PATCHv6 1/3] git-rebase -i: add command drop to remove a commit
Instead of removing a line to remove the commit, you can use the command drop (just like pick or edit). It has the same effect as deleting the line (removing the commit) except that you keep a visual trace of your actions, allowing a better control and reducing the possibility of removing a commit by mistake. Signed-off-by: Galan Rémi remi.galan-alfo...@ensimag.grenoble-inp.fr --- Documentation/git-rebase.txt | 3 +++ git-rebase--interactive.sh| 3 ++- t/lib-rebase.sh | 4 ++-- t/t3404-rebase-interactive.sh | 16 4 files changed, 23 insertions(+), 3 deletions(-) diff --git a/Documentation/git-rebase.txt b/Documentation/git-rebase.txt index 1d01baa..34bd070 100644 --- a/Documentation/git-rebase.txt +++ b/Documentation/git-rebase.txt @@ -514,6 +514,9 @@ rebasing. If you just want to edit the commit message for a commit, replace the command pick with the command reword. +To drop a commit, replace the command pick with drop, or just +delete the matching line. + If you want to fold two or more commits into one, replace the command pick for the second and subsequent commits with squash or fixup. If the commits had different authors, the folded commit will be diff --git a/git-rebase--interactive.sh b/git-rebase--interactive.sh index dc3133f..72abf90 100644 --- a/git-rebase--interactive.sh +++ b/git-rebase--interactive.sh @@ -152,6 +152,7 @@ Commands: s, squash = use commit, but meld into previous commit f, fixup = like squash, but discard this commit's log message x, exec = run command (the rest of the line) using shell + d, drop = remove commit These lines can be re-ordered; they are executed from top to bottom. @@ -505,7 +506,7 @@ do_next () { rm -f $msg $author_script $amend $state_dir/stopped-sha || exit read -r command sha1 rest $todo case $command in - $comment_char*|''|noop) + $comment_char*|''|noop|drop|d) mark_action_done ;; pick|p) diff --git a/t/lib-rebase.sh b/t/lib-rebase.sh index 6bd2522..fdbc900 100644 --- a/t/lib-rebase.sh +++ b/t/lib-rebase.sh @@ -14,7 +14,7 @@ # specified line. # # cmd lineno -- add a line with the specified command -# (squash, fixup, edit, or reword) and the SHA1 taken +# (squash, fixup, edit, reword or drop) and the SHA1 taken # from the specified line. # # exec_cmd_with_args -- add an exec cmd with args line. @@ -46,7 +46,7 @@ set_fake_editor () { action=pick for line in $FAKE_LINES; do case $line in - squash|fixup|edit|reword) + squash|fixup|edit|reword|drop) action=$line;; exec*) echo $line | sed 's/_/ /g' $1;; diff --git a/t/t3404-rebase-interactive.sh b/t/t3404-rebase-interactive.sh index ac429a0..ecd277c 100755 --- a/t/t3404-rebase-interactive.sh +++ b/t/t3404-rebase-interactive.sh @@ -1102,4 +1102,20 @@ test_expect_success 'rebase -i commits that overwrite untracked files (no ff)' ' test $(git cat-file commit HEAD | sed -ne \$p) = I ' +test_rebase_end () { + test_when_finished git checkout master + git branch -D $1 + test_might_fail git rebase --abort + git checkout -b $1 master +} + +test_expect_success 'drop' ' + test_rebase_end dropTest + set_fake_editor + FAKE_LINES=1 drop 2 3 drop 4 5 git rebase -i --root + test E = $(git cat-file commit HEAD | sed -ne \$p) + test C = $(git cat-file commit HEAD^ | sed -ne \$p) + test A = $(git cat-file commit HEAD^^ | sed -ne \$p) +' + test_done -- 2.4.3.371.g8992f2a -- To unsubscribe from this list: send the line unsubscribe git in
[PATCHv6 2/3] git rebase -i: warn about removed commits
Check if commits were removed (i.e. a line was deleted) and print warnings or stop git rebase depending on the value of the configuration variable rebase.missingCommitsCheck. This patch gives the user the possibility to avoid silent loss of information (losing a commit through deleting the line in this case) if he wants. Add the configuration variable rebase.missingCommitsCheck. - When unset or set to ignore, no checking is done. - When set to warn, the commits are checked, warnings are displayed but git rebase still proceeds. - When set to error, the commits are checked, warnings are displayed and the rebase is stopped. (The user can then use 'git rebase --edit-todo' and 'git rebase --continue', or 'git rebase --abort') rebase.missingCommitsCheck defaults to ignore. Signed-off-by: Galan Rémi remi.galan-alfo...@ensimag.grenoble-inp.fr --- On the contrary of v5, the SHA-1 that are read are the short one. (the todo-list is read directly after the user closes his editor and not after expansion of ids) Documentation/config.txt | 11 + Documentation/git-rebase.txt | 6 +++ git-rebase--interactive.sh| 104 -- t/t3404-rebase-interactive.sh | 66 +++ 4 files changed, 184 insertions(+), 3 deletions(-) diff --git a/Documentation/config.txt b/Documentation/config.txt index 4d21ce1..25b2a04 100644 --- a/Documentation/config.txt +++ b/Documentation/config.txt @@ -2160,6 +2160,17 @@ rebase.autoStash:: successful rebase might result in non-trivial conflicts. Defaults to false. +rebase.missingCommitsCheck:: + If set to warn, git rebase -i will print a warning if some + commits are removed (e.g. a line was deleted), however the + rebase will still proceed. If set to error, it will print + the previous warning and stop the rebase, 'git rebase + --edit-todo' can then be used to correct the error. If set to + ignore, no checking is done. + To drop a commit without warning or error, use the `drop` + command in the todo-list. + Defaults to ignore. + receive.advertiseAtomic:: By default, git-receive-pack will advertise the atomic push capability to its clients. If you don't want to this capability diff --git a/Documentation/git-rebase.txt b/Documentation/git-rebase.txt index 34bd070..2ca3b8d 100644 --- a/Documentation/git-rebase.txt +++ b/Documentation/git-rebase.txt @@ -213,6 +213,12 @@ rebase.autoSquash:: rebase.autoStash:: If set to true enable '--autostash' option by default. +rebase.missingCommitsCheck:: + If set to warn, print warnings about removed commits in + interactive mode. If set to error, print the warnings and + stop the rebase. If set to ignore, no checking is + done. ignore by default. + OPTIONS --- --onto newbase:: diff --git a/git-rebase--interactive.sh b/git-rebase--interactive.sh index 72abf90..66daacf 100644 --- a/git-rebase--interactive.sh +++ b/git-rebase--interactive.sh @@ -834,6 +834,104 @@ add_exec_commands () { mv $1.new $1 } +# Print the list of the SHA-1 of the commits +# from stdin to stdout +todo_list_to_sha_list () { + git stripspace --strip-comments | + while read -r command sha1 rest + do + case $command in + $comment_char*|''|noop|x|exec) + ;; + *) + long_sha=$(git rev-list --no-walk $sha1 2/dev/null) + printf %s\n $long_sha + ;; + esac + done +} + +# Use warn for each line in stdin +warn_lines () { + while read -r line + do + warn - $line + done +} + +# Switch to the branch in $into and notify it in the reflog +checkout_onto () { + GIT_REFLOG_ACTION=$GIT_REFLOG_ACTION: checkout $onto_name + output git checkout $onto || die_abort could not detach HEAD + git update-ref ORIG_HEAD $orig_head +} + +# Check if the user dropped some commits by mistake +# Behaviour determined by rebase.missingCommitsCheck. +check_todo_list () { + raise_error=f + + check_level=$(git config --get rebase.missingCommitsCheck) + check_level=${check_level:-ignore} + # Don't be case sensitive + check_level=$(printf '%s' $check_level | tr 'A-Z' 'a-z') + + case $check_level in + warn|error) + # Get the SHA-1 of the commits + todo_list_to_sha_list $todo.backup $todo.oldsha1 + todo_list_to_sha_list $todo $todo.newsha1 + + # Sort the SHA-1 and compare them + sort -u $todo.oldsha1 $todo.oldsha1+ + mv $todo.oldsha1+ $todo.oldsha1 + sort -u $todo.newsha1 $todo.newsha1+ + mv $todo.newsha1+ $todo.newsha1 + comm -2 -3 $todo.oldsha1 $todo.newsha1 $todo.miss + + # Warn about
[PATCHv6 3/3] git rebase -i: add static check for commands and SHA-1
Check before the start of the rebasing if the commands exists, and for the commands expecting a SHA-1, check if the SHA-1 is present and corresponds to a commit. In case of error, print the error, stop git rebase and prompt the user to fix with 'git rebase --edit-todo' or to abort. This allows to avoid doing half of a rebase before finding an error and giving back what's left of the todo list to the user and prompt him to fix when it might be too late for him to do so (he might have to abort and restart the rebase). Signed-off-by: Galan Rémi remi.galan-alfo...@ensimag.grenoble-inp.fr --- I used: read -r command sha1 rest EOF $line EOF because printf '%s' $line | read -r command sha1 rest doesn't work (the 3 variables have no value as a result). There might be a better way to do this, but I don't have it right now. git-rebase--interactive.sh| 70 +++ t/lib-rebase.sh | 5 t/t3404-rebase-interactive.sh | 40 + 3 files changed, 115 insertions(+) diff --git a/git-rebase--interactive.sh b/git-rebase--interactive.sh index 66daacf..6381686 100644 --- a/git-rebase--interactive.sh +++ b/git-rebase--interactive.sh @@ -834,6 +834,52 @@ add_exec_commands () { mv $1.new $1 } +# Check if the SHA-1 passed as an argument is a +# correct one, if not then print $2 in $todo.badsha +# $1: the SHA-1 to test +# $2: the line to display if incorrect SHA-1 +check_commit_sha () { + badsha=f + if test -z $1 + then + badsha=t + else + sha1_verif=$(git rev-parse --verify --quiet $1^{commit}) + if test -z $sha1_verif + then + badsha=t + fi + fi + + if test $badsha = t + then + printf '%s\n' $2 $todo.badsha + fi +} + +# prints the bad commits and bad commands +# from the todolist in stdin +check_bad_cmd_and_sha () { + git stripspace --strip-comments | + while read -r line + do + read -r command sha1 rest EOF +$line +EOF + case $command in + ''|noop|x|exec) + # Doesn't expect a SHA-1 + ;; + pick|p|drop|d|reword|r|edit|e|squash|s|fixup|f) + check_commit_sha $sha1 $line + ;; + *) + printf '%s\n' $line $todo.badcmd + ;; + esac + done +} + # Print the list of the SHA-1 of the commits # from stdin to stdout todo_list_to_sha_list () { @@ -868,6 +914,8 @@ checkout_onto () { # Check if the user dropped some commits by mistake # Behaviour determined by rebase.missingCommitsCheck. +# Check if there is an unrecognized command or a +# bad SHA-1 in a command. check_todo_list () { raise_error=f @@ -919,6 +967,28 @@ check_todo_list () { ;; esac + check_bad_cmd_and_sha $todo + + if test -s $todo.badsha + then + raise_error=t + + warn Warning: the SHA-1 is missing or isn't \ + a commit in the following line(s): + warn_lines $todo.badsha + warn + fi + + if test -s $todo.badcmd + then + raise_error=t + + warn Warning: the command isn't recognized \ + in the following line(s): + warn_lines $todo.badcmd + warn + fi + if test $raise_error = t then # Checkout before the first commit of the diff --git a/t/lib-rebase.sh b/t/lib-rebase.sh index fdbc900..9a96e15 100644 --- a/t/lib-rebase.sh +++ b/t/lib-rebase.sh @@ -54,6 +54,11 @@ set_fake_editor () { echo '# comment' $1;; ) echo $1;; + bad) + action=badcmd;; + fakesha) + echo $action XXX False commit $1 + action=pick;; *) sed -n ${line}s/^pick/$action/p $1.tmp $1 action=pick;; diff --git a/t/t3404-rebase-interactive.sh b/t/t3404-rebase-interactive.sh index a92ae19..d691b1c 100755 --- a/t/t3404-rebase-interactive.sh +++ b/t/t3404-rebase-interactive.sh @@ -1184,4 +1184,44 @@ test_expect_success 'rebase -i respects rebase.missingCommitsCheck = error' ' test B = $(git cat-file commit HEAD^ | sed -ne \$p) ' +cat expect EOF +Warning: the command isn't recognized in the following line(s): + - badcmd $(git rev-list --oneline -1 master~1) + +You can fix this with 'git rebase --edit-todo'. +Or you can abort the rebase with 'git rebase --abort'. +EOF + +test_expect_success 'static check of bad command' ' + test_rebase_end tmp2 + set_fake_editor + test_must_fail env FAKE_LINES=1 2 3 bad 4 5 \ +
Re: [PATCH v4 03/19] ref-filter: implement '--points-at' option
On Sun, Jun 21, 2015 at 4:48 PM, Karthik Nayak karthik@gmail.com wrote: In 'tag -l' we have '--points-at' option which lets users list only tags which point to a particular commit. Implement this option in 'ref-filter.{c,h}' so that other commands can benefit from this. This is duplicated from tag.c, we will eventually remove that when we port tag.c to use ref-filter APIs. Based-on-patch-by: Jeff King p...@peff.net Mentored-by: Christian Couder christian.cou...@gmail.com Mentored-by: Matthieu Moy matthieu@grenoble-inp.fr Signed-off-by: Karthik Nayak karthik@gmail.com --- diff --git a/builtin/tag.c b/builtin/tag.c index e36c43e..280981f 100644 --- a/builtin/tag.c +++ b/builtin/tag.c @@ -56,6 +56,10 @@ static int match_pattern(const char **patterns, const char *ref) return 0; } +/* + * This is currently duplicated in ref-filter.c, and will eventually be + * removed as we port tag.c to use the ref-filter APIs. + */ static const unsigned char *match_points_at(const char *refname, const unsigned char *sha1) { diff --git a/ref-filter.c b/ref-filter.c index 43502a4..591e281 100644 --- a/ref-filter.c +++ b/ref-filter.c @@ -842,6 +842,29 @@ static int match_name_as_path(const char **pattern, const char *refname) return 0; } +/* + * Given a ref (sha1, refname) see if it points to a one of the sha1s s/a one/one/ + * in a sha1_array. + */ +static int match_points_at(struct sha1_array *points_at, const unsigned char *sha1, + const char *refname) +{ + struct object *obj; + + if (!points_at || !points_at-nr) + return 1; + + if (sha1_array_lookup(points_at, sha1) = 0) + return 1; + + obj = parse_object_or_die(sha1, refname); + if (obj-type == OBJ_TAG + sha1_array_lookup(points_at, ((struct tag *)obj)-tagged-sha1) = 0) + return 1; + + return 0; +} -- To unsubscribe from this list: send the line unsubscribe git in
Re: [PATCH v4 07/19] for-each-ref: add '--merged' and '--no-merged' options
On Sun, Jun 21, 2015 at 4:48 PM, Karthik Nayak karthik@gmail.com wrote: Add the '--merged' and '--no-merged' options provided by 'ref-filter'. The '--merged' option lets the user to only list refs merged into the named commit. The '--no-merged' option lets the user to only list refs not merged into the named commit. Add documentation and tests for the same. Based-on-patch-by: Jeff King p...@peff.net Mentored-by: Christian Couder christian.cou...@gmail.com Mentored-by: Matthieu Moy matthieu@grenoble-inp.fr Signed-off-by: Karthik Nayak karthik@gmail.com --- diff --git a/Documentation/git-for-each-ref.txt b/Documentation/git-for-each-ref.txt index 0ede41d..c6dcd99 100644 --- a/Documentation/git-for-each-ref.txt +++ b/Documentation/git-for-each-ref.txt @@ -10,7 +10,7 @@ SYNOPSIS [verse] 'git for-each-ref' [--count=count] [--shell|--perl|--python|--tcl] [(--sort=key)...] [--format=format] [pattern...] - [--points-at object] + [--points-at object] [(--merged | --no-merged) object] According to the documentation you added to the OPTIONS section, the object following --merged and --no-merged is optional. Therefore, shouldn't this be s/object/[object]/ ? Also, in the OPTIONS section, you spelled it commit rather than object. DESCRIPTION --- @@ -66,6 +66,14 @@ OPTIONS --points-at object:: Only list refs pointing to the given object. +--merged [commit]:: + Only list refs whose tips are reachable from the + specified commit (HEAD if not specified). + +--no-merged [commit]:: + Only list refs whose tips are not reachable from the + specified commit (HEAD if not specified). + FIELD NAMES --- diff --git a/builtin/for-each-ref.c b/builtin/for-each-ref.c index 46f9b05..09d48da 100644 --- a/builtin/for-each-ref.c +++ b/builtin/for-each-ref.c @@ -8,6 +8,7 @@ static char const * const for_each_ref_usage[] = { N_(git for-each-ref [options] [pattern]), N_(git for-each-ref [--points-at object]), + N_(git for-each-ref [(--merged | --no-merged) object]), Ditto: s/object/[object]/ NULL }; -- To unsubscribe from this list: send the line unsubscribe git in
Re: [PATCH 2/2] Move unsigned long option parsing out of pack-objects.c
Charles Bailey char...@hashpling.org writes: On Sun, Jun 21, 2015 at 07:25:44PM +0100, Charles Bailey wrote: From: Charles Bailey cbaile...@bloomberg.net diff --git a/parse-options.c b/parse-options.c index 80106c0..101b649 100644 --- a/parse-options.c +++ b/parse-options.c @@ -180,6 +180,23 @@ static int get_value(struct parse_opt_ctx_t *p, return opterror(opt, expects a numerical value, flags); return 0; +case OPTION_MAGNITUDE: +if (unset) { +*(unsigned long *)opt-value = 0; +return 0; +} +if (opt-flags PARSE_OPT_OPTARG !p-opt) { +*(unsigned long *)opt-value = opt-defval; +return 0; +} +if (get_arg(p, opt, flags, arg)) +return -1; +if (!git_parse_ulong(arg, opt-value)) +return opterror(opt, +expects a integer value with an optional k/m/g suffix, +flags); +return 0; + Spotted after sending: s/expects a integer/expects an integer/ Thanks. -- To unsubscribe from this list: send the line unsubscribe git in
Re: [PATCH 8/7] cat-file: sort and de-dup output of --batch-all-objects
On Mon, Jun 22, 2015 at 07:06:32AM -0400, Jeff King wrote: On Mon, Jun 22, 2015 at 06:33:21AM -0400, Jeff King wrote: By the way, in addition to not showing objects in order, list-all-objects (and my cat-file option) may show duplicates. Do we want to sort -u for the user? It might be nice for them to always get a de-duped and sorted list. Aside from the CPU cost of sorting, it does mean we'll allocate ~80MB for the kernel to store the sha1s. I guess that's not too much when you are talking about the kernel repo. I took the coward's way out and just mentioned the limitation in the documentation, but I'm happy to be persuaded. The patch below does the sort/de-dup. I'd probably just squash it into patch 7, though. Woah, 8 out of 7! Did you get a chance to measure the performance hit of the sort? If not, I may test it out when I next get the chance. -- To unsubscribe from this list: send the line unsubscribe git in
Re: [PATCH v4 04/19] for-each-ref: add '--points-at' option
On Sun, Jun 21, 2015 at 4:48 PM, Karthik Nayak karthik@gmail.com wrote: Add the '--points-at' option provided by 'ref-filter'. The option lets the user to pick only refs which point to a particular commit. Add documentation and tests for the same. Based-on-patch-by: Jeff King p...@peff.net Mentored-by: Christian Couder christian.cou...@gmail.com Mentored-by: Matthieu Moy matthieu@grenoble-inp.fr Signed-off-by: Karthik Nayak karthik@gmail.com --- diff --git a/t/t6301-for-each-ref-filter.sh b/t/t6301-for-each-ref-filter.sh index b1fa8d4..67de3a7 100755 --- a/t/t6301-for-each-ref-filter.sh +++ b/t/t6301-for-each-ref-filter.sh @@ -16,4 +16,24 @@ test_expect_success 'setup some history and refs' ' git update-ref refs/odd/spot master ' +test_expect_success 'filtering with --points-at' ' + cat expect -\EOF + refs/heads/master + refs/odd/spot + refs/tags/three + EOF + git for-each-ref --format=%(refname) --points-at=master actual + test_cmp expect actual +' + +test_expect_success 'check signed tags with --points-at' ' + cat expect -\EOF + refs/heads/side + refs/tags/four + refs/tags/signed-tag four + EOF + git for-each-ref --format=%(refname) %(*subject) --points-at=side actual s/for-each-ref\s+/for-each-ref / + test_cmp expect actual +' + test_done -- 2.4.3.439.gfea0c2a.dirty -- To unsubscribe from this list: send the line unsubscribe git in
Re: [PATCH 1/3] contrib/subtree: Use tabs consitently for indentation in tests
On Mon, Jun 22, 2015 at 9:53 AM, Charles Bailey char...@hashpling.org wrote: contrib/subtree: Use tabs consitently for indentation in tests s/consitently/consistently/ Although subtrees tests uses more spaces for indentation than tabs, there are still quite a lot of lines indented with tabs. As tabs conform with Git coding guidelines resolve the inconsistency in favour of tabs. Signed-off-by: Charles Bailey cbaile...@bloomberg.net -- To unsubscribe from this list: send the line unsubscribe git in
Re: Improvements to integer option parsing
On 22 Jun 2015, at 23:09, Junio C Hamano gits...@pobox.com wrote: Charles Bailey char...@hashpling.org writes: - marginally improved the opterror message on failed parses I'd queue with s/a integer/a non-negative integer/. Ha! That's what I had before I submitted, but then the source line got quite long (which could have been split) and the generated message got quite long as well so I cropped it. This was probably the source of the grammar mistake. If you're happy with the longer message, I am happy with it too. -- To unsubscribe from this list: send the line unsubscribe git in
Re: [PATCH 8/7] cat-file: sort and de-dup output of --batch-all-objects
On Mon, Jun 22, 2015 at 11:03:50PM +0100, Charles Bailey wrote: The patch below does the sort/de-dup. I'd probably just squash it into patch 7, though. Woah, 8 out of 7! Did you get a chance to measure the performance hit of the sort? If not, I may test it out when I next get the chance. No, that last patch was my eh, one more thing before bed patch. ;) It's easy enough to time, though. Running: git cat-file --batch-all-objects \ --batch-check='%(objectsize) %(objectname)' \ --buffer /dev/null on linux.git, my best-of-five goes from (no sorting): real0m3.604s user0m3.556s sys 0m0.048s to (with sorting): real0m4.053s user0m4.004s sys 0m0.052s So it does matter, but not too much. We could de-dup with a hash table, which might be a little faster, but I doubt it would make much difference. It's also mostly in sorted order already; it's possible that a merge sort would behave a little better. I'm not sure how deep it's worth going into that rabbit hole. -Peff -- To unsubscribe from this list: send the line unsubscribe git in
config commands not working _Noobe question
after adding git config ‹global user.name Greg Ledger and git config ‹global user.email gled...@glcdelivers.com, when I run: source ~/.gitconfig I get -bash: [user]: command not found -bash: name: command not found -bash: email: command not found -bash: [color]: command not found -bash: ui: command not found What gives? (I had to download xcode tools before I could do my first git init Greg Ledger Web Developer E-Marketing Specialist | GLC 847.205.3125 gled...@glcdelivers.com www.glcdelivers.com http://www.glcdelivers.com -- To unsubscribe from this list: send the line unsubscribe git in
Re: Improvements to integer option parsing
Charles Bailey char...@hashpling.org writes: This is a re-roll of the first two patches in my previous series which used to include filter-objects which is now a separate topic. [PATCH 1/2] Correct test-parse-options to handle negative ints The first one has changed only in that I've moved the additional test to a more logical place in the test file. [PATCH 2/2] Move unsigned long option parsing out of pack-objects.c I've made the following changes to the second commit: - renamed this OPT_MAGNITUDE to try and convey something that is both unsigned and might benefit from a 'scale' suffix. I'm expecting more discussion on the name! I think that name is very sensible. - marginally improved the opterror message on failed parses I'd queue with s/a integer/a non-negative integer/. - noted the change in behavior for the error messages generated for pack-objects' --max-pack-size and --window-memory in the commit message Thanks. Queued. -- To unsubscribe from this list: send the line unsubscribe git in
Re: [PATCH 2/2] Move unsigned long option parsing out of pack-objects.c
Charles Bailey char...@hashpling.org writes: From: Charles Bailey cbaile...@bloomberg.net The unsigned long option parsing (including 'k'/'m'/'g' suffix parsing) is more widely applicable. Add support for OPT_MAGNITUDE to parse-options.h and change pack-objects.c use this support. The error behavior on parse errors follows that of OPT_INTEGER. The name of the option that failed to parse is reported with a brief message describing the expect format for the option argument and then the full usage message for the command invoked. This is differs from the previous behavior for OPT_ULONG used in s/is //; (locally fixed--no need to resend). -- To unsubscribe from this list: send the line unsubscribe git in
What's cooking in git.git (Jun 2015, #05; Mon, 22)
Here are the topics that have been cooking. Commits prefixed with '-' are only in 'pu' (proposed updates) while commits prefixed with '+' are in 'next'. Some of the topics in flight have overlaps with each other and have been excluded from 'pu'; most notably, I think the remainder of bc/object-id needs to wait until the for-each-ref topic from Karthik settles and then rebased on it, or something. Hopefully the pre-release freeze can start late this week, and the next cycle would reopen mid next month. In the meantime, let's shift the focus on making sure that what has already been merged to 'master' are good (i.e. regression hunting and fixes). You can find the changes described here in the integration branches of the repositories listed at http://git-blame.blogspot.com/p/git-public-repositories.html -- [New Topics] * js/rebase-i-clean-up-upon-continue-to-skip (2015-06-18) 3 commits - rebase -i: do not leave a CHERRY_PICK_HEAD file behind - SQUASH: test_must_fail is a shell function - t3404: demonstrate CHERRY_PICK_HEAD bug Abandoning an already applied change in git rebase -i with --continue left CHERRY_PICK_HEAD and confused later steps. Expecting a reroll. ($gmane/271856) * me/fetch-into-shallow-safety (2015-06-17) 1 commit - fetch-pack: check for shallow if depth given git fetch --depth=depth and git clone --depth=depth issued a shallow transfer request even to an upload-pack that does not support the capability. Will merge to 'next'. * mm/describe-doc (2015-06-16) 1 commit - Documentation/describe: improve one-line summary Will merge to 'next'. * rh/test-color-avoid-terminfo-in-original-home (2015-06-17) 2 commits - test-lib.sh: fix color support when tput needs ~/.terminfo - Revert test-lib.sh: do tests for color support after changing HOME An ancient test framework enhancement to allow color was not entirely correct; this makes it work even when tput needs to read from the ~/.terminfo under the user's real HOME directory. Will merge to 'next'. * tb/checkout-doc (2015-06-17) 1 commit - git-checkout.txt: document git checkout pathspec better $gmane/271812 * jk/pretty-encoding-doc (2015-06-17) 1 commit - docs: clarify that --encoding can produce invalid sequences $gmane/271879 * ak/format-patch-odir-config (2015-06-19) 1 commit - format-patch: introduce format.outputDirectory configuration Reroll exists but didn't pick it up as it seems to be still collecting review comments. * bc/gpg-verify-raw (2015-06-22) 7 commits - verify-tag: add option to print raw gpg status information - verify-commit: add option to print raw gpg status information - gpg: centralize printing signature buffers - gpg: centralize signature check - verify-commit: add test for exit status on untrusted signature - verify-tag: share code with verify-commit - verify-tag: add tests git verify-tag and git verify-commit have been taught to share more code, and then learned to optionally show the verification message from the underlying GPG implementation. Will merge to 'next'. * cb/parse-magnitude (2015-06-22) 2 commits - parse-options: move unsigned long option parsing out of pack-objects.c - test-parse-options: update to handle negative ints Move machinery to parse human-readable scaled numbers like 1k, 4M, and 2G as an option parameter's value from pack-objects to parse-options API, to make it available to other codepaths. Will merge to 'next'. * cb/subtree-tests-update (2015-06-22) 3 commits - contrib/subtree: small tidy-up to test - contrib/subtree: fix broken -chains and revealed test error - contrib/subtree: use tabs consitently for indentation in tests Will merge to 'next'. * da/mergetool-winmerge (2015-06-19) 1 commit - mergetool-lib: fix default tool selection An earlier change already in 'master' broke the default tool selection for mergetool. Will merge to 'next'. * jk/cat-file-batch-all (2015-06-22) 8 commits - cat-file: sort and de-dup output of --batch-all-objects - cat-file: add --batch-all-objects option - cat-file: split batch_one_object into two stages - cat-file: stop returning value from batch_one_object - cat-file: add --buffer option - cat-file: move batch_options definition to top of file - cat-file: minor style fix in options list - Merge branch 'jk/maint-for-each-packed-object' into jk/cat-file-batch-all (this branch uses jk/maint-for-each-packed-object.) cat-file learned --batch-all-objects option to enumerate all available objects in the repository more quickly than rev-list --all --objects (the output includes unreachable objects, though). * jk/maint-for-each-packed-object (2015-06-22) 1 commit - for_each_packed_object: automatically open pack index (this branch is used by jk/cat-file-batch-all.) The for_each_packed_object() API function did not iterate over objects in a packfile that hasn't been used yet. Will merge to 'next'.
Re: [PATCH] Add list-all-objects command
On Mon, Jun 22, 2015 at 02:50:10PM -0700, Junio C Hamano wrote: We may want to take patch 1 separately for the maint-track, as it is really a bug-fix (albeit one that I do not think actually affects anyone in practice right now). Hmph, add_unseen_recent_objects_to_traversal() is the only existing user, and before d3038d22 (prune: keep objects reachable from recent objects, 2014-10-15) added that function, for-each-packed-object existed but had no callers. I think that is because it was added by d3038d22^. :) And the objects not beeing seen by that function (due to lack of open) would matter only for pruning purposes, which would mean you have to be calling into the codepath when running a full repack, so you would have opened all the packs that matter anyway (if you have a old cruft archive pack that contains only objects that are unreachable, you may not have opened that pack, though, and you may prune the thing away prematurely). Exactly, that matches my analysis. So, I think I can agree that this would unlikely affect anybody in practice. Yep. I am OK if we do not even worry about it for maint, then. -Peff -- To unsubscribe from this list: send the line unsubscribe git in
RFC/Pull Request: Refs db backend
I've revived and modified Ronnie Sahlberg's work on the refs db backend. The work is on top of be3c13e5564, Junio's First batch for 2.5 cycle. I recognize that there have been changes to the refs code since then, and that there are some further changes in-flight from e.g. Michael Haggerty. If there is interest in this, I can rebase once Michael's changes land. The changes can be found here: https://github.com/dturner-tw/git.git on the dturner/pluggable-backends branch The db backend code was added in the penultimate commit; the rest is just code rearrangement and minor changes to make alternate backends possible. There ended up being a fair amount of this rearrangement, but the end result is that almost the entire git test suite runs under the db backend without error (see below for details). The db backend runs git for-each-ref about 30% faster than the files backend with fully-packed refs on a repo with ~120k refs. It's also about 4x faster than using fully-unpacked refs. In addition, and perhaps more importantly, it avoids case-conflict issues on OS X. I chose to use LMDB for the database. LMDB has a few features that make it suitable for usage in git: 1. It is relatively lightweight; it requires only one header file, and the library itself is under 300k (as opposed to 700k for e.g. sqlite). 2. It is well-tested: it's been used in OpenLDAP for years. 3. It's very fast. LMDB's benchmarks show that it is among the fastest key-value stores. 4. It has a relatively simple concurrency story; readers don't block writers and writers don't block readers. Ronnie Sahlberg's original version of this patchset used tdb. The advantage of tdb is that it's smaller (~125k). The disadvantages are that tdb is hard to build on OS X. It's also not in homebrew. So lmdb seemed simpler. To test this backend's correctness, I hacked test-lib.sh and test-lib-functions.sh to run all tests under the refs backend. Dozens of tests use manual ref/reflog reading/writing, or create submodules without passing --refs-backend-type to git init. If those tests are changed to use the update-ref machinery or test-refs-be-db (or, in the case of packed-refs, corrupt refs, and dumb fetch tests, are skipped), the only remaining failing tests are the git-new-workdir tests and the gitweb tests. Please let me know how it would be best to proceed. -- To unsubscribe from this list: send the line unsubscribe git in
Re: 'eol' documentation confusion
On 06/21/2015 04:16 PM, Robert Dailey wrote: On Sun, Jun 21, 2015 at 9:04 AM, Robert Dailey rcdailey.li...@gmail.com wrote: Upon inspection of the gitattributes documentation page here: https://git-scm.com/docs/gitattributes When comparing the documentation for 'text' with 'eol', I see the following missing explanations for 'eol': * eol * -eol Maybe the fact that these are missing means they are not valid to use. There is also the issue that `text` usually controls EOL anyway. Is there ever any reason to set eol in a way differently than explained in the documentation (that is, `eol=lf` or `eol=crlf`)? No. eol=lf or eol=crlf are the only useful settings. Everything else is ignored because it does not make sense. See convert.c: static enum eol git_path_check_eol() -- To unsubscribe from this list: send the line unsubscribe git in
Re: Sporadic test failures on OSX 10.10.3
On Sat, Jun 20, 2015 at 5:47 AM, Heiko Voigt hvo...@hvoigt.net wrote: I am currently experiencing sporadic test failures on Mac OS X 10.10.3: Test Summary Report --- t7503-pre-commit-hook.sh (Wstat: 256 Tests: 11 Failed: 1) Failed test: 9 Non-zero exit status: 1 t7502-commit.sh (Wstat: 256 Tests: 64 Failed: 1) Failed test: 59 Non-zero exit status: 1 t7407-submodule-foreach.sh (Wstat: 256 Tests: 17 Failed: 1) Failed test: 14 Non-zero exit status: 1 t7406-submodule-update.sh(Wstat: 256 Tests: 43 Failed: 2) Failed tests: 36-37 Non-zero exit status: 1 Files=702, Tests=12559, 618 wallclock secs ( 6.56 usr 2.08 sys + 716.50 cusr 932.03 csys = 1657.17 CPU) Result: FAIL make[1]: *** [prove] Error 1 make: *** [test] Error 2 When I execute the tests individually they succeed. Sometimes running the testsuite succeeds sometimes it does not. For example now running the testsuite the first and second time showed no failures. Only the third run revealed the above. Is anyone else experiencing this as well? I am seeing this on Junios master (f86f31ab33c3). The list of failed tests varies sometimes. Confirmed. -- To unsubscribe from this list: send the line unsubscribe git in
[PATCH 6/7] cat-file: split batch_one_object into two stages
There are really two things going on in this function: 1. We convert the name we got on stdin to a sha1. 2. We look up and print information on the sha1. Let's split out the second half so that we can call it separately. Signed-off-by: Jeff King p...@peff.net --- builtin/cat-file.c | 39 +++ 1 file changed, 23 insertions(+), 16 deletions(-) diff --git a/builtin/cat-file.c b/builtin/cat-file.c index 7d99c15..499ccda 100644 --- a/builtin/cat-file.c +++ b/builtin/cat-file.c @@ -251,10 +251,31 @@ static void print_object_or_die(struct batch_options *opt, struct expand_data *d } } +static void batch_object_write(const char *obj_name, struct batch_options *opt, + struct expand_data *data) +{ + struct strbuf buf = STRBUF_INIT; + + if (sha1_object_info_extended(data-sha1, data-info, LOOKUP_REPLACE_OBJECT) 0) { + printf(%s missing\n, obj_name); + fflush(stdout); + return; + } + + strbuf_expand(buf, opt-format, expand_format, data); + strbuf_addch(buf, '\n'); + batch_write(opt, buf.buf, buf.len); + strbuf_release(buf); + + if (opt-print_contents) { + print_object_or_die(opt, data); + batch_write(opt, \n, 1); + } +} + static void batch_one_object(const char *obj_name, struct batch_options *opt, struct expand_data *data) { - struct strbuf buf = STRBUF_INIT; struct object_context ctx; int flags = opt-follow_symlinks ? GET_SHA1_FOLLOW_SYMLINKS : 0; enum follow_symlinks_result result; @@ -294,21 +315,7 @@ static void batch_one_object(const char *obj_name, struct batch_options *opt, return; } - if (sha1_object_info_extended(data-sha1, data-info, LOOKUP_REPLACE_OBJECT) 0) { - printf(%s missing\n, obj_name); - fflush(stdout); - return; - } - - strbuf_expand(buf, opt-format, expand_format, data); - strbuf_addch(buf, '\n'); - batch_write(opt, buf.buf, buf.len); - strbuf_release(buf); - - if (opt-print_contents) { - print_object_or_die(opt, data); - batch_write(opt, \n, 1); - } + batch_object_write(obj_name, opt, data); } static int batch_objects(struct batch_options *opt) -- 2.4.4.719.g3984bc6 -- To unsubscribe from this list: send the line unsubscribe git in
Re: [PATCH] Add list-all-objects command
On Mon, Jun 22, 2015 at 04:38:22AM -0400, Jeff King wrote: + prepare_packed_git(); + for (p = packed_git; p; p = p-next) { + open_pack_index(p); + } Yikes. The fact that you need to do this means that for_each_packed_object is buggy, IMHO. I'll send a patch. Here's that patch. And since I did not want to pile work on Charles, I went ahead and just implemented the patches I suggested in the other email. We may want to take patch 1 separately for the maint-track, as it is really a bug-fix (albeit one that I do not think actually affects anyone in practice right now). Patches 2-5 are useful even if we go with Charles' command, as they make cat-file better (cleanups and he new buffer option). Patches 6-7 implement the cat-file option that would be redundant with list-all-objects. By the way, in addition to not showing objects in order, list-all-objects (and my cat-file option) may show duplicates. Do we want to sort -u for the user? It might be nice for them to always get a de-duped and sorted list. Aside from the CPU cost of sorting, it does mean we'll allocate ~80MB for the kernel to store the sha1s. I guess that's not too much when you are talking about the kernel repo. I took the coward's way out and just mentioned the limitation in the documentation, but I'm happy to be persuaded. [1/7]: for_each_packed_object: automatically open pack index [2/7]: cat-file: minor style fix in options list [3/7]: cat-file: move batch_options definition to top of file [4/7]: cat-file: add --buffer option [5/7]: cat-file: stop returning value from batch_one_object [6/7]: cat-file: split batch_one_object into two stages [7/7]: cat-file: add --batch-all-objects option -Peff -- To unsubscribe from this list: send the line unsubscribe git in
[PATCH 1/7] for_each_packed_object: automatically open pack index
When for_each_packed_object is called, we call prepare_packed_git() to make sure we have the actual list of packs. But the latter does not actually open the pack indices, meaning that pack-nr_objects may simply be 0 if the pack has not otherwise been used since the program started. In practice, this didn't come up for the current callers, because they iterate the packed objects only after iterating all reachable objects (so for it to matter you would have to have a pack consisting only of unreachable objects). But it is a dangerous and confusing interface that should be fixed for future callers. Note that we do not end the iteration when a pack cannot be opened, but we do return an error. That lets you complete the iteration even in actively-repacked repository where an .idx file may racily go away, but it also lets callers know that they may not have gotten the complete list (which the current reachability-check caller does care about). We have to tweak one of the prune tests due to the changed return value; an earlier test creates bogus .idx files and does not clean them up. Having to make this tweak is a good thing; it means we will not prune in a broken repository, and the test confirms that we do not negatively impact a more lenient caller, count-objects. Signed-off-by: Jeff King p...@peff.net --- sha1_file.c | 7 ++- t/t5304-prune.sh | 1 + 2 files changed, 7 insertions(+), 1 deletion(-) diff --git a/sha1_file.c b/sha1_file.c index 5038475..f1f0efb 100644 --- a/sha1_file.c +++ b/sha1_file.c @@ -3573,14 +3573,19 @@ int for_each_packed_object(each_packed_object_fn cb, void *data, unsigned flags) { struct packed_git *p; int r = 0; + int pack_errors = 0; prepare_packed_git(); for (p = packed_git; p; p = p-next) { if ((flags FOR_EACH_OBJECT_LOCAL_ONLY) !p-pack_local) continue; + if (open_pack_index(p)) { + pack_errors = 1; + continue; + } r = for_each_object_in_pack(p, cb, data); if (r) break; } - return r; + return r ? r : pack_errors; } diff --git a/t/t5304-prune.sh b/t/t5304-prune.sh index 0794d33..023d7c6 100755 --- a/t/t5304-prune.sh +++ b/t/t5304-prune.sh @@ -218,6 +218,7 @@ test_expect_success 'gc: prune old objects after local clone' ' ' test_expect_success 'garbage report in count-objects -v' ' + test_when_finished rm -f .git/objects/pack/fake* : .git/objects/pack/foo : .git/objects/pack/foo.bar : .git/objects/pack/foo.keep -- 2.4.4.719.g3984bc6 -- To unsubscribe from this list: send the line unsubscribe git in
[PATCH 3/7] cat-file: move batch_options definition to top of file
That way all of the functions can make use of it. Signed-off-by: Jeff King p...@peff.net --- builtin/cat-file.c | 13 +++-- 1 file changed, 7 insertions(+), 6 deletions(-) diff --git a/builtin/cat-file.c b/builtin/cat-file.c index 6cb..d4101b7 100644 --- a/builtin/cat-file.c +++ b/builtin/cat-file.c @@ -10,6 +10,13 @@ #include streaming.h #include tree-walk.h +struct batch_options { + int enabled; + int follow_symlinks; + int print_contents; + const char *format; +}; + static int cat_one_file(int opt, const char *exp_type, const char *obj_name, int unknown_type) { @@ -232,12 +239,6 @@ static void print_object_or_die(int fd, struct expand_data *data) } } -struct batch_options { - int enabled; - int follow_symlinks; - int print_contents; - const char *format; -}; static int batch_one_object(const char *obj_name, struct batch_options *opt, struct expand_data *data) -- 2.4.4.719.g3984bc6 -- To unsubscribe from this list: send the line unsubscribe git in
[PATCH 2/7] cat-file: minor style fix in options list
We do not put extra whitespace before the first macro argument. Signed-off-by: Jeff King p...@peff.net --- builtin/cat-file.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/builtin/cat-file.c b/builtin/cat-file.c index 049a95f..6cb 100644 --- a/builtin/cat-file.c +++ b/builtin/cat-file.c @@ -412,7 +412,7 @@ int cmd_cat_file(int argc, const char **argv, const char *prefix) OPT_CMDMODE('p', NULL, opt, N_(pretty-print object's content), 'p'), OPT_CMDMODE(0, textconv, opt, N_(for blob objects, run textconv on object's content), 'c'), - OPT_BOOL( 0, allow-unknown-type, unknown_type, + OPT_BOOL(0, allow-unknown-type, unknown_type, N_(allow -s and -t to work with broken/corrupt objects)), { OPTION_CALLBACK, 0, batch, batch, format, N_(show info and content of objects fed from the standard input), -- 2.4.4.719.g3984bc6 -- To unsubscribe from this list: send the line unsubscribe git in
[PATCH 8/7] cat-file: sort and de-dup output of --batch-all-objects
On Mon, Jun 22, 2015 at 06:33:21AM -0400, Jeff King wrote: By the way, in addition to not showing objects in order, list-all-objects (and my cat-file option) may show duplicates. Do we want to sort -u for the user? It might be nice for them to always get a de-duped and sorted list. Aside from the CPU cost of sorting, it does mean we'll allocate ~80MB for the kernel to store the sha1s. I guess that's not too much when you are talking about the kernel repo. I took the coward's way out and just mentioned the limitation in the documentation, but I'm happy to be persuaded. The patch below does the sort/de-dup. I'd probably just squash it into patch 7, though. I did have one additional thought, though. We are treating this as two separate operations: what are the sha1s in the repo and show me information about this sha1. But by integrating with cat-file, we could actually show information not just about a particular sha1, but about a particular on-disk object. E.g., if there are duplicates of a particular object, some formatters like %(objectsize:disk) and %(deltabase) pick one arbitrarily to show. I don't know if anybody actually cares about that in practice, but if we show duplicates, we could give the accurate information for each instance (and in fact we could give other information like loose vs packed, which file contains the object, etc). I tend to think that the lack of de-duping is sufficiently confusing that it should be the default, and we can always add a no really, show me the duplicates option later. It is not as simple as skipping the de-dup step. We'd have to actually avoid calling sha1_object_info, and use the information found in the loose/pack traversal (which would in turn require exposing the low-level bits of sha1_object_info). -- 8 -- Subject: cat-file: sort and de-dup output of --batch-all-objects The sorting we could probably live without, but printing duplicates is just a hassle for the user, who must then de-dup themselves (or risk a wrong answer if they are doing something like counting objects with a particular property). Signed-off-by: Jeff King p...@peff.net --- Documentation/git-cat-file.txt | 3 +-- builtin/cat-file.c | 22 +++--- t/t1006-cat-file.sh| 3 +-- 3 files changed, 17 insertions(+), 11 deletions(-) diff --git a/Documentation/git-cat-file.txt b/Documentation/git-cat-file.txt index 6831b08..3105fc0 100644 --- a/Documentation/git-cat-file.txt +++ b/Documentation/git-cat-file.txt @@ -74,8 +74,7 @@ OPTIONS requested batch operation on all objects in the repository and any alternate object stores (not just reachable objects). Requires `--batch` or `--batch-check` be specified. Note that - the order of the objects is unspecified, and there may be - duplicate entries. + the objects are visited in order sorted by their hashes. --buffer:: Normally batch output is flushed after each object is output, so diff --git a/builtin/cat-file.c b/builtin/cat-file.c index 95604c4..07baad1 100644 --- a/builtin/cat-file.c +++ b/builtin/cat-file.c @@ -9,6 +9,7 @@ #include userdiff.h #include streaming.h #include tree-walk.h +#include sha1-array.h struct batch_options { int enabled; @@ -324,19 +325,19 @@ struct object_cb_data { struct expand_data *expand; }; -static int batch_object_cb(const unsigned char *sha1, - struct object_cb_data *data) +static void batch_object_cb(const unsigned char sha1[20], void *vdata) { + struct object_cb_data *data = vdata; hashcpy(data-expand-sha1, sha1); batch_object_write(NULL, data-opt, data-expand); - return 0; } static int batch_loose_object(const unsigned char *sha1, const char *path, void *data) { - return batch_object_cb(sha1, data); + sha1_array_append(data, sha1); + return 0; } static int batch_packed_object(const unsigned char *sha1, @@ -344,7 +345,8 @@ static int batch_packed_object(const unsigned char *sha1, uint32_t pos, void *data) { - return batch_object_cb(sha1, data); + sha1_array_append(data, sha1); + return 0; } static int batch_objects(struct batch_options *opt) @@ -375,11 +377,17 @@ static int batch_objects(struct batch_options *opt) data.info.typep = data.type; if (opt-all_objects) { + struct sha1_array sa = SHA1_ARRAY_INIT; struct object_cb_data cb; + + for_each_loose_object(batch_loose_object, sa, 0); + for_each_packed_object(batch_packed_object, sa, 0); + cb.opt = opt; cb.expand = data; - for_each_loose_object(batch_loose_object, cb, 0); - for_each_packed_object(batch_packed_object, cb, 0); + sha1_array_for_each_unique(sa,
Re: [PATCH v4 05/19] ref-filter: add parse_opt_merge_filter()
Junio C Hamano gits...@pobox.com writes: This is copied from 'builtin/branch.c' which will eventually be removed when we port 'branch.c' to use ref-filter APIs. Hmph. I somehow thought Matthieu's instruction was to finish tag.c side first I would call in advice rather than instruction. I still think we should prioritize the tag.c side, but if this patch is ready, it makes sense to keep it in the series. -- Matthieu Moy http://www-verimag.imag.fr/~moy/ -- To unsubscribe from this list: send the line unsubscribe git in
[PATCH 7/7] cat-file: add --batch-all-objects option
It can sometimes be useful to examine all objects in the repository. Normally this is done with git rev-list --all --objects, but: 1. That shows only reachable objects. You may want to look at all available objects. 2. It's slow. We actually open each object to walk the graph. If your operation is OK with seeing unreachable objects, it's an order of magnitude faster to just enumerate the loose directories and pack indices. You can do this yourself using ls and git show-index, but it's non-obvious. This patch adds an option to cat-file --batch-check to operate on all available objects (rather than reading names from stdin). This is based on a proposal by Charles Bailey to provide a separate git list-all-objects command. That is more orthogonal, as it splits enumerating the objects from getting information about them. However, in practice you will either: a. Feed the list of objects directly into cat-file anyway, so you can find out information about them. Keeping it in a single process is more efficient. b. Ask the listing process to start telling you more information about the objects, in which case you will reinvent cat-file's batch-check formatter. Adding a cat-file option is simple and efficient. And if you really do want just the object names, you can always do: git cat-file --batch-check='%(objectname)' --batch-all-objects Signed-off-by: Jeff King p...@peff.net --- Documentation/git-cat-file.txt | 8 builtin/cat-file.c | 44 -- t/t1006-cat-file.sh| 27 ++ 3 files changed, 77 insertions(+), 2 deletions(-) diff --git a/Documentation/git-cat-file.txt b/Documentation/git-cat-file.txt index 0058bd4..6831b08 100644 --- a/Documentation/git-cat-file.txt +++ b/Documentation/git-cat-file.txt @@ -69,6 +69,14 @@ OPTIONS not be combined with any other options or arguments. See the section `BATCH OUTPUT` below for details. +--batch-all-objects:: + Instead of reading a list of objects on stdin, perform the + requested batch operation on all objects in the repository and + any alternate object stores (not just reachable objects). + Requires `--batch` or `--batch-check` be specified. Note that + the order of the objects is unspecified, and there may be + duplicate entries. + --buffer:: Normally batch output is flushed after each object is output, so that a process can interactively read and write from diff --git a/builtin/cat-file.c b/builtin/cat-file.c index 499ccda..95604c4 100644 --- a/builtin/cat-file.c +++ b/builtin/cat-file.c @@ -15,6 +15,7 @@ struct batch_options { int follow_symlinks; int print_contents; int buffer_output; + int all_objects; const char *format; }; @@ -257,7 +258,7 @@ static void batch_object_write(const char *obj_name, struct batch_options *opt, struct strbuf buf = STRBUF_INIT; if (sha1_object_info_extended(data-sha1, data-info, LOOKUP_REPLACE_OBJECT) 0) { - printf(%s missing\n, obj_name); + printf(%s missing\n, obj_name ? obj_name : sha1_to_hex(data-sha1)); fflush(stdout); return; } @@ -318,6 +319,34 @@ static void batch_one_object(const char *obj_name, struct batch_options *opt, batch_object_write(obj_name, opt, data); } +struct object_cb_data { + struct batch_options *opt; + struct expand_data *expand; +}; + +static int batch_object_cb(const unsigned char *sha1, + struct object_cb_data *data) +{ + hashcpy(data-expand-sha1, sha1); + batch_object_write(NULL, data-opt, data-expand); + return 0; +} + +static int batch_loose_object(const unsigned char *sha1, + const char *path, + void *data) +{ + return batch_object_cb(sha1, data); +} + +static int batch_packed_object(const unsigned char *sha1, + struct packed_git *pack, + uint32_t pos, + void *data) +{ + return batch_object_cb(sha1, data); +} + static int batch_objects(struct batch_options *opt) { struct strbuf buf = STRBUF_INIT; @@ -345,6 +374,15 @@ static int batch_objects(struct batch_options *opt) if (opt-print_contents) data.info.typep = data.type; + if (opt-all_objects) { + struct object_cb_data cb; + cb.opt = opt; + cb.expand = data; + for_each_loose_object(batch_loose_object, cb, 0); + for_each_packed_object(batch_packed_object, cb, 0); + return 0; + } + /* * We are going to call get_sha1 on a potentially very large number of * objects. In most large cases, these will be actual object sha1s. The @@ -436,6
[PATCH 4/7] cat-file: add --buffer option
We use a direct write() to output the results of --batch and --batch-check. This is good for processes feeding the input and reading the output interactively, but it introduces measurable overhead if you do not want this feature. For example, on linux.git: $ git rev-list --objects --all | cut -d' ' -f1 objects $ time git cat-file --batch-check='%(objectsize)' \ objects /dev/null real0m5.440s user0m5.060s sys 0m0.384s This patch adds an option to use regular stdio buffering: $ time git cat-file --batch-check='%(objectsize)' \ --buffer objects /dev/null real0m4.975s user0m4.888s sys 0m0.092s Signed-off-by: Jeff King p...@peff.net --- This selectively uses fwrite or write_or_die, depending on the buffer setting. Another option would be to just always use fwrite(), and then selectively fflush(). It feels kind of wasteful in the non-buffered case, as it's just another layer to write through. OTOH, the cost of writing a line into the buffer only to flush is probably dwarfed by the system call of actually flushing. If we went that direction, we could probably simplify the code a bit (both getting rid of the batch_write function I call here, and dropping a bunch of existing fflush() calls, where we must flush any time we use printf for its formatting capabilities). I also considered that the --buffer case is likely to be the common one. We cannot flip the default, though, as it would break any existing callers (who would need to specify --no-buffer). We can do the usual deprecation dance, but I don't know if it is worth it for a plumbing command like this. Documentation/git-cat-file.txt | 7 +++ builtin/cat-file.c | 26 +++--- 2 files changed, 26 insertions(+), 7 deletions(-) diff --git a/Documentation/git-cat-file.txt b/Documentation/git-cat-file.txt index 319ab4c..0058bd4 100644 --- a/Documentation/git-cat-file.txt +++ b/Documentation/git-cat-file.txt @@ -69,6 +69,13 @@ OPTIONS not be combined with any other options or arguments. See the section `BATCH OUTPUT` below for details. +--buffer:: + Normally batch output is flushed after each object is output, so + that a process can interactively read and write from + `cat-file`. With this option, the output uses normal stdio + buffering; this is much more efficient when invoking + `--batch-check` on a large number of objects. + --allow-unknown-type:: Allow -s or -t to query broken/corrupt objects of unknown type. diff --git a/builtin/cat-file.c b/builtin/cat-file.c index d4101b7..741e100 100644 --- a/builtin/cat-file.c +++ b/builtin/cat-file.c @@ -14,6 +14,7 @@ struct batch_options { int enabled; int follow_symlinks; int print_contents; + int buffer_output; const char *format; }; @@ -211,14 +212,25 @@ static size_t expand_format(struct strbuf *sb, const char *start, void *data) return end - start + 1; } -static void print_object_or_die(int fd, struct expand_data *data) +static void batch_write(struct batch_options *opt, const void *data, int len) +{ + if (opt-buffer_output) { + if (fwrite(data, 1, len, stdout) != len) + die_errno(unable to write to stdout); + } else + write_or_die(1, data, len); +} + +static void print_object_or_die(struct batch_options *opt, struct expand_data *data) { const unsigned char *sha1 = data-sha1; assert(data-info.typep); if (data-type == OBJ_BLOB) { - if (stream_blob_to_fd(fd, sha1, NULL, 0) 0) + if (opt-buffer_output) + fflush(stdout); + if (stream_blob_to_fd(1, sha1, NULL, 0) 0) die(unable to stream %s to stdout, sha1_to_hex(sha1)); } else { @@ -234,12 +246,11 @@ static void print_object_or_die(int fd, struct expand_data *data) if (data-info.sizep size != data-size) die(object %s changed size!?, sha1_to_hex(sha1)); - write_or_die(fd, contents, size); + batch_write(opt, contents, size); free(contents); } } - static int batch_one_object(const char *obj_name, struct batch_options *opt, struct expand_data *data) { @@ -294,12 +305,12 @@ static int batch_one_object(const char *obj_name, struct batch_options *opt, strbuf_expand(buf, opt-format, expand_format, data); strbuf_addch(buf, '\n'); - write_or_die(1, buf.buf, buf.len); + batch_write(opt, buf.buf, buf.len); strbuf_release(buf); if (opt-print_contents) { - print_object_or_die(1, data); - write_or_die(1, \n, 1); + print_object_or_die(opt, data); + batch_write(opt, \n, 1); } return 0; } @@ -415,6 +426,7 @@ int cmd_cat_file(int
[PATCH 5/7] cat-file: stop returning value from batch_one_object
If batch_one_object returns an error code, we stop reading input. However, it will only do so if we feed it NULL, which cannot happen; we give it the buf member of a strbuf, which is always non-NULL. We did originally stop on other errors (like a missing object), but this was changed in 3c076db (cat-file --batch / --batch-check: do not exit if hashes are missing, 2008-06-09). These days we keep going for any per-object error (and print missing when necessary). Signed-off-by: Jeff King p...@peff.net --- builtin/cat-file.c | 18 ++ 1 file changed, 6 insertions(+), 12 deletions(-) diff --git a/builtin/cat-file.c b/builtin/cat-file.c index 741e100..7d99c15 100644 --- a/builtin/cat-file.c +++ b/builtin/cat-file.c @@ -251,17 +251,14 @@ static void print_object_or_die(struct batch_options *opt, struct expand_data *d } } -static int batch_one_object(const char *obj_name, struct batch_options *opt, - struct expand_data *data) +static void batch_one_object(const char *obj_name, struct batch_options *opt, +struct expand_data *data) { struct strbuf buf = STRBUF_INIT; struct object_context ctx; int flags = opt-follow_symlinks ? GET_SHA1_FOLLOW_SYMLINKS : 0; enum follow_symlinks_result result; - if (!obj_name) - return 1; - result = get_sha1_with_context(obj_name, flags, data-sha1, ctx); if (result != FOUND) { switch (result) { @@ -286,7 +283,7 @@ static int batch_one_object(const char *obj_name, struct batch_options *opt, break; } fflush(stdout); - return 0; + return; } if (ctx.mode == 0) { @@ -294,13 +291,13 @@ static int batch_one_object(const char *obj_name, struct batch_options *opt, (uintmax_t)ctx.symlink_path.len, ctx.symlink_path.buf); fflush(stdout); - return 0; + return; } if (sha1_object_info_extended(data-sha1, data-info, LOOKUP_REPLACE_OBJECT) 0) { printf(%s missing\n, obj_name); fflush(stdout); - return 0; + return; } strbuf_expand(buf, opt-format, expand_format, data); @@ -312,7 +309,6 @@ static int batch_one_object(const char *obj_name, struct batch_options *opt, print_object_or_die(opt, data); batch_write(opt, \n, 1); } - return 0; } static int batch_objects(struct batch_options *opt) @@ -367,9 +363,7 @@ static int batch_objects(struct batch_options *opt) data.rest = p; } - retval = batch_one_object(buf.buf, opt, data); - if (retval) - break; + batch_one_object(buf.buf, opt, data); } strbuf_release(buf); -- 2.4.4.719.g3984bc6 -- To unsubscribe from this list: send the line unsubscribe git in
Re: [PATCH] Add list-all-objects command
On Sun, Jun 21, 2015 at 08:20:31PM +0100, Charles Bailey wrote: +OPTIONS +--- + +-v:: +--verbose:: + Output in the followin format instead of just printing object ids: + sha1 SP type SP size s/followin/g/ +int cmd_list_all_objects(int argc, const char **argv, const char *prefix) +{ + struct packed_git *p; + + argc = parse_options(argc, argv, prefix, builtin_filter_objects_options, + NULL, 0); + + for_each_loose_object(check_loose_object, NULL, 0); + + prepare_packed_git(); + for (p = packed_git; p; p = p-next) { + open_pack_index(p); + } Yikes. The fact that you need to do this means that for_each_packed_object is buggy, IMHO. I'll send a patch. -Peff -- To unsubscribe from this list: send the line unsubscribe git in
Re: [PATCH] Add list-all-objects command
On Mon, Jun 22, 2015 at 2:20 AM, Charles Bailey char...@hashpling.org wrote: From: Charles Bailey cbaile...@bloomberg.net list-all-objects is a command to print the ids of all objects in the object database of a repository. It is designed as a low overhead interface for scripts that want to analyse all objects but don't require the ordering implied by a revision walk. It will list all objects, loose and packed, and will include unreachable objects. Nit picking, but perhaps we should allow to select object source: loose, packed, alternates.. These info are available now and cheap to get. It's ok not to do it now though. Personally I would name this command find-objects (after unix command find) where we could still filter objects _not_ based on object content. -- Duy -- To unsubscribe from this list: send the line unsubscribe git in
Re: broken repo after power cut
Am 22.06.2015 um 02:35 schrieb Theodore Ts'o: On Sun, Jun 21, 2015 at 03:07:41PM +0200, Richard Weinberger wrote: I was then shocked to learn that ext4 apparently has a default setting that allows it to truncate files upon power failure (something about a full journal vs a fast journal or some such) s/ext4/all modern file systems/ POSIX makes **no guarantees** about what happens after a power failure unless you use fsync() --- which git does not do by default (see below). Thanks for pointing this out. The bottome lins is that if you care about files being written, you need to use fsync(). Should git use fsync() by default? Well, if you are willing to accept that if your system crashes within a second or so of your last git operation, you might need to run git fsck and potentially recover from a busted repo, maybe speed is more important for you (and git is known for its speed/performance, after all. :-) The actual state of the source tree would have been written using a text editor which tends to be paranoid about using fsync (at least, if you use a real editor like Emacs or Vi, as opposed to the toy notepad editors shipped with GNOME or KDE :-). So as long as you know what you're doing, it's unlikely that you will actually lose any work. Personally, I have core.fsyncobjectfiles set to yes in my .gitconfig. Part of this is because I have an SSD, so the speed hit really doesn't bother me, and needing to recover a corrupted git repository is a pain (although I have certainly done it in the past). I think core.fsyncObjectFiles documentation really needs an update. What about this one? diff --git a/Documentation/config.txt b/Documentation/config.txt index 43bb53c..b08fa11 100644 --- a/Documentation/config.txt +++ b/Documentation/config.txt @@ -693,10 +693,16 @@ core.whitespace:: core.fsyncObjectFiles:: This boolean will enable 'fsync()' when writing object files. + -This is a total waste of time and effort on a filesystem that orders -data writes properly, but can be useful for filesystems that do not use -journalling (traditional UNIX filesystems) or that only journal metadata -and not file contents (OS X's HFS+, or Linux ext3 with data=writeback). +For performance reasons git does not call 'fsync()' after writing object +files. This means that after a power cut your git repository can get +corrupted as not all data hit the storage media. Especially on modern +filesystems like ext4, xfs or btrfs this can happen very easily. +If you have to face power cuts and care about your data it is strongly +recommended to enable this setting. +Please note that git's behavior used to be safe on ext3 with data=ordered, +for any other filesystems or mount settings this is not the case as +POSIX clearly states that you have to call 'fsync()' to make sure that +all data is written. core.preloadIndex:: Enable parallel index preload for operations like 'git diff' -- Thanks, //richard -- To unsubscribe from this list: send the line unsubscribe git in
Re: [PATCH] Add list-all-objects command
On Mon, Jun 22, 2015 at 04:38:22AM -0400, Jeff King wrote: On Sun, Jun 21, 2015 at 08:20:31PM +0100, Charles Bailey wrote: + prepare_packed_git(); + for (p = packed_git; p; p = p-next) { + open_pack_index(p); + } Yikes. The fact that you need to do this means that for_each_packed_object is buggy, IMHO. I'll send a patch. I'm glad you said that; the interface did seem a bit warty at the time but as I fixed this early in my hacking I didn't remeber to revisit this and ask if it was actually intentional. -- To unsubscribe from this list: send the line unsubscribe git in
Re: [PATCH v6 18/19] fsck: git receive-pack: support excluding objects from fsck'ing
Hi Junio, On 2015-06-22 06:21, Junio C Hamano wrote: On Fri, Jun 19, 2015 at 6:35 AM, Johannes Schindelin johannes.schinde...@gmx.de wrote: @@ -227,6 +277,10 @@ static int report(struct fsck_options *options, struct object *object, if (msg_type == FSCK_IGNORE) return 0; + if (options-skiplist object + sha1_array_lookup(options-skiplist, object-sha1) = 0) + return 0; + if (msg_type == FSCK_FATAL) msg_type = FSCK_ERROR; else if (msg_type == FSCK_INFO) I just double checked this patch because I wanted to make sure this was applied in the report() function (i.e. behave as if FSCK_IGNORE was specified for specific objects on the skip list), and I am happy to see that it indeed is the case. That was because I briefly feared that skip could be done before going through the usual verification chain, which would have been very wrong (e.g. we may want not to hear about missing tagger in v2.6.11-tree tag, but nevertheless we would want to check all the tree contents pointed at by that tag, as that tree may not be reachable by any other way). To be honest, an earlier iteration actually did have that test much earlier in the call chain, but I had changed it to the current location in v5. My rationale was slightly different from yours: I wanted to affect the performance as little as possible. So looking up each and every object in the skip list (which I expect to be relatively small) seemed to be wasteful. And then it occurred to me that it would make much more sense to just make the skip-list functionality equivalent to the ignore message type. It just occurred to me, however, that one thing is possibly surprising with either version of the skip list functionality: if a certain object is corrupt on disk, it cannot be skipped via the skip-list, as the object is *still* unpacked (which would fail in the case of a corrupt object). I will document that in the man page. Ciao, Dscho -- To unsubscribe from this list: send the line unsubscribe git in
Re: Fast enumeration of objects
On Sun, Jun 21, 2015 at 08:20:30PM +0100, Charles Bailey wrote: I performed some test timings of some different commands on a clone of the Linux kernel which was completely packed. Thanks for timing things. I think we can fairly easily improve a bit on what you have here. I'll go through my full analysis, but see the conclusions at the end. $ time git rev-list --all --objects | cut -d -f1 | git cat-file --batch-check | awk '{if ($3 = 512000) { print $1 }}' | wc -l 958 real0m30.823s user0m41.904s sys 0m7.728s list-all-objects gives a significant improvement: $ time git list-all-objects | git cat-file --batch-check | awk '{if ($3 = 512000) { print $1 }}' | wc -l 958 real0m9.585s user0m10.820s sys 0m4.960s That makes sense; of course these two are not necessarily producing the same answer (they do in your case because it's a fresh clone, and all of the objects are reachable). I think that's an acceptable caveat. You can speed up the second one by asking batch-check only for the parts you care about: git list-all-objects | git cat-file --batch-check='%(objectsize) %(objectname)' | awk '{if ($1 = 512000) { print $2 }}' | wc -l That dropped my best-of-five timings for the same test down from 9.5s to 7.0s. The answer should be the same. The reason is that cat-file will only compute the items it needs to show, and the object-type is more expensive to get than the size[1]. Replacing awk with: perl -alne 'print $F[0] if $F[1] 512000' dropped that to 6.0s. That mostly means my awk sucks, but it is interesting to note that not all of the extra time is pipe overhead inherent to this approach; your choice of processor matters, too. If you're willing to get a slightly different answer, but one that is often just as useful, you can replace the %(objectsize) in the cat-file invocation with %(objectsize:disk). That gives you the actual on-disk size of the object, which includes delta and zlib compression. For 512K, that produces very different results (because files of that size may actually be text file). But for most truly huge files, they typically do not delta or compress at all, and the on-disk size is roughly the same. That only shaves off 100-200 milliseconds, though. [1] If you are wondering why the size is cheaper than the type, it is because of deltas. For base objects, we can get either immediately from the pack entry's header. For a delta, to get the size we have to open the object data; the expected size is part of the delta data. So we pay the extra cost to zlib-inflate the first few bytes. But finding the type works differently; the type in the pack header is OFS_DELTA, so we have to walk back to the parent entry to find the real type. If that parent is a delta, we walk back recursively until we hit a base object. You'd think that would also make %(objectsize:disk) much cheaper than %(objectsize), too. But the disk sizes require computing a the pack revindex on the fly, which takes a few hundred milliseconds on linux.git. skipping the cat-filter filter is a lesser but still significant improvement: $ time git list-all-objects -v | awk '{if ($3 = 512000) { print $1 }}' | wc -l 958 real0m5.637s user0m6.652s sys 0m0.156s That's a pretty nice improvement over the piped version. But we cannot do the same custom-format optimization there, because -v does not support it. It would be nice if it supported the full range of cat-file formatters. I did a hacky proof-of-concept, and that brought my 6.0s time down to 4.9s. I also noticed that cat-file doesn't do any output buffering; this is because it may be used interactively, line by line, by a caller controlling both pipes. Replacing write_or_die() with fwrite in my proof-of-concept dropped the time to 3.7s. That's faster still than your original (different machines, obviously, but your times are similar to mine): The old filter-objects could do the size filter a little be faster, but not by much: $ time git filter-objects --min-size=500k | wc -l 958 real0m4.564s user0m4.496s sys 0m0.064s This is likely caused by your use of sha1_object_info(), which always computes the type. Switching to the extended form would probably buy you another 2 seconds or so. Also, all my numbers are wall-clock times. The CPU time for my 3.7s time is actually 6.8s. Whereas doing it all in one process would probably require 3.0s or so of actual CPU time. So my conclusions are: 1. Yes, the pipe/parsing overhead of a separate processor really is measurable. That's hidden in the wall-clock time if you have multiple cores, but you may care more
Re: [PATCH] Add list-all-objects command
On Mon, Jun 22, 2015 at 04:57:28PM +0700, Duy Nguyen wrote: On Mon, Jun 22, 2015 at 2:20 AM, Charles Bailey char...@hashpling.org wrote: From: Charles Bailey cbaile...@bloomberg.net list-all-objects is a command to print the ids of all objects in the object database of a repository. It is designed as a low overhead interface for scripts that want to analyse all objects but don't require the ordering implied by a revision walk. It will list all objects, loose and packed, and will include unreachable objects. Nit picking, but perhaps we should allow to select object source: loose, packed, alternates.. These info are available now and cheap to get. It's ok not to do it now though. There is already plumbing to do those individual operations if you want. Although some of the plumbing involves for i in objects/pack/*.pack, which is perhaps a little less abstract than we'd like. :) Personally I would name this command find-objects (after unix command find) where we could still filter objects _not_ based on object content. I like that better than ls, too, but I propose that we actually add this as a feature to cat-file. I'll send patches in a moment. -Peff -- To unsubscribe from this list: send the line unsubscribe git in
[PATCH v3 3/6] bisect: simplify the addition of new bisect terms
We create a file BISECT_TERMS in the repository .git to be read during a bisection. The fonctions to be changed if we add new terms are quite few. In git-bisect.sh : check_and_set_terms bisect_voc Signed-off-by: Antoine Delaite antoine.dela...@ensimag.grenoble-inp.fr Signed-off-by: Louis Stuber stub...@ensimag.grenoble-inp.fr Signed-off-by: Valentin Duperray valentin.duper...@ensimag.imag.fr Signed-off-by: Franck Jonas franck.jo...@ensimag.imag.fr Signed-off-by: Lucien Kong lucien.k...@ensimag.imag.fr Signed-off-by: Thomas Nguy thomas.n...@ensimag.imag.fr Signed-off-by: Huynh Khoi Nguyen Nguyen huynh-khoi-nguyen.ngu...@ensimag.imag.fr Signed-off-by: Matthieu Moy matthieu@grenoble-inp.fr --- bisect.c | 38 -- git-bisect.sh | 70 +++- 2 files changed, 98 insertions(+), 10 deletions(-) diff --git a/bisect.c b/bisect.c index cda30fa..2fc8a78 100644 --- a/bisect.c +++ b/bisect.c @@ -747,7 +747,10 @@ static void handle_bad_merge_base(void) between %s and [%s].\n, bad_hex, bad_hex, good_hex); } else { - die(BUG: terms %s/%s not managed, name_bad, name_good); + fprintf(stderr, The merge base %s is %s.\n + This means the first commit marked %s is + between %s and [%s].\n, + bad_hex, name_bad, name_bad, bad_hex, good_hex); } exit(3); } @@ -902,6 +905,36 @@ static void show_diff_tree(const char *prefix, struct commit *commit) } /* + * The terms used for this bisect session are stored in BISECT_TERMS. + * We read them and store them to adapt the messages accordingly. + * Default is bad/good. + */ +void read_bisect_terms(const char **read_bad, const char **read_good) +{ + struct strbuf str = STRBUF_INIT; + const char *filename = git_path(BISECT_TERMS); + FILE *fp = fopen(filename, r); + + if (!fp) { + if (errno==2) { + *read_bad = bad; + *read_good = good; + return; + } else { + die(could not read file '%s': %s, filename, + strerror(errno)); + } + } else { + strbuf_getline(str, fp, '\n'); + *read_bad = strbuf_detach(str, NULL); + strbuf_getline(str, fp, '\n'); + *read_good = strbuf_detach(str, NULL); + } + strbuf_release(str); + fclose(fp); +} + +/* * We use the convention that exiting with an exit code 10 means that * the bisection process finished successfully. * In this case the calling shell script should exit 0. @@ -917,8 +950,7 @@ int bisect_next_all(const char *prefix, int no_checkout) const unsigned char *bisect_rev; char bisect_rev_hex[GIT_SHA1_HEXSZ + 1]; - name_bad=bad; - name_good=good; + read_bisect_terms(name_bad, name_good); if (read_bisect_refs()) die(reading bisect refs failed); diff --git a/git-bisect.sh b/git-bisect.sh index ce6412f..55b9ebd 100644 --- a/git-bisect.sh +++ b/git-bisect.sh @@ -77,6 +77,9 @@ bisect_start() { orig_args=$(git rev-parse --sq-quote $@) bad_seen=0 eval='' + # revision_seen is true if a git bisect start + # has revision as arguments + revision_seen=0 if test z$(git rev-parse --is-bare-repository) != zfalse then mode=--no-checkout @@ -101,6 +104,9 @@ bisect_start() { die $(eval_gettext '\$arg' does not appear to be a valid revision) break } + + revision_seen=1 + case $bad_seen in 0) state=$NAME_BAD ; bad_seen=1 ;; *) state=$NAME_GOOD ;; @@ -172,6 +178,11 @@ bisect_start() { } git rev-parse --sq-quote $@ $GIT_DIR/BISECT_NAMES eval $eval true + if test $revision_seen -eq 1 test ! -s $GIT_DIR/BISECT_TERMS + then + echo $NAME_BAD $GIT_DIR/BISECT_TERMS + echo $NAME_GOOD $GIT_DIR/BISECT_TERMS + fi echo git bisect start$orig_args $GIT_DIR/BISECT_LOG || exit # # Check if we can proceed to the next bisect state. @@ -232,6 +243,7 @@ bisect_skip() { bisect_state() { bisect_autostart state=$1 + check_and_set_terms $state case $#,$state in 0,*) die $(gettext Please call 'bisect_state' with at least one argument.) ;; @@ -291,15 +303,17 @@ bisect_next_check() { : bisect without $NAME_GOOD... ;; *) - + bad_syn=$(bisect_voc bad) +
[PATCH v3 2/6] bisect: replace hardcoded bad|good by variables
To add new tags like old/new and have keywords less confusing, the first step is to avoid hardcoding the keywords. The default mode is still bad/good. Signed-off-by: Antoine Delaite antoine.dela...@ensimag.grenoble-inp.fr Signed-off-by: Louis Stuber stub...@ensimag.grenoble-inp.fr Signed-off-by: Valentin Duperray valentin.duper...@ensimag.imag.fr Signed-off-by: Franck Jonas franck.jo...@ensimag.imag.fr Signed-off-by: Lucien Kong lucien.k...@ensimag.imag.fr Signed-off-by: Thomas Nguy thomas.n...@ensimag.imag.fr Signed-off-by: Huynh Khoi Nguyen Nguyen huynh-khoi-nguyen.ngu...@ensimag.imag.fr Signed-off-by: Matthieu Moy matthieu@grenoble-inp.fr --- bisect.c | 51 ++- git-bisect.sh | 57 +++-- 2 files changed, 65 insertions(+), 43 deletions(-) mode change 100755 = 100644 git-bisect.sh diff --git a/bisect.c b/bisect.c index 5b8357d..cda30fa 100644 --- a/bisect.c +++ b/bisect.c @@ -21,6 +21,9 @@ static const char *argv_checkout[] = {checkout, -q, NULL, --, NULL}; static const char *argv_show_branch[] = {show-branch, NULL, NULL}; static const char *argv_update_ref[] = {update-ref, --no-deref, BISECT_HEAD, NULL, NULL}; +static const char *name_bad; +static const char *name_good; + /* Remember to update object flag allocation in object.h */ #define COUNTED(1u16) @@ -403,15 +406,21 @@ struct commit_list *find_bisection(struct commit_list *list, static int register_ref(const char *refname, const struct object_id *oid, int flags, void *cb_data) { - if (!strcmp(refname, bad)) { + struct strbuf good_prefix = STRBUF_INIT; + strbuf_addstr(good_prefix, name_good); + strbuf_addstr(good_prefix, -); + + if (!strcmp(refname, name_bad)) { current_bad_oid = xmalloc(sizeof(*current_bad_oid)); oidcpy(current_bad_oid, oid); - } else if (starts_with(refname, good-)) { + } else if (starts_with(refname, good_prefix.buf)) { sha1_array_append(good_revs, oid-hash); } else if (starts_with(refname, skip-)) { sha1_array_append(skipped_revs, oid-hash); } + strbuf_release(good_prefix); + return 0; } @@ -634,7 +643,7 @@ static void exit_if_skipped_commits(struct commit_list *tried, return; printf(There are only 'skip'ped commits left to test.\n - The first bad commit could be any of:\n); + The first %s commit could be any of:\n, name_bad); print_commit_list(tried, %s\n, %s\n); if (bad) printf(%s\n, oid_to_hex(bad)); @@ -732,18 +741,21 @@ static void handle_bad_merge_base(void) if (is_expected_rev(current_bad_oid)) { char *bad_hex = oid_to_hex(current_bad_oid); char *good_hex = join_sha1_array_hex(good_revs, ' '); - - fprintf(stderr, The merge base %s is bad.\n - This means the bug has been fixed - between %s and [%s].\n, - bad_hex, bad_hex, good_hex); - + if (!strcmp(name_bad, bad)) { + fprintf(stderr, The merge base %s is bad.\n + This means the bug has been fixed + between %s and [%s].\n, + bad_hex, bad_hex, good_hex); + } else { + die(BUG: terms %s/%s not managed, name_bad, name_good); + } exit(3); } - fprintf(stderr, Some good revs are not ancestor of the bad rev.\n + fprintf(stderr, Some %s revs are not ancestor of the %s rev.\n git bisect cannot work properly in this case.\n - Maybe you mistook good and bad revs?\n); + Maybe you mistook %s and %s revs?\n, + name_good, name_bad, name_good, name_bad); exit(1); } @@ -755,10 +767,10 @@ static void handle_skipped_merge_base(const unsigned char *mb) warning(the merge base between %s and [%s] must be skipped.\n - So we cannot be sure the first bad commit is + So we cannot be sure the first %s commit is between %s and %s.\n We continue anyway., - bad_hex, good_hex, mb_hex, bad_hex); + bad_hex, good_hex, name_bad, mb_hex, bad_hex); free(good_hex); } @@ -839,7 +851,7 @@ static void check_good_are_ancestors_of_bad(const char *prefix, int no_checkout) int fd; if (!current_bad_oid) - die(a bad revision is needed); + die(a %s revision is needed, name_bad); /* Check if file BISECT_ANCESTORS_OK exists. */ if (!stat(filename, st) S_ISREG(st.st_mode)) @@ -905,6 +917,8 @@ int bisect_next_all(const char *prefix, int
[PATCH v3 6/6] bisect: allows any terms set by user
Introduction of the git bisect terms function. The user can set its own terms. It will work exactly like before. The terms must be set before the start. Signed-off-by: Antoine Delaite antoine.dela...@ensimag.grenoble-inp.fr Signed-off-by: Louis Stuber stub...@ensimag.grenoble-inp.fr --- Documentation/git-bisect.txt | 19 git-bisect.sh| 68 ++--- t/t6030-bisect-porcelain.sh | 43 ++ 3 files changed, 125 insertions(+), 5 deletions(-) diff --git a/Documentation/git-bisect.txt b/Documentation/git-bisect.txt index 3c3021a..ef0c03c 100644 --- a/Documentation/git-bisect.txt +++ b/Documentation/git-bisect.txt @@ -133,6 +133,25 @@ You must run `git bisect start` without commits as argument and run `git bisect new rev`/`git bisect old rev...` after to add the commits. +Alternative terms: use your own terms + + +If the builtins terms bad/good and new/old do not satisfy you, you can +set your own terms. + + +git bisect terms term1 term2 + + +This command has to be used before a bisection has started. +The term1 must be associated with the latest revisions and term2 with the +ancestors of term1. + +Only the first bisection following the 'git bisect terms' will use the terms. +If you mistyped one of the terms you can do again 'git bisect terms term1 +term2'. + + Bisect visualize diff --git a/git-bisect.sh b/git-bisect.sh index a11ca06..7da22b1 100644 --- a/git-bisect.sh +++ b/git-bisect.sh @@ -1,6 +1,6 @@ #!/bin/sh -USAGE='[help|start|bad|good|new|old|skip|next|reset|visualize|replay|log|run]' +USAGE='[help|start|bad|good|new|old|terms|skip|next|reset|visualize|replay|log|run]' LONG_USAGE='git bisect help print this long help message. git bisect start [--no-checkout] [bad [good...]] [--] [pathspec...] @@ -11,6 +11,8 @@ git bisect (bad|new) [rev] git bisect (good|old) [rev...] mark rev... known-good revisions/ revisions before change in a given property. +git bisect terms term1 term2 + set up term1 and term2 as bisection terms. git bisect skip [(rev|range)...] mark rev... untestable revisions. git bisect next @@ -82,6 +84,14 @@ bisect_start() { # revision_seen is true if a git bisect start # has revision as arguments revision_seen=0 + # terms_defined is used to detect if the user + # defined his own terms with git bisect terms + terms_defined=0 + if test -s $GIT_DIR/TERMS_DEFINED + then + terms_defined=1 + get_terms + fi if test z$(git rev-parse --is-bare-repository) != zfalse then mode=--no-checkout @@ -180,10 +190,14 @@ bisect_start() { } git rev-parse --sq-quote $@ $GIT_DIR/BISECT_NAMES eval $eval true - if test $revision_seen -eq 1 test ! -s $GIT_DIR/BISECT_TERMS + if test $revision_seen -eq 1 test ! -s $GIT_DIR/BISECT_TERMS || test $terms_defined -eq 1 then echo $NAME_BAD $GIT_DIR/BISECT_TERMS - echo $NAME_GOOD $GIT_DIR/BISECT_TERMS + echo $NAME_GOOD $GIT_DIR/BISECT_TERMS + if test $terms_defined -eq 1 + then + echo git bisect terms $NAME_BAD $NAME_GOOD $GIT_DIR/BISECT_LOG || exit + fi fi echo git bisect start$orig_args $GIT_DIR/BISECT_LOG || exit # @@ -419,6 +433,7 @@ bisect_clean_state() { rm -f $GIT_DIR/BISECT_NAMES rm -f $GIT_DIR/BISECT_RUN rm -f $GIT_DIR/BISECT_TERMS + rm -f $GIT_DIR/TERMS_DEFINED # Cleanup head-name if it got left by an old version of git-bisect rm -f $GIT_DIR/head-name git update-ref -d --no-deref BISECT_HEAD @@ -447,6 +462,8 @@ bisect_replay () { eval $cmd ;; $NAME_GOOD|$NAME_BAD|skip) bisect_write $command $rev ;; + terms) + bisect_terms $rev ;; *) die $(gettext ?? what are you talking about?) ;; esac @@ -531,7 +548,8 @@ get_terms () { check_and_set_terms () { cmd=$1 case $cmd in - bad|good|new|old) + skip|start|terms) ;; + *) if test -s $GIT_DIR/BISECT_TERMS test $cmd != $NAME_BAD test $cmd != $NAME_GOOD then die $(eval_gettext Invalid command: you're currently in a \$NAME_BAD/\$NAME_GOOD bisect.) @@ -564,6 +582,44 @@ bisect_voc () { esac } +bisect_terms () { + case $# in + 0) + if test -s $GIT_DIR/BISECT_TERMS + then + { + read term1 + read term2 +
[PATCH v3 5/6] revision: fix rev-list --bisect in old/new mode
From: Louis Stuber stub...@ensimag.grenoble-inp.fr Calling git rev-list --bisect when an old/new mode bisection was started shows the help notice. This has been fixed by reading BISECT_TERMS in revision.c to find the correct bisect refs path (which was always refs/bisect/bad (or good) before and can be refs/bisect/new (old) now). Signed-off-by: Louis Stuber stub...@ensimag.grenoble-inp.fr Signed-off-by: Antoine Delaite antoine.dela...@ensimag.grenoble-inp.fr --- revision.c | 16 ++-- 1 files changed, 14 insertions(+), 2 deletions(-) diff --git a/revision.c b/revision.c index 3ff8723..27750ac 100644 --- a/revision.c +++ b/revision.c @@ -21,6 +21,9 @@ volatile show_early_output_fn_t show_early_output; +static const char *name_bad; +static const char *name_good; + char *path_name(const struct name_path *path, const char *name) { const struct name_path *p; @@ -2076,14 +2079,22 @@ void parse_revision_opt(struct rev_info *revs, struct parse_opt_ctx_t *ctx, ctx-argc -= n; } +extern void read_bisect_terms(const char **bad, const char **good); + static int for_each_bad_bisect_ref(const char *submodule, each_ref_fn fn, void *cb_data) { - return for_each_ref_in_submodule(submodule, refs/bisect/bad, fn, cb_data); + char bisect_refs_path[256]; + strcpy(bisect_refs_path, refs/bisect/); + strcat(bisect_refs_path, name_bad); + return for_each_ref_in_submodule(submodule, bisect_refs_path, fn, cb_data); } static int for_each_good_bisect_ref(const char *submodule, each_ref_fn fn, void *cb_data) { - return for_each_ref_in_submodule(submodule, refs/bisect/good, fn, cb_data); + char bisect_refs_path[256]; + strcpy(bisect_refs_path, refs/bisect/); + strcat(bisect_refs_path, name_good); + return for_each_ref_in_submodule(submodule, bisect_refs_path, fn, cb_data); } static int handle_revision_pseudo_opt(const char *submodule, @@ -2112,6 +2123,7 @@ static int handle_revision_pseudo_opt(const char *submodule, handle_refs(submodule, revs, *flags, for_each_branch_ref_submodule); clear_ref_exclusion(revs-ref_excludes); } else if (!strcmp(arg, --bisect)) { + read_bisect_terms(name_bad, name_good); handle_refs(submodule, revs, *flags, for_each_bad_bisect_ref); handle_refs(submodule, revs, *flags ^ (UNINTERESTING | BOTTOM), for_each_good_bisect_ref); revs-bisect = 1; -- 1.7.1 -- To unsubscribe from this list: send the line unsubscribe git in
[PATCH v3 4/6] bisect: add the terms old/new
When not looking for a regression during a bisect but for a fix or a change in another given property, it can be confusing to use 'good' and 'bad'. This patch introduce `git bisect new` and `git bisect old` as an alternative to 'bad' and good': the commits which have a certain property must be marked as `new` and the ones which do not as `old`. The output will be the first commit after the change in the property. During a new/old bisect session you cannot use bad/good commands and vice-versa. Some commands are still not available for old/new: * git rev-list --bisect does not treat the revs/bisect/new and revs/bisect/old-SHA1 files. Old discussions: - http://thread.gmane.org/gmane.comp.version-control.git/86063 introduced bisect fix unfixed to find fix. - http://thread.gmane.org/gmane.comp.version-control.git/182398 discussion around bisect yes/no or old/new. - http://thread.gmane.org/gmane.comp.version-control.git/199758 last discussion and reviews New discussions: - http://thread.gmane.org/gmane.comp.version-control.git/271320 ( v2 1/7-4/7 ) - http://comments.gmane.org/gmane.comp.version-control.git/271343 ( v2 5/7-7/7 ) Signed-off-by: Antoine Delaite antoine.dela...@ensimag.grenoble-inp.fr Signed-off-by: Louis Stuber stub...@ensimag.grenoble-inp.fr Signed-off-by: Valentin Duperray valentin.duper...@ensimag.imag.fr Signed-off-by: Franck Jonas franck.jo...@ensimag.imag.fr Signed-off-by: Lucien Kong lucien.k...@ensimag.imag.fr Signed-off-by: Thomas Nguy thomas.n...@ensimag.imag.fr Signed-off-by: Huynh Khoi Nguyen Nguyen huynh-khoi-nguyen.ngu...@ensimag.imag.fr Signed-off-by: Matthieu Moy matthieu@grenoble-inp.fr --- Documentation/git-bisect.txt | 48 - bisect.c | 11 +++-- git-bisect.sh| 30 + t/t6030-bisect-porcelain.sh | 38 + 4 files changed, 112 insertions(+), 15 deletions(-) diff --git a/Documentation/git-bisect.txt b/Documentation/git-bisect.txt index 4cb52a7..3c3021a 100644 --- a/Documentation/git-bisect.txt +++ b/Documentation/git-bisect.txt @@ -18,8 +18,8 @@ on the subcommand: git bisect help git bisect start [--no-checkout] [bad [good...]] [--] [paths...] - git bisect bad [rev] - git bisect good [rev...] + git bisect (bad|new) [rev] + git bisect (good|old) [rev...] git bisect skip [(rev|range)...] git bisect reset [commit] git bisect visualize @@ -104,6 +104,35 @@ For example, `git bisect reset HEAD` will leave you on the current bisection commit and avoid switching commits at all, while `git bisect reset bisect/bad` will check out the first bad revision. + +Alternative terms: bisect new and bisect old + + +If you are not at ease with the terms bad and good, perhaps +because you are looking for the commit that introduced a fix, you can +alternatively use new and old instead. +But note that you cannot mix bad and good with new and old. + + +git bisect new [rev] + + +Marks the commit as new, e.g. the bug is no longer there, if you are looking +for a commit that fixed a bug, or the feature that used to work is now broken +at this point, if you are looking for a commit that introduced a bug. +It is the equivalent of git bisect bad [rev]. + + +git bisect old [rev...] + + +Marks the commit as old, as the opposite of 'git bisect new'. +It is the equivalent of git bisect good [rev...]. + +You must run `git bisect start` without commits as argument and run +`git bisect new rev`/`git bisect old rev...` after to add the +commits. + Bisect visualize @@ -379,6 +408,21 @@ In this case, when 'git bisect run' finishes, bisect/bad will refer to a commit has at least one parent whose reachable graph is fully traversable in the sense required by 'git pack objects'. +* Look for a fix instead of a regression in the code ++ + +$ git bisect start +$ git bisect new HEAD# current commit is marked as new +$ git bisect old HEAD~10 # the tenth commit from now is marked as old + ++ +Let's consider the last commit has a given property, and that we are looking +for the commit which introduced this property. For each commit the bisection +guide us to, we will test if the property is present. If it is we will mark +the commit as new with 'git bisect new', otherwise we will mark it as old. +At the end of the bisect session, the result will be the first new commit (e.g +the first one with the property). + SEE ALSO diff --git a/bisect.c b/bisect.c index 2fc8a78..7492fdc 100644 --- a/bisect.c +++ b/bisect.c @@ -746,6 +746,11 @@ static
Re: [PATCH v7 04/19] fsck: Offer a function to demote fsck errors to warnings
Hi Junio, On 2015-06-22 19:37, Junio C Hamano wrote: Johannes Schindelin johannes.schinde...@gmx.de writes: diff --git a/fsck.c b/fsck.c index 1a3f7ce..e81a342 100644 --- a/fsck.c +++ b/fsck.c @@ -64,30 +64,29 @@ enum fsck_msg_id { #undef MSG_ID #define STR(x) #x -#define MSG_ID(id, msg_type) { STR(id), FSCK_##msg_type }, +#define MSG_ID(id, msg_type) { STR(id), NULL, FSCK_##msg_type }, static struct { const char *id_string; +const char *lowercased; int msg_type; } msg_id_info[FSCK_MSG_MAX + 1] = { FOREACH_MSG_ID(MSG_ID) -{ NULL, -1 } +{ NULL, NULL, -1 } }; #undef MSG_ID static int parse_msg_id(const char *text) { -static char **lowercased; int i; -if (!lowercased) { +if (!msg_id_info[0].lowercased) { /* convert id_string to lower case, without underscores. */ -lowercased = xmalloc(FSCK_MSG_MAX * sizeof(*lowercased)); for (i = 0; i FSCK_MSG_MAX; i++) { const char *p = msg_id_info[i].id_string; int len = strlen(p); char *q = xmalloc(len); -lowercased[i] = q; +msg_id_info[i].lowercased = q; while (*p) if (*p == '_') p++; @@ -98,7 +97,7 @@ static int parse_msg_id(const char *text) } for (i = 0; i FSCK_MSG_MAX; i++) -if (!strcmp(text, lowercased[i])) +if (!strcmp(text, msg_id_info[i].lowercased)) return i; return -1; Heh, this was the first thing that came to my mind when I saw 03/19 that lazily prepares downcased version (which is good) but do so in a separately allocated buffer (which is improved by this change) ;-) IOW, I think all of the above should have been part of 03/19, not oops I belatedly realized that this way is better fixup here. Gh. Wrong commit fixed up. Sorry. Will be fixed in v8. +void fsck_set_msg_types(struct fsck_options *options, const char *values) +{ +char *buf = xstrdup(values), *to_free = buf; +int done = 0; + +while (!done) { +int len = strcspn(buf, ,|), equal; + +done = !buf[len]; +if (!len) { +buf++; +continue; +} +buf[len] = '\0'; + +for (equal = 0; equal len +buf[equal] != '=' buf[equal] != ':'; equal++) Style. I'd format this more like so: for (equal = 0; equal len buf[equal] != '=' buf[equal] != ':'; equal++) Will be fixed. +buf[equal] = tolower(buf[equal]); +buf[equal] = '\0'; + +if (equal == len) +die(Missing '=': '%s', buf); + +fsck_set_msg_type(options, buf, buf + equal + 1); +buf += len + 1; +} +free(to_free); +} Overall, the change is good (and it was good in v6, too), and I think it has become simpler to follow the logic with the upfront downcasing. Yep, I agree. I did not expect that, but it was worth the effort to compare the two versions. Ciao, Dscho -- To unsubscribe from this list: send the line unsubscribe git in
[PATCH v3 1/6] bisect: correction of typo
Signed-off-by: Antoine Delaite antoine.dela...@ensimag.grenoble-inp.fr --- bisect.c|2 +- t/t6030-bisect-porcelain.sh |2 +- 2 files changed, 2 insertions(+), 2 deletions(-) diff --git a/bisect.c b/bisect.c index 03d5cd9..5b8357d 100644 --- a/bisect.c +++ b/bisect.c @@ -743,7 +743,7 @@ static void handle_bad_merge_base(void) fprintf(stderr, Some good revs are not ancestor of the bad rev.\n git bisect cannot work properly in this case.\n - Maybe you mistake good and bad revs?\n); + Maybe you mistook good and bad revs?\n); exit(1); } diff --git a/t/t6030-bisect-porcelain.sh b/t/t6030-bisect-porcelain.sh index 06b4868..9e2c203 100755 --- a/t/t6030-bisect-porcelain.sh +++ b/t/t6030-bisect-porcelain.sh @@ -362,7 +362,7 @@ test_expect_success 'bisect starting with a detached HEAD' ' test_expect_success 'bisect errors out if bad and good are mistaken' ' git bisect reset test_must_fail git bisect start $HASH2 $HASH4 2 rev_list_error - grep mistake good and bad rev_list_error + grep mistook good and bad rev_list_error git bisect reset ' -- 1.7.1 -- To unsubscribe from this list: send the line unsubscribe git in
Re: [PATCH v4 04/19] for-each-ref: add '--points-at' option
Junio C Hamano gits...@pobox.com writes: I have a slight preference for keeping the pairs not squashed. This way, we have a clear separation write reusable library code / use it. But I'm fine with squashing if others prefer. As I cannot firmly say that copy paste first and then later clean-up is bad and we should split them in different way, I am fine with leaving them separate as they are. Having said that, I have a slight preference that a split does not break my build in the middle of the series by introducing an unused function, which is noticed by the compiler as a warning, and turned into an error with -Werror. -- To unsubscribe from this list: send the line unsubscribe git in
Re: [PATCH v4 05/19] ref-filter: add parse_opt_merge_filter()
On Mon, Jun 22, 2015 at 6:25 AM, Junio C Hamano gits...@pobox.com wrote: Why SHOUT here? Just used to typing macros in caps. Will change! This is copied from 'builtin/branch.c' which will eventually be removed when we port 'branch.c' to use ref-filter APIs. Hmph. I somehow thought Matthieu's instruction was to finish tag.c side first and then do branch (i.e. with 3 and 4 you brought things from tag to for-each-ref, now it is a time to rewrite tag by using what you wrote for for-each-ref with 3 and 4, before moving to this patch)? Was that plan scrapped or found inappropriate or something? I would call in advice rather than instruction. I still think we should prioritize the tag.c side, but if this patch is ready, it makes sense to keep it in the series. Yes, Matthieu did advice that. But I had already started working on this. But if you guys think thats a better option I can do that also, as I already have tag.c ported over to use ref-filter on my local branch. But that'll include a lot of changes. Also I found this more systematic as we will have a complete ref-filter library ready and only porting of tag.c and branch.c would be left. When does this trigger? You have lastarg-default with HEAD, and I am having trouble guessing when arg upon entry to this function can ever be NULL. This is redundant. will remove. Can --merged (or --no-merged) be given more than once? Is the rule the last one wins? Does the old value of rf-merge_commit leak (no, it does not, but I am just asking for completeness)? Yes! currently for e.g. `git for-each-ref --merged master --merged ref_filter` would mean that ref_filter would be considered for the --merge option. Does the old value of rf-merge_commit leak From my understanding of object.c/commit.c, It just points to an object in the obj_hash array, so when multiple options are given the pointer just points to another part of obj_hash, I'm sure there's more to it. This is what I gathered from an over the top view. -- Regards, Karthik Nayak -- To unsubscribe from this list: send the line unsubscribe git in
Re: [PATCH v4 04/19] for-each-ref: add '--points-at' option
Matthieu Moy matthieu@grenoble-inp.fr writes: Karthik Nayak karthik@gmail.com writes: On Mon, Jun 22, 2015 at 6:15 AM, Junio C Hamano gits...@pobox.com wrote: 3 4 as a single patch may make more sense, if we were to tolerate the let's copy paste first and then later remove the duplicate as a way to postpone touching tag.c side in order to first concentrate on for-each-ref. I have not formed a firm opinion on what the right split of the series is, but so far (assuming that the temporary duplication is the best we can do) what I am seeing in this series makes sense to me. Thanks. That would mean squashing 34, 67 and 1011 also on similar lines. I have a slight preference for keeping the pairs not squashed. This way, we have a clear separation write reusable library code / use it. But I'm fine with squashing if others prefer. As I cannot firmly say that copy paste first and then later clean-up is bad and we should split them in different way, I am fine with leaving them separate as they are. -- To unsubscribe from this list: send the line unsubscribe git in
Re: [PATCH v4 04/19] for-each-ref: add '--points-at' option
On Mon, Jun 22, 2015 at 6:15 AM, Junio C Hamano gits...@pobox.com wrote: 3 4 as a single patch may make more sense, if we were to tolerate the let's copy paste first and then later remove the duplicate as a way to postpone touching tag.c side in order to first concentrate on for-each-ref. I have not formed a firm opinion on what the right split of the series is, but so far (assuming that the temporary duplication is the best we can do) what I am seeing in this series makes sense to me. Thanks. That would mean squashing 34, 67 and 1011 also on similar lines. -- Regards, Karthik Nayak -- To unsubscribe from this list: send the line unsubscribe git in
Re: [PATCH v4 04/19] for-each-ref: add '--points-at' option
Karthik Nayak karthik@gmail.com writes: On Mon, Jun 22, 2015 at 6:15 AM, Junio C Hamano gits...@pobox.com wrote: 3 4 as a single patch may make more sense, if we were to tolerate the let's copy paste first and then later remove the duplicate as a way to postpone touching tag.c side in order to first concentrate on for-each-ref. I have not formed a firm opinion on what the right split of the series is, but so far (assuming that the temporary duplication is the best we can do) what I am seeing in this series makes sense to me. Thanks. That would mean squashing 34, 67 and 1011 also on similar lines. I have a slight preference for keeping the pairs not squashed. This way, we have a clear separation write reusable library code / use it. But I'm fine with squashing if others prefer. -- Matthieu Moy http://www-verimag.imag.fr/~moy/ -- To unsubscribe from this list: send the line unsubscribe git in
Re: [PATCH v7 00/19] Introduce an internal API to interact with the fsck machinery
Johannes Schindelin johannes.schinde...@gmx.de writes: Changes since v6: - camelCased message IDs - multiple author checking now as suggested by Junio - renamed `--quick` to `--connectivity-only`, better commit message - `fsck.skipList` is now handled correctly (and not mistaken for a message type setting) - `fsck.skipList` can handle user paths now - index-pack configures the walk function in a more logical place now - simplified code by avoiding working on partial strings (i.e. removed `substrcmp()`). This saves 10 lines. To accomodate parsing config variables directly, we now work on lowercased message IDs; unfortunately this means that we cannot use them in append_msg_id() because that function wants to append camelCased message IDs. Interdiff below diffstat. Except for minor nits I sent separate messages, this round looks very nicely done (I however admit that I haven't read the skiplist parsing code carefully at all, expecting that you wouldn't screw up with something simple like that ;-)) Thanks, will replace what is queued. Let's start thinking about moving it down to 'next' (meaning: we _could_ still accept a reroll, but I think we are in a good shape and minor incremental refinements would suffice), cooking it for the remainder of the cycle and having it graduate to 'master' at the beginning of the next cycle. Thanks. -- To unsubscribe from this list: send the line unsubscribe git in
Re: config commands not working _Noobe question
On Tue, Jun 23, 2015 at 1:31 AM, Greg Ledger gled...@glcdelivers.com wrote: after adding git config ‹global user.name Greg Ledger and git config ‹global user.email gled...@glcdelivers.com, when I run: source ~/.gitconfig The ~/.gitconfig file is not a shell script. You should not source it. It is a text file that is read by Git commands when those commands are run. -- To unsubscribe from this list: send the line unsubscribe git in
Re: RFC/Pull Request: Refs db backend
David Turner dtur...@twopensource.com writes: I've revived and modified Ronnie Sahlberg's work on the refs db backend. The work is on top of be3c13e5564, Junio's First batch for 2.5 cycle. I recognize that there have been changes to the refs code since then, and that there are some further changes in-flight from e.g. Michael Haggerty. If there is interest in this, I can rebase once Michael's changes land. ... The db backend runs git for-each-ref about 30% faster than the files backend with fully-packed refs on a repo with ~120k refs. It's also about 4x faster than using fully-unpacked refs. In addition, and perhaps more importantly, it avoids case-conflict issues on OS X. I chose to use LMDB for the database... ... Ronnie Sahlberg's original version of this patchset used tdb. The advantage of tdb is that it's smaller (~125k). The disadvantages are that tdb is hard to build on OS X. It's also not in homebrew. So lmdb seemed simpler. If there is interest? Shut up and take my money ;-) More seriously, that's great that you stepped up to resurrect this topic. In a sense, the choice of sample database backend does not matter. I do not care if it is tdb, lmdb, or even Berkeley DB as long as it functions. ;-) As long as the interface between ref-transaction system on the Git side and the database backend is designed right, your lmdb thing can serve as a reference implementation for other people to plug other database backends to the same interface, right? As one step to validate the interface to the database backends, it would be nice to eventually have at least two backends that talk to meaningfully different systems, but we have to start somewhere, and for now we have lmdb is as good a place to start as any other db backend. I wonder if we can do a filesystem backend on top of the same backend interface---is that too much impedance mismatch to make it unpractical? Thanks. -- To unsubscribe from this list: send the line unsubscribe git in
[PATCH v2 2/7] bisect: replace hardcoded bad|good by variables
Matthieu Moy matthieu@grenoble-inp.fr wrote: Being an acceptable ref name is a constraint you have to check (Junio already mentionned check-ref-format). I think quoting variables makes sense too I don't get how 'git check-ref-format' works exactly. It says it needs at least one slash in the ref name. So it would not be good for bisect terms. Then do we take the same format as branches ? ( i.e 'git check-ref-format --branch' ) Thanks. -- To unsubscribe from this list: send the line unsubscribe git in
Re: [PATCH v6 17/19] fsck: Introduce `git fsck --quick`
Hi Junio, On 2015-06-21 22:35, Junio C Hamano wrote: Johannes Schindelin johannes.schinde...@gmx.de writes: On 2015-06-21 19:15, Junio C Hamano wrote: Michael Haggerty mhag...@alum.mit.edu writes: That's brilliant. Just to make sure I am reading you correctly, you mean the current overall structure: [...] The way I read Michael's mail, he actually meant something different: if all of the blob-related errors/warnings are switched to ignore, simply skip unpacking the blobs. That is how I read his mail, too. But because IIRC we do not check anything special with blob other than we can read it correctly, my description of overall structure stayed at a very high conceptual level. The unpacking may happen at a much higher level in the code, i.e. it comes way before this part of the logic flow: if (is bad_blob ignored?) ; else if (! is the blob loadable and well-formed?) { in which case is bad blobs ignored? check may have to happen before we unpack the object. And I do not suggest introducing yet another BAD_BLOB error class; I would have guessed that you already have an error class for objects that are not stored correctly (be it truncated loose object, checksum mismatch in the packed base object, or corrupt delta in pack). Sadly, there is no BAD_BLOB class. The reason is that we actually perform no test on blobs, as you pointed out, except for the implicit one: read it as a blob object. And reading them even only partially would still imply a lot of I/O, taking away much of the performance improvement I wanted to achieve here. Further, please note that the `--quick` option *solely* impacts `git fsck`, not `git receive-pack`, because we actually really skipped everything except the connectivity test. To allow this discussion to be resolved without further ado, I therefore renamed the `--quick` option to `--connectivity-only`, as even I realize that there is not much of a check left if not even author or committer lines are tested. Ciao, Dscho -- To unsubscribe from this list: send the line unsubscribe git in
Re: broken repo after power cut
On Mon, Jun 22, 2015 at 01:19:59PM +0200, Richard Weinberger wrote: The bottome lins is that if you care about files being written, you need to use fsync(). Should git use fsync() by default? Well, if you are willing to accept that if your system crashes within a second or so of your last git operation, you might need to run git fsck and potentially recover from a busted repo, maybe speed is more important for you (and git is known for its speed/performance, after all. :-) I made a typo in the above. s/second/minute/. (Linux's writeback timer is 30 seconds, but if the disk is busy it might take a bit longer to get all of the data blocks written out to disk and committed.) I think core.fsyncObjectFiles documentation really needs an update. What about this one? diff --git a/Documentation/config.txt b/Documentation/config.txt index 43bb53c..b08fa11 100644 --- a/Documentation/config.txt +++ b/Documentation/config.txt @@ -693,10 +693,16 @@ core.whitespace:: core.fsyncObjectFiles:: This boolean will enable 'fsync()' when writing object files. + -This is a total waste of time and effort on a filesystem that orders -data writes properly, but can be useful for filesystems that do not use -journalling (traditional UNIX filesystems) or that only journal metadata -and not file contents (OS X's HFS+, or Linux ext3 with data=writeback). +For performance reasons git does not call 'fsync()' after writing object +files. This means that after a power cut your git repository can get +corrupted as not all data hit the storage media. Especially on modern +filesystems like ext4, xfs or btrfs this can happen very easily. +If you have to face power cuts and care about your data it is strongly +recommended to enable this setting. +Please note that git's behavior used to be safe on ext3 with data=ordered, +for any other filesystems or mount settings this is not the case as +POSIX clearly states that you have to call 'fsync()' to make sure that +all data is written. My main complaint about this is that it's a bit Linux-centric. For example, the fact that fsync(2) is needed to push data out of the cache is also true for MacOS (and indeed all other Unix systems going back three decades) as well as Windows. In fact, it's not a matter of POSIX says, but POSIX documented, but since standards are held in high esteem, it's sometimes a bit more convenient to use them as an appeal to authority. :-) (Ext3's data=ordered behaviour is an outlier, and in fact, the reason why it mostly safe to skip fsync(2) calls when using ext3 data=ordered was an accidental side effect of another problem which was trying to solve based on the relatively primitive way it handled block allocation.) Cheers, - Ted -- To unsubscribe from this list: send the line unsubscribe git in
[Question] Is it normal for accented characters to be shown as decomposed Unicode on GNU/Linux?
Hi everybody, I have a repository where some files and folders contain accented characters due to being in French. Such names include rêve (dream), réunion (meeting) etc. Whether already in version control or not, git tools only show their *decomposed* representation (I use a UTF-8 locale, see below), but don't accept those representations as input (and auto-completion is broken for those), which is a bit misleading (test case follows). I've seen the threads about accented characters on OSX and the use of 'core.precomposeunicode', but as I'm running on GNU/Linux I thought this shouldn't apply. Since I've already had a problem in git with a weirdly encoded character (see http://thread.gmane.org/gmane.comp.version-control.git/269710), I wanted to get some feedback to determine whether my setup was the cause of it or if it was normal to see decomposed file names in git. I found in man git-status: If a filename contains whitespace or other nonprintable characters, that field will be quoted in the manner of a C string literal: surrounded by ASCII double quote (34) characters, and with interior special characters backslash-escaped. So do everybody using accented characters see those in decomposed form in git? And if so why some softwares built on top of it (like gitit [1]) don't inherit those decomposed representations? [1] http://gitit.net/ Thanks! --- test case: $ mkdir accent-test cd !$ $ git init $ touch rêve réunion $ git status On branch master Initial commit Untracked files: (use git add file... to include in what will be committed) r\303\251union r\303\252ve $ git add . $ git commit -m accent test [master (root commit) 0d776b7] accent test 2 files changed, 0 insertions(+), 0 deletions(-) create mode 100644 r\303\251union create mode 100644 r\303\252ve $ git log --summary commit 0d776b7a09d5384a76066999431507018e292efe Author: Bastien Traverse bastien@traverse.email Date: 2015-06-22 14:13:46 +0200 accent test create mode 100644 r\303\251union create mode 100644 r\303\252ve $ mv rêve reve $ git status On branch master Changes not staged for commit: (use git add/rm file... to update what will be committed) (use git checkout -- file... to discard changes in working directory) deleted:r\303\252ve Untracked files: (use git add file... to include in what will be committed) reve no changes added to commit (use git add and/or git commit -a) $ git add [[TAB-TAB]] r\303\252ve reve $ git add [[TAB]] -- git add \r\\303\\252ve\ fatal: pathspec 'r\303\252ve' did not match any files $ git add r\303\252ve fatal: pathspec 'r\303\252ve' did not match any files $ git add rêve reve OR git add . $ git status On branch master Changes to be committed: (use git reset HEAD file... to unstage) renamed:r\303\252ve - reve I'm running an up-to-date Arch linux with following software versions and locale config: $ uname -a Linux xxx 4.0.5-1-ARCH #1 SMP PREEMPT Sat Jun 6 18:37:49 CEST 2015 x86_64 GNU/Linux $ bash --version GNU bash, version 4.3.39(1)-release (x86_64-unknown-linux-gnu) $ git --version git version 2.4.3 $ locale LANG=fr_FR.utf8 LC_CTYPE=fr_FR.utf8 LC_NUMERIC=fr_FR.utf8 LC_TIME=fr_FR.utf8 LC_COLLATE=fr_FR.utf8 LC_MONETARY=fr_FR.utf8 LC_MESSAGES=fr_FR.utf8 LC_PAPER=fr_FR.utf8 LC_NAME=fr_FR.utf8 LC_ADDRESS=fr_FR.utf8 LC_TELEPHONE=fr_FR.utf8 LC_MEASUREMENT=fr_FR.utf8 LC_IDENTIFICATION=fr_FR.utf8 LC_ALL= $ localectl System Locale: LANG=fr_FR.UTF8 VC Keymap: fr X11 Layout: fr X11 Variant: oss Cheers -- To unsubscribe from this list: send the line unsubscribe git in
[PATCH 2/3] contrib/subtree: Fix broken -chains and revealed test error
From: Charles Bailey cbaile...@bloomberg.net This fixes two instances where a -chain was broken in the subtree tests and fixes a test error that was revealed because of this. Many tests in t7900-subtree.sh make a commit and then use 'undo' to reset the state for the next test. In the 'check hash of split' test, an 'undo' was being invoked after a 'subtree split' even though the particular invocation of 'subtree split' did not actually make a commit. The subsequent check_equal was failing, but this failure was masked by that broken -chain. Removing this undo causes the failing check_equal to succeed but breaks the a check_equal later on in the same test. It turns out that an earlier test ('check if --message for merge works with squash too') makes a commit but doesn't 'undo' to the state expected by the remaining tests. None of the intervening tests cared enough about the state of the test repo to fail and the spurious 'undo' in 'check hash of split' restored the expected state for any remaining test that might care. Adding the missing 'undo' to 'check if --message for merge works with squash too' and removing the spurious one from 'check hash of split' fixes all tests once the -chains are completed. Signed-off-by: Charles Bailey cbaile...@bloomberg.net --- contrib/subtree/t/t7900-subtree.sh | 8 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/contrib/subtree/t/t7900-subtree.sh b/contrib/subtree/t/t7900-subtree.sh index 2c5bfc1..001c604 100755 --- a/contrib/subtree/t/t7900-subtree.sh +++ b/contrib/subtree/t/t7900-subtree.sh @@ -177,7 +177,8 @@ test_expect_success 'check if --message for merge works with squash too' ' test_expect_success 'merge new subproj history into subdir' ' git subtree merge --prefix=subdir FETCH_HEAD git branch pre-split - check_equal ''$(last_commit_message)'' Merge commit '''$(git rev-parse sub2)''' into mainline + check_equal ''$(last_commit_message)'' Merge commit '''$(git rev-parse sub2)''' into mainline + undo ' test_expect_success 'Check that prefix argument is required for split' ' @@ -218,9 +219,8 @@ test_expect_success 'check split with --branch' ' test_expect_success 'check hash of split' ' spl1=$(git subtree split --prefix subdir) - undo git subtree split --prefix subdir --branch splitbr1test - check_equal ''$(git rev-parse splitbr1test)'' $spl1 + check_equal ''$(git rev-parse splitbr1test)'' $spl1 git checkout splitbr1test new_hash=$(git rev-parse HEAD~2) git checkout mainline @@ -269,7 +269,7 @@ test_expect_success 'add sub9' ' cd .. test_expect_success 'split for sub8' ' - split2=''$(git subtree split --annotate=''*'' --prefix subdir/ --rejoin)'' + split2=''$(git subtree split --annotate=''*'' --prefix subdir/ --rejoin)'' git branch split2 $split2 ' -- 2.4.0.53.g8440f74 -- To unsubscribe from this list: send the line unsubscribe git in