Re: [PATCH v2 3/4] t4203: test mailmap functionality directly rather than indirectly
On Fri, Jul 12, 2013 at 1:48 AM, Junio C Hamano gits...@pobox.com wrote: Jonathan Nieder jrnie...@gmail.com writes: My current thinking is no --- the patch has as a justification Now we can test these aspects of .mailmap handling directly with a low-level tool instead of using the tool most people will use, so do so, which sounds an awful lot like Reduce test coverage of commonly used tools, because we can. Yes, that was exactly my reaction that prompted my response. Does any of my follow-up commentary result in a different reaction? If not, I'll drop patches 3 4 in the re-roll. -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 0/7] cat-file --batch-check performance improvements
In my earlier series introducing git cat-file --batch-check=format, found here: http://thread.gmane.org/gmane.comp.version-control.git/229761/focus=230041 I spent a little time optimizing revindex generation, and measured by requesting information on a single object from a large repository. This series takes the next logical step: requesting a large number of objects from a large repository. There are two major optimizations here: 1. Avoiding extra ref lookups due to the warning in 798c35f (get_sha1: warn about full or short object names that look like refs, 2013-05-29). 2. Avoiding extra work for delta type resolution when the user has not asked for %(objecttype). I prepared the series on top of jk/in-pack-size-measurement, and certainly optimization 2 is pointless without it (before that topic, --batch-check always printed the type). However, the first optimization affects regular --batch-check, and represents a much more serious performance regression. It looks like 798c35f is in master, but hasn't been released yet, so assuming these topics graduate before the next release, it should be OK. But if not, we should consider pulling the first patch out and applying it (or something like it) separately. The results for running (in linux.git): $ git rev-list --objects --all objects $ git cat-file --batch-check='%(objectsize:disk)' objects /dev/null are: before after real 1m17.143s 0m7.205s user 0m27.684s 0m6.580s sys 0m49.320s 0m0.608s Now, _most_ of that speedup is coming from the first patch, and it's quite trivial. The rest of the patches involve a lot of refactoring, and only manage to eke out one more second of performance, so it may not be worth it (though I think the result actually cleans up the sha1_object_info_extended interface a bit, and is worth it). Individual timings are in the commit messages. The patches are: [1/7]: cat-file: disable object/refname ambiguity check for batch mode Optimization 1. [2/7]: sha1_object_info_extended: rename status to type [3/7]: sha1_loose_object_info: make type lookup optional [4/7]: packed_object_info: hoist delta type resolution to helper [5/7]: packed_object_info: make type lookup optional [6/7]: sha1_object_info_extended: make type calculation optional Optimization 2. [7/7]: sha1_object_info_extended: pass object_info to helpers Optional cleanup. -Peff -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 4/4] check-attr -z: a single -z should apply to both input and output
Unless a command has separate --nul-terminated-{input,output} options, the --nul-terminated-records (-z) option should apply to both input and output for consistency. The caller knows that its input paths may need to be protected for LF, and the program shows these problematic paths to its output. Signed-off-by: Junio C Hamano gits...@pobox.com --- Documentation/git-check-attr.txt | 9 +++-- builtin/check-attr.c | 14 +++--- 2 files changed, 18 insertions(+), 5 deletions(-) diff --git a/Documentation/git-check-attr.txt b/Documentation/git-check-attr.txt index 5abdbaa..760aca9 100644 --- a/Documentation/git-check-attr.txt +++ b/Documentation/git-check-attr.txt @@ -31,8 +31,9 @@ OPTIONS Read file names from stdin instead of from the command-line. -z:: - Only meaningful with `--stdin`; paths are separated with a - NUL character instead of a linefeed character. + The output format is modified to be machine-parseable. + If `--stdin` is also given, input paths are separated + with a NUL character instead of a linefeed character. \--:: Interpret all preceding arguments as attributes and all following @@ -48,6 +49,10 @@ OUTPUT The output is of the form: path COLON SP attribute COLON SP info LF +unless `-z` is in effect, in which case NUL is used as delimiter: +path NUL attribute NUL info NUL + + path is the path of a file being queried, attribute is an attribute being queried and info can be either: diff --git a/builtin/check-attr.c b/builtin/check-attr.c index 7cc9b5d..cd46690 100644 --- a/builtin/check-attr.c +++ b/builtin/check-attr.c @@ -20,7 +20,7 @@ static const struct option check_attr_options[] = { OPT_BOOLEAN(0, cached, cached_attrs, N_(use .gitattributes only from the index)), OPT_BOOLEAN(0 , stdin, stdin_paths, N_(read file names from stdin)), OPT_BOOLEAN('z', NULL, nul_term_line, - N_(input paths are terminated by a NUL character)), + N_(terminate input and output records by a NUL character)), OPT_END() }; @@ -38,8 +38,16 @@ static void output_attr(int cnt, struct git_attr_check *check, else if (ATTR_UNSET(value)) value = unspecified; - quote_c_style(file, NULL, stdout, 0); - printf(: %s: %s\n, git_attr_name(check[j].attr), value); + if (nul_term_line) { + printf(%s%c /* path */ + %s%c /* attrname */ + %s%c /* attrvalue */, + file, 0, git_attr_name(check[j].attr), 0, value, 0); + } else { + quote_c_style(file, NULL, stdout, 0); + printf(: %s: %s\n, git_attr_name(check[j].attr), value); + } + } } -- 1.8.3.2-911-g2c4daa5 -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 2/4] check-attr: the name of the character is NUL, not NULL
Signed-off-by: Junio C Hamano gits...@pobox.com --- builtin/check-attr.c | 8 1 file changed, 4 insertions(+), 4 deletions(-) diff --git a/builtin/check-attr.c b/builtin/check-attr.c index 075d01d..7cc9b5d 100644 --- a/builtin/check-attr.c +++ b/builtin/check-attr.c @@ -13,14 +13,14 @@ N_(git check-attr --stdin [-z] [-a | --all | attr...] list-of-paths), NULL }; -static int null_term_line; +static int nul_term_line; static const struct option check_attr_options[] = { OPT_BOOLEAN('a', all, all_attrs, N_(report all attributes set on file)), OPT_BOOLEAN(0, cached, cached_attrs, N_(use .gitattributes only from the index)), OPT_BOOLEAN(0 , stdin, stdin_paths, N_(read file names from stdin)), - OPT_BOOLEAN('z', NULL, null_term_line, - N_(input paths are terminated by a null character)), + OPT_BOOLEAN('z', NULL, nul_term_line, + N_(input paths are terminated by a NUL character)), OPT_END() }; @@ -65,7 +65,7 @@ static void check_attr_stdin_paths(const char *prefix, int cnt, struct git_attr_check *check) { struct strbuf buf, nbuf; - int line_termination = null_term_line ? 0 : '\n'; + int line_termination = nul_term_line ? 0 : '\n'; strbuf_init(buf, 0); strbuf_init(nbuf, 0); -- 1.8.3.2-911-g2c4daa5 -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 0/4] Make check-{attr,ignore} -z consistent
A command that has to deal with input/output that may contain LF needs to offer the -z (--nul-terminated-records) option, and if it does not support separate --nul-terminated-{input,output} options, the -z option should govern both input and output. A caller that uses -z knows that the paths it feeds to these commands as input may have LF that cannot be expressed in LF delimited input format, and the output from these commands do contain the same paths, so there is no way for their output to be expressed unambiguously for an input that requires -z. Unfortunately, git check-attr -z was broken and ignored the option on the output side. This is a backward-incompatible fix, so we may need to add a checkAttr.brokenZ configuration to allow people to keep the existing breakage on top of these fixes, and then flip the default at Git 2.0 boundary (sometime early next year). Credit goes to Eric Sunshine for finding this discrepancy ($gmane/230158). Junio C Hamano (4): check-ignore: the name of the character is NUL, not NULL check-attr: the name of the character is NUL, not NULL check-ignore -z: a single -z should apply to both input and output check-attr -z: a single -z should apply to both input and output Documentation/git-check-attr.txt | 9 +++-- builtin/check-attr.c | 20 ++-- builtin/check-ignore.c | 12 ++-- 3 files changed, 27 insertions(+), 14 deletions(-) -- 1.8.3.2-911-g2c4daa5 -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 1/4] check-ignore: the name of the character is NUL, not NULL
Signed-off-by: Junio C Hamano gits...@pobox.com --- builtin/check-ignore.c | 12 ++-- 1 file changed, 6 insertions(+), 6 deletions(-) diff --git a/builtin/check-ignore.c b/builtin/check-ignore.c index 0240f99..be22bce 100644 --- a/builtin/check-ignore.c +++ b/builtin/check-ignore.c @@ -12,7 +12,7 @@ static const char * const check_ignore_usage[] = { NULL }; -static int null_term_line; +static int nul_term_line; static const struct option check_ignore_options[] = { OPT__QUIET(quiet, N_(suppress progress reporting)), @@ -20,8 +20,8 @@ static const struct option check_ignore_options[] = { OPT_GROUP(), OPT_BOOLEAN(0, stdin, stdin_paths, N_(read file names from stdin)), - OPT_BOOLEAN('z', NULL, null_term_line, - N_(input paths are terminated by a null character)), + OPT_BOOLEAN('z', NULL, nul_term_line, + N_(input paths are terminated by a NUL character)), OPT_END() }; @@ -29,7 +29,7 @@ static void output_exclude(const char *path, struct exclude *exclude) { char *bang = exclude-flags EXC_FLAG_NEGATIVE ? ! : ; char *slash = exclude-flags EXC_FLAG_MUSTBEDIR ? / : ; - if (!null_term_line) { + if (!nul_term_line) { if (!verbose) { write_name_quoted(path, stdout, '\n'); } else { @@ -111,7 +111,7 @@ static int check_ignore_stdin_paths(const char *prefix) struct strbuf buf, nbuf; char **pathspec = NULL; size_t nr = 0, alloc = 0; - int line_termination = null_term_line ? 0 : '\n'; + int line_termination = nul_term_line ? 0 : '\n'; int num_ignored; strbuf_init(buf, 0); @@ -150,7 +150,7 @@ int cmd_check_ignore(int argc, const char **argv, const char *prefix) if (argc 0) die(_(cannot specify pathnames with --stdin)); } else { - if (null_term_line) + if (nul_term_line) die(_(-z only makes sense with --stdin)); if (argc == 0) die(_(no path specified)); -- 1.8.3.2-911-g2c4daa5 -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 3/4] check-ignore -z: a single -z should apply to both input and output
Unless a command has separate --nul-terminated-{input,output} options, the --nul-terminated-records (-z) option should apply to both input and output for consistency. The caller knows that its input paths may need to be protected for LF, and the program shows these problematic paths to its output. The code already did the right thing. Only the help text needs fixing. Signed-off-by: Junio C Hamano gits...@pobox.com --- builtin/check-ignore.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/builtin/check-ignore.c b/builtin/check-ignore.c index be22bce..03e509e 100644 --- a/builtin/check-ignore.c +++ b/builtin/check-ignore.c @@ -21,7 +21,7 @@ static const struct option check_ignore_options[] = { OPT_BOOLEAN(0, stdin, stdin_paths, N_(read file names from stdin)), OPT_BOOLEAN('z', NULL, nul_term_line, - N_(input paths are terminated by a NUL character)), + N_(terminate input and output records by a NUL character)), OPT_END() }; -- 1.8.3.2-911-g2c4daa5 -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 1/7] cat-file: disable object/refname ambiguity check for batch mode
A common use of cat-file --batch-check is to feed a list of objects from rev-list --objects or a similar command. In this instance, all of our input objects are 40-byte sha1 ids. However, cat-file has always allowed arbitrary revision specifiers, and feeds the result to get_sha1(). Fortunately, get_sha1() recognizes a 40-byte sha1 before doing any hard work trying to look up refs, meaning this scenario should end up spending very little time converting the input into an object sha1. However, since 798c35f (get_sha1: warn about full or short object names that look like refs, 2013-05-29), when we encounter this case, we spend the extra effort to do a refname lookup anyway, just to print a warning. This is further exacerbated by ca91993 (get_packed_ref_cache: reload packed-refs file when it changes, 2013-06-20), which makes individual ref lookup more expensive by requiring a stat() of the packed-refs file for each missing ref. With no patches, this is the time it takes to run: $ git rev-list --objects --all objects $ time git cat-file --batch-check='%(objectname)' objects on the linux.git repository: real1m13.494s user0m25.924s sys 0m47.532s If we revert ca91993, the packed-refs up-to-date check, it gets a little better: real0m54.697s user0m21.692s sys 0m32.916s but we are still spending quite a bit of time on ref lookup (and we would not want to revert that patch, anyway, which has correctness issues). If we revert 798c35f, disabling the warning entirely, we get a much more reasonable time: real0m7.452s user0m6.836s sys 0m0.608s This patch does the moral equivalent of this final case (and gets similar speedups). We introduce a global flag that callers of get_sha1() can use to avoid paying the price for the warning. Signed-off-by: Jeff King p...@peff.net --- The solution feels a little hacky, but I'm not sure there is a better one short of reverting the warning entirely. We could also tie it to warn_ambiguous_refs (or add another config variable), but I don't think that makes sense. It is not about do I care about ambiguities in this repository, but rather I am going to do a really large number of sha1 resolutions, and I do not want to pay the price in this particular code path for the warning). There may be other sites that resolve a large number of refs and run into this, but I couldn't think of any. Doing for_each_ref would not have the same problem, as we already know they are refs there. builtin/cat-file.c | 9 + cache.h| 1 + environment.c | 1 + sha1_name.c| 14 -- 4 files changed, 19 insertions(+), 6 deletions(-) diff --git a/builtin/cat-file.c b/builtin/cat-file.c index 0e64b41..fe5c77f 100644 --- a/builtin/cat-file.c +++ b/builtin/cat-file.c @@ -266,6 +266,15 @@ static int batch_objects(struct batch_options *opt) strbuf_expand(buf, opt-format, expand_format, data); data.mark_query = 0; + /* +* We are going to call get_sha1 on a potentially very large number of +* objects. In most large cases, these will be actual object sha1s. The +* cost to double-check that each one is not also a ref (just so we can +* warn) ends up dwarfing the actual cost of the object lookups +* themselves. We can work around it by just turning off the warning. +*/ + warn_on_object_refname_ambiguity = 0; + while (strbuf_getline(buf, stdin, '\n') != EOF) { char *p; int error; diff --git a/cache.h b/cache.h index 2d06169..c1fd82c 100644 --- a/cache.h +++ b/cache.h @@ -575,6 +575,7 @@ extern int warn_ambiguous_refs; extern int prefer_symlink_refs; extern int log_all_ref_updates; extern int warn_ambiguous_refs; +extern int warn_on_object_refname_ambiguity; extern int shared_repository; extern const char *apply_default_whitespace; extern const char *apply_default_ignorewhitespace; diff --git a/environment.c b/environment.c index 0cb67b2..5398c36 100644 --- a/environment.c +++ b/environment.c @@ -22,6 +22,7 @@ int warn_ambiguous_refs = 1; int is_bare_repository_cfg = -1; /* unspecified */ int log_all_ref_updates = -1; /* unspecified */ int warn_ambiguous_refs = 1; +int warn_on_object_refname_ambiguity = 1; int repository_format_version; const char *git_commit_encoding; const char *git_log_output_encoding; diff --git a/sha1_name.c b/sha1_name.c index 90419ef..6f8812a 100644 --- a/sha1_name.c +++ b/sha1_name.c @@ -452,13 +452,15 @@ static int get_sha1_basic(const char *str, int len, unsigned char *sha1) int at, reflog_len, nth_prior = 0; if (len == 40 !get_sha1_hex(str, sha1)) { - refs_found = dwim_ref(str, len, tmp_sha1, real_ref); - if (refs_found 0 warn_ambiguous_refs) { - warning(warn_msg, len, str); - if (advice_object_name_warning) - fprintf(stderr, %s\n,
[PATCH 2/7] sha1_object_info_extended: rename status to type
The value we get from each low-level object_info function (e.g., loose, packed) is actually the object type (or -1 for error). Let's explicitly call it type, which will make further refactorings easier to read. Signed-off-by: Jeff King p...@peff.net --- sha1_file.c | 20 ++-- 1 file changed, 10 insertions(+), 10 deletions(-) diff --git a/sha1_file.c b/sha1_file.c index 4c2365f..e826066 100644 --- a/sha1_file.c +++ b/sha1_file.c @@ -2394,7 +2394,7 @@ int sha1_object_info_extended(const unsigned char *sha1, struct object_info *oi) { struct cached_object *co; struct pack_entry e; - int status, rtype; + int type, rtype; co = find_cached_object(sha1); if (co) { @@ -2408,23 +2408,23 @@ int sha1_object_info_extended(const unsigned char *sha1, struct object_info *oi) if (!find_pack_entry(sha1, e)) { /* Most likely it's a loose object. */ - status = sha1_loose_object_info(sha1, oi-sizep, oi-disk_sizep); - if (status = 0) { + type = sha1_loose_object_info(sha1, oi-sizep, oi-disk_sizep); + if (type = 0) { oi-whence = OI_LOOSE; - return status; + return type; } /* Not a loose object; someone else may have just packed it. */ reprepare_packed_git(); if (!find_pack_entry(sha1, e)) - return status; + return type; } - status = packed_object_info(e.p, e.offset, oi-sizep, rtype, - oi-disk_sizep); - if (status 0) { + type = packed_object_info(e.p, e.offset, oi-sizep, rtype, + oi-disk_sizep); + if (type 0) { mark_bad_packed_object(e.p, sha1); - status = sha1_object_info_extended(sha1, oi); + type = sha1_object_info_extended(sha1, oi); } else if (in_delta_base_cache(e.p, e.offset)) { oi-whence = OI_DBCACHED; } else { @@ -2435,7 +2435,7 @@ int sha1_object_info_extended(const unsigned char *sha1, struct object_info *oi) rtype == OBJ_OFS_DELTA); } - return status; + return type; } int sha1_object_info(const unsigned char *sha1, unsigned long *sizep) -- 1.8.3.rc3.24.gec82cb9 -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2 1/4] builtin: add git-check-mailmap command
On Fri, Jul 12, 2013 at 1:47 AM, Junio C Hamano gits...@pobox.com wrote: Eric Sunshine sunsh...@sunshineco.com writes: For each contact information (either in the form of ``Name user@host'' or ...) in order to clarify that the two forms of input is what you call contact information. Is this easier to read? For each ``Name $$user@host$$'' or ``$$user@host$$'' from the command-line or standard input (when using `--stdin`), print a line showing either the canonical name and email address (see Mapping Authors below), or the input ``Name $$user@host$$'' or ``$$user@host$$'' if there is no mapping for that person. I find it easier than your original, but I do not know if you would want to repeat the Name... or user@host at the end. It does not seem to add much useful information and is distracting. Next attempt: For each ``Name $$user@host$$'' or ``$$user@host$$'' from the command-line or standard input (when using `--stdin`) look up the person's canonical name and email address (see Mapping Authors below). If found, print them; otherwise print the input as-is. In check-attr, null_term_line indicates that _input_ lines are null-terminated. In check-ignore, null_term_lines is overloaded (and perhaps abused) to mean that both _input_ and _output_ lines are null-terminated. That is unfortunate but it is good that you found the breakage. As we do not have --nul-terminated-input and --nul-terminated-output options separtely, -z should apply to both input and output. What b4666852 (check-attr: Add --stdin option, 2008-10-07) did is broken. I can make git-check-mailmap behave this way, however, other than git-check-ignore (which is quite new), there doesn't seem to be any precedence (that I can find) anywhere else in git which ties input and output null-termination to a single switch. Is it desirable to do so or should the user have more fine-grained control? (xargs -0 comes to mind when thinking of a null-termination input switch.) Also git check-ignore -h advertises -z as only affecting --stdin, which is also wrong. It does affect both input and output as it should, so it should be described as such, I think. I also noticed this. (It was copied from check-attr.c). -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2 3/4] t4203: test mailmap functionality directly rather than indirectly
Eric Sunshine sunsh...@sunshineco.com writes: On Fri, Jul 12, 2013 at 1:48 AM, Junio C Hamano gits...@pobox.com wrote: Jonathan Nieder jrnie...@gmail.com writes: My current thinking is no --- the patch has as a justification Now we can test these aspects of .mailmap handling directly with a low-level tool instead of using the tool most people will use, so do so, which sounds an awful lot like Reduce test coverage of commonly used tools, because we can. Yes, that was exactly my reaction that prompted my response. Does any of my follow-up commentary result in a different reaction? Not really. While I _do_ think direct testing has merits, I think that should be done by adding direct tests, not by removing the tests that are meant to protect higher level _users_ of the underlying mechanism from breakages. After all, their breakages may not come from new breakages of the lower level mechanism (i.e. the mailmap.c code) but the way these higher level code makes calls into the mechanism, and direct test of the lower level mechanism will not protect them from the latter kind of breakages. -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 3/7] sha1_loose_object_info: make type lookup optional
Until recently, the only items to request from sha1_object_info_extended were type and size. This meant that we always had to open a loose object file to determine one or the other. But with the addition of the disk_size query, it's possible that we can fulfill the query without even opening the object file at all. However, since the function interface always returns the type, we have no way of knowing whether the caller cares about it or not. This patch only modified sha1_loose_object_info to make type lookup optional using an out-parameter, similar to the way the size is handled (and the return value is 0 or -1 for success or error, respectively). There should be no functional change yet, though, as sha1_object_info_extended, the only caller, will always ask for a type. Signed-off-by: Jeff King p...@peff.net --- Obviously the end goal is to have sha1_object_info_extended do this optionally, too (which happens in patch 6). I'm not too happy about the stat_sha1_file function, which is almost identical to open_sha1_file (and which in turn is almost the same thing as has_loose_object). They all do: try operation X on sha1_file_name(sha1) prepare_alt_odb(); foreach alt_odb try operation X on alt_odb/sha1_file_name(sha1) Unfortunately it's hard to do this kind of factoring out in C, because the argument and return types for operation X are different in these cases; you are stuck with providing callback function that takes a void pointer to some operation-specific data. The boilerplate ends up worse than the repeated code. Another solution would be to have a find the file for loose object Y function, and then just do operation X on that. But since X is a filesystem operation in each case, you do not want to lose the atomicity of performing the operation directly (not to mention incurring the cost of an extra stat() on each file). So I am open to clever refactoring suggestions. sha1_file.c | 48 +++- 1 file changed, 43 insertions(+), 5 deletions(-) diff --git a/sha1_file.c b/sha1_file.c index e826066..39e7313 100644 --- a/sha1_file.c +++ b/sha1_file.c @@ -1306,6 +1306,26 @@ static int git_open_noatime(const char *name) } } +static int stat_sha1_file(const unsigned char *sha1, struct stat *st) +{ + char *name = sha1_file_name(sha1); + struct alternate_object_database *alt; + + if (!lstat(name, st)) + return 0; + + prepare_alt_odb(); + errno = ENOENT; + for (alt = alt_odb_list; alt; alt = alt-next) { + name = alt-name; + fill_sha1_path(name, sha1); + if (!lstat(alt-base, st)) + return 0; + } + + return -1; +} + static int open_sha1_file(const unsigned char *sha1) { int fd; @@ -2363,7 +2383,9 @@ struct packed_git *find_sha1_pack(const unsigned char *sha1, } -static int sha1_loose_object_info(const unsigned char *sha1, unsigned long *sizep, +static int sha1_loose_object_info(const unsigned char *sha1, + enum object_type *typep, + unsigned long *sizep, unsigned long *disk_sizep) { int status; @@ -2372,6 +2394,20 @@ static int sha1_loose_object_info(const unsigned char *sha1, unsigned long *size git_zstream stream; char hdr[32]; + /* +* If we don't care about type or size, then we don't +* need to look inside the object at all. +*/ + if (!typep !sizep) { + if (disk_sizep) { + struct stat st; + if (stat_sha1_file(sha1, st) 0) + return -1; + *disk_sizep = st.st_size; + } + return 0; + } + map = map_sha1_file(sha1, mapsize); if (!map) return -1; @@ -2386,7 +2422,9 @@ static int sha1_loose_object_info(const unsigned char *sha1, unsigned long *size *sizep = size; git_inflate_end(stream); munmap(map, mapsize); - return status; + if (typep) + *typep = status; + return 0; } /* returns enum object_type or negative */ @@ -2408,8 +2446,8 @@ int sha1_object_info_extended(const unsigned char *sha1, struct object_info *oi) if (!find_pack_entry(sha1, e)) { /* Most likely it's a loose object. */ - type = sha1_loose_object_info(sha1, oi-sizep, oi-disk_sizep); - if (type = 0) { + if (!sha1_loose_object_info(sha1, type, + oi-sizep, oi-disk_sizep)) { oi-whence = OI_LOOSE; return type; } @@ -2417,7 +2455,7 @@ int sha1_object_info_extended(const unsigned char *sha1, struct object_info *oi) /* Not a loose object; someone else may have
[PATCH 5/7] packed_object_info: make type lookup optional
Currently, packed_object_info can save some work by not calculating the size or disk_size of the object if the caller is not interested. However, it always calculates the true object type, whether the caller cares or not, and only optionally returns the easy-to-get representation type. Let's swap these types. The function will now return the representation type (or OBJ_BAD on failure), and will only optionally fill in the true type. There should be no behavior change yet, as the only caller, sha1_object_info_extended, will always feed it a type pointer. Signed-off-by: Jeff King p...@peff.net --- sha1_file.c | 25 - 1 file changed, 16 insertions(+), 9 deletions(-) diff --git a/sha1_file.c b/sha1_file.c index 6f4aff3..2a1e230 100644 --- a/sha1_file.c +++ b/sha1_file.c @@ -1783,7 +1783,7 @@ static int packed_object_info(struct packed_git *p, off_t obj_offset, } static int packed_object_info(struct packed_git *p, off_t obj_offset, - unsigned long *sizep, int *rtype, + enum object_type *typep, unsigned long *sizep, unsigned long *disk_sizep) { struct pack_window *w_curs = NULL; @@ -1791,11 +1791,12 @@ static int packed_object_info(struct packed_git *p, off_t obj_offset, off_t curpos = obj_offset; enum object_type type; + /* +* We always get the representation type, but only convert it to +* a real type later if the caller is interested. +*/ type = unpack_object_header(p, w_curs, curpos, size); - if (rtype) - *rtype = type; /* representation type */ - if (sizep) { if (type == OBJ_OFS_DELTA || type == OBJ_REF_DELTA) { off_t tmp_pos = curpos; @@ -1820,7 +1821,13 @@ static int packed_object_info(struct packed_git *p, off_t obj_offset, *disk_sizep = revidx[1].offset - obj_offset; } - type = packed_to_object_type(p, obj_offset, type, w_curs, curpos); + if (typep) { + *typep = packed_to_object_type(p, obj_offset, type, w_curs, curpos); + if (*typep 0) { + type = OBJ_BAD; + goto out; + } + } out: unuse_pack(w_curs); @@ -2471,11 +2478,11 @@ int sha1_object_info_extended(const unsigned char *sha1, struct object_info *oi) return -1; } - type = packed_object_info(e.p, e.offset, oi-sizep, rtype, - oi-disk_sizep); - if (type 0) { + rtype = packed_object_info(e.p, e.offset, type, oi-sizep, + oi-disk_sizep); + if (rtype 0) { mark_bad_packed_object(e.p, sha1); - type = sha1_object_info_extended(sha1, oi); + return sha1_object_info_extended(sha1, oi); } else if (in_delta_base_cache(e.p, e.offset)) { oi-whence = OI_DBCACHED; } else { -- 1.8.3.rc3.24.gec82cb9 -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 4/7] packed_object_info: hoist delta type resolution to helper
To calculate the type of a packed object, we must walk down its delta chain until we hit a true base object with a real type. Most of the code in packed_object_info is for handling this case. Let's hoist it out into a separate helper function, which will make it easier to make the type-lookup optional in the future (and keep our indentation level sane). Signed-off-by: Jeff King p...@peff.net --- Annoyingly, since the part we are moving comprises the bulk of the function, the diff tends to show the opposite of what actually happened: it looks like we renamed the function to its helper name, and moved the other parts to a new function, which happens to have the same name as the old one. So read the diff with care. :) sha1_file.c | 93 +++-- 1 file changed, 53 insertions(+), 40 deletions(-) diff --git a/sha1_file.c b/sha1_file.c index 39e7313..6f4aff3 100644 --- a/sha1_file.c +++ b/sha1_file.c @@ -1713,52 +1713,21 @@ static int packed_object_info(struct packed_git *p, off_t obj_offset, return type; } - #define POI_STACK_PREALLOC 64 -static int packed_object_info(struct packed_git *p, off_t obj_offset, - unsigned long *sizep, int *rtype, - unsigned long *disk_sizep) +static enum object_type packed_to_object_type(struct packed_git *p, + off_t obj_offset, + enum object_type type, + struct pack_window **w_curs, + off_t curpos) { - struct pack_window *w_curs = NULL; - unsigned long size; - off_t curpos = obj_offset; - enum object_type type; off_t small_poi_stack[POI_STACK_PREALLOC]; off_t *poi_stack = small_poi_stack; int poi_stack_nr = 0, poi_stack_alloc = POI_STACK_PREALLOC; - type = unpack_object_header(p, w_curs, curpos, size); - - if (rtype) - *rtype = type; /* representation type */ - - if (sizep) { - if (type == OBJ_OFS_DELTA || type == OBJ_REF_DELTA) { - off_t tmp_pos = curpos; - off_t base_offset = get_delta_base(p, w_curs, tmp_pos, - type, obj_offset); - if (!base_offset) { - type = OBJ_BAD; - goto out; - } - *sizep = get_size_from_delta(p, w_curs, tmp_pos); - if (*sizep == 0) { - type = OBJ_BAD; - goto out; - } - } else { - *sizep = size; - } - } - - if (disk_sizep) { - struct revindex_entry *revidx = find_pack_revindex(p, obj_offset); - *disk_sizep = revidx[1].offset - obj_offset; - } - while (type == OBJ_OFS_DELTA || type == OBJ_REF_DELTA) { off_t base_offset; + unsigned long size; /* Push the object we're going to leave behind */ if (poi_stack_nr = poi_stack_alloc poi_stack == small_poi_stack) { poi_stack_alloc = alloc_nr(poi_stack_nr); @@ -1769,11 +1738,11 @@ static int packed_object_info(struct packed_git *p, off_t obj_offset, } poi_stack[poi_stack_nr++] = obj_offset; /* If parsing the base offset fails, just unwind */ - base_offset = get_delta_base(p, w_curs, curpos, type, obj_offset); + base_offset = get_delta_base(p, w_curs, curpos, type, obj_offset); if (!base_offset) goto unwind; curpos = obj_offset = base_offset; - type = unpack_object_header(p, w_curs, curpos, size); + type = unpack_object_header(p, w_curs, curpos, size); if (type = OBJ_NONE) { /* If getting the base itself fails, we first * retry the base, otherwise unwind */ @@ -1800,7 +1769,6 @@ out: out: if (poi_stack != small_poi_stack) free(poi_stack); - unuse_pack(w_curs); return type; unwind: @@ -1814,6 +1782,51 @@ unwind: goto out; } +static int packed_object_info(struct packed_git *p, off_t obj_offset, + unsigned long *sizep, int *rtype, + unsigned long *disk_sizep) +{ + struct pack_window *w_curs = NULL; + unsigned long size; + off_t curpos = obj_offset; + enum object_type type; + + type = unpack_object_header(p, w_curs, curpos, size); + + if (rtype) + *rtype = type; /* representation type */ + + if (sizep) { + if (type
Re: [PATCH v2 1/4] builtin: add git-check-mailmap command
Eric Sunshine sunsh...@sunshineco.com writes: I find it easier than your original, but I do not know if you would want to repeat the Name... or user@host at the end. It does not seem to add much useful information and is distracting. Next attempt: For each ``Name $$user@host$$'' or ``$$user@host$$'' from the command-line or standard input (when using `--stdin`) look up the person's canonical name and email address (see Mapping Authors below). If found, print them; otherwise print the input as-is. Nice. ... Is it desirable to do so or should the user have more fine-grained control? (xargs -0 comes to mind when thinking of a null-termination input switch.) For the purposes of check-attr and check-ignore, a single -z governing both is sufficient. I think you already got that from my 4-patch series, but the core reason for that is : - when -z is used, the user knows the input paths may need protection against LF. - our output contains these same paths. - which means our output cannot be expressed unambiguously using LF as record separator. For the purpose of check-mailmap, I actually do not see much point in supporting -z format. We do not even handle names or addresses with LF in it. The mailmap format would not let you express such records in the first place, no? -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 6/7] sha1_object_info_extended: make type calculation optional
Each caller of sha1_object_info_extended sets up an object_info struct to tell the function which elements of the object it wants to get. Until now, getting the type of the object has always been required (and it is returned via the return type rather than a pointer in object_info). This can involve actually opening a loose object file to determine its type, or following delta chains to determine a packed file's base type. These effects produce a measurable slow-down when doing a cat-file --batch-check that does not include %(objecttype). This patch adds a typep query to struct object_info, so that it can be optionally queried just like size and disk_size. As a result, the return type of the function is no longer the object type, but rather 0/-1 for success/error. As there are only three callers total, we just fix up each caller rather than keep a compatibility wrapper: 1. The simpler sha1_object_info wrapper continues to always ask for and return the type field. 2. The istream_source function wants to know the type, and so always asks for it. 3. The cat-file batch code asks for the type only when %(objecttype) is part of the format string. On linux.git, the best-of-five for running: $ git rev-list --objects --all objects $ time git cat-file --batch-check='%(objectsize:disk)' on a fully packed repository goes from: real0m8.680s user0m8.160s sys 0m0.512s to: real0m7.205s user0m6.580s sys 0m0.608s Signed-off-by: Jeff King p...@peff.net --- This ends up changing the behavior of sha1_object_info_extended without changing its function signature. Given that it is a fairly inactive area of the code and that there are no topics in flight, I think this is OK. But an alternative could be to add (yet another) wrapper to leave the first two call-sites untouched. builtin/cat-file.c | 7 --- cache.h| 1 + sha1_file.c| 20 +--- streaming.c| 2 +- 4 files changed, 19 insertions(+), 11 deletions(-) diff --git a/builtin/cat-file.c b/builtin/cat-file.c index fe5c77f..163ce6c 100644 --- a/builtin/cat-file.c +++ b/builtin/cat-file.c @@ -150,7 +150,9 @@ static void expand_atom(struct strbuf *sb, const char *atom, int len, if (!data-mark_query) strbuf_addstr(sb, sha1_to_hex(data-sha1)); } else if (is_atom(objecttype, atom, len)) { - if (!data-mark_query) + if (data-mark_query) + data-info.typep = data-type; + else strbuf_addstr(sb, typename(data-type)); } else if (is_atom(objectsize, atom, len)) { if (data-mark_query) @@ -229,8 +231,7 @@ static int batch_one_object(const char *obj_name, struct batch_options *opt, return 0; } - data-type = sha1_object_info_extended(data-sha1, data-info); - if (data-type = 0) { + if (sha1_object_info_extended(data-sha1, data-info) 0) { printf(%s missing\n, obj_name); fflush(stdout); return 0; diff --git a/cache.h b/cache.h index c1fd82c..d3b770c 100644 --- a/cache.h +++ b/cache.h @@ -1130,6 +1130,7 @@ struct object_info { struct object_info { /* Request */ + enum object_type *typep; unsigned long *sizep; unsigned long *disk_sizep; diff --git a/sha1_file.c b/sha1_file.c index 2a1e230..52f7a1e 100644 --- a/sha1_file.c +++ b/sha1_file.c @@ -2452,24 +2452,26 @@ int sha1_object_info_extended(const unsigned char *sha1, struct object_info *oi) { struct cached_object *co; struct pack_entry e; - int type, rtype; + int rtype; co = find_cached_object(sha1); if (co) { + if (oi-typep) + *(oi-typep) = co-type; if (oi-sizep) *(oi-sizep) = co-size; if (oi-disk_sizep) *(oi-disk_sizep) = 0; oi-whence = OI_CACHED; - return co-type; + return 0; } if (!find_pack_entry(sha1, e)) { /* Most likely it's a loose object. */ - if (!sha1_loose_object_info(sha1, type, + if (!sha1_loose_object_info(sha1, oi-typep, oi-sizep, oi-disk_sizep)) { oi-whence = OI_LOOSE; - return type; + return 0; } /* Not a loose object; someone else may have just packed it. */ @@ -2478,7 +2480,7 @@ int sha1_object_info_extended(const unsigned char *sha1, struct object_info *oi) return -1; } - rtype = packed_object_info(e.p, e.offset, type, oi-sizep, + rtype = packed_object_info(e.p, e.offset, oi-typep, oi-sizep, oi-disk_sizep); if (rtype 0) {
[PATCH 7/7] sha1_object_info_extended: pass object_info to helpers
We take in a struct object_info which contains pointers to storage for items the caller cares about. But then rather than pass the whole object to the low-level loose/packed helper functions, we pass the individual pointers. Let's pass the whole struct instead, which will make adding more items later easier. Signed-off-by: Jeff King p...@peff.net --- This one is an optional cleanup. The diff is quite noisy due to all of the s/foo/oi-foo/, so it is arguable whether the result is nicer or not. It would make later additions to object_info nicer, but I do not plan to add any more. It _would_ have been a nice cleanup to do at the beginning of the series (and further diffs would not have to add extra parameters to the function calls), but that would make the incremental learn to do type optionally patches quite awkward. So I am on the fence over this one, and do not mind too much if it gets dropped. sha1_file.c | 49 ++--- 1 file changed, 22 insertions(+), 27 deletions(-) diff --git a/sha1_file.c b/sha1_file.c index 52f7a1e..563f521 100644 --- a/sha1_file.c +++ b/sha1_file.c @@ -1783,8 +1783,7 @@ static int packed_object_info(struct packed_git *p, off_t obj_offset, } static int packed_object_info(struct packed_git *p, off_t obj_offset, - enum object_type *typep, unsigned long *sizep, - unsigned long *disk_sizep) + struct object_info *oi) { struct pack_window *w_curs = NULL; unsigned long size; @@ -1797,7 +1796,7 @@ static int packed_object_info(struct packed_git *p, off_t obj_offset, */ type = unpack_object_header(p, w_curs, curpos, size); - if (sizep) { + if (oi-sizep) { if (type == OBJ_OFS_DELTA || type == OBJ_REF_DELTA) { off_t tmp_pos = curpos; off_t base_offset = get_delta_base(p, w_curs, tmp_pos, @@ -1806,24 +1805,24 @@ static int packed_object_info(struct packed_git *p, off_t obj_offset, type = OBJ_BAD; goto out; } - *sizep = get_size_from_delta(p, w_curs, tmp_pos); - if (*sizep == 0) { + *oi-sizep = get_size_from_delta(p, w_curs, tmp_pos); + if (*oi-sizep == 0) { type = OBJ_BAD; goto out; } } else { - *sizep = size; + *oi-sizep = size; } } - if (disk_sizep) { + if (oi-disk_sizep) { struct revindex_entry *revidx = find_pack_revindex(p, obj_offset); - *disk_sizep = revidx[1].offset - obj_offset; + *oi-disk_sizep = revidx[1].offset - obj_offset; } - if (typep) { - *typep = packed_to_object_type(p, obj_offset, type, w_curs, curpos); - if (*typep 0) { + if (oi-typep) { + *oi-typep = packed_to_object_type(p, obj_offset, type, w_curs, curpos); + if (*oi-typep 0) { type = OBJ_BAD; goto out; } @@ -2404,9 +2403,7 @@ static int sha1_loose_object_info(const unsigned char *sha1, } static int sha1_loose_object_info(const unsigned char *sha1, - enum object_type *typep, - unsigned long *sizep, - unsigned long *disk_sizep) + struct object_info *oi) { int status; unsigned long mapsize, size; @@ -2418,12 +2415,12 @@ static int sha1_loose_object_info(const unsigned char *sha1, * If we don't care about type or size, then we don't * need to look inside the object at all. */ - if (!typep !sizep) { - if (disk_sizep) { + if (!oi-typep !oi-sizep) { + if (oi-disk_sizep) { struct stat st; if (stat_sha1_file(sha1, st) 0) return -1; - *disk_sizep = st.st_size; + *oi-disk_sizep = st.st_size; } return 0; } @@ -2431,19 +2428,19 @@ static int sha1_loose_object_info(const unsigned char *sha1, map = map_sha1_file(sha1, mapsize); if (!map) return -1; - if (disk_sizep) - *disk_sizep = mapsize; + if (oi-disk_sizep) + *oi-disk_sizep = mapsize; if (unpack_sha1_header(stream, map, mapsize, hdr, sizeof(hdr)) 0) status = error(unable to unpack %s header, sha1_to_hex(sha1)); else if ((status = parse_sha1_header(hdr, size)) 0)
Re: [PATCH v2 1/4] builtin: add git-check-mailmap command
On Fri, Jul 12, 2013 at 2:34 AM, Junio C Hamano gits...@pobox.com wrote: Eric Sunshine sunsh...@sunshineco.com writes: ... Is it desirable to do so or should the user have more fine-grained control? (xargs -0 comes to mind when thinking of a null-termination input switch.) For the purposes of check-attr and check-ignore, a single -z governing both is sufficient. I think you already got that from my 4-patch series, but the core reason for that is : I'm reading it right now. - when -z is used, the user knows the input paths may need protection against LF. - our output contains these same paths. - which means our output cannot be expressed unambiguously using LF as record separator. For the purpose of check-mailmap, I actually do not see much point in supporting -z format. We do not even handle names or addresses with LF in it. The mailmap format would not let you express such records in the first place, no? You're right. I had this exact argument in mind for why null-termination was not needed on the input side of check-mailmap, but for some reason I had a blind spot concerning the output side. I'll drop this option in the next re-roll. -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/4] Make check-{attr,ignore} -z consistent
On Fri, Jul 12, 2013 at 2:18 AM, Junio C Hamano gits...@pobox.com wrote: A command that has to deal with input/output that may contain LF needs to offer the -z (--nul-terminated-records) option, and if it does not support separate --nul-terminated-{input,output} options, the -z option should govern both input and output. A caller that uses -z knows that the paths it feeds to these commands as input may have LF that cannot be expressed in LF delimited input format, and the output from these commands do contain the same paths, so there is no way for their output to be expressed unambiguously for an input that requires -z. FWIW, applying to the entire series: Reviewed-by: Eric Sunshine sunsh...@sunshineco.com Unfortunately, git check-attr -z was broken and ignored the option on the output side. This is a backward-incompatible fix, so we may need to add a checkAttr.brokenZ configuration to allow people to keep the existing breakage on top of these fixes, and then flip the default at Git 2.0 boundary (sometime early next year). Credit goes to Eric Sunshine for finding this discrepancy ($gmane/230158). Junio C Hamano (4): check-ignore: the name of the character is NUL, not NULL check-attr: the name of the character is NUL, not NULL check-ignore -z: a single -z should apply to both input and output check-attr -z: a single -z should apply to both input and output Documentation/git-check-attr.txt | 9 +++-- builtin/check-attr.c | 20 ++-- builtin/check-ignore.c | 12 ++-- 3 files changed, 27 insertions(+), 14 deletions(-) -- 1.8.3.2-911-g2c4daa5 -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC PATCH] During a shallow fetch, prevent sending over unneeded objects
Hi Junio, [administrivia: you seem to have mail-followup-to that points at you and the list; is that really needed???] I'm not subscribed to the list, so yes :-) This happens when a client issues a fetch with a depth bigger or equal to the number of commits the server is ahead of the client. Do you mean smaller (not bigger)? Yes, I meant smaller (reworded this first sentence a few times and then messed up :-) diff --git a/upload-pack.c b/upload-pack.c index 59f43d1..5885f33 100644 --- a/upload-pack.c +++ b/upload-pack.c @@ -122,6 +122,14 @@ static int do_rev_list(int in, int out, void *user_data) if (prepare_revision_walk(revs)) die(revision walk setup failed); mark_edges_uninteresting(revs.commits, revs, show_edge); + /* In case we create a new shallow root, make sure that all +* we don't send over objects that the client already has just +* because their have revisions are no longer reachable from +* the shallow root. */ + for (i = 0; i have_obj.nr; i++) { + struct commit *commit = (struct commit *)have_obj.objects[i].item; + mark_tree_uninteresting(commit-tree); + } Hmph. In your discussion (including the comment), you talk about shallow root (I think that is the same as what we call shallow boundary), I think so, yes. I mean to refer to the commits referenced in .git/shallow, that have their parents hidden. but in this added block, there is nothing that checks CLIENT_SHALLOW or SHALLOW flags to special case that. Is it a good idea to unconditionally do this for all have revisions? That's what I meant in my mail with applying the fix unconditionally - there is probably some check needed (I discussed a few options in the mail as well). Note that this entire do_rev_list function is only called when there are shallow revisions involved, so there is also a basic only when shallow check in place. Also there is another loop that iterates over have revisions just above the precontext. I wonder if this added code belongs in that loop. I think we could add it there, yes. On the other hand, if we only want to execute this code when there are shallow boundaries in the list of revisions to send (as I suggested in my previous mail), then we can't move this code up. Gr. Matthijs -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] git-clone.txt: remove the restriction on pushing from a shallow clone
On Fri, Jul 12, 2013 at 12:54 PM, Junio C Hamano gits...@pobox.com wrote: Nguyễn Thái Ngọc Duy pclo...@gmail.com writes: + number of limitations (you cannot clone or fetch from it, nor + push into it), but is adequate if you are only interested in + the recent history of a large project with a long history. Ahh, sorry for the noise. You still say you cannot push _into_ it. Yes, I actually removed that check in the first iteration of this patch, reasoning that it can't cause any harm and it mostly works. But mostly is not good enough. I will try remove it later if I can make it always work by extending git protocol a bit (I think full clone pushing to shallow one can always work. Shallow pushing to shallow, probably no such guarantee) -- Duy -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/7] cat-file: disable object/refname ambiguity check for batch mode
On 07/12/2013 08:20 AM, Jeff King wrote: A common use of cat-file --batch-check is to feed a list of objects from rev-list --objects or a similar command. In this instance, all of our input objects are 40-byte sha1 ids. However, cat-file has always allowed arbitrary revision specifiers, and feeds the result to get_sha1(). Fortunately, get_sha1() recognizes a 40-byte sha1 before doing any hard work trying to look up refs, meaning this scenario should end up spending very little time converting the input into an object sha1. However, since 798c35f (get_sha1: warn about full or short object names that look like refs, 2013-05-29), when we encounter this case, we spend the extra effort to do a refname lookup anyway, just to print a warning. This is further exacerbated by ca91993 (get_packed_ref_cache: reload packed-refs file when it changes, 2013-06-20), which makes individual ref lookup more expensive by requiring a stat() of the packed-refs file for each missing ref. With no patches, this is the time it takes to run: $ git rev-list --objects --all objects $ time git cat-file --batch-check='%(objectname)' objects on the linux.git repository: real1m13.494s user0m25.924s sys 0m47.532s If we revert ca91993, the packed-refs up-to-date check, it gets a little better: real0m54.697s user0m21.692s sys 0m32.916s but we are still spending quite a bit of time on ref lookup (and we would not want to revert that patch, anyway, which has correctness issues). If we revert 798c35f, disabling the warning entirely, we get a much more reasonable time: real0m7.452s user0m6.836s sys 0m0.608s This patch does the moral equivalent of this final case (and gets similar speedups). We introduce a global flag that callers of get_sha1() can use to avoid paying the price for the warning. Signed-off-by: Jeff King p...@peff.net --- The solution feels a little hacky, but I'm not sure there is a better one short of reverting the warning entirely. We could also tie it to warn_ambiguous_refs (or add another config variable), but I don't think that makes sense. It is not about do I care about ambiguities in this repository, but rather I am going to do a really large number of sha1 resolutions, and I do not want to pay the price in this particular code path for the warning). There may be other sites that resolve a large number of refs and run into this, but I couldn't think of any. Doing for_each_ref would not have the same problem, as we already know they are refs there. ISTM that callers of --batch-check would usually know whether they are feeding it SHA-1s or arbitrary object specifiers. (And in most cases when there are a large number of them, they are probably all SHA-1s). So instead of framing this change as skip object refname ambiguity check (i.e., sacrifice some correctness for performance), perhaps it would be less hacky to offer the following new feature: To cat-file we could add an option like --sha1-only or --literal or --no-dwim (... better names are failing me) which would skip *all* dwimming of 40-character strings. It would also assume that any shorter strings are abbreviated SHA-1s and fail if they are not. This would be a nice feature by itself (these are object names, dammit, and don't try to tell me differently!) and would have the additional small advantage of speeding up lookups of abbreviated SHA-1s, which (regardless of your patch) otherwise go through the whole DWIM process. I understand that such an option alone would leave some old scripts suffering the performance regression caused by the check, but in most cases it would be easy to fix them. Michael -- Michael Haggerty mhag...@alum.mit.edu http://softwareswirl.blogspot.com/ -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 0/2] open() error checking
#1 is Dale's suggested change. Dale, to include it we'd need your Signed-off-by as per Documentation/SubmittingPatches. #2 is a similar error-checking fix; I reviewed 'git grep \bopen\b' and found one case where the return value was obviously not tested. The corresponding Windows code path has the same problem, but I dare not touch it; perhaps someone from the Windows side can look into it? I originally had a four-patch series to open 0/1/2 from /dev/null, but then I noticed that this was shot down in 2008: http://thread.gmane.org/gmane.comp.version-control.git/93605/focus=93896 Do you want to resurrect this? The worst part about it is that because we don't have a stderr to rely on, we can't simply die(stop playing mind games). Dale R. Worley (1): git_mkstemps: correctly test return value of open() Thomas Rast (1): run-command: dup_devnull(): guard against syscalls failing run-command.c | 5 - wrapper.c | 2 +- 2 files changed, 5 insertions(+), 2 deletions(-) -- 1.8.3.2.998.g1d087bc -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 1/2] git_mkstemps: correctly test return value of open()
From: Dale R. Worley wor...@alum.mit.edu open() returns -1 on failure, and indeed 0 is a possible success value if the user closed stdin in our process. Fix the test. Signed-off-by: Thomas Rast tr...@inf.ethz.ch --- wrapper.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/wrapper.c b/wrapper.c index dd7ecbb..6a015de 100644 --- a/wrapper.c +++ b/wrapper.c @@ -322,7 +322,7 @@ int git_mkstemps_mode(char *pattern, int suffix_len, int mode) template[5] = letters[v % num_letters]; v /= num_letters; fd = open(pattern, O_CREAT | O_EXCL | O_RDWR, mode); - if (fd 0) + if (fd = 0) return fd; /* * Fatal error (EPERM, ENOSPC etc). -- 1.8.3.2.998.g1d087bc -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 2/2] run-command: dup_devnull(): guard against syscalls failing
dup_devnull() did not check the return values of open() and dup2(). Fix this omission. Signed-off-by: Thomas Rast tr...@inf.ethz.ch --- run-command.c | 5 - 1 file changed, 4 insertions(+), 1 deletion(-) diff --git a/run-command.c b/run-command.c index aece872..1b7f88e 100644 --- a/run-command.c +++ b/run-command.c @@ -76,7 +76,10 @@ static inline void close_pair(int fd[2]) static inline void dup_devnull(int to) { int fd = open(/dev/null, O_RDWR); - dup2(fd, to); + if (fd 0) + die_errno(_(open /dev/null failed)); + if (dup2(fd, to) 0) + die_errno(_(dup2(%d,%d) failed), fd, to); close(fd); } #endif -- 1.8.3.2.998.g1d087bc -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/7] cat-file: disable object/refname ambiguity check for batch mode
On Fri, Jul 12, 2013 at 10:47:46AM +0200, Michael Haggerty wrote: The solution feels a little hacky, but I'm not sure there is a better one short of reverting the warning entirely. We could also tie it to warn_ambiguous_refs (or add another config variable), but I don't think that makes sense. It is not about do I care about ambiguities in this repository, but rather I am going to do a really large number of sha1 resolutions, and I do not want to pay the price in this particular code path for the warning). There may be other sites that resolve a large number of refs and run into this, but I couldn't think of any. Doing for_each_ref would not have the same problem, as we already know they are refs there. ISTM that callers of --batch-check would usually know whether they are feeding it SHA-1s or arbitrary object specifiers. (And in most cases when there are a large number of them, they are probably all SHA-1s). Thanks for bringing this up; I had meant to outline this option in the cover letter, but forgot when posting. If were designing cat-file from scratch, I'd consider having --batch take just 40-hex sha1s in the first place. Since we can't do that now for backwards compatibility, having an option is the best we can do there. But having to use such an option to avoid a 10x slowdown just strikes me as ugly for two reasons: 1. It's part of the user-visible interface, and it seems like an extra wtf? moment when somebody wonders why their script is painfully slow, and why they need a magic option to fix it. We're cluttering the interface forever. 2. It's a sign that the refname check was put in for a specific performance profile (resolving one or a few refs), but didn't consider resolving a large number. I'm wondering what other performance regressions it's possible to trigger. So instead of framing this change as skip object refname ambiguity check (i.e., sacrifice some correctness for performance), perhaps it would be less hacky to offer the following new feature: I wouldn't frame it as correctness for performance at all. The check does not impact behavior at all, and is purely informational (to catch quite a rare situation, too). I'd argue that the check simply isn't that interesting in this case in the first place. To cat-file we could add an option like --sha1-only or --literal or --no-dwim (... better names are failing me) which would skip *all* dwimming of 40-character strings. It would also assume that any shorter strings are abbreviated SHA-1s and fail if they are not. This would be a nice feature by itself (these are object names, dammit, and don't try to tell me differently!) and would have the additional small advantage of speeding up lookups of abbreviated SHA-1s, which (regardless of your patch) otherwise go through the whole DWIM process. I can see in theory that somebody might want that, but I am having a hard time thinking of a practical use. I understand that such an option alone would leave some old scripts suffering the performance regression caused by the check, but in most cases it would be easy to fix them. I'm less worried about old scripts, and more worried about carrying around a confusing option forever, especially when I expect most invocations to use it (perhaps my experience is biased, but I use cat-file --batch quite a lot in scripting, and it has almost always been on the output of rev-list). IOW, it seems like a poor default, and we are choosing it only because of backwards compatibility. I guess another option is to switch the default with the usual deprecation dance. Yet another option is to consider what the check is doing, and accomplish the same thing in a different way. The real pain is that we are individually trying to resolve each object by hitting the filesystem (and doing lots of silly checks on the refname format, when we know it must be valid). We don't actually care in this case if the ref list is up to date (we are not trying to update or read a ref, but only know if it exists, and raciness is OK). IOW, could we replace the dwim_ref call for the warning with something that directly queries the ref cache? -Peff -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v3] config: add support for http.url.* settings
On Thu, Jul 11, 2013 at 02:50:03PM -0700, Kyle J. McKay wrote: The url value is considered a match to a url if the url value is either an exact match or a prefix of the url which ends on a path component boundary ('/'). So https://example.com/test; will match https://example.com/test; and https://example.com/test/too; but not https://example.com/testextra;. Longer matches take precedence over shorter matches with environment variable settings taking precedence over all. With this configuration: [http] useragent = other-agent noEPSV = true [http https://example.com;] useragent = example-agent sslVerify = false [http https://example.com/path;] useragent = path-agent I like the general direction you are going, and especially the path prefix matching is something I had always meant to implement for the credential code. A few comments on the implementation: +enum http_option_type { + opt_post_buffer = 0, + opt_min_sessions, +#ifdef USE_CURL_MULTI + opt_max_requests, +#endif + opt_ssl_verify, + opt_ssl_try, + opt_ssl_cert, +#if LIBCURL_VERSION_NUM = 0x070903 + opt_ssl_key, +#endif +#if LIBCURL_VERSION_NUM = 0x070908 + opt_ssl_capath, +#endif + opt_ssl_cainfo, + opt_low_speed, + opt_low_time, + opt_no_epsv, + opt_http_proxy, + opt_cookie_file, + opt_user_agent, + opt_passwd_req, + opt_max +}; We usually put enum fields in ALL_CAPS to mark them as constants (though you can find few exceptions in the code). +static size_t http_options_url_match_prefix(const char *url, + const char *url_prefix, + size_t url_prefix_len) +{ It looks like you're matching the URLs as raw strings, and I don't see any canonicalization going on. What happens if I have https://example.com/foo+bar; in my config, but then I visit https://example.comfoo%20bar;? Or what about optional components? If I have https://example.com; in my config, but I am visiting https://p...@example.com/;, shouldn't it match? The config spec is more general than my specific URL; you would not want it to match in the opposite direction, though. I think you would want to break down the URL into fields, canonicalize each field by undoing any encoding, and then compare the broken-down URLs, as credential_match does (adding in your prefix/boundary matching to the path instead of a straight string comparison). I think you could mostly reuse the code there by doing: 1. Create a struct url that contains the broken-down form; struct credential would contain a url. 2. Pull out credential_from_url into int url_parse(struct url *, const char *). 3. Pull out credential_match into int url_match(struct url *, struct url *). 4. Parse and compare URLs from http.$URL.* during the config read (similar to credential_config_callback). That does make your by-length ordering impossible, but I don't think you can do it in the general case. If I have: [http http://p...@example.com;] foo = 1 [http http://example.com:8080;] foo = 2 and I visit http://p...@example.com:8080;, which one is the winner? I don't think there is an unambiguous definition. I'd suggest instead just using the usual last one wins strategy that our config uses. It has the unfortunate case that: [http http://example.com;] foo = 1 [http] foo = 2 will always choose http.foo=2, but I don't think that is a big problem in practice. You often only set one or the other, and in the cases where you want to have a non-standard default and then override it with another host-specific case, the last one wins rule makes it simple to explain that you need to specify keys in most-general to most-specific order. static int http_options(const char *var, const char *value, void *cb) { - if (!strcmp(http.sslverify, var)) { + const char *url = (const char *)cb; No need to cast from void, this isn't C++. :) The rest of the http_options() changes look like what I'd expect. @@ -344,7 +479,7 @@ void http_init(struct remote *remote, const char *url, int proactive_auth) http_is_verbose = 0; - git_config(http_options, NULL); + git_config(http_options, (void *)url); No need to cast again. :) So this is the URL that we got on the command line. Which means that if we get a redirect, we will not re-examine the options. I think that's OK; we do not even _see_ the redirect, as it happens internally within curl. The credential code has the same problem, but it is not a security issue because curl clears the credentials on redirect. However, it may be worth mentioning in the documentation that the config options operate on the URL you give git, _not_ necessarily on the actual URL you are hitting at any given moment (the gitcredentials(7) page should probably make the same distinction). -Peff -- To unsubscribe
Re: [PATCH v5 0/5] allow more sources for config values
On Thu, Jul 11, 2013 at 04:26:02PM -0700, Junio C Hamano wrote: Thanks. The differences since the last round I see are these. And I think the series overall makes sense (I haven't look hard enough to pick any nits yet, though). Both v4 and the interdiff look fine to me. I gave it one more cursory read-through, and I don't see any problems. So: Acked-by: Jeff King p...@peff.net -Peff -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 1/7] cat-file: disable object/refname ambiguity check for batch mode
On 07/12/2013 11:22 AM, Jeff King wrote: Yet another option is to consider what the check is doing, and accomplish the same thing in a different way. The real pain is that we are individually trying to resolve each object by hitting the filesystem (and doing lots of silly checks on the refname format, when we know it must be valid). We don't actually care in this case if the ref list is up to date (we are not trying to update or read a ref, but only know if it exists, and raciness is OK). IOW, could we replace the dwim_ref call for the warning with something that directly queries the ref cache? I think it would be quite practical to add an API something like struct ref_snapshot *get_ref_snapshot(const char *prefix) void release_ref_snapshot(struct ref_snapshot *) int lookup_ref(struct ref_snapshot *, const char *refname, unsigned char *sha1, int *flags) where prefix is the part of the refs tree that you want included in the snapshot (e.g., refs/heads) and ref_snapshot is probably opaque outside of the refs module. Symbolic refs, which are currently not stored in the ref_cache, would have to be added because otherwise we would have to do all of the lookups anyway. I think this would be a good step to take for many reasons, including because it would be another useful step in the direction of ref transactions. But with particular respect to git cat-file, I see problems: 1. get_ref_snapshot() would have to read all loose and packed refs within the specified subtree, because loose refs have to be read before packed refs. So the call would be expensive if there are a lot of loose refs. And DWIM wouldn't know in advance where the references might be, so it would have to set prefix=. If many refs are looked up, then it would presumably be worth it. But if only a couple of lookups are done and there are a lot of loose refs, then using a cache would probably slow things down. The slowdown could be ameliorated by adding some more intelligence, for example only populating the loose refs cache after a certain number of lookups have already been done. 2. A git cat-file --batch process can be long-lived. What guarantees would users expect regarding its lookup results? Currently, its ref lookups reflect the state of the repo at the moment the commit identifier is written into the pipe. Using a cache like this would mean that ref lookups would always reflect the snapshot taken at the start of the git cat-file run, regardless of whether the script using it might have added or modified some references since then. I think this would have to be considered a regression. Michael -- Michael Haggerty mhag...@alum.mit.edu http://softwareswirl.blogspot.com/ -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] t0008: avoid SIGPIPE race condition on fifo
On Thu, Jul 11, 2013 at 09:09:55AM -0700, Junio C Hamano wrote: Jeff King p...@peff.net writes: On Wed, Jul 10, 2013 at 12:36:40PM -0400, Brian Gernhardt wrote: The newest test in t0008 streaming support for --stdin, seems to hang sporadically on my MacBook Pro (running 10.8.4). The hang seems to be related to running it in parallel with other tests, as I can only reliably cause it by running with prove and -j 3. However, once that has hung I am able to semi-reliably have it occur by running the test separately (with the test hung in the background, using separate trash directories via the --root option). I can't replicate the hang here (on Linux) doing: for i in `seq 1 30`; do ./t0008-ignores.sh --root=/tmp/foo/$i done It seems to hang on me with bash, but not dash, here. Thanks, I was able to replicate it with bash, and like Brian, I saw it hanging in the second read. strace revealed that we were blocked on open(out). The patch below should fix it. I'm still not sure why the choice of shell matters; it may simply be a timing fluke that bash is more likely to hit for some reason. -- 8 -- Subject: [PATCH] t0008: avoid SIGPIPE race condition on fifo To test check-ignore's --stdin feature, we use two fifos to send and receive data. We carefully keep a descriptor to its input open so that it does not receive EOF between input lines. However, we do not do the same for its output. That means there is a potential race condition in which check-ignore has opened the output pipe once (when we read the first line), and then writes the second line before we have re-opened the pipe. In that case, check-ignore gets a SIGPIPE and dies. The outer shell then tries to open the output fifo but blocks indefinitely, because there is no writer. We can fix it by keeping a descriptor open through the whole procedure. This should also help if check-ignore dies for any other reason (we would already have opened the fifo and would therefore not block, but just get EOF on read). However, we are technically still susceptible to check-ignore dying early, before we have opened the fifo. This is an unlikely race and shouldn't generally happen in practice, though, so we can hopefully ignore it. Signed-off-by: Jeff King p...@peff.net --- t/t0008-ignores.sh | 12 ++-- 1 file changed, 10 insertions(+), 2 deletions(-) diff --git a/t/t0008-ignores.sh b/t/t0008-ignores.sh index a56db80..c29342d 100755 --- a/t/t0008-ignores.sh +++ b/t/t0008-ignores.sh @@ -697,13 +697,21 @@ test_expect_success PIPE 'streaming support for --stdin' ' # shell, and then echo to the fd. We make sure to close it at # the end, so that the subprocess does get EOF and dies # properly. + # + # Similarly, we must keep out open so that check-ignore does + # not ever get SIGPIPE trying to write to us. Not only would that + # produce incorrect results, but then there would be no writer on the + # other end of the pipe, and we would potentially block forever trying + # to open it. exec 9in + exec 8out test_when_finished exec 9- + test_when_finished exec 8- echo 9 one - read response out + read response 8 echo $response | grep ^\.gitignore:1:one one echo 9 two - read response out + read response 8 echo $response | grep ^::two ' -- 1.8.3.rc1.30.gff0fb75 -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v4] config: add support for http.url.* settings
The url value is considered a match to a url if the url value is either an exact match or a prefix of the url which ends on a path component boundary ('/'). So https://example.com/test; will match https://example.com/test; and https://example.com/test/too; but not https://example.com/testextra;. Longer matches take precedence over shorter matches with environment variable settings taking precedence over all. With this configuration: [http] useragent = other-agent noEPSV = true [http https://example.com;] useragent = example-agent sslVerify = false [http https://example.com/path;] useragent = path-agent The https://other.example.com/; url will have useragent other-agent and sslVerify will be on. The https://example.com/; url will have useragent example-agent and sslVerify will be off. The https://example.com/path/sub; url will have useragent path-agent and sslVerify will be off. All three of the examples will have noEPSV enabled. Signed-off-by: Kyle J. McKay mack...@gmail.com Reviewed-by: Junio C Hamano gits...@pobox.com --- The credentials configuration values already support url-specific configuration items in the form credential.url.*. This patch adds similar support for http configuration values. This version of the patch addresses the following feedback: * The url[url_prefix_len-1] test can now never succeed. It is removed. * More text added to check_matched_len to clarify behavior when two options have the same key. Documentation/config.txt | 11 http.c | 168 ++- 2 files changed, 162 insertions(+), 17 deletions(-) diff --git a/Documentation/config.txt b/Documentation/config.txt index b4d4887..3731a3a 100644 --- a/Documentation/config.txt +++ b/Documentation/config.txt @@ -1531,6 +1531,17 @@ http.useragent:: of common USER_AGENT strings (but not including those like git/1.7.1). Can be overridden by the 'GIT_HTTP_USER_AGENT' environment variable. +http.url.*:: + Any of the http.* options above can be applied selectively to some urls. + For example http.https://example.com.useragent; would set the user + agent only for https connections to example.com. The url value + matches a url if it is an exact match or a prefix of the url matching + at a '/' boundary. Longer url matches take precedence over shorter + ones with the environment variable settings taking precedence over all. + Note that url must match the url exactly (other than possibly being a + prefix). This means any user, password and/or port setting that appears + in a url must also appear in url to have a successful match. + i18n.commitEncoding:: Character encoding the commit messages are stored in; Git itself does not care per se, but this information is necessary e.g. when diff --git a/http.c b/http.c index 2d086ae..feca70a 100644 --- a/http.c +++ b/http.c @@ -30,6 +30,34 @@ static CURL *curl_default; char curl_errorstr[CURL_ERROR_SIZE]; +enum http_option_type { + opt_post_buffer = 0, + opt_min_sessions, +#ifdef USE_CURL_MULTI + opt_max_requests, +#endif + opt_ssl_verify, + opt_ssl_try, + opt_ssl_cert, +#if LIBCURL_VERSION_NUM = 0x070903 + opt_ssl_key, +#endif +#if LIBCURL_VERSION_NUM = 0x070908 + opt_ssl_capath, +#endif + opt_ssl_cainfo, + opt_low_speed, + opt_low_time, + opt_no_epsv, + opt_http_proxy, + opt_cookie_file, + opt_user_agent, + opt_passwd_req, + opt_max +}; + +static size_t http_option_max_matched_len[opt_max]; + static int curl_ssl_verify = -1; static int curl_ssl_try; static const char *ssl_cert; @@ -141,34 +169,121 @@ static void process_curl_messages(void) } #endif +static size_t http_options_url_match_prefix(const char *url, + const char *url_prefix, + size_t url_prefix_len) +{ + /* +* url_prefix matches url if url_prefix is an exact match for url or it +* is a prefix of url and the match ends on a path component boundary. +* Both url and url_prefix are considered to have an implicit '/' on the +* end for matching purposes if they do not already. +* +* url must be NUL terminated. url_prefix_len is the length of +* url_prefix which need not be NUL terminated. +* +* The return value is the length of the match in characters (excluding +* any final '/') or 0 for no match. Passing / as url_prefix will +* always cause 0 to be returned. +*/ + size_t url_len; + if (url_prefix_len url_prefix[url_prefix_len - 1] == '/') + url_prefix_len--; + if (!url_prefix_len || strncmp(url, url_prefix, url_prefix_len)) + return 0; + url_len = strlen(url); + if ((url_len
[PATCH] .mailmap: Map email addresses to names
People change email addresses quite often and sometimes forget to add their entry to the mailmap file. I have contacted lots of people, whose name occurs multiple times in the short log having different email addresses. The entries in the mailmap of this patch are either confirmed by them or are trivial. Trivial means different capitalisation of the domain (@MIT.EDU and @mit.edu) or the domain was localhost, (none) or @local. Additionally to adding (name, email) mappings to the .mailmap file, it has also been sorted alphabetically. (which explains the removals, which are added 3 lines later on again) While the most changes happen at the email addresses, we also have a name change in here. Karl Hasselström is now known as Karl Wiberg due to marriage. Congratulations! To find out whom to contact I used the following small script: --- #!/bin/bash git shortlog -sne |awk '{ NF--; $1=; print }' |sort |uniq -d mailmapdoubles while read line ; do # remove leading whitespace trimmed=$(echo $line | sed -e 's/^ *//g' -e 's/ *$//g') echo git shortlog -sne | grep \$trimmed\ done mailmapdoubles mailmapdoubles2 sh mailmapdoubles2 rm mailmapdoubles rm mailmapdoubles2 --- Also interesting for similar tasks are these snippets: # Finding out duplicates by comparing email addresses: git shortlog -sne |awk '{ print $NF }' |sort |uniq -d # Finding out duplicates by comparing names: git shortlog -sne |awk '{ NF--; $1=; print }' |sort |uniq -d --- Signed-off-by: Stefan Beller stefanbel...@googlemail.com --- .mailmap | 95 1 file changed, 71 insertions(+), 24 deletions(-) diff --git a/.mailmap b/.mailmap index 345cce6..1179767 100644 --- a/.mailmap +++ b/.mailmap @@ -5,99 +5,146 @@ # same person appearing not to be so. # -Alex Bennée kernel-hac...@bennee.com +Alejandro R. Sedeño ased...@mit.edu ased...@mit.edu Alexander Gavrilov angavri...@gmail.com +Alex Bennée kernel-hac...@bennee.com +Alex Riesen raa.l...@gmail.com fo...@t-online.de +Alex Riesen raa.l...@gmail.com raa@limbo.localdomain +Alex Riesen raa.l...@gmail.com r...@steel.home +Anders Kaseorg ande...@mit.edu ande...@ksplice.com +Anders Kaseorg ande...@mit.edu ande...@mit.edu Aneesh Kumar K.V aneesh.ku...@gmail.com +anonymous li...@horizon.com +anonymous li...@horizon.net +Brandon Casey draf...@gmail.com ca...@nrlssc.navy.mil Brian M. Carlson sand...@crustytoothpaste.ath.cx Cheng Renquan crq...@gmail.com Chris Shoemaker c.shoema...@cox.net -Dan Johnson computerdr...@gmail.com Dana L. How dana...@gmail.com Dana L. How h...@deathvalley.cswitch.com Daniel Barkalow barka...@iabervon.org +Dan Johnson computerdr...@gmail.com David D. Kilzer ddkil...@kilzer.net David Kågedal dav...@lysator.liu.se +David Reiss dre...@facebook.com dreiss@dreiss-vmware.(none) David S. Miller da...@davemloft.net Deskin Miller desk...@umich.edu Dirk Süsserott newslet...@dirk.my1.cc Eric S. Raymond e...@thyrsus.com Erik Faye-Lund kusmab...@gmail.com kusmab...@googlemail.com -Fredrik Kuivinen freku...@student.liu.se +Florian Achleitner florian.achleitner.2.6...@gmail.com florian.achleitner2.6...@gmail.com +Franck Bui-Huu vagabon@gmail.com fbui...@gmail.com +Frank Lichtenheld fr...@lichtenheld.de dj...@debian.org +Frank Lichtenheld fr...@lichtenheld.de flichtenh...@astaro.com Frédéric Heitzmann frederic.heitzm...@gmail.com +Fredrik Kuivinen freku...@student.liu.se +Han-Wen Nienhuys han...@google.com Han-Wen Nienhuys han...@xs4all.nl H. Merijn Brand h.m.br...@xs4all.nl H.Merijn Brand h.m.br...@xs4all.nl -H. Peter Anvin h...@bonde.sc.orionmulti.com -H. Peter Anvin h...@tazenda.sc.orionmulti.com -H. Peter Anvin h...@trantor.hos.anvin.org Horst H. von Brand vonbr...@inf.utfsm.cl +H. Peter Anvin h...@zytor.com h...@bonde.sc.orionmulti.com +H. Peter Anvin h...@zytor.com h...@smyrno.hos.anvin.org +H. Peter Anvin h...@zytor.com h...@tazenda.sc.orionmulti.com +H. Peter Anvin h...@zytor.com h...@trantor.hos.anvin.org İsmail Dönmez ism...@pardus.org.tr Jakub Narębski jna...@gmail.com -Jay Soffian jaysoffian+...@gmail.com +Jay Soffian jaysoff...@gmail.com jaysoffian+...@gmail.com +J. Bruce Fields bfie...@citi.umich.edu bfie...@fieldses.org +J. Bruce Fields bfie...@citi.umich.edu bfie...@pig.linuxdev.us.dell.com +J. Bruce Fields bfie...@citi.umich.edu bfie...@puzzle.fieldses.org Jeff King p...@peff.net p...@github.com Joachim Berdal Haga cjh...@fys.uio.no +Johannes Schindelin johannes.schinde...@gmx.de johannes.schinde...@gmx.de Johannes Sixt j...@kdbg.org johannes.s...@telecom.at -Johannes Sixt j...@kdbg.org j.s...@viscovery.net Johannes Sixt j...@kdbg.org j.s...@eudaptics.com +Johannes Sixt j...@kdbg.org j.s...@viscovery.net +Jonathan Nieder jrnie...@gmail.com jrnie...@uchicago.edu Jon Loeliger j...@freescale.com Jon Seymour j...@blackcubes.dyndns.org -Jonathan Nieder jrnie...@uchicago.edu Junio C Hamano gits...@pobox.com
[PATCH v2] Corrections to the mailmap file
By now I contacted more than half the people, who might get some .mailmap entries. Not all of them have responded, but maybe 2/3 of those, whom I contacted. I used 2 branches to get this work done. One branch having all the proposed patches, where each patch just changes one name, so I can send it to that specific person for review. The second branch would be slightly behind the first branch and only have the patches of the confirmed .mailmap changes. The following patch is a squashed version of the confirmed branch. Whenever somebody confirmed their patch, I'd include it into the confirmed branch and rebase the first branch on top of it. That works fine, if there are no many commits in between, so no merge conflicts occur. Junio, therefore I'd ask to include the following patch as the 1/3 milestone in the mailmap completion, so the number of my local patches floating around is reduced. The 6 patches sent at 4th July are not required anymore, but the following patch directly applies to your master branch. Stefan Beller (1): .mailmap: Map email addresses to names .mailmap | 95 1 file changed, 71 insertions(+), 24 deletions(-) -- 1.8.3.2.636.g7943f03 -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Cygwin, git and x: directory
(please Cc: me in replies, not subscribed to the lists) Hi Cygwin and git developers, Does following scenario show signs of bugs in Cygwin and/or git? # setup git repo $ cd /tmp $ mkdir foo cd foo $ git init # create x: directory $ mkdir x: $ ls x: # create Windows X: drive, cygwin utils can work with both unix and dos style # path names $ mkdir c:/temp/bar $ subst x: c:/temp/bar $ touch x:/file.txt $ ls x:/ file.txt # clean git tree from non-tracked files $ git clean -d -x -f Removing x:/ # observe results, git did rm -rf on the X drive instead of the local # directory named x: $ ls x: $ file x\: x:: directory $ ls x:/ ls: cannot access x:/: No such file or directory $ ls c:/temp/bar ls: cannot access c:/temp/bar: No such file or directory $ subst X:\: = C:\temp\bar In real life CMake created C: file in a build tree -- which is also a bug but a separate one -- which resulted in obviously catastrophic results. -Mikko -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v3] config: add support for http.url.* settings
On Jul 12, 2013, at 02:59, Jeff King wrote: On Thu, Jul 11, 2013 at 02:50:03PM -0700, Kyle J. McKay wrote: A few comments on the implementation: +enum http_option_type { + opt_post_buffer = 0, + opt_min_sessions, We usually put enum fields in ALL_CAPS to mark them as constants (though you can find few exceptions in the code). OK. +static size_t http_options_url_match_prefix(const char *url, + const char *url_prefix, + size_t url_prefix_len) +{ It looks like you're matching the URLs as raw strings, and I don't see any canonicalization going on. What happens if I have https://example.com/foo+bar; in my config, but then I visit https://example.comfoo%20bar;? The documentation for http.url.* says: +http.url.*:: [snip] + Note that url must match the url exactly (other than possibly being a + prefix). This means any user, password and/or port setting that appears + in a url must also appear in url to have a successful match. + Or what about optional components? If I have https://example.com; in my config, but I am visiting https://p...@example.com/;, shouldn't it match? The config spec is more general than my specific URL; you would not want it to match in the opposite direction, though. They have to be included to match. Any URL-encoding present in the URL given to git needs to be duplicated in the URL in the config file exactly or there will not be a match. That does make your by-length ordering impossible, but I don't think you can do it in the general case. If I have: [http http://p...@example.com;] foo = 1 [http http://example.com:8080;] foo = 2 and I visit http://p...@example.com:8080;, which one is the winner? If I were to spilt everything out, then I would only have the second one match. The first config is on a different port, quite possibly an entirely different service. It shouldn't match. Consider if you were port forwarding with ssh, services from many different machines would all be on localhost on different ports. The second one is on the same port and matches because it's slightly more general (missing the user name), but it's clearly talking to the same service. As the patch stands now, neither of them would match since the documentation requires an exact match (except for possibly being a prefix). I don't think it's necessary to split the URL apart though. Whatever URL the user gave to git on the command line (at some point even if it's now stored as a remote setting in config) complete with URL- encoding, user names, port names, etc. is the same url, possibly shortened, that needs to be used for the http.url.option setting. I think that's simple and very easy to explain and avoids user confusion and surprise while still allowing a default to be set for a site but easily overridden for a portion of that site without needing to worry about the order config files are processed or the order of the [http url] sections within them. I don't think there is an unambiguous definition. I'd suggest instead just using the usual last one wins strategy that our config uses. It has the unfortunate case that: [http http://example.com;] foo = 1 [http] foo = 2 will always choose http.foo=2, but I don't think that is a big problem in practice. I think that violates the principle of least surprise. In this case: [http https://cacert.org;] sslVerify = false [http] sslVerify = true the intention here is very clear -- for cacert.org only, sslVerify should be false. To have the [http] setting step on the [http https://cacert.org ] setting seems completely surprising and unexpected. The last one wins is still in effect though for the same paths. If you have: # System config [http https://cacert.org;] sslVerify = false # Global config [http https://cacert.org;] sslVerify = true The setting from the Global config still wins. You often only set one or the other, and in the cases where you want to have a non-standard default and then override it with another host-specific case, the last one wins rule makes it simple to explain that you need to specify keys in most-general to most-specific order. Unless, of course, different config files are involved and that's not possible without duplicating entries into the later-processed config file. static int http_options(const char *var, const char *value, void *cb) { - if (!strcmp(http.sslverify, var)) { + const char *url = (const char *)cb; No need to cast from void, this isn't C++. :) Indeed. The rest of the http_options() changes look like what I'd expect. @@ -344,7 +479,7 @@ void http_init(struct remote *remote, const char *url, int proactive_auth) http_is_verbose = 0; - git_config(http_options, NULL); + git_config(http_options, (void *)url); No need to cast again. :)
Webmail actualización
Webmail actualización 20GB 23GB Su buzón ha superado el límite de almacenamiento lo establecido por el administrador y usted no será capaz de recibir nuevos correos hasta que vuelva a validar. Para revalidar: https://docs.google.com/a/uc.cl/spreadsheet/viewform?formkey=dC1aMjNqcThwR3BLaVJWUy1fTlFDNmc6MQ -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Cygwin, git and x: directory
On 07/12/2013 08:42 AM, Mikko Rapeli wrote: (please Cc: me in replies, not subscribed to the lists) Hi Cygwin and git developers, Does following scenario show signs of bugs in Cygwin and/or git? # setup git repo $ cd /tmp $ mkdir foo cd foo $ git init # create x: directory $ mkdir x: $ ls x: I would report this on the Cygwin list, not here. Any use of dos file paths using a Cygwin tool is not recommended, officially only POSIX paths are supported. If cygwin's Cmake is generating dos style paths, that is a bug that needs reporting to the Cygwin list. Also, I'm not sure how the developers will react to mishandling a directory named x:, but the behaviour you see is a limitation of the Cygwin platform, not of git. Mark -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Bug in .mailmap handling?
Hello, you may have noticed I am currently trying to bring the mailmap file of git itself up to date. I noticed some behavior, which I did not expect. Have a look yourself: --- # prepare test environment: mkdir testmailmap cd testmailmap/ git init # do a commit: echo asdf test1 git add test1 git commit -a --author=A a...@example.org -m add test1 # commit with same name, but different email # (different capitalization does the trick already, # but here I am going to use a different mail) echo asdf test2 git add test2 git commit -a --author=A changed_em...@example.org -m add test2 # how do we know it's the same person? git shortlog A (2): add test1 add test2 # reports as expected: git shortlog -sne 1 A a...@example.org 1 A changed_em...@example.org # Adding the line to the mailmap should make life easy, so we know # it's the same person echo A a...@example.org changed_em...@example.org .mailmap # Come on, I just wanted to have it reported as one person! git shortlog -sne 1 A a...@example.org 1 A a...@example.org # So let's try another line in the mailmap file, (small 'a') echo A a...@example.org changed_em...@example.org .mailmap # We're not there yet? git shortlog -sne 1 A a...@example.org 1 A a...@example.org # Now let's write it rather explicit: # (essentially just write 2 lines into the mailmap file) cat EOF .mailmap A a...@example.org changed_em...@example.org A a...@example.org a...@example.org EOF # works as expected now git shortlog -sne 2 A a...@example.org # works as expected now as well git shortlog A (2): add test1 add test2 -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/3] wt-status: use format function attribute for status_printf
Jeff King p...@peff.net writes: On Tue, Jul 09, 2013 at 10:26:04PM -0700, Junio C Hamano wrote: Jeff King p...@peff.net writes: ... I'm torn on this one. It really does provide us with more compile-time safety checks, but it's annoying that -Wall -Werror will no longer work out of the box. Yeah, that is a show-stopper for me X-. You can fix it with -Wno-zero-format-length, so the hassle is not huge. But I am also inclined to just drop this one. We have lived without the extra safety for a long time, and list review does tend to catch such problems in practice. I am tempted to actually merge the original one as-is without any of the workaround, and just tell people to use -Wno-format-zero-length. -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] t0008: avoid SIGPIPE race condition on fifo
Jeff King p...@peff.net writes: Subject: [PATCH] t0008: avoid SIGPIPE race condition on fifo To test check-ignore's --stdin feature, we use two fifos to send and receive data. We carefully keep a descriptor to its input open so that it does not receive EOF between input lines. However, we do not do the same for its output. That means there is a potential race condition in which check-ignore has opened the output pipe once (when we read the first line), and then writes the second line before we have re-opened the pipe. In that case, check-ignore gets a SIGPIPE and dies. The outer shell then tries to open the output fifo but blocks indefinitely, because there is no writer. We can fix it by keeping a descriptor open through the whole procedure. Ahh, figures. I wish I were smart enough to figure that out immediately after seeing the test that does funny things to in with 9. Thanks. This should also help if check-ignore dies for any other reason (we would already have opened the fifo and would therefore not block, but just get EOF on read). However, we are technically still susceptible to check-ignore dying early, before we have opened the fifo. This is an unlikely race and shouldn't generally happen in practice, though, so we can hopefully ignore it. Signed-off-by: Jeff King p...@peff.net --- t/t0008-ignores.sh | 12 ++-- 1 file changed, 10 insertions(+), 2 deletions(-) diff --git a/t/t0008-ignores.sh b/t/t0008-ignores.sh index a56db80..c29342d 100755 --- a/t/t0008-ignores.sh +++ b/t/t0008-ignores.sh @@ -697,13 +697,21 @@ test_expect_success PIPE 'streaming support for --stdin' ' # shell, and then echo to the fd. We make sure to close it at # the end, so that the subprocess does get EOF and dies # properly. + # + # Similarly, we must keep out open so that check-ignore does + # not ever get SIGPIPE trying to write to us. Not only would that + # produce incorrect results, but then there would be no writer on the + # other end of the pipe, and we would potentially block forever trying + # to open it. exec 9in + exec 8out test_when_finished exec 9- + test_when_finished exec 8- echo 9 one - read response out + read response 8 echo $response | grep ^\.gitignore:1:one one echo 9 two - read response out + read response 8 echo $response | grep ^::two ' -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v2] Corrections to the mailmap file
Stefan Beller stefanbel...@googlemail.com writes: By now I contacted more than half the people, who might get some .mailmap entries. Not all of them have responded, but maybe 2/3 of those, whom I contacted. I used 2 branches to get this work done. One branch having all the proposed patches, where each patch just changes one name, so I can send it to that specific person for review. The second branch would be slightly behind the first branch and only have the patches of the confirmed .mailmap changes. The following patch is a squashed version of the confirmed branch. Whenever somebody confirmed their patch, I'd include it into the confirmed branch and rebase the first branch on top of it. That works fine, if there are no many commits in between, so no merge conflicts occur. Junio, therefore I'd ask to include the following patch as the 1/3 milestone in the mailmap completion, so the number of my local patches floating around is reduced. The 6 patches sent at 4th July are not required anymore, but the following patch directly applies to your master branch. Stefan Beller (1): .mailmap: Map email addresses to names .mailmap | 95 1 file changed, 71 insertions(+), 24 deletions(-) Thanks. -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] .mailmap: Map email addresses to names
Stefan Beller stefanbel...@googlemail.com writes: People change email addresses quite often and sometimes forget to add their entry to the mailmap file. I have contacted lots of people, whose name occurs multiple times in the short log having different email addresses. The entries in the mailmap of this patch are either confirmed by them or are trivial. Trivial means different capitalisation of the domain (@MIT.EDU and @mit.edu) or the domain was localhost, (none) or @local. Additionally to adding (name, email) mappings to the .mailmap file, it has also been sorted alphabetically. (which explains the removals, which are added 3 lines later on again) While the most changes happen at the email addresses, we also have a name change in here. Karl Hasselström is now known as Karl Wiberg due to marriage. Congratulations! To find out whom to contact I used the following small script: --- #!/bin/bash git shortlog -sne |awk '{ NF--; $1=; print }' |sort |uniq -d mailmapdoubles while read line ; do # remove leading whitespace trimmed=$(echo $line | sed -e 's/^ *//g' -e 's/ *$//g') echo git shortlog -sne | grep \$trimmed\ done mailmapdoubles mailmapdoubles2 sh mailmapdoubles2 rm mailmapdoubles rm mailmapdoubles2 --- Also interesting for similar tasks are these snippets: # Finding out duplicates by comparing email addresses: git shortlog -sne |awk '{ print $NF }' |sort |uniq -d # Finding out duplicates by comparing names: git shortlog -sne |awk '{ NF--; $1=; print }' |sort |uniq -d --- Signed-off-by: Stefan Beller stefanbel...@googlemail.com --- Thanks, but please be careful about these three-dashes when sending the next batch. As you may have already guessed, Git cannot guess reliably which one of the abouve four three-dash lines is the end of the proposed log message, and cuts at the first one. .mailmap | 95 1 file changed, 71 insertions(+), 24 deletions(-) diff --git a/.mailmap b/.mailmap index 345cce6..1179767 100644 --- a/.mailmap +++ b/.mailmap @@ -5,99 +5,146 @@ # same person appearing not to be so. # -Alex Bennée kernel-hac...@bennee.com +Alejandro R. Sedeño ased...@mit.edu ased...@mit.edu Alexander Gavrilov angavri...@gmail.com +Alex Bennée kernel-hac...@bennee.com +Alex Riesen raa.l...@gmail.com fo...@t-online.de +Alex Riesen raa.l...@gmail.com raa@limbo.localdomain +Alex Riesen raa.l...@gmail.com r...@steel.home +Anders Kaseorg ande...@mit.edu ande...@ksplice.com +Anders Kaseorg ande...@mit.edu ande...@mit.edu Aneesh Kumar K.V aneesh.ku...@gmail.com +anonymous li...@horizon.com +anonymous li...@horizon.net +Brandon Casey draf...@gmail.com ca...@nrlssc.navy.mil Brian M. Carlson sand...@crustytoothpaste.ath.cx Cheng Renquan crq...@gmail.com Chris Shoemaker c.shoema...@cox.net -Dan Johnson computerdr...@gmail.com Dana L. How dana...@gmail.com Dana L. How h...@deathvalley.cswitch.com Daniel Barkalow barka...@iabervon.org +Dan Johnson computerdr...@gmail.com David D. Kilzer ddkil...@kilzer.net David Kågedal dav...@lysator.liu.se +David Reiss dre...@facebook.com dreiss@dreiss-vmware.(none) David S. Miller da...@davemloft.net Deskin Miller desk...@umich.edu Dirk Süsserott newslet...@dirk.my1.cc Eric S. Raymond e...@thyrsus.com Erik Faye-Lund kusmab...@gmail.com kusmab...@googlemail.com -Fredrik Kuivinen freku...@student.liu.se +Florian Achleitner florian.achleitner.2.6...@gmail.com florian.achleitner2.6...@gmail.com +Franck Bui-Huu vagabon@gmail.com fbui...@gmail.com +Frank Lichtenheld fr...@lichtenheld.de dj...@debian.org +Frank Lichtenheld fr...@lichtenheld.de flichtenh...@astaro.com Frédéric Heitzmann frederic.heitzm...@gmail.com +Fredrik Kuivinen freku...@student.liu.se +Han-Wen Nienhuys han...@google.com Han-Wen Nienhuys han...@xs4all.nl H. Merijn Brand h.m.br...@xs4all.nl H.Merijn Brand h.m.br...@xs4all.nl -H. Peter Anvin h...@bonde.sc.orionmulti.com -H. Peter Anvin h...@tazenda.sc.orionmulti.com -H. Peter Anvin h...@trantor.hos.anvin.org Horst H. von Brand vonbr...@inf.utfsm.cl +H. Peter Anvin h...@zytor.com h...@bonde.sc.orionmulti.com +H. Peter Anvin h...@zytor.com h...@smyrno.hos.anvin.org +H. Peter Anvin h...@zytor.com h...@tazenda.sc.orionmulti.com +H. Peter Anvin h...@zytor.com h...@trantor.hos.anvin.org İsmail Dönmez ism...@pardus.org.tr Jakub Narębski jna...@gmail.com -Jay Soffian jaysoffian+...@gmail.com +Jay Soffian jaysoff...@gmail.com jaysoffian+...@gmail.com +J. Bruce Fields bfie...@citi.umich.edu bfie...@fieldses.org +J. Bruce Fields bfie...@citi.umich.edu bfie...@pig.linuxdev.us.dell.com +J. Bruce Fields bfie...@citi.umich.edu bfie...@puzzle.fieldses.org Jeff King p...@peff.net p...@github.com Joachim Berdal Haga cjh...@fys.uio.no +Johannes Schindelin johannes.schinde...@gmx.de johannes.schinde...@gmx.de
Re: [PATCH] t0008: avoid SIGPIPE race condition on fifo
On Jul 12, 2013, at 6:35 AM, Jeff King p...@peff.net wrote: Subject: [PATCH] t0008: avoid SIGPIPE race condition on fifo Was able to complete a prove run with this patch. Many thanks. ~~ Brian -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 7/7] push: document --lockref
Am 12.07.2013 00:14, schrieb Junio C Hamano: Johannes Sixt j...@kdbg.org writes: Again: Why not just define +refspec as the way to achieve this check? What justification do we have to break existing people's configuration that says something like: [remote ko] url = kernel.org:/pub/scm/git/git.git push = master push = next push = +pu push = maint by adding a _new_ requirement they may not be able to satisify? Notice that the above is a typical push only publishing point, where you do not need any remote tracking branches. Why would it break? When you do not specify --lockref, there is no change whatsoever. To achieve any safety at all for these push-only refs, you have to be very explicit by saying --lockref=pu:$myoldpu --lockref=master:$myoldmaster etc, and what is wrong if in this case --lockref semantics are applied, but only pu is allowed to be no-ff? I am not opposed if your proposal were to introduce a new syntax element that calls for this new feature, e.g. [remote ko] url = kernel.org:/pub/scm/git/git.git push = *pu fetch = +refs/heads/*:refs/remotes/ko/* but changing what + means to something new will simply not fly. I still do not see why we need two different kinds of ways to spell the same strong kind of override (--force and +refspec) under the presence of --lockref, and why we need a third one (--allow-no-ff) to give a weaker kind of override (that makes sense only when --lockref was given in the first place). -- Hannes -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/7] cat-file --batch-check performance improvements
Jeff King p...@peff.net writes: The results for running (in linux.git): $ git rev-list --objects --all objects $ git cat-file --batch-check='%(objectsize:disk)' objects /dev/null I can see how these patches are very logical avenue to grab only on-disk footprint for large number of objects, but among the type, payload size and on-disk footprint, I find it highly narrow niche that a real user or script is interested _only_ in on-disk footprint without even worrying about the type of object. ... (though I think the result actually cleans up the sha1_object_info_extended interface a bit, and is worth it). I tend to agree, especially eyeballing the result of 7/7. Thanks. -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v2 01/19] t2104: Don't fail for index versions other than [23]
t2104 currently checks for the exact index version 2 or 3, depending if there is a skip-worktree flag or not. Other index versions do not use extended flags and thus cannot be tested for version changes. Make this test update the index to version 2 at the beginning of the test. Testing the skip-worktree flags for the default index format is still covered by t7011 and t7012. Signed-off-by: Thomas Gummerer t.gumme...@gmail.com --- t/t2104-update-index-skip-worktree.sh | 1 + 1 file changed, 1 insertion(+) diff --git a/t/t2104-update-index-skip-worktree.sh b/t/t2104-update-index-skip-worktree.sh index 1d0879b..bd9644f 100755 --- a/t/t2104-update-index-skip-worktree.sh +++ b/t/t2104-update-index-skip-worktree.sh @@ -22,6 +22,7 @@ H sub/2 EOF test_expect_success 'setup' ' + git update-index --index-version=2 mkdir sub touch ./1 ./2 sub/1 sub/2 git add 1 2 sub/1 sub/2 -- 1.8.3.453.g1dfc63d -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v2 04/19] read-cache: Re-read index if index file changed
Add the possibility of re-reading the index file, if it changed while reading. The index file might change during the read, causing outdated information to be displayed. We check if the index file changed by using its stat data as heuristic. Helped-by: Ramsay Jones ram...@ramsay1.demon.co.uk Signed-off-by: Thomas Gummerer t.gumme...@gmail.com --- read-cache.c | 91 +--- 1 file changed, 57 insertions(+), 34 deletions(-) diff --git a/read-cache.c b/read-cache.c index 1e7ffc2..3e3a0e2 100644 --- a/read-cache.c +++ b/read-cache.c @@ -1275,11 +1275,31 @@ int read_index(struct index_state *istate) return read_index_from(istate, get_index_file()); } +static int index_changed(struct stat *st_old, struct stat *st_new) +{ + if (st_old-st_mtime != st_new-st_mtime || +#if !defined (__CYGWIN__) + st_old-st_uid != st_new-st_uid || + st_old-st_gid != st_new-st_gid || + st_old-st_ino != st_new-st_ino || +#endif +#if USE_NSEC + ST_MTIME_NSEC(*st_old) != ST_MTIME_NSEC(*st_new) || +#endif +#if USE_STDEV + st_old-st_dev != st_new-st_dev || +#endif + st_old-st_size != st_new-st_size) + return 1; + + return 0; +} + /* remember to discard_cache() before reading a different cache! */ int read_index_from(struct index_state *istate, const char *path) { - int fd; - struct stat st; + int fd, err, i; + struct stat st_old, st_new; struct cache_version_header *hdr; void *mmap; size_t mmap_size; @@ -1291,41 +1311,44 @@ int read_index_from(struct index_state *istate, const char *path) errno = ENOENT; istate-timestamp.sec = 0; istate-timestamp.nsec = 0; + for (i = 0; i 50; i++) { + err = 0; + fd = open(path, O_RDONLY); + if (fd 0) { + if (errno == ENOENT) + return 0; + die_errno(index file open failed); + } - fd = open(path, O_RDONLY); - if (fd 0) { - if (errno == ENOENT) - return 0; - die_errno(index file open failed); + if (fstat(fd, st_old)) + die_errno(cannot stat the open index); + + errno = EINVAL; + mmap_size = xsize_t(st_old.st_size); + mmap = xmmap(NULL, mmap_size, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0); + close(fd); + if (mmap == MAP_FAILED) + die_errno(unable to map index file); + + hdr = mmap; + if (verify_hdr_version(istate, hdr, mmap_size) 0) + err = 1; + + if (istate-ops-verify_hdr(mmap, mmap_size) 0) + err = 1; + + if (istate-ops-read_index(istate, mmap, mmap_size) 0) + err = 1; + istate-timestamp.sec = st_old.st_mtime; + istate-timestamp.nsec = ST_MTIME_NSEC(st_old); + if (lstat(path, st_new)) + die_errno(cannot stat the open index); + + munmap(mmap, mmap_size); + if (!index_changed(st_old, st_new) !err) + return istate-cache_nr; } - if (fstat(fd, st)) - die_errno(cannot stat the open index); - - errno = EINVAL; - mmap_size = xsize_t(st.st_size); - mmap = xmmap(NULL, mmap_size, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0); - close(fd); - if (mmap == MAP_FAILED) - die_errno(unable to map index file); - - hdr = mmap; - if (verify_hdr_version(istate, hdr, mmap_size) 0) - goto unmap; - - if (istate-ops-verify_hdr(mmap, mmap_size) 0) - goto unmap; - - if (istate-ops-read_index(istate, mmap, mmap_size) 0) - goto unmap; - istate-timestamp.sec = st.st_mtime; - istate-timestamp.nsec = ST_MTIME_NSEC(st); - - munmap(mmap, mmap_size); - return istate-cache_nr; - -unmap: - munmap(mmap, mmap_size); die(index file corrupt); } -- 1.8.3.453.g1dfc63d -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v2 03/19] read-cache: move index v2 specific functions to their own file
Move index version 2 specific functions to their own file. The non-index specific functions will be in read-cache.c, while the index version 2 specific functions will be in read-cache-v2.c. Helped-by: Nguyen Thai Ngoc Duy pclo...@gmail.com Signed-off-by: Thomas Gummerer t.gumme...@gmail.com --- Makefile | 2 + cache.h | 16 +- read-cache-v2.c | 556 + read-cache.c | 575 --- read-cache.h | 57 + test-index-version.c | 5 + 6 files changed, 661 insertions(+), 550 deletions(-) create mode 100644 read-cache-v2.c create mode 100644 read-cache.h diff --git a/Makefile b/Makefile index 5a68fe5..73369ae 100644 --- a/Makefile +++ b/Makefile @@ -711,6 +711,7 @@ LIB_H += progress.h LIB_H += prompt.h LIB_H += quote.h LIB_H += reachable.h +LIB_H += read-cache.h LIB_H += reflog-walk.h LIB_H += refs.h LIB_H += remote.h @@ -854,6 +855,7 @@ LIB_OBJS += prompt.o LIB_OBJS += quote.o LIB_OBJS += reachable.o LIB_OBJS += read-cache.o +LIB_OBJS += read-cache-v2.o LIB_OBJS += reflog-walk.o LIB_OBJS += refs.o LIB_OBJS += remote.o diff --git a/cache.h b/cache.h index 7af853b..5082b34 100644 --- a/cache.h +++ b/cache.h @@ -95,19 +95,8 @@ unsigned long git_deflate_bound(git_zstream *, unsigned long); */ #define DEFAULT_GIT_PORT 9418 -/* - * Basic data structures for the directory cache - */ #define CACHE_SIGNATURE 0x44495243 /* DIRC */ -struct cache_version_header { - unsigned int hdr_signature; - unsigned int hdr_version; -}; - -struct cache_header { - unsigned int hdr_entries; -}; #define INDEX_FORMAT_LB 2 #define INDEX_FORMAT_UB 4 @@ -280,6 +269,7 @@ struct index_state { initialized : 1; struct hash_table name_hash; struct hash_table dir_hash; + struct index_ops *ops; }; extern struct index_state the_index; @@ -489,8 +479,8 @@ extern void *read_blob_data_from_index(struct index_state *, const char *, unsig #define CE_MATCH_RACY_IS_DIRTY 02 /* do stat comparison even if CE_SKIP_WORKTREE is true */ #define CE_MATCH_IGNORE_SKIP_WORKTREE 04 -extern int ie_match_stat(const struct index_state *, const struct cache_entry *, struct stat *, unsigned int); -extern int ie_modified(const struct index_state *, const struct cache_entry *, struct stat *, unsigned int); +extern int ie_match_stat(struct index_state *, const struct cache_entry *, struct stat *, unsigned int); +extern int ie_modified(struct index_state *, const struct cache_entry *, struct stat *, unsigned int); #define PATHSPEC_ONESTAR 1 /* the pathspec pattern sastisfies GFNM_ONESTAR */ diff --git a/read-cache-v2.c b/read-cache-v2.c new file mode 100644 index 000..a6883c3 --- /dev/null +++ b/read-cache-v2.c @@ -0,0 +1,556 @@ +#include cache.h +#include read-cache.h +#include resolve-undo.h +#include cache-tree.h +#include varint.h + +/* Mask for the name length in ce_flags in the on-disk index */ +#define CE_NAMEMASK (0x0fff) + +struct cache_header { + unsigned int hdr_entries; +}; + +/* + * Index File I/O + */ + +/* + * dev/ino/uid/gid/size are also just tracked to the low 32 bits + * Again - this is just a (very strong in practice) heuristic that + * the inode hasn't changed. + * + * We save the fields in big-endian order to allow using the + * index file over NFS transparently. + */ +struct ondisk_cache_entry { + struct cache_time ctime; + struct cache_time mtime; + unsigned int dev; + unsigned int ino; + unsigned int mode; + unsigned int uid; + unsigned int gid; + unsigned int size; + unsigned char sha1[20]; + unsigned short flags; + char name[FLEX_ARRAY]; /* more */ +}; + +/* + * This struct is used when CE_EXTENDED bit is 1 + * The struct must match ondisk_cache_entry exactly from + * ctime till flags + */ +struct ondisk_cache_entry_extended { + struct cache_time ctime; + struct cache_time mtime; + unsigned int dev; + unsigned int ino; + unsigned int mode; + unsigned int uid; + unsigned int gid; + unsigned int size; + unsigned char sha1[20]; + unsigned short flags; + unsigned short flags2; + char name[FLEX_ARRAY]; /* more */ +}; + +/* These are only used for v3 or lower */ +#define align_flex_name(STRUCT,len) ((offsetof(struct STRUCT,name) + (len) + 8) ~7) +#define ondisk_cache_entry_size(len) align_flex_name(ondisk_cache_entry,len) +#define ondisk_cache_entry_extended_size(len) align_flex_name(ondisk_cache_entry_extended,len) +#define ondisk_ce_size(ce) (((ce)-ce_flags CE_EXTENDED) ? \ + ondisk_cache_entry_extended_size(ce_namelen(ce)) : \ +
[PATCH v2 05/19] Add documentation for the index api
Add documentation for the index reading api. This also includes documentation for the new api functions introduced in the next patch. Helped-by: Nguyễn Thái Ngọc Duy pclo...@gmail.com Signed-off-by: Thomas Gummerer t.gumme...@gmail.com --- Documentation/technical/api-in-core-index.txt | 54 +-- 1 file changed, 50 insertions(+), 4 deletions(-) diff --git a/Documentation/technical/api-in-core-index.txt b/Documentation/technical/api-in-core-index.txt index adbdbf5..9b8c37c 100644 --- a/Documentation/technical/api-in-core-index.txt +++ b/Documentation/technical/api-in-core-index.txt @@ -1,14 +1,60 @@ in-core index API = +Reading API +--- + +`cache`:: + + An array of cache entries. This is used to access the cache + entries directly. Use `index_name_pos` to search for the + index of a specific cache entry. + +`read_index_filtered`:: + + Read a part of the index, filtered by the pathspec given in + the opts. The function may load more than necessary, so the + caller still responsible to apply filters appropriately. The + filtering is only done for performance reasons, as it's + possible to only read part of the index when the on-disk + format is index-v5. + + To iterate only over the entries that match the pathspec, use + the for_each_index_entry function. + +`read_index`:: + + Read the whole index file from disk. + +`index_name_pos`:: + + Find a cache_entry with name in the index. Returns pos if an + entry is matched exactly and -1-pos if an entry is matched + partially. + e.g. + index: + file1 + file2 + path/file1 + zzz + + index_name_pos(path/file1, 10) returns 2, while + index_name_pos(path, 4) returns -3 + +`for_each_index_entry`:: + + Iterates over all cache_entries in the index filtered by + filter_opts in the index_state. For each cache entry fn is + executed with cb_data as callback data. From within the loop + do `return 0` to continue, or `return 1` to break the loop. + +TODO + Talk about read-cache.c and cache-tree.c, things like: -* cache - the_index macros -* read_index() * write_index() * ie_match_stat() and ie_modified(); how they are different and when to use which. -* index_name_pos() * remove_index_entry_at() * remove_file_from_index() * add_file_to_index() @@ -18,4 +64,4 @@ Talk about read-cache.c and cache-tree.c, things like: * cache_tree_invalidate_path() * cache_tree_update() -(JC, Linus) +(JC, Linus, Thomas Gummerer) -- 1.8.3.453.g1dfc63d -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v2 06/19] read-cache: add index reading api
Add an api for access to the index file. Currently there is only a very basic api for accessing the index file, which only allows a full read of the index, and lets the users of the data filter it. The new index api gives the users the possibility to use only part of the index and provides functions for iterating over and accessing cache entries. This simplifies future improvements to the in-memory format, as changes will be concentrated on one file, instead of the whole git source code. Helped-by: Nguyễn Thái Ngọc Duy pclo...@gmail.com Signed-off-by: Thomas Gummerer t.gumme...@gmail.com --- cache.h | 42 +- read-cache-v2.c | 35 +-- read-cache.c| 44 read-cache.h| 8 +++- 4 files changed, 121 insertions(+), 8 deletions(-) diff --git a/cache.h b/cache.h index 5082b34..d305d21 100644 --- a/cache.h +++ b/cache.h @@ -127,7 +127,7 @@ struct cache_entry { unsigned int ce_flags; unsigned int ce_namelen; unsigned char sha1[20]; - struct cache_entry *next; + struct cache_entry *next; /* used by name_hash */ char name[FLEX_ARRAY]; /* more */ }; @@ -258,6 +258,29 @@ static inline unsigned int canon_mode(unsigned int mode) #define cache_entry_size(len) (offsetof(struct cache_entry,name) + (len) + 1) +/* + * Options by which the index should be filtered when read partially. + * + * pathspec: The pathspec which the index entries have to match + * seen: Used to return the seen parameter from match_pathspec() + * max_prefix_len: The common prefix length of the pathspecs + * + * read_staged: used to indicate if the conflicted entries (entries + * with a stage) should be included + * read_cache_tree: used to indicate if the cache-tree should be read + * read_resolve_undo: used to indicate if the resolve undo data should + * be read + */ +struct filter_opts { + const struct pathspec *pathspec; + char *seen; + int max_prefix_len; + + int read_staged; + int read_cache_tree; + int read_resolve_undo; +}; + struct index_state { struct cache_entry **cache; unsigned int version; @@ -270,6 +293,8 @@ struct index_state { struct hash_table name_hash; struct hash_table dir_hash; struct index_ops *ops; + struct internal_ops *internal_ops; + struct filter_opts *filter_opts; }; extern struct index_state the_index; @@ -311,6 +336,12 @@ extern void free_name_hash(struct index_state *istate); #define unmerge_cache_entry_at(at) unmerge_index_entry_at(the_index, at) #define unmerge_cache(pathspec) unmerge_index(the_index, pathspec) #define read_blob_data_from_cache(path, sz) read_blob_data_from_index(the_index, (path), (sz)) + +/* index api */ +#define read_cache_filtered(opts) read_index_filtered(the_index, (opts)) +#define read_cache_filtered_from(path, opts) read_index_filtered_from(the_index, (path), (opts)) +#define for_each_cache_entry(fn, cb_data) \ + for_each_index_entry(the_index, (fn), (cb_data)) #endif enum object_type { @@ -438,6 +469,15 @@ extern int init_db(const char *template_dir, unsigned int flags); } \ } while (0) +/* index api */ +extern int read_index_filtered(struct index_state *, struct filter_opts *opts); +extern int read_index_filtered_from(struct index_state *, const char *path, struct filter_opts *opts); + +typedef int each_cache_entry_fn(struct cache_entry *ce, void *); +extern int for_each_index_entry(struct index_state *istate, + each_cache_entry_fn, void *); + + /* Initialize and use the cache information */ extern int read_index(struct index_state *); extern int read_index_preload(struct index_state *, const char **pathspec); diff --git a/read-cache-v2.c b/read-cache-v2.c index a6883c3..51b618f 100644 --- a/read-cache-v2.c +++ b/read-cache-v2.c @@ -3,6 +3,7 @@ #include resolve-undo.h #include cache-tree.h #include varint.h +#include dir.h /* Mask for the name length in ce_flags in the on-disk index */ #define CE_NAMEMASK (0x0fff) @@ -207,8 +208,14 @@ static int read_index_extension(struct index_state *istate, return 0; } +/* + * The performance is the same if we read the whole index or only + * part of it, therefore we always read the whole index to avoid + * having to re-read it later. The filter_opts will determine + * what part of the index is used when retrieving the cache-entries. + */ static int read_index_v2(struct index_state *istate, void *mmap, -unsigned long mmap_size) +unsigned long mmap_size, struct filter_opts *opts) { int i; unsigned long src_offset; @@ -238,7 +245,6 @@ static int read_index_v2(struct index_state *istate, void *mmap, disk_ce = (struct ondisk_cache_entry *)((char *)mmap + src_offset); ce =
[PATCH v2 07/19] make sure partially read index is not changed
A partially read index file currently cannot be written to disk. Make sure that never happens, by erroring out when a caller tries to change a partially read index. The caller is responsible for reading the whole index when it's trying to change it later. Forcing the caller to load the right part of the index file instead of re-reading it when changing it, gives a bit of a performance advantage, by avoiding to read parts of the index twice. Signed-off-by: Thomas Gummerer t.gumme...@gmail.com --- builtin/update-index.c | 4 cache.h| 1 + read-cache-v2.c| 2 ++ read-cache.c | 10 ++ 4 files changed, 17 insertions(+) diff --git a/builtin/update-index.c b/builtin/update-index.c index 5c7762e..4c6e3a6 100644 --- a/builtin/update-index.c +++ b/builtin/update-index.c @@ -49,6 +49,8 @@ static int mark_ce_flags(const char *path, int flag, int mark) int namelen = strlen(path); int pos = cache_name_pos(path, namelen); if (0 = pos) { + if (active_cache_partially_read) + die(BUG: Can't change a partially read index); if (mark) active_cache[pos]-ce_flags |= flag; else @@ -253,6 +255,8 @@ static void chmod_path(int flip, const char *path) pos = cache_name_pos(path, strlen(path)); if (pos 0) goto fail; + if (active_cache_partially_read) + die(BUG: Can't change a partially read index); ce = active_cache[pos]; mode = ce-ce_mode; if (!S_ISREG(mode)) diff --git a/cache.h b/cache.h index d305d21..455b772 100644 --- a/cache.h +++ b/cache.h @@ -311,6 +311,7 @@ extern void free_name_hash(struct index_state *istate); #define active_alloc (the_index.cache_alloc) #define active_cache_changed (the_index.cache_changed) #define active_cache_tree (the_index.cache_tree) +#define active_cache_partially_read (the_index.filter_opts) #define read_cache() read_index(the_index) #define read_cache_from(path) read_index_from(the_index, (path)) diff --git a/read-cache-v2.c b/read-cache-v2.c index 51b618f..f3c0685 100644 --- a/read-cache-v2.c +++ b/read-cache-v2.c @@ -479,6 +479,8 @@ static int write_index_v2(struct index_state *istate, int newfd) struct stat st; struct strbuf previous_name_buf = STRBUF_INIT, *previous_name; + if (istate-filter_opts) + die(BUG: index: cannot write a partially read index); for (i = removed = extended = 0; i entries; i++) { if (cache[i]-ce_flags CE_REMOVE) removed++; diff --git a/read-cache.c b/read-cache.c index 9053d43..ab716ed 100644 --- a/read-cache.c +++ b/read-cache.c @@ -30,6 +30,8 @@ static void replace_index_entry(struct index_state *istate, int nr, struct cache { struct cache_entry *old = istate-cache[nr]; + if (istate-filter_opts) + die(BUG: Can't change a partially read index); remove_name_hash(istate, old); set_index_entry(istate, nr, ce); istate-cache_changed = 1; @@ -467,6 +469,8 @@ int remove_index_entry_at(struct index_state *istate, int pos) { struct cache_entry *ce = istate-cache[pos]; + if (istate-filter_opts) + die(BUG: Can't change a partially read index); record_resolve_undo(istate, ce); remove_name_hash(istate, ce); istate-cache_changed = 1; @@ -973,6 +977,8 @@ int add_index_entry(struct index_state *istate, struct cache_entry *ce, int opti { int pos; + if (istate-filter_opts) + die(BUG: Can't change a partially read index); if (option ADD_CACHE_JUST_APPEND) pos = istate-cache_nr; else { @@ -1173,6 +1179,8 @@ int refresh_index(struct index_state *istate, unsigned int flags, const char **p /* If we are doing --really-refresh that * means the index is not valid anymore. */ + if (istate-filter_opts) + die(BUG: Can't change a partially read index); ce-ce_flags = ~CE_VALID; istate-cache_changed = 1; } @@ -1331,6 +1339,8 @@ int read_index_filtered_from(struct index_state *istate, const char *path, void *mmap; size_t mmap_size; + if (istate-filter_opts) + die(BUG: Can't re-read partially read index); errno = EBUSY; if (istate-initialized) return istate-cache_nr; -- 1.8.3.453.g1dfc63d -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v2 08/19] grep.c: Use index api
Signed-off-by: Thomas Gummerer t.gumme...@gmail.com --- builtin/grep.c | 71 ++ 1 file changed, 37 insertions(+), 34 deletions(-) diff --git a/builtin/grep.c b/builtin/grep.c index a419cda..8b02644 100644 --- a/builtin/grep.c +++ b/builtin/grep.c @@ -368,41 +368,33 @@ static void run_pager(struct grep_opt *opt, const char *prefix) free(argv); } -static int grep_cache(struct grep_opt *opt, const struct pathspec *pathspec, int cached) +struct grep_opts { + struct grep_opt *opt; + const struct pathspec *pathspec; + int cached; + int hit; +}; + +static int grep_cache(struct cache_entry *ce, void *cb_data) { - int hit = 0; - int nr; - read_cache(); + struct grep_opts *opts = cb_data; - for (nr = 0; nr active_nr; nr++) { - struct cache_entry *ce = active_cache[nr]; - if (!S_ISREG(ce-ce_mode)) - continue; - if (!match_pathspec_depth(pathspec, ce-name, ce_namelen(ce), 0, NULL)) - continue; - /* -* If CE_VALID is on, we assume worktree file and its cache entry -* are identical, even if worktree file has been modified, so use -* cache version instead -*/ - if (cached || (ce-ce_flags CE_VALID) || ce_skip_worktree(ce)) { - if (ce_stage(ce)) - continue; - hit |= grep_sha1(opt, ce-sha1, ce-name, 0, ce-name); - } - else - hit |= grep_file(opt, ce-name); - if (ce_stage(ce)) { - do { - nr++; - } while (nr active_nr -!strcmp(ce-name, active_cache[nr]-name)); - nr--; /* compensate for loop control */ - } - if (hit opt-status_only) - break; - } - return hit; + if (!S_ISREG(ce-ce_mode)) + return 0; + if (!match_pathspec_depth(opts-pathspec, ce-name, ce_namelen(ce), 0, NULL)) + return 0; + /* +* If CE_VALID is on, we assume worktree file and its cache entry +* are identical, even if worktree file has been modified, so use +* cache version instead +*/ + if (opts-cached || (ce-ce_flags CE_VALID) || ce_skip_worktree(ce)) + opts-hit |= grep_sha1(opts-opt, ce-sha1, ce-name, 0, ce-name); + else + opts-hit |= grep_file(opts-opt, ce-name); + if (opts-hit opts-opt-status_only) + return 1; + return 0; } static int grep_tree(struct grep_opt *opt, const struct pathspec *pathspec, @@ -895,10 +887,21 @@ int cmd_grep(int argc, const char **argv, const char *prefix) } else if (0 = opt_exclude) { die(_(--[no-]exclude-standard cannot be used for tracked contents.)); } else if (!list.nr) { + struct grep_opts opts; + struct filter_opts *filter_opts = xmalloc(sizeof(*filter_opts)); + if (!cached) setup_work_tree(); - hit = grep_cache(opt, pathspec, cached); + memset(filter_opts, 0, sizeof(*filter_opts)); + filter_opts-pathspec = pathspec; + opts.opt = opt; + opts.pathspec = pathspec; + opts.cached = cached; + opts.hit = 0; + read_cache_filtered(filter_opts); + for_each_cache_entry(grep_cache, opts); + hit = opts.hit; } else { if (cached) die(_(both --cached and trees are given.)); -- 1.8.3.453.g1dfc63d -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v2 09/19] ls-files.c: use index api
Use the index api to read only part of the index, if the on-disk version of the index is index-v5. Signed-off-by: Thomas Gummerer t.gumme...@gmail.com --- builtin/ls-files.c | 31 --- 1 file changed, 20 insertions(+), 11 deletions(-) diff --git a/builtin/ls-files.c b/builtin/ls-files.c index 08d9786..80cc398 100644 --- a/builtin/ls-files.c +++ b/builtin/ls-files.c @@ -31,6 +31,7 @@ static const char *prefix; static int max_prefix_len; static int prefix_len; static const char **pathspec; +static struct pathspec pathspec_struct; static int error_unmatch; static char *ps_matched; static const char *with_tree; @@ -457,6 +458,7 @@ int cmd_ls_files(int argc, const char **argv, const char *cmd_prefix) struct dir_struct dir; struct exclude_list *el; struct string_list exclude_list = STRING_LIST_INIT_NODUP; + struct filter_opts *opts = xmalloc(sizeof(*opts)); struct option builtin_ls_files_options[] = { { OPTION_CALLBACK, 'z', NULL, NULL, NULL, N_(paths are separated with NUL character), @@ -522,9 +524,6 @@ int cmd_ls_files(int argc, const char **argv, const char *cmd_prefix) prefix_len = strlen(prefix); git_config(git_default_config, NULL); - if (read_cache() 0) - die(index file corrupt); - argc = parse_options(argc, argv, prefix, builtin_ls_files_options, ls_files_usage, 0); el = add_exclude_list(dir, EXC_CMDL, --exclude option); @@ -556,14 +555,7 @@ int cmd_ls_files(int argc, const char **argv, const char *cmd_prefix) setup_work_tree(); pathspec = get_pathspec(prefix, argv); - - /* be nice with submodule paths ending in a slash */ - if (pathspec) - strip_trailing_slash_from_submodules(); - - /* Find common prefix for all pathspec's */ - max_prefix = common_prefix(pathspec); - max_prefix_len = max_prefix ? strlen(max_prefix) : 0; + init_pathspec(pathspec_struct, pathspec); /* Treat unmatching pathspec elements as errors */ if (pathspec error_unmatch) { @@ -573,6 +565,23 @@ int cmd_ls_files(int argc, const char **argv, const char *cmd_prefix) ps_matched = xcalloc(1, num); } + if (!with_tree) { + memset(opts, 0, sizeof(*opts)); + opts-pathspec = pathspec_struct; + opts-read_staged = 1; + if (show_resolve_undo) + opts-read_resolve_undo = 1; + read_cache_filtered(opts); + } else { + read_cache(); + } + /* be nice with submodule paths ending in a slash */ + if (pathspec) + strip_trailing_slash_from_submodules(); + + max_prefix = common_prefix(pathspec); + max_prefix_len = max_prefix ? strlen(max_prefix) : 0; + if ((dir.flags DIR_SHOW_IGNORED) !exc_given) die(ls-files --ignored needs some exclude pattern); -- 1.8.3.453.g1dfc63d -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/2] open() error checking
Thomas Rast tr...@inf.ethz.ch writes: #1 is Dale's suggested change. Dale, to include it we'd need your Signed-off-by as per Documentation/SubmittingPatches. #2 is a similar error-checking fix; I reviewed 'git grep \bopen\b' and found one case where the return value was obviously not tested. The corresponding Windows code path has the same problem, but I dare not touch it; perhaps someone from the Windows side can look into it? I originally had a four-patch series to open 0/1/2 from /dev/null, but then I noticed that this was shot down in 2008: http://thread.gmane.org/gmane.comp.version-control.git/93605/focus=93896 The way I recall the thread was not shot down but more like fizzled out without seeing a clear consensus. As a normal POSIX program, we do rely on fd#2 connected to an error stream, and I do agree with the general sentiment of that old thread that it is very wrong for warning() or die() to write to a pipe or file descriptor we opened for some other purpose, corrupting the destination. I briefly wondered if we can do the sanity check lazily (e.g. upon first warning() see of fd#2 is open and otherwise die silently), but we may open a fd (e.g. to create a new loose object) that may happen to grab fd#2 and then it is too late for us to do anything about it, so... Do you want to resurrect this? The worst part about it is that because we don't have a stderr to rely on, we can't simply die(stop playing mind games). Right. -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] .mailmap: Map email addresses to names
The same kind of cleanup as sent earlier today (2e2ae79df4fabea0157c5eb527b5396eb89185a1 locally here) I asked all the people before, whether they like their lines added. Many had requests to change the order of the mail address. When having this patch applied, you'll notice the bug as described here http://marc.info/?l=gitm=137364524514927w=2 http://www.mail-archive.com/git@vger.kernel.org/msg31964.html (Bug in .mailmap handling?, for example look for Knut Franke) Signed-off-by: Stefan Beller stefanbel...@googlemail.com --- .mailmap | 28 ++-- 1 file changed, 26 insertions(+), 2 deletions(-) diff --git a/.mailmap b/.mailmap index 1179767..1d6ba17 100644 --- a/.mailmap +++ b/.mailmap @@ -7,6 +7,7 @@ Alejandro R. Sedeño ased...@mit.edu ased...@mit.edu Alexander Gavrilov angavri...@gmail.com +Alexey Shumkin alex.crez...@gmail.com zap...@mail.ru Alex Bennée kernel-hac...@bennee.com Alex Riesen raa.l...@gmail.com fo...@t-online.de Alex Riesen raa.l...@gmail.com raa@limbo.localdomain @@ -18,12 +19,15 @@ anonymous li...@horizon.com anonymous li...@horizon.net Brandon Casey draf...@gmail.com ca...@nrlssc.navy.mil Brian M. Carlson sand...@crustytoothpaste.ath.cx +Bryan Larsen br...@larsen.st bryan.lar...@gmail.com +Bryan Larsen br...@larsen.st bryanlar...@yahoo.com Cheng Renquan crq...@gmail.com Chris Shoemaker c.shoema...@cox.net Dana L. How dana...@gmail.com Dana L. How h...@deathvalley.cswitch.com Daniel Barkalow barka...@iabervon.org Dan Johnson computerdr...@gmail.com +David Brown g...@davidb.org dav...@quicinc.com David D. Kilzer ddkil...@kilzer.net David Kågedal dav...@lysator.liu.se David Reiss dre...@facebook.com dreiss@dreiss-vmware.(none) @@ -31,7 +35,10 @@ David S. Miller da...@davemloft.net Deskin Miller desk...@umich.edu Dirk Süsserott newslet...@dirk.my1.cc Eric S. Raymond e...@thyrsus.com +Eric Blake ebl...@redhat.com e...@byu.net +Eric Hanchrow eric.hanch...@gmail.com off...@blarg.net Erik Faye-Lund kusmab...@gmail.com kusmab...@googlemail.com +Eyvind Bernhardsen eyvind.bernhard...@gmail.com eyvind-...@orakel.ntnu.no Florian Achleitner florian.achleitner.2.6...@gmail.com florian.achleitner2.6...@gmail.com Franck Bui-Huu vagabon@gmail.com fbui...@gmail.com Frank Lichtenheld fr...@lichtenheld.de dj...@debian.org @@ -47,19 +54,25 @@ H. Peter Anvin h...@zytor.com h...@tazenda.sc.orionmulti.com H. Peter Anvin h...@zytor.com h...@trantor.hos.anvin.org İsmail Dönmez ism...@pardus.org.tr Jakub Narębski jna...@gmail.com +Jason Riedy e...@eecs.berkeley.edu e...@cs.berkeley.edu +Jason Riedy e...@eecs.berkeley.edu e...@eecs.berkeley.edu Jay Soffian jaysoff...@gmail.com jaysoffian+...@gmail.com J. Bruce Fields bfie...@citi.umich.edu bfie...@fieldses.org J. Bruce Fields bfie...@citi.umich.edu bfie...@pig.linuxdev.us.dell.com J. Bruce Fields bfie...@citi.umich.edu bfie...@puzzle.fieldses.org Jeff King p...@peff.net p...@github.com +Jeff Muizelaar jmuizel...@mozilla.com j...@infidigm.net Joachim Berdal Haga cjh...@fys.uio.no Johannes Schindelin johannes.schinde...@gmx.de johannes.schinde...@gmx.de Johannes Sixt j...@kdbg.org johannes.s...@telecom.at Johannes Sixt j...@kdbg.org j.s...@eudaptics.com Johannes Sixt j...@kdbg.org j.s...@viscovery.net +Jonathan del Strother jon.delstrot...@bestbefore.tv maill...@steelskies.com Jonathan Nieder jrnie...@gmail.com jrnie...@uchicago.edu Jon Loeliger j...@freescale.com -Jon Seymour j...@blackcubes.dyndns.org +Jon Seymour jon.seym...@gmail.com j...@blackcubes.dyndns.org +Josh Triplett j...@joshtriplett.org j...@freedesktop.org +Josh Triplett j...@joshtriplett.org jo...@us.ibm.com Junio C Hamano gits...@pobox.com gits...@pobox.com Junio C Hamano gits...@pobox.com ju...@hera.kernel.org Junio C Hamano gits...@pobox.com ju...@kernel.org @@ -71,11 +84,14 @@ Karl Wiberg k...@treskal.com Karl Hasselström k...@treskal.com Karl Wiberg k...@treskal.com Karl Hasselström k...@yoghurt.hemma.treskal.com Kay Sievers kay.siev...@vrfy.org kay@mam.(none) Kay Sievers kay.siev...@vrfy.org kay.siev...@suse.de +Karsten Blees bl...@dcon.de karsten.bl...@dcon.de +Karsten Blees bl...@dcon.de karsten.bl...@gmail.com Keith Cascio ke...@cs.ucla.edu ke...@cs.ucla.edu Kent Engstrom k...@lysator.liu.se Kevin Leung kevin...@gmail.com Kirill Smelkov k...@navytux.spb.ru k...@landau.phys.spbu.ru Kirill Smelkov k...@navytux.spb.ru k...@mns.spb.ru +Knut Franke knut.fra...@gmx.de k.fra...@science-computing.de Lars Doelle lars.doelle@on-line ! de Lars Doelle lars.doe...@on-line.de Li Hong leeh...@pku.edu.cn @@ -85,11 +101,14 @@ Linus Torvalds torva...@linux-foundation.org torva...@osdl.org Linus Torvalds torva...@linux-foundation.org torva...@ppc970.osdl.org Linus Torvalds torva...@linux-foundation.org torva...@ppc970.osdl.org.(none) Linus Torvalds torva...@linux-foundation.org torva...@woody.linux-foundation.org -Lukas Sandström luk...@etek.chalmers.se +Lukas Sandström luk...@gmail.com luk...@etek.chalmers.se +Matt
[PATCH v2 11/19] read-cache: make in-memory format aware of stat_crc
Make the in-memory format aware of the stat_crc used by index-v5. It is simply ignored by index version prior to v5. Signed-off-by: Thomas Gummerer t.gumme...@gmail.com --- cache.h | 1 + read-cache.c | 25 + 2 files changed, 26 insertions(+) diff --git a/cache.h b/cache.h index 455b772..2097105 100644 --- a/cache.h +++ b/cache.h @@ -127,6 +127,7 @@ struct cache_entry { unsigned int ce_flags; unsigned int ce_namelen; unsigned char sha1[20]; + uint32_t ce_stat_crc; struct cache_entry *next; /* used by name_hash */ char name[FLEX_ARRAY]; /* more */ }; diff --git a/read-cache.c b/read-cache.c index ab716ed..9bfbb4f 100644 --- a/read-cache.c +++ b/read-cache.c @@ -108,6 +108,29 @@ int match_stat_data(const struct stat_data *sd, struct stat *st) return changed; } +static uint32_t calculate_stat_crc(struct cache_entry *ce) +{ + unsigned int ctimens = 0; + uint32_t stat, stat_crc; + + stat = htonl(ce-ce_stat_data.sd_ctime.sec); + stat_crc = crc32(0, (Bytef*)stat, 4); +#ifdef USE_NSEC + ctimens = ce-ce_stat_data.sd_ctime.nsec; +#endif + stat = htonl(ctimens); + stat_crc = crc32(stat_crc, (Bytef*)stat, 4); + stat = htonl(ce-ce_stat_data.sd_ino); + stat_crc = crc32(stat_crc, (Bytef*)stat, 4); + stat = htonl(ce-ce_stat_data.sd_dev); + stat_crc = crc32(stat_crc, (Bytef*)stat, 4); + stat = htonl(ce-ce_stat_data.sd_uid); + stat_crc = crc32(stat_crc, (Bytef*)stat, 4); + stat = htonl(ce-ce_stat_data.sd_gid); + stat_crc = crc32(stat_crc, (Bytef*)stat, 4); + return stat_crc; +} + /* * This only updates the non-critical parts of the directory * cache, ie the parts that aren't tracked by GIT, and only used @@ -122,6 +145,8 @@ void fill_stat_cache_info(struct cache_entry *ce, struct stat *st) if (S_ISREG(st-st_mode)) ce_mark_uptodate(ce); + + ce-ce_stat_crc = calculate_stat_crc(ce); } static int ce_compare_data(const struct cache_entry *ce, struct stat *st) -- 1.8.3.453.g1dfc63d -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v2 10/19] documentation: add documentation of the index-v5 file format
Add a documentation of the index file format version 5 to Documentation/technical. Helped-by: Michael Haggerty mhag...@alum.mit.edu Helped-by: Junio C Hamano gits...@pobox.com Helped-by: Thomas Rast tr...@student.ethz.ch Helped-by: Nguyen Thai Ngoc Duy pclo...@gmail.com Helped-by: Robin Rosenberg robin.rosenb...@dewire.com Signed-off-by: Thomas Gummerer t.gumme...@gmail.com --- Documentation/technical/index-file-format-v5.txt | 296 +++ 1 file changed, 296 insertions(+) create mode 100644 Documentation/technical/index-file-format-v5.txt diff --git a/Documentation/technical/index-file-format-v5.txt b/Documentation/technical/index-file-format-v5.txt new file mode 100644 index 000..4213087 --- /dev/null +++ b/Documentation/technical/index-file-format-v5.txt @@ -0,0 +1,296 @@ +GIT index format + + +== The git index + + The git index file (.git/index) documents the status of the files + in the git staging area. + + The staging area is used for preparing commits, merging, etc. + +== The git index file format + + All binary numbers are in network byte order. Version 5 is described + here. The index file consists of various sections. They appear in + the following order in the file. + + - header: the description of the index format, including it's signature, + version and various other fields that are used internally. + + - diroffsets (ndir entries of direcotry offset): A 4-byte offset + relative to the beginning of the direntries block (see below) + for each of the ndir directories in the index, sorted by pathname + (of the directory it's pointing to). [1] + + - direntries (ndir entries of directory offset): A directory entry + for each of the ndir directories in the index, sorted by pathname + (see below). [2] + + - fileoffsets (nfile entries of file offset): A 4-byte offset + relative to the beginning of the fileentries block (see below) + for each of the nfile files in the index. [1] + + - fileentries (nfile entries of file entry): A file entry for + each of the nfile files in the index (see below). + + - crdata: A number of entries for conflicted data/resolved conflicts + (see below). + + - Extensions (Currently none, see below in the future) + + Extensions are identified by signature. Optional extensions can + be ignored if GIT does not understand them. + + GIT supports an arbitrary number of extension, but currently none + is implemented. [3] + + extsig (32-bits): extension signature. If the first byte is 'A'..'Z' + the extension is optional and can be ignored. + + extsize (32-bits): size of the extension, excluding the header + (extsig, extsize, extchecksum). + + extchecksum (32-bits): crc32 checksum of the extension signature + and size. + +- Extension data. + +== Header + sig (32-bits): Signature: + The signature is { 'D', 'I', 'R', 'C' } (stands for dircache) + + vnr (32-bits): Version number: + The current supported versions are 2, 3, 4 and 5. + + ndir (32-bits): number of directories in the index. + + nfile (32-bits): number of file entries in the index. + + fblockoffset (32-bits): offset to the file block, relative to the + beginning of the file. + + - Offset to the extensions. + + nextensions (32-bits): number of extensions. + + extoffset (32-bits): offset to the extension. (Possibly none, as + many as indicated in the 4-byte number of extensions) + + headercrc (32-bits): crc checksum including the header and the + offsets to the extensions. + + +== Directory offsets (diroffsets) + + diroffset (32-bits): offset to the directory relative to the beginning +of the index file. There are ndir + 1 offsets in the diroffset table, +the last is pointing to the end of the last direntry. With this last +entry, we are able to replace the strlen of when reading the directory +name, by calculating it from diroffset[n+1]-diroffset[n]-61. 61 is the +size of the directory data, which follows each each directory + the +crc sum + the NUL byte. + + This part is needed for making the directory entries bisectable and +thus allowing a binary search. + +== Directory entry (direntries) + + Directory entries are sorted in lexicographic order by the name +of their path starting with the root. + + pathname (variable length, nul terminated): relative to top level +directory (without the leading slash). '/' is used as path +separator. A string of length 0 ('') indicates the root directory. +The special path components ., and .. (without quotes) are +disallowed. The path also includes a trailing slash. [9] + + foffset (32-bits): offset to the lexicographically first file in +the file offsets (fileoffsets), relative to the beginning of +the fileoffset block. + + cr (32-bits): offset to conflicted/resolved data at the end of the +index. 0 if
[PATCH v2 14/19] read-cache: read cache-tree in index-v5
Since the cache-tree data is saved as part of the directory data, we already read it at the beginning of the index. The cache-tree is only converted from this directory data. The cache-tree data is arranged in a tree, with the children sorted by pathlen at each node, while the ondisk format is sorted lexically. So we have to rebuild this format from the on-disk directory list. Signed-off-by: Thomas Gummerer t.gumme...@gmail.com --- cache-tree.c| 2 +- cache-tree.h| 6 read-cache-v5.c | 100 3 files changed, 107 insertions(+), 1 deletion(-) diff --git a/cache-tree.c b/cache-tree.c index 37e4d00..f4b0917 100644 --- a/cache-tree.c +++ b/cache-tree.c @@ -31,7 +31,7 @@ void cache_tree_free(struct cache_tree **it_p) *it_p = NULL; } -static int subtree_name_cmp(const char *one, int onelen, +int subtree_name_cmp(const char *one, int onelen, const char *two, int twolen) { if (onelen twolen) diff --git a/cache-tree.h b/cache-tree.h index 55d0f59..9aac493 100644 --- a/cache-tree.h +++ b/cache-tree.h @@ -21,10 +21,16 @@ struct cache_tree { struct cache_tree_sub **down; }; +struct directory_queue { + struct directory_queue *down; + struct directory_entry *de; +}; + struct cache_tree *cache_tree(void); void cache_tree_free(struct cache_tree **); void cache_tree_invalidate_path(struct cache_tree *, const char *); struct cache_tree_sub *cache_tree_sub(struct cache_tree *, const char *); +int subtree_name_cmp(const char *, int, const char *, int); void cache_tree_write(struct strbuf *, struct cache_tree *root); struct cache_tree *cache_tree_read(const char *buffer, unsigned long size); diff --git a/read-cache-v5.c b/read-cache-v5.c index 853b97d..0b9c320 100644 --- a/read-cache-v5.c +++ b/read-cache-v5.c @@ -448,6 +448,103 @@ static int read_conflicts(struct conflict_entry **head, return 0; } +static struct cache_tree *convert_one(struct directory_queue *queue, int dirnr) +{ + int i, subtree_nr; + struct cache_tree *it; + struct directory_queue *down; + + it = cache_tree(); + it-entry_count = queue[dirnr].de-de_nentries; + subtree_nr = queue[dirnr].de-de_nsubtrees; + if (0 = it-entry_count) + hashcpy(it-sha1, queue[dirnr].de-sha1); + + /* +* Just a heuristic -- we do not add directories that often but +* we do not want to have to extend it immediately when we do, +* hence +2. +*/ + it-subtree_alloc = subtree_nr + 2; + it-down = xcalloc(it-subtree_alloc, sizeof(struct cache_tree_sub *)); + down = queue[dirnr].down; + for (i = 0; i subtree_nr; i++) { + struct cache_tree *sub; + struct cache_tree_sub *subtree; + char *buf, *name; + + name = ; + buf = strtok(down[i].de-pathname, /); + while (buf) { + name = buf; + buf = strtok(NULL, /); + } + sub = convert_one(down, i); + if(!sub) + goto free_return; + subtree = cache_tree_sub(it, name); + subtree-cache_tree = sub; + } + if (subtree_nr != it-subtree_nr) + die(cache-tree: internal error); + return it; + free_return: + cache_tree_free(it); + return NULL; +} + +static int compare_cache_tree_elements(const void *a, const void *b) +{ + const struct directory_entry *de1, *de2; + + de1 = ((const struct directory_queue *)a)-de; + de2 = ((const struct directory_queue *)b)-de; + return subtree_name_cmp(de1-pathname, de1-de_pathlen, + de2-pathname, de2-de_pathlen); +} + +static struct directory_entry *sort_directories(struct directory_entry *de, + struct directory_queue *queue) +{ + int i, nsubtrees; + + nsubtrees = de-de_nsubtrees; + for (i = 0; i nsubtrees; i++) { + struct directory_entry *new_de; + de = de-next; + new_de = xmalloc(directory_entry_size(de-de_pathlen)); + memcpy(new_de, de, directory_entry_size(de-de_pathlen)); + queue[i].de = new_de; + if (de-de_nsubtrees) { + queue[i].down = xcalloc(de-de_nsubtrees, + sizeof(struct directory_queue)); + de = sort_directories(de, + queue[i].down); + } + } + qsort(queue, nsubtrees, sizeof(struct directory_queue), + compare_cache_tree_elements); + return de; +} + +/* + * This function modifies the directory argument that is given to it. + * Don't use it if the directory entries are still needed after. + */ +static struct
[PATCH v2 13/19] read-cache: read resolve-undo data
Make git read the resolve-undo data from the index. Since the resolve-undo data is joined with the conflicts in the ondisk format of the index file version 5, conflicts and resolved data is read at the same time, and the resolve-undo data is then converted to the in-memory format. Helped-by: Thomas Rast tr...@student.ethz.ch Signed-off-by: Thomas Gummerer t.gumme...@gmail.com --- read-cache-v5.c | 39 +++ 1 file changed, 39 insertions(+) diff --git a/read-cache-v5.c b/read-cache-v5.c index 00112ea..853b97d 100644 --- a/read-cache-v5.c +++ b/read-cache-v5.c @@ -1,5 +1,6 @@ #include cache.h #include read-cache.h +#include string-list.h #include resolve-undo.h #include cache-tree.h #include dir.h @@ -447,6 +448,43 @@ static int read_conflicts(struct conflict_entry **head, return 0; } +static void resolve_undo_convert_v5(struct index_state *istate, + struct conflict_entry *conflict) +{ + int i; + + while (conflict) { + struct string_list_item *lost; + struct resolve_undo_info *ui; + struct conflict_part *cp; + + if (conflict-entries + (conflict-entries-flags CONFLICT_CONFLICTED) != 0) { + conflict = conflict-next; + continue; + } + if (!istate-resolve_undo) { + istate-resolve_undo = xcalloc(1, sizeof(struct string_list)); + istate-resolve_undo-strdup_strings = 1; + } + + lost = string_list_insert(istate-resolve_undo, conflict-name); + if (!lost-util) + lost-util = xcalloc(1, sizeof(*ui)); + ui = lost-util; + + cp = conflict-entries; + for (i = 0; i 3; i++) + ui-mode[i] = 0; + while (cp) { + ui-mode[conflict_stage(cp) - 1] = cp-entry_mode; + hashcpy(ui-sha1[conflict_stage(cp) - 1], cp-sha1); + cp = cp-next; + } + conflict = conflict-next; + } +} + static int read_entries(struct index_state *istate, struct directory_entry **de, unsigned int *entry_offset, void **mmap, unsigned long mmap_size, unsigned int *nr, @@ -460,6 +498,7 @@ static int read_entries(struct index_state *istate, struct directory_entry **de, conflict_queue = NULL; if (read_conflicts(conflict_queue, *de, mmap, mmap_size) 0) return -1; + resolve_undo_convert_v5(istate, conflict_queue); for (i = 0; i (*de)-de_nfiles; i++) { if (read_entry(ce, *de, -- 1.8.3.453.g1dfc63d -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v2 12/19] read-cache: read index-v5
Make git read the index file version 5 without complaining. This version of the reader doesn't read neither the cache-tree nor the resolve undo data, but doesn't choke on an index that includes such data. Helped-by: Junio C Hamano gits...@pobox.com Helped-by: Nguyen Thai Ngoc Duy pclo...@gmail.com Helped-by: Thomas Rast tr...@student.ethz.ch Signed-off-by: Thomas Gummerer t.gumme...@gmail.com --- Makefile| 1 + cache.h | 75 ++- read-cache-v5.c | 638 read-cache.h| 1 + 4 files changed, 714 insertions(+), 1 deletion(-) create mode 100644 read-cache-v5.c diff --git a/Makefile b/Makefile index 73369ae..80e35f5 100644 --- a/Makefile +++ b/Makefile @@ -856,6 +856,7 @@ LIB_OBJS += quote.o LIB_OBJS += reachable.o LIB_OBJS += read-cache.o LIB_OBJS += read-cache-v2.o +LIB_OBJS += read-cache-v5.o LIB_OBJS += reflog-walk.o LIB_OBJS += refs.o LIB_OBJS += remote.o diff --git a/cache.h b/cache.h index 2097105..1e5cc77 100644 --- a/cache.h +++ b/cache.h @@ -99,7 +99,7 @@ unsigned long git_deflate_bound(git_zstream *, unsigned long); #define CACHE_SIGNATURE 0x44495243 /* DIRC */ #define INDEX_FORMAT_LB 2 -#define INDEX_FORMAT_UB 4 +#define INDEX_FORMAT_UB 5 /* * The cache_time is just the low 32 bits of the @@ -121,6 +121,15 @@ struct stat_data { unsigned int sd_size; }; +/* + * The *next pointer is used in read_entries_v5 for holding + * all the elements of a directory, and points to the next + * cache_entry in a directory. + * + * It is reset by the add_name_hash call in set_index_entry + * to set it to point to the next cache_entry in the + * correct in-memory format ordering. + */ struct cache_entry { struct stat_data ce_stat_data; unsigned int ce_mode; @@ -132,11 +141,59 @@ struct cache_entry { char name[FLEX_ARRAY]; /* more */ }; +struct directory_entry { + struct directory_entry *next; + struct directory_entry *next_hash; + struct cache_entry *ce; + struct cache_entry *ce_last; + struct conflict_entry *conflict; + struct conflict_entry *conflict_last; + unsigned int conflict_size; + unsigned int de_foffset; + unsigned int de_cr; + unsigned int de_ncr; + unsigned int de_nsubtrees; + unsigned int de_nfiles; + unsigned int de_nentries; + unsigned char sha1[20]; + unsigned short de_flags; + unsigned int de_pathlen; + char pathname[FLEX_ARRAY]; +}; + +struct conflict_part { + struct conflict_part *next; + unsigned short flags; + unsigned short entry_mode; + unsigned char sha1[20]; +}; + +struct conflict_entry { + struct conflict_entry *next; + unsigned int nfileconflicts; + struct conflict_part *entries; + unsigned int namelen; + unsigned int pathlen; + char name[FLEX_ARRAY]; +}; + +struct ondisk_conflict_part { + unsigned short flags; + unsigned short entry_mode; + unsigned char sha1[20]; +}; + +#define CE_NAMEMASK (0x0fff) #define CE_STAGEMASK (0x3000) #define CE_EXTENDED (0x4000) #define CE_VALID (0x8000) +#define CE_SMUDGED (0x0400) /* index v5 only flag */ #define CE_STAGESHIFT 12 +#define CONFLICT_CONFLICTED (0x8000) +#define CONFLICT_STAGESHIFT 13 +#define CONFLICT_STAGEMASK (0x6000) + /* * Range 0x in ce_flags is divided into * two parts: in-memory flags and on-disk ones. @@ -173,6 +230,18 @@ struct cache_entry { #define CE_EXTENDED_FLAGS (CE_INTENT_TO_ADD | CE_SKIP_WORKTREE) /* + * Representation of the extended on-disk flags in the v5 format. + * They must not collide with the ordinary on-disk flags, and need to + * fit in 16 bits. Note however that v5 does not save the name + * length. + */ +#define CE_INTENT_TO_ADD_V5 (0x4000) +#define CE_SKIP_WORKTREE_V5 (0x0800) +#if (CE_VALID|CE_STAGEMASK) (CE_INTENTTOADD_V5|CE_SKIPWORKTREE_V5) +#error v5 on-disk flags collide with ordinary on-disk flags +#endif + +/* * Safeguard to avoid saving wrong flags: * - CE_EXTENDED2 won't get saved until its semantic is known * - Bits in 0x have been saved in ce_flags already @@ -211,6 +280,8 @@ static inline unsigned create_ce_flags(unsigned stage) #define ce_skip_worktree(ce) ((ce)-ce_flags CE_SKIP_WORKTREE) #define ce_mark_uptodate(ce) ((ce)-ce_flags |= CE_UPTODATE) +#define conflict_stage(c) ((CONFLICT_STAGEMASK (c)-flags) CONFLICT_STAGESHIFT) + #define ce_permissions(mode) (((mode) 0100) ? 0755 : 0644) static inline unsigned int create_ce_mode(unsigned int mode) { @@ -258,6 +329,8 @@ static inline unsigned int canon_mode(unsigned int mode) } #define cache_entry_size(len) (offsetof(struct cache_entry,name) + (len) + 1) +#define directory_entry_size(len) (offsetof(struct directory_entry,pathname) + (len) + 1) +#define conflict_entry_size(len) (offsetof(struct conflict_entry,name) + (len) + 1) /* * Options by which the index
[PATCH v2 16/19] read-cache: write index-v5 cache-tree data
Write the cache-tree data for the index version 5 file format. The in-memory cache-tree data is converted to the ondisk format, by adding it to the directory entries, that were compiled from the cache-entries in the step before. Signed-off-by: Thomas Gummerer t.gumme...@gmail.com --- read-cache-v5.c | 53 + 1 file changed, 53 insertions(+) diff --git a/read-cache-v5.c b/read-cache-v5.c index 33667d7..cd819b4 100644 --- a/read-cache-v5.c +++ b/read-cache-v5.c @@ -941,6 +941,57 @@ static struct conflict_entry *create_conflict_entry_from_ce(struct cache_entry * return create_new_conflict(ce-name, ce_namelen(ce), pathlen); } +static void convert_one_to_ondisk_v5(struct hash_table *table, struct cache_tree *it, + const char *path, int pathlen, uint32_t crc) +{ + int i; + struct directory_entry *found, *search; + + crc = crc32(crc, (Bytef*)path, pathlen); + found = lookup_hash(crc, table); + search = found; + while (search strcmp(path, search-pathname + search-de_pathlen - strlen(path)) != 0) + search = search-next_hash; + if (!search) + return; + /* +* The number of subtrees is already calculated by +* compile_directory_data, therefore we only need to +* add the entry_count +*/ + search-de_nentries = it-entry_count; + if (0 = it-entry_count) + hashcpy(search-sha1, it-sha1); + if (strcmp(path, ) != 0) + crc = crc32(crc, (Bytef*)/, 1); + +#if DEBUG + if (0 = it-entry_count) + fprintf(stderr, cache-tree %.*s (%d ent, %d subtree) %s\n, + pathlen, path, it-entry_count, it-subtree_nr, + sha1_to_hex(it-sha1)); + else + fprintf(stderr, cache-tree %.*s (%d subtree) invalid\n, + pathlen, path, it-subtree_nr); +#endif + + for (i = 0; i it-subtree_nr; i++) { + struct cache_tree_sub *down = it-down[i]; + if (i) { + struct cache_tree_sub *prev = it-down[i-1]; + if (subtree_name_cmp(down-name, down-namelen, +prev-name, prev-namelen) = 0) + die(fatal - unsorted cache subtree); + } + convert_one_to_ondisk_v5(table, down-cache_tree, down-name, down-namelen, crc); + } +} + +static void cache_tree_to_ondisk_v5(struct hash_table *table, struct cache_tree *root) +{ + convert_one_to_ondisk_v5(table, root, , 0, 0); +} + static struct directory_entry *compile_directory_data(struct index_state *istate, int nfile, unsigned int *ndir, @@ -1046,6 +1097,8 @@ static struct directory_entry *compile_directory_data(struct index_state *istate previous_entry-next = no_subtrees; } } + if (istate-cache_tree) + cache_tree_to_ondisk_v5(table, istate-cache_tree); return de; } -- 1.8.3.453.g1dfc63d -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v2 18/19] update-index.c: rewrite index when index-version is given
Make update-index always rewrite the index when a index-version is given, even if the index already has the right version. This option is used for performance testing the writer and reader. Signed-off-by: Thomas Gummerer t.gumme...@gmail.com --- builtin/update-index.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/builtin/update-index.c b/builtin/update-index.c index 4c6e3a6..7e723c0 100644 --- a/builtin/update-index.c +++ b/builtin/update-index.c @@ -6,6 +6,7 @@ #include cache.h #include quote.h #include cache-tree.h +#include read-cache.h #include tree-walk.h #include builtin.h #include refs.h @@ -863,8 +864,7 @@ int cmd_update_index(int argc, const char **argv, const char *prefix) preferred_index_format, INDEX_FORMAT_LB, INDEX_FORMAT_UB); - if (the_index.version != preferred_index_format) - active_cache_changed = 1; + active_cache_changed = 1; the_index.version = preferred_index_format; } -- 1.8.3.453.g1dfc63d -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH v2 17/19] read-cache: write resolve-undo data for index-v5
Make git read the resolve-undo data from the index. Since the resolve-undo data is joined with the conflicts in the ondisk format of the index file version 5, conflicts and resolved data is read at the same time, and the resolve-undo data is then converted to the in-memory format. Helped-by: Thomas Rast tr...@student.ethz.ch Signed-off-by: Thomas Gummerer t.gumme...@gmail.com --- read-cache-v5.c | 94 + 1 file changed, 94 insertions(+) diff --git a/read-cache-v5.c b/read-cache-v5.c index cd819b4..093ee1a 100644 --- a/read-cache-v5.c +++ b/read-cache-v5.c @@ -992,6 +992,99 @@ static void cache_tree_to_ondisk_v5(struct hash_table *table, struct cache_tree convert_one_to_ondisk_v5(table, root, , 0, 0); } +static void resolve_undo_to_ondisk_v5(struct hash_table *table, + struct string_list *resolve_undo, + unsigned int *ndir, int *total_dir_len, + struct directory_entry *de) +{ + struct string_list_item *item; + struct directory_entry *search; + + if (!resolve_undo) + return; + for_each_string_list_item(item, resolve_undo) { + struct conflict_entry *conflict_entry; + struct resolve_undo_info *ui = item-util; + char *super; + int i, dir_len, len; + uint32_t crc; + struct directory_entry *found, *current, *new_tree; + + if (!ui) + continue; + + super = super_directory(item-string); + if (!super) + dir_len = 0; + else + dir_len = strlen(super); + crc = crc32(0, (Bytef*)super, dir_len); + found = lookup_hash(crc, table); + current = NULL; + new_tree = NULL; + + while (!found) { + struct directory_entry *new; + + new = init_directory_entry(super, dir_len); + if (!current) + current = new; + insert_directory_entry(new, table, total_dir_len, ndir, crc); + if (new_tree != NULL) + new-de_nsubtrees = 1; + new-next = new_tree; + new_tree = new; + super = super_directory(super); + if (!super) + dir_len = 0; + else + dir_len = strlen(super); + crc = crc32(0, (Bytef*)super, dir_len); + found = lookup_hash(crc, table); + } + search = found; + while (search-next_hash strcmp(super, search-pathname) != 0) + search = search-next_hash; + if (search !current) + current = search; + if (!search !current) + current = new_tree; + if (!super new_tree) { + new_tree-next = de-next; + de-next = new_tree; + de-de_nsubtrees++; + } else if (new_tree) { + struct directory_entry *temp; + + search = de-next; + while (strcmp(super, search-pathname)) + search = search-next; + temp = new_tree; + while (temp-next) + temp = temp-next; + search-de_nsubtrees++; + temp-next = search-next; + search-next = new_tree; + } + + len = strlen(item-string); + conflict_entry = create_new_conflict(item-string, len, current-de_pathlen); + add_conflict_to_directory_entry(current, conflict_entry); + for (i = 0; i 3; i++) { + if (ui-mode[i]) { + struct conflict_part *cp; + + cp = xmalloc(sizeof(struct conflict_part)); + cp-flags = (i + 1) CONFLICT_STAGESHIFT; + cp-entry_mode = ui-mode[i]; + cp-next = NULL; + hashcpy(cp-sha1, ui-sha1[i]); + add_part_to_conflict_entry(current, conflict_entry, cp); + } + } + } +} + static struct directory_entry *compile_directory_data(struct index_state *istate, int nfile, unsigned int *ndir, @@ -1099,6 +1192,7 @@ static struct directory_entry *compile_directory_data(struct
[PATCH v2 15/19] read-cache: write index-v5
Write the index version 5 file format to disk. This version doesn't write the cache-tree data and resolve-undo data to the file. The main work is done when filtering out the directories from the current in-memory format, where in the same turn also the conflicts and the file data is calculated. Helped-by: Nguyen Thai Ngoc Duy pclo...@gmail.com Helped-by: Thomas Rast tr...@student.ethz.ch Signed-off-by: Thomas Gummerer t.gumme...@gmail.com --- cache.h | 8 + read-cache-v5.c | 594 +++- read-cache.c| 11 +- read-cache.h| 1 + 4 files changed, 611 insertions(+), 3 deletions(-) diff --git a/cache.h b/cache.h index 1e5cc77..f3685f8 100644 --- a/cache.h +++ b/cache.h @@ -565,6 +565,7 @@ extern int unmerged_index(const struct index_state *); extern int verify_path(const char *path); extern struct cache_entry *index_name_exists(struct index_state *istate, const char *name, int namelen, int igncase); extern int index_name_pos(const struct index_state *, const char *name, int namelen); +extern struct directory_entry *init_directory_entry(char *pathname, int len); #define ADD_CACHE_OK_TO_ADD 1 /* Ok to add */ #define ADD_CACHE_OK_TO_REPLACE 2 /* Ok to replace file/directory */ #define ADD_CACHE_SKIP_DFCHECK 4 /* Ok to skip DF conflict checks */ @@ -1363,6 +1364,13 @@ static inline ssize_t write_str_in_full(int fd, const char *str) return write_in_full(fd, str, strlen(str)); } +/* index-v5 helper functions */ +extern char *super_directory(const char *filename); +extern void insert_directory_entry(struct directory_entry *, struct hash_table *, int *, unsigned int *, uint32_t); +extern void add_conflict_to_directory_entry(struct directory_entry *, struct conflict_entry *); +extern void add_part_to_conflict_entry(struct directory_entry *, struct conflict_entry *, struct conflict_part *); +extern struct conflict_entry *create_new_conflict(char *, int, int); + /* pager.c */ extern void setup_pager(void); extern const char *pager_program; diff --git a/read-cache-v5.c b/read-cache-v5.c index 0b9c320..33667d7 100644 --- a/read-cache-v5.c +++ b/read-cache-v5.c @@ -769,9 +769,601 @@ static int read_index_v5(struct index_state *istate, void *mmap, return 0; } +#define WRITE_BUFFER_SIZE 8192 +static unsigned char write_buffer[WRITE_BUFFER_SIZE]; +static unsigned long write_buffer_len; + +static int ce_write_flush(int fd) +{ + unsigned int buffered = write_buffer_len; + if (buffered) { + if (write_in_full(fd, write_buffer, buffered) != buffered) + return -1; + write_buffer_len = 0; + } + return 0; +} + +static int ce_write(uint32_t *crc, int fd, void *data, unsigned int len) +{ + if (crc) + *crc = crc32(*crc, (Bytef*)data, len); + while (len) { + unsigned int buffered = write_buffer_len; + unsigned int partial = WRITE_BUFFER_SIZE - buffered; + if (partial len) + partial = len; + memcpy(write_buffer + buffered, data, partial); + buffered += partial; + if (buffered == WRITE_BUFFER_SIZE) { + write_buffer_len = buffered; + if (ce_write_flush(fd)) + return -1; + buffered = 0; + } + write_buffer_len = buffered; + len -= partial; + data = (char *) data + partial; + } + return 0; +} + +static int ce_flush(int fd) +{ + unsigned int left = write_buffer_len; + + if (left) + write_buffer_len = 0; + + if (write_in_full(fd, write_buffer, left) != left) + return -1; + + return 0; +} + +static void ce_smudge_racily_clean_entry(struct cache_entry *ce) +{ + /* +* This method shall only be called if the timestamp of ce +* is racy (check with is_racy_timestamp). If the timestamp +* is racy, the writer will set the CE_SMUDGED flag. +* +* The reader (match_stat_basic) will then take care +* of checking if the entry is really changed or not, by +* taking into account the size and the stat_crc and if +* that hasn't changed checking the sha1. +*/ + ce-ce_flags |= CE_SMUDGED; +} + +char *super_directory(const char *filename) +{ + char *slash; + + slash = strrchr(filename, '/'); + if (slash) + return xmemdupz(filename, slash-filename); + return NULL; +} + +struct directory_entry *init_directory_entry(char *pathname, int len) +{ + struct directory_entry *de = xmalloc(directory_entry_size(len)); + + memcpy(de-pathname, pathname, len); + de-pathname[len] = '\0'; + de-de_flags = 0; + de-de_foffset= 0; + de-de_cr = 0; + de-de_ncr
[PATCH v2 19/19] p0003-index.sh: add perf test for the index formats
From: Thomas Rast tr...@inf.ethz.ch Add a performance test for index version [23]/4/5 by using git update-index --index-version=x, thus testing both the reader and the writer speed of all index formats. Signed-off-by: Thomas Rast tr...@inf.ethz.ch Signed-off-by: Thomas Gummerer t.gumme...@gmail.com --- t/perf/p0003-index.sh | 59 +++ 1 file changed, 59 insertions(+) create mode 100755 t/perf/p0003-index.sh diff --git a/t/perf/p0003-index.sh b/t/perf/p0003-index.sh new file mode 100755 index 000..3e02868 --- /dev/null +++ b/t/perf/p0003-index.sh @@ -0,0 +1,59 @@ +#!/bin/sh + +test_description=Tests index versions [23]/4/5 + +. ./perf-lib.sh + +test_perf_large_repo + +test_expect_success convert to v3 + git update-index --index-version=2 + + +test_perf v[23]: update-index + git update-index --index-version=2 /dev/null + + +subdir=$(git ls-files | sed 's#/[^/]*$##' | grep -v '^$' | uniq | tail -n 30 | head -1) + +test_perf v[23]: grep nonexistent -- subdir + test_must_fail git grep nonexistent -- $subdir /dev/null + + +test_perf v[23]: ls-files -- subdir + git ls-files $subdir /dev/null + + +test_expect_success convert to v4 + git update-index --index-version=4 + + +test_perf v4: update-index + git update-index --index-version=4 /dev/null + + +test_perf v4: grep nonexistent -- subdir + test_must_fail git grep nonexistent -- $subdir /dev/null + + +test_perf v4: ls-files -- subdir + git ls-files $subdir /dev/null + + +test_expect_success convert to v5 + git update-index --index-version=5 + + +test_perf v5: update-index + git update-index --index-version=5 /dev/null + + +test_perf v5: grep nonexistent -- subdir + test_must_fail git grep nonexistent -- $subdir /dev/null + + +test_perf v5: ls-files -- subdir + git ls-files $subdir /dev/null + + +test_done -- 1.8.3.453.g1dfc63d -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Bug in .mailmap handling?
Stefan Beller stefanbel...@googlemail.com writes: Hello, you may have noticed I am currently trying to bring the mailmap file of git itself up to date. I noticed some behavior, which I did not expect. Have a look yourself: --- # prepare test environment: mkdir testmailmap cd testmailmap/ git init # do a commit: echo asdf test1 git add test1 git commit -a --author=A a...@example.org -m add test1 # commit with same name, but different email # (different capitalization does the trick already, # but here I am going to use a different mail) echo asdf test2 git add test2 git commit -a --author=A changed_em...@example.org -m add test2 # how do we know it's the same person? git shortlog A (2): add test1 add test2 You don't, and it is a long known behaviour. # reports as expected: git shortlog -sne 1 A a...@example.org 1 A changed_em...@example.org Yes. # Adding the line to the mailmap should make life easy, so we know # it's the same person echo A a...@example.org changed_em...@example.org .mailmap # Come on, I just wanted to have it reported as one person! git shortlog -sne 1 A a...@example.org 1 A a...@example.org Err, where does the lowercase a@ come from in the above? Are we missing some steps before we get here? # So let's try another line in the mailmap file, (small 'a') echo A a...@example.org changed_em...@example.org .mailmap This is , not , I presume? Otherwise changed_email is mapped to two destination, no? # We're not there yet? git shortlog -sne 1 A a...@example.org 1 A a...@example.org Expected, as long as some hidden set-up you did not describe that caused me to say Err, where does the lowercase a@ come from is there, i.e. one of the two commits is done by a...@example.org. # Now let's write it rather explicit: # (essentially just write 2 lines into the mailmap file) cat EOF .mailmap A a...@example.org changed_em...@example.org A a...@example.org a...@example.org EOF # works as expected now git shortlog -sne 2 A a...@example.org Makes sense. # works as expected now as well git shortlog A (2): add test1 add test2 -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 7/7] push: document --lockref
Johannes Sixt j...@kdbg.org writes: Am 12.07.2013 00:14, schrieb Junio C Hamano: Johannes Sixt j...@kdbg.org writes: Again: Why not just define +refspec as the way to achieve this check? What justification do we have to break existing people's configuration that says something like: [remote ko] url = kernel.org:/pub/scm/git/git.git push = master push = next push = +pu push = maint by adding a _new_ requirement they may not be able to satisify? Notice that the above is a typical push only publishing point, where you do not need any remote tracking branches. Why would it break? When you do not specify --lockref, there is no change whatsoever. I thought your suggestion Why not just define +pu as the way to achieve _THIS_ check? was to make +pu to mean git push ko --lockref pu which would mean check refs/remotes/ko/pu and make sure the remote side still is at that commit. If that is not what you meant, please clarify what _THIS_ is. -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Bug in .mailmap handling?
The commands are exactly as given in the first mail. (I run it multiple times, and this exact behavior comes up) I think there is some problem with the capitalisation of the first (if not all) characters. (Hence the small 'a') So either check with these example commands yourself, or take my latest patch for the mailmap to reproduce on real names, please. On 07/12/2013 07:35 PM, Junio C Hamano wrote: Stefan Beller stefanbel...@googlemail.com writes: Hello, you may have noticed I am currently trying to bring the mailmap file of git itself up to date. I noticed some behavior, which I did not expect. Have a look yourself: --- # prepare test environment: mkdir testmailmap cd testmailmap/ git init # do a commit: echo asdf test1 git add test1 git commit -a --author=A a...@example.org -m add test1 # commit with same name, but different email # (different capitalization does the trick already, # but here I am going to use a different mail) echo asdf test2 git add test2 git commit -a --author=A changed_em...@example.org -m add test2 # how do we know it's the same person? git shortlog A (2): add test1 add test2 You don't, and it is a long known behaviour. # reports as expected: git shortlog -sne 1 A a...@example.org 1 A changed_em...@example.org Yes. # Adding the line to the mailmap should make life easy, so we know # it's the same person echo A a...@example.org changed_em...@example.org .mailmap # Come on, I just wanted to have it reported as one person! git shortlog -sne 1 A a...@example.org 1 A a...@example.org Err, where does the lowercase a@ come from in the above? Are we missing some steps before we get here? # So let's try another line in the mailmap file, (small 'a') echo A a...@example.org changed_em...@example.org .mailmap This is , not , I presume? Otherwise changed_email is mapped to two destination, no? # We're not there yet? git shortlog -sne 1 A a...@example.org 1 A a...@example.org Expected, as long as some hidden set-up you did not describe that caused me to say Err, where does the lowercase a@ come from is there, i.e. one of the two commits is done by a...@example.org. # Now let's write it rather explicit: # (essentially just write 2 lines into the mailmap file) cat EOF .mailmap A a...@example.org changed_em...@example.org A a...@example.org a...@example.org EOF # works as expected now git shortlog -sne 2 A a...@example.org Makes sense. # works as expected now as well git shortlog A (2): add test1 add test2 signature.asc Description: OpenPGP digital signature
Re: [PATCH v4] config: add support for http.url.* settings
Kyle J. McKay mack...@gmail.com writes: The url value is considered a match to a url if the url value is either an exact match or a prefix of the url which ends on a path component boundary ('/'). So https://example.com/test; will match https://example.com/test; and https://example.com/test/too; but not https://example.com/testextra;. Longer matches take precedence over shorter matches with environment variable settings taking precedence over all. With this configuration: [http] useragent = other-agent noEPSV = true [http https://example.com;] useragent = example-agent sslVerify = false [http https://example.com/path;] useragent = path-agent The https://other.example.com/; url will have useragent other-agent and sslVerify will be on. The https://example.com/; url will have useragent example-agent and sslVerify will be off. The https://example.com/path/sub; url will have useragent path-agent and sslVerify will be off. All three of the examples will have noEPSV enabled. Signed-off-by: Kyle J. McKay mack...@gmail.com Reviewed-by: Junio C Hamano gits...@pobox.com I haven't reviewed this version (yet). --- The credentials configuration values already support url-specific configuration items in the form credential.url.*. This patch adds similar support for http configuration values. Let's move the above three lines into the proposed log message and make it its first paragraph. A log message should say what it is about (i.e. what does add support really mean? what kind of functionality the new support brings to us?) upfront and then explain what it does in detail (i.e. the explanation of matching semantics you have as the first paragraph). diff --git a/Documentation/config.txt b/Documentation/config.txt index b4d4887..3731a3a 100644 --- a/Documentation/config.txt +++ b/Documentation/config.txt @@ -1531,6 +1531,17 @@ http.useragent:: of common USER_AGENT strings (but not including those like git/1.7.1). Can be overridden by the 'GIT_HTTP_USER_AGENT' environment variable. +http.url.*:: + Any of the http.* options above can be applied selectively to some urls. + For example http.https://example.com.useragent; would set the user + agent only for https connections to example.com. The url value + matches a url if it is an exact match or a prefix of the url matching + at a '/' boundary. Given Peff's review, I am no longer sure if the strictly textual match is the semantics we want. At least, it is easy to explain and understand, but it might be too limiting to be useful. Let's assume it is what we want, for the rest of the review. diff --git a/http.c b/http.c index 2d086ae..feca70a 100644 --- a/http.c +++ b/http.c @@ -30,6 +30,34 @@ static CURL *curl_default; char curl_errorstr[CURL_ERROR_SIZE]; +enum http_option_type { + opt_post_buffer = 0, Do we need to have = 0 here? Is this order meant to match some externally defined order (e.g. alphabetical, or the value of corresponding constants defined in the cURL library), other than opt_max is not a real option but just has to be there at the end to count all of them? I am wondering if it would make it easier to read to move all the #ifdef at the end immediately before opt-max, or something. + opt_min_sessions, +#ifdef USE_CURL_MULTI + opt_max_requests, +#endif +... + opt_user_agent, + opt_passwd_req, + opt_max +}; +static int check_matched_len(enum http_option_type opt, size_t matchlen) +{ + /* + * Check the last matched length of option opt against matchlen and + * return true if the last matched length is larger (meaning the config + * setting should be ignored). Upon seeing the _same_ key (i.e. new key + * has the same match length which is therefore not larger) the new + * setting will override the previous setting. Otherwise return false + * and record matchlen as the current last matched length of option opt. + */ + if (http_option_max_matched_len[opt] matchlen) + return 1; + http_option_max_matched_len[opt] = matchlen; + return 0; +} A check_foo that returns either 0 or 1 usually mean 1 is good and 0 is not. A check_foo that returns either negative or 0 usually mean negative is an error and 0 is good. In general, check_foo is a bad name for a boolean function. This checks seen_longer_match(), and it is up to the caller if it considers good or bad to have a matching element that is longer or shorter what it has already seen. See below. static int http_options(const char *var, const char *value, void *cb) { - if (!strcmp(http.sslverify, var)) { + const char *url = (const char *)cb; Cast? + if (!strcmp(sslverify, key)) { + if (check_matched_len(opt_ssl_verify, matchlen)) + return 0; curl_ssl_verify = git_config_bool(var,
Re: [PATCH v4] config: add support for http.url.* settings
Kyle J. McKay mack...@gmail.com writes: + if (!strcmp(sslcertpasswordprotected, key)) { + if (check_matched_len(opt_passwd_req, matchlen)) + return 0; if (git_config_bool(var, value)) ssl_cert_password_required = 1; return 0; } This is not a new problem, but I think the existing code is wrong. There is no way to countermand an earlier [http] sslcertpasswordprotected in a more generic configuration file with [http] sslcertpasswordprotected = no in a repository specific configuration file. Perhaps we should fix it as a preparatory patch (1/2) before the main feature addition patch. - if (!strcmp(http.ssltry, var)) { + if (!strcmp(ssltry, key)) { + if (check_matched_len(opt_ssl_try, matchlen)) + return 0; curl_ssl_try = git_config_bool(var, value); return 0; } -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] http.c: fix parsing of http.sslCertPasswordProtected variable
The existing code triggers only when the configuration variable is set to true. Once the variable is set to true in a more generic configuration file (e.g. ~/.gitconfig), it cannot be overriden to false in the repository specific one (e.g. .git/config). Signed-off-by: Junio C Hamano gits...@pobox.com --- http.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/http.c b/http.c index 92aba59..37986f8 100644 --- a/http.c +++ b/http.c @@ -160,8 +160,7 @@ static int http_options(const char *var, const char *value, void *cb) if (!strcmp(http.sslcainfo, var)) return git_config_string(ssl_cainfo, var, value); if (!strcmp(http.sslcertpasswordprotected, var)) { - if (git_config_bool(var, value)) - ssl_cert_password_required = 1; + ssl_cert_password_required = git_config_bool(var, value); return 0; } if (!strcmp(http.ssltry, var)) { -- 1.8.3.2-942-gc84dfcb -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] .mailmap: Map email addresses to names
Stefan Beller wrote: --- a/.mailmap +++ b/.mailmap @@ -5,99 +5,146 @@ [...] Chris Shoemaker c.shoema...@cox.net -Dan Johnson computerdr...@gmail.com Dana L. How dana...@gmail.com Dana L. How h...@deathvalley.cswitch.com Daniel Barkalow barka...@iabervon.org +Dan Johnson computerdr...@gmail.com Small nit: the sorting looks broken here and in similar places below (the usual ordering is Dan Dana Daniel). Thanks, Jonathan -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] .mailmap: Map email addresses to names
Which tool would you recommend to sort stuff? Or rather the exact parameters for /usr/bin/sort ? Thanks, Stefan On 07/12/2013 08:57 PM, Jonathan Nieder wrote: Stefan Beller wrote: --- a/.mailmap +++ b/.mailmap @@ -5,99 +5,146 @@ [...] Chris Shoemaker c.shoema...@cox.net -Dan Johnson computerdr...@gmail.com Dana L. How dana...@gmail.com Dana L. How h...@deathvalley.cswitch.com Daniel Barkalow barka...@iabervon.org +Dan Johnson computerdr...@gmail.com Small nit: the sorting looks broken here and in similar places below (the usual ordering is Dan Dana Daniel). Thanks, Jonathan -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] http.c: fix parsing of http.sslCertPasswordProtected variable
Junio C Hamano wrote: The existing code triggers only when the configuration variable is set to true. Once the variable is set to true in a more generic configuration file (e.g. ~/.gitconfig), it cannot be overriden to false in the repository specific one (e.g. .git/config). [...] --- a/http.c +++ b/http.c @@ -160,8 +160,7 @@ static int http_options(const char *var, const char *value, void *cb) if (!strcmp(http.sslcainfo, var)) return git_config_string(ssl_cainfo, var, value); if (!strcmp(http.sslcertpasswordprotected, var)) { - if (git_config_bool(var, value)) - ssl_cert_password_required = 1; + ssl_cert_password_required = git_config_bool(var, value); Thanks for catching it. The documentation doesn't say anything about this can only enable and cannot disable behavior and the usual pattern is to allow later settings to override earlier ones, so this change looks good. Reviewed-by: Jonathan Nieder jrnie...@gmail.com FWIW the GIT_SSL_CERT_PASSWORD_PROTECTED envvar has a similar can only enable behavior, but since it's documented, that's not as big of a problem. Do you remember why it was written that way? When that setting was first added[1], there was some mention of autodetection if libcurl could learn to do that. Did it happen? (Please forgive my ignorance.) Thanks, Jonathan [1] http://thread.gmane.org/gmane.comp.version-control.git/122793/focus=122816 -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] .mailmap: Map email addresses to names
From the man page *** WARNING *** The locale specified by the environment affects sort order. Set LC_ALL=C to get the traditional sort order that uses native byte values. And indeed I can avoid being nitpicked again for wrong sorting. ;) On 07/12/2013 09:02 PM, Stefan Beller wrote: Which tool would you recommend to sort stuff? Or rather the exact parameters for /usr/bin/sort ? Thanks, Stefan On 07/12/2013 08:57 PM, Jonathan Nieder wrote: Stefan Beller wrote: --- a/.mailmap +++ b/.mailmap @@ -5,99 +5,146 @@ [...] Chris Shoemaker c.shoema...@cox.net -Dan Johnson computerdr...@gmail.com Dana L. How dana...@gmail.com Dana L. How h...@deathvalley.cswitch.com Daniel Barkalow barka...@iabervon.org +Dan Johnson computerdr...@gmail.com Small nit: the sorting looks broken here and in similar places below (the usual ordering is Dan Dana Daniel). Thanks, Jonathan -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] .mailmap: Map email addresses to names
Ok I am sending all confirmed changes now again in one big patch, as the sorting was wrong. Stefan Beller (1): .mailmap: Map email addresses to names .mailmap | 135 +++ 1 file changed, 110 insertions(+), 25 deletions(-) -- 1.8.3.2.790.g9192b0b -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] .mailmap: Map email addresses to names
People change email addresses quite often and sometimes forget to add their entry to the mailmap file. I have contacted lots of people, whose name occurs multiple times in the short log having different email addresses. The entries in the mailmap of this patch are either confirmed by them or are trivial. Trivial means different capitalisation of the domain (@MIT.EDU and @mit.edu) or the domain was localhost, (none) or @local. Additionally to adding (name, email) mappings to the .mailmap file, it has also been sorted alphabetically. (which explains the removals, which are added 3 lines later on again). The sorting was done using export LC_ALL=C; /usr/bin/sort without arguments. While the most changes happen at the email addresses, we also have a name change in here. Karl Hasselström is now known as Karl Wiberg due to marriage. Congratulations! To find out whom to contact I used the following small script: - #!/bin/bash git shortlog -sne |awk '{ NF--; $1=; print }' |sort |uniq -d mailmapdoubles while read line ; do # remove leading whitespace trimmed=$(echo $line | sed -e 's/^ *//g' -e 's/ *$//g') echo git shortlog -sne | grep \$trimmed\ done mailmapdoubles mailmapdoubles2 sh mailmapdoubles2 rm mailmapdoubles rm mailmapdoubles2 - Also interesting for similar tasks are these snippets: # Finding out duplicates by comparing email addresses: git shortlog -sne |awk '{ print $NF }' |sort |uniq -d # Finding out duplicates by comparing names: git shortlog -sne |awk '{ NF--; $1=; print }' |sort |uniq -d - Signed-off-by: Stefan Beller stefanbel...@googlemail.com --- .mailmap | 135 +++ 1 file changed, 110 insertions(+), 25 deletions(-) diff --git a/.mailmap b/.mailmap index 345cce6..22d3d70 100644 --- a/.mailmap +++ b/.mailmap @@ -5,99 +5,184 @@ # same person appearing not to be so. # +n...@fluxnic.net n...@cam.org +Alejandro R. Sedeño ased...@mit.edu ased...@mit.edu Alex Bennée kernel-hac...@bennee.com +Alex Riesen raa.l...@gmail.com fo...@t-online.de +Alex Riesen raa.l...@gmail.com raa@limbo.localdomain +Alex Riesen raa.l...@gmail.com r...@steel.home +Alex Vandiver a...@chmrr.net ale...@mit.edu Alexander Gavrilov angavri...@gmail.com +Alexey Shumkin alex.crez...@gmail.com zap...@mail.ru +Anders Kaseorg ande...@mit.edu ande...@ksplice.com +Anders Kaseorg ande...@mit.edu ande...@mit.edu Aneesh Kumar K.V aneesh.ku...@gmail.com +Bernt Hansen be...@norang.ca be...@alumni.uwaterloo.ca +Brandon Casey draf...@gmail.com ca...@nrlssc.navy.mil Brian M. Carlson sand...@crustytoothpaste.ath.cx +Bryan Larsen br...@larsen.st bryan.lar...@gmail.com +Bryan Larsen br...@larsen.st bryanlar...@yahoo.com Cheng Renquan crq...@gmail.com Chris Shoemaker c.shoema...@cox.net Dan Johnson computerdr...@gmail.com Dana L. How dana...@gmail.com Dana L. How h...@deathvalley.cswitch.com Daniel Barkalow barka...@iabervon.org +David Brown g...@davidb.org dav...@quicinc.com David D. Kilzer ddkil...@kilzer.net David Kågedal dav...@lysator.liu.se +David Reiss dre...@facebook.com dreiss@dreiss-vmware.(none) David S. Miller da...@davemloft.net Deskin Miller desk...@umich.edu Dirk Süsserott newslet...@dirk.my1.cc +Eric Blake ebl...@redhat.com e...@byu.net +Eric Hanchrow eric.hanch...@gmail.com off...@blarg.net Eric S. Raymond e...@thyrsus.com Erik Faye-Lund kusmab...@gmail.com kusmab...@googlemail.com +Eyvind Bernhardsen eyvind.bernhard...@gmail.com eyvind-...@orakel.ntnu.no +Florian Achleitner florian.achleitner.2.6...@gmail.com florian.achleitner2.6...@gmail.com +Franck Bui-Huu vagabon@gmail.com fbui...@gmail.com +Frank Lichtenheld fr...@lichtenheld.de dj...@debian.org +Frank Lichtenheld fr...@lichtenheld.de flichtenh...@astaro.com Fredrik Kuivinen freku...@student.liu.se Frédéric Heitzmann frederic.heitzm...@gmail.com H. Merijn Brand h.m.br...@xs4all.nl H.Merijn Brand h.m.br...@xs4all.nl -H. Peter Anvin h...@bonde.sc.orionmulti.com -H. Peter Anvin h...@tazenda.sc.orionmulti.com -H. Peter Anvin h...@trantor.hos.anvin.org +H. Peter Anvin h...@zytor.com h...@bonde.sc.orionmulti.com +H. Peter Anvin h...@zytor.com h...@smyrno.hos.anvin.org +H. Peter Anvin h...@zytor.com h...@tazenda.sc.orionmulti.com +H. Peter Anvin h...@zytor.com h...@trantor.hos.anvin.org +Han-Wen Nienhuys han...@google.com Han-Wen Nienhuys han...@xs4all.nl Horst H. von Brand vonbr...@inf.utfsm.cl -İsmail Dönmez ism...@pardus.org.tr +J. Bruce Fields bfie...@citi.umich.edu bfie...@fieldses.org +J. Bruce Fields bfie...@citi.umich.edu bfie...@pig.linuxdev.us.dell.com +J. Bruce Fields bfie...@citi.umich.edu bfie...@puzzle.fieldses.org Jakub Narębski jna...@gmail.com -Jay Soffian jaysoffian+...@gmail.com +Jason Riedy e...@eecs.berkeley.edu e...@eecs.berkeley.edu +Jason Riedy e...@eecs.berkeley.edu e...@cs.berkeley.edu +Jay Soffian jaysoff...@gmail.com jaysoffian+...@gmail.com Jeff King p...@peff.net p...@github.com +Jeff
Re: [PATCH] .mailmap: Map email addresses to names
Stefan Beller stefanbel...@googlemail.com writes: Ok I am sending all confirmed changes now again in one big patch, as the sorting was wrong. Stefan Beller (1): .mailmap: Map email addresses to names So this corresponds to both of your patches, or just the first batch? -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] http.c: fix parsing of http.sslCertPasswordProtected variable
Jonathan Nieder jrnie...@gmail.com writes: FWIW the GIT_SSL_CERT_PASSWORD_PROTECTED envvar has a similar can only enable behavior, but since it's documented, that's not as big of a problem. Do you remember why it was written that way? Not me ;-). If I have to guess, it is probably that these two are thought to be equally trivial way to defeat existing environment settings: (unset GIT_SSL_CERT_PASSWORD_PROTECTED ; git cmd) GIT_SSL_CERT_PASSWORD_PROTECTED=no git cmd When that setting was first added[1], there was some mention of autodetection if libcurl could learn to do that. Did it happen? I do not think so, but let's see if our resident cURL guru has something to say about it. -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 7/7] push: document --lockref
Am 12.07.2013 19:40, schrieb Junio C Hamano: Johannes Sixt j...@kdbg.org writes: Am 12.07.2013 00:14, schrieb Junio C Hamano: Johannes Sixt j...@kdbg.org writes: Again: Why not just define +refspec as the way to achieve this check? What justification do we have to break existing people's configuration that says something like: [remote ko] url = kernel.org:/pub/scm/git/git.git push = master push = next push = +pu push = maint by adding a _new_ requirement they may not be able to satisify? Notice that the above is a typical push only publishing point, where you do not need any remote tracking branches. Why would it break? When you do not specify --lockref, there is no change whatsoever. I thought your suggestion Why not just define +pu as the way to achieve _THIS_ check? was to make +pu to mean git push ko --lockref pu which would mean check refs/remotes/ko/pu and make sure the remote side still is at that commit. If that is not what you meant, please clarify what _THIS_ is. We have three independent options that the user can choose in any combination: o --force given or not; o --lockref semantics enabled or not; o refspec with or without +; and these two orthogonal preconditions of the push o push is fast-forward or it is not (ff, noff); o the branch at the remote is at the expected rev or it is not (match, mismatch). Here is a table with the expected outcome. ok means that the push is allowed(*), fail means that the push is denied. (Four more lines with --force are omitted because they have ok in all spots.) ff noff ff noff match match mismatch mismatch --lockref +refspec okokdenied denied --lockref refspec ok denied denied denied +refspec okok ok ok refspec ok deniedok denied Notice that without --lockref semantics enabled, +refspec and refspec keep the current behavior. (*) As we are talking only about the client-side of the push here, I'm saying allowed instead of succeeds because the server can have additional restrictions that can make the push fail. -- Hannes -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] .mailmap: Map email addresses to names
Stefan Beller stefanbel...@googlemail.com writes: Additionally to adding (name, email) mappings to the .mailmap file, it has also been sorted alphabetically. (which explains the removals, which are added 3 lines later on again). What is this 3 lines later on again about? Is it a remnant from the previous iteration? If so, I can remove this (which ...) locally. While the most changes happen at the email addresses, we also have a name change in here. Karl Hasselström is now known as Karl Wiberg due to marriage. Congratulations! Indeed. I like this part of the log message ;-) -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 0/7] cat-file --batch-check performance improvements
On Fri, Jul 12, 2013 at 10:23:34AM -0700, Junio C Hamano wrote: Jeff King p...@peff.net writes: The results for running (in linux.git): $ git rev-list --objects --all objects $ git cat-file --batch-check='%(objectsize:disk)' objects /dev/null I can see how these patches are very logical avenue to grab only on-disk footprint for large number of objects, but among the type, payload size and on-disk footprint, I find it highly narrow niche that a real user or script is interested _only_ in on-disk footprint without even worrying about the type of object. Yeah, I agree it is a bit of a niche. However, there are other code paths that might want only the size and not the type (e.g., we already know the object is a blob, but want to know size before deciding how to handle diff). But in general, I doubt the performance impact is a big deal there. It's only measurable when you're doing millions of objects. -Peff -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Bug in .mailmap handling?
Stefan Beller stefanbel...@googlemail.com writes: # Adding the line to the mailmap should make life easy, so we know # it's the same person echo A a...@example.org changed_em...@example.org .mailmap While I was looking at this, I noticed this piece of code: diff --git a/mailmap.c b/mailmap.c index 2a7b366..418081e 100644 --- a/mailmap.c +++ b/mailmap.c @@ -122,7 +122,7 @@ static char *parse_name_and_email(char *buffer, char **name, while (nend nstart isspace(*nend)) --nend; - *name = (nstart nend ? nstart : NULL); + *name = (nstart = nend ? nstart : NULL); *email = left+1; *(nend+1) = '\0'; *right++ = '\0'; The function is given a buffer A a...@example.org..., nstart scans from the beginning of the buffer, skipping whitespaces (there isn't any, so nstart points at the buffer), while nend starts from one byte before the first '' and skips whitespaces backwards and ends at the first non-whitespace (i.e. it hits A at the beginning of the buffer). nstart == nend in this case for a single-letter name, and an off-by-one error makes it fail to pick up the name, which makes the entry equivalent to a...@example.org changed_em...@example.org without the name. I do not think this bug affected anything you observed, though. git shortlog -sne 1 A a...@example.org 1 A a...@example.org This is coming from mailmap.c::add_mapping() that downcases the e-mail address. changed_em...@example.org is mapped to a...@example.org because of this downcasing, while A a...@example.org does not have any entry for it in the .mailmap file, so it is given back as-is. Hence we see two distinct entries. -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Bug in .mailmap handling?
On 07/12/2013 10:28 PM, Junio C Hamano wrote: Stefan Beller stefanbel...@googlemail.com writes: # Adding the line to the mailmap should make life easy, so we know # it's the same person echo A a...@example.org changed_em...@example.org .mailmap While I was looking at this, I noticed this piece of code: diff --git a/mailmap.c b/mailmap.c index 2a7b366..418081e 100644 --- a/mailmap.c +++ b/mailmap.c @@ -122,7 +122,7 @@ static char *parse_name_and_email(char *buffer, char **name, while (nend nstart isspace(*nend)) --nend; - *name = (nstart nend ? nstart : NULL); + *name = (nstart = nend ? nstart : NULL); *email = left+1; *(nend+1) = '\0'; *right++ = '\0'; The function is given a buffer A a...@example.org..., nstart scans from the beginning of the buffer, skipping whitespaces (there isn't any, so nstart points at the buffer), while nend starts from one byte before the first '' and skips whitespaces backwards and ends at the first non-whitespace (i.e. it hits A at the beginning of the buffer). nstart == nend in this case for a single-letter name, and an off-by-one error makes it fail to pick up the name, which makes the entry equivalent to a...@example.org changed_em...@example.org without the name. I do not think this bug affected anything you observed, though. git shortlog -sne 1 A a...@example.org 1 A a...@example.org This is coming from mailmap.c::add_mapping() that downcases the e-mail address. changed_em...@example.org is mapped to a...@example.org because of this downcasing, while A a...@example.org does not have any entry for it in the .mailmap file, so it is given back as-is. Hence we see two distinct entries. So do I understand it right, we're having 2 bugs in here? One being triggered by the short name, only one character. So if you want to debug the other bug with a longer name, you can either use a made up name, or run git shortlog -sne |grep Knut in the git repository having the mailmap file already updated. The way the mailmap file is written, I'd assume only one line to be found, as of now 2 lines come up 2 Knut Franke knut.fra...@gmx.de 1 Knut Franke knut.fra...@gmx.de which seems to downcase the whole first email. -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Bug in .mailmap handling?
Junio C Hamano gits...@pobox.com writes: Stefan Beller stefanbel...@googlemail.com writes: git shortlog -sne 1 A a...@example.org 1 A a...@example.org This is coming from mailmap.c::add_mapping() that downcases the e-mail address. changed_em...@example.org is mapped to a...@example.org because of this downcasing, while A a...@example.org does not have any entry for it in the .mailmap file, so it is given back as-is. Hence we see two distinct entries. I think it is wrong for the parser to lose information by downcasing. It is perfectly fine to do the comparison case insensitively, to allow changed_em...@example.org in the input is mangled using the entry for changed_em...@example.org in the .mailmap file; it is not fine to downcase the parsed result, especially the side that is used as the rewritten result (i.e. the tokens earlier on the line), as that would mean mangling the output. Let me see if I can quickly whip up a fix. -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH] t0008: avoid SIGPIPE race condition on fifo
On Fri, Jul 12, 2013 at 09:23:54AM -0700, Junio C Hamano wrote: In that case, check-ignore gets a SIGPIPE and dies. The outer shell then tries to open the output fifo but blocks indefinitely, because there is no writer. We can fix it by keeping a descriptor open through the whole procedure. Ahh, figures. I wish I were smart enough to figure that out immediately after seeing the test that does funny things to in with 9. I wish I were smart enough to have figured it out when I was writing the funny stuff with in and 9 in the first place. :) -Peff -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 3/3] wt-status: use format function attribute for status_printf
On Fri, Jul 12, 2013 at 09:10:30AM -0700, Junio C Hamano wrote: You can fix it with -Wno-zero-format-length, so the hassle is not huge. But I am also inclined to just drop this one. We have lived without the extra safety for a long time, and list review does tend to catch such problems in practice. I am tempted to actually merge the original one as-is without any of the workaround, and just tell people to use -Wno-format-zero-length. Yeah, I think the only downside is the cognitive burden on individual developers who try -Wall and have to figure out that we need -Wno-zero-format-length (and that the warnings are not interesting). It would be nice to add it automatically to CFLAGS, but I do not know if we can reliably detect from the Makefile that we are compiling under gcc. -Peff -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Bug in .mailmap handling?
Junio C Hamano gits...@pobox.com writes: Junio C Hamano gits...@pobox.com writes: Stefan Beller stefanbel...@googlemail.com writes: git shortlog -sne 1 A a...@example.org 1 A a...@example.org This is coming from mailmap.c::add_mapping() that downcases the e-mail address. changed_em...@example.org is mapped to a...@example.org because of this downcasing, while A a...@example.org does not have any entry for it in the .mailmap file, so it is given back as-is. Hence we see two distinct entries. I think it is wrong for the parser to lose information by downcasing. It is perfectly fine to do the comparison case insensitively, to allow changed_em...@example.org in the input is mangled using the entry for changed_em...@example.org in the .mailmap file; it is not fine to downcase the parsed result, especially the side that is used as the rewritten result (i.e. the tokens earlier on the line), as that would mean mangling the output. Let me see if I can quickly whip up a fix. It might be just the matter of doing this. I suspect that we could drop the downcasing of old-email, too, but the new-email is supposed to be the rewritten result that appears on the output, and downcasing it is very wrong. I also suspect that this was an old workaround for the original string-list that did not know how to match items case insensitively, which we should have removed when we added case sensitivity support to the string-list at around 577f63e7 (Merge branch 'ap/log-mailmap', 2013-01-20). mailmap.c | 3 --- 1 file changed, 3 deletions(-) diff --git a/mailmap.c b/mailmap.c index 418081e..c64a53d 100644 --- a/mailmap.c +++ b/mailmap.c @@ -56,9 +56,6 @@ static void add_mapping(struct string_list *map, if (old_email) for (p = old_email; *p; p++) *p = tolower(*p); - if (new_email) - for (p = new_email; *p; p++) - *p = tolower(*p); if (old_email == NULL) { old_email = new_email; -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH v3] config: add support for http.url.* settings
At 06:07 -0700 12 Jul 2013, Kyle J. McKay mack...@gmail.com wrote: I don't think it's necessary to split the URL apart though. Whatever URL the user gave to git on the command line (at some point even if it's now stored as a remote setting in config) complete with URL- encoding, user names, port names, etc. is the same url, possibly shortened, that needs to be used for the http.url.option setting. This seems to be assuming that the configuration is done after the URL is entered and that URLs are always entered manually. I don't think either of those assumptions is valid. A user may want to specify http settings for all repositories on a specified host and so add settings for that host to ~/.gitconfig expecting those settings to be used later. A URL in a slightly different format may later be copy+pasted without the user realizing that it won't use that config due to one of the versions being in a non-canonical form. I think that's simple and very easy to explain and avoids user confusion and surprise while still allowing a default to be set for a site but easily overridden for a portion of that site without needing to worry about the order config files are processed or the order of the [http url] sections within them. I agree that the method is easy to explain, but I think a user may very well be surprised and confused in a scenario like I described above. And having the order not matter (in some cases) for these configuration items deviates from how other configuration values are handled. -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH] Add a testcase for checking case insensitivity of mail map
Your patch seems to do it. I added a test case which doesn't fail, if your patch is added. Signed-off-by: Stefan Beller stefanbel...@googlemail.com --- t/t4203-mailmap.sh | 33 + 1 file changed, 33 insertions(+) diff --git a/t/t4203-mailmap.sh b/t/t4203-mailmap.sh index 842b754..07152e9 100755 --- a/t/t4203-mailmap.sh +++ b/t/t4203-mailmap.sh @@ -409,4 +409,37 @@ test_expect_success 'Blame output (complex mapping)' ' test_cmp expect actual.fuzz ' +test_expect_success 'cleanup after mailmap.blob tests' ' + rm -f .mailmap +' + +cat expect \EOF + 2 A a...@example.org + 2 Other Author ot...@author.xx + 2 Santa Claus santa.cl...@northpole.xx + 1 A U Thor aut...@example.com + 1 CTO c...@company.xx + 1 Some Dude s...@dude.xx +EOF + +test_expect_success 'Test case sensitivity of Names' ' + # do a commit: + echo asdf test1 + git add test1 + git commit -a --author=A a...@example.org -m add test1 + + # commit with same name, but different email + # (different capitalization does the trick already, + # but here I am going to use a different mail) + echo asdf test2 + git add test2 + git commit -a --author=A changed_em...@example.org -m add test2 + + # Adding the line to the mailmap should make life easy, so we know + # it is the same person + echo A a...@example.org changed_em...@example.org .mailmap + + git shortlog -sne HEAD actual test_cmp expect actual +' + test_done -- 1.8.3.2.776.gfcf213d -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 7/7] push: document --lockref
Johannes Sixt j...@kdbg.org writes: We have three independent options that the user can choose in any combination: o --force given or not; o --lockref semantics enabled or not; o refspec with or without +; and these two orthogonal preconditions of the push o push is fast-forward or it is not (ff, noff); o the branch at the remote is at the expected rev or it is not (match, mismatch). Here is a table with the expected outcome. ok means that the push is allowed(*), fail means that the push is denied. (Four more lines with --force are omitted because they have ok in all spots.) ff noff ff noff match match mismatch mismatch --lockref +refspec okokdenied denied --lockref refspec ok denied denied denied I am confused with these. The latter is the most typical: git fetch git checkout topic git rebase topic git push --lockref topic where we know it is noff already, and we just want to make sure that nobody mucked with our remote while we are rebasing. If nobody updated the remote, why should this push be denied? And in order to make it succeed, you need to force with +refspec or --force, but that would bypass match/mismatch safety, which makes the whole make sure the other end is unchanged safety meaningless, no? +refspec okok ok ok This is traditional --force. refspec ok deniedok denied We are not asking for --lockref, so match/mismatch does not affect the outcome. Notice that without --lockref semantics enabled, +refspec and refspec keep the current behavior. But I do not think the above table with --lockref makes much sense. Let's look at noff/match case. That is the only interesting one. This should fail: git push topic due to no-ff. Your table above makes this fail: git push --lockref topic and the user has to force it, like this? git push --lockref --force topic ;# or alternatively git push --lockref +topic Why is it even necessary? If you make git push --lockref topic succeed in noff/match case, everything makes more sense to me. The --lockref option is merely a weaker form of --force but still a way to override the noff check. If the user wants to keep noff check, the user can simply choose not to use the option. Of course, that form should fail if mismatch. And then you can force it, git push --force [--lockref] topic As --force is anything goes, it does not matter if you give the other option on the command line. (*) As we are talking only about the client-side of the push here, I'm saying allowed instead of succeeds because the server can have additional restrictions that can make the push fail. Yes, you and I have known from the beginning that we are in agreement on that, but it is a good idea to explicitly say so for the sake of bystanders. -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 1/4] mailmap: do not lose single-letter names
In parse_name_and_email() function, there is this line: *name = (nstart nend ? nstart : NULL); When the function is given a buffer A a...@example.org old@x.z, nstart scans from the beginning of the buffer, skipping whitespaces (there isn't any, so nstart points at the buffer), while nend starts from one byte before the first '' and skips whitespaces backwards and stops at the first non-whitespace (i.e. it hits A at the beginning of the buffer). nstart == nend in this case for a single-letter name, and an off-by-one error makes it fail to pick up the name, which makes the entry equivalent to a...@example.org old@x.z without the name. Signed-off-by: Junio C Hamano gits...@pobox.com --- mailmap.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/mailmap.c b/mailmap.c index 2a7b366..418081e 100644 --- a/mailmap.c +++ b/mailmap.c @@ -122,7 +122,7 @@ static char *parse_name_and_email(char *buffer, char **name, while (nend nstart isspace(*nend)) --nend; - *name = (nstart nend ? nstart : NULL); + *name = (nstart = nend ? nstart : NULL); *email = left+1; *(nend+1) = '\0'; *right++ = '\0'; -- 1.8.3.2-941-gda9c3c8 -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 4/4] add a testcase for checking case insensitivity of mailmap
From: Stefan Beller stefanbel...@googlemail.com Signed-off-by: Stefan Beller stefanbel...@googlemail.com Signed-off-by: Junio C Hamano gits...@pobox.com --- t/t4203-mailmap.sh | 33 + 1 file changed, 33 insertions(+) diff --git a/t/t4203-mailmap.sh b/t/t4203-mailmap.sh index 842b754..07152e9 100755 --- a/t/t4203-mailmap.sh +++ b/t/t4203-mailmap.sh @@ -409,4 +409,37 @@ test_expect_success 'Blame output (complex mapping)' ' test_cmp expect actual.fuzz ' +test_expect_success 'cleanup after mailmap.blob tests' ' + rm -f .mailmap +' + +cat expect \EOF + 2 A a...@example.org + 2 Other Author ot...@author.xx + 2 Santa Claus santa.cl...@northpole.xx + 1 A U Thor aut...@example.com + 1 CTO c...@company.xx + 1 Some Dude s...@dude.xx +EOF + +test_expect_success 'Test case sensitivity of Names' ' + # do a commit: + echo asdf test1 + git add test1 + git commit -a --author=A a...@example.org -m add test1 + + # commit with same name, but different email + # (different capitalization does the trick already, + # but here I am going to use a different mail) + echo asdf test2 + git add test2 + git commit -a --author=A changed_em...@example.org -m add test2 + + # Adding the line to the mailmap should make life easy, so we know + # it is the same person + echo A a...@example.org changed_em...@example.org .mailmap + + git shortlog -sne HEAD actual test_cmp expect actual +' + test_done -- 1.8.3.2-941-gda9c3c8 -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 0/4] Microfixes to mailmap
There were a few bugs in mailmap parsing and matching code. Junio C Hamano (3): mailmap: do not lose single-letter names mailmap: do not downcase mailmap entries mailmap: style fixes Stefan Beller (1): Add a testcase for checking case insensitivity of mailmap mailmap.c | 37 +++-- t/t4203-mailmap.sh | 33 + 2 files changed, 52 insertions(+), 18 deletions(-) -- 1.8.3.2-941-gda9c3c8 -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 3/4] mailmap: style fixes
Wrap overlong lines and format the multi-line comments to match our coding style. Signed-off-by: Junio C Hamano gits...@pobox.com --- mailmap.c | 15 ++- 1 file changed, 10 insertions(+), 5 deletions(-) diff --git a/mailmap.c b/mailmap.c index a7e92db..5a3347d 100644 --- a/mailmap.c +++ b/mailmap.c @@ -37,7 +37,8 @@ static void free_mailmap_info(void *p, const char *s) static void free_mailmap_entry(void *p, const char *s) { struct mailmap_entry *me = (struct mailmap_entry *)p; - debug_mm(mailmap: removing entries for %s, with %d sub-entries\n, s, me-namemap.nr); + debug_mm(mailmap: removing entries for %s, with %d sub-entries\n, +s, me-namemap.nr); debug_mm(mailmap: - simple: '%s' %s\n, me-name, me-email); free(me-name); free(me-email); @@ -73,7 +74,8 @@ static void add_mapping(struct string_list *map, } if (old_name == NULL) { - debug_mm(mailmap: adding (simple) entry for %s at index %d\n, old_email, index); + debug_mm(mailmap: adding (simple) entry for %s at index %d\n, +old_email, index); /* Replace current name and new email for simple entry */ if (new_name) { free(me-name); @@ -85,7 +87,8 @@ static void add_mapping(struct string_list *map, } } else { struct mailmap_info *mi = xcalloc(1, sizeof(struct mailmap_info)); - debug_mm(mailmap: adding (complex) entry for %s at index %d\n, old_email, index); + debug_mm(mailmap: adding (complex) entry for %s at index %d\n, +old_email, index); if (new_name) mi-name = xstrdup(new_name); if (new_email) @@ -315,8 +318,10 @@ int map_user(struct string_list *map, if (item != NULL) { me = (struct mailmap_entry *)item-util; if (me-namemap.nr) { - /* The item has multiple items, so we'll look up on name too */ - /* If the name is not found, we choose the simple entry */ + /* +* The item has multiple items, so we'll look up on name too. +* If the name is not found, we choose the simple entry. +*/ struct string_list_item *subitem; subitem = lookup_prefix(me-namemap, *name, *namelen); if (subitem) -- 1.8.3.2-941-gda9c3c8 -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
[PATCH 2/4] mailmap: do not downcase mailmap entries
The email addresses in the records read from the .mailmap file are downcased very early, and then used to match against e-mail addresses in the input. Because we do use case insensitive version of string list to manage these entries, there is no need to do this, and worse yet, downcasing the rewritten/canonical e-mail read from the .mailmap file loses information. Stop doing that, and also make the string list used to keep multiple names for an mailmap entry case insensitive (the code that uses the list, lookup_prefix(), expects a case insensitive match). Signed-off-by: Junio C Hamano gits...@pobox.com --- mailmap.c | 20 1 file changed, 8 insertions(+), 12 deletions(-) diff --git a/mailmap.c b/mailmap.c index 418081e..a7e92db 100644 --- a/mailmap.c +++ b/mailmap.c @@ -51,14 +51,6 @@ static void add_mapping(struct string_list *map, { struct mailmap_entry *me; int index; - char *p; - - if (old_email) - for (p = old_email; *p; p++) - *p = tolower(*p); - if (new_email) - for (p = new_email; *p; p++) - *p = tolower(*p); if (old_email == NULL) { old_email = new_email; @@ -68,13 +60,17 @@ static void add_mapping(struct string_list *map, if ((index = string_list_find_insert_index(map, old_email, 1)) 0) { /* mailmap entry exists, invert index value */ index = -1 - index; + me = (struct mailmap_entry *)map-items[index].util; } else { /* create mailmap entry */ - struct string_list_item *item = string_list_insert_at_index(map, index, old_email); - item-util = xcalloc(1, sizeof(struct mailmap_entry)); - ((struct mailmap_entry *)item-util)-namemap.strdup_strings = 1; + struct string_list_item *item; + + item = string_list_insert_at_index(map, index, old_email); + me = xcalloc(1, sizeof(struct mailmap_entry)); + me-namemap.strdup_strings = 1; + me-namemap.cmp = strcasecmp; + item-util = me; } - me = (struct mailmap_entry *)map-items[index].util; if (old_name == NULL) { debug_mm(mailmap: adding (simple) entry for %s at index %d\n, old_email, index); -- 1.8.3.2-941-gda9c3c8 -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [PATCH 4/4] add a testcase for checking case insensitivity of mailmap
On Fri, Jul 12, 2013 at 5:38 PM, Junio C Hamano gits...@pobox.com wrote: From: Stefan Beller stefanbel...@googlemail.com Signed-off-by: Stefan Beller stefanbel...@googlemail.com Signed-off-by: Junio C Hamano gits...@pobox.com --- t/t4203-mailmap.sh | 33 + 1 file changed, 33 insertions(+) diff --git a/t/t4203-mailmap.sh b/t/t4203-mailmap.sh index 842b754..07152e9 100755 --- a/t/t4203-mailmap.sh +++ b/t/t4203-mailmap.sh @@ -409,4 +409,37 @@ test_expect_success 'Blame output (complex mapping)' ' test_cmp expect actual.fuzz ' +test_expect_success 'cleanup after mailmap.blob tests' ' + rm -f .mailmap +' This test was copied from earlier in the file, but the description was not updated; it has nothing to do with mailmap.blob tests for which cleanup has already taken place. It's also misleading since no .mailmap file exists at this point. Instead, configuration key mailmap.file is set to internal_mailmap/.mailmap, so if you need to clean up anything, it would be that. (You're also recreating .mailmap from scratch directly in your test via echo foo .mailmap, so this test doesn't really add anything.) +cat expect \EOF + 2 A a...@example.org + 2 Other Author ot...@author.xx + 2 Santa Claus santa.cl...@northpole.xx + 1 A U Thor aut...@example.com + 1 CTO c...@company.xx + 1 Some Dude s...@dude.xx +EOF + +test_expect_success 'Test case sensitivity of Names' ' s/Names/names/ Also, most of the test descriptions in this script start with lowercase, so s/Test/test/ would be appropriate. + # do a commit: + echo asdf test1 Git practice normally avoids whitespace after the redirection operator. This particular test script, has a mix of foo and foo, but we may want to avoid adding more of the latter form. + git add test1 + git commit -a --author=A a...@example.org -m add test1 + + # commit with same name, but different email + # (different capitalization does the trick already, + # but here I am going to use a different mail) Without context of the mailing list discussion, the reader does not know what trick is or precisely how this may have failed in the past. It's also not clear why the test is being performed with a different email address entirely rather than one which differs only by case (which is what the test description talks about). + echo asdf test2 Whitespace after redirection. + git add test2 + git commit -a --author=A changed_em...@example.org -m add test2 + + # Adding the line to the mailmap should make life easy, so we know + # it is the same person The comment can probably be dropped, as the new .mailmap entry is self-explanatory. + echo A a...@example.org changed_em...@example.org .mailmap Whitespace after redirection. + + git shortlog -sne HEAD actual test_cmp expect actual For consistency with more other tests, perhaps break the line apart: git shortlog -sne HEAD actual test_cmp expect actual +' + test_done -- 1.8.3.2-941-gda9c3c8 -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: [RFC/PATCH] fetch: make --prune configurable
Here is my previous review comments in a squashable patch form. The result seems to pass all 27 combinations (fetch.prune, remote.*.prune and command line all are tristate yes/no/unspecified). Without the fix-up in *.c files, three combinations seem to fail. Documentation/config.txt | 3 +- builtin/fetch.c | 41 +--- remote.c | 1 + t/t5510-fetch.sh | 118 --- 4 files changed, 108 insertions(+), 55 deletions(-) diff --git a/Documentation/config.txt b/Documentation/config.txt index fc39f3a..e4ce7c4 100644 --- a/Documentation/config.txt +++ b/Documentation/config.txt @@ -1051,7 +1051,7 @@ fetch.unpackLimit:: fetch.prune:: If true, fetch will automatically behave as if the `--prune` - option was given on the command line. + option was given on the command line. See also `remote.name.prune`. format.attach:: Enable multipart/mixed attachments as the default for @@ -1992,6 +1992,7 @@ remote.name.prune:: When set to true, fetching from this remote by default will also remove any remote-tracking branches which no longer exist on the remote (as if the `--prune` option was give on the command line). + Overrides `fetch.prune` settings, if any. remotes.group:: The list of remotes which are fetched by git remote update diff --git a/builtin/fetch.c b/builtin/fetch.c index 082450b..08ab948 100644 --- a/builtin/fetch.c +++ b/builtin/fetch.c @@ -30,13 +30,10 @@ enum { TAGS_SET = 2 }; -enum { - PRUNE_UNSET = 0, - PRUNE_DEFAULT = 1, - PRUNE_FORCE = 2 -}; +static int fetch_prune_config = -1; /* unspecified */ +static int prune = -1; /* unspecified */ +#define PRUNE_BY_DEFAULT 0 /* do we prune by default? */ -static int prune = PRUNE_DEFAULT; static int all, append, dry_run, force, keep, multiple, update_head_ok, verbosity; static int progress = -1, recurse_submodules = RECURSE_SUBMODULES_DEFAULT; static int tags = TAGS_DEFAULT, unshallow; @@ -64,12 +61,10 @@ static int option_parse_recurse_submodules(const struct option *opt, static int git_fetch_config(const char *k, const char *v, void *cb) { if (!strcmp(k, fetch.prune)) { - int boolval = git_config_bool(k, v); - if (boolval) - prune = PRUNE_FORCE; + fetch_prune_config = git_config_bool(k, v); return 0; } - return git_default_config(k, v, cb); + return 0; } static struct option builtin_fetch_options[] = { @@ -87,8 +82,8 @@ static struct option builtin_fetch_options[] = { N_(fetch all tags and associated objects), TAGS_SET), OPT_SET_INT('n', NULL, tags, N_(do not fetch all tags (--no-tags)), TAGS_UNSET), - OPT_BOOLEAN('p', prune, prune, - N_(prune remote-tracking branches no longer on remote)), + OPT_BOOL('p', prune, prune, +N_(prune remote-tracking branches no longer on remote)), { OPTION_CALLBACK, 0, recurse-submodules, NULL, N_(on-demand), N_(control recursive fetching of submodules), PARSE_OPT_OPTARG, option_parse_recurse_submodules }, @@ -756,8 +751,11 @@ static int do_fetch(struct transport *transport, free_refs(ref_map); return 1; } - if (prune == PRUNE_FORCE || (transport-remote-prune prune)) { - /* If --tags was specified, pretend the user gave us the canonical tags refspec */ + if (prune) { + /* +* If --tags was specified, pretend that the user gave us +* the canonical tags refspec +*/ if (tags == TAGS_SET) { const char *tags_str = refs/tags/*:refs/tags/*; struct refspec *tags_refspec, *refspec; @@ -866,10 +864,8 @@ static void add_options_to_argv(struct argv_array *argv) { if (dry_run) argv_array_push(argv, --dry-run); - if (prune == PRUNE_FORCE) + if (prune 0) argv_array_push(argv, --prune); - else if (prune == PRUNE_UNSET) - argv_array_push(argv, --no-prune); if (update_head_ok) argv_array_push(argv, --update-head-ok); if (force) @@ -936,6 +932,17 @@ static int fetch_one(struct remote *remote, int argc, const char **argv) remote name from which new revisions should be fetched.)); transport = transport_get(remote, NULL); + + if (prune 0) { + /* no command line request */ + if (0 = transport-remote-prune) + prune = transport-remote-prune; + else if (0 = fetch_prune_config) + prune = fetch_prune_config; + else + prune = PRUNE_BY_DEFAULT; + } +
Re: [PATCH 4/4] add a testcase for checking case insensitivity of mailmap
On Fri, Jul 12, 2013 at 5:38 PM, Junio C Hamano gits...@pobox.com wrote: From: Stefan Beller stefanbel...@googlemail.com Signed-off-by: Stefan Beller stefanbel...@googlemail.com Signed-off-by: Junio C Hamano gits...@pobox.com --- t/t4203-mailmap.sh | 33 + 1 file changed, 33 insertions(+) diff --git a/t/t4203-mailmap.sh b/t/t4203-mailmap.sh index 842b754..07152e9 100755 --- a/t/t4203-mailmap.sh +++ b/t/t4203-mailmap.sh +test_expect_success 'Test case sensitivity of Names' ' + # do a commit: + echo asdf test1 + git add test1 + git commit -a --author=A a...@example.org -m add test1 + + # commit with same name, but different email + # (different capitalization does the trick already, + # but here I am going to use a different mail) + echo asdf test2 + git add test2 + git commit -a --author=A changed_em...@example.org -m add test2 + + # Adding the line to the mailmap should make life easy, so we know + # it is the same person + echo A a...@example.org changed_em...@example.org .mailmap + + git shortlog -sne HEAD actual test_cmp expect actual +' I forgot to mention that the -chain is broken (missing) in the entire test (except for the last line). -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
What's cooking in git.git (Jul 2013, #05; Fri, 12)
Here are the topics that have been cooking. Commits prefixed with '-' are only in 'pu' (proposed updates) while commits prefixed with '+' are in 'next'. As many of you may know, I hate to do back-to-back What's cooking, but we ended up acquiring many new topics, and many existing ones have moved from 'pu' to 'next' and 'next' to 'master'. Those that graduated to 'master' are mostly code and documentation clean-up patches. You can find the changes described here in the integration branches of the repositories listed at http://git-blame.blogspot.com/p/git-public-repositories.html -- [Graduated to master] * as/log-output-encoding-in-user-format (2013-07-05) 11 commits (merged to 'next' on 2013-07-08 at 2e1bdd9) + t4205 (log-pretty-formats): avoid using `sed` + t6006 (rev-list-format): add tests for %b and %s for the case i18n.commitEncoding is not set + t4205, t6006, t7102: make functions better readable + t4205 (log-pretty-formats): revert back single quotes (merged to 'next' on 2013-07-05 at d2c99e5) + t4041, t4205, t6006, t7102: use iso8859-1 rather than iso-8859-1 (merged to 'next' on 2013-07-01 at 3318aa8) + t4205: replace .\+ with ..* in sed commands (merged to 'next' on 2013-06-28 at 4063330) + pretty: --format output should honor logOutputEncoding + pretty: Add failing tests: --format output should honor logOutputEncoding + t4205 (log-pretty-formats): don't hardcode SHA-1 in expected outputs + t7102 (reset): don't hardcode SHA-1 in expected outputs + t6006 (rev-list-format): don't hardcode SHA-1 in expected outputs log --format= did not honor i18n.logoutputencoding configuration and this attempts to fix it. * ft/diff-rename-default-score-is-half (2013-07-05) 1 commit (merged to 'next' on 2013-07-09 at 6a6b57e) + diff-options: document default similarity index * jc/remote-http-argv-array (2013-07-09) 1 commit (merged to 'next' on 2013-07-11 at 7fbe8bd) + remote-http: use argv-array * jk/maint-config-multi-order (2013-07-07) 1 commit (merged to 'next' on 2013-07-09 at 0db1db9) + git-config(1): clarify precedence of multiple values * jk/pull-to-integrate (2013-07-08) 2 commits (merged to 'next' on 2013-07-09 at 2ecac24) + pull: change the description to integrate changes + push: avoid suggesting merging remote changes * ml/cygwin-does-not-have-fifo (2013-07-05) 1 commit (merged to 'next' on 2013-07-09 at 7d6849d) + test-lib.sh - cygwin does not have usable FIFOs * ms/remote-tracking-branches-in-doc (2013-07-03) 1 commit (merged to 'next' on 2013-07-09 at 411a8bd) + Change remote tracking to remote-tracking * rr/name-rev-stdin-doc (2013-07-07) 1 commit (merged to 'next' on 2013-07-09 at 7cfbff6) + name-rev doc: rewrite --stdin paragraph * rs/pickaxe-simplify (2013-07-07) 1 commit (merged to 'next' on 2013-07-11 at c5972f7) + diffcore-pickaxe: simplify has_changes and contains * tf/gitweb-extra-breadcrumbs (2013-07-04) 1 commit (merged to 'next' on 2013-07-09 at 525331b) + gitweb: allow extra breadcrumbs to prefix the trail An Gitweb installation that is a part of larger site can optionally show extra links that point at the levels higher than the Gitweb pages itself in the link hierarchy of pages. * tr/test-lint-no-export-assignment-in-shell (2013-07-08) 2 commits (merged to 'next' on 2013-07-09 at 6f10ea2) + test-lint: detect 'export FOO=bar' + t9902: fix 'test A == B' to use = operator -- [New Topics] * es/check-mailmap (2013-07-11) 2 commits - t4203: test check-mailmap command invocation - builtin: add git-check-mailmap command A new command to allow scripts to query the mailmap information. Expecting a reroll to lose the -z option. * jc/check-x-z (2013-07-11) 4 commits - check-attr -z: a single -z should apply to both input and output - check-ignore -z: a single -z should apply to both input and output - check-attr: the name of the character is NUL, not NULL - check-ignore: the name of the character is NUL, not NULL git check-ignore -z applied the NUL termination to both its input (with --stdin) and its output, but git check-attr -z ignored the option on the output side. This is potentially a backward incompatible fix. I am tempted to merge this to and keep it in 'next' for a while to see if anybody screams before deciding if we want to do anything to help existing users (there may be none). * jk/cat-file-batch-optim (2013-07-12) 8 commits - sha1_object_info_extended: pass object_info to helpers - sha1_object_info_extended: make type calculation optional - packed_object_info: make type lookup optional - packed_object_info: hoist delta type resolution to helper - sha1_loose_object_info: make type lookup optional - sha1_object_info_extended: rename status to type - cat-file: disable object/refname ambiguity check for batch mode - Merge branch 'nd/warn-ambiguous-object-name' into