Re: [PATCH v2 3/4] t4203: test mailmap functionality directly rather than indirectly

2013-07-12 Thread Eric Sunshine
On Fri, Jul 12, 2013 at 1:48 AM, Junio C Hamano gits...@pobox.com wrote:
 Jonathan Nieder jrnie...@gmail.com writes:

 My current thinking is no --- the patch has as a justification Now
 we can test these aspects of .mailmap handling directly with a
 low-level tool instead of using the tool most people will use, so do
 so, which sounds an awful lot like Reduce test coverage of commonly
 used tools, because we can.

 Yes, that was exactly my reaction that prompted my response.

Does any of my follow-up commentary result in a different reaction? If
not, I'll drop patches 3 4 in the re-roll.
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 0/7] cat-file --batch-check performance improvements

2013-07-12 Thread Jeff King
In my earlier series introducing git cat-file --batch-check=format,
found here:

  http://thread.gmane.org/gmane.comp.version-control.git/229761/focus=230041

I spent a little time optimizing revindex generation, and measured by
requesting information on a single object from a large repository. This
series takes the next logical step: requesting a large number of objects
from a large repository.

There are two major optimizations here:

  1. Avoiding extra ref lookups due to the warning in 798c35f (get_sha1:
 warn about full or short object names that look like refs,
 2013-05-29).

  2. Avoiding extra work for delta type resolution when the user has not
 asked for %(objecttype).

I prepared the series on top of jk/in-pack-size-measurement, and
certainly optimization 2 is pointless without it (before that topic,
--batch-check always printed the type).

However, the first optimization affects regular --batch-check, and
represents a much more serious performance regression. It looks like
798c35f is in master, but hasn't been released yet, so assuming these
topics graduate before the next release, it should be OK. But if not, we
should consider pulling the first patch out and applying it (or
something like it) separately.

The results for running (in linux.git):

  $ git rev-list --objects --all objects
  $ git cat-file --batch-check='%(objectsize:disk)' objects /dev/null

are:

before after
  real  1m17.143s  0m7.205s
  user  0m27.684s  0m6.580s
  sys   0m49.320s  0m0.608s

Now, _most_ of that speedup is coming from the first patch, and it's
quite trivial. The rest of the patches involve a lot of refactoring, and
only manage to eke out one more second of performance, so it may not be
worth it (though I think the result actually cleans up the
sha1_object_info_extended interface a bit, and is worth it). Individual
timings are in the commit messages.

The patches are:

  [1/7]: cat-file: disable object/refname ambiguity check for batch mode

Optimization 1.

  [2/7]: sha1_object_info_extended: rename status to type
  [3/7]: sha1_loose_object_info: make type lookup optional
  [4/7]: packed_object_info: hoist delta type resolution to helper
  [5/7]: packed_object_info: make type lookup optional
  [6/7]: sha1_object_info_extended: make type calculation optional

Optimization 2.

  [7/7]: sha1_object_info_extended: pass object_info to helpers

Optional cleanup.

-Peff
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 4/4] check-attr -z: a single -z should apply to both input and output

2013-07-12 Thread Junio C Hamano
Unless a command has separate --nul-terminated-{input,output}
options, the --nul-terminated-records (-z) option should apply
to both input and output for consistency.  The caller knows that its
input paths may need to be protected for LF, and the program shows
these problematic paths to its output.

Signed-off-by: Junio C Hamano gits...@pobox.com
---
 Documentation/git-check-attr.txt |  9 +++--
 builtin/check-attr.c | 14 +++---
 2 files changed, 18 insertions(+), 5 deletions(-)

diff --git a/Documentation/git-check-attr.txt b/Documentation/git-check-attr.txt
index 5abdbaa..760aca9 100644
--- a/Documentation/git-check-attr.txt
+++ b/Documentation/git-check-attr.txt
@@ -31,8 +31,9 @@ OPTIONS
Read file names from stdin instead of from the command-line.
 
 -z::
-   Only meaningful with `--stdin`; paths are separated with a
-   NUL character instead of a linefeed character.
+   The output format is modified to be machine-parseable.
+   If `--stdin` is also given, input paths are separated
+   with a NUL character instead of a linefeed character.
 
 \--::
Interpret all preceding arguments as attributes and all following
@@ -48,6 +49,10 @@ OUTPUT
 The output is of the form:
 path COLON SP attribute COLON SP info LF
 
+unless `-z` is in effect, in which case NUL is used as delimiter:
+path NUL attribute NUL info NUL
+
+
 path is the path of a file being queried, attribute is an attribute
 being queried and info can be either:
 
diff --git a/builtin/check-attr.c b/builtin/check-attr.c
index 7cc9b5d..cd46690 100644
--- a/builtin/check-attr.c
+++ b/builtin/check-attr.c
@@ -20,7 +20,7 @@ static const struct option check_attr_options[] = {
OPT_BOOLEAN(0,  cached, cached_attrs, N_(use .gitattributes only 
from the index)),
OPT_BOOLEAN(0 , stdin, stdin_paths, N_(read file names from 
stdin)),
OPT_BOOLEAN('z', NULL, nul_term_line,
-   N_(input paths are terminated by a NUL character)),
+   N_(terminate input and output records by a NUL 
character)),
OPT_END()
 };
 
@@ -38,8 +38,16 @@ static void output_attr(int cnt, struct git_attr_check 
*check,
else if (ATTR_UNSET(value))
value = unspecified;
 
-   quote_c_style(file, NULL, stdout, 0);
-   printf(: %s: %s\n, git_attr_name(check[j].attr), value);
+   if (nul_term_line) {
+   printf(%s%c /* path */
+  %s%c /* attrname */
+  %s%c /* attrvalue */,
+  file, 0, git_attr_name(check[j].attr), 0, value, 
0);
+   } else {
+   quote_c_style(file, NULL, stdout, 0);
+   printf(: %s: %s\n, git_attr_name(check[j].attr), 
value);
+   }
+
}
 }
 
-- 
1.8.3.2-911-g2c4daa5

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 2/4] check-attr: the name of the character is NUL, not NULL

2013-07-12 Thread Junio C Hamano
Signed-off-by: Junio C Hamano gits...@pobox.com
---
 builtin/check-attr.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/builtin/check-attr.c b/builtin/check-attr.c
index 075d01d..7cc9b5d 100644
--- a/builtin/check-attr.c
+++ b/builtin/check-attr.c
@@ -13,14 +13,14 @@ N_(git check-attr --stdin [-z] [-a | --all | attr...]  
list-of-paths),
 NULL
 };
 
-static int null_term_line;
+static int nul_term_line;
 
 static const struct option check_attr_options[] = {
OPT_BOOLEAN('a', all, all_attrs, N_(report all attributes set on 
file)),
OPT_BOOLEAN(0,  cached, cached_attrs, N_(use .gitattributes only 
from the index)),
OPT_BOOLEAN(0 , stdin, stdin_paths, N_(read file names from 
stdin)),
-   OPT_BOOLEAN('z', NULL, null_term_line,
-   N_(input paths are terminated by a null character)),
+   OPT_BOOLEAN('z', NULL, nul_term_line,
+   N_(input paths are terminated by a NUL character)),
OPT_END()
 };
 
@@ -65,7 +65,7 @@ static void check_attr_stdin_paths(const char *prefix, int 
cnt,
struct git_attr_check *check)
 {
struct strbuf buf, nbuf;
-   int line_termination = null_term_line ? 0 : '\n';
+   int line_termination = nul_term_line ? 0 : '\n';
 
strbuf_init(buf, 0);
strbuf_init(nbuf, 0);
-- 
1.8.3.2-911-g2c4daa5

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 0/4] Make check-{attr,ignore} -z consistent

2013-07-12 Thread Junio C Hamano
A command that has to deal with input/output that may contain LF
needs to offer the -z (--nul-terminated-records) option, and if it
does not support separate --nul-terminated-{input,output} options,
the -z option should govern both input and output.  A caller that
uses -z knows that the paths it feeds to these commands as input
may have LF that cannot be expressed in LF delimited input format,
and the output from these commands do contain the same paths, so
there is no way for their output to be expressed unambiguously for
an input that requires -z.

Unfortunately, git check-attr -z was broken and ignored the option
on the output side.  This is a backward-incompatible fix, so we may
need to add a checkAttr.brokenZ configuration to allow people to
keep the existing breakage on top of these fixes, and then flip the
default at Git 2.0 boundary (sometime early next year).

Credit goes to Eric Sunshine for finding this discrepancy
($gmane/230158).

Junio C Hamano (4):
  check-ignore: the name of the character is NUL, not NULL
  check-attr: the name of the character is NUL, not NULL
  check-ignore -z: a single -z should apply to both input and output
  check-attr -z: a single -z should apply to both input and output

 Documentation/git-check-attr.txt |  9 +++--
 builtin/check-attr.c | 20 ++--
 builtin/check-ignore.c   | 12 ++--
 3 files changed, 27 insertions(+), 14 deletions(-)

-- 
1.8.3.2-911-g2c4daa5

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 1/4] check-ignore: the name of the character is NUL, not NULL

2013-07-12 Thread Junio C Hamano
Signed-off-by: Junio C Hamano gits...@pobox.com
---
 builtin/check-ignore.c | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/builtin/check-ignore.c b/builtin/check-ignore.c
index 0240f99..be22bce 100644
--- a/builtin/check-ignore.c
+++ b/builtin/check-ignore.c
@@ -12,7 +12,7 @@ static const char * const check_ignore_usage[] = {
 NULL
 };
 
-static int null_term_line;
+static int nul_term_line;
 
 static const struct option check_ignore_options[] = {
OPT__QUIET(quiet, N_(suppress progress reporting)),
@@ -20,8 +20,8 @@ static const struct option check_ignore_options[] = {
OPT_GROUP(),
OPT_BOOLEAN(0, stdin, stdin_paths,
N_(read file names from stdin)),
-   OPT_BOOLEAN('z', NULL, null_term_line,
-   N_(input paths are terminated by a null character)),
+   OPT_BOOLEAN('z', NULL, nul_term_line,
+   N_(input paths are terminated by a NUL character)),
OPT_END()
 };
 
@@ -29,7 +29,7 @@ static void output_exclude(const char *path, struct exclude 
*exclude)
 {
char *bang  = exclude-flags  EXC_FLAG_NEGATIVE  ? ! : ;
char *slash = exclude-flags  EXC_FLAG_MUSTBEDIR ? / : ;
-   if (!null_term_line) {
+   if (!nul_term_line) {
if (!verbose) {
write_name_quoted(path, stdout, '\n');
} else {
@@ -111,7 +111,7 @@ static int check_ignore_stdin_paths(const char *prefix)
struct strbuf buf, nbuf;
char **pathspec = NULL;
size_t nr = 0, alloc = 0;
-   int line_termination = null_term_line ? 0 : '\n';
+   int line_termination = nul_term_line ? 0 : '\n';
int num_ignored;
 
strbuf_init(buf, 0);
@@ -150,7 +150,7 @@ int cmd_check_ignore(int argc, const char **argv, const 
char *prefix)
if (argc  0)
die(_(cannot specify pathnames with --stdin));
} else {
-   if (null_term_line)
+   if (nul_term_line)
die(_(-z only makes sense with --stdin));
if (argc == 0)
die(_(no path specified));
-- 
1.8.3.2-911-g2c4daa5

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 3/4] check-ignore -z: a single -z should apply to both input and output

2013-07-12 Thread Junio C Hamano
Unless a command has separate --nul-terminated-{input,output}
options, the --nul-terminated-records (-z) option should apply
to both input and output for consistency.  The caller knows that its
input paths may need to be protected for LF, and the program shows
these problematic paths to its output.

The code already did the right thing.  Only the help text needs
fixing.

Signed-off-by: Junio C Hamano gits...@pobox.com
---
 builtin/check-ignore.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/builtin/check-ignore.c b/builtin/check-ignore.c
index be22bce..03e509e 100644
--- a/builtin/check-ignore.c
+++ b/builtin/check-ignore.c
@@ -21,7 +21,7 @@ static const struct option check_ignore_options[] = {
OPT_BOOLEAN(0, stdin, stdin_paths,
N_(read file names from stdin)),
OPT_BOOLEAN('z', NULL, nul_term_line,
-   N_(input paths are terminated by a NUL character)),
+   N_(terminate input and output records by a NUL 
character)),
OPT_END()
 };
 
-- 
1.8.3.2-911-g2c4daa5

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 1/7] cat-file: disable object/refname ambiguity check for batch mode

2013-07-12 Thread Jeff King
A common use of cat-file --batch-check is to feed a list
of objects from rev-list --objects or a similar command.
In this instance, all of our input objects are 40-byte sha1
ids. However, cat-file has always allowed arbitrary revision
specifiers, and feeds the result to get_sha1().

Fortunately, get_sha1() recognizes a 40-byte sha1 before
doing any hard work trying to look up refs, meaning this
scenario should end up spending very little time converting
the input into an object sha1. However, since 798c35f
(get_sha1: warn about full or short object names that look
like refs, 2013-05-29), when we encounter this case, we
spend the extra effort to do a refname lookup anyway, just
to print a warning. This is further exacerbated by ca91993
(get_packed_ref_cache: reload packed-refs file when it
changes, 2013-06-20), which makes individual ref lookup more
expensive by requiring a stat() of the packed-refs file for
each missing ref.

With no patches, this is the time it takes to run:

  $ git rev-list --objects --all objects
  $ time git cat-file --batch-check='%(objectname)' objects

on the linux.git repository:

  real1m13.494s
  user0m25.924s
  sys 0m47.532s

If we revert ca91993, the packed-refs up-to-date check, it
gets a little better:

  real0m54.697s
  user0m21.692s
  sys 0m32.916s

but we are still spending quite a bit of time on ref lookup
(and we would not want to revert that patch, anyway, which
has correctness issues).  If we revert 798c35f, disabling
the warning entirely, we get a much more reasonable time:

  real0m7.452s
  user0m6.836s
  sys 0m0.608s

This patch does the moral equivalent of this final case (and
gets similar speedups). We introduce a global flag that
callers of get_sha1() can use to avoid paying the price for
the warning.

Signed-off-by: Jeff King p...@peff.net
---
The solution feels a little hacky, but I'm not sure there is a better
one short of reverting the warning entirely.

We could also tie it to warn_ambiguous_refs (or add another config
variable), but I don't think that makes sense. It is not about do I
care about ambiguities in this repository, but rather I am going to
do a really large number of sha1 resolutions, and I do not want to pay
the price in this particular code path for the warning).

There may be other sites that resolve a large number of refs and run
into this, but I couldn't think of any. Doing for_each_ref would not
have the same problem, as we already know they are refs there.

 builtin/cat-file.c |  9 +
 cache.h|  1 +
 environment.c  |  1 +
 sha1_name.c| 14 --
 4 files changed, 19 insertions(+), 6 deletions(-)

diff --git a/builtin/cat-file.c b/builtin/cat-file.c
index 0e64b41..fe5c77f 100644
--- a/builtin/cat-file.c
+++ b/builtin/cat-file.c
@@ -266,6 +266,15 @@ static int batch_objects(struct batch_options *opt)
strbuf_expand(buf, opt-format, expand_format, data);
data.mark_query = 0;
 
+   /*
+* We are going to call get_sha1 on a potentially very large number of
+* objects. In most large cases, these will be actual object sha1s. The
+* cost to double-check that each one is not also a ref (just so we can
+* warn) ends up dwarfing the actual cost of the object lookups
+* themselves. We can work around it by just turning off the warning.
+*/
+   warn_on_object_refname_ambiguity = 0;
+
while (strbuf_getline(buf, stdin, '\n') != EOF) {
char *p;
int error;
diff --git a/cache.h b/cache.h
index 2d06169..c1fd82c 100644
--- a/cache.h
+++ b/cache.h
@@ -575,6 +575,7 @@ extern int warn_ambiguous_refs;
 extern int prefer_symlink_refs;
 extern int log_all_ref_updates;
 extern int warn_ambiguous_refs;
+extern int warn_on_object_refname_ambiguity;
 extern int shared_repository;
 extern const char *apply_default_whitespace;
 extern const char *apply_default_ignorewhitespace;
diff --git a/environment.c b/environment.c
index 0cb67b2..5398c36 100644
--- a/environment.c
+++ b/environment.c
@@ -22,6 +22,7 @@ int warn_ambiguous_refs = 1;
 int is_bare_repository_cfg = -1; /* unspecified */
 int log_all_ref_updates = -1; /* unspecified */
 int warn_ambiguous_refs = 1;
+int warn_on_object_refname_ambiguity = 1;
 int repository_format_version;
 const char *git_commit_encoding;
 const char *git_log_output_encoding;
diff --git a/sha1_name.c b/sha1_name.c
index 90419ef..6f8812a 100644
--- a/sha1_name.c
+++ b/sha1_name.c
@@ -452,13 +452,15 @@ static int get_sha1_basic(const char *str, int len, 
unsigned char *sha1)
int at, reflog_len, nth_prior = 0;
 
if (len == 40  !get_sha1_hex(str, sha1)) {
-   refs_found = dwim_ref(str, len, tmp_sha1, real_ref);
-   if (refs_found  0  warn_ambiguous_refs) {
-   warning(warn_msg, len, str);
-   if (advice_object_name_warning)
-   fprintf(stderr, %s\n, 

[PATCH 2/7] sha1_object_info_extended: rename status to type

2013-07-12 Thread Jeff King
The value we get from each low-level object_info function
(e.g., loose, packed) is actually the object type (or -1 for
error). Let's explicitly call it type, which will make
further refactorings easier to read.

Signed-off-by: Jeff King p...@peff.net
---
 sha1_file.c | 20 ++--
 1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/sha1_file.c b/sha1_file.c
index 4c2365f..e826066 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -2394,7 +2394,7 @@ int sha1_object_info_extended(const unsigned char *sha1, 
struct object_info *oi)
 {
struct cached_object *co;
struct pack_entry e;
-   int status, rtype;
+   int type, rtype;
 
co = find_cached_object(sha1);
if (co) {
@@ -2408,23 +2408,23 @@ int sha1_object_info_extended(const unsigned char 
*sha1, struct object_info *oi)
 
if (!find_pack_entry(sha1, e)) {
/* Most likely it's a loose object. */
-   status = sha1_loose_object_info(sha1, oi-sizep, 
oi-disk_sizep);
-   if (status = 0) {
+   type = sha1_loose_object_info(sha1, oi-sizep, oi-disk_sizep);
+   if (type = 0) {
oi-whence = OI_LOOSE;
-   return status;
+   return type;
}
 
/* Not a loose object; someone else may have just packed it. */
reprepare_packed_git();
if (!find_pack_entry(sha1, e))
-   return status;
+   return type;
}
 
-   status = packed_object_info(e.p, e.offset, oi-sizep, rtype,
-   oi-disk_sizep);
-   if (status  0) {
+   type = packed_object_info(e.p, e.offset, oi-sizep, rtype,
+ oi-disk_sizep);
+   if (type  0) {
mark_bad_packed_object(e.p, sha1);
-   status = sha1_object_info_extended(sha1, oi);
+   type = sha1_object_info_extended(sha1, oi);
} else if (in_delta_base_cache(e.p, e.offset)) {
oi-whence = OI_DBCACHED;
} else {
@@ -2435,7 +2435,7 @@ int sha1_object_info_extended(const unsigned char *sha1, 
struct object_info *oi)
 rtype == OBJ_OFS_DELTA);
}
 
-   return status;
+   return type;
 }
 
 int sha1_object_info(const unsigned char *sha1, unsigned long *sizep)
-- 
1.8.3.rc3.24.gec82cb9

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 1/4] builtin: add git-check-mailmap command

2013-07-12 Thread Eric Sunshine
On Fri, Jul 12, 2013 at 1:47 AM, Junio C Hamano gits...@pobox.com wrote:
 Eric Sunshine sunsh...@sunshineco.com writes:

 For each contact information (either in the form of ``Name
 user@host'' or ...)

 in order to clarify that the two forms of input is what you call
 contact information.

 Is this easier to read?

 For each ``Name $$user@host$$'' or ``$$user@host$$'' from the
 command-line or standard input (when using `--stdin`), print a line
 showing either the canonical name and email address (see Mapping
 Authors below), or the input ``Name $$user@host$$'' or
 ``$$user@host$$'' if there is no mapping for that person.

 I find it easier than your original, but I do not know if you would
 want to repeat the Name... or user@host at the end.  It does not
 seem to add much useful information and is distracting.

Next attempt:

For each ``Name $$user@host$$'' or ``$$user@host$$'' from the
command-line or standard input (when using `--stdin`) look up the
person's canonical name and email address (see Mapping Authors
below). If found, print them; otherwise print the input as-is.

 In check-attr, null_term_line indicates that _input_ lines are
 null-terminated. In check-ignore, null_term_lines is overloaded (and
 perhaps abused) to mean that both _input_ and _output_ lines are
 null-terminated.

 That is unfortunate but it is good that you found the breakage.  As
 we do not have --nul-terminated-input and --nul-terminated-output
 options separtely, -z should apply to both input and output.  What
 b4666852 (check-attr: Add --stdin option, 2008-10-07) did is broken.

I can make git-check-mailmap behave this way, however, other than
git-check-ignore (which is quite new), there doesn't seem to be any
precedence (that I can find) anywhere else in git which ties input and
output null-termination to a single switch. Is it desirable to do so
or should the user have more fine-grained control? (xargs -0 comes
to mind when thinking of a null-termination input switch.)

 Also git check-ignore -h advertises -z as only affecting --stdin,
 which is also wrong.  It does affect both input and output as it should,
 so it should be described as such, I think.

I also noticed this. (It was copied from check-attr.c).
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 3/4] t4203: test mailmap functionality directly rather than indirectly

2013-07-12 Thread Junio C Hamano
Eric Sunshine sunsh...@sunshineco.com writes:

 On Fri, Jul 12, 2013 at 1:48 AM, Junio C Hamano gits...@pobox.com wrote:
 Jonathan Nieder jrnie...@gmail.com writes:

 My current thinking is no --- the patch has as a justification Now
 we can test these aspects of .mailmap handling directly with a
 low-level tool instead of using the tool most people will use, so do
 so, which sounds an awful lot like Reduce test coverage of commonly
 used tools, because we can.

 Yes, that was exactly my reaction that prompted my response.

 Does any of my follow-up commentary result in a different
 reaction?

Not really.  While I _do_ think direct testing has merits, I think
that should be done by adding direct tests, not by removing the
tests that are meant to protect higher level _users_ of the
underlying mechanism from breakages.  After all, their breakages may
not come from new breakages of the lower level mechanism (i.e. the
mailmap.c code) but the way these higher level code makes calls into
the mechanism, and direct test of the lower level mechanism will not
protect them from the latter kind of breakages.




--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 3/7] sha1_loose_object_info: make type lookup optional

2013-07-12 Thread Jeff King
Until recently, the only items to request from
sha1_object_info_extended were type and size. This meant
that we always had to open a loose object file to determine
one or the other.  But with the addition of the disk_size
query, it's possible that we can fulfill the query without
even opening the object file at all. However, since the
function interface always returns the type, we have no way
of knowing whether the caller cares about it or not.

This patch only modified sha1_loose_object_info to make type
lookup optional using an out-parameter, similar to the way
the size is handled (and the return value is 0 or -1 for
success or error, respectively).

There should be no functional change yet, though, as
sha1_object_info_extended, the only caller, will always ask
for a type.

Signed-off-by: Jeff King p...@peff.net
---
Obviously the end goal is to have sha1_object_info_extended do this
optionally, too (which happens in patch 6).

I'm not too happy about the stat_sha1_file function, which is almost
identical to open_sha1_file (and which in turn is almost the same thing
as has_loose_object). They all do:

  try operation X on sha1_file_name(sha1)
  prepare_alt_odb();
  foreach alt_odb
try operation X on alt_odb/sha1_file_name(sha1)

Unfortunately it's hard to do this kind of factoring out in C, because
the argument and return types for operation X are different in these
cases; you are stuck with providing callback function that takes a void
pointer to some operation-specific data. The boilerplate ends up worse
than the repeated code.

Another solution would be to have a find the file for loose object Y
function, and then just do operation X on that. But since X is a
filesystem operation in each case, you do not want to lose the atomicity
of performing the operation directly (not to mention incurring the cost
of an extra stat() on each file).

So I am open to clever refactoring suggestions.

 sha1_file.c | 48 +++-
 1 file changed, 43 insertions(+), 5 deletions(-)

diff --git a/sha1_file.c b/sha1_file.c
index e826066..39e7313 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -1306,6 +1306,26 @@ static int git_open_noatime(const char *name)
}
 }
 
+static int stat_sha1_file(const unsigned char *sha1, struct stat *st)
+{
+   char *name = sha1_file_name(sha1);
+   struct alternate_object_database *alt;
+
+   if (!lstat(name, st))
+   return 0;
+
+   prepare_alt_odb();
+   errno = ENOENT;
+   for (alt = alt_odb_list; alt; alt = alt-next) {
+   name = alt-name;
+   fill_sha1_path(name, sha1);
+   if (!lstat(alt-base, st))
+   return 0;
+   }
+
+   return -1;
+}
+
 static int open_sha1_file(const unsigned char *sha1)
 {
int fd;
@@ -2363,7 +2383,9 @@ struct packed_git *find_sha1_pack(const unsigned char 
*sha1,
 
 }
 
-static int sha1_loose_object_info(const unsigned char *sha1, unsigned long 
*sizep,
+static int sha1_loose_object_info(const unsigned char *sha1,
+ enum object_type *typep,
+ unsigned long *sizep,
  unsigned long *disk_sizep)
 {
int status;
@@ -2372,6 +2394,20 @@ static int sha1_loose_object_info(const unsigned char 
*sha1, unsigned long *size
git_zstream stream;
char hdr[32];
 
+   /*
+* If we don't care about type or size, then we don't
+* need to look inside the object at all.
+*/
+   if (!typep  !sizep) {
+   if (disk_sizep) {
+   struct stat st;
+   if (stat_sha1_file(sha1, st)  0)
+   return -1;
+   *disk_sizep = st.st_size;
+   }
+   return 0;
+   }
+
map = map_sha1_file(sha1, mapsize);
if (!map)
return -1;
@@ -2386,7 +2422,9 @@ static int sha1_loose_object_info(const unsigned char 
*sha1, unsigned long *size
*sizep = size;
git_inflate_end(stream);
munmap(map, mapsize);
-   return status;
+   if (typep)
+   *typep = status;
+   return 0;
 }
 
 /* returns enum object_type or negative */
@@ -2408,8 +2446,8 @@ int sha1_object_info_extended(const unsigned char *sha1, 
struct object_info *oi)
 
if (!find_pack_entry(sha1, e)) {
/* Most likely it's a loose object. */
-   type = sha1_loose_object_info(sha1, oi-sizep, oi-disk_sizep);
-   if (type = 0) {
+   if (!sha1_loose_object_info(sha1, type,
+   oi-sizep, oi-disk_sizep)) {
oi-whence = OI_LOOSE;
return type;
}
@@ -2417,7 +2455,7 @@ int sha1_object_info_extended(const unsigned char *sha1, 
struct object_info *oi)
/* Not a loose object; someone else may have 

[PATCH 5/7] packed_object_info: make type lookup optional

2013-07-12 Thread Jeff King
Currently, packed_object_info can save some work by not
calculating the size or disk_size of the object if the
caller is not interested. However, it always calculates the
true object type, whether the caller cares or not, and only
optionally returns the easy-to-get representation type.

Let's swap these types. The function will now return the
representation type (or OBJ_BAD on failure), and will only
optionally fill in the true type.

There should be no behavior change yet, as the only caller,
sha1_object_info_extended, will always feed it a type
pointer.

Signed-off-by: Jeff King p...@peff.net
---
 sha1_file.c | 25 -
 1 file changed, 16 insertions(+), 9 deletions(-)

diff --git a/sha1_file.c b/sha1_file.c
index 6f4aff3..2a1e230 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -1783,7 +1783,7 @@ static int packed_object_info(struct packed_git *p, off_t 
obj_offset,
 }
 
 static int packed_object_info(struct packed_git *p, off_t obj_offset,
- unsigned long *sizep, int *rtype,
+ enum object_type *typep, unsigned long *sizep,
  unsigned long *disk_sizep)
 {
struct pack_window *w_curs = NULL;
@@ -1791,11 +1791,12 @@ static int packed_object_info(struct packed_git *p, 
off_t obj_offset,
off_t curpos = obj_offset;
enum object_type type;
 
+   /*
+* We always get the representation type, but only convert it to
+* a real type later if the caller is interested.
+*/
type = unpack_object_header(p, w_curs, curpos, size);
 
-   if (rtype)
-   *rtype = type; /* representation type */
-
if (sizep) {
if (type == OBJ_OFS_DELTA || type == OBJ_REF_DELTA) {
off_t tmp_pos = curpos;
@@ -1820,7 +1821,13 @@ static int packed_object_info(struct packed_git *p, 
off_t obj_offset,
*disk_sizep = revidx[1].offset - obj_offset;
}
 
-   type = packed_to_object_type(p, obj_offset, type, w_curs, curpos);
+   if (typep) {
+   *typep = packed_to_object_type(p, obj_offset, type, w_curs, 
curpos);
+   if (*typep  0) {
+   type = OBJ_BAD;
+   goto out;
+   }
+   }
 
 out:
unuse_pack(w_curs);
@@ -2471,11 +2478,11 @@ int sha1_object_info_extended(const unsigned char 
*sha1, struct object_info *oi)
return -1;
}
 
-   type = packed_object_info(e.p, e.offset, oi-sizep, rtype,
- oi-disk_sizep);
-   if (type  0) {
+   rtype = packed_object_info(e.p, e.offset, type, oi-sizep,
+  oi-disk_sizep);
+   if (rtype  0) {
mark_bad_packed_object(e.p, sha1);
-   type = sha1_object_info_extended(sha1, oi);
+   return sha1_object_info_extended(sha1, oi);
} else if (in_delta_base_cache(e.p, e.offset)) {
oi-whence = OI_DBCACHED;
} else {
-- 
1.8.3.rc3.24.gec82cb9

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 4/7] packed_object_info: hoist delta type resolution to helper

2013-07-12 Thread Jeff King
To calculate the type of a packed object, we must walk down
its delta chain until we hit a true base object with a real
type. Most of the code in packed_object_info is for handling
this case.

Let's hoist it out into a separate helper function, which
will make it easier to make the type-lookup optional in the
future (and keep our indentation level sane).

Signed-off-by: Jeff King p...@peff.net
---
Annoyingly, since the part we are moving comprises the bulk
of the function, the diff tends to show the opposite of what
actually happened: it looks like we renamed the function to its helper
name, and moved the other parts to a new function, which happens to have
the same name as the old one. So read the diff with care. :)

 sha1_file.c | 93 +++--
 1 file changed, 53 insertions(+), 40 deletions(-)

diff --git a/sha1_file.c b/sha1_file.c
index 39e7313..6f4aff3 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -1713,52 +1713,21 @@ static int packed_object_info(struct packed_git *p, 
off_t obj_offset,
return type;
 }
 
-
 #define POI_STACK_PREALLOC 64
 
-static int packed_object_info(struct packed_git *p, off_t obj_offset,
- unsigned long *sizep, int *rtype,
- unsigned long *disk_sizep)
+static enum object_type packed_to_object_type(struct packed_git *p,
+ off_t obj_offset,
+ enum object_type type,
+ struct pack_window **w_curs,
+ off_t curpos)
 {
-   struct pack_window *w_curs = NULL;
-   unsigned long size;
-   off_t curpos = obj_offset;
-   enum object_type type;
off_t small_poi_stack[POI_STACK_PREALLOC];
off_t *poi_stack = small_poi_stack;
int poi_stack_nr = 0, poi_stack_alloc = POI_STACK_PREALLOC;
 
-   type = unpack_object_header(p, w_curs, curpos, size);
-
-   if (rtype)
-   *rtype = type; /* representation type */
-
-   if (sizep) {
-   if (type == OBJ_OFS_DELTA || type == OBJ_REF_DELTA) {
-   off_t tmp_pos = curpos;
-   off_t base_offset = get_delta_base(p, w_curs, tmp_pos,
-  type, obj_offset);
-   if (!base_offset) {
-   type = OBJ_BAD;
-   goto out;
-   }
-   *sizep = get_size_from_delta(p, w_curs, tmp_pos);
-   if (*sizep == 0) {
-   type = OBJ_BAD;
-   goto out;
-   }
-   } else {
-   *sizep = size;
-   }
-   }
-
-   if (disk_sizep) {
-   struct revindex_entry *revidx = find_pack_revindex(p, 
obj_offset);
-   *disk_sizep = revidx[1].offset - obj_offset;
-   }
-
while (type == OBJ_OFS_DELTA || type == OBJ_REF_DELTA) {
off_t base_offset;
+   unsigned long size;
/* Push the object we're going to leave behind */
if (poi_stack_nr = poi_stack_alloc  poi_stack == 
small_poi_stack) {
poi_stack_alloc = alloc_nr(poi_stack_nr);
@@ -1769,11 +1738,11 @@ static int packed_object_info(struct packed_git *p, 
off_t obj_offset,
}
poi_stack[poi_stack_nr++] = obj_offset;
/* If parsing the base offset fails, just unwind */
-   base_offset = get_delta_base(p, w_curs, curpos, type, 
obj_offset);
+   base_offset = get_delta_base(p, w_curs, curpos, type, 
obj_offset);
if (!base_offset)
goto unwind;
curpos = obj_offset = base_offset;
-   type = unpack_object_header(p, w_curs, curpos, size);
+   type = unpack_object_header(p, w_curs, curpos, size);
if (type = OBJ_NONE) {
/* If getting the base itself fails, we first
 * retry the base, otherwise unwind */
@@ -1800,7 +1769,6 @@ out:
 out:
if (poi_stack != small_poi_stack)
free(poi_stack);
-   unuse_pack(w_curs);
return type;
 
 unwind:
@@ -1814,6 +1782,51 @@ unwind:
goto out;
 }
 
+static int packed_object_info(struct packed_git *p, off_t obj_offset,
+ unsigned long *sizep, int *rtype,
+ unsigned long *disk_sizep)
+{
+   struct pack_window *w_curs = NULL;
+   unsigned long size;
+   off_t curpos = obj_offset;
+   enum object_type type;
+
+   type = unpack_object_header(p, w_curs, curpos, size);
+
+   if (rtype)
+   *rtype = type; /* representation type */
+
+   if (sizep) {
+   if (type 

Re: [PATCH v2 1/4] builtin: add git-check-mailmap command

2013-07-12 Thread Junio C Hamano
Eric Sunshine sunsh...@sunshineco.com writes:

 I find it easier than your original, but I do not know if you would
 want to repeat the Name... or user@host at the end.  It does not
 seem to add much useful information and is distracting.

 Next attempt:

 For each ``Name $$user@host$$'' or ``$$user@host$$'' from the
 command-line or standard input (when using `--stdin`) look up the
 person's canonical name and email address (see Mapping Authors
 below). If found, print them; otherwise print the input as-is.

Nice.

 ... Is it desirable to do so
 or should the user have more fine-grained control? (xargs -0 comes
 to mind when thinking of a null-termination input switch.)

For the purposes of check-attr and check-ignore, a single -z
governing both is sufficient.  I think you already got that from my
4-patch series, but the core reason for that is :

 - when -z is used, the user knows the input paths may need
   protection against LF.

 - our output contains these same paths.

 - which means our output cannot be expressed unambiguously using LF
   as record separator.

For the purpose of check-mailmap, I actually do not see much point
in supporting -z format.  We do not even handle names or addresses
with LF in it.  The mailmap format would not let you express such
records in the first place, no?
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 6/7] sha1_object_info_extended: make type calculation optional

2013-07-12 Thread Jeff King
Each caller of sha1_object_info_extended sets up an
object_info struct to tell the function which elements of
the object it wants to get. Until now, getting the type of
the object has always been required (and it is returned via
the return type rather than a pointer in object_info).

This can involve actually opening a loose object file to
determine its type, or following delta chains to determine a
packed file's base type. These effects produce a measurable
slow-down when doing a cat-file --batch-check that does
not include %(objecttype).

This patch adds a typep query to struct object_info, so
that it can be optionally queried just like size and
disk_size. As a result, the return type of the function is
no longer the object type, but rather 0/-1 for success/error.

As there are only three callers total, we just fix up each
caller rather than keep a compatibility wrapper:

  1. The simpler sha1_object_info wrapper continues to
 always ask for and return the type field.

  2. The istream_source function wants to know the type, and
 so always asks for it.

  3. The cat-file batch code asks for the type only when
 %(objecttype) is part of the format string.

On linux.git, the best-of-five for running:

  $ git rev-list --objects --all objects
  $ time git cat-file --batch-check='%(objectsize:disk)'

on a fully packed repository goes from:

  real0m8.680s
  user0m8.160s
  sys 0m0.512s

to:

  real0m7.205s
  user0m6.580s
  sys 0m0.608s

Signed-off-by: Jeff King p...@peff.net
---
This ends up changing the behavior of sha1_object_info_extended without
changing its function signature. Given that it is a fairly inactive area
of the code and that there are no topics in flight, I think this is OK.
But an alternative could be to add (yet another) wrapper to leave the
first two call-sites untouched.

 builtin/cat-file.c |  7 ---
 cache.h|  1 +
 sha1_file.c| 20 +---
 streaming.c|  2 +-
 4 files changed, 19 insertions(+), 11 deletions(-)

diff --git a/builtin/cat-file.c b/builtin/cat-file.c
index fe5c77f..163ce6c 100644
--- a/builtin/cat-file.c
+++ b/builtin/cat-file.c
@@ -150,7 +150,9 @@ static void expand_atom(struct strbuf *sb, const char 
*atom, int len,
if (!data-mark_query)
strbuf_addstr(sb, sha1_to_hex(data-sha1));
} else if (is_atom(objecttype, atom, len)) {
-   if (!data-mark_query)
+   if (data-mark_query)
+   data-info.typep = data-type;
+   else
strbuf_addstr(sb, typename(data-type));
} else if (is_atom(objectsize, atom, len)) {
if (data-mark_query)
@@ -229,8 +231,7 @@ static int batch_one_object(const char *obj_name, struct 
batch_options *opt,
return 0;
}
 
-   data-type = sha1_object_info_extended(data-sha1, data-info);
-   if (data-type = 0) {
+   if (sha1_object_info_extended(data-sha1, data-info)  0) {
printf(%s missing\n, obj_name);
fflush(stdout);
return 0;
diff --git a/cache.h b/cache.h
index c1fd82c..d3b770c 100644
--- a/cache.h
+++ b/cache.h
@@ -1130,6 +1130,7 @@ struct object_info {
 
 struct object_info {
/* Request */
+   enum object_type *typep;
unsigned long *sizep;
unsigned long *disk_sizep;
 
diff --git a/sha1_file.c b/sha1_file.c
index 2a1e230..52f7a1e 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -2452,24 +2452,26 @@ int sha1_object_info_extended(const unsigned char 
*sha1, struct object_info *oi)
 {
struct cached_object *co;
struct pack_entry e;
-   int type, rtype;
+   int rtype;
 
co = find_cached_object(sha1);
if (co) {
+   if (oi-typep)
+   *(oi-typep) = co-type;
if (oi-sizep)
*(oi-sizep) = co-size;
if (oi-disk_sizep)
*(oi-disk_sizep) = 0;
oi-whence = OI_CACHED;
-   return co-type;
+   return 0;
}
 
if (!find_pack_entry(sha1, e)) {
/* Most likely it's a loose object. */
-   if (!sha1_loose_object_info(sha1, type,
+   if (!sha1_loose_object_info(sha1, oi-typep,
oi-sizep, oi-disk_sizep)) {
oi-whence = OI_LOOSE;
-   return type;
+   return 0;
}
 
/* Not a loose object; someone else may have just packed it. */
@@ -2478,7 +2480,7 @@ int sha1_object_info_extended(const unsigned char *sha1, 
struct object_info *oi)
return -1;
}
 
-   rtype = packed_object_info(e.p, e.offset, type, oi-sizep,
+   rtype = packed_object_info(e.p, e.offset, oi-typep, oi-sizep,
   oi-disk_sizep);
if (rtype  0) {

[PATCH 7/7] sha1_object_info_extended: pass object_info to helpers

2013-07-12 Thread Jeff King
We take in a struct object_info which contains pointers to
storage for items the caller cares about. But then rather
than pass the whole object to the low-level loose/packed
helper functions, we pass the individual pointers.

Let's pass the whole struct instead, which will make adding
more items later easier.

Signed-off-by: Jeff King p...@peff.net
---
This one is an optional cleanup. The diff is quite noisy due to all of
the s/foo/oi-foo/, so it is arguable whether the result is nicer or
not. It would make later additions to object_info nicer, but I do not
plan to add any more.

It _would_ have been a nice cleanup to do at the beginning of the series
(and further diffs would not have to add extra parameters to the
function calls), but that would make the incremental learn to do type
optionally patches quite awkward.

So I am on the fence over this one, and do not mind too much if it gets
dropped.

 sha1_file.c | 49 ++---
 1 file changed, 22 insertions(+), 27 deletions(-)

diff --git a/sha1_file.c b/sha1_file.c
index 52f7a1e..563f521 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -1783,8 +1783,7 @@ static int packed_object_info(struct packed_git *p, off_t 
obj_offset,
 }
 
 static int packed_object_info(struct packed_git *p, off_t obj_offset,
- enum object_type *typep, unsigned long *sizep,
- unsigned long *disk_sizep)
+ struct object_info *oi)
 {
struct pack_window *w_curs = NULL;
unsigned long size;
@@ -1797,7 +1796,7 @@ static int packed_object_info(struct packed_git *p, off_t 
obj_offset,
 */
type = unpack_object_header(p, w_curs, curpos, size);
 
-   if (sizep) {
+   if (oi-sizep) {
if (type == OBJ_OFS_DELTA || type == OBJ_REF_DELTA) {
off_t tmp_pos = curpos;
off_t base_offset = get_delta_base(p, w_curs, tmp_pos,
@@ -1806,24 +1805,24 @@ static int packed_object_info(struct packed_git *p, 
off_t obj_offset,
type = OBJ_BAD;
goto out;
}
-   *sizep = get_size_from_delta(p, w_curs, tmp_pos);
-   if (*sizep == 0) {
+   *oi-sizep = get_size_from_delta(p, w_curs, tmp_pos);
+   if (*oi-sizep == 0) {
type = OBJ_BAD;
goto out;
}
} else {
-   *sizep = size;
+   *oi-sizep = size;
}
}
 
-   if (disk_sizep) {
+   if (oi-disk_sizep) {
struct revindex_entry *revidx = find_pack_revindex(p, 
obj_offset);
-   *disk_sizep = revidx[1].offset - obj_offset;
+   *oi-disk_sizep = revidx[1].offset - obj_offset;
}
 
-   if (typep) {
-   *typep = packed_to_object_type(p, obj_offset, type, w_curs, 
curpos);
-   if (*typep  0) {
+   if (oi-typep) {
+   *oi-typep = packed_to_object_type(p, obj_offset, type, 
w_curs, curpos);
+   if (*oi-typep  0) {
type = OBJ_BAD;
goto out;
}
@@ -2404,9 +2403,7 @@ static int sha1_loose_object_info(const unsigned char 
*sha1,
 }
 
 static int sha1_loose_object_info(const unsigned char *sha1,
- enum object_type *typep,
- unsigned long *sizep,
- unsigned long *disk_sizep)
+ struct object_info *oi)
 {
int status;
unsigned long mapsize, size;
@@ -2418,12 +2415,12 @@ static int sha1_loose_object_info(const unsigned char 
*sha1,
 * If we don't care about type or size, then we don't
 * need to look inside the object at all.
 */
-   if (!typep  !sizep) {
-   if (disk_sizep) {
+   if (!oi-typep  !oi-sizep) {
+   if (oi-disk_sizep) {
struct stat st;
if (stat_sha1_file(sha1, st)  0)
return -1;
-   *disk_sizep = st.st_size;
+   *oi-disk_sizep = st.st_size;
}
return 0;
}
@@ -2431,19 +2428,19 @@ static int sha1_loose_object_info(const unsigned char 
*sha1,
map = map_sha1_file(sha1, mapsize);
if (!map)
return -1;
-   if (disk_sizep)
-   *disk_sizep = mapsize;
+   if (oi-disk_sizep)
+   *oi-disk_sizep = mapsize;
if (unpack_sha1_header(stream, map, mapsize, hdr, sizeof(hdr))  0)
status = error(unable to unpack %s header,
   sha1_to_hex(sha1));
else if ((status = parse_sha1_header(hdr, size))  0)

Re: [PATCH v2 1/4] builtin: add git-check-mailmap command

2013-07-12 Thread Eric Sunshine
On Fri, Jul 12, 2013 at 2:34 AM, Junio C Hamano gits...@pobox.com wrote:
 Eric Sunshine sunsh...@sunshineco.com writes:

 ... Is it desirable to do so
 or should the user have more fine-grained control? (xargs -0 comes
 to mind when thinking of a null-termination input switch.)

 For the purposes of check-attr and check-ignore, a single -z
 governing both is sufficient.  I think you already got that from my
 4-patch series, but the core reason for that is :

I'm reading it right now.

  - when -z is used, the user knows the input paths may need
protection against LF.

  - our output contains these same paths.

  - which means our output cannot be expressed unambiguously using LF
as record separator.

 For the purpose of check-mailmap, I actually do not see much point
 in supporting -z format.  We do not even handle names or addresses
 with LF in it.  The mailmap format would not let you express such
 records in the first place, no?

You're right. I had this exact argument in mind for why
null-termination was not needed on the input side of check-mailmap,
but for some reason I had a blind spot concerning the output side.
I'll drop this option in the next re-roll.
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/4] Make check-{attr,ignore} -z consistent

2013-07-12 Thread Eric Sunshine
On Fri, Jul 12, 2013 at 2:18 AM, Junio C Hamano gits...@pobox.com wrote:
 A command that has to deal with input/output that may contain LF
 needs to offer the -z (--nul-terminated-records) option, and if it
 does not support separate --nul-terminated-{input,output} options,
 the -z option should govern both input and output.  A caller that
 uses -z knows that the paths it feeds to these commands as input
 may have LF that cannot be expressed in LF delimited input format,
 and the output from these commands do contain the same paths, so
 there is no way for their output to be expressed unambiguously for
 an input that requires -z.

FWIW, applying to the entire series:

Reviewed-by: Eric Sunshine sunsh...@sunshineco.com

 Unfortunately, git check-attr -z was broken and ignored the option
 on the output side.  This is a backward-incompatible fix, so we may
 need to add a checkAttr.brokenZ configuration to allow people to
 keep the existing breakage on top of these fixes, and then flip the
 default at Git 2.0 boundary (sometime early next year).

 Credit goes to Eric Sunshine for finding this discrepancy
 ($gmane/230158).

 Junio C Hamano (4):
   check-ignore: the name of the character is NUL, not NULL
   check-attr: the name of the character is NUL, not NULL
   check-ignore -z: a single -z should apply to both input and output
   check-attr -z: a single -z should apply to both input and output

  Documentation/git-check-attr.txt |  9 +++--
  builtin/check-attr.c | 20 ++--
  builtin/check-ignore.c   | 12 ++--
  3 files changed, 27 insertions(+), 14 deletions(-)

 --
 1.8.3.2-911-g2c4daa5
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC PATCH] During a shallow fetch, prevent sending over unneeded objects

2013-07-12 Thread Matthijs Kooijman
Hi Junio,

 [administrivia: you seem to have mail-followup-to that points at you
 and the list; is that really needed???]
I'm not subscribed to the list, so yes :-)

  This happens when a client issues a fetch with a depth bigger or equal
  to the number of commits the server is ahead of the client.
 
 Do you mean smaller (not bigger)?
Yes, I meant smaller (reworded this first sentence a few times and then messed
up :-)

  diff --git a/upload-pack.c b/upload-pack.c
  index 59f43d1..5885f33 100644
  --- a/upload-pack.c
  +++ b/upload-pack.c
  @@ -122,6 +122,14 @@ static int do_rev_list(int in, int out, void 
  *user_data)
  if (prepare_revision_walk(revs))
  die(revision walk setup failed);
  mark_edges_uninteresting(revs.commits, revs, show_edge);
  +   /* In case we create a new shallow root, make sure that all
  +* we don't send over objects that the client already has just
  +* because their have revisions are no longer reachable from
  +* the shallow root. */
  +   for (i = 0; i  have_obj.nr; i++) {
  +   struct commit *commit = (struct commit 
  *)have_obj.objects[i].item;
  +   mark_tree_uninteresting(commit-tree);
  +   }
 
 Hmph.
 
 In your discussion (including the comment), you talk about shallow
 root (I think that is the same as what we call shallow boundary),
I think so, yes. I mean to refer to the commits referenced in
.git/shallow, that have their parents hidden.

 but in this added block, there is nothing that checks CLIENT_SHALLOW
 or SHALLOW flags to special case that.

 Is it a good idea to unconditionally do this for all have
 revisions?
That's what I meant in my mail with applying the fix unconditionally -
there is probably some check needed (I discussed a few options in the
mail as well).

Note that this entire do_rev_list function is only called when there are
shallow revisions involved, so there is also a basic only when shallow
check in place.

 Also there is another loop that iterates over have revisions just
 above the precontext.  I wonder if this added code belongs in that
 loop.
I think we could add it there, yes. On the other hand, if we only want
to execute this code when there are shallow boundaries in the list of
revisions to send (as I suggested in my previous mail), then we can't
move this code up.

Gr.

Matthijs
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] git-clone.txt: remove the restriction on pushing from a shallow clone

2013-07-12 Thread Duy Nguyen
On Fri, Jul 12, 2013 at 12:54 PM, Junio C Hamano gits...@pobox.com wrote:
 Nguyễn Thái Ngọc Duy  pclo...@gmail.com writes:

 + number of limitations (you cannot clone or fetch from it, nor
 + push into it), but is adequate if you are only interested in
 + the recent history of a large project with a long history.

 Ahh, sorry for the noise.  You still say you cannot push _into_ it.

Yes, I actually removed that check in the first iteration of this
patch, reasoning that it can't cause any harm and it mostly works. But
mostly is not good enough. I will try remove it later if I can make
it always work by extending git protocol a bit (I think full clone
pushing to shallow one can always work. Shallow pushing to shallow,
probably no such guarantee)
--
Duy
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/7] cat-file: disable object/refname ambiguity check for batch mode

2013-07-12 Thread Michael Haggerty
On 07/12/2013 08:20 AM, Jeff King wrote:
 A common use of cat-file --batch-check is to feed a list
 of objects from rev-list --objects or a similar command.
 In this instance, all of our input objects are 40-byte sha1
 ids. However, cat-file has always allowed arbitrary revision
 specifiers, and feeds the result to get_sha1().
 
 Fortunately, get_sha1() recognizes a 40-byte sha1 before
 doing any hard work trying to look up refs, meaning this
 scenario should end up spending very little time converting
 the input into an object sha1. However, since 798c35f
 (get_sha1: warn about full or short object names that look
 like refs, 2013-05-29), when we encounter this case, we
 spend the extra effort to do a refname lookup anyway, just
 to print a warning. This is further exacerbated by ca91993
 (get_packed_ref_cache: reload packed-refs file when it
 changes, 2013-06-20), which makes individual ref lookup more
 expensive by requiring a stat() of the packed-refs file for
 each missing ref.
 
 With no patches, this is the time it takes to run:
 
   $ git rev-list --objects --all objects
   $ time git cat-file --batch-check='%(objectname)' objects
 
 on the linux.git repository:
 
   real1m13.494s
   user0m25.924s
   sys 0m47.532s
 
 If we revert ca91993, the packed-refs up-to-date check, it
 gets a little better:
 
   real0m54.697s
   user0m21.692s
   sys 0m32.916s
 
 but we are still spending quite a bit of time on ref lookup
 (and we would not want to revert that patch, anyway, which
 has correctness issues).  If we revert 798c35f, disabling
 the warning entirely, we get a much more reasonable time:
 
   real0m7.452s
   user0m6.836s
   sys 0m0.608s
 
 This patch does the moral equivalent of this final case (and
 gets similar speedups). We introduce a global flag that
 callers of get_sha1() can use to avoid paying the price for
 the warning.
 
 Signed-off-by: Jeff King p...@peff.net
 ---
 The solution feels a little hacky, but I'm not sure there is a better
 one short of reverting the warning entirely.
 
 We could also tie it to warn_ambiguous_refs (or add another config
 variable), but I don't think that makes sense. It is not about do I
 care about ambiguities in this repository, but rather I am going to
 do a really large number of sha1 resolutions, and I do not want to pay
 the price in this particular code path for the warning).
 
 There may be other sites that resolve a large number of refs and run
 into this, but I couldn't think of any. Doing for_each_ref would not
 have the same problem, as we already know they are refs there.

ISTM that callers of --batch-check would usually know whether they are
feeding it SHA-1s or arbitrary object specifiers.  (And in most cases
when there are a large number of them, they are probably all SHA-1s).

So instead of framing this change as skip object refname ambiguity
check (i.e., sacrifice some correctness for performance), perhaps it
would be less hacky to offer the following new feature:

To cat-file we could add an option like --sha1-only or --literal or
--no-dwim (... better names are failing me) which would skip *all*
dwimming of 40-character strings.  It would also assume that any shorter
strings are abbreviated SHA-1s and fail if they are not.  This would be
a nice feature by itself (these are object names, dammit, and don't try
to tell me differently!) and would have the additional small advantage
of speeding up lookups of abbreviated SHA-1s, which (regardless of your
patch) otherwise go through the whole DWIM process.

I understand that such an option alone would leave some old scripts
suffering the performance regression caused by the check, but in most
cases it would be easy to fix them.

Michael

-- 
Michael Haggerty
mhag...@alum.mit.edu
http://softwareswirl.blogspot.com/
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 0/2] open() error checking

2013-07-12 Thread Thomas Rast
#1 is Dale's suggested change.  Dale, to include it we'd need your
Signed-off-by as per Documentation/SubmittingPatches.

#2 is a similar error-checking fix; I reviewed 'git grep \bopen\b'
and found one case where the return value was obviously not tested.
The corresponding Windows code path has the same problem, but I dare
not touch it; perhaps someone from the Windows side can look into it?

I originally had a four-patch series to open 0/1/2 from /dev/null, but
then I noticed that this was shot down in 2008:

  http://thread.gmane.org/gmane.comp.version-control.git/93605/focus=93896

Do you want to resurrect this?

The worst part about it is that because we don't have a stderr to rely
on, we can't simply die(stop playing mind games).


Dale R. Worley (1):
  git_mkstemps: correctly test return value of open()

Thomas Rast (1):
  run-command: dup_devnull(): guard against syscalls failing

 run-command.c | 5 -
 wrapper.c | 2 +-
 2 files changed, 5 insertions(+), 2 deletions(-)

-- 
1.8.3.2.998.g1d087bc

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 1/2] git_mkstemps: correctly test return value of open()

2013-07-12 Thread Thomas Rast
From: Dale R. Worley wor...@alum.mit.edu

open() returns -1 on failure, and indeed 0 is a possible success value
if the user closed stdin in our process.  Fix the test.

Signed-off-by: Thomas Rast tr...@inf.ethz.ch
---
 wrapper.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/wrapper.c b/wrapper.c
index dd7ecbb..6a015de 100644
--- a/wrapper.c
+++ b/wrapper.c
@@ -322,7 +322,7 @@ int git_mkstemps_mode(char *pattern, int suffix_len, int 
mode)
template[5] = letters[v % num_letters]; v /= num_letters;
 
fd = open(pattern, O_CREAT | O_EXCL | O_RDWR, mode);
-   if (fd  0)
+   if (fd = 0)
return fd;
/*
 * Fatal error (EPERM, ENOSPC etc).
-- 
1.8.3.2.998.g1d087bc

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 2/2] run-command: dup_devnull(): guard against syscalls failing

2013-07-12 Thread Thomas Rast
dup_devnull() did not check the return values of open() and dup2().
Fix this omission.

Signed-off-by: Thomas Rast tr...@inf.ethz.ch
---
 run-command.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/run-command.c b/run-command.c
index aece872..1b7f88e 100644
--- a/run-command.c
+++ b/run-command.c
@@ -76,7 +76,10 @@ static inline void close_pair(int fd[2])
 static inline void dup_devnull(int to)
 {
int fd = open(/dev/null, O_RDWR);
-   dup2(fd, to);
+   if (fd  0)
+   die_errno(_(open /dev/null failed));
+   if (dup2(fd, to)  0)
+   die_errno(_(dup2(%d,%d) failed), fd, to);
close(fd);
 }
 #endif
-- 
1.8.3.2.998.g1d087bc

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/7] cat-file: disable object/refname ambiguity check for batch mode

2013-07-12 Thread Jeff King
On Fri, Jul 12, 2013 at 10:47:46AM +0200, Michael Haggerty wrote:

  The solution feels a little hacky, but I'm not sure there is a better
  one short of reverting the warning entirely.
  
  We could also tie it to warn_ambiguous_refs (or add another config
  variable), but I don't think that makes sense. It is not about do I
  care about ambiguities in this repository, but rather I am going to
  do a really large number of sha1 resolutions, and I do not want to pay
  the price in this particular code path for the warning).
  
  There may be other sites that resolve a large number of refs and run
  into this, but I couldn't think of any. Doing for_each_ref would not
  have the same problem, as we already know they are refs there.
 
 ISTM that callers of --batch-check would usually know whether they are
 feeding it SHA-1s or arbitrary object specifiers.  (And in most cases
 when there are a large number of them, they are probably all SHA-1s).

Thanks for bringing this up; I had meant to outline this option in the
cover letter, but forgot when posting.

If were designing cat-file from scratch, I'd consider having --batch
take just 40-hex sha1s in the first place. Since we can't do that now
for backwards compatibility, having an option is the best we can do
there.

But having to use such an option to avoid a 10x slowdown just strikes me
as ugly for two reasons:

  1. It's part of the user-visible interface, and it seems like an extra
 wtf? moment when somebody wonders why their script is painfully
 slow, and why they need a magic option to fix it. We're cluttering
 the interface forever.

  2. It's a sign that the refname check was put in for a specific
 performance profile (resolving one or a few refs), but didn't
 consider resolving a large number. I'm wondering what other
 performance regressions it's possible to trigger.

 So instead of framing this change as skip object refname ambiguity
 check (i.e., sacrifice some correctness for performance), perhaps it
 would be less hacky to offer the following new feature:

I wouldn't frame it as correctness for performance at all. The check
does not impact behavior at all, and is purely informational (to catch
quite a rare situation, too). I'd argue that the check simply isn't that
interesting in this case in the first place.

 To cat-file we could add an option like --sha1-only or --literal or
 --no-dwim (... better names are failing me) which would skip *all*
 dwimming of 40-character strings.  It would also assume that any shorter
 strings are abbreviated SHA-1s and fail if they are not.  This would be
 a nice feature by itself (these are object names, dammit, and don't try
 to tell me differently!) and would have the additional small advantage
 of speeding up lookups of abbreviated SHA-1s, which (regardless of your
 patch) otherwise go through the whole DWIM process.

I can see in theory that somebody might want that, but I am having a
hard time thinking of a practical use.

 I understand that such an option alone would leave some old scripts
 suffering the performance regression caused by the check, but in most
 cases it would be easy to fix them.

I'm less worried about old scripts, and more worried about carrying
around a confusing option forever, especially when I expect most
invocations to use it (perhaps my experience is biased, but I use
cat-file --batch quite a lot in scripting, and it has almost always been
on the output of rev-list).

IOW, it seems like a poor default, and we are choosing it only because
of backwards compatibility. I guess another option is to switch the
default with the usual deprecation dance.

Yet another option is to consider what the check is doing, and
accomplish the same thing in a different way. The real pain is that we
are individually trying to resolve each object by hitting the filesystem
(and doing lots of silly checks on the refname format, when we know it
must be valid).

We don't actually care in this case if the ref list is up to date (we
are not trying to update or read a ref, but only know if it exists, and
raciness is OK). IOW, could we replace the dwim_ref call for the warning
with something that directly queries the ref cache?

-Peff
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v3] config: add support for http.url.* settings

2013-07-12 Thread Jeff King
On Thu, Jul 11, 2013 at 02:50:03PM -0700, Kyle J. McKay wrote:

 The url value is considered a match to a url if the url value
 is either an exact match or a prefix of the url which ends on a
 path component boundary ('/').  So https://example.com/test; will
 match https://example.com/test; and https://example.com/test/too;
 but not https://example.com/testextra;.
 
 Longer matches take precedence over shorter matches with
 environment variable settings taking precedence over all.
 
 With this configuration:
 
 [http]
 useragent = other-agent
 noEPSV = true
 [http https://example.com;]
 useragent = example-agent
 sslVerify = false
 [http https://example.com/path;]
 useragent = path-agent

I like the general direction you are going, and especially the path
prefix matching is something I had always meant to implement for the
credential code.

A few comments on the implementation:

 +enum http_option_type {
 + opt_post_buffer = 0,
 + opt_min_sessions,
 +#ifdef USE_CURL_MULTI
 + opt_max_requests,
 +#endif
 + opt_ssl_verify,
 + opt_ssl_try,
 + opt_ssl_cert,
 +#if LIBCURL_VERSION_NUM = 0x070903
 + opt_ssl_key,
 +#endif
 +#if LIBCURL_VERSION_NUM = 0x070908
 + opt_ssl_capath,
 +#endif
 + opt_ssl_cainfo,
 + opt_low_speed,
 + opt_low_time,
 + opt_no_epsv,
 + opt_http_proxy,
 + opt_cookie_file,
 + opt_user_agent,
 + opt_passwd_req,
 + opt_max
 +};

We usually put enum fields in ALL_CAPS to mark them as constants (though
you can find few exceptions in the code).

 +static size_t http_options_url_match_prefix(const char *url,
 + const char *url_prefix,
 + size_t url_prefix_len)
 +{

It looks like you're matching the URLs as raw strings, and I don't see
any canonicalization going on. What happens if I have
https://example.com/foo+bar; in my config, but then I visit
https://example.comfoo%20bar;?

Or what about optional components? If I have https://example.com; in my
config, but I am visiting https://p...@example.com/;, shouldn't it
match? The config spec is more general than my specific URL; you would
not want it to match in the opposite direction, though.

I think you would want to break down the URL into fields, canonicalize
each field by undoing any encoding, and then compare the broken-down
URLs, as credential_match does (adding in your prefix/boundary matching to
the path instead of a straight string comparison).

I think you could mostly reuse the code there by doing:

  1. Create a struct url that contains the broken-down form; struct
 credential would contain a url.

  2. Pull out credential_from_url into int url_parse(struct url *,
 const char *).

  3. Pull out credential_match into int url_match(struct url *, struct
 url *).

  4. Parse and compare URLs from http.$URL.* during the config read
 (similar to credential_config_callback).

That does make your by-length ordering impossible, but I don't think you
can do it in the general case. If I have:

  [http http://p...@example.com;] foo = 1
  [http http://example.com:8080;] foo = 2

and I visit http://p...@example.com:8080;, which one is the winner? I
don't think there is an unambiguous definition. I'd suggest instead just
using the usual last one wins strategy that our config uses. It has
the unfortunate case that:

  [http http://example.com;]
 foo = 1
  [http]
 foo = 2

will always choose http.foo=2, but I don't think that is a big problem
in practice. You often only set one or the other, and in the cases where
you want to have a non-standard default and then override it with
another host-specific case, the last one wins rule makes it simple to
explain that you need to specify keys in most-general to most-specific
order.

  static int http_options(const char *var, const char *value, void *cb)
  {
 - if (!strcmp(http.sslverify, var)) {
 + const char *url = (const char *)cb;

No need to cast from void, this isn't C++. :)

The rest of the http_options() changes look like what I'd expect.

 @@ -344,7 +479,7 @@ void http_init(struct remote *remote, const char *url, 
 int proactive_auth)
  
   http_is_verbose = 0;
  
 - git_config(http_options, NULL);
 + git_config(http_options, (void *)url);

No need to cast again. :)

So this is the URL that we got on the command line. Which means that if
we get a redirect, we will not re-examine the options. I think that's
OK; we do not even _see_ the redirect, as it happens internally within
curl. The credential code has the same problem, but it is not a security
issue because curl clears the credentials on redirect.

However, it may be worth mentioning in the documentation that the config
options operate on the URL you give git, _not_ necessarily on the actual
URL you are hitting at any given moment (the gitcredentials(7) page
should probably make the same distinction).

-Peff
--
To unsubscribe 

Re: [PATCH v5 0/5] allow more sources for config values

2013-07-12 Thread Jeff King
On Thu, Jul 11, 2013 at 04:26:02PM -0700, Junio C Hamano wrote:

 Thanks.
 
 The differences since the last round I see are these.  And I think
 the series overall makes sense (I haven't look hard enough to pick
 any nits yet, though).

Both v4 and the interdiff look fine to me. I gave it one more cursory
read-through, and I don't see any problems. So:

Acked-by: Jeff King p...@peff.net

-Peff
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 1/7] cat-file: disable object/refname ambiguity check for batch mode

2013-07-12 Thread Michael Haggerty
On 07/12/2013 11:22 AM, Jeff King wrote:
 Yet another option is to consider what the check is doing, and
 accomplish the same thing in a different way. The real pain is that we
 are individually trying to resolve each object by hitting the filesystem
 (and doing lots of silly checks on the refname format, when we know it
 must be valid).
 
 We don't actually care in this case if the ref list is up to date (we
 are not trying to update or read a ref, but only know if it exists, and
 raciness is OK). IOW, could we replace the dwim_ref call for the warning
 with something that directly queries the ref cache?

I think it would be quite practical to add an API something like

struct ref_snapshot *get_ref_snapshot(const char *prefix)
void release_ref_snapshot(struct ref_snapshot *)
int lookup_ref(struct ref_snapshot *, const char *refname,
   unsigned char *sha1, int *flags)

where prefix is the part of the refs tree that you want included in the
snapshot (e.g., refs/heads) and ref_snapshot is probably opaque
outside of the refs module.

Symbolic refs, which are currently not stored in the ref_cache, would
have to be added because otherwise we would have to do all of the
lookups anyway.

I think this would be a good step to take for many reasons, including
because it would be another useful step in the direction of ref
transactions.

But with particular respect to git cat-file, I see problems:

1. get_ref_snapshot() would have to read all loose and packed refs
within the specified subtree, because loose refs have to be read before
packed refs.  So the call would be expensive if there are a lot of loose
refs.  And DWIM wouldn't know in advance where the references might be,
so it would have to set prefix=.  If many refs are looked up, then it
would presumably be worth it.  But if only a couple of lookups are done
and there are a lot of loose refs, then using a cache would probably
slow things down.

The slowdown could be ameliorated by adding some more intelligence, for
example only populating the loose refs cache after a certain number of
lookups have already been done.

2. A git cat-file --batch process can be long-lived.  What guarantees
would users expect regarding its lookup results?  Currently, its ref
lookups reflect the state of the repo at the moment the commit
identifier is written into the pipe.  Using a cache like this would mean
that ref lookups would always reflect the snapshot taken at the start of
the git cat-file run, regardless of whether the script using it might
have added or modified some references since then.  I think this would
have to be considered a regression.

Michael

-- 
Michael Haggerty
mhag...@alum.mit.edu
http://softwareswirl.blogspot.com/
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] t0008: avoid SIGPIPE race condition on fifo

2013-07-12 Thread Jeff King
On Thu, Jul 11, 2013 at 09:09:55AM -0700, Junio C Hamano wrote:

 Jeff King p...@peff.net writes:
 
  On Wed, Jul 10, 2013 at 12:36:40PM -0400, Brian Gernhardt wrote:
 
  The newest test in t0008 streaming support for --stdin, seems to
  hang sporadically on my MacBook Pro (running 10.8.4).  The hang seems
  to be related to running it in parallel with other tests, as I can
  only reliably cause it by running with prove  and -j 3.  However, once
  that has hung I am able to semi-reliably have it occur by running the
  test separately (with the test hung in the background, using separate
  trash directories via the --root option).
 
  I can't replicate the hang here (on Linux) doing:
 
for i in `seq 1 30`; do
./t0008-ignores.sh --root=/tmp/foo/$i 
done
 
 It seems to hang on me with bash, but not dash, here.

Thanks, I was able to replicate it with bash, and like Brian, I saw it
hanging in the second read. strace revealed that we were blocked on
open(out).

The patch below should fix it. I'm still not sure why the choice of
shell matters; it may simply be a timing fluke that bash is more likely
to hit for some reason.

-- 8 --
Subject: [PATCH] t0008: avoid SIGPIPE race condition on fifo

To test check-ignore's --stdin feature, we use two fifos to
send and receive data. We carefully keep a descriptor to its
input open so that it does not receive EOF between input
lines. However, we do not do the same for its output. That
means there is a potential race condition in which
check-ignore has opened the output pipe once (when we read
the first line), and then writes the second line before we
have re-opened the pipe.

In that case, check-ignore gets a SIGPIPE and dies. The
outer shell then tries to open the output fifo but blocks
indefinitely, because there is no writer.  We can fix it by
keeping a descriptor open through the whole procedure.

This should also help if check-ignore dies for any other
reason (we would already have opened the fifo and would
therefore not block, but just get EOF on read).

However, we are technically still susceptible to
check-ignore dying early, before we have opened the fifo.
This is an unlikely race and shouldn't generally happen in
practice, though, so we can hopefully ignore it.

Signed-off-by: Jeff King p...@peff.net
---
 t/t0008-ignores.sh | 12 ++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/t/t0008-ignores.sh b/t/t0008-ignores.sh
index a56db80..c29342d 100755
--- a/t/t0008-ignores.sh
+++ b/t/t0008-ignores.sh
@@ -697,13 +697,21 @@ test_expect_success PIPE 'streaming support for --stdin' '
# shell, and then echo to the fd. We make sure to close it at
# the end, so that the subprocess does get EOF and dies
# properly.
+   #
+   # Similarly, we must keep out open so that check-ignore does
+   # not ever get SIGPIPE trying to write to us. Not only would that
+   # produce incorrect results, but then there would be no writer on the
+   # other end of the pipe, and we would potentially block forever trying
+   # to open it.
exec 9in 
+   exec 8out 
test_when_finished exec 9- 
+   test_when_finished exec 8- 
echo 9 one 
-   read response out 
+   read response 8 
echo $response | grep ^\.gitignore:1:one one 
echo 9 two 
-   read response out 
+   read response 8 
echo $response | grep ^::two
 '
 
-- 
1.8.3.rc1.30.gff0fb75

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v4] config: add support for http.url.* settings

2013-07-12 Thread Kyle J. McKay
The url value is considered a match to a url if the url value
is either an exact match or a prefix of the url which ends on a
path component boundary ('/').  So https://example.com/test; will
match https://example.com/test; and https://example.com/test/too;
but not https://example.com/testextra;.

Longer matches take precedence over shorter matches with
environment variable settings taking precedence over all.

With this configuration:

[http]
useragent = other-agent
noEPSV = true
[http https://example.com;]
useragent = example-agent
sslVerify = false
[http https://example.com/path;]
useragent = path-agent

The https://other.example.com/; url will have useragent
other-agent and sslVerify will be on.

The https://example.com/; url will have useragent
example-agent and sslVerify will be off.

The https://example.com/path/sub; url will have useragent
path-agent and sslVerify will be off.

All three of the examples will have noEPSV enabled.

Signed-off-by: Kyle J. McKay mack...@gmail.com
Reviewed-by: Junio C Hamano gits...@pobox.com
---

The credentials configuration values already support url-specific
configuration items in the form credential.url.*.  This patch
adds similar support for http configuration values.

This version of the patch addresses the following feedback:

  * The url[url_prefix_len-1] test can now never succeed.  It is removed.

  * More text added to check_matched_len to clarify behavior when two
options have the same key.

 Documentation/config.txt |  11 
 http.c   | 168 ++-
 2 files changed, 162 insertions(+), 17 deletions(-)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index b4d4887..3731a3a 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -1531,6 +1531,17 @@ http.useragent::
of common USER_AGENT strings (but not including those like git/1.7.1).
Can be overridden by the 'GIT_HTTP_USER_AGENT' environment variable.
 
+http.url.*::
+   Any of the http.* options above can be applied selectively to some urls.
+   For example http.https://example.com.useragent; would set the user
+   agent only for https connections to example.com.  The url value
+   matches a url if it is an exact match or a prefix of the url matching
+   at a '/' boundary.  Longer url matches take precedence over shorter
+   ones with the environment variable settings taking precedence over all.
+   Note that url must match the url exactly (other than possibly being a
+   prefix).  This means any user, password and/or port setting that appears
+   in a url must also appear in url to have a successful match.
+
 i18n.commitEncoding::
Character encoding the commit messages are stored in; Git itself
does not care per se, but this information is necessary e.g. when
diff --git a/http.c b/http.c
index 2d086ae..feca70a 100644
--- a/http.c
+++ b/http.c
@@ -30,6 +30,34 @@ static CURL *curl_default;
 
 char curl_errorstr[CURL_ERROR_SIZE];
 
+enum http_option_type {
+   opt_post_buffer = 0,
+   opt_min_sessions,
+#ifdef USE_CURL_MULTI
+   opt_max_requests,
+#endif
+   opt_ssl_verify,
+   opt_ssl_try,
+   opt_ssl_cert,
+#if LIBCURL_VERSION_NUM = 0x070903
+   opt_ssl_key,
+#endif
+#if LIBCURL_VERSION_NUM = 0x070908
+   opt_ssl_capath,
+#endif
+   opt_ssl_cainfo,
+   opt_low_speed,
+   opt_low_time,
+   opt_no_epsv,
+   opt_http_proxy,
+   opt_cookie_file,
+   opt_user_agent,
+   opt_passwd_req,
+   opt_max
+};
+
+static size_t http_option_max_matched_len[opt_max];
+
 static int curl_ssl_verify = -1;
 static int curl_ssl_try;
 static const char *ssl_cert;
@@ -141,34 +169,121 @@ static void process_curl_messages(void)
 }
 #endif
 
+static size_t http_options_url_match_prefix(const char *url,
+   const char *url_prefix,
+   size_t url_prefix_len)
+{
+   /*
+* url_prefix matches url if url_prefix is an exact match for url or it
+* is a prefix of url and the match ends on a path component boundary.
+* Both url and url_prefix are considered to have an implicit '/' on the
+* end for matching purposes if they do not already.
+*
+* url must be NUL terminated.  url_prefix_len is the length of
+* url_prefix which need not be NUL terminated.
+*
+* The return value is the length of the match in characters (excluding
+* any final '/') or 0 for no match.  Passing / as url_prefix will
+* always cause 0 to be returned.
+*/
+   size_t url_len;
+   if (url_prefix_len  url_prefix[url_prefix_len - 1] == '/')
+   url_prefix_len--;
+   if (!url_prefix_len || strncmp(url, url_prefix, url_prefix_len))
+   return 0;
+   url_len = strlen(url);
+   if ((url_len 

[PATCH] .mailmap: Map email addresses to names

2013-07-12 Thread Stefan Beller
People change email addresses quite often and sometimes
forget to add their entry to the mailmap file.
I have contacted lots of people, whose name occurs
multiple times in the short log having different
email addresses. The entries in the mailmap of
this patch are either confirmed by them or are trivial.
Trivial means different capitalisation of the domain
(@MIT.EDU and @mit.edu) or the domain was localhost,
(none) or @local.

Additionally to adding (name, email) mappings to the
.mailmap file, it has also been sorted alphabetically.
(which explains the removals, which are added
3 lines later on again)

While the most changes happen at the email addresses,
we also have a name change in here. Karl Hasselström
is now known as Karl Wiberg due to marriage. Congratulations!

To find out whom to contact I used the following small
script:
---
#!/bin/bash
git shortlog -sne |awk '{ NF--; $1=; print }' |sort |uniq -d  
mailmapdoubles
while read line ; do
# remove leading whitespace
trimmed=$(echo $line | sed -e 's/^ *//g' -e 's/ *$//g')
echo git shortlog -sne | grep \$trimmed\
done  mailmapdoubles  mailmapdoubles2
sh mailmapdoubles2
rm mailmapdoubles
rm mailmapdoubles2
---
Also interesting for similar tasks are these snippets:

# Finding out duplicates by comparing email addresses:
git shortlog -sne |awk '{ print $NF }' |sort |uniq -d

# Finding out duplicates by comparing names:
git shortlog -sne |awk '{ NF--; $1=; print }' |sort |uniq -d
---

Signed-off-by: Stefan Beller stefanbel...@googlemail.com
---
 .mailmap | 95 
 1 file changed, 71 insertions(+), 24 deletions(-)

diff --git a/.mailmap b/.mailmap
index 345cce6..1179767 100644
--- a/.mailmap
+++ b/.mailmap
@@ -5,99 +5,146 @@
 # same person appearing not to be so.
 #
 
-Alex Bennée kernel-hac...@bennee.com
+Alejandro R. Sedeño ased...@mit.edu ased...@mit.edu
 Alexander Gavrilov angavri...@gmail.com
+Alex Bennée kernel-hac...@bennee.com
+Alex Riesen raa.l...@gmail.com fo...@t-online.de
+Alex Riesen raa.l...@gmail.com raa@limbo.localdomain
+Alex Riesen raa.l...@gmail.com r...@steel.home
+Anders Kaseorg ande...@mit.edu ande...@ksplice.com
+Anders Kaseorg ande...@mit.edu ande...@mit.edu
 Aneesh Kumar K.V aneesh.ku...@gmail.com
+anonymous li...@horizon.com
+anonymous li...@horizon.net
+Brandon Casey draf...@gmail.com ca...@nrlssc.navy.mil
 Brian M. Carlson sand...@crustytoothpaste.ath.cx
 Cheng Renquan crq...@gmail.com
 Chris Shoemaker c.shoema...@cox.net
-Dan Johnson computerdr...@gmail.com
 Dana L. How dana...@gmail.com
 Dana L. How h...@deathvalley.cswitch.com
 Daniel Barkalow barka...@iabervon.org
+Dan Johnson computerdr...@gmail.com
 David D. Kilzer ddkil...@kilzer.net
 David Kågedal dav...@lysator.liu.se
+David Reiss dre...@facebook.com dreiss@dreiss-vmware.(none)
 David S. Miller da...@davemloft.net
 Deskin Miller desk...@umich.edu
 Dirk Süsserott newslet...@dirk.my1.cc
 Eric S. Raymond e...@thyrsus.com
 Erik Faye-Lund kusmab...@gmail.com kusmab...@googlemail.com
-Fredrik Kuivinen freku...@student.liu.se
+Florian Achleitner florian.achleitner.2.6...@gmail.com 
florian.achleitner2.6...@gmail.com
+Franck Bui-Huu vagabon@gmail.com fbui...@gmail.com
+Frank Lichtenheld fr...@lichtenheld.de dj...@debian.org
+Frank Lichtenheld fr...@lichtenheld.de flichtenh...@astaro.com
 Frédéric Heitzmann frederic.heitzm...@gmail.com
+Fredrik Kuivinen freku...@student.liu.se
+Han-Wen Nienhuys han...@google.com Han-Wen Nienhuys han...@xs4all.nl
 H. Merijn Brand h.m.br...@xs4all.nl H.Merijn Brand h.m.br...@xs4all.nl
-H. Peter Anvin h...@bonde.sc.orionmulti.com
-H. Peter Anvin h...@tazenda.sc.orionmulti.com
-H. Peter Anvin h...@trantor.hos.anvin.org
 Horst H. von Brand vonbr...@inf.utfsm.cl
+H. Peter Anvin h...@zytor.com h...@bonde.sc.orionmulti.com
+H. Peter Anvin h...@zytor.com h...@smyrno.hos.anvin.org
+H. Peter Anvin h...@zytor.com h...@tazenda.sc.orionmulti.com
+H. Peter Anvin h...@zytor.com h...@trantor.hos.anvin.org
 İsmail Dönmez ism...@pardus.org.tr
 Jakub Narębski jna...@gmail.com
-Jay Soffian jaysoffian+...@gmail.com
+Jay Soffian jaysoff...@gmail.com jaysoffian+...@gmail.com
+J. Bruce Fields bfie...@citi.umich.edu bfie...@fieldses.org
+J. Bruce Fields bfie...@citi.umich.edu bfie...@pig.linuxdev.us.dell.com
+J. Bruce Fields bfie...@citi.umich.edu bfie...@puzzle.fieldses.org
 Jeff King p...@peff.net p...@github.com
 Joachim Berdal Haga cjh...@fys.uio.no
+Johannes Schindelin johannes.schinde...@gmx.de johannes.schinde...@gmx.de
 Johannes Sixt j...@kdbg.org johannes.s...@telecom.at
-Johannes Sixt j...@kdbg.org j.s...@viscovery.net
 Johannes Sixt j...@kdbg.org j.s...@eudaptics.com
+Johannes Sixt j...@kdbg.org j.s...@viscovery.net
+Jonathan Nieder jrnie...@gmail.com jrnie...@uchicago.edu
 Jon Loeliger j...@freescale.com
 Jon Seymour j...@blackcubes.dyndns.org
-Jonathan Nieder jrnie...@uchicago.edu
 Junio C Hamano gits...@pobox.com 

[PATCH v2] Corrections to the mailmap file

2013-07-12 Thread Stefan Beller
By now I contacted more than half the people, who might
get some .mailmap entries. Not all of them have responded,
but maybe 2/3 of those, whom I contacted.

I used 2 branches to get this work done. One branch having
all the proposed patches, where each patch just changes one
name, so I can send it to that specific person for review.
The second branch would be slightly behind the first branch
and only have the patches of the confirmed .mailmap changes.
The following patch is a squashed version of the confirmed 
branch.

Whenever somebody confirmed their patch, I'd include it
into the confirmed branch and rebase the first branch on top
of it. That works fine, if there are no many commits
in between, so no merge conflicts occur.

Junio, therefore I'd ask to include the following patch as 
the 1/3 milestone in the mailmap completion, so the number of
my local patches floating around is reduced.

The 6 patches sent at 4th July are not required anymore,
but the following patch directly applies to your master branch.

Stefan Beller (1):
  .mailmap: Map email addresses to names

 .mailmap | 95 
 1 file changed, 71 insertions(+), 24 deletions(-)

-- 
1.8.3.2.636.g7943f03

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Cygwin, git and x: directory

2013-07-12 Thread Mikko Rapeli
(please Cc: me in replies, not subscribed to the lists)

Hi Cygwin and git developers,

Does following scenario show signs of bugs in Cygwin and/or git?

# setup git repo
$ cd /tmp
$ mkdir foo  cd foo
$ git init

# create x: directory
$ mkdir x:
$ ls
x:

# create Windows X: drive, cygwin utils can work with both unix and dos style
# path names
$ mkdir c:/temp/bar
$ subst x: c:/temp/bar
$ touch x:/file.txt
$ ls x:/
file.txt

# clean git tree from non-tracked files
$ git clean -d -x -f
Removing x:/

# observe results, git did rm -rf on the X drive instead of the local
# directory named x:
$ ls
x:
$ file x\:
x:: directory
$ ls x:/
ls: cannot access x:/: No such file or directory
$ ls c:/temp/bar
ls: cannot access c:/temp/bar: No such file or directory
$ subst
X:\: = C:\temp\bar

In real life CMake created C: file in a build tree -- which is also a bug
but a separate one -- which resulted in obviously catastrophic results.

-Mikko
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v3] config: add support for http.url.* settings

2013-07-12 Thread Kyle J. McKay

On Jul 12, 2013, at 02:59, Jeff King wrote:

On Thu, Jul 11, 2013 at 02:50:03PM -0700, Kyle J. McKay wrote:

A few comments on the implementation:


+enum http_option_type {
+   opt_post_buffer = 0,
+   opt_min_sessions,


We usually put enum fields in ALL_CAPS to mark them as constants  
(though

you can find few exceptions in the code).


OK.


+static size_t http_options_url_match_prefix(const char *url,
+   const char *url_prefix,
+   size_t url_prefix_len)
+{


It looks like you're matching the URLs as raw strings, and I don't see
any canonicalization going on. What happens if I have
https://example.com/foo+bar; in my config, but then I visit
https://example.comfoo%20bar;?


The documentation for http.url.* says:


+http.url.*::

[snip]
+   Note that url must match the url exactly (other than  
possibly being a
+   prefix).  This means any user, password and/or port setting  
that appears
+   in a url must also appear in url to have a successful  
match.

+


Or what about optional components? If I have https://example.com;  
in my

config, but I am visiting https://p...@example.com/;, shouldn't it
match? The config spec is more general than my specific URL; you would
not want it to match in the opposite direction, though.


They have to be included to match.  Any URL-encoding present in the  
URL given to git needs to be duplicated in the URL in the config file  
exactly or there will not be a match.


That does make your by-length ordering impossible, but I don't think  
you

can do it in the general case. If I have:

 [http http://p...@example.com;] foo = 1
 [http http://example.com:8080;] foo = 2

and I visit http://p...@example.com:8080;, which one is the winner?


If I were to spilt everything out, then I would only have the second  
one match.  The first config is on a different port, quite possibly an  
entirely different service.  It shouldn't match.  Consider if you were  
port forwarding with ssh, services from many different machines would  
all be on localhost on different ports.  The second one is on the same  
port and matches because it's slightly more general (missing the user  
name), but it's clearly talking to the same service.


As the patch stands now, neither of them would match since the  
documentation requires an exact match (except for possibly being a  
prefix).


I don't think it's necessary to split the URL apart though.  Whatever  
URL the user gave to git on the command line (at some point even if  
it's now stored as a remote setting in config) complete with URL- 
encoding, user names, port names, etc. is the same url, possibly  
shortened, that needs to be used for the http.url.option setting.


I think that's simple and very easy to explain and avoids user  
confusion and surprise while still allowing a default to be set for a  
site but easily overridden for a portion of that site without needing  
to worry about the order config files are processed or the order of  
the [http url] sections within them.


I don't think there is an unambiguous definition. I'd suggest  
instead just

using the usual last one wins strategy that our config uses. It has
the unfortunate case that:

 [http http://example.com;]
foo = 1
 [http]
foo = 2

will always choose http.foo=2, but I don't think that is a big problem
in practice.


I think that violates the principle of least surprise.  In this case:

[http https://cacert.org;]
  sslVerify = false
[http]
  sslVerify = true

the intention here is very clear -- for cacert.org only, sslVerify  
should be false.  To have the [http] setting step on the [http https://cacert.org 
] setting seems completely surprising and unexpected.


The last one wins is still in effect though for the same paths.  If  
you have:


# System config
[http https://cacert.org;]
  sslVerify = false

# Global config
[http https://cacert.org;]
  sslVerify = true

The setting from the Global config still wins.


You often only set one or the other, and in the cases where
you want to have a non-standard default and then override it with
another host-specific case, the last one wins rule makes it simple  
to

explain that you need to specify keys in most-general to most-specific
order.


Unless, of course, different config files are involved and that's not  
possible without duplicating entries into the later-processed config  
file.



static int http_options(const char *var, const char *value, void *cb)
{
-   if (!strcmp(http.sslverify, var)) {
+   const char *url = (const char *)cb;


No need to cast from void, this isn't C++. :)


Indeed.


The rest of the http_options() changes look like what I'd expect.

@@ -344,7 +479,7 @@ void http_init(struct remote *remote, const  
char *url, int proactive_auth)


http_is_verbose = 0;

-   git_config(http_options, NULL);
+   git_config(http_options, (void *)url);


No need to cast again. :)



Webmail actualización

2013-07-12 Thread Maruti Suzuki

Webmail actualización
20GB 23GB
 
Su buzón ha superado el límite de almacenamiento lo establecido por el  
administrador y usted no será capaz de recibir nuevos correos hasta  
que vuelva a validar. Para revalidar:  
https://docs.google.com/a/uc.cl/spreadsheet/viewform?formkey=dC1aMjNqcThwR3BLaVJWUy1fTlFDNmc6MQ

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Cygwin, git and x: directory

2013-07-12 Thread Mark Levedahl

On 07/12/2013 08:42 AM, Mikko Rapeli wrote:

(please Cc: me in replies, not subscribed to the lists)

Hi Cygwin and git developers,

Does following scenario show signs of bugs in Cygwin and/or git?

# setup git repo
$ cd /tmp
$ mkdir foo  cd foo
$ git init

# create x: directory
$ mkdir x:
$ ls
x:



I would report this on the Cygwin list, not here. Any use of dos file 
paths using a Cygwin tool is not recommended, officially only POSIX 
paths are supported. If cygwin's Cmake is generating dos style paths, 
that is a bug that needs reporting to the Cygwin list. Also, I'm not 
sure how the developers will react to mishandling a directory named x:, 
but the behaviour you see is a limitation of the Cygwin platform, not of 
git.


Mark
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Bug in .mailmap handling?

2013-07-12 Thread Stefan Beller
Hello,

you may have noticed I am currently trying to bring the 
mailmap file of git itself up to date. I noticed
some behavior, which I did not expect. Have a look yourself:

---
# prepare test environment:
mkdir testmailmap
cd testmailmap/
git init

# do a commit:
echo asdf  test1 
git add test1
git commit -a --author=A a...@example.org -m add test1

# commit with same name, but different email 
# (different capitalization does the trick already, 
# but here I am going to use a different mail)
echo asdf  test2
git add test2
git commit -a --author=A changed_em...@example.org -m add test2

# how do we know it's the same person?
git shortlog
A (2):
  add test1
  add test2

# reports as expected:
git shortlog -sne
  1  A a...@example.org
  1  A changed_em...@example.org
  
# Adding the line to the mailmap should make life easy, so we know
# it's the same person
echo A a...@example.org changed_em...@example.org  .mailmap

# Come on, I just wanted to have it reported as one person!
git shortlog -sne
 1  A a...@example.org
 1  A a...@example.org
 
# So let's try another line in the mailmap file, (small 'a')
echo A a...@example.org changed_em...@example.org  .mailmap

# We're not there yet?
git shortlog -sne
 1  A a...@example.org
 1  A a...@example.org

# Now let's write it rather explicit: 
# (essentially just write 2 lines into the mailmap file)
cat  EOF  .mailmap
A a...@example.org changed_em...@example.org
A a...@example.org a...@example.org
EOF
 
# works as expected now
git shortlog -sne
 2  A a...@example.org

# works as expected now as well
git shortlog  
A (2):
  add test1
  add test2

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/3] wt-status: use format function attribute for status_printf

2013-07-12 Thread Junio C Hamano
Jeff King p...@peff.net writes:

 On Tue, Jul 09, 2013 at 10:26:04PM -0700, Junio C Hamano wrote:

 Jeff King p...@peff.net writes:
 ...
  I'm torn on this one. It really does provide us with more compile-time
  safety checks, but it's annoying that -Wall -Werror will no longer
  work out of the box.
 
 Yeah, that is a show-stopper for me X-.

 You can fix it with -Wno-zero-format-length, so the hassle is not
 huge. But I am also inclined to just drop this one. We have lived
 without the extra safety for a long time, and list review does tend to
 catch such problems in practice.

I am tempted to actually merge the original one as-is without any of
the workaround, and just tell people to use -Wno-format-zero-length.
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] t0008: avoid SIGPIPE race condition on fifo

2013-07-12 Thread Junio C Hamano
Jeff King p...@peff.net writes:

 Subject: [PATCH] t0008: avoid SIGPIPE race condition on fifo

 To test check-ignore's --stdin feature, we use two fifos to
 send and receive data. We carefully keep a descriptor to its
 input open so that it does not receive EOF between input
 lines. However, we do not do the same for its output. That
 means there is a potential race condition in which
 check-ignore has opened the output pipe once (when we read
 the first line), and then writes the second line before we
 have re-opened the pipe.

 In that case, check-ignore gets a SIGPIPE and dies. The
 outer shell then tries to open the output fifo but blocks
 indefinitely, because there is no writer.  We can fix it by
 keeping a descriptor open through the whole procedure.

Ahh, figures.

I wish I were smart enough to figure that out immediately after
seeing the test that does funny things to in with 9.

Thanks.

 This should also help if check-ignore dies for any other
 reason (we would already have opened the fifo and would
 therefore not block, but just get EOF on read).

 However, we are technically still susceptible to
 check-ignore dying early, before we have opened the fifo.
 This is an unlikely race and shouldn't generally happen in
 practice, though, so we can hopefully ignore it.

 Signed-off-by: Jeff King p...@peff.net
 ---
  t/t0008-ignores.sh | 12 ++--
  1 file changed, 10 insertions(+), 2 deletions(-)

 diff --git a/t/t0008-ignores.sh b/t/t0008-ignores.sh
 index a56db80..c29342d 100755
 --- a/t/t0008-ignores.sh
 +++ b/t/t0008-ignores.sh
 @@ -697,13 +697,21 @@ test_expect_success PIPE 'streaming support for 
 --stdin' '
   # shell, and then echo to the fd. We make sure to close it at
   # the end, so that the subprocess does get EOF and dies
   # properly.
 + #
 + # Similarly, we must keep out open so that check-ignore does
 + # not ever get SIGPIPE trying to write to us. Not only would that
 + # produce incorrect results, but then there would be no writer on the
 + # other end of the pipe, and we would potentially block forever trying
 + # to open it.
   exec 9in 
 + exec 8out 
   test_when_finished exec 9- 
 + test_when_finished exec 8- 
   echo 9 one 
 - read response out 
 + read response 8 
   echo $response | grep ^\.gitignore:1:one one 
   echo 9 two 
 - read response out 
 + read response 8 
   echo $response | grep ^::two
  '
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2] Corrections to the mailmap file

2013-07-12 Thread Junio C Hamano
Stefan Beller stefanbel...@googlemail.com writes:

 By now I contacted more than half the people, who might
 get some .mailmap entries. Not all of them have responded,
 but maybe 2/3 of those, whom I contacted.

 I used 2 branches to get this work done. One branch having
 all the proposed patches, where each patch just changes one
 name, so I can send it to that specific person for review.
 The second branch would be slightly behind the first branch
 and only have the patches of the confirmed .mailmap changes.
 The following patch is a squashed version of the confirmed 
 branch.

 Whenever somebody confirmed their patch, I'd include it
 into the confirmed branch and rebase the first branch on top
 of it. That works fine, if there are no many commits
 in between, so no merge conflicts occur.

 Junio, therefore I'd ask to include the following patch as 
 the 1/3 milestone in the mailmap completion, so the number of
 my local patches floating around is reduced.

 The 6 patches sent at 4th July are not required anymore,
 but the following patch directly applies to your master branch.

 Stefan Beller (1):
   .mailmap: Map email addresses to names

  .mailmap | 95 
 
  1 file changed, 71 insertions(+), 24 deletions(-)

Thanks.
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] .mailmap: Map email addresses to names

2013-07-12 Thread Junio C Hamano
Stefan Beller stefanbel...@googlemail.com writes:

 People change email addresses quite often and sometimes
 forget to add their entry to the mailmap file.
 I have contacted lots of people, whose name occurs
 multiple times in the short log having different
 email addresses. The entries in the mailmap of
 this patch are either confirmed by them or are trivial.
 Trivial means different capitalisation of the domain
 (@MIT.EDU and @mit.edu) or the domain was localhost,
 (none) or @local.

 Additionally to adding (name, email) mappings to the
 .mailmap file, it has also been sorted alphabetically.
 (which explains the removals, which are added
 3 lines later on again)

 While the most changes happen at the email addresses,
 we also have a name change in here. Karl Hasselström
 is now known as Karl Wiberg due to marriage. Congratulations!

 To find out whom to contact I used the following small
 script:
 ---
 #!/bin/bash
 git shortlog -sne |awk '{ NF--; $1=; print }' |sort |uniq -d  
 mailmapdoubles
 while read line ; do
 # remove leading whitespace
 trimmed=$(echo $line | sed -e 's/^ *//g' -e 's/ *$//g')
 echo git shortlog -sne | grep \$trimmed\
 done  mailmapdoubles  mailmapdoubles2
 sh mailmapdoubles2
 rm mailmapdoubles
 rm mailmapdoubles2
 ---
 Also interesting for similar tasks are these snippets:

 # Finding out duplicates by comparing email addresses:
 git shortlog -sne |awk '{ print $NF }' |sort |uniq -d

 # Finding out duplicates by comparing names:
 git shortlog -sne |awk '{ NF--; $1=; print }' |sort |uniq -d
 ---

 Signed-off-by: Stefan Beller stefanbel...@googlemail.com
 ---

Thanks, but please be careful about these three-dashes when sending
the next batch.  As you may have already guessed, Git cannot guess
reliably which one of the abouve four three-dash lines is the end of
the proposed log message, and cuts at the first one.

  .mailmap | 95 
 
  1 file changed, 71 insertions(+), 24 deletions(-)

 diff --git a/.mailmap b/.mailmap
 index 345cce6..1179767 100644
 --- a/.mailmap
 +++ b/.mailmap
 @@ -5,99 +5,146 @@
  # same person appearing not to be so.
  #
  
 -Alex Bennée kernel-hac...@bennee.com
 +Alejandro R. Sedeño ased...@mit.edu ased...@mit.edu
  Alexander Gavrilov angavri...@gmail.com
 +Alex Bennée kernel-hac...@bennee.com
 +Alex Riesen raa.l...@gmail.com fo...@t-online.de
 +Alex Riesen raa.l...@gmail.com raa@limbo.localdomain
 +Alex Riesen raa.l...@gmail.com r...@steel.home
 +Anders Kaseorg ande...@mit.edu ande...@ksplice.com
 +Anders Kaseorg ande...@mit.edu ande...@mit.edu
  Aneesh Kumar K.V aneesh.ku...@gmail.com
 +anonymous li...@horizon.com
 +anonymous li...@horizon.net
 +Brandon Casey draf...@gmail.com ca...@nrlssc.navy.mil
  Brian M. Carlson sand...@crustytoothpaste.ath.cx
  Cheng Renquan crq...@gmail.com
  Chris Shoemaker c.shoema...@cox.net
 -Dan Johnson computerdr...@gmail.com
  Dana L. How dana...@gmail.com
  Dana L. How h...@deathvalley.cswitch.com
  Daniel Barkalow barka...@iabervon.org
 +Dan Johnson computerdr...@gmail.com
  David D. Kilzer ddkil...@kilzer.net
  David Kågedal dav...@lysator.liu.se
 +David Reiss dre...@facebook.com dreiss@dreiss-vmware.(none)
  David S. Miller da...@davemloft.net
  Deskin Miller desk...@umich.edu
  Dirk Süsserott newslet...@dirk.my1.cc
  Eric S. Raymond e...@thyrsus.com
  Erik Faye-Lund kusmab...@gmail.com kusmab...@googlemail.com
 -Fredrik Kuivinen freku...@student.liu.se
 +Florian Achleitner florian.achleitner.2.6...@gmail.com 
 florian.achleitner2.6...@gmail.com
 +Franck Bui-Huu vagabon@gmail.com fbui...@gmail.com
 +Frank Lichtenheld fr...@lichtenheld.de dj...@debian.org
 +Frank Lichtenheld fr...@lichtenheld.de flichtenh...@astaro.com
  Frédéric Heitzmann frederic.heitzm...@gmail.com
 +Fredrik Kuivinen freku...@student.liu.se
 +Han-Wen Nienhuys han...@google.com Han-Wen Nienhuys han...@xs4all.nl
  H. Merijn Brand h.m.br...@xs4all.nl H.Merijn Brand h.m.br...@xs4all.nl
 -H. Peter Anvin h...@bonde.sc.orionmulti.com
 -H. Peter Anvin h...@tazenda.sc.orionmulti.com
 -H. Peter Anvin h...@trantor.hos.anvin.org
  Horst H. von Brand vonbr...@inf.utfsm.cl
 +H. Peter Anvin h...@zytor.com h...@bonde.sc.orionmulti.com
 +H. Peter Anvin h...@zytor.com h...@smyrno.hos.anvin.org
 +H. Peter Anvin h...@zytor.com h...@tazenda.sc.orionmulti.com
 +H. Peter Anvin h...@zytor.com h...@trantor.hos.anvin.org
  İsmail Dönmez ism...@pardus.org.tr
  Jakub Narębski jna...@gmail.com
 -Jay Soffian jaysoffian+...@gmail.com
 +Jay Soffian jaysoff...@gmail.com jaysoffian+...@gmail.com
 +J. Bruce Fields bfie...@citi.umich.edu bfie...@fieldses.org
 +J. Bruce Fields bfie...@citi.umich.edu bfie...@pig.linuxdev.us.dell.com
 +J. Bruce Fields bfie...@citi.umich.edu bfie...@puzzle.fieldses.org
  Jeff King p...@peff.net p...@github.com
  Joachim Berdal Haga cjh...@fys.uio.no
 +Johannes Schindelin johannes.schinde...@gmx.de johannes.schinde...@gmx.de
  

Re: [PATCH] t0008: avoid SIGPIPE race condition on fifo

2013-07-12 Thread Brian Gernhardt

On Jul 12, 2013, at 6:35 AM, Jeff King p...@peff.net wrote:

 Subject: [PATCH] t0008: avoid SIGPIPE race condition on fifo

Was able to complete a prove run with this patch.  Many thanks.

~~ Brian
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 7/7] push: document --lockref

2013-07-12 Thread Johannes Sixt
Am 12.07.2013 00:14, schrieb Junio C Hamano:
 Johannes Sixt j...@kdbg.org writes:
 
 Again: Why not just define +refspec as the way to achieve this check?
 
 What justification do we have to break existing people's
 configuration that says something like:
 
   [remote ko]
   url = kernel.org:/pub/scm/git/git.git
 push = master
 push = next
 push = +pu
 push = maint
 
 by adding a _new_ requirement they may not be able to satisify?
 Notice that the above is a typical push only publishing point,
 where you do not need any remote tracking branches.

Why would it break? When you do not specify --lockref, there is no
change whatsoever.

To achieve any safety at all for these push-only refs, you have to be
very explicit by saying --lockref=pu:$myoldpu
--lockref=master:$myoldmaster etc, and what is wrong if in this case
--lockref semantics are applied, but only pu is allowed to be no-ff?

 I am not opposed if your proposal were to introduce a new syntax
 element that calls for this new feature, e.g.
 
   [remote ko]
   url = kernel.org:/pub/scm/git/git.git
 push = *pu
 fetch = +refs/heads/*:refs/remotes/ko/*
 
 but changing what + means to something new will simply not fly.

I still do not see why we need two different kinds of ways to spell the
same strong kind of override (--force and +refspec) under the presence
of --lockref, and why we need a third one (--allow-no-ff) to give a
weaker kind of override (that makes sense only when --lockref was given
in the first place).

-- Hannes

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/7] cat-file --batch-check performance improvements

2013-07-12 Thread Junio C Hamano
Jeff King p...@peff.net writes:

 The results for running (in linux.git):

   $ git rev-list --objects --all objects
   $ git cat-file --batch-check='%(objectsize:disk)' objects /dev/null

I can see how these patches are very logical avenue to grab only
on-disk footprint for large number of objects, but among the type,
payload size and on-disk footprint, I find it highly narrow niche
that a real user or script is interested _only_ in on-disk footprint
without even worrying about the type of object.

 ... (though I think the result actually cleans up the
 sha1_object_info_extended interface a bit, and is worth it).

I tend to agree, especially eyeballing the result of 7/7.

Thanks.
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2 01/19] t2104: Don't fail for index versions other than [23]

2013-07-12 Thread Thomas Gummerer
t2104 currently checks for the exact index version 2 or 3,
depending if there is a skip-worktree flag or not. Other
index versions do not use extended flags and thus cannot
be tested for version changes.

Make this test update the index to version 2 at the beginning
of the test. Testing the skip-worktree flags for the default
index format is still covered by t7011 and t7012.

Signed-off-by: Thomas Gummerer t.gumme...@gmail.com
---
 t/t2104-update-index-skip-worktree.sh | 1 +
 1 file changed, 1 insertion(+)

diff --git a/t/t2104-update-index-skip-worktree.sh 
b/t/t2104-update-index-skip-worktree.sh
index 1d0879b..bd9644f 100755
--- a/t/t2104-update-index-skip-worktree.sh
+++ b/t/t2104-update-index-skip-worktree.sh
@@ -22,6 +22,7 @@ H sub/2
 EOF
 
 test_expect_success 'setup' '
+   git update-index --index-version=2 
mkdir sub 
touch ./1 ./2 sub/1 sub/2 
git add 1 2 sub/1 sub/2 
-- 
1.8.3.453.g1dfc63d

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2 04/19] read-cache: Re-read index if index file changed

2013-07-12 Thread Thomas Gummerer
Add the possibility of re-reading the index file, if it changed
while reading.

The index file might change during the read, causing outdated
information to be displayed. We check if the index file changed
by using its stat data as heuristic.

Helped-by: Ramsay Jones ram...@ramsay1.demon.co.uk
Signed-off-by: Thomas Gummerer t.gumme...@gmail.com
---
 read-cache.c | 91 +---
 1 file changed, 57 insertions(+), 34 deletions(-)

diff --git a/read-cache.c b/read-cache.c
index 1e7ffc2..3e3a0e2 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -1275,11 +1275,31 @@ int read_index(struct index_state *istate)
return read_index_from(istate, get_index_file());
 }
 
+static int index_changed(struct stat *st_old, struct stat *st_new)
+{
+   if (st_old-st_mtime != st_new-st_mtime ||
+#if !defined (__CYGWIN__)
+   st_old-st_uid   != st_new-st_uid ||
+   st_old-st_gid   != st_new-st_gid ||
+   st_old-st_ino   != st_new-st_ino ||
+#endif
+#if USE_NSEC
+   ST_MTIME_NSEC(*st_old) != ST_MTIME_NSEC(*st_new) ||
+#endif
+#if USE_STDEV
+   st_old-st_dev != st_new-st_dev ||
+#endif
+   st_old-st_size != st_new-st_size)
+   return 1;
+
+   return 0;
+}
+
 /* remember to discard_cache() before reading a different cache! */
 int read_index_from(struct index_state *istate, const char *path)
 {
-   int fd;
-   struct stat st;
+   int fd, err, i;
+   struct stat st_old, st_new;
struct cache_version_header *hdr;
void *mmap;
size_t mmap_size;
@@ -1291,41 +1311,44 @@ int read_index_from(struct index_state *istate, const 
char *path)
errno = ENOENT;
istate-timestamp.sec = 0;
istate-timestamp.nsec = 0;
+   for (i = 0; i  50; i++) {
+   err = 0;
+   fd = open(path, O_RDONLY);
+   if (fd  0) {
+   if (errno == ENOENT)
+   return 0;
+   die_errno(index file open failed);
+   }
 
-   fd = open(path, O_RDONLY);
-   if (fd  0) {
-   if (errno == ENOENT)
-   return 0;
-   die_errno(index file open failed);
+   if (fstat(fd, st_old))
+   die_errno(cannot stat the open index);
+
+   errno = EINVAL;
+   mmap_size = xsize_t(st_old.st_size);
+   mmap = xmmap(NULL, mmap_size, PROT_READ | PROT_WRITE, 
MAP_PRIVATE, fd, 0);
+   close(fd);
+   if (mmap == MAP_FAILED)
+   die_errno(unable to map index file);
+
+   hdr = mmap;
+   if (verify_hdr_version(istate, hdr, mmap_size)  0)
+   err = 1;
+
+   if (istate-ops-verify_hdr(mmap, mmap_size)  0)
+   err = 1;
+
+   if (istate-ops-read_index(istate, mmap, mmap_size)  0)
+   err = 1;
+   istate-timestamp.sec = st_old.st_mtime;
+   istate-timestamp.nsec = ST_MTIME_NSEC(st_old);
+   if (lstat(path, st_new))
+   die_errno(cannot stat the open index);
+
+   munmap(mmap, mmap_size);
+   if (!index_changed(st_old, st_new)  !err)
+   return istate-cache_nr;
}
 
-   if (fstat(fd, st))
-   die_errno(cannot stat the open index);
-
-   errno = EINVAL;
-   mmap_size = xsize_t(st.st_size);
-   mmap = xmmap(NULL, mmap_size, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 
0);
-   close(fd);
-   if (mmap == MAP_FAILED)
-   die_errno(unable to map index file);
-
-   hdr = mmap;
-   if (verify_hdr_version(istate, hdr, mmap_size)  0)
-   goto unmap;
-
-   if (istate-ops-verify_hdr(mmap, mmap_size)  0)
-   goto unmap;
-
-   if (istate-ops-read_index(istate, mmap, mmap_size)  0)
-   goto unmap;
-   istate-timestamp.sec = st.st_mtime;
-   istate-timestamp.nsec = ST_MTIME_NSEC(st);
-
-   munmap(mmap, mmap_size);
-   return istate-cache_nr;
-
-unmap:
-   munmap(mmap, mmap_size);
die(index file corrupt);
 }
 
-- 
1.8.3.453.g1dfc63d

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2 03/19] read-cache: move index v2 specific functions to their own file

2013-07-12 Thread Thomas Gummerer
Move index version 2 specific functions to their own file. The non-index
specific functions will be in read-cache.c, while the index version 2
specific functions will be in read-cache-v2.c.

Helped-by: Nguyen Thai Ngoc Duy pclo...@gmail.com
Signed-off-by: Thomas Gummerer t.gumme...@gmail.com
---
 Makefile |   2 +
 cache.h  |  16 +-
 read-cache-v2.c  | 556 +
 read-cache.c | 575 ---
 read-cache.h |  57 +
 test-index-version.c |   5 +
 6 files changed, 661 insertions(+), 550 deletions(-)
 create mode 100644 read-cache-v2.c
 create mode 100644 read-cache.h

diff --git a/Makefile b/Makefile
index 5a68fe5..73369ae 100644
--- a/Makefile
+++ b/Makefile
@@ -711,6 +711,7 @@ LIB_H += progress.h
 LIB_H += prompt.h
 LIB_H += quote.h
 LIB_H += reachable.h
+LIB_H += read-cache.h
 LIB_H += reflog-walk.h
 LIB_H += refs.h
 LIB_H += remote.h
@@ -854,6 +855,7 @@ LIB_OBJS += prompt.o
 LIB_OBJS += quote.o
 LIB_OBJS += reachable.o
 LIB_OBJS += read-cache.o
+LIB_OBJS += read-cache-v2.o
 LIB_OBJS += reflog-walk.o
 LIB_OBJS += refs.o
 LIB_OBJS += remote.o
diff --git a/cache.h b/cache.h
index 7af853b..5082b34 100644
--- a/cache.h
+++ b/cache.h
@@ -95,19 +95,8 @@ unsigned long git_deflate_bound(git_zstream *, unsigned 
long);
  */
 #define DEFAULT_GIT_PORT 9418
 
-/*
- * Basic data structures for the directory cache
- */
 
 #define CACHE_SIGNATURE 0x44495243 /* DIRC */
-struct cache_version_header {
-   unsigned int hdr_signature;
-   unsigned int hdr_version;
-};
-
-struct cache_header {
-   unsigned int hdr_entries;
-};
 
 #define INDEX_FORMAT_LB 2
 #define INDEX_FORMAT_UB 4
@@ -280,6 +269,7 @@ struct index_state {
 initialized : 1;
struct hash_table name_hash;
struct hash_table dir_hash;
+   struct index_ops *ops;
 };
 
 extern struct index_state the_index;
@@ -489,8 +479,8 @@ extern void *read_blob_data_from_index(struct index_state 
*, const char *, unsig
 #define CE_MATCH_RACY_IS_DIRTY 02
 /* do stat comparison even if CE_SKIP_WORKTREE is true */
 #define CE_MATCH_IGNORE_SKIP_WORKTREE  04
-extern int ie_match_stat(const struct index_state *, const struct cache_entry 
*, struct stat *, unsigned int);
-extern int ie_modified(const struct index_state *, const struct cache_entry *, 
struct stat *, unsigned int);
+extern int ie_match_stat(struct index_state *, const struct cache_entry *, 
struct stat *, unsigned int);
+extern int ie_modified(struct index_state *, const struct cache_entry *, 
struct stat *, unsigned int);
 
 #define PATHSPEC_ONESTAR 1 /* the pathspec pattern sastisfies GFNM_ONESTAR 
*/
 
diff --git a/read-cache-v2.c b/read-cache-v2.c
new file mode 100644
index 000..a6883c3
--- /dev/null
+++ b/read-cache-v2.c
@@ -0,0 +1,556 @@
+#include cache.h
+#include read-cache.h
+#include resolve-undo.h
+#include cache-tree.h
+#include varint.h
+
+/* Mask for the name length in ce_flags in the on-disk index */
+#define CE_NAMEMASK  (0x0fff)
+
+struct cache_header {
+   unsigned int hdr_entries;
+};
+
+/*
+ * Index File I/O
+ */
+
+/*
+ * dev/ino/uid/gid/size are also just tracked to the low 32 bits
+ * Again - this is just a (very strong in practice) heuristic that
+ * the inode hasn't changed.
+ *
+ * We save the fields in big-endian order to allow using the
+ * index file over NFS transparently.
+ */
+struct ondisk_cache_entry {
+   struct cache_time ctime;
+   struct cache_time mtime;
+   unsigned int dev;
+   unsigned int ino;
+   unsigned int mode;
+   unsigned int uid;
+   unsigned int gid;
+   unsigned int size;
+   unsigned char sha1[20];
+   unsigned short flags;
+   char name[FLEX_ARRAY]; /* more */
+};
+
+/*
+ * This struct is used when CE_EXTENDED bit is 1
+ * The struct must match ondisk_cache_entry exactly from
+ * ctime till flags
+ */
+struct ondisk_cache_entry_extended {
+   struct cache_time ctime;
+   struct cache_time mtime;
+   unsigned int dev;
+   unsigned int ino;
+   unsigned int mode;
+   unsigned int uid;
+   unsigned int gid;
+   unsigned int size;
+   unsigned char sha1[20];
+   unsigned short flags;
+   unsigned short flags2;
+   char name[FLEX_ARRAY]; /* more */
+};
+
+/* These are only used for v3 or lower */
+#define align_flex_name(STRUCT,len) ((offsetof(struct STRUCT,name) + (len) + 
8)  ~7)
+#define ondisk_cache_entry_size(len) align_flex_name(ondisk_cache_entry,len)
+#define ondisk_cache_entry_extended_size(len) 
align_flex_name(ondisk_cache_entry_extended,len)
+#define ondisk_ce_size(ce) (((ce)-ce_flags  CE_EXTENDED) ? \
+   ondisk_cache_entry_extended_size(ce_namelen(ce)) : \
+   

[PATCH v2 05/19] Add documentation for the index api

2013-07-12 Thread Thomas Gummerer
Add documentation for the index reading api.  This also includes
documentation for the new api functions introduced in the next patch.

Helped-by: Nguyễn Thái Ngọc Duy pclo...@gmail.com
Signed-off-by: Thomas Gummerer t.gumme...@gmail.com
---
 Documentation/technical/api-in-core-index.txt | 54 +--
 1 file changed, 50 insertions(+), 4 deletions(-)

diff --git a/Documentation/technical/api-in-core-index.txt 
b/Documentation/technical/api-in-core-index.txt
index adbdbf5..9b8c37c 100644
--- a/Documentation/technical/api-in-core-index.txt
+++ b/Documentation/technical/api-in-core-index.txt
@@ -1,14 +1,60 @@
 in-core index API
 =
 
+Reading API
+---
+
+`cache`::
+
+   An array of cache entries.  This is used to access the cache
+   entries directly.  Use `index_name_pos` to search for the
+   index of a specific cache entry.
+
+`read_index_filtered`::
+
+   Read a part of the index, filtered by the pathspec given in
+   the opts.  The function may load more than necessary, so the
+   caller still responsible to apply filters appropriately.  The
+   filtering is only done for performance reasons, as it's
+   possible to only read part of the index when the on-disk
+   format is index-v5.
+
+   To iterate only over the entries that match the pathspec, use
+   the for_each_index_entry function.
+
+`read_index`::
+
+   Read the whole index file from disk.
+
+`index_name_pos`::
+
+   Find a cache_entry with name in the index.  Returns pos if an
+   entry is matched exactly and -1-pos if an entry is matched
+   partially.
+   e.g.
+   index:
+   file1
+   file2
+   path/file1
+   zzz
+
+   index_name_pos(path/file1, 10) returns 2, while
+   index_name_pos(path, 4) returns -3
+
+`for_each_index_entry`::
+
+   Iterates over all cache_entries in the index filtered by
+   filter_opts in the index_state.  For each cache entry fn is
+   executed with cb_data as callback data.  From within the loop
+   do `return 0` to continue, or `return 1` to break the loop.
+
+TODO
+
 Talk about read-cache.c and cache-tree.c, things like:
 
-* cache - the_index macros
-* read_index()
 * write_index()
 * ie_match_stat() and ie_modified(); how they are different and when to
   use which.
-* index_name_pos()
 * remove_index_entry_at()
 * remove_file_from_index()
 * add_file_to_index()
@@ -18,4 +64,4 @@ Talk about read-cache.c and cache-tree.c, things like:
 * cache_tree_invalidate_path()
 * cache_tree_update()
 
-(JC, Linus)
+(JC, Linus, Thomas Gummerer)
-- 
1.8.3.453.g1dfc63d

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2 06/19] read-cache: add index reading api

2013-07-12 Thread Thomas Gummerer
Add an api for access to the index file.  Currently there is only a very
basic api for accessing the index file, which only allows a full read of
the index, and lets the users of the data filter it.  The new index api
gives the users the possibility to use only part of the index and
provides functions for iterating over and accessing cache entries.

This simplifies future improvements to the in-memory format, as changes
will be concentrated on one file, instead of the whole git source code.

Helped-by: Nguyễn Thái Ngọc Duy pclo...@gmail.com
Signed-off-by: Thomas Gummerer t.gumme...@gmail.com
---
 cache.h | 42 +-
 read-cache-v2.c | 35 +--
 read-cache.c| 44 
 read-cache.h|  8 +++-
 4 files changed, 121 insertions(+), 8 deletions(-)

diff --git a/cache.h b/cache.h
index 5082b34..d305d21 100644
--- a/cache.h
+++ b/cache.h
@@ -127,7 +127,7 @@ struct cache_entry {
unsigned int ce_flags;
unsigned int ce_namelen;
unsigned char sha1[20];
-   struct cache_entry *next;
+   struct cache_entry *next; /* used by name_hash */
char name[FLEX_ARRAY]; /* more */
 };
 
@@ -258,6 +258,29 @@ static inline unsigned int canon_mode(unsigned int mode)
 
 #define cache_entry_size(len) (offsetof(struct cache_entry,name) + (len) + 1)
 
+/*
+ * Options by which the index should be filtered when read partially.
+ *
+ * pathspec: The pathspec which the index entries have to match
+ * seen: Used to return the seen parameter from match_pathspec()
+ * max_prefix_len: The common prefix length of the pathspecs
+ *
+ * read_staged: used to indicate if the conflicted entries (entries
+ * with a stage) should be included
+ * read_cache_tree: used to indicate if the cache-tree should be read
+ * read_resolve_undo: used to indicate if the resolve undo data should
+ * be read
+ */
+struct filter_opts {
+   const struct pathspec *pathspec;
+   char *seen;
+   int max_prefix_len;
+
+   int read_staged;
+   int read_cache_tree;
+   int read_resolve_undo;
+};
+
 struct index_state {
struct cache_entry **cache;
unsigned int version;
@@ -270,6 +293,8 @@ struct index_state {
struct hash_table name_hash;
struct hash_table dir_hash;
struct index_ops *ops;
+   struct internal_ops *internal_ops;
+   struct filter_opts *filter_opts;
 };
 
 extern struct index_state the_index;
@@ -311,6 +336,12 @@ extern void free_name_hash(struct index_state *istate);
 #define unmerge_cache_entry_at(at) unmerge_index_entry_at(the_index, at)
 #define unmerge_cache(pathspec) unmerge_index(the_index, pathspec)
 #define read_blob_data_from_cache(path, sz) 
read_blob_data_from_index(the_index, (path), (sz))
+
+/* index api */
+#define read_cache_filtered(opts) read_index_filtered(the_index, (opts))
+#define read_cache_filtered_from(path, opts) 
read_index_filtered_from(the_index, (path), (opts))
+#define for_each_cache_entry(fn, cb_data) \
+   for_each_index_entry(the_index, (fn), (cb_data))
 #endif
 
 enum object_type {
@@ -438,6 +469,15 @@ extern int init_db(const char *template_dir, unsigned int 
flags);
} \
} while (0)
 
+/* index api */
+extern int read_index_filtered(struct index_state *, struct filter_opts *opts);
+extern int read_index_filtered_from(struct index_state *, const char *path, 
struct filter_opts *opts);
+
+typedef int each_cache_entry_fn(struct cache_entry *ce, void *);
+extern int for_each_index_entry(struct index_state *istate,
+   each_cache_entry_fn, void *);
+
+
 /* Initialize and use the cache information */
 extern int read_index(struct index_state *);
 extern int read_index_preload(struct index_state *, const char **pathspec);
diff --git a/read-cache-v2.c b/read-cache-v2.c
index a6883c3..51b618f 100644
--- a/read-cache-v2.c
+++ b/read-cache-v2.c
@@ -3,6 +3,7 @@
 #include resolve-undo.h
 #include cache-tree.h
 #include varint.h
+#include dir.h
 
 /* Mask for the name length in ce_flags in the on-disk index */
 #define CE_NAMEMASK  (0x0fff)
@@ -207,8 +208,14 @@ static int read_index_extension(struct index_state *istate,
return 0;
 }
 
+/*
+ * The performance is the same if we read the whole index or only
+ * part of it, therefore we always read the whole index to avoid
+ * having to re-read it later.  The filter_opts will determine
+ * what part of the index is used when retrieving the cache-entries.
+ */
 static int read_index_v2(struct index_state *istate, void *mmap,
-unsigned long mmap_size)
+unsigned long mmap_size, struct filter_opts *opts)
 {
int i;
unsigned long src_offset;
@@ -238,7 +245,6 @@ static int read_index_v2(struct index_state *istate, void 
*mmap,
disk_ce = (struct ondisk_cache_entry *)((char *)mmap + 
src_offset);
ce = 

[PATCH v2 07/19] make sure partially read index is not changed

2013-07-12 Thread Thomas Gummerer
A partially read index file currently cannot be written to disk.  Make
sure that never happens, by erroring out when a caller tries to change a
partially read index.  The caller is responsible for reading the whole
index when it's trying to change it later.

Forcing the caller to load the right part of the index file instead of
re-reading it when changing it, gives a bit of a performance advantage,
by avoiding to read parts of the index twice.

Signed-off-by: Thomas Gummerer t.gumme...@gmail.com
---
 builtin/update-index.c |  4 
 cache.h|  1 +
 read-cache-v2.c|  2 ++
 read-cache.c   | 10 ++
 4 files changed, 17 insertions(+)

diff --git a/builtin/update-index.c b/builtin/update-index.c
index 5c7762e..4c6e3a6 100644
--- a/builtin/update-index.c
+++ b/builtin/update-index.c
@@ -49,6 +49,8 @@ static int mark_ce_flags(const char *path, int flag, int mark)
int namelen = strlen(path);
int pos = cache_name_pos(path, namelen);
if (0 = pos) {
+   if (active_cache_partially_read)
+   die(BUG: Can't change a partially read index);
if (mark)
active_cache[pos]-ce_flags |= flag;
else
@@ -253,6 +255,8 @@ static void chmod_path(int flip, const char *path)
pos = cache_name_pos(path, strlen(path));
if (pos  0)
goto fail;
+   if (active_cache_partially_read)
+   die(BUG: Can't change a partially read index);
ce = active_cache[pos];
mode = ce-ce_mode;
if (!S_ISREG(mode))
diff --git a/cache.h b/cache.h
index d305d21..455b772 100644
--- a/cache.h
+++ b/cache.h
@@ -311,6 +311,7 @@ extern void free_name_hash(struct index_state *istate);
 #define active_alloc (the_index.cache_alloc)
 #define active_cache_changed (the_index.cache_changed)
 #define active_cache_tree (the_index.cache_tree)
+#define active_cache_partially_read (the_index.filter_opts)
 
 #define read_cache() read_index(the_index)
 #define read_cache_from(path) read_index_from(the_index, (path))
diff --git a/read-cache-v2.c b/read-cache-v2.c
index 51b618f..f3c0685 100644
--- a/read-cache-v2.c
+++ b/read-cache-v2.c
@@ -479,6 +479,8 @@ static int write_index_v2(struct index_state *istate, int 
newfd)
struct stat st;
struct strbuf previous_name_buf = STRBUF_INIT, *previous_name;
 
+   if (istate-filter_opts)
+   die(BUG: index: cannot write a partially read index);
for (i = removed = extended = 0; i  entries; i++) {
if (cache[i]-ce_flags  CE_REMOVE)
removed++;
diff --git a/read-cache.c b/read-cache.c
index 9053d43..ab716ed 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -30,6 +30,8 @@ static void replace_index_entry(struct index_state *istate, 
int nr, struct cache
 {
struct cache_entry *old = istate-cache[nr];
 
+   if (istate-filter_opts)
+   die(BUG: Can't change a partially read index);
remove_name_hash(istate, old);
set_index_entry(istate, nr, ce);
istate-cache_changed = 1;
@@ -467,6 +469,8 @@ int remove_index_entry_at(struct index_state *istate, int 
pos)
 {
struct cache_entry *ce = istate-cache[pos];
 
+   if (istate-filter_opts)
+   die(BUG: Can't change a partially read index);
record_resolve_undo(istate, ce);
remove_name_hash(istate, ce);
istate-cache_changed = 1;
@@ -973,6 +977,8 @@ int add_index_entry(struct index_state *istate, struct 
cache_entry *ce, int opti
 {
int pos;
 
+   if (istate-filter_opts)
+   die(BUG: Can't change a partially read index);
if (option  ADD_CACHE_JUST_APPEND)
pos = istate-cache_nr;
else {
@@ -1173,6 +1179,8 @@ int refresh_index(struct index_state *istate, unsigned 
int flags, const char **p
/* If we are doing --really-refresh that
 * means the index is not valid anymore.
 */
+   if (istate-filter_opts)
+   die(BUG: Can't change a partially read 
index);
ce-ce_flags = ~CE_VALID;
istate-cache_changed = 1;
}
@@ -1331,6 +1339,8 @@ int read_index_filtered_from(struct index_state *istate, 
const char *path,
void *mmap;
size_t mmap_size;
 
+   if (istate-filter_opts)
+   die(BUG: Can't re-read partially read index);
errno = EBUSY;
if (istate-initialized)
return istate-cache_nr;
-- 
1.8.3.453.g1dfc63d

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2 08/19] grep.c: Use index api

2013-07-12 Thread Thomas Gummerer
Signed-off-by: Thomas Gummerer t.gumme...@gmail.com
---
 builtin/grep.c | 71 ++
 1 file changed, 37 insertions(+), 34 deletions(-)

diff --git a/builtin/grep.c b/builtin/grep.c
index a419cda..8b02644 100644
--- a/builtin/grep.c
+++ b/builtin/grep.c
@@ -368,41 +368,33 @@ static void run_pager(struct grep_opt *opt, const char 
*prefix)
free(argv);
 }
 
-static int grep_cache(struct grep_opt *opt, const struct pathspec *pathspec, 
int cached)
+struct grep_opts {
+   struct grep_opt *opt;
+   const struct pathspec *pathspec;
+   int cached;
+   int hit;
+};
+
+static int grep_cache(struct cache_entry *ce, void *cb_data)
 {
-   int hit = 0;
-   int nr;
-   read_cache();
+   struct grep_opts *opts = cb_data;
 
-   for (nr = 0; nr  active_nr; nr++) {
-   struct cache_entry *ce = active_cache[nr];
-   if (!S_ISREG(ce-ce_mode))
-   continue;
-   if (!match_pathspec_depth(pathspec, ce-name, ce_namelen(ce), 
0, NULL))
-   continue;
-   /*
-* If CE_VALID is on, we assume worktree file and its cache 
entry
-* are identical, even if worktree file has been modified, so 
use
-* cache version instead
-*/
-   if (cached || (ce-ce_flags  CE_VALID) || 
ce_skip_worktree(ce)) {
-   if (ce_stage(ce))
-   continue;
-   hit |= grep_sha1(opt, ce-sha1, ce-name, 0, ce-name);
-   }
-   else
-   hit |= grep_file(opt, ce-name);
-   if (ce_stage(ce)) {
-   do {
-   nr++;
-   } while (nr  active_nr 
-!strcmp(ce-name, active_cache[nr]-name));
-   nr--; /* compensate for loop control */
-   }
-   if (hit  opt-status_only)
-   break;
-   }
-   return hit;
+   if (!S_ISREG(ce-ce_mode))
+   return 0;
+   if (!match_pathspec_depth(opts-pathspec, ce-name, ce_namelen(ce), 0, 
NULL))
+   return 0;
+   /*
+* If CE_VALID is on, we assume worktree file and its cache entry
+* are identical, even if worktree file has been modified, so use
+* cache version instead
+*/
+   if (opts-cached || (ce-ce_flags  CE_VALID) || ce_skip_worktree(ce))
+   opts-hit |= grep_sha1(opts-opt, ce-sha1, ce-name, 0, 
ce-name);
+   else
+   opts-hit |= grep_file(opts-opt, ce-name);
+   if (opts-hit  opts-opt-status_only)
+   return 1;
+   return 0;
 }
 
 static int grep_tree(struct grep_opt *opt, const struct pathspec *pathspec,
@@ -895,10 +887,21 @@ int cmd_grep(int argc, const char **argv, const char 
*prefix)
} else if (0 = opt_exclude) {
die(_(--[no-]exclude-standard cannot be used for tracked 
contents.));
} else if (!list.nr) {
+   struct grep_opts opts;
+   struct filter_opts *filter_opts = xmalloc(sizeof(*filter_opts));
+
if (!cached)
setup_work_tree();
 
-   hit = grep_cache(opt, pathspec, cached);
+   memset(filter_opts, 0, sizeof(*filter_opts));
+   filter_opts-pathspec = pathspec;
+   opts.opt = opt;
+   opts.pathspec = pathspec;
+   opts.cached = cached;
+   opts.hit = 0;
+   read_cache_filtered(filter_opts);
+   for_each_cache_entry(grep_cache, opts);
+   hit = opts.hit;
} else {
if (cached)
die(_(both --cached and trees are given.));
-- 
1.8.3.453.g1dfc63d

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2 09/19] ls-files.c: use index api

2013-07-12 Thread Thomas Gummerer
Use the index api to read only part of the index, if the on-disk version
of the index is index-v5.

Signed-off-by: Thomas Gummerer t.gumme...@gmail.com
---
 builtin/ls-files.c | 31 ---
 1 file changed, 20 insertions(+), 11 deletions(-)

diff --git a/builtin/ls-files.c b/builtin/ls-files.c
index 08d9786..80cc398 100644
--- a/builtin/ls-files.c
+++ b/builtin/ls-files.c
@@ -31,6 +31,7 @@ static const char *prefix;
 static int max_prefix_len;
 static int prefix_len;
 static const char **pathspec;
+static struct pathspec pathspec_struct;
 static int error_unmatch;
 static char *ps_matched;
 static const char *with_tree;
@@ -457,6 +458,7 @@ int cmd_ls_files(int argc, const char **argv, const char 
*cmd_prefix)
struct dir_struct dir;
struct exclude_list *el;
struct string_list exclude_list = STRING_LIST_INIT_NODUP;
+   struct filter_opts *opts = xmalloc(sizeof(*opts));
struct option builtin_ls_files_options[] = {
{ OPTION_CALLBACK, 'z', NULL, NULL, NULL,
N_(paths are separated with NUL character),
@@ -522,9 +524,6 @@ int cmd_ls_files(int argc, const char **argv, const char 
*cmd_prefix)
prefix_len = strlen(prefix);
git_config(git_default_config, NULL);
 
-   if (read_cache()  0)
-   die(index file corrupt);
-
argc = parse_options(argc, argv, prefix, builtin_ls_files_options,
ls_files_usage, 0);
el = add_exclude_list(dir, EXC_CMDL, --exclude option);
@@ -556,14 +555,7 @@ int cmd_ls_files(int argc, const char **argv, const char 
*cmd_prefix)
setup_work_tree();
 
pathspec = get_pathspec(prefix, argv);
-
-   /* be nice with submodule paths ending in a slash */
-   if (pathspec)
-   strip_trailing_slash_from_submodules();
-
-   /* Find common prefix for all pathspec's */
-   max_prefix = common_prefix(pathspec);
-   max_prefix_len = max_prefix ? strlen(max_prefix) : 0;
+   init_pathspec(pathspec_struct, pathspec);
 
/* Treat unmatching pathspec elements as errors */
if (pathspec  error_unmatch) {
@@ -573,6 +565,23 @@ int cmd_ls_files(int argc, const char **argv, const char 
*cmd_prefix)
ps_matched = xcalloc(1, num);
}
 
+   if (!with_tree) {
+   memset(opts, 0, sizeof(*opts));
+   opts-pathspec = pathspec_struct;
+   opts-read_staged = 1;
+   if (show_resolve_undo)
+   opts-read_resolve_undo = 1;
+   read_cache_filtered(opts);
+   } else {
+   read_cache();
+   }
+   /* be nice with submodule paths ending in a slash */
+   if (pathspec)
+   strip_trailing_slash_from_submodules();
+
+   max_prefix = common_prefix(pathspec);
+   max_prefix_len = max_prefix ? strlen(max_prefix) : 0;
+
if ((dir.flags  DIR_SHOW_IGNORED)  !exc_given)
die(ls-files --ignored needs some exclude pattern);
 
-- 
1.8.3.453.g1dfc63d

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/2] open() error checking

2013-07-12 Thread Junio C Hamano
Thomas Rast tr...@inf.ethz.ch writes:

 #1 is Dale's suggested change.  Dale, to include it we'd need your
 Signed-off-by as per Documentation/SubmittingPatches.

 #2 is a similar error-checking fix; I reviewed 'git grep \bopen\b'
 and found one case where the return value was obviously not tested.
 The corresponding Windows code path has the same problem, but I dare
 not touch it; perhaps someone from the Windows side can look into it?

 I originally had a four-patch series to open 0/1/2 from /dev/null, but
 then I noticed that this was shot down in 2008:

   http://thread.gmane.org/gmane.comp.version-control.git/93605/focus=93896

The way I recall the thread was not shot down but more like
fizzled out without seeing a clear consensus.  As a normal POSIX
program, we do rely on fd#2 connected to an error stream, and I do
agree with the general sentiment of that old thread that it is very
wrong for warning() or die() to write to a pipe or file descriptor
we opened for some other purpose, corrupting the destination.

I briefly wondered if we can do the sanity check lazily (e.g. upon
first warning() see of fd#2 is open and otherwise die silently), but
we may open a fd (e.g. to create a new loose object) that may happen
to grab fd#2 and then it is too late for us to do anything about it,
so...

 Do you want to resurrect this?

 The worst part about it is that because we don't have a stderr to rely
 on, we can't simply die(stop playing mind games).

Right.
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] .mailmap: Map email addresses to names

2013-07-12 Thread Stefan Beller
The same kind of cleanup as sent earlier today
(2e2ae79df4fabea0157c5eb527b5396eb89185a1 locally here)

I asked all the people before, whether
they like their lines added. Many had
requests to change the order of the mail address.

When having this patch applied, you'll notice the
bug as described here
http://marc.info/?l=gitm=137364524514927w=2
http://www.mail-archive.com/git@vger.kernel.org/msg31964.html
(Bug in .mailmap handling?, for example look for Knut Franke)

Signed-off-by: Stefan Beller stefanbel...@googlemail.com
---
 .mailmap | 28 ++--
 1 file changed, 26 insertions(+), 2 deletions(-)

diff --git a/.mailmap b/.mailmap
index 1179767..1d6ba17 100644
--- a/.mailmap
+++ b/.mailmap
@@ -7,6 +7,7 @@
 
 Alejandro R. Sedeño ased...@mit.edu ased...@mit.edu
 Alexander Gavrilov angavri...@gmail.com
+Alexey Shumkin alex.crez...@gmail.com zap...@mail.ru
 Alex Bennée kernel-hac...@bennee.com
 Alex Riesen raa.l...@gmail.com fo...@t-online.de
 Alex Riesen raa.l...@gmail.com raa@limbo.localdomain
@@ -18,12 +19,15 @@ anonymous li...@horizon.com
 anonymous li...@horizon.net
 Brandon Casey draf...@gmail.com ca...@nrlssc.navy.mil
 Brian M. Carlson sand...@crustytoothpaste.ath.cx
+Bryan Larsen br...@larsen.st bryan.lar...@gmail.com
+Bryan Larsen br...@larsen.st bryanlar...@yahoo.com
 Cheng Renquan crq...@gmail.com
 Chris Shoemaker c.shoema...@cox.net
 Dana L. How dana...@gmail.com
 Dana L. How h...@deathvalley.cswitch.com
 Daniel Barkalow barka...@iabervon.org
 Dan Johnson computerdr...@gmail.com
+David Brown g...@davidb.org dav...@quicinc.com
 David D. Kilzer ddkil...@kilzer.net
 David Kågedal dav...@lysator.liu.se
 David Reiss dre...@facebook.com dreiss@dreiss-vmware.(none)
@@ -31,7 +35,10 @@ David S. Miller da...@davemloft.net
 Deskin Miller desk...@umich.edu
 Dirk Süsserott newslet...@dirk.my1.cc
 Eric S. Raymond e...@thyrsus.com
+Eric Blake ebl...@redhat.com e...@byu.net
+Eric Hanchrow eric.hanch...@gmail.com off...@blarg.net
 Erik Faye-Lund kusmab...@gmail.com kusmab...@googlemail.com
+Eyvind Bernhardsen eyvind.bernhard...@gmail.com eyvind-...@orakel.ntnu.no
 Florian Achleitner florian.achleitner.2.6...@gmail.com 
florian.achleitner2.6...@gmail.com
 Franck Bui-Huu vagabon@gmail.com fbui...@gmail.com
 Frank Lichtenheld fr...@lichtenheld.de dj...@debian.org
@@ -47,19 +54,25 @@ H. Peter Anvin h...@zytor.com 
h...@tazenda.sc.orionmulti.com
 H. Peter Anvin h...@zytor.com h...@trantor.hos.anvin.org
 İsmail Dönmez ism...@pardus.org.tr
 Jakub Narębski jna...@gmail.com
+Jason Riedy e...@eecs.berkeley.edu e...@cs.berkeley.edu
+Jason Riedy e...@eecs.berkeley.edu e...@eecs.berkeley.edu
 Jay Soffian jaysoff...@gmail.com jaysoffian+...@gmail.com
 J. Bruce Fields bfie...@citi.umich.edu bfie...@fieldses.org
 J. Bruce Fields bfie...@citi.umich.edu bfie...@pig.linuxdev.us.dell.com
 J. Bruce Fields bfie...@citi.umich.edu bfie...@puzzle.fieldses.org
 Jeff King p...@peff.net p...@github.com
+Jeff Muizelaar jmuizel...@mozilla.com j...@infidigm.net
 Joachim Berdal Haga cjh...@fys.uio.no
 Johannes Schindelin johannes.schinde...@gmx.de johannes.schinde...@gmx.de
 Johannes Sixt j...@kdbg.org johannes.s...@telecom.at
 Johannes Sixt j...@kdbg.org j.s...@eudaptics.com
 Johannes Sixt j...@kdbg.org j.s...@viscovery.net
+Jonathan del Strother jon.delstrot...@bestbefore.tv maill...@steelskies.com
 Jonathan Nieder jrnie...@gmail.com jrnie...@uchicago.edu
 Jon Loeliger j...@freescale.com
-Jon Seymour j...@blackcubes.dyndns.org
+Jon Seymour jon.seym...@gmail.com j...@blackcubes.dyndns.org
+Josh Triplett j...@joshtriplett.org j...@freedesktop.org
+Josh Triplett j...@joshtriplett.org jo...@us.ibm.com
 Junio C Hamano gits...@pobox.com gits...@pobox.com
 Junio C Hamano gits...@pobox.com ju...@hera.kernel.org
 Junio C Hamano gits...@pobox.com ju...@kernel.org
@@ -71,11 +84,14 @@ Karl Wiberg k...@treskal.com Karl Hasselström 
k...@treskal.com
 Karl Wiberg k...@treskal.com Karl Hasselström 
k...@yoghurt.hemma.treskal.com
 Kay Sievers kay.siev...@vrfy.org kay@mam.(none)
 Kay Sievers kay.siev...@vrfy.org kay.siev...@suse.de
+Karsten Blees bl...@dcon.de karsten.bl...@dcon.de
+Karsten Blees bl...@dcon.de karsten.bl...@gmail.com
 Keith Cascio ke...@cs.ucla.edu ke...@cs.ucla.edu
 Kent Engstrom k...@lysator.liu.se
 Kevin Leung kevin...@gmail.com
 Kirill Smelkov k...@navytux.spb.ru k...@landau.phys.spbu.ru
 Kirill Smelkov k...@navytux.spb.ru k...@mns.spb.ru
+Knut Franke knut.fra...@gmx.de k.fra...@science-computing.de
 Lars Doelle lars.doelle@on-line ! de
 Lars Doelle lars.doe...@on-line.de
 Li Hong leeh...@pku.edu.cn
@@ -85,11 +101,14 @@ Linus Torvalds torva...@linux-foundation.org 
torva...@osdl.org
 Linus Torvalds torva...@linux-foundation.org torva...@ppc970.osdl.org
 Linus Torvalds torva...@linux-foundation.org 
torva...@ppc970.osdl.org.(none)
 Linus Torvalds torva...@linux-foundation.org 
torva...@woody.linux-foundation.org
-Lukas Sandström luk...@etek.chalmers.se
+Lukas Sandström luk...@gmail.com luk...@etek.chalmers.se
+Matt 

[PATCH v2 11/19] read-cache: make in-memory format aware of stat_crc

2013-07-12 Thread Thomas Gummerer
Make the in-memory format aware of the stat_crc used by index-v5.
It is simply ignored by index version prior to v5.

Signed-off-by: Thomas Gummerer t.gumme...@gmail.com
---
 cache.h  |  1 +
 read-cache.c | 25 +
 2 files changed, 26 insertions(+)

diff --git a/cache.h b/cache.h
index 455b772..2097105 100644
--- a/cache.h
+++ b/cache.h
@@ -127,6 +127,7 @@ struct cache_entry {
unsigned int ce_flags;
unsigned int ce_namelen;
unsigned char sha1[20];
+   uint32_t ce_stat_crc;
struct cache_entry *next; /* used by name_hash */
char name[FLEX_ARRAY]; /* more */
 };
diff --git a/read-cache.c b/read-cache.c
index ab716ed..9bfbb4f 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -108,6 +108,29 @@ int match_stat_data(const struct stat_data *sd, struct 
stat *st)
return changed;
 }
 
+static uint32_t calculate_stat_crc(struct cache_entry *ce)
+{
+   unsigned int ctimens = 0;
+   uint32_t stat, stat_crc;
+
+   stat = htonl(ce-ce_stat_data.sd_ctime.sec);
+   stat_crc = crc32(0, (Bytef*)stat, 4);
+#ifdef USE_NSEC
+   ctimens = ce-ce_stat_data.sd_ctime.nsec;
+#endif
+   stat = htonl(ctimens);
+   stat_crc = crc32(stat_crc, (Bytef*)stat, 4);
+   stat = htonl(ce-ce_stat_data.sd_ino);
+   stat_crc = crc32(stat_crc, (Bytef*)stat, 4);
+   stat = htonl(ce-ce_stat_data.sd_dev);
+   stat_crc = crc32(stat_crc, (Bytef*)stat, 4);
+   stat = htonl(ce-ce_stat_data.sd_uid);
+   stat_crc = crc32(stat_crc, (Bytef*)stat, 4);
+   stat = htonl(ce-ce_stat_data.sd_gid);
+   stat_crc = crc32(stat_crc, (Bytef*)stat, 4);
+   return stat_crc;
+}
+
 /*
  * This only updates the non-critical parts of the directory
  * cache, ie the parts that aren't tracked by GIT, and only used
@@ -122,6 +145,8 @@ void fill_stat_cache_info(struct cache_entry *ce, struct 
stat *st)
 
if (S_ISREG(st-st_mode))
ce_mark_uptodate(ce);
+
+   ce-ce_stat_crc = calculate_stat_crc(ce);
 }
 
 static int ce_compare_data(const struct cache_entry *ce, struct stat *st)
-- 
1.8.3.453.g1dfc63d

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2 10/19] documentation: add documentation of the index-v5 file format

2013-07-12 Thread Thomas Gummerer
Add a documentation of the index file format version 5 to
Documentation/technical.

Helped-by: Michael Haggerty mhag...@alum.mit.edu
Helped-by: Junio C Hamano gits...@pobox.com
Helped-by: Thomas Rast tr...@student.ethz.ch
Helped-by: Nguyen Thai Ngoc Duy pclo...@gmail.com
Helped-by: Robin Rosenberg robin.rosenb...@dewire.com
Signed-off-by: Thomas Gummerer t.gumme...@gmail.com
---
 Documentation/technical/index-file-format-v5.txt | 296 +++
 1 file changed, 296 insertions(+)
 create mode 100644 Documentation/technical/index-file-format-v5.txt

diff --git a/Documentation/technical/index-file-format-v5.txt 
b/Documentation/technical/index-file-format-v5.txt
new file mode 100644
index 000..4213087
--- /dev/null
+++ b/Documentation/technical/index-file-format-v5.txt
@@ -0,0 +1,296 @@
+GIT index format
+
+
+== The git index
+
+   The git index file (.git/index) documents the status of the files
+ in the git staging area.
+
+   The staging area is used for preparing commits, merging, etc.
+
+== The git index file format
+
+   All binary numbers are in network byte order. Version 5 is described
+ here. The index file consists of various sections. They appear in
+ the following order in the file.
+
+   - header: the description of the index format, including it's signature,
+ version and various other fields that are used internally.
+
+   - diroffsets (ndir entries of direcotry offset): A 4-byte offset
+   relative to the beginning of the direntries block (see below)
+   for each of the ndir directories in the index, sorted by pathname
+   (of the directory it's pointing to). [1]
+
+   - direntries (ndir entries of directory offset): A directory entry
+   for each of the ndir directories in the index, sorted by pathname
+   (see below). [2]
+
+   - fileoffsets (nfile entries of file offset): A 4-byte offset
+   relative to the beginning of the fileentries block (see below)
+   for each of the nfile files in the index. [1]
+
+   - fileentries (nfile entries of file entry): A file entry for
+   each of the nfile files in the index (see below).
+
+   - crdata: A number of entries for conflicted data/resolved conflicts
+   (see below).
+
+   - Extensions (Currently none, see below in the future)
+
+ Extensions are identified by signature. Optional extensions can
+ be ignored if GIT does not understand them.
+
+ GIT supports an arbitrary number of extension, but currently none
+ is implemented. [3]
+
+ extsig (32-bits): extension signature. If the first byte is 'A'..'Z'
+ the extension is optional and can be ignored.
+
+ extsize (32-bits): size of the extension, excluding the header
+   (extsig, extsize, extchecksum).
+
+ extchecksum (32-bits): crc32 checksum of the extension signature
+   and size.
+
+- Extension data.
+
+== Header
+   sig (32-bits): Signature:
+ The signature is { 'D', 'I', 'R', 'C' } (stands for dircache)
+
+   vnr (32-bits): Version number:
+ The current supported versions are 2, 3, 4 and 5.
+
+   ndir (32-bits): number of directories in the index.
+
+   nfile (32-bits): number of file entries in the index.
+
+   fblockoffset (32-bits): offset to the file block, relative to the
+ beginning of the file.
+
+   - Offset to the extensions.
+
+ nextensions (32-bits): number of extensions.
+
+ extoffset (32-bits): offset to the extension. (Possibly none, as
+   many as indicated in the 4-byte number of extensions)
+
+   headercrc (32-bits): crc checksum including the header and the
+ offsets to the extensions.
+
+
+== Directory offsets (diroffsets)
+
+  diroffset (32-bits): offset to the directory relative to the beginning
+of the index file. There are ndir + 1 offsets in the diroffset table,
+the last is pointing to the end of the last direntry. With this last
+entry, we are able to replace the strlen of when reading the directory
+name, by calculating it from diroffset[n+1]-diroffset[n]-61.  61 is the
+size of the directory data, which follows each each directory + the
+crc sum + the NUL byte.
+
+  This part is needed for making the directory entries bisectable and
+thus allowing a binary search.
+
+== Directory entry (direntries)
+
+  Directory entries are sorted in lexicographic order by the name
+of their path starting with the root.
+
+  pathname (variable length, nul terminated): relative to top level
+directory (without the leading slash). '/' is used as path
+separator. A string of length 0 ('') indicates the root directory.
+The special path components ., and .. (without quotes) are
+disallowed. The path also includes a trailing slash. [9]
+
+  foffset (32-bits): offset to the lexicographically first file in
+the file offsets (fileoffsets), relative to the beginning of
+the fileoffset block.
+
+  cr (32-bits): offset to conflicted/resolved data at the end of the
+index. 0 if 

[PATCH v2 14/19] read-cache: read cache-tree in index-v5

2013-07-12 Thread Thomas Gummerer
Since the cache-tree data is saved as part of the directory data,
we already read it at the beginning of the index. The cache-tree
is only converted from this directory data.

The cache-tree data is arranged in a tree, with the children sorted by
pathlen at each node, while the ondisk format is sorted lexically.
So we have to rebuild this format from the on-disk directory list.

Signed-off-by: Thomas Gummerer t.gumme...@gmail.com
---
 cache-tree.c|   2 +-
 cache-tree.h|   6 
 read-cache-v5.c | 100 
 3 files changed, 107 insertions(+), 1 deletion(-)

diff --git a/cache-tree.c b/cache-tree.c
index 37e4d00..f4b0917 100644
--- a/cache-tree.c
+++ b/cache-tree.c
@@ -31,7 +31,7 @@ void cache_tree_free(struct cache_tree **it_p)
*it_p = NULL;
 }
 
-static int subtree_name_cmp(const char *one, int onelen,
+int subtree_name_cmp(const char *one, int onelen,
const char *two, int twolen)
 {
if (onelen  twolen)
diff --git a/cache-tree.h b/cache-tree.h
index 55d0f59..9aac493 100644
--- a/cache-tree.h
+++ b/cache-tree.h
@@ -21,10 +21,16 @@ struct cache_tree {
struct cache_tree_sub **down;
 };
 
+struct directory_queue {
+   struct directory_queue *down;
+   struct directory_entry *de;
+};
+
 struct cache_tree *cache_tree(void);
 void cache_tree_free(struct cache_tree **);
 void cache_tree_invalidate_path(struct cache_tree *, const char *);
 struct cache_tree_sub *cache_tree_sub(struct cache_tree *, const char *);
+int subtree_name_cmp(const char *, int, const char *, int);
 
 void cache_tree_write(struct strbuf *, struct cache_tree *root);
 struct cache_tree *cache_tree_read(const char *buffer, unsigned long size);
diff --git a/read-cache-v5.c b/read-cache-v5.c
index 853b97d..0b9c320 100644
--- a/read-cache-v5.c
+++ b/read-cache-v5.c
@@ -448,6 +448,103 @@ static int read_conflicts(struct conflict_entry **head,
return 0;
 }
 
+static struct cache_tree *convert_one(struct directory_queue *queue, int dirnr)
+{
+   int i, subtree_nr;
+   struct cache_tree *it;
+   struct directory_queue *down;
+
+   it = cache_tree();
+   it-entry_count = queue[dirnr].de-de_nentries;
+   subtree_nr = queue[dirnr].de-de_nsubtrees;
+   if (0 = it-entry_count)
+   hashcpy(it-sha1, queue[dirnr].de-sha1);
+
+   /*
+* Just a heuristic -- we do not add directories that often but
+* we do not want to have to extend it immediately when we do,
+* hence +2.
+*/
+   it-subtree_alloc = subtree_nr + 2;
+   it-down = xcalloc(it-subtree_alloc, sizeof(struct cache_tree_sub *));
+   down = queue[dirnr].down;
+   for (i = 0; i  subtree_nr; i++) {
+   struct cache_tree *sub;
+   struct cache_tree_sub *subtree;
+   char *buf, *name;
+
+   name = ;
+   buf = strtok(down[i].de-pathname, /);
+   while (buf) {
+   name = buf;
+   buf = strtok(NULL, /);
+   }
+   sub = convert_one(down, i);
+   if(!sub)
+   goto free_return;
+   subtree = cache_tree_sub(it, name);
+   subtree-cache_tree = sub;
+   }
+   if (subtree_nr != it-subtree_nr)
+   die(cache-tree: internal error);
+   return it;
+ free_return:
+   cache_tree_free(it);
+   return NULL;
+}
+
+static int compare_cache_tree_elements(const void *a, const void *b)
+{
+   const struct directory_entry *de1, *de2;
+
+   de1 = ((const struct directory_queue *)a)-de;
+   de2 = ((const struct directory_queue *)b)-de;
+   return subtree_name_cmp(de1-pathname, de1-de_pathlen,
+   de2-pathname, de2-de_pathlen);
+}
+
+static struct directory_entry *sort_directories(struct directory_entry *de,
+   struct directory_queue *queue)
+{
+   int i, nsubtrees;
+
+   nsubtrees = de-de_nsubtrees;
+   for (i = 0; i  nsubtrees; i++) {
+   struct directory_entry *new_de;
+   de = de-next;
+   new_de = xmalloc(directory_entry_size(de-de_pathlen));
+   memcpy(new_de, de, directory_entry_size(de-de_pathlen));
+   queue[i].de = new_de;
+   if (de-de_nsubtrees) {
+   queue[i].down = xcalloc(de-de_nsubtrees,
+   sizeof(struct directory_queue));
+   de = sort_directories(de,
+   queue[i].down);
+   }
+   }
+   qsort(queue, nsubtrees, sizeof(struct directory_queue),
+   compare_cache_tree_elements);
+   return de;
+}
+
+/*
+ * This function modifies the directory argument that is given to it.
+ * Don't use it if the directory entries are still needed after.
+ */
+static struct 

[PATCH v2 13/19] read-cache: read resolve-undo data

2013-07-12 Thread Thomas Gummerer
Make git read the resolve-undo data from the index.

Since the resolve-undo data is joined with the conflicts in
the ondisk format of the index file version 5, conflicts and
resolved data is read at the same time, and the resolve-undo
data is then converted to the in-memory format.

Helped-by: Thomas Rast tr...@student.ethz.ch
Signed-off-by: Thomas Gummerer t.gumme...@gmail.com
---
 read-cache-v5.c | 39 +++
 1 file changed, 39 insertions(+)

diff --git a/read-cache-v5.c b/read-cache-v5.c
index 00112ea..853b97d 100644
--- a/read-cache-v5.c
+++ b/read-cache-v5.c
@@ -1,5 +1,6 @@
 #include cache.h
 #include read-cache.h
+#include string-list.h
 #include resolve-undo.h
 #include cache-tree.h
 #include dir.h
@@ -447,6 +448,43 @@ static int read_conflicts(struct conflict_entry **head,
return 0;
 }
 
+static void resolve_undo_convert_v5(struct index_state *istate,
+   struct conflict_entry *conflict)
+{
+   int i;
+
+   while (conflict) {
+   struct string_list_item *lost;
+   struct resolve_undo_info *ui;
+   struct conflict_part *cp;
+
+   if (conflict-entries 
+   (conflict-entries-flags  CONFLICT_CONFLICTED) != 0) {
+   conflict = conflict-next;
+   continue;
+   }
+   if (!istate-resolve_undo) {
+   istate-resolve_undo = xcalloc(1, sizeof(struct 
string_list));
+   istate-resolve_undo-strdup_strings = 1;
+   }
+
+   lost = string_list_insert(istate-resolve_undo, conflict-name);
+   if (!lost-util)
+   lost-util = xcalloc(1, sizeof(*ui));
+   ui = lost-util;
+
+   cp = conflict-entries;
+   for (i = 0; i  3; i++)
+   ui-mode[i] = 0;
+   while (cp) {
+   ui-mode[conflict_stage(cp) - 1] = cp-entry_mode;
+   hashcpy(ui-sha1[conflict_stage(cp) - 1], cp-sha1);
+   cp = cp-next;
+   }
+   conflict = conflict-next;
+   }
+}
+
 static int read_entries(struct index_state *istate, struct directory_entry 
**de,
unsigned int *entry_offset, void **mmap,
unsigned long mmap_size, unsigned int *nr,
@@ -460,6 +498,7 @@ static int read_entries(struct index_state *istate, struct 
directory_entry **de,
conflict_queue = NULL;
if (read_conflicts(conflict_queue, *de, mmap, mmap_size)  0)
return -1;
+   resolve_undo_convert_v5(istate, conflict_queue);
for (i = 0; i  (*de)-de_nfiles; i++) {
if (read_entry(ce,
   *de,
-- 
1.8.3.453.g1dfc63d

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2 12/19] read-cache: read index-v5

2013-07-12 Thread Thomas Gummerer
Make git read the index file version 5 without complaining.

This version of the reader doesn't read neither the cache-tree
nor the resolve undo data, but doesn't choke on an index that
includes such data.

Helped-by: Junio C Hamano gits...@pobox.com
Helped-by: Nguyen Thai Ngoc Duy pclo...@gmail.com
Helped-by: Thomas Rast tr...@student.ethz.ch
Signed-off-by: Thomas Gummerer t.gumme...@gmail.com
---
 Makefile|   1 +
 cache.h |  75 ++-
 read-cache-v5.c | 638 
 read-cache.h|   1 +
 4 files changed, 714 insertions(+), 1 deletion(-)
 create mode 100644 read-cache-v5.c

diff --git a/Makefile b/Makefile
index 73369ae..80e35f5 100644
--- a/Makefile
+++ b/Makefile
@@ -856,6 +856,7 @@ LIB_OBJS += quote.o
 LIB_OBJS += reachable.o
 LIB_OBJS += read-cache.o
 LIB_OBJS += read-cache-v2.o
+LIB_OBJS += read-cache-v5.o
 LIB_OBJS += reflog-walk.o
 LIB_OBJS += refs.o
 LIB_OBJS += remote.o
diff --git a/cache.h b/cache.h
index 2097105..1e5cc77 100644
--- a/cache.h
+++ b/cache.h
@@ -99,7 +99,7 @@ unsigned long git_deflate_bound(git_zstream *, unsigned long);
 #define CACHE_SIGNATURE 0x44495243 /* DIRC */
 
 #define INDEX_FORMAT_LB 2
-#define INDEX_FORMAT_UB 4
+#define INDEX_FORMAT_UB 5
 
 /*
  * The cache_time is just the low 32 bits of the
@@ -121,6 +121,15 @@ struct stat_data {
unsigned int sd_size;
 };
 
+/*
+ * The *next pointer is used in read_entries_v5 for holding
+ * all the elements of a directory, and points to the next
+ * cache_entry in a directory.
+ *
+ * It is reset by the add_name_hash call in set_index_entry
+ * to set it to point to the next cache_entry in the
+ * correct in-memory format ordering.
+ */
 struct cache_entry {
struct stat_data ce_stat_data;
unsigned int ce_mode;
@@ -132,11 +141,59 @@ struct cache_entry {
char name[FLEX_ARRAY]; /* more */
 };
 
+struct directory_entry {
+   struct directory_entry *next;
+   struct directory_entry *next_hash;
+   struct cache_entry *ce;
+   struct cache_entry *ce_last;
+   struct conflict_entry *conflict;
+   struct conflict_entry *conflict_last;
+   unsigned int conflict_size;
+   unsigned int de_foffset;
+   unsigned int de_cr;
+   unsigned int de_ncr;
+   unsigned int de_nsubtrees;
+   unsigned int de_nfiles;
+   unsigned int de_nentries;
+   unsigned char sha1[20];
+   unsigned short de_flags;
+   unsigned int de_pathlen;
+   char pathname[FLEX_ARRAY];
+};
+
+struct conflict_part {
+   struct conflict_part *next;
+   unsigned short flags;
+   unsigned short entry_mode;
+   unsigned char sha1[20];
+};
+
+struct conflict_entry {
+   struct conflict_entry *next;
+   unsigned int nfileconflicts;
+   struct conflict_part *entries;
+   unsigned int namelen;
+   unsigned int pathlen;
+   char name[FLEX_ARRAY];
+};
+
+struct ondisk_conflict_part {
+   unsigned short flags;
+   unsigned short entry_mode;
+   unsigned char sha1[20];
+};
+
+#define CE_NAMEMASK  (0x0fff)
 #define CE_STAGEMASK (0x3000)
 #define CE_EXTENDED  (0x4000)
 #define CE_VALID (0x8000)
+#define CE_SMUDGED   (0x0400) /* index v5 only flag */
 #define CE_STAGESHIFT 12
 
+#define CONFLICT_CONFLICTED (0x8000)
+#define CONFLICT_STAGESHIFT 13
+#define CONFLICT_STAGEMASK (0x6000)
+
 /*
  * Range 0x in ce_flags is divided into
  * two parts: in-memory flags and on-disk ones.
@@ -173,6 +230,18 @@ struct cache_entry {
 #define CE_EXTENDED_FLAGS (CE_INTENT_TO_ADD | CE_SKIP_WORKTREE)
 
 /*
+ * Representation of the extended on-disk flags in the v5 format.
+ * They must not collide with the ordinary on-disk flags, and need to
+ * fit in 16 bits.  Note however that v5 does not save the name
+ * length.
+ */
+#define CE_INTENT_TO_ADD_V5  (0x4000)
+#define CE_SKIP_WORKTREE_V5  (0x0800)
+#if (CE_VALID|CE_STAGEMASK)  (CE_INTENTTOADD_V5|CE_SKIPWORKTREE_V5)
+#error v5 on-disk flags collide with ordinary on-disk flags
+#endif
+
+/*
  * Safeguard to avoid saving wrong flags:
  *  - CE_EXTENDED2 won't get saved until its semantic is known
  *  - Bits in 0x have been saved in ce_flags already
@@ -211,6 +280,8 @@ static inline unsigned create_ce_flags(unsigned stage)
 #define ce_skip_worktree(ce) ((ce)-ce_flags  CE_SKIP_WORKTREE)
 #define ce_mark_uptodate(ce) ((ce)-ce_flags |= CE_UPTODATE)
 
+#define conflict_stage(c) ((CONFLICT_STAGEMASK  (c)-flags)  
CONFLICT_STAGESHIFT)
+
 #define ce_permissions(mode) (((mode)  0100) ? 0755 : 0644)
 static inline unsigned int create_ce_mode(unsigned int mode)
 {
@@ -258,6 +329,8 @@ static inline unsigned int canon_mode(unsigned int mode)
 }
 
 #define cache_entry_size(len) (offsetof(struct cache_entry,name) + (len) + 1)
+#define directory_entry_size(len) (offsetof(struct directory_entry,pathname) + 
(len) + 1)
+#define conflict_entry_size(len) (offsetof(struct conflict_entry,name) + (len) 
+ 1)
 
 /*
  * Options by which the index 

[PATCH v2 16/19] read-cache: write index-v5 cache-tree data

2013-07-12 Thread Thomas Gummerer
Write the cache-tree data for the index version 5 file format. The
in-memory cache-tree data is converted to the ondisk format, by adding
it to the directory entries, that were compiled from the cache-entries
in the step before.

Signed-off-by: Thomas Gummerer t.gumme...@gmail.com
---
 read-cache-v5.c | 53 +
 1 file changed, 53 insertions(+)

diff --git a/read-cache-v5.c b/read-cache-v5.c
index 33667d7..cd819b4 100644
--- a/read-cache-v5.c
+++ b/read-cache-v5.c
@@ -941,6 +941,57 @@ static struct conflict_entry 
*create_conflict_entry_from_ce(struct cache_entry *
return create_new_conflict(ce-name, ce_namelen(ce), pathlen);
 }
 
+static void convert_one_to_ondisk_v5(struct hash_table *table, struct 
cache_tree *it,
+   const char *path, int pathlen, uint32_t crc)
+{
+   int i;
+   struct directory_entry *found, *search;
+
+   crc = crc32(crc, (Bytef*)path, pathlen);
+   found = lookup_hash(crc, table);
+   search = found;
+   while (search  strcmp(path, search-pathname + search-de_pathlen - 
strlen(path)) != 0)
+   search = search-next_hash;
+   if (!search)
+   return;
+   /*
+* The number of subtrees is already calculated by
+* compile_directory_data, therefore we only need to
+* add the entry_count
+*/
+   search-de_nentries = it-entry_count;
+   if (0 = it-entry_count)
+   hashcpy(search-sha1, it-sha1);
+   if (strcmp(path, ) != 0)
+   crc = crc32(crc, (Bytef*)/, 1);
+
+#if DEBUG
+   if (0 = it-entry_count)
+   fprintf(stderr, cache-tree %.*s (%d ent, %d subtree) %s\n,
+   pathlen, path, it-entry_count, it-subtree_nr,
+   sha1_to_hex(it-sha1));
+   else
+   fprintf(stderr, cache-tree %.*s (%d subtree) invalid\n,
+   pathlen, path, it-subtree_nr);
+#endif
+
+   for (i = 0; i  it-subtree_nr; i++) {
+   struct cache_tree_sub *down = it-down[i];
+   if (i) {
+   struct cache_tree_sub *prev = it-down[i-1];
+   if (subtree_name_cmp(down-name, down-namelen,
+prev-name, prev-namelen) = 0)
+   die(fatal - unsorted cache subtree);
+   }
+   convert_one_to_ondisk_v5(table, down-cache_tree, down-name, 
down-namelen, crc);
+   }
+}
+
+static void cache_tree_to_ondisk_v5(struct hash_table *table, struct 
cache_tree *root)
+{
+   convert_one_to_ondisk_v5(table, root, , 0, 0);
+}
+
 static struct directory_entry *compile_directory_data(struct index_state 
*istate,
int nfile,
unsigned int *ndir,
@@ -1046,6 +1097,8 @@ static struct directory_entry 
*compile_directory_data(struct index_state *istate
previous_entry-next = no_subtrees;
}
}
+   if (istate-cache_tree)
+   cache_tree_to_ondisk_v5(table, istate-cache_tree);
return de;
 }
 
-- 
1.8.3.453.g1dfc63d

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2 18/19] update-index.c: rewrite index when index-version is given

2013-07-12 Thread Thomas Gummerer
Make update-index always rewrite the index when a index-version
is given, even if the index already has the right version.
This option is used for performance testing the writer and
reader.

Signed-off-by: Thomas Gummerer t.gumme...@gmail.com
---
 builtin/update-index.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/builtin/update-index.c b/builtin/update-index.c
index 4c6e3a6..7e723c0 100644
--- a/builtin/update-index.c
+++ b/builtin/update-index.c
@@ -6,6 +6,7 @@
 #include cache.h
 #include quote.h
 #include cache-tree.h
+#include read-cache.h
 #include tree-walk.h
 #include builtin.h
 #include refs.h
@@ -863,8 +864,7 @@ int cmd_update_index(int argc, const char **argv, const 
char *prefix)
preferred_index_format,
INDEX_FORMAT_LB, INDEX_FORMAT_UB);
 
-   if (the_index.version != preferred_index_format)
-   active_cache_changed = 1;
+   active_cache_changed = 1;
the_index.version = preferred_index_format;
}
 
-- 
1.8.3.453.g1dfc63d

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2 17/19] read-cache: write resolve-undo data for index-v5

2013-07-12 Thread Thomas Gummerer
Make git read the resolve-undo data from the index.

Since the resolve-undo data is joined with the conflicts in
the ondisk format of the index file version 5, conflicts and
resolved data is read at the same time, and the resolve-undo
data is then converted to the in-memory format.

Helped-by: Thomas Rast tr...@student.ethz.ch
Signed-off-by: Thomas Gummerer t.gumme...@gmail.com
---
 read-cache-v5.c | 94 +
 1 file changed, 94 insertions(+)

diff --git a/read-cache-v5.c b/read-cache-v5.c
index cd819b4..093ee1a 100644
--- a/read-cache-v5.c
+++ b/read-cache-v5.c
@@ -992,6 +992,99 @@ static void cache_tree_to_ondisk_v5(struct hash_table 
*table, struct cache_tree
convert_one_to_ondisk_v5(table, root, , 0, 0);
 }
 
+static void resolve_undo_to_ondisk_v5(struct hash_table *table,
+ struct string_list *resolve_undo,
+ unsigned int *ndir, int *total_dir_len,
+ struct directory_entry *de)
+{
+   struct string_list_item *item;
+   struct directory_entry *search;
+
+   if (!resolve_undo)
+   return;
+   for_each_string_list_item(item, resolve_undo) {
+   struct conflict_entry *conflict_entry;
+   struct resolve_undo_info *ui = item-util;
+   char *super;
+   int i, dir_len, len;
+   uint32_t crc;
+   struct directory_entry *found, *current, *new_tree;
+
+   if (!ui)
+   continue;
+
+   super = super_directory(item-string);
+   if (!super)
+   dir_len = 0;
+   else
+   dir_len = strlen(super);
+   crc = crc32(0, (Bytef*)super, dir_len);
+   found = lookup_hash(crc, table);
+   current = NULL;
+   new_tree = NULL;
+
+   while (!found) {
+   struct directory_entry *new;
+
+   new = init_directory_entry(super, dir_len);
+   if (!current)
+   current = new;
+   insert_directory_entry(new, table, total_dir_len, ndir, 
crc);
+   if (new_tree != NULL)
+   new-de_nsubtrees = 1;
+   new-next = new_tree;
+   new_tree = new;
+   super = super_directory(super);
+   if (!super)
+   dir_len = 0;
+   else
+   dir_len = strlen(super);
+   crc = crc32(0, (Bytef*)super, dir_len);
+   found = lookup_hash(crc, table);
+   }
+   search = found;
+   while (search-next_hash  strcmp(super, search-pathname) != 
0)
+   search = search-next_hash;
+   if (search  !current)
+   current = search;
+   if (!search  !current)
+   current = new_tree;
+   if (!super  new_tree) {
+   new_tree-next = de-next;
+   de-next = new_tree;
+   de-de_nsubtrees++;
+   } else if (new_tree) {
+   struct directory_entry *temp;
+
+   search = de-next;
+   while (strcmp(super, search-pathname))
+   search = search-next;
+   temp = new_tree;
+   while (temp-next)
+   temp = temp-next;
+   search-de_nsubtrees++;
+   temp-next = search-next;
+   search-next = new_tree;
+   }
+
+   len = strlen(item-string);
+   conflict_entry = create_new_conflict(item-string, len, 
current-de_pathlen);
+   add_conflict_to_directory_entry(current, conflict_entry);
+   for (i = 0; i  3; i++) {
+   if (ui-mode[i]) {
+   struct conflict_part *cp;
+
+   cp = xmalloc(sizeof(struct conflict_part));
+   cp-flags = (i + 1)  CONFLICT_STAGESHIFT;
+   cp-entry_mode = ui-mode[i];
+   cp-next = NULL;
+   hashcpy(cp-sha1, ui-sha1[i]);
+   add_part_to_conflict_entry(current, 
conflict_entry, cp);
+   }
+   }
+   }
+}
+
 static struct directory_entry *compile_directory_data(struct index_state 
*istate,
int nfile,
unsigned int *ndir,
@@ -1099,6 +1192,7 @@ static struct directory_entry 
*compile_directory_data(struct 

[PATCH v2 15/19] read-cache: write index-v5

2013-07-12 Thread Thomas Gummerer
Write the index version 5 file format to disk. This version doesn't
write the cache-tree data and resolve-undo data to the file.

The main work is done when filtering out the directories from the
current in-memory format, where in the same turn also the conflicts
and the file data is calculated.

Helped-by: Nguyen Thai Ngoc Duy pclo...@gmail.com
Helped-by: Thomas Rast tr...@student.ethz.ch
Signed-off-by: Thomas Gummerer t.gumme...@gmail.com
---
 cache.h |   8 +
 read-cache-v5.c | 594 +++-
 read-cache.c|  11 +-
 read-cache.h|   1 +
 4 files changed, 611 insertions(+), 3 deletions(-)

diff --git a/cache.h b/cache.h
index 1e5cc77..f3685f8 100644
--- a/cache.h
+++ b/cache.h
@@ -565,6 +565,7 @@ extern int unmerged_index(const struct index_state *);
 extern int verify_path(const char *path);
 extern struct cache_entry *index_name_exists(struct index_state *istate, const 
char *name, int namelen, int igncase);
 extern int index_name_pos(const struct index_state *, const char *name, int 
namelen);
+extern struct directory_entry *init_directory_entry(char *pathname, int len);
 #define ADD_CACHE_OK_TO_ADD 1  /* Ok to add */
 #define ADD_CACHE_OK_TO_REPLACE 2  /* Ok to replace file/directory */
 #define ADD_CACHE_SKIP_DFCHECK 4   /* Ok to skip DF conflict checks */
@@ -1363,6 +1364,13 @@ static inline ssize_t write_str_in_full(int fd, const 
char *str)
return write_in_full(fd, str, strlen(str));
 }
 
+/* index-v5 helper functions */
+extern char *super_directory(const char *filename);
+extern void insert_directory_entry(struct directory_entry *, struct hash_table 
*, int *, unsigned int *, uint32_t);
+extern void add_conflict_to_directory_entry(struct directory_entry *, struct 
conflict_entry *);
+extern void add_part_to_conflict_entry(struct directory_entry *, struct 
conflict_entry *, struct conflict_part *);
+extern struct conflict_entry *create_new_conflict(char *, int, int);
+
 /* pager.c */
 extern void setup_pager(void);
 extern const char *pager_program;
diff --git a/read-cache-v5.c b/read-cache-v5.c
index 0b9c320..33667d7 100644
--- a/read-cache-v5.c
+++ b/read-cache-v5.c
@@ -769,9 +769,601 @@ static int read_index_v5(struct index_state *istate, void 
*mmap,
return 0;
 }
 
+#define WRITE_BUFFER_SIZE 8192
+static unsigned char write_buffer[WRITE_BUFFER_SIZE];
+static unsigned long write_buffer_len;
+
+static int ce_write_flush(int fd)
+{
+   unsigned int buffered = write_buffer_len;
+   if (buffered) {
+   if (write_in_full(fd, write_buffer, buffered) != buffered)
+   return -1;
+   write_buffer_len = 0;
+   }
+   return 0;
+}
+
+static int ce_write(uint32_t *crc, int fd, void *data, unsigned int len)
+{
+   if (crc)
+   *crc = crc32(*crc, (Bytef*)data, len);
+   while (len) {
+   unsigned int buffered = write_buffer_len;
+   unsigned int partial = WRITE_BUFFER_SIZE - buffered;
+   if (partial  len)
+   partial = len;
+   memcpy(write_buffer + buffered, data, partial);
+   buffered += partial;
+   if (buffered == WRITE_BUFFER_SIZE) {
+   write_buffer_len = buffered;
+   if (ce_write_flush(fd))
+   return -1;
+   buffered = 0;
+   }
+   write_buffer_len = buffered;
+   len -= partial;
+   data = (char *) data + partial;
+   }
+   return 0;
+}
+
+static int ce_flush(int fd)
+{
+   unsigned int left = write_buffer_len;
+
+   if (left)
+   write_buffer_len = 0;
+
+   if (write_in_full(fd, write_buffer, left) != left)
+   return -1;
+
+   return 0;
+}
+
+static void ce_smudge_racily_clean_entry(struct cache_entry *ce)
+{
+   /*
+* This method shall only be called if the timestamp of ce
+* is racy (check with is_racy_timestamp). If the timestamp
+* is racy, the writer will set the CE_SMUDGED flag.
+*
+* The reader (match_stat_basic) will then take care
+* of checking if the entry is really changed or not, by
+* taking into account the size and the stat_crc and if
+* that hasn't changed checking the sha1.
+*/
+   ce-ce_flags |= CE_SMUDGED;
+}
+
+char *super_directory(const char *filename)
+{
+   char *slash;
+
+   slash = strrchr(filename, '/');
+   if (slash)
+   return xmemdupz(filename, slash-filename);
+   return NULL;
+}
+
+struct directory_entry *init_directory_entry(char *pathname, int len)
+{
+   struct directory_entry *de = xmalloc(directory_entry_size(len));
+
+   memcpy(de-pathname, pathname, len);
+   de-pathname[len] = '\0';
+   de-de_flags  = 0;
+   de-de_foffset= 0;
+   de-de_cr = 0;
+   de-de_ncr

[PATCH v2 19/19] p0003-index.sh: add perf test for the index formats

2013-07-12 Thread Thomas Gummerer
From: Thomas Rast tr...@inf.ethz.ch

Add a performance test for index version [23]/4/5 by using
git update-index --index-version=x, thus testing both the reader
and the writer speed of all index formats.

Signed-off-by: Thomas Rast tr...@inf.ethz.ch
Signed-off-by: Thomas Gummerer t.gumme...@gmail.com
---
 t/perf/p0003-index.sh | 59 +++
 1 file changed, 59 insertions(+)
 create mode 100755 t/perf/p0003-index.sh

diff --git a/t/perf/p0003-index.sh b/t/perf/p0003-index.sh
new file mode 100755
index 000..3e02868
--- /dev/null
+++ b/t/perf/p0003-index.sh
@@ -0,0 +1,59 @@
+#!/bin/sh
+
+test_description=Tests index versions [23]/4/5
+
+. ./perf-lib.sh
+
+test_perf_large_repo
+
+test_expect_success convert to v3 
+   git update-index --index-version=2
+
+
+test_perf v[23]: update-index 
+   git update-index --index-version=2 /dev/null
+
+
+subdir=$(git ls-files | sed 's#/[^/]*$##' | grep -v '^$' | uniq | tail -n 30 | 
head -1)
+
+test_perf v[23]: grep nonexistent -- subdir 
+   test_must_fail git grep nonexistent -- $subdir /dev/null
+
+
+test_perf v[23]: ls-files -- subdir 
+   git ls-files $subdir /dev/null
+
+
+test_expect_success convert to v4 
+   git update-index --index-version=4
+
+
+test_perf v4: update-index 
+   git update-index --index-version=4 /dev/null
+
+
+test_perf v4: grep nonexistent -- subdir 
+   test_must_fail git grep nonexistent -- $subdir /dev/null
+
+
+test_perf v4: ls-files -- subdir 
+   git ls-files $subdir /dev/null
+
+
+test_expect_success convert to v5 
+   git update-index --index-version=5
+
+
+test_perf v5: update-index 
+   git update-index --index-version=5 /dev/null
+
+
+test_perf v5: grep nonexistent -- subdir 
+   test_must_fail git grep nonexistent -- $subdir /dev/null
+
+
+test_perf v5: ls-files -- subdir 
+   git ls-files $subdir /dev/null
+
+
+test_done
-- 
1.8.3.453.g1dfc63d

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Bug in .mailmap handling?

2013-07-12 Thread Junio C Hamano
Stefan Beller stefanbel...@googlemail.com writes:

 Hello,

 you may have noticed I am currently trying to bring the 
 mailmap file of git itself up to date. I noticed
 some behavior, which I did not expect. Have a look yourself:

 ---
   # prepare test environment:
   mkdir testmailmap
   cd testmailmap/
   git init

   # do a commit:
   echo asdf  test1 
   git add test1
   git commit -a --author=A a...@example.org -m add test1

   # commit with same name, but different email 
   # (different capitalization does the trick already, 
   # but here I am going to use a different mail)
   echo asdf  test2
   git add test2
   git commit -a --author=A changed_em...@example.org -m add test2

   # how do we know it's the same person?
   git shortlog
   A (2):
 add test1
 add test2

You don't, and it is a long known behaviour.

   # reports as expected:
   git shortlog -sne
 1  A a...@example.org
 1  A changed_em...@example.org

Yes.

   # Adding the line to the mailmap should make life easy, so we know
   # it's the same person
   echo A a...@example.org changed_em...@example.org  .mailmap

   # Come on, I just wanted to have it reported as one person!
   git shortlog -sne
1  A a...@example.org
1  A a...@example.org

Err, where does the lowercase a@ come from in the above?  Are we
missing some steps before we get here?

   # So let's try another line in the mailmap file, (small 'a')
   echo A a...@example.org changed_em...@example.org  .mailmap

This is , not , I presume?  Otherwise changed_email is mapped
to two destination, no?

   # We're not there yet?
   git shortlog -sne
1  A a...@example.org
1  A a...@example.org

Expected, as long as some hidden set-up you did not describe that
caused me to say Err, where does the lowercase a@ come from is
there, i.e. one of the two commits is done by a...@example.org.

   # Now let's write it rather explicit: 
   # (essentially just write 2 lines into the mailmap file)
   cat  EOF  .mailmap
   A a...@example.org changed_em...@example.org
   A a...@example.org a...@example.org
   EOF

   # works as expected now
   git shortlog -sne
2  A a...@example.org

Makes sense.

   # works as expected now as well
   git shortlog  
   A (2):
 add test1
 add test2
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 7/7] push: document --lockref

2013-07-12 Thread Junio C Hamano
Johannes Sixt j...@kdbg.org writes:

 Am 12.07.2013 00:14, schrieb Junio C Hamano:
 Johannes Sixt j...@kdbg.org writes:
 
 Again: Why not just define +refspec as the way to achieve this check?
 
 What justification do we have to break existing people's
 configuration that says something like:
 
  [remote ko]
  url = kernel.org:/pub/scm/git/git.git
 push = master
 push = next
 push = +pu
 push = maint
 
 by adding a _new_ requirement they may not be able to satisify?
 Notice that the above is a typical push only publishing point,
 where you do not need any remote tracking branches.

 Why would it break? When you do not specify --lockref, there is no
 change whatsoever.

I thought your suggestion Why not just define +pu as the way to
achieve _THIS_ check? was to make +pu to mean

git push ko --lockref pu

which would mean check refs/remotes/ko/pu and make sure the remote
side still is at that commit.

If that is not what you meant, please clarify what _THIS_ is.
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Bug in .mailmap handling?

2013-07-12 Thread Stefan Beller
The commands are exactly as given in the first mail.
(I run it multiple times, and this exact behavior comes up)

I think there is some problem with the capitalisation
of the first (if not all) characters. (Hence the small 'a')

So either check with these example commands yourself, or take my latest
patch for the mailmap to reproduce on real names, please.

On 07/12/2013 07:35 PM, Junio C Hamano wrote:
 Stefan Beller stefanbel...@googlemail.com writes:
 
 Hello,

 you may have noticed I am currently trying to bring the 
 mailmap file of git itself up to date. I noticed
 some behavior, which I did not expect. Have a look yourself:

 ---
  # prepare test environment:
  mkdir testmailmap
  cd testmailmap/
  git init

  # do a commit:
  echo asdf  test1 
  git add test1
  git commit -a --author=A a...@example.org -m add test1

  # commit with same name, but different email 
  # (different capitalization does the trick already, 
  # but here I am going to use a different mail)
  echo asdf  test2
  git add test2
  git commit -a --author=A changed_em...@example.org -m add test2

  # how do we know it's the same person?
  git shortlog
  A (2):
add test1
add test2
 
 You don't, and it is a long known behaviour.
 
  # reports as expected:
  git shortlog -sne
1  A a...@example.org
1  A changed_em...@example.org
 
 Yes.
 
  # Adding the line to the mailmap should make life easy, so we know
  # it's the same person
  echo A a...@example.org changed_em...@example.org  .mailmap

  # Come on, I just wanted to have it reported as one person!
  git shortlog -sne
   1  A a...@example.org
   1  A a...@example.org
 
 Err, where does the lowercase a@ come from in the above?  Are we
 missing some steps before we get here?
 
  # So let's try another line in the mailmap file, (small 'a')
  echo A a...@example.org changed_em...@example.org  .mailmap
 
 This is , not , I presume?  Otherwise changed_email is mapped
 to two destination, no?
 
  # We're not there yet?
  git shortlog -sne
   1  A a...@example.org
   1  A a...@example.org
 
 Expected, as long as some hidden set-up you did not describe that
 caused me to say Err, where does the lowercase a@ come from is
 there, i.e. one of the two commits is done by a...@example.org.
 
  # Now let's write it rather explicit: 
  # (essentially just write 2 lines into the mailmap file)
  cat  EOF  .mailmap
  A a...@example.org changed_em...@example.org
  A a...@example.org a...@example.org
  EOF
   
  # works as expected now
  git shortlog -sne
   2  A a...@example.org
 
 Makes sense.
 
  # works as expected now as well
  git shortlog  
  A (2):
add test1
add test2




signature.asc
Description: OpenPGP digital signature


Re: [PATCH v4] config: add support for http.url.* settings

2013-07-12 Thread Junio C Hamano
Kyle J. McKay mack...@gmail.com writes:

 The url value is considered a match to a url if the url value
 is either an exact match or a prefix of the url which ends on a
 path component boundary ('/').  So https://example.com/test; will
 match https://example.com/test; and https://example.com/test/too;
 but not https://example.com/testextra;.

 Longer matches take precedence over shorter matches with
 environment variable settings taking precedence over all.

 With this configuration:

 [http]
 useragent = other-agent
 noEPSV = true
 [http https://example.com;]
 useragent = example-agent
 sslVerify = false
 [http https://example.com/path;]
 useragent = path-agent

 The https://other.example.com/; url will have useragent
 other-agent and sslVerify will be on.

 The https://example.com/; url will have useragent
 example-agent and sslVerify will be off.

 The https://example.com/path/sub; url will have useragent
 path-agent and sslVerify will be off.

 All three of the examples will have noEPSV enabled.

 Signed-off-by: Kyle J. McKay mack...@gmail.com
 Reviewed-by: Junio C Hamano gits...@pobox.com

I haven't reviewed this version (yet).

 ---

 The credentials configuration values already support url-specific
 configuration items in the form credential.url.*.  This patch
 adds similar support for http configuration values.

Let's move the above three lines into the proposed log message and
make it its first paragraph.  A log message should say what it is
about (i.e. what does add support really mean?  what kind of
functionality the new support brings to us?) upfront and then
explain what it does in detail (i.e. the explanation of matching
semantics you have as the first paragraph).

 diff --git a/Documentation/config.txt b/Documentation/config.txt
 index b4d4887..3731a3a 100644
 --- a/Documentation/config.txt
 +++ b/Documentation/config.txt
 @@ -1531,6 +1531,17 @@ http.useragent::
   of common USER_AGENT strings (but not including those like git/1.7.1).
   Can be overridden by the 'GIT_HTTP_USER_AGENT' environment variable.
  
 +http.url.*::
 + Any of the http.* options above can be applied selectively to some urls.
 + For example http.https://example.com.useragent; would set the user
 + agent only for https connections to example.com.  The url value
 + matches a url if it is an exact match or a prefix of the url matching
 + at a '/' boundary.

Given Peff's review, I am no longer sure if the strictly textual
match is the semantics we want.  At least, it is easy to explain
and understand, but it might be too limiting to be useful.

Let's assume it is what we want, for the rest of the review.

 diff --git a/http.c b/http.c
 index 2d086ae..feca70a 100644
 --- a/http.c
 +++ b/http.c
 @@ -30,6 +30,34 @@ static CURL *curl_default;
  
  char curl_errorstr[CURL_ERROR_SIZE];
  
 +enum http_option_type {
 + opt_post_buffer = 0,

Do we need to have = 0 here?

Is this order meant to match some externally defined order
(e.g. alphabetical, or the value of corresponding constants defined
in the cURL library), other than opt_max is not a real option but
just has to be there at the end to count all of them?

I am wondering if it would make it easier to read to move all the #ifdef
at the end immediately before opt-max, or something.

 + opt_min_sessions,
 +#ifdef USE_CURL_MULTI
 + opt_max_requests,
 +#endif
 +...
 + opt_user_agent,
 + opt_passwd_req,
 + opt_max
 +};

 +static int check_matched_len(enum http_option_type opt, size_t matchlen)
 +{
 + /*
 +  * Check the last matched length of option opt against matchlen and
 +  * return true if the last matched length is larger (meaning the config
 +  * setting should be ignored).  Upon seeing the _same_ key (i.e. new key
 +  * has the same match length which is therefore not larger) the new
 +  * setting will override the previous setting.  Otherwise return false
 +  * and record matchlen as the current last matched length of option opt.
 +  */
 + if (http_option_max_matched_len[opt]  matchlen)
 + return 1;
 + http_option_max_matched_len[opt] = matchlen;
 + return 0;
 +}

A check_foo that returns either 0 or 1 usually mean 1 is good and
0 is not.  A check_foo that returns either negative or 0 usually
mean negative is an error and 0 is good.

In general, check_foo is a bad name for a boolean function.  This
checks seen_longer_match(), and it is up to the caller if it
considers good or bad to have a matching element that is longer or
shorter what it has already seen.  See below.

  static int http_options(const char *var, const char *value, void *cb)
  {
 - if (!strcmp(http.sslverify, var)) {
 + const char *url = (const char *)cb;

Cast?

 + if (!strcmp(sslverify, key)) {
 + if (check_matched_len(opt_ssl_verify, matchlen))
 + return 0;
   curl_ssl_verify = git_config_bool(var, 

Re: [PATCH v4] config: add support for http.url.* settings

2013-07-12 Thread Junio C Hamano
Kyle J. McKay mack...@gmail.com writes:

 + if (!strcmp(sslcertpasswordprotected, key)) {
 + if (check_matched_len(opt_passwd_req, matchlen))
 + return 0;
   if (git_config_bool(var, value))
   ssl_cert_password_required = 1;
   return 0;
   }

This is not a new problem, but I think the existing code is wrong.
There is no way to countermand an earlier

[http]
sslcertpasswordprotected

in a more generic configuration file with


[http]
sslcertpasswordprotected = no

in a repository specific configuration file.

Perhaps we should fix it as a preparatory patch (1/2) before the
main feature addition patch.

 - if (!strcmp(http.ssltry, var)) {
 + if (!strcmp(ssltry, key)) {
 + if (check_matched_len(opt_ssl_try, matchlen))
 + return 0;
   curl_ssl_try = git_config_bool(var, value);
   return 0;
   }
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] http.c: fix parsing of http.sslCertPasswordProtected variable

2013-07-12 Thread Junio C Hamano
The existing code triggers only when the configuration variable is
set to true.  Once the variable is set to true in a more generic
configuration file (e.g. ~/.gitconfig), it cannot be overriden to
false in the repository specific one (e.g. .git/config).

Signed-off-by: Junio C Hamano gits...@pobox.com
---
 http.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/http.c b/http.c
index 92aba59..37986f8 100644
--- a/http.c
+++ b/http.c
@@ -160,8 +160,7 @@ static int http_options(const char *var, const char *value, 
void *cb)
if (!strcmp(http.sslcainfo, var))
return git_config_string(ssl_cainfo, var, value);
if (!strcmp(http.sslcertpasswordprotected, var)) {
-   if (git_config_bool(var, value))
-   ssl_cert_password_required = 1;
+   ssl_cert_password_required = git_config_bool(var, value);
return 0;
}
if (!strcmp(http.ssltry, var)) {
-- 
1.8.3.2-942-gc84dfcb

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] .mailmap: Map email addresses to names

2013-07-12 Thread Jonathan Nieder
Stefan Beller wrote:

 --- a/.mailmap
 +++ b/.mailmap
 @@ -5,99 +5,146 @@
[...]
  Chris Shoemaker c.shoema...@cox.net
 -Dan Johnson computerdr...@gmail.com
  Dana L. How dana...@gmail.com
  Dana L. How h...@deathvalley.cswitch.com
  Daniel Barkalow barka...@iabervon.org
 +Dan Johnson computerdr...@gmail.com

Small nit: the sorting looks broken here and in similar places below
(the usual ordering is Dan  Dana  Daniel).

Thanks,
Jonathan
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] .mailmap: Map email addresses to names

2013-07-12 Thread Stefan Beller

Which tool would you recommend to sort stuff?
Or rather the exact parameters for /usr/bin/sort ?

Thanks,
Stefan

On 07/12/2013 08:57 PM, Jonathan Nieder wrote:
 Stefan Beller wrote:
 
 --- a/.mailmap
 +++ b/.mailmap
 @@ -5,99 +5,146 @@
 [...]
  Chris Shoemaker c.shoema...@cox.net
 -Dan Johnson computerdr...@gmail.com
  Dana L. How dana...@gmail.com
  Dana L. How h...@deathvalley.cswitch.com
  Daniel Barkalow barka...@iabervon.org
 +Dan Johnson computerdr...@gmail.com
 
 Small nit: the sorting looks broken here and in similar places below
 (the usual ordering is Dan  Dana  Daniel).
 
 Thanks,
 Jonathan
 

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] http.c: fix parsing of http.sslCertPasswordProtected variable

2013-07-12 Thread Jonathan Nieder
Junio C Hamano wrote:

 The existing code triggers only when the configuration variable is
 set to true.  Once the variable is set to true in a more generic
 configuration file (e.g. ~/.gitconfig), it cannot be overriden to
 false in the repository specific one (e.g. .git/config).
[...]
 --- a/http.c
 +++ b/http.c
 @@ -160,8 +160,7 @@ static int http_options(const char *var, const char 
 *value, void *cb)
   if (!strcmp(http.sslcainfo, var))
   return git_config_string(ssl_cainfo, var, value);
   if (!strcmp(http.sslcertpasswordprotected, var)) {
 - if (git_config_bool(var, value))
 - ssl_cert_password_required = 1;
 + ssl_cert_password_required = git_config_bool(var, value);

Thanks for catching it.  The documentation doesn't say anything about
this can only enable and cannot disable behavior and the usual
pattern is to allow later settings to override earlier ones, so this
change looks good.

Reviewed-by: Jonathan Nieder jrnie...@gmail.com

FWIW the GIT_SSL_CERT_PASSWORD_PROTECTED envvar has a similar can
only enable behavior, but since it's documented, that's not as big
of a problem.  Do you remember why it was written that way?

When that setting was first added[1], there was some mention of
autodetection if libcurl could learn to do that.  Did it happen?
(Please forgive my ignorance.)

Thanks,
Jonathan

[1] http://thread.gmane.org/gmane.comp.version-control.git/122793/focus=122816
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] .mailmap: Map email addresses to names

2013-07-12 Thread Stefan Beller
From the man page

*** WARNING *** The locale specified by the environment affects sort
order.  Set LC_ALL=C to get the traditional sort  order  that  uses
   native byte values.

And indeed I can avoid being nitpicked again for wrong sorting. ;)

On 07/12/2013 09:02 PM, Stefan Beller wrote:
 
 Which tool would you recommend to sort stuff?
 Or rather the exact parameters for /usr/bin/sort ?
 
 Thanks,
 Stefan
 
 On 07/12/2013 08:57 PM, Jonathan Nieder wrote:
 Stefan Beller wrote:

 --- a/.mailmap
 +++ b/.mailmap
 @@ -5,99 +5,146 @@
 [...]
  Chris Shoemaker c.shoema...@cox.net
 -Dan Johnson computerdr...@gmail.com
  Dana L. How dana...@gmail.com
  Dana L. How h...@deathvalley.cswitch.com
  Daniel Barkalow barka...@iabervon.org
 +Dan Johnson computerdr...@gmail.com

 Small nit: the sorting looks broken here and in similar places below
 (the usual ordering is Dan  Dana  Daniel).

 Thanks,
 Jonathan

 

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] .mailmap: Map email addresses to names

2013-07-12 Thread Stefan Beller
Ok I am sending all confirmed changes now again
in one big patch, as the sorting was wrong.

Stefan Beller (1):
  .mailmap: Map email addresses to names

 .mailmap | 135 +++
 1 file changed, 110 insertions(+), 25 deletions(-)

-- 
1.8.3.2.790.g9192b0b

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] .mailmap: Map email addresses to names

2013-07-12 Thread Stefan Beller
People change email addresses quite often and sometimes
forget to add their entry to the mailmap file.
I have contacted lots of people, whose name occurs
multiple times in the short log having different
email addresses. The entries in the mailmap of
this patch are either confirmed by them or are trivial.
Trivial means different capitalisation of the domain
(@MIT.EDU and @mit.edu) or the domain was localhost,
(none) or @local.

Additionally to adding (name, email) mappings to the
.mailmap file, it has also been sorted alphabetically.
(which explains the removals, which are added
3 lines later on again). The sorting was done using
export LC_ALL=C; /usr/bin/sort without arguments.

While the most changes happen at the email addresses,
we also have a name change in here. Karl Hasselström
is now known as Karl Wiberg due to marriage. Congratulations!

To find out whom to contact I used the following small
script:
-
#!/bin/bash
git shortlog -sne |awk '{ NF--; $1=; print }' |sort |uniq -d  
mailmapdoubles
while read line ; do
# remove leading whitespace
trimmed=$(echo $line | sed -e 's/^ *//g' -e 's/ *$//g')
echo git shortlog -sne | grep \$trimmed\
done  mailmapdoubles  mailmapdoubles2
sh mailmapdoubles2
rm mailmapdoubles
rm mailmapdoubles2
-
Also interesting for similar tasks are these snippets:

# Finding out duplicates by comparing email addresses:
git shortlog -sne |awk '{ print $NF }' |sort |uniq -d

# Finding out duplicates by comparing names:
git shortlog -sne |awk '{ NF--; $1=; print }' |sort |uniq -d
-

Signed-off-by: Stefan Beller stefanbel...@googlemail.com
---
 .mailmap | 135 +++
 1 file changed, 110 insertions(+), 25 deletions(-)

diff --git a/.mailmap b/.mailmap
index 345cce6..22d3d70 100644
--- a/.mailmap
+++ b/.mailmap
@@ -5,99 +5,184 @@
 # same person appearing not to be so.
 #
 
+n...@fluxnic.net n...@cam.org
+Alejandro R. Sedeño ased...@mit.edu ased...@mit.edu
 Alex Bennée kernel-hac...@bennee.com
+Alex Riesen raa.l...@gmail.com fo...@t-online.de
+Alex Riesen raa.l...@gmail.com raa@limbo.localdomain
+Alex Riesen raa.l...@gmail.com r...@steel.home
+Alex Vandiver a...@chmrr.net ale...@mit.edu
 Alexander Gavrilov angavri...@gmail.com
+Alexey Shumkin alex.crez...@gmail.com zap...@mail.ru
+Anders Kaseorg ande...@mit.edu ande...@ksplice.com
+Anders Kaseorg ande...@mit.edu ande...@mit.edu
 Aneesh Kumar K.V aneesh.ku...@gmail.com
+Bernt Hansen be...@norang.ca be...@alumni.uwaterloo.ca
+Brandon Casey draf...@gmail.com ca...@nrlssc.navy.mil
 Brian M. Carlson sand...@crustytoothpaste.ath.cx
+Bryan Larsen br...@larsen.st bryan.lar...@gmail.com
+Bryan Larsen br...@larsen.st bryanlar...@yahoo.com
 Cheng Renquan crq...@gmail.com
 Chris Shoemaker c.shoema...@cox.net
 Dan Johnson computerdr...@gmail.com
 Dana L. How dana...@gmail.com
 Dana L. How h...@deathvalley.cswitch.com
 Daniel Barkalow barka...@iabervon.org
+David Brown g...@davidb.org dav...@quicinc.com
 David D. Kilzer ddkil...@kilzer.net
 David Kågedal dav...@lysator.liu.se
+David Reiss dre...@facebook.com dreiss@dreiss-vmware.(none)
 David S. Miller da...@davemloft.net
 Deskin Miller desk...@umich.edu
 Dirk Süsserott newslet...@dirk.my1.cc
+Eric Blake ebl...@redhat.com e...@byu.net
+Eric Hanchrow eric.hanch...@gmail.com off...@blarg.net
 Eric S. Raymond e...@thyrsus.com
 Erik Faye-Lund kusmab...@gmail.com kusmab...@googlemail.com
+Eyvind Bernhardsen eyvind.bernhard...@gmail.com eyvind-...@orakel.ntnu.no
+Florian Achleitner florian.achleitner.2.6...@gmail.com 
florian.achleitner2.6...@gmail.com
+Franck Bui-Huu vagabon@gmail.com fbui...@gmail.com
+Frank Lichtenheld fr...@lichtenheld.de dj...@debian.org
+Frank Lichtenheld fr...@lichtenheld.de flichtenh...@astaro.com
 Fredrik Kuivinen freku...@student.liu.se
 Frédéric Heitzmann frederic.heitzm...@gmail.com
 H. Merijn Brand h.m.br...@xs4all.nl H.Merijn Brand h.m.br...@xs4all.nl
-H. Peter Anvin h...@bonde.sc.orionmulti.com
-H. Peter Anvin h...@tazenda.sc.orionmulti.com
-H. Peter Anvin h...@trantor.hos.anvin.org
+H. Peter Anvin h...@zytor.com h...@bonde.sc.orionmulti.com
+H. Peter Anvin h...@zytor.com h...@smyrno.hos.anvin.org
+H. Peter Anvin h...@zytor.com h...@tazenda.sc.orionmulti.com
+H. Peter Anvin h...@zytor.com h...@trantor.hos.anvin.org
+Han-Wen Nienhuys han...@google.com Han-Wen Nienhuys han...@xs4all.nl
 Horst H. von Brand vonbr...@inf.utfsm.cl
-İsmail Dönmez ism...@pardus.org.tr
+J. Bruce Fields bfie...@citi.umich.edu bfie...@fieldses.org
+J. Bruce Fields bfie...@citi.umich.edu bfie...@pig.linuxdev.us.dell.com
+J. Bruce Fields bfie...@citi.umich.edu bfie...@puzzle.fieldses.org
 Jakub Narębski jna...@gmail.com
-Jay Soffian jaysoffian+...@gmail.com
+Jason Riedy e...@eecs.berkeley.edu e...@eecs.berkeley.edu
+Jason Riedy e...@eecs.berkeley.edu e...@cs.berkeley.edu
+Jay Soffian jaysoff...@gmail.com jaysoffian+...@gmail.com
 Jeff King p...@peff.net p...@github.com
+Jeff 

Re: [PATCH] .mailmap: Map email addresses to names

2013-07-12 Thread Junio C Hamano
Stefan Beller stefanbel...@googlemail.com writes:

 Ok I am sending all confirmed changes now again
 in one big patch, as the sorting was wrong.

 Stefan Beller (1):
   .mailmap: Map email addresses to names

So this corresponds to both of your patches, or just the first
batch?
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] http.c: fix parsing of http.sslCertPasswordProtected variable

2013-07-12 Thread Junio C Hamano
Jonathan Nieder jrnie...@gmail.com writes:

 FWIW the GIT_SSL_CERT_PASSWORD_PROTECTED envvar has a similar can
 only enable behavior, but since it's documented, that's not as big
 of a problem.  Do you remember why it was written that way?

Not me ;-).

If I have to guess, it is probably that these two are thought to be
equally trivial way to defeat existing environment settings:

(unset GIT_SSL_CERT_PASSWORD_PROTECTED ; git cmd)
GIT_SSL_CERT_PASSWORD_PROTECTED=no git cmd

 When that setting was first added[1], there was some mention of
 autodetection if libcurl could learn to do that.  Did it happen?

I do not think so, but let's see if our resident cURL guru has
something to say about it.
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 7/7] push: document --lockref

2013-07-12 Thread Johannes Sixt
Am 12.07.2013 19:40, schrieb Junio C Hamano:
 Johannes Sixt j...@kdbg.org writes:
 
 Am 12.07.2013 00:14, schrieb Junio C Hamano:
 Johannes Sixt j...@kdbg.org writes:

 Again: Why not just define +refspec as the way to achieve this check?

 What justification do we have to break existing people's
 configuration that says something like:

 [remote ko]
 url = kernel.org:/pub/scm/git/git.git
 push = master
 push = next
 push = +pu
 push = maint

 by adding a _new_ requirement they may not be able to satisify?
 Notice that the above is a typical push only publishing point,
 where you do not need any remote tracking branches.

 Why would it break? When you do not specify --lockref, there is no
 change whatsoever.
 
 I thought your suggestion Why not just define +pu as the way to
 achieve _THIS_ check? was to make +pu to mean
 
   git push ko --lockref pu
 
 which would mean check refs/remotes/ko/pu and make sure the remote
 side still is at that commit.
 
 If that is not what you meant, please clarify what _THIS_ is.

We have three independent options that the user can choose in any
combination:

 o --force given or not;

 o --lockref semantics enabled or not;

 o refspec with or without +;

and these two orthogonal preconditions of the push

 o push is fast-forward or it is not (ff, noff);

 o the branch at the remote is at the expected rev or it is not
   (match, mismatch).

Here is a table with the expected outcome. ok means that the push is
allowed(*), fail means that the push is denied. (Four more lines with
--force are omitted because they have ok in all spots.)

   ff   noff ff  noff
  match match mismatch mismatch

--lockref +refspec okokdenied   denied
--lockref  refspec ok  denied  denied   denied
  +refspec okok  ok   ok
   refspec ok  deniedok denied

Notice that without --lockref semantics enabled, +refspec and refspec
keep the current behavior.

(*) As we are talking only about the client-side of the push here, I'm
saying allowed instead of succeeds because the server can have
additional restrictions that can make the push fail.

-- Hannes

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] .mailmap: Map email addresses to names

2013-07-12 Thread Junio C Hamano
Stefan Beller stefanbel...@googlemail.com writes:

 Additionally to adding (name, email) mappings to the
 .mailmap file, it has also been sorted alphabetically.
 (which explains the removals, which are added
 3 lines later on again).

What is this 3 lines later on again about?  Is it a remnant from
the previous iteration?  If so, I can remove this (which ...)
locally.

 While the most changes happen at the email addresses,
 we also have a name change in here. Karl Hasselström
 is now known as Karl Wiberg due to marriage. Congratulations!

Indeed.  I like this part of the log message ;-)
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/7] cat-file --batch-check performance improvements

2013-07-12 Thread Jeff King
On Fri, Jul 12, 2013 at 10:23:34AM -0700, Junio C Hamano wrote:

 Jeff King p...@peff.net writes:
 
  The results for running (in linux.git):
 
$ git rev-list --objects --all objects
$ git cat-file --batch-check='%(objectsize:disk)' objects /dev/null
 
 I can see how these patches are very logical avenue to grab only
 on-disk footprint for large number of objects, but among the type,
 payload size and on-disk footprint, I find it highly narrow niche
 that a real user or script is interested _only_ in on-disk footprint
 without even worrying about the type of object.

Yeah, I agree it is a bit of a niche. However, there are other code
paths that might want only the size and not the type (e.g., we already
know the object is a blob, but want to know size before deciding how to
handle diff). But in general, I doubt the performance impact is a big
deal there. It's only measurable when you're doing millions of objects.

-Peff
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Bug in .mailmap handling?

2013-07-12 Thread Junio C Hamano
Stefan Beller stefanbel...@googlemail.com writes:

   # Adding the line to the mailmap should make life easy, so we know
   # it's the same person
   echo A a...@example.org changed_em...@example.org  .mailmap

While I was looking at this, I noticed this piece of code:

diff --git a/mailmap.c b/mailmap.c
index 2a7b366..418081e 100644
--- a/mailmap.c
+++ b/mailmap.c
@@ -122,7 +122,7 @@ static char *parse_name_and_email(char *buffer, char **name,
while (nend  nstart  isspace(*nend))
--nend;
 
-   *name = (nstart  nend ? nstart : NULL);
+   *name = (nstart = nend ? nstart : NULL);
*email = left+1;
*(nend+1) = '\0';
*right++ = '\0';

The function is given a buffer A a...@example.org..., nstart scans
from the beginning of the buffer, skipping whitespaces (there isn't
any, so nstart points at the buffer), while nend starts from one
byte before the first '' and skips whitespaces backwards and ends
at the first non-whitespace (i.e. it hits A at the beginning of
the buffer).  nstart == nend in this case for a single-letter name,
and an off-by-one error makes it fail to pick up the name, which
makes the entry equivalent to

a...@example.org changed_em...@example.org

without the name.  I do not think this bug affected anything you
observed, though.

   git shortlog -sne
1  A a...@example.org
1  A a...@example.org

This is coming from mailmap.c::add_mapping() that downcases the
e-mail address.

changed_em...@example.org is mapped to a...@example.org because of this
downcasing, while A a...@example.org does not have any entry for it
in the .mailmap file, so it is given back as-is.  Hence we see two
distinct entries.

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Bug in .mailmap handling?

2013-07-12 Thread Stefan Beller
On 07/12/2013 10:28 PM, Junio C Hamano wrote:
 Stefan Beller stefanbel...@googlemail.com writes:
 
  # Adding the line to the mailmap should make life easy, so we know
  # it's the same person
  echo A a...@example.org changed_em...@example.org  .mailmap
 
 While I was looking at this, I noticed this piece of code:
 
 diff --git a/mailmap.c b/mailmap.c
 index 2a7b366..418081e 100644
 --- a/mailmap.c
 +++ b/mailmap.c
 @@ -122,7 +122,7 @@ static char *parse_name_and_email(char *buffer, char 
 **name,
   while (nend  nstart  isspace(*nend))
   --nend;
  
 - *name = (nstart  nend ? nstart : NULL);
 + *name = (nstart = nend ? nstart : NULL);
   *email = left+1;
   *(nend+1) = '\0';
   *right++ = '\0';
 
 The function is given a buffer A a...@example.org..., nstart scans
 from the beginning of the buffer, skipping whitespaces (there isn't
 any, so nstart points at the buffer), while nend starts from one
 byte before the first '' and skips whitespaces backwards and ends
 at the first non-whitespace (i.e. it hits A at the beginning of
 the buffer).  nstart == nend in this case for a single-letter name,
 and an off-by-one error makes it fail to pick up the name, which
 makes the entry equivalent to
 
   a...@example.org changed_em...@example.org
 
 without the name.  I do not think this bug affected anything you
 observed, though.
 
  git shortlog -sne
   1  A a...@example.org
   1  A a...@example.org
 
 This is coming from mailmap.c::add_mapping() that downcases the
 e-mail address.
 
 changed_em...@example.org is mapped to a...@example.org because of this
 downcasing, while A a...@example.org does not have any entry for it
 in the .mailmap file, so it is given back as-is.  Hence we see two
 distinct entries.
 

So do I understand it right, we're having 2 bugs in here?

One being triggered by the short name, only one character.
So if you want to debug the other bug with a longer name,
you can either use a made up name, or run
git shortlog -sne |grep Knut
in the git repository having the mailmap file already updated.
The way the mailmap file is written, I'd assume only one line
to be found, as of now 2 lines come up
 2  Knut Franke knut.fra...@gmx.de
 1  Knut Franke knut.fra...@gmx.de

which seems to downcase the whole first email.





--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Bug in .mailmap handling?

2013-07-12 Thread Junio C Hamano
Junio C Hamano gits...@pobox.com writes:

 Stefan Beller stefanbel...@googlemail.com writes:

  git shortlog -sne
   1  A a...@example.org
   1  A a...@example.org

 This is coming from mailmap.c::add_mapping() that downcases the
 e-mail address.

 changed_em...@example.org is mapped to a...@example.org because of this
 downcasing, while A a...@example.org does not have any entry for it
 in the .mailmap file, so it is given back as-is.  Hence we see two
 distinct entries.

I think it is wrong for the parser to lose information by
downcasing.

It is perfectly fine to do the comparison case insensitively, to
allow changed_em...@example.org in the input is mangled using the
entry for changed_em...@example.org in the .mailmap file; it is
not fine to downcase the parsed result, especially the side that is
used as the rewritten result (i.e. the tokens earlier on the line),
as that would mean mangling the output.

Let me see if I can quickly whip up a fix.
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH] t0008: avoid SIGPIPE race condition on fifo

2013-07-12 Thread Jeff King
On Fri, Jul 12, 2013 at 09:23:54AM -0700, Junio C Hamano wrote:

  In that case, check-ignore gets a SIGPIPE and dies. The
  outer shell then tries to open the output fifo but blocks
  indefinitely, because there is no writer.  We can fix it by
  keeping a descriptor open through the whole procedure.
 
 Ahh, figures.
 
 I wish I were smart enough to figure that out immediately after
 seeing the test that does funny things to in with 9.

I wish I were smart enough to have figured it out when I was writing the
funny stuff with in and 9 in the first place. :)

-Peff
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/3] wt-status: use format function attribute for status_printf

2013-07-12 Thread Jeff King
On Fri, Jul 12, 2013 at 09:10:30AM -0700, Junio C Hamano wrote:

  You can fix it with -Wno-zero-format-length, so the hassle is not
  huge. But I am also inclined to just drop this one. We have lived
  without the extra safety for a long time, and list review does tend to
  catch such problems in practice.
 
 I am tempted to actually merge the original one as-is without any of
 the workaround, and just tell people to use -Wno-format-zero-length.

Yeah, I think the only downside is the cognitive burden on individual
developers who try -Wall and have to figure out that we need
-Wno-zero-format-length (and that the warnings are not interesting).

It would be nice to add it automatically to CFLAGS, but I do not know if
we can reliably detect from the Makefile that we are compiling under
gcc.

-Peff
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Bug in .mailmap handling?

2013-07-12 Thread Junio C Hamano
Junio C Hamano gits...@pobox.com writes:

 Junio C Hamano gits...@pobox.com writes:

 Stefan Beller stefanbel...@googlemail.com writes:

 git shortlog -sne
  1  A a...@example.org
  1  A a...@example.org

 This is coming from mailmap.c::add_mapping() that downcases the
 e-mail address.

 changed_em...@example.org is mapped to a...@example.org because of this
 downcasing, while A a...@example.org does not have any entry for it
 in the .mailmap file, so it is given back as-is.  Hence we see two
 distinct entries.

 I think it is wrong for the parser to lose information by
 downcasing.

 It is perfectly fine to do the comparison case insensitively, to
 allow changed_em...@example.org in the input is mangled using the
 entry for changed_em...@example.org in the .mailmap file; it is
 not fine to downcase the parsed result, especially the side that is
 used as the rewritten result (i.e. the tokens earlier on the line),
 as that would mean mangling the output.

 Let me see if I can quickly whip up a fix.

It might be just the matter of doing this.

I suspect that we could drop the downcasing of old-email, too, but
the new-email is supposed to be the rewritten result that appears on
the output, and downcasing it is very wrong.

I also suspect that this was an old workaround for the original
string-list that did not know how to match items case insensitively,
which we should have removed when we added case sensitivity support
to the string-list at around 577f63e7 (Merge branch
'ap/log-mailmap', 2013-01-20).

 mailmap.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/mailmap.c b/mailmap.c
index 418081e..c64a53d 100644
--- a/mailmap.c
+++ b/mailmap.c
@@ -56,9 +56,6 @@ static void add_mapping(struct string_list *map,
if (old_email)
for (p = old_email; *p; p++)
*p = tolower(*p);
-   if (new_email)
-   for (p = new_email; *p; p++)
-   *p = tolower(*p);
 
if (old_email == NULL) {
old_email = new_email;
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v3] config: add support for http.url.* settings

2013-07-12 Thread Aaron Schrab

At 06:07 -0700 12 Jul 2013, Kyle J. McKay mack...@gmail.com wrote:
I don't think it's necessary to split the URL apart though.  Whatever 
URL the user gave to git on the command line (at some point even if 
it's now stored as a remote setting in config) complete with URL-
encoding, user names, port names, etc. is the same url, possibly 
shortened, that needs to be used for the http.url.option setting.


This seems to be assuming that the configuration is done after the URL 
is entered and that URLs are always entered manually.  I don't think 
either of those assumptions is valid.  A user may want to specify http 
settings for all repositories on a specified host and so add settings 
for that host to ~/.gitconfig expecting those settings to be used later.  
A URL in a slightly different format may later be copy+pasted without 
the user realizing that it won't use that config due to one of the 
versions being in a non-canonical form.


I think that's simple and very easy to explain and avoids user 
confusion and surprise while still allowing a default to be set for a 
site but easily overridden for a portion of that site without needing 
to worry about the order config files are processed or the order of the 
[http url] sections within them.


I agree that the method is easy to explain, but I think a user may very 
well be surprised and confused in a scenario like I described above.  
And having the order not matter (in some cases) for these configuration 
items deviates from how other configuration values are handled.

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH] Add a testcase for checking case insensitivity of mail map

2013-07-12 Thread Stefan Beller
Your patch seems to do it.
I added a test case which doesn't fail, if your patch is added.

Signed-off-by: Stefan Beller stefanbel...@googlemail.com

---
 t/t4203-mailmap.sh | 33 +
 1 file changed, 33 insertions(+)

diff --git a/t/t4203-mailmap.sh b/t/t4203-mailmap.sh
index 842b754..07152e9 100755
--- a/t/t4203-mailmap.sh
+++ b/t/t4203-mailmap.sh
@@ -409,4 +409,37 @@ test_expect_success 'Blame output (complex mapping)' '
test_cmp expect actual.fuzz
 '
 
+test_expect_success 'cleanup after mailmap.blob tests' '
+   rm -f .mailmap
+'
+
+cat expect \EOF
+ 2 A a...@example.org
+ 2 Other Author ot...@author.xx
+ 2 Santa Claus santa.cl...@northpole.xx
+ 1 A U Thor aut...@example.com
+ 1 CTO c...@company.xx
+ 1 Some Dude s...@dude.xx
+EOF
+
+test_expect_success 'Test case sensitivity of Names' '
+   # do a commit:
+   echo asdf  test1
+   git add test1
+   git commit -a --author=A a...@example.org -m add test1
+
+   # commit with same name, but different email
+   # (different capitalization does the trick already,
+   # but here I am going to use a different mail)
+   echo asdf  test2
+   git add test2
+   git commit -a --author=A changed_em...@example.org -m add test2
+
+   # Adding the line to the mailmap should make life easy, so we know
+   # it is the same person
+   echo A a...@example.org changed_em...@example.org  .mailmap
+
+   git shortlog -sne HEAD actual  test_cmp expect actual
+'
+
 test_done
-- 
1.8.3.2.776.gfcf213d

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 7/7] push: document --lockref

2013-07-12 Thread Junio C Hamano
Johannes Sixt j...@kdbg.org writes:

 We have three independent options that the user can choose in any
 combination:

  o --force given or not;

  o --lockref semantics enabled or not;

  o refspec with or without +;

 and these two orthogonal preconditions of the push

  o push is fast-forward or it is not (ff, noff);

  o the branch at the remote is at the expected rev or it is not
(match, mismatch).

 Here is a table with the expected outcome. ok means that the push is
 allowed(*), fail means that the push is denied. (Four more lines with
 --force are omitted because they have ok in all spots.)

ff   noff ff  noff
   match match mismatch mismatch

 --lockref +refspec okokdenied   denied
 --lockref  refspec ok  denied  denied   denied

I am confused with these.  The latter is the most typical:

git fetch
git checkout topic
git rebase topic
git push --lockref topic

where we know it is noff already, and we just want to make sure
that nobody mucked with our remote while we are rebasing.

If nobody updated the remote, why should this push be denied?  And in
order to make it succeed, you need to force with +refspec or --force,
but that would bypass match/mismatch safety, which makes the whole
make sure the other end is unchanged safety meaningless, no?

   +refspec okok  ok   ok

This is traditional --force.

refspec ok  deniedok denied

We are not asking for --lockref, so match/mismatch does not affect
the outcome.

 Notice that without --lockref semantics enabled, +refspec and refspec
 keep the current behavior.

But I do not think the above table with --lockref makes much sense.

Let's look at noff/match case.  That is the only interesting one.

This should fail:

git push topic

due to no-ff.

Your table above makes this fail:

git push --lockref topic

and the user has to force it, like this?

git push --lockref --force topic ;# or alternatively
git push --lockref +topic

Why is it even necessary?

If you make

git push --lockref topic

succeed in noff/match case, everything makes more sense to me.

The --lockref option is merely a weaker form of --force but still a
way to override the noff check.  If the user wants to keep noff
check, the user can simply choose not to use the option.

Of course, that form should fail if mismatch.  And then you can
force it,

git push --force [--lockref] topic

As --force is anything goes, it does not matter if you give the
other option on the command line.

 (*) As we are talking only about the client-side of the push here, I'm
 saying allowed instead of succeeds because the server can have
 additional restrictions that can make the push fail.

Yes, you and I have known from the beginning that we are in
agreement on that, but it is a good idea to explicitly say so for
the sake of bystanders.
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 1/4] mailmap: do not lose single-letter names

2013-07-12 Thread Junio C Hamano
In parse_name_and_email() function, there is this line:

*name = (nstart  nend ? nstart : NULL);

When the function is given a buffer A a...@example.org old@x.z,
nstart scans from the beginning of the buffer, skipping whitespaces
(there isn't any, so nstart points at the buffer), while nend starts
from one byte before the first '' and skips whitespaces backwards
and stops at the first non-whitespace (i.e. it hits A at the
beginning of the buffer).  nstart == nend in this case for a
single-letter name, and an off-by-one error makes it fail to pick up
the name, which makes the entry equivalent to

a...@example.org old@x.z

without the name.

Signed-off-by: Junio C Hamano gits...@pobox.com
---
 mailmap.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/mailmap.c b/mailmap.c
index 2a7b366..418081e 100644
--- a/mailmap.c
+++ b/mailmap.c
@@ -122,7 +122,7 @@ static char *parse_name_and_email(char *buffer, char **name,
while (nend  nstart  isspace(*nend))
--nend;
 
-   *name = (nstart  nend ? nstart : NULL);
+   *name = (nstart = nend ? nstart : NULL);
*email = left+1;
*(nend+1) = '\0';
*right++ = '\0';
-- 
1.8.3.2-941-gda9c3c8

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 4/4] add a testcase for checking case insensitivity of mailmap

2013-07-12 Thread Junio C Hamano
From: Stefan Beller stefanbel...@googlemail.com

Signed-off-by: Stefan Beller stefanbel...@googlemail.com
Signed-off-by: Junio C Hamano gits...@pobox.com
---
 t/t4203-mailmap.sh | 33 +
 1 file changed, 33 insertions(+)

diff --git a/t/t4203-mailmap.sh b/t/t4203-mailmap.sh
index 842b754..07152e9 100755
--- a/t/t4203-mailmap.sh
+++ b/t/t4203-mailmap.sh
@@ -409,4 +409,37 @@ test_expect_success 'Blame output (complex mapping)' '
test_cmp expect actual.fuzz
 '
 
+test_expect_success 'cleanup after mailmap.blob tests' '
+   rm -f .mailmap
+'
+
+cat expect \EOF
+ 2 A a...@example.org
+ 2 Other Author ot...@author.xx
+ 2 Santa Claus santa.cl...@northpole.xx
+ 1 A U Thor aut...@example.com
+ 1 CTO c...@company.xx
+ 1 Some Dude s...@dude.xx
+EOF
+
+test_expect_success 'Test case sensitivity of Names' '
+   # do a commit:
+   echo asdf  test1
+   git add test1
+   git commit -a --author=A a...@example.org -m add test1
+
+   # commit with same name, but different email
+   # (different capitalization does the trick already,
+   # but here I am going to use a different mail)
+   echo asdf  test2
+   git add test2
+   git commit -a --author=A changed_em...@example.org -m add test2
+
+   # Adding the line to the mailmap should make life easy, so we know
+   # it is the same person
+   echo A a...@example.org changed_em...@example.org  .mailmap
+
+   git shortlog -sne HEAD actual  test_cmp expect actual
+'
+
 test_done
-- 
1.8.3.2-941-gda9c3c8

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 0/4] Microfixes to mailmap

2013-07-12 Thread Junio C Hamano
There were a few bugs in mailmap parsing and matching code.

Junio C Hamano (3):
  mailmap: do not lose single-letter names
  mailmap: do not downcase mailmap entries
  mailmap: style fixes

Stefan Beller (1):
  Add a testcase for checking case insensitivity of mailmap

 mailmap.c  | 37 +++--
 t/t4203-mailmap.sh | 33 +
 2 files changed, 52 insertions(+), 18 deletions(-)

-- 
1.8.3.2-941-gda9c3c8

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 3/4] mailmap: style fixes

2013-07-12 Thread Junio C Hamano
Wrap overlong lines and format the multi-line comments to match our
coding style.

Signed-off-by: Junio C Hamano gits...@pobox.com
---
 mailmap.c | 15 ++-
 1 file changed, 10 insertions(+), 5 deletions(-)

diff --git a/mailmap.c b/mailmap.c
index a7e92db..5a3347d 100644
--- a/mailmap.c
+++ b/mailmap.c
@@ -37,7 +37,8 @@ static void free_mailmap_info(void *p, const char *s)
 static void free_mailmap_entry(void *p, const char *s)
 {
struct mailmap_entry *me = (struct mailmap_entry *)p;
-   debug_mm(mailmap: removing entries for %s, with %d sub-entries\n, 
s, me-namemap.nr);
+   debug_mm(mailmap: removing entries for %s, with %d sub-entries\n,
+s, me-namemap.nr);
debug_mm(mailmap: - simple: '%s' %s\n, me-name, me-email);
free(me-name);
free(me-email);
@@ -73,7 +74,8 @@ static void add_mapping(struct string_list *map,
}
 
if (old_name == NULL) {
-   debug_mm(mailmap: adding (simple) entry for %s at index %d\n, 
old_email, index);
+   debug_mm(mailmap: adding (simple) entry for %s at index %d\n,
+old_email, index);
/* Replace current name and new email for simple entry */
if (new_name) {
free(me-name);
@@ -85,7 +87,8 @@ static void add_mapping(struct string_list *map,
}
} else {
struct mailmap_info *mi = xcalloc(1, sizeof(struct 
mailmap_info));
-   debug_mm(mailmap: adding (complex) entry for %s at index 
%d\n, old_email, index);
+   debug_mm(mailmap: adding (complex) entry for %s at index %d\n,
+old_email, index);
if (new_name)
mi-name = xstrdup(new_name);
if (new_email)
@@ -315,8 +318,10 @@ int map_user(struct string_list *map,
if (item != NULL) {
me = (struct mailmap_entry *)item-util;
if (me-namemap.nr) {
-   /* The item has multiple items, so we'll look up on 
name too */
-   /* If the name is not found, we choose the simple entry 
 */
+   /*
+* The item has multiple items, so we'll look up on 
name too.
+* If the name is not found, we choose the simple entry.
+*/
struct string_list_item *subitem;
subitem = lookup_prefix(me-namemap, *name, *namelen);
if (subitem)
-- 
1.8.3.2-941-gda9c3c8

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 2/4] mailmap: do not downcase mailmap entries

2013-07-12 Thread Junio C Hamano
The email addresses in the records read from the .mailmap file are
downcased very early, and then used to match against e-mail
addresses in the input.  Because we do use case insensitive version
of string list to manage these entries, there is no need to do this,
and worse yet, downcasing the rewritten/canonical e-mail read from
the .mailmap file loses information.

Stop doing that, and also make the string list used to keep multiple
names for an mailmap entry case insensitive (the code that uses the
list, lookup_prefix(), expects a case insensitive match).

Signed-off-by: Junio C Hamano gits...@pobox.com
---
 mailmap.c | 20 
 1 file changed, 8 insertions(+), 12 deletions(-)

diff --git a/mailmap.c b/mailmap.c
index 418081e..a7e92db 100644
--- a/mailmap.c
+++ b/mailmap.c
@@ -51,14 +51,6 @@ static void add_mapping(struct string_list *map,
 {
struct mailmap_entry *me;
int index;
-   char *p;
-
-   if (old_email)
-   for (p = old_email; *p; p++)
-   *p = tolower(*p);
-   if (new_email)
-   for (p = new_email; *p; p++)
-   *p = tolower(*p);
 
if (old_email == NULL) {
old_email = new_email;
@@ -68,13 +60,17 @@ static void add_mapping(struct string_list *map,
if ((index = string_list_find_insert_index(map, old_email, 1))  0) {
/* mailmap entry exists, invert index value */
index = -1 - index;
+   me = (struct mailmap_entry *)map-items[index].util;
} else {
/* create mailmap entry */
-   struct string_list_item *item = 
string_list_insert_at_index(map, index, old_email);
-   item-util = xcalloc(1, sizeof(struct mailmap_entry));
-   ((struct mailmap_entry *)item-util)-namemap.strdup_strings = 
1;
+   struct string_list_item *item;
+
+   item = string_list_insert_at_index(map, index, old_email);
+   me = xcalloc(1, sizeof(struct mailmap_entry));
+   me-namemap.strdup_strings = 1;
+   me-namemap.cmp = strcasecmp;
+   item-util = me;
}
-   me = (struct mailmap_entry *)map-items[index].util;
 
if (old_name == NULL) {
debug_mm(mailmap: adding (simple) entry for %s at index %d\n, 
old_email, index);
-- 
1.8.3.2-941-gda9c3c8

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 4/4] add a testcase for checking case insensitivity of mailmap

2013-07-12 Thread Eric Sunshine
On Fri, Jul 12, 2013 at 5:38 PM, Junio C Hamano gits...@pobox.com wrote:
 From: Stefan Beller stefanbel...@googlemail.com

 Signed-off-by: Stefan Beller stefanbel...@googlemail.com
 Signed-off-by: Junio C Hamano gits...@pobox.com
 ---
  t/t4203-mailmap.sh | 33 +
  1 file changed, 33 insertions(+)

 diff --git a/t/t4203-mailmap.sh b/t/t4203-mailmap.sh
 index 842b754..07152e9 100755
 --- a/t/t4203-mailmap.sh
 +++ b/t/t4203-mailmap.sh
 @@ -409,4 +409,37 @@ test_expect_success 'Blame output (complex mapping)' '
 test_cmp expect actual.fuzz
  '

 +test_expect_success 'cleanup after mailmap.blob tests' '
 +   rm -f .mailmap
 +'

This test was copied from earlier in the file, but the description
was not updated; it has nothing to do with mailmap.blob tests for
which cleanup has already taken place.

It's also misleading since no .mailmap file exists at this point.
Instead, configuration key mailmap.file is set to
internal_mailmap/.mailmap, so if you need to clean up anything, it
would be that.

(You're also recreating .mailmap from scratch directly in your test
via echo foo .mailmap, so this test doesn't really add anything.)

 +cat expect \EOF
 + 2 A a...@example.org
 + 2 Other Author ot...@author.xx
 + 2 Santa Claus santa.cl...@northpole.xx
 + 1 A U Thor aut...@example.com
 + 1 CTO c...@company.xx
 + 1 Some Dude s...@dude.xx
 +EOF
 +
 +test_expect_success 'Test case sensitivity of Names' '

s/Names/names/

Also, most of the test descriptions in this script start with
lowercase, so s/Test/test/ would be appropriate.

 +   # do a commit:
 +   echo asdf  test1

Git practice normally avoids whitespace after the redirection
operator. This particular test script, has a mix of foo and 
foo, but we may want to avoid adding more of the latter form.

 +   git add test1
 +   git commit -a --author=A a...@example.org -m add test1
 +
 +   # commit with same name, but different email
 +   # (different capitalization does the trick already,
 +   # but here I am going to use a different mail)

Without context of the mailing list discussion, the reader does not
know what trick is or precisely how this may have failed in the
past. It's also not clear why the test is being performed with a
different email address entirely rather than one which differs only by
case (which is what the test description talks about).

 +   echo asdf  test2

Whitespace after redirection.

 +   git add test2
 +   git commit -a --author=A changed_em...@example.org -m add test2
 +
 +   # Adding the line to the mailmap should make life easy, so we know
 +   # it is the same person

The comment can probably be dropped, as the new .mailmap entry is
self-explanatory.

 +   echo A a...@example.org changed_em...@example.org  .mailmap

Whitespace after redirection.

 +
 +   git shortlog -sne HEAD actual  test_cmp expect actual

For consistency with more other tests, perhaps break the line apart:

  git shortlog -sne HEAD actual 
  test_cmp expect actual

 +'
 +
  test_done
 --
 1.8.3.2-941-gda9c3c8
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC/PATCH] fetch: make --prune configurable

2013-07-12 Thread Junio C Hamano
Here is my previous review comments in a squashable patch form.  The
result seems to pass all 27 combinations (fetch.prune, remote.*.prune
and command line all are tristate yes/no/unspecified).

Without the fix-up in *.c files, three combinations seem to fail.

 Documentation/config.txt |   3 +-
 builtin/fetch.c  |  41 +---
 remote.c |   1 +
 t/t5510-fetch.sh | 118 ---
 4 files changed, 108 insertions(+), 55 deletions(-)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index fc39f3a..e4ce7c4 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -1051,7 +1051,7 @@ fetch.unpackLimit::
 
 fetch.prune::
If true, fetch will automatically behave as if the `--prune`
-   option was given on the command line.
+   option was given on the command line.  See also `remote.name.prune`.
 
 format.attach::
Enable multipart/mixed attachments as the default for
@@ -1992,6 +1992,7 @@ remote.name.prune::
When set to true, fetching from this remote by default will also
remove any remote-tracking branches which no longer exist on the
remote (as if the `--prune` option was give on the command line).
+   Overrides `fetch.prune` settings, if any.
 
 remotes.group::
The list of remotes which are fetched by git remote update
diff --git a/builtin/fetch.c b/builtin/fetch.c
index 082450b..08ab948 100644
--- a/builtin/fetch.c
+++ b/builtin/fetch.c
@@ -30,13 +30,10 @@ enum {
TAGS_SET = 2
 };
 
-enum {
-   PRUNE_UNSET = 0,
-   PRUNE_DEFAULT = 1,
-   PRUNE_FORCE = 2
-};
+static int fetch_prune_config = -1; /* unspecified */
+static int prune = -1; /* unspecified */
+#define PRUNE_BY_DEFAULT 0 /* do we prune by default? */
 
-static int prune = PRUNE_DEFAULT;
 static int all, append, dry_run, force, keep, multiple, update_head_ok, 
verbosity;
 static int progress = -1, recurse_submodules = RECURSE_SUBMODULES_DEFAULT;
 static int tags = TAGS_DEFAULT, unshallow;
@@ -64,12 +61,10 @@ static int option_parse_recurse_submodules(const struct 
option *opt,
 static int git_fetch_config(const char *k, const char *v, void *cb)
 {
if (!strcmp(k, fetch.prune)) {
-   int boolval = git_config_bool(k, v);
-   if (boolval)
-   prune = PRUNE_FORCE;
+   fetch_prune_config = git_config_bool(k, v);
return 0;
}
-   return git_default_config(k, v, cb);
+   return 0;
 }
 
 static struct option builtin_fetch_options[] = {
@@ -87,8 +82,8 @@ static struct option builtin_fetch_options[] = {
N_(fetch all tags and associated objects), TAGS_SET),
OPT_SET_INT('n', NULL, tags,
N_(do not fetch all tags (--no-tags)), TAGS_UNSET),
-   OPT_BOOLEAN('p', prune, prune,
-   N_(prune remote-tracking branches no longer on remote)),
+   OPT_BOOL('p', prune, prune,
+N_(prune remote-tracking branches no longer on remote)),
{ OPTION_CALLBACK, 0, recurse-submodules, NULL, N_(on-demand),
N_(control recursive fetching of submodules),
PARSE_OPT_OPTARG, option_parse_recurse_submodules },
@@ -756,8 +751,11 @@ static int do_fetch(struct transport *transport,
free_refs(ref_map);
return 1;
}
-   if (prune == PRUNE_FORCE || (transport-remote-prune  prune)) {
-   /* If --tags was specified, pretend the user gave us the 
canonical tags refspec */
+   if (prune) {
+   /*
+* If --tags was specified, pretend that the user gave us
+* the canonical tags refspec
+*/
if (tags == TAGS_SET) {
const char *tags_str = refs/tags/*:refs/tags/*;
struct refspec *tags_refspec, *refspec;
@@ -866,10 +864,8 @@ static void add_options_to_argv(struct argv_array *argv)
 {
if (dry_run)
argv_array_push(argv, --dry-run);
-   if (prune == PRUNE_FORCE)
+   if (prune  0)
argv_array_push(argv, --prune);
-   else if (prune == PRUNE_UNSET)
-   argv_array_push(argv, --no-prune);
if (update_head_ok)
argv_array_push(argv, --update-head-ok);
if (force)
@@ -936,6 +932,17 @@ static int fetch_one(struct remote *remote, int argc, 
const char **argv)
remote name from which new revisions should be fetched.));
 
transport = transport_get(remote, NULL);
+
+   if (prune  0) {
+   /* no command line request */
+   if (0 = transport-remote-prune)
+   prune = transport-remote-prune;
+   else if (0 = fetch_prune_config)
+   prune = fetch_prune_config;
+   else
+   prune = PRUNE_BY_DEFAULT;
+   }
+
  

Re: [PATCH 4/4] add a testcase for checking case insensitivity of mailmap

2013-07-12 Thread Eric Sunshine
On Fri, Jul 12, 2013 at 5:38 PM, Junio C Hamano gits...@pobox.com wrote:
 From: Stefan Beller stefanbel...@googlemail.com

 Signed-off-by: Stefan Beller stefanbel...@googlemail.com
 Signed-off-by: Junio C Hamano gits...@pobox.com
 ---
  t/t4203-mailmap.sh | 33 +
  1 file changed, 33 insertions(+)

 diff --git a/t/t4203-mailmap.sh b/t/t4203-mailmap.sh
 index 842b754..07152e9 100755
 --- a/t/t4203-mailmap.sh
 +++ b/t/t4203-mailmap.sh
 +test_expect_success 'Test case sensitivity of Names' '
 +   # do a commit:
 +   echo asdf  test1
 +   git add test1
 +   git commit -a --author=A a...@example.org -m add test1
 +
 +   # commit with same name, but different email
 +   # (different capitalization does the trick already,
 +   # but here I am going to use a different mail)
 +   echo asdf  test2
 +   git add test2
 +   git commit -a --author=A changed_em...@example.org -m add test2
 +
 +   # Adding the line to the mailmap should make life easy, so we know
 +   # it is the same person
 +   echo A a...@example.org changed_em...@example.org  .mailmap
 +
 +   git shortlog -sne HEAD actual  test_cmp expect actual
 +'

I forgot to mention that the -chain is broken (missing) in the
entire test (except for the last line).
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


What's cooking in git.git (Jul 2013, #05; Fri, 12)

2013-07-12 Thread Junio C Hamano
Here are the topics that have been cooking.  Commits prefixed with
'-' are only in 'pu' (proposed updates) while commits prefixed with
'+' are in 'next'.

As many of you may know, I hate to do back-to-back What's cooking,
but we ended up acquiring many new topics, and many existing ones
have moved from 'pu' to 'next' and 'next' to 'master'.

Those that graduated to 'master' are mostly code and documentation
clean-up patches.

You can find the changes described here in the integration branches
of the repositories listed at

http://git-blame.blogspot.com/p/git-public-repositories.html

--
[Graduated to master]

* as/log-output-encoding-in-user-format (2013-07-05) 11 commits
  (merged to 'next' on 2013-07-08 at 2e1bdd9)
 + t4205 (log-pretty-formats): avoid using `sed`
 + t6006 (rev-list-format): add tests for %b and %s for the case 
i18n.commitEncoding is not set
 + t4205, t6006, t7102: make functions better readable
 + t4205 (log-pretty-formats): revert back single quotes
  (merged to 'next' on 2013-07-05 at d2c99e5)
 + t4041, t4205, t6006, t7102: use iso8859-1 rather than iso-8859-1
  (merged to 'next' on 2013-07-01 at 3318aa8)
 + t4205: replace .\+ with ..* in sed commands
  (merged to 'next' on 2013-06-28 at 4063330)
 + pretty: --format output should honor logOutputEncoding
 + pretty: Add failing tests: --format output should honor logOutputEncoding
 + t4205 (log-pretty-formats): don't hardcode SHA-1 in expected outputs
 + t7102 (reset): don't hardcode SHA-1 in expected outputs
 + t6006 (rev-list-format): don't hardcode SHA-1 in expected outputs

 log --format= did not honor i18n.logoutputencoding configuration
 and this attempts to fix it.


* ft/diff-rename-default-score-is-half (2013-07-05) 1 commit
  (merged to 'next' on 2013-07-09 at 6a6b57e)
 + diff-options: document default similarity index


* jc/remote-http-argv-array (2013-07-09) 1 commit
  (merged to 'next' on 2013-07-11 at 7fbe8bd)
 + remote-http: use argv-array


* jk/maint-config-multi-order (2013-07-07) 1 commit
  (merged to 'next' on 2013-07-09 at 0db1db9)
 + git-config(1): clarify precedence of multiple values


* jk/pull-to-integrate (2013-07-08) 2 commits
  (merged to 'next' on 2013-07-09 at 2ecac24)
 + pull: change the description to integrate changes
 + push: avoid suggesting merging remote changes


* ml/cygwin-does-not-have-fifo (2013-07-05) 1 commit
  (merged to 'next' on 2013-07-09 at 7d6849d)
 + test-lib.sh - cygwin does not have usable FIFOs


* ms/remote-tracking-branches-in-doc (2013-07-03) 1 commit
  (merged to 'next' on 2013-07-09 at 411a8bd)
 + Change remote tracking to remote-tracking


* rr/name-rev-stdin-doc (2013-07-07) 1 commit
  (merged to 'next' on 2013-07-09 at 7cfbff6)
 + name-rev doc: rewrite --stdin paragraph


* rs/pickaxe-simplify (2013-07-07) 1 commit
  (merged to 'next' on 2013-07-11 at c5972f7)
 + diffcore-pickaxe: simplify has_changes and contains


* tf/gitweb-extra-breadcrumbs (2013-07-04) 1 commit
  (merged to 'next' on 2013-07-09 at 525331b)
 + gitweb: allow extra breadcrumbs to prefix the trail

 An Gitweb installation that is a part of larger site can optionally
 show extra links that point at the levels higher than the Gitweb
 pages itself in the link hierarchy of pages.


* tr/test-lint-no-export-assignment-in-shell (2013-07-08) 2 commits
  (merged to 'next' on 2013-07-09 at 6f10ea2)
 + test-lint: detect 'export FOO=bar'
 + t9902: fix 'test A == B' to use = operator

--
[New Topics]

* es/check-mailmap (2013-07-11) 2 commits
 - t4203: test check-mailmap command invocation
 - builtin: add git-check-mailmap command

 A new command to allow scripts to query the mailmap information.

 Expecting a reroll to lose the -z option.


* jc/check-x-z (2013-07-11) 4 commits
 - check-attr -z: a single -z should apply to both input and output
 - check-ignore -z: a single -z should apply to both input and output
 - check-attr: the name of the character is NUL, not NULL
 - check-ignore: the name of the character is NUL, not NULL

 git check-ignore -z applied the NUL termination to both its input
 (with --stdin) and its output, but git check-attr -z ignored the
 option on the output side.

 This is potentially a backward incompatible fix.  I am tempted to
 merge this to and keep it in 'next' for a while to see if anybody
 screams before deciding if we want to do anything to help existing
 users (there may be none).


* jk/cat-file-batch-optim (2013-07-12) 8 commits
 - sha1_object_info_extended: pass object_info to helpers
 - sha1_object_info_extended: make type calculation optional
 - packed_object_info: make type lookup optional
 - packed_object_info: hoist delta type resolution to helper
 - sha1_loose_object_info: make type lookup optional
 - sha1_object_info_extended: rename status to type
 - cat-file: disable object/refname ambiguity check for batch mode
 - Merge branch 'nd/warn-ambiguous-object-name' into 

  1   2   >