Re: [PATCH v6 04/19] fsck: Offer a function to demote fsck errors to warnings

2015-06-22 Thread Johannes Schindelin
Hi Junio,

On 2015-06-19 21:26, Junio C Hamano wrote:
 Johannes Schindelin johannes.schinde...@gmx.de writes:
 
 +static inline int substrcmp(const char *string, int len, const char *match)
 +{
 +int match_len = strlen(match);
 +if (match_len != len)
 +return -1;
 +return memcmp(string, match, len);
 +}
 
 What this does looks suspiciously like starts_with(), but its name
 substrcmp() does not give any hint that this is about the beginnig
 part of string; if anything, it gives a wrong hint that it may be
 any substring.  prefixcmp() might be a better name but that was the
 old name for !starts_with() so we cannot use it here.  It is a
 mouthful, but starts_with_counted() may be.
 
 But the whole thing may be moot.
 
 If we take the why not upcase the end-user string upfront
 suggestion from the previous review, fsck_set_msg_types() would have
 an upcased copy of the end-user string that it can muck with; it can
 turn badfoo=error,poorbar=... into BADFOO=error,POORBAR=...
 that is stored in its own writable memory (possibly a strbuf), and
 at that point it can afford to NUL-terminate BADFOO=error after
 finding where one specification ends with strcspn() before calling
 fsck_set_msg_type(), which in turn calls parse_msg_type().
 
 So all parse_msg_type() needs to do is just !strcmp().

Turns out that the diffstat says it saves 10 lines. So I changed it according 
to your suggestion. Part of v7.

Ciao,
Dscho

--
To unsubscribe from this list: send the line unsubscribe git in


Re: [PATCH v4 08/19] parse-option: rename parse_opt_with_commit()

2015-06-22 Thread Junio C Hamano
On Mon, Jun 22, 2015 at 8:38 AM, Karthik Nayak karthik@gmail.com wrote:
 On Mon, Jun 22, 2015 at 6:34 AM, Junio C Hamano gits...@pobox.com wrote:
 On Sun, Jun 21, 2015 at 1:48 PM, Karthik Nayak karthik@gmail.com wrote:
 Rename parse_opt_with_commit() to parse_opt_commit_object_name()
 to show that it can be used to obtain a list of commits and is
 not constricted to usage of '--contains' option.

 I think that is a brilliant idea, but unlike the other function you
 added earlier
 that can do only one object and adopts last one wins rule, this is 
 cumulative,
 and that fact should be made clear to the developers in some way, no?

 Will add a comment

I didn't mean that. Can't we indicate this with plural somewhere in the name?
parse_opt_commits(), parse_opt_commit_into_list(), etc.?
--
To unsubscribe from this list: send the line unsubscribe git in


[PATCH v7 18/19] fsck: git receive-pack: support excluding objects from fsck'ing

2015-06-22 Thread Johannes Schindelin
The optional new config option `receive.fsck.skipList` specifies the path
to a file listing the names, i.e. SHA-1s, one per line, of objects that
are to be ignored by `git receive-pack` when `receive.fsckObjects = true`.

This is extremely handy in case of legacy repositories where it would
cause more pain to change incorrect objects than to live with them
(e.g. a duplicate 'author' line in an early commit object).

The intended use case is for server administrators to inspect objects
that are reported by `git push` as being too problematic to enter the
repository, and to add the objects' SHA-1 to a (preferably sorted) file
when the objects are legitimate, i.e. when it is determined that those
problematic objects should be allowed to enter the server.

Signed-off-by: Johannes Schindelin johannes.schinde...@gmx.de
---
 Documentation/config.txt|  8 +++
 builtin/receive-pack.c  | 11 +
 fsck.c  | 50 +
 fsck.h  |  1 +
 t/t5504-fetch-receive-strict.sh | 12 ++
 5 files changed, 82 insertions(+)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index bfccd2b..ed7f37f 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -2230,6 +2230,14 @@ which would not pass pushing when `receive.fsckObjects = 
true`, allowing
 the host to accept repositories with certain known issues but still catch
 other issues.
 
+receive.fsck.skipList::
+   The path to a sorted list of object names (i.e. one SHA-1 per
+   line) that are known to be broken in a non-fatal way and should
+   be ignored. This feature is useful when an established project
+   should be accepted despite early commits containing errors that
+   can be safely ignored such as invalid committer email addresses.
+   Note: corrupt objects cannot be skipped with this setting.
+
 receive.unpackLimit::
If the number of objects received in a push is below this
limit then the objects will be unpacked into loose object
diff --git a/builtin/receive-pack.c b/builtin/receive-pack.c
index 3afe8f8..3fbed23 100644
--- a/builtin/receive-pack.c
+++ b/builtin/receive-pack.c
@@ -117,6 +117,17 @@ static int receive_pack_config(const char *var, const char 
*value, void *cb)
return 0;
}
 
+   if (strcmp(var, receive.fsck.skiplist) == 0) {
+   const char *path;
+
+   if (git_config_pathname(path, var, value))
+   return 1;
+   strbuf_addf(fsck_msg_types, %cskiplist=%s,
+   fsck_msg_types.len ? ',' : '=', path);
+   free((char *) path);
+   return 0;
+   }
+
if (skip_prefix(var, receive.fsck., var)) {
if (is_valid_msg_type(var, value))
strbuf_addf(fsck_msg_types, %c%s=%s,
diff --git a/fsck.c b/fsck.c
index f6fc384..a677b50 100644
--- a/fsck.c
+++ b/fsck.c
@@ -8,6 +8,7 @@
 #include fsck.h
 #include refs.h
 #include utf8.h
+#include sha1-array.h
 
 #define FSCK_FATAL -1
 #define FSCK_INFO -2
@@ -127,6 +128,43 @@ static int fsck_msg_type(enum fsck_msg_id msg_id,
return msg_type;
 }
 
+static void init_skiplist(struct fsck_options *options, const char *path)
+{
+   static struct sha1_array skiplist = SHA1_ARRAY_INIT;
+   int sorted, fd;
+   char buffer[41];
+   unsigned char sha1[20];
+
+   if (options-skiplist)
+   sorted = options-skiplist-sorted;
+   else {
+   sorted = 1;
+   options-skiplist = skiplist;
+   }
+
+   fd = open(path, O_RDONLY);
+   if (fd  0)
+   die(Could not open skip list: %s, path);
+   for (;;) {
+   int result = read_in_full(fd, buffer, sizeof(buffer));
+   if (result  0)
+   die_errno(Could not read '%s', path);
+   if (!result)
+   break;
+   if (get_sha1_hex(buffer, sha1) || buffer[40] != '\n')
+   die(Invalid SHA-1: %s, buffer);
+   sha1_array_append(skiplist, sha1);
+   if (sorted  skiplist.nr  1 
+   hashcmp(skiplist.sha1[skiplist.nr - 2],
+   sha1)  0)
+   sorted = 0;
+   }
+   close(fd);
+
+   if (sorted)
+   skiplist.sorted = 1;
+}
+
 static int parse_msg_type(const char *str)
 {
if (!strcmp(str, error))
@@ -190,6 +228,14 @@ void fsck_set_msg_types(struct fsck_options *options, 
const char *values)
buf[equal] = tolower(buf[equal]);
buf[equal] = '\0';
 
+   if (!strcmp(buf, skiplist)) {
+   if (equal == len)
+   die(skiplist requires a path);
+   init_skiplist(options, buf + equal + 1);
+   buf += len + 1;
+

[PATCH v7 09/19] fsck: Handle multiple authors in commits specially

2015-06-22 Thread Johannes Schindelin
This problem has been detected in the wild, and is the primary reason
to introduce an option to demote certain fsck errors to warnings. Let's
offer to ignore this particular problem specifically.

Technically, we could handle such repositories by setting
receive.fsck.msg-id to missingCommitter=warn, but that could hide
missing tree objects in the same commit because we cannot continue
verifying any commit object after encountering a missing committer line,
while we can continue in the case of multiple author lines.

Signed-off-by: Johannes Schindelin johannes.schinde...@gmx.de
---
 fsck.c | 17 +
 1 file changed, 13 insertions(+), 4 deletions(-)

diff --git a/fsck.c b/fsck.c
index ef3bf68..daa07ad 100644
--- a/fsck.c
+++ b/fsck.c
@@ -38,6 +38,7 @@
FUNC(MISSING_TREE, ERROR) \
FUNC(MISSING_TYPE, ERROR) \
FUNC(MISSING_TYPE_ENTRY, ERROR) \
+   FUNC(MULTIPLE_AUTHORS, ERROR) \
FUNC(NUL_IN_HEADER, ERROR) \
FUNC(TAG_OBJECT_NOT_TAG, ERROR) \
FUNC(TREE_NOT_SORTED, ERROR) \
@@ -528,7 +529,7 @@ static int fsck_commit_buffer(struct commit *commit, const 
char *buffer,
 {
unsigned char tree_sha1[20], sha1[20];
struct commit_graft *graft;
-   unsigned parent_count, parent_line_count = 0;
+   unsigned parent_count, parent_line_count = 0, author_count;
int err;
 
if (require_end_of_header(buffer, size, commit-object, options))
@@ -568,9 +569,17 @@ static int fsck_commit_buffer(struct commit *commit, const 
char *buffer,
return err;
}
}
-   if (!skip_prefix(buffer, author , buffer))
-   return report(options, commit-object, 
FSCK_MSG_MISSING_AUTHOR, invalid format - expected 'author' line);
-   err = fsck_ident(buffer, commit-object, options);
+   author_count = 0;
+   while (skip_prefix(buffer, author , buffer)) {
+   author_count++;
+   err = fsck_ident(buffer, commit-object, options);
+   if (err)
+   return err;
+   }
+   if (author_count  1)
+   err = report(options, commit-object, FSCK_MSG_MISSING_AUTHOR, 
invalid format - expected 'author' line);
+   else if (author_count  1)
+   err = report(options, commit-object, 
FSCK_MSG_MULTIPLE_AUTHORS, invalid format - multiple 'author' lines);
if (err)
return err;
if (!skip_prefix(buffer, committer , buffer))
-- 
2.3.1.windows.1.9.g8c01ab4



--
To unsubscribe from this list: send the line unsubscribe git in


[PATCH v7 19/19] fsck: support ignoring objects in `git fsck` via fsck.skiplist

2015-06-22 Thread Johannes Schindelin
Identical to support in `git receive-pack for the config option
`receive.fsck.skiplist`, we now support ignoring given objects in
`git fsck` via `fsck.skiplist` altogether.

This is extremely handy in case of legacy repositories where it would
cause more pain to change incorrect objects than to live with them
(e.g. a duplicate 'author' line in an early commit object).

Signed-off-by: Johannes Schindelin johannes.schinde...@gmx.de
---
 Documentation/config.txt |  8 
 builtin/fsck.c   | 13 +
 2 files changed, 21 insertions(+)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index ed7f37f..69dda93 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -1261,6 +1261,14 @@ that setting `fsck.missingEmail = ignore` will hide that 
issue.
 This feature is intended to support working with legacy repositories
 which cannot be repaired without disruptive changes.
 
+fsck.skipList::
+   The path to a sorted list of object names (i.e. one SHA-1 per
+   line) that are known to be broken in a non-fatal way and should
+   be ignored. This feature is useful when an established project
+   should be accepted despite early commits containing errors that
+   can be safely ignored such as invalid committer email addresses.
+   Note: corrupt objects cannot be skipped with this setting.
+
 gc.aggressiveDepth::
The depth parameter used in the delta compression
algorithm used by 'git gc --aggressive'.  This defaults
diff --git a/builtin/fsck.c b/builtin/fsck.c
index 2d14298..7e3df20 100644
--- a/builtin/fsck.c
+++ b/builtin/fsck.c
@@ -49,6 +49,19 @@ static int show_dangling = 1;
 
 static int fsck_config(const char *var, const char *value, void *cb)
 {
+   if (strcmp(var, fsck.skiplist) == 0) {
+   const char *path;
+   struct strbuf sb = STRBUF_INIT;
+
+   if (git_config_pathname(path, var, value))
+   return 1;
+   strbuf_addf(sb, skiplist=%s, path);
+   free((char *) path);
+   fsck_set_msg_types(fsck_obj_options, sb.buf);
+   strbuf_release(sb);
+   return 0;
+   }
+
if (skip_prefix(var, fsck., var)) {
fsck_set_msg_type(fsck_obj_options, var, value);
return 0;
-- 
2.3.1.windows.1.9.g8c01ab4


--
To unsubscribe from this list: send the line unsubscribe git in


[PATCH v7 17/19] fsck: Introduce `git fsck --connectivity-only`

2015-06-22 Thread Johannes Schindelin
This option avoids unpacking each and all blob objects, and just
verifies the connectivity. In particular with large repositories, this
speeds up the operation, at the expense of missing corrupt blobs,
ignoring unreachable objects and other fsck issues, if any.

Signed-off-by: Johannes Schindelin johannes.schinde...@gmx.de
---
 Documentation/git-fsck.txt |  7 ++-
 builtin/fsck.c |  7 ++-
 t/t1450-fsck.sh| 22 ++
 3 files changed, 34 insertions(+), 2 deletions(-)

diff --git a/Documentation/git-fsck.txt b/Documentation/git-fsck.txt
index 25c431d..84ee92e 100644
--- a/Documentation/git-fsck.txt
+++ b/Documentation/git-fsck.txt
@@ -11,7 +11,7 @@ SYNOPSIS
 [verse]
 'git fsck' [--tags] [--root] [--unreachable] [--cache] [--no-reflogs]
 [--[no-]full] [--strict] [--verbose] [--lost-found]
-[--[no-]dangling] [--[no-]progress] [object*]
+[--[no-]dangling] [--[no-]progress] [--connectivity-only] [object*]
 
 DESCRIPTION
 ---
@@ -60,6 +60,11 @@ index file, all SHA-1 references in `refs` namespace, and 
all reflogs
object pools.  This is now default; you can turn it off
with --no-full.
 
+--connectivity-only::
+   Check only the connectivity of tags, commits and tree objects. By
+   avoiding to unpack blobs, this speeds up the operation, at the
+   expense of missing corrupt objects or other problematic issues.
+
 --strict::
Enable more strict checking, namely to catch a file mode
recorded with g+w bit set, which was created by older
diff --git a/builtin/fsck.c b/builtin/fsck.c
index adaa802..2d14298 100644
--- a/builtin/fsck.c
+++ b/builtin/fsck.c
@@ -23,6 +23,7 @@ static int show_tags;
 static int show_unreachable;
 static int include_reflogs = 1;
 static int check_full = 1;
+static int connectivity_only;
 static int check_strict;
 static int keep_cache_objects;
 static struct fsck_options fsck_walk_options = FSCK_OPTIONS_DEFAULT;
@@ -181,6 +182,8 @@ static void check_reachable_object(struct object *obj)
if (!(obj-flags  HAS_OBJ)) {
if (has_sha1_pack(obj-sha1))
return; /* it is in pack - forget about it */
+   if (connectivity_only  has_sha1_file(obj-sha1))
+   return;
printf(missing %s %s\n, typename(obj-type), 
sha1_to_hex(obj-sha1));
errors_found |= ERROR_REACHABLE;
return;
@@ -623,6 +626,7 @@ static struct option fsck_opts[] = {
OPT_BOOL(0, cache, keep_cache_objects, N_(make index objects head 
nodes)),
OPT_BOOL(0, reflogs, include_reflogs, N_(make reflogs head nodes 
(default))),
OPT_BOOL(0, full, check_full, N_(also consider packs and alternate 
objects)),
+   OPT_BOOL(0, connectivity-only, connectivity_only, N_(check only 
connectivity)),
OPT_BOOL(0, strict, check_strict, N_(enable more strict checking)),
OPT_BOOL(0, lost-found, write_lost_and_found,
N_(write dangling objects in 
.git/lost-found)),
@@ -659,7 +663,8 @@ int cmd_fsck(int argc, const char **argv, const char 
*prefix)
git_config(fsck_config, NULL);
 
fsck_head_link();
-   fsck_object_dir(get_object_directory());
+   if (!connectivity_only)
+   fsck_object_dir(get_object_directory());
 
prepare_alt_odb();
for (alt = alt_odb_list; alt; alt = alt-next) {
diff --git a/t/t1450-fsck.sh b/t/t1450-fsck.sh
index 1727129..956673b 100755
--- a/t/t1450-fsck.sh
+++ b/t/t1450-fsck.sh
@@ -431,4 +431,26 @@ test_expect_success 'fsck notices ref pointing to missing 
tag' '
test_must_fail git -C missing fsck
 '
 
+test_expect_success 'fsck --connectivity-only' '
+   rm -rf connectivity-only 
+   git init connectivity-only 
+   (
+   cd connectivity-only 
+   touch empty 
+   git add empty 
+   test_commit empty 
+   empty=.git/objects/e6/9de29bb2d1d6434b8b29ae775ad8c2e48c5391 
+   rm -f $empty 
+   echo invalid $empty 
+   test_must_fail git fsck --strict 
+   git fsck --strict --connectivity-only 
+   tree=$(git rev-parse HEAD:) 
+   suffix=${tree#??} 
+   tree=.git/objects/${tree%$suffix}/$suffix 
+   rm -f $tree 
+   echo invalid $tree 
+   test_must_fail git fsck --strict --connectivity-only
+   )
+'
+
 test_done
-- 
2.3.1.windows.1.9.g8c01ab4



--
To unsubscribe from this list: send the line unsubscribe git in


[PATCH v7 11/19] fsck: Add a simple test for receive.fsck.msg-id

2015-06-22 Thread Johannes Schindelin
Signed-off-by: Johannes Schindelin johannes.schinde...@gmx.de
---
 t/t5504-fetch-receive-strict.sh | 21 +
 1 file changed, 21 insertions(+)

diff --git a/t/t5504-fetch-receive-strict.sh b/t/t5504-fetch-receive-strict.sh
index 69ee13c..36024fc 100755
--- a/t/t5504-fetch-receive-strict.sh
+++ b/t/t5504-fetch-receive-strict.sh
@@ -115,4 +115,25 @@ test_expect_success 'push with transfer.fsckobjects' '
test_cmp exp act
 '
 
+cat bogus-commit \EOF
+tree 4b825dc642cb6eb9a060e54bf8d69288fbee4904
+author Bugs Bunny 1234567890 +
+committer Bugs Bunny b...@bun.ni 1234567890 +
+
+This commit object intentionally broken
+EOF
+
+test_expect_success 'push with receive.fsck.missingEmail=warn' '
+   commit=$(git hash-object -t commit -w --stdin bogus-commit) 
+   git push . $commit:refs/heads/bogus 
+   rm -rf dst 
+   git init dst 
+   git --git-dir=dst/.git config receive.fsckobjects true 
+   test_must_fail git push --porcelain dst bogus 
+   git --git-dir=dst/.git config \
+   receive.fsck.missingEmail warn 
+   git push --porcelain dst bogus act 21 
+   grep missingEmail act
+'
+
 test_done
-- 
2.3.1.windows.1.9.g8c01ab4



--
To unsubscribe from this list: send the line unsubscribe git in


[PATCH v7 08/19] fsck: Make fsck_commit() warn-friendly

2015-06-22 Thread Johannes Schindelin
When fsck_commit() identifies a problem with the commit, it should try
to make it possible to continue checking the commit object, in case the
user wants to demote the detected errors to mere warnings.

Note that some problems are too problematic to simply ignore. For
example, when the header lines are mixed up, we punt after encountering
an incorrect line. Therefore, demoting certain warnings to errors can
hide other problems. Example: demoting the missingauthor error to
a warning would hide a problematic committer line.

Signed-off-by: Johannes Schindelin johannes.schinde...@gmx.de
---
 fsck.c | 28 
 1 file changed, 20 insertions(+), 8 deletions(-)

diff --git a/fsck.c b/fsck.c
index d0a7282..ef3bf68 100644
--- a/fsck.c
+++ b/fsck.c
@@ -536,12 +536,18 @@ static int fsck_commit_buffer(struct commit *commit, 
const char *buffer,
 
if (!skip_prefix(buffer, tree , buffer))
return report(options, commit-object, FSCK_MSG_MISSING_TREE, 
invalid format - expected 'tree' line);
-   if (get_sha1_hex(buffer, tree_sha1) || buffer[40] != '\n')
-   return report(options, commit-object, FSCK_MSG_BAD_TREE_SHA1, 
invalid 'tree' line format - bad sha1);
+   if (get_sha1_hex(buffer, tree_sha1) || buffer[40] != '\n') {
+   err = report(options, commit-object, FSCK_MSG_BAD_TREE_SHA1, 
invalid 'tree' line format - bad sha1);
+   if (err)
+   return err;
+   }
buffer += 41;
while (skip_prefix(buffer, parent , buffer)) {
-   if (get_sha1_hex(buffer, sha1) || buffer[40] != '\n')
-   return report(options, commit-object, 
FSCK_MSG_BAD_PARENT_SHA1, invalid 'parent' line format - bad sha1);
+   if (get_sha1_hex(buffer, sha1) || buffer[40] != '\n') {
+   err = report(options, commit-object, 
FSCK_MSG_BAD_PARENT_SHA1, invalid 'parent' line format - bad sha1);
+   if (err)
+   return err;
+   }
buffer += 41;
parent_line_count++;
}
@@ -550,11 +556,17 @@ static int fsck_commit_buffer(struct commit *commit, 
const char *buffer,
if (graft) {
if (graft-nr_parent == -1  !parent_count)
; /* shallow commit */
-   else if (graft-nr_parent != parent_count)
-   return report(options, commit-object, 
FSCK_MSG_MISSING_GRAFT, graft objects missing);
+   else if (graft-nr_parent != parent_count) {
+   err = report(options, commit-object, 
FSCK_MSG_MISSING_GRAFT, graft objects missing);
+   if (err)
+   return err;
+   }
} else {
-   if (parent_count != parent_line_count)
-   return report(options, commit-object, 
FSCK_MSG_MISSING_PARENT, parent objects missing);
+   if (parent_count != parent_line_count) {
+   err = report(options, commit-object, 
FSCK_MSG_MISSING_PARENT, parent objects missing);
+   if (err)
+   return err;
+   }
}
if (!skip_prefix(buffer, author , buffer))
return report(options, commit-object, 
FSCK_MSG_MISSING_AUTHOR, invalid format - expected 'author' line);
-- 
2.3.1.windows.1.9.g8c01ab4



--
To unsubscribe from this list: send the line unsubscribe git in


[PATCH v7 07/19] fsck: Make fsck_ident() warn-friendly

2015-06-22 Thread Johannes Schindelin
When fsck_ident() identifies a problem with the ident, it should still
advance the pointer to the next line so that fsck can continue in the
case of a mere warning.

Signed-off-by: Johannes Schindelin johannes.schinde...@gmx.de
---
 fsck.c | 49 +++--
 1 file changed, 27 insertions(+), 22 deletions(-)

diff --git a/fsck.c b/fsck.c
index 0ec02b2..d0a7282 100644
--- a/fsck.c
+++ b/fsck.c
@@ -481,40 +481,45 @@ static int require_end_of_header(const void *data, 
unsigned long size,
 
 static int fsck_ident(const char **ident, struct object *obj, struct 
fsck_options *options)
 {
+   const char *p = *ident;
char *end;
 
-   if (**ident == '')
+   *ident = strchrnul(*ident, '\n');
+   if (**ident == '\n')
+   (*ident)++;
+
+   if (*p == '')
return report(options, obj, FSCK_MSG_MISSING_NAME_BEFORE_EMAIL, 
invalid author/committer line - missing space before email);
-   *ident += strcspn(*ident, \n);
-   if (**ident == '')
+   p += strcspn(p, \n);
+   if (*p == '')
return report(options, obj, FSCK_MSG_BAD_NAME, invalid 
author/committer line - bad name);
-   if (**ident != '')
+   if (*p != '')
return report(options, obj, FSCK_MSG_MISSING_EMAIL, invalid 
author/committer line - missing email);
-   if ((*ident)[-1] != ' ')
+   if (p[-1] != ' ')
return report(options, obj, 
FSCK_MSG_MISSING_SPACE_BEFORE_EMAIL, invalid author/committer line - missing 
space before email);
-   (*ident)++;
-   *ident += strcspn(*ident, \n);
-   if (**ident != '')
+   p++;
+   p += strcspn(p, \n);
+   if (*p != '')
return report(options, obj, FSCK_MSG_BAD_EMAIL, invalid 
author/committer line - bad email);
-   (*ident)++;
-   if (**ident != ' ')
+   p++;
+   if (*p != ' ')
return report(options, obj, FSCK_MSG_MISSING_SPACE_BEFORE_DATE, 
invalid author/committer line - missing space before date);
-   (*ident)++;
-   if (**ident == '0'  (*ident)[1] != ' ')
+   p++;
+   if (*p == '0'  p[1] != ' ')
return report(options, obj, FSCK_MSG_ZERO_PADDED_DATE, invalid 
author/committer line - zero-padded date);
-   if (date_overflows(strtoul(*ident, end, 10)))
+   if (date_overflows(strtoul(p, end, 10)))
return report(options, obj, FSCK_MSG_BAD_DATE_OVERFLOW, 
invalid author/committer line - date causes integer overflow);
-   if (end == *ident || *end != ' ')
+   if ((end == p || *end != ' '))
return report(options, obj, FSCK_MSG_BAD_DATE, invalid 
author/committer line - bad date);
-   *ident = end + 1;
-   if ((**ident != '+'  **ident != '-') ||
-   !isdigit((*ident)[1]) ||
-   !isdigit((*ident)[2]) ||
-   !isdigit((*ident)[3]) ||
-   !isdigit((*ident)[4]) ||
-   ((*ident)[5] != '\n'))
+   p = end + 1;
+   if ((*p != '+'  *p != '-') ||
+   !isdigit(p[1]) ||
+   !isdigit(p[2]) ||
+   !isdigit(p[3]) ||
+   !isdigit(p[4]) ||
+   (p[5] != '\n'))
return report(options, obj, FSCK_MSG_BAD_TIMEZONE, invalid 
author/committer line - bad time zone);
-   (*ident) += 6;
+   p += 6;
return 0;
 }
 
-- 
2.3.1.windows.1.9.g8c01ab4



--
To unsubscribe from this list: send the line unsubscribe git in


[PATCH v7 10/19] fsck: Make fsck_tag() warn-friendly

2015-06-22 Thread Johannes Schindelin
When fsck_tag() identifies a problem with the commit, it should try
to make it possible to continue checking the commit object, in case the
user wants to demote the detected errors to mere warnings.

Just like fsck_commit(), there are certain problems that could hide other
issues with the same tag object. For example, if the 'type' line is not
encountered in the correct position, the 'tag' line – if there is any –
would not be handled at all.

Signed-off-by: Johannes Schindelin johannes.schinde...@gmx.de
---
 fsck.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/fsck.c b/fsck.c
index daa07ad..3f76f99 100644
--- a/fsck.c
+++ b/fsck.c
@@ -642,7 +642,8 @@ static int fsck_tag_buffer(struct tag *tag, const char 
*data,
}
if (get_sha1_hex(buffer, sha1) || buffer[40] != '\n') {
ret = report(options, tag-object, FSCK_MSG_BAD_OBJECT_SHA1, 
invalid 'object' line format - bad sha1);
-   goto done;
+   if (ret)
+   goto done;
}
buffer += 41;
 
-- 
2.3.1.windows.1.9.g8c01ab4



--
To unsubscribe from this list: send the line unsubscribe git in


[PATCH v7 15/19] fsck: Document the new receive.fsck.msg-id options

2015-06-22 Thread Johannes Schindelin
Signed-off-by: Johannes Schindelin johannes.schinde...@gmx.de
---
 Documentation/config.txt | 14 ++
 1 file changed, 14 insertions(+)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index 3e37b93..4e5fbea 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -2205,6 +2205,20 @@ receive.fsckObjects::
Defaults to false. If not set, the value of `transfer.fsckObjects`
is used instead.
 
+receive.fsck.msg-id::
+   When `receive.fsckObjects` is set to true, errors can be switched
+   to warnings and vice versa by configuring the `receive.fsck.msg-id`
+   setting where the `msg-id` is the fsck message ID and the value
+   is one of `error`, `warn` or `ignore`. For convenience, fsck prefixes
+   the error/warning with the message ID, e.g. missingEmail: invalid
+   author/committer line - missing email means that setting
+   `receive.fsck.missingEmail = ignore` will hide that issue.
++
+This feature is intended to support working with legacy repositories
+which would not pass pushing when `receive.fsckObjects = true`, allowing
+the host to accept repositories with certain known issues but still catch
+other issues.
+
 receive.unpackLimit::
If the number of objects received in a push is below this
limit then the objects will be unpacked into loose object
-- 
2.3.1.windows.1.9.g8c01ab4



--
To unsubscribe from this list: send the line unsubscribe git in


[PATCH v7 12/19] fsck: Disallow demoting grave fsck errors to warnings

2015-06-22 Thread Johannes Schindelin
Some kinds of errors are intrinsically unrecoverable (e.g. errors while
uncompressing objects). It does not make sense to allow demoting them to
mere warnings.

Signed-off-by: Johannes Schindelin johannes.schinde...@gmx.de
---
 fsck.c  | 13 +++--
 t/t5504-fetch-receive-strict.sh | 11 +++
 2 files changed, 22 insertions(+), 2 deletions(-)

diff --git a/fsck.c b/fsck.c
index 3f76f99..85535b1 100644
--- a/fsck.c
+++ b/fsck.c
@@ -9,7 +9,12 @@
 #include refs.h
 #include utf8.h
 
+#define FSCK_FATAL -1
+
 #define FOREACH_MSG_ID(FUNC) \
+   /* fatal errors */ \
+   FUNC(NUL_IN_HEADER, FATAL) \
+   FUNC(UNTERMINATED_HEADER, FATAL) \
/* errors */ \
FUNC(BAD_DATE, ERROR) \
FUNC(BAD_DATE_OVERFLOW, ERROR) \
@@ -39,11 +44,9 @@
FUNC(MISSING_TYPE, ERROR) \
FUNC(MISSING_TYPE_ENTRY, ERROR) \
FUNC(MULTIPLE_AUTHORS, ERROR) \
-   FUNC(NUL_IN_HEADER, ERROR) \
FUNC(TAG_OBJECT_NOT_TAG, ERROR) \
FUNC(TREE_NOT_SORTED, ERROR) \
FUNC(UNKNOWN_TYPE, ERROR) \
-   FUNC(UNTERMINATED_HEADER, ERROR) \
FUNC(ZERO_PADDED_DATE, ERROR) \
/* warnings */ \
FUNC(BAD_FILEMODE, WARN) \
@@ -149,6 +152,9 @@ void fsck_set_msg_type(struct fsck_options *options,
die(Unhandled message id: %s, msg_id);
type = parse_msg_type(msg_type);
 
+   if (type != FSCK_ERROR  msg_id_info[id].msg_type == FSCK_FATAL)
+   die(Cannot demote %s to %s, msg_id, msg_type);
+
if (!options-msg_type) {
int i;
int *msg_type = xmalloc(sizeof(int) * FSCK_MSG_MAX);
@@ -215,6 +221,9 @@ static int report(struct fsck_options *options, struct 
object *object,
struct strbuf sb = STRBUF_INIT;
int msg_type = fsck_msg_type(id, options), result;
 
+   if (msg_type == FSCK_FATAL)
+   msg_type = FSCK_ERROR;
+
append_msg_id(sb, msg_id_info[id].id_string);
 
va_start(ap, fmt);
diff --git a/t/t5504-fetch-receive-strict.sh b/t/t5504-fetch-receive-strict.sh
index 36024fc..f5d6d0d 100755
--- a/t/t5504-fetch-receive-strict.sh
+++ b/t/t5504-fetch-receive-strict.sh
@@ -136,4 +136,15 @@ test_expect_success 'push with 
receive.fsck.missingEmail=warn' '
grep missingEmail act
 '
 
+test_expect_success \
+   'receive.fsck.unterminatedHeader=warn triggers error' '
+   rm -rf dst 
+   git init dst 
+   git --git-dir=dst/.git config receive.fsckobjects true 
+   git --git-dir=dst/.git config \
+   receive.fsck.unterminatedheader warn 
+   test_must_fail git push --porcelain dst HEAD act 21 
+   grep Cannot demote unterminatedheader act
+'
+
 test_done
-- 
2.3.1.windows.1.9.g8c01ab4



--
To unsubscribe from this list: send the line unsubscribe git in


[PATCH v7 06/19] fsck: Report the ID of the error/warning

2015-06-22 Thread Johannes Schindelin
Some legacy code has objects with non-fatal fsck issues; To enable the
user to ignore those issues, let's print out the ID (e.g. when
encountering missingEmail, the user might want to call `git config
--add receive.fsck.missingEmail=warn`).

Signed-off-by: Johannes Schindelin johannes.schinde...@gmx.de
---
 fsck.c  | 20 
 t/t1450-fsck.sh |  4 ++--
 2 files changed, 22 insertions(+), 2 deletions(-)

diff --git a/fsck.c b/fsck.c
index 02af3ed..0ec02b2 100644
--- a/fsck.c
+++ b/fsck.c
@@ -188,6 +188,24 @@ void fsck_set_msg_types(struct fsck_options *options, 
const char *values)
free(to_free);
 }
 
+static void append_msg_id(struct strbuf *sb, const char *msg_id)
+{
+   for (;;) {
+   char c = *(msg_id)++;
+
+   if (!c)
+   break;
+   if (c != '_')
+   strbuf_addch(sb, tolower(c));
+   else {
+   assert(*msg_id);
+   strbuf_addch(sb, *(msg_id)++);
+   }
+   }
+
+   strbuf_addstr(sb, : );
+}
+
 __attribute__((format (printf, 4, 5)))
 static int report(struct fsck_options *options, struct object *object,
enum fsck_msg_id id, const char *fmt, ...)
@@ -196,6 +214,8 @@ static int report(struct fsck_options *options, struct 
object *object,
struct strbuf sb = STRBUF_INIT;
int msg_type = fsck_msg_type(id, options), result;
 
+   append_msg_id(sb, msg_id_info[id].id_string);
+
va_start(ap, fmt);
strbuf_vaddf(sb, fmt, ap);
result = options-error_func(object, msg_type, sb.buf);
diff --git a/t/t1450-fsck.sh b/t/t1450-fsck.sh
index cfb32b6..7c5b3d5 100755
--- a/t/t1450-fsck.sh
+++ b/t/t1450-fsck.sh
@@ -231,8 +231,8 @@ test_expect_success 'tag with incorrect tag name  missing 
tagger' '
git fsck --tags 2out 
 
cat expect -EOF 
-   warning in tag $tag: invalid '\''tag'\'' name: wrong name format
-   warning in tag $tag: invalid format - expected '\''tagger'\'' line
+   warning in tag $tag: badTagName: invalid '\''tag'\'' name: wrong name 
format
+   warning in tag $tag: missingTaggerEntry: invalid format - expected 
'\''tagger'\'' line
EOF
test_cmp expect out
 '
-- 
2.3.1.windows.1.9.g8c01ab4



--
To unsubscribe from this list: send the line unsubscribe git in


[PATCH v7 16/19] fsck: Support demoting errors to warnings

2015-06-22 Thread Johannes Schindelin
We already have support in `git receive-pack` to deal with some legacy
repositories which have non-fatal issues.

Let's make `git fsck` itself useful with such repositories, too, by
allowing users to ignore known issues, or at least demote those issues
to mere warnings.

Example: `git -c fsck.missingEmail=ignore fsck` would hide
problems with missing emails in author, committer and tagger lines.

In the same spirit that `git receive-pack`'s usage of the fsck machinery
differs from `git fsck`'s – some of the non-fatal warnings in `git fsck`
are fatal with `git receive-pack` when receive.fsckObjects = true, for
example – we strictly separate the fsck.msg-id from the
receive.fsck.msg-id settings.

Signed-off-by: Johannes Schindelin johannes.schinde...@gmx.de
---
 Documentation/config.txt | 11 +++
 builtin/fsck.c   | 12 
 t/t1450-fsck.sh  | 11 +++
 3 files changed, 34 insertions(+)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index 4e5fbea..bfccd2b 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -1250,6 +1250,17 @@ filter.driver.smudge::
object to a worktree file upon checkout.  See
linkgit:gitattributes[5] for details.
 
+fsck.msg-id::
+   Allows overriding the message type (error, warn or ignore) of a
+   specific message ID such as `missingEmail`.
++
+For convenience, fsck prefixes the error/warning with the message ID,
+e.g.  missingEmail: invalid author/committer line - missing email means
+that setting `fsck.missingEmail = ignore` will hide that issue.
++
+This feature is intended to support working with legacy repositories
+which cannot be repaired without disruptive changes.
+
 gc.aggressiveDepth::
The depth parameter used in the delta compression
algorithm used by 'git gc --aggressive'.  This defaults
diff --git a/builtin/fsck.c b/builtin/fsck.c
index fff38fe..adaa802 100644
--- a/builtin/fsck.c
+++ b/builtin/fsck.c
@@ -46,6 +46,16 @@ static int show_dangling = 1;
 #define DIRENT_SORT_HINT(de) ((de)-d_ino)
 #endif
 
+static int fsck_config(const char *var, const char *value, void *cb)
+{
+   if (skip_prefix(var, fsck., var)) {
+   fsck_set_msg_type(fsck_obj_options, var, value);
+   return 0;
+   }
+
+   return git_default_config(var, value, cb);
+}
+
 static void objreport(struct object *obj, const char *msg_type,
const char *err)
 {
@@ -646,6 +656,8 @@ int cmd_fsck(int argc, const char **argv, const char 
*prefix)
include_reflogs = 0;
}
 
+   git_config(fsck_config, NULL);
+
fsck_head_link();
fsck_object_dir(get_object_directory());
 
diff --git a/t/t1450-fsck.sh b/t/t1450-fsck.sh
index 7c5b3d5..1727129 100755
--- a/t/t1450-fsck.sh
+++ b/t/t1450-fsck.sh
@@ -287,6 +287,17 @@ test_expect_success 'rev-list --verify-objects with bad 
sha1' '
grep -q error: sha1 mismatch 63ff 
out
 '
 
+test_expect_success 'force fsck to ignore double author' '
+   git cat-file commit HEAD basis 
+   sed s/^author .*/,/ basis | tr , \\n multiple-authors 
+   new=$(git hash-object -t commit -w --stdin multiple-authors) 
+   test_when_finished remove_object $new 
+   git update-ref refs/heads/bogus $new 
+   test_when_finished git update-ref -d refs/heads/bogus 
+   test_must_fail git fsck 
+   git -c fsck.multipleAuthors=ignore fsck
+'
+
 _bz='\0'
 _bz5=$_bz$_bz$_bz$_bz$_bz
 _bz20=$_bz5$_bz5$_bz5$_bz5
-- 
2.3.1.windows.1.9.g8c01ab4



--
To unsubscribe from this list: send the line unsubscribe git in


Re: [PATCH v7 04/19] fsck: Offer a function to demote fsck errors to warnings

2015-06-22 Thread Junio C Hamano
Johannes Schindelin johannes.schinde...@gmx.de writes:

 diff --git a/fsck.c b/fsck.c
 index 1a3f7ce..e81a342 100644
 --- a/fsck.c
 +++ b/fsck.c
 @@ -64,30 +64,29 @@ enum fsck_msg_id {
  #undef MSG_ID
  
  #define STR(x) #x
 -#define MSG_ID(id, msg_type) { STR(id), FSCK_##msg_type },
 +#define MSG_ID(id, msg_type) { STR(id), NULL, FSCK_##msg_type },
  static struct {
   const char *id_string;
 + const char *lowercased;
   int msg_type;
  } msg_id_info[FSCK_MSG_MAX + 1] = {
   FOREACH_MSG_ID(MSG_ID)
 - { NULL, -1 }
 + { NULL, NULL, -1 }
  };
  #undef MSG_ID
  
  static int parse_msg_id(const char *text)
  {
 - static char **lowercased;
   int i;
  
 - if (!lowercased) {
 + if (!msg_id_info[0].lowercased) {
   /* convert id_string to lower case, without underscores. */
 - lowercased = xmalloc(FSCK_MSG_MAX * sizeof(*lowercased));
   for (i = 0; i  FSCK_MSG_MAX; i++) {
   const char *p = msg_id_info[i].id_string;
   int len = strlen(p);
   char *q = xmalloc(len);
  
 - lowercased[i] = q;
 + msg_id_info[i].lowercased = q;
   while (*p)
   if (*p == '_')
   p++;
 @@ -98,7 +97,7 @@ static int parse_msg_id(const char *text)
   }
  
   for (i = 0; i  FSCK_MSG_MAX; i++)
 - if (!strcmp(text, lowercased[i]))
 + if (!strcmp(text, msg_id_info[i].lowercased))
   return i;
  
   return -1;

Heh, this was the first thing that came to my mind when I saw 03/19
that lazily prepares downcased version (which is good) but do so in
a separately allocated buffer (which is improved by this change) ;-)

IOW, I think all of the above should have been part of 03/19, not
oops I belatedly realized that this way is better fixup here.

The end result looks good, so let's keep reading.

 +void fsck_set_msg_types(struct fsck_options *options, const char *values)
 +{
 + char *buf = xstrdup(values), *to_free = buf;
 + int done = 0;
 +
 + while (!done) {
 + int len = strcspn(buf,  ,|), equal;
 +
 + done = !buf[len];
 + if (!len) {
 + buf++;
 + continue;
 + }
 + buf[len] = '\0';
 +
 + for (equal = 0; equal  len 
 + buf[equal] != '='  buf[equal] != ':'; equal++)

Style.  I'd format this more like so:

for (equal = 0;
 equal  len  buf[equal] != '='  buf[equal] != ':';
 equal++)

 + buf[equal] = tolower(buf[equal]);
 + buf[equal] = '\0';
 +
 + if (equal == len)
 + die(Missing '=': '%s', buf);
 +
 + fsck_set_msg_type(options, buf, buf + equal + 1);
 + buf += len + 1;
 + }
 + free(to_free);
 +}

Overall, the change is good (and it was good in v6, too), and I
think it has become simpler to follow the logic with the upfront
downcasing.

Thanks.


--
To unsubscribe from this list: send the line unsubscribe git in


[PATCH v7 04/19] fsck: Offer a function to demote fsck errors to warnings

2015-06-22 Thread Johannes Schindelin
There are legacy repositories out there whose older commits and tags
have issues that prevent pushing them when 'receive.fsckObjects' is set.
One real-life example is a commit object that has been hand-crafted to
list two authors.

Often, it is not possible to fix those issues without disrupting the
work with said repositories, yet it is still desirable to perform checks
by setting `receive.fsckObjects = true`. This commit is the first step
to allow demoting specific fsck issues to mere warnings.

The `fsck_set_msg_types()` function added by this commit parses a list
of settings in the form:

missingemail=warn,badname=warn,...

Unfortunately, the FSCK_WARN/FSCK_ERROR flag is only really heeded by
git fsck so far, but other call paths (e.g. git index-pack --strict)
error out *always* no matter what type was specified. Therefore, we need
to take extra care to set all message types to FSCK_ERROR by default in
those cases.

Signed-off-by: Johannes Schindelin johannes.schinde...@gmx.de
---
 fsck.c | 88 ++
 fsck.h |  9 +--
 2 files changed, 85 insertions(+), 12 deletions(-)

diff --git a/fsck.c b/fsck.c
index 1a3f7ce..e81a342 100644
--- a/fsck.c
+++ b/fsck.c
@@ -64,30 +64,29 @@ enum fsck_msg_id {
 #undef MSG_ID
 
 #define STR(x) #x
-#define MSG_ID(id, msg_type) { STR(id), FSCK_##msg_type },
+#define MSG_ID(id, msg_type) { STR(id), NULL, FSCK_##msg_type },
 static struct {
const char *id_string;
+   const char *lowercased;
int msg_type;
 } msg_id_info[FSCK_MSG_MAX + 1] = {
FOREACH_MSG_ID(MSG_ID)
-   { NULL, -1 }
+   { NULL, NULL, -1 }
 };
 #undef MSG_ID
 
 static int parse_msg_id(const char *text)
 {
-   static char **lowercased;
int i;
 
-   if (!lowercased) {
+   if (!msg_id_info[0].lowercased) {
/* convert id_string to lower case, without underscores. */
-   lowercased = xmalloc(FSCK_MSG_MAX * sizeof(*lowercased));
for (i = 0; i  FSCK_MSG_MAX; i++) {
const char *p = msg_id_info[i].id_string;
int len = strlen(p);
char *q = xmalloc(len);
 
-   lowercased[i] = q;
+   msg_id_info[i].lowercased = q;
while (*p)
if (*p == '_')
p++;
@@ -98,7 +97,7 @@ static int parse_msg_id(const char *text)
}
 
for (i = 0; i  FSCK_MSG_MAX; i++)
-   if (!strcmp(text, lowercased[i]))
+   if (!strcmp(text, msg_id_info[i].lowercased))
return i;
 
return -1;
@@ -109,13 +108,78 @@ static int fsck_msg_type(enum fsck_msg_id msg_id,
 {
int msg_type;
 
-   msg_type = msg_id_info[msg_id].msg_type;
-   if (options-strict  msg_type == FSCK_WARN)
-   msg_type = FSCK_ERROR;
+   assert(msg_id = 0  msg_id  FSCK_MSG_MAX);
+
+   if (options-msg_type)
+   msg_type = options-msg_type[msg_id];
+   else {
+   msg_type = msg_id_info[msg_id].msg_type;
+   if (options-strict  msg_type == FSCK_WARN)
+   msg_type = FSCK_ERROR;
+   }
 
return msg_type;
 }
 
+static int parse_msg_type(const char *str)
+{
+   if (!strcmp(str, error))
+   return FSCK_ERROR;
+   else if (!strcmp(str, warn))
+   return FSCK_WARN;
+   else
+   die(Unknown fsck message type: '%s', str);
+}
+
+void fsck_set_msg_type(struct fsck_options *options,
+   const char *msg_id, const char *msg_type)
+{
+   int id = parse_msg_id(msg_id), type;
+
+   if (id  0)
+   die(Unhandled message id: %s, msg_id);
+   type = parse_msg_type(msg_type);
+
+   if (!options-msg_type) {
+   int i;
+   int *msg_type = xmalloc(sizeof(int) * FSCK_MSG_MAX);
+   for (i = 0; i  FSCK_MSG_MAX; i++)
+   msg_type[i] = fsck_msg_type(i, options);
+   options-msg_type = msg_type;
+   }
+
+   options-msg_type[id] = type;
+}
+
+void fsck_set_msg_types(struct fsck_options *options, const char *values)
+{
+   char *buf = xstrdup(values), *to_free = buf;
+   int done = 0;
+
+   while (!done) {
+   int len = strcspn(buf,  ,|), equal;
+
+   done = !buf[len];
+   if (!len) {
+   buf++;
+   continue;
+   }
+   buf[len] = '\0';
+
+   for (equal = 0; equal  len 
+   buf[equal] != '='  buf[equal] != ':'; equal++)
+   buf[equal] = tolower(buf[equal]);
+   buf[equal] = '\0';
+
+   if (equal == len)
+   die(Missing '=': '%s', buf);
+
+   fsck_set_msg_type(options, buf, buf + equal + 1);
+   

[PATCH v7 00/19] Introduce an internal API to interact with the fsck machinery

2015-06-22 Thread Johannes Schindelin
At the moment, the git-fsck's integrity checks are targeted toward the
end user, i.e. the error messages are really just messages, intended for
human consumption.

Under certain circumstances, some of those errors should be allowed to
be turned into mere warnings, though, because the cost of fixing the
issues might well be larger than the cost of carrying those flawed
objects. For example, when an already-public repository contains a
commit object with two authors for years, it does not make sense to
force the maintainer to rewrite the history, affecting all contributors
negatively by forcing them to update.

This branch introduces an internal fsck API to be able to turn some of
the errors into warnings, and to make it easier to call the fsck
machinery from elsewhere in general.

I am proud to report that this work has been sponsored by GitHub.

Changes since v6:

- camelCased message IDs

- multiple author checking now as suggested by Junio

- renamed `--quick` to `--connectivity-only`, better commit message

- `fsck.skipList` is now handled correctly (and not mistaken for a message
  type setting)

- `fsck.skipList` can handle user paths now

- index-pack configures the walk function in a more logical place now

- simplified code by avoiding working on partial strings (i.e. removed
  `substrcmp()`). This saves 10 lines. To accomodate parsing config
  variables directly, we now work on lowercased message IDs; unfortunately
  this means that we cannot use them in append_msg_id() because that
  function wants to append camelCased message IDs.

Interdiff below diffstat.

Johannes Schindelin (19):
  fsck: Introduce fsck options
  fsck: Introduce identifiers for fsck messages
  fsck: Provide a function to parse fsck message IDs
  fsck: Offer a function to demote fsck errors to warnings
  fsck (receive-pack): Allow demoting errors to warnings
  fsck: Report the ID of the error/warning
  fsck: Make fsck_ident() warn-friendly
  fsck: Make fsck_commit() warn-friendly
  fsck: Handle multiple authors in commits specially
  fsck: Make fsck_tag() warn-friendly
  fsck: Add a simple test for receive.fsck.msg-id
  fsck: Disallow demoting grave fsck errors to warnings
  fsck: Optionally ignore specific fsck issues completely
  fsck: Allow upgrading fsck warnings to errors
  fsck: Document the new receive.fsck.msg-id options
  fsck: Support demoting errors to warnings
  fsck: Introduce `git fsck --connectivity-only`
  fsck: git receive-pack: support excluding objects from fsck'ing
  fsck: support ignoring objects in `git fsck` via fsck.skiplist

 Documentation/config.txt|  41 +++
 Documentation/git-fsck.txt  |   7 +-
 builtin/fsck.c  |  78 --
 builtin/index-pack.c|  13 +-
 builtin/receive-pack.c  |  28 +-
 builtin/unpack-objects.c|  16 +-
 fsck.c  | 554 +++-
 fsck.h  |  30 ++-
 t/t1450-fsck.sh |  37 ++-
 t/t5302-pack-index.sh   |   2 +-
 t/t5504-fetch-receive-strict.sh |  51 
 11 files changed, 692 insertions(+), 165 deletions(-)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index 5aba63a..69dda93 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -1252,11 +1252,11 @@ filter.driver.smudge::
 
 fsck.msg-id::
Allows overriding the message type (error, warn or ignore) of a
-   specific message ID such as `missingemail`.
+   specific message ID such as `missingEmail`.
 +
 For convenience, fsck prefixes the error/warning with the message ID,
-e.g.  missingemail: invalid author/committer line - missing email means
-that setting `fsck.missingemail = ignore` will hide that issue.
+e.g.  missingEmail: invalid author/committer line - missing email means
+that setting `fsck.missingEmail = ignore` will hide that issue.
 +
 This feature is intended to support working with legacy repositories
 which cannot be repaired without disruptive changes.
@@ -1267,6 +1267,7 @@ fsck.skipList::
be ignored. This feature is useful when an established project
should be accepted despite early commits containing errors that
can be safely ignored such as invalid committer email addresses.
+   Note: corrupt objects cannot be skipped with this setting.
 
 gc.aggressiveDepth::
The depth parameter used in the delta compression
@@ -2228,9 +2229,9 @@ receive.fsck.msg-id::
to warnings and vice versa by configuring the `receive.fsck.msg-id`
setting where the `msg-id` is the fsck message ID and the value
is one of `error`, `warn` or `ignore`. For convenience, fsck prefixes
-   the error/warning with the message ID, e.g. missingemail: invalid
+   the error/warning with the message ID, e.g. missingEmail: invalid
author/committer line - missing email means that setting
-   `receive.fsck.missingemail = ignore` will hide that issue.
+   

[PATCH v7 03/19] fsck: Provide a function to parse fsck message IDs

2015-06-22 Thread Johannes Schindelin
These functions will be used in the next commits to allow the user to
ask fsck to handle specific problems differently, e.g. demoting certain
errors to warnings. The upcoming `fsck_set_msg_types()` function has to
handle partial strings because we would like to be able to parse, say,
'missingemail=warn,missingtaggerentry=warn' command line parameters
(which will be passed by receive-pack to index-pack and unpack-objects).

To make the parsing robust, we generate strings from the enum keys, and
using these keys, we match up strings without dashes case-insensitively
to the corresponding enum values.

Signed-off-by: Johannes Schindelin johannes.schinde...@gmx.de
---
 fsck.c | 36 ++--
 1 file changed, 34 insertions(+), 2 deletions(-)

diff --git a/fsck.c b/fsck.c
index ab24618..1a3f7ce 100644
--- a/fsck.c
+++ b/fsck.c
@@ -63,15 +63,47 @@ enum fsck_msg_id {
 };
 #undef MSG_ID
 
-#define MSG_ID(id, msg_type) { FSCK_##msg_type },
+#define STR(x) #x
+#define MSG_ID(id, msg_type) { STR(id), FSCK_##msg_type },
 static struct {
+   const char *id_string;
int msg_type;
 } msg_id_info[FSCK_MSG_MAX + 1] = {
FOREACH_MSG_ID(MSG_ID)
-   { -1 }
+   { NULL, -1 }
 };
 #undef MSG_ID
 
+static int parse_msg_id(const char *text)
+{
+   static char **lowercased;
+   int i;
+
+   if (!lowercased) {
+   /* convert id_string to lower case, without underscores. */
+   lowercased = xmalloc(FSCK_MSG_MAX * sizeof(*lowercased));
+   for (i = 0; i  FSCK_MSG_MAX; i++) {
+   const char *p = msg_id_info[i].id_string;
+   int len = strlen(p);
+   char *q = xmalloc(len);
+
+   lowercased[i] = q;
+   while (*p)
+   if (*p == '_')
+   p++;
+   else
+   *(q)++ = tolower(*(p)++);
+   *q = '\0';
+   }
+   }
+
+   for (i = 0; i  FSCK_MSG_MAX; i++)
+   if (!strcmp(text, lowercased[i]))
+   return i;
+
+   return -1;
+}
+
 static int fsck_msg_type(enum fsck_msg_id msg_id,
struct fsck_options *options)
 {
-- 
2.3.1.windows.1.9.g8c01ab4



--
To unsubscribe from this list: send the line unsubscribe git in


[PATCH v7 05/19] fsck (receive-pack): Allow demoting errors to warnings

2015-06-22 Thread Johannes Schindelin
For example, missing emails in commit and tag objects can be demoted to
mere warnings with

git config receive.fsck.missingemail=warn

The value is actually a comma-separated list.

In case that the same key is listed in multiple receive.fsck.msg-id
lines in the config, the latter configuration wins (this can happen for
example when both $HOME/.gitconfig and .git/config contain message type
settings).

As git receive-pack does not actually perform the checks, it hands off
the setting to index-pack or unpack-objects in the form of an optional
argument to the --strict option.

Signed-off-by: Johannes Schindelin johannes.schinde...@gmx.de
---
 builtin/index-pack.c |  4 
 builtin/receive-pack.c   | 17 +++--
 builtin/unpack-objects.c |  5 +
 fsck.c   |  8 
 fsck.h   |  1 +
 5 files changed, 33 insertions(+), 2 deletions(-)

diff --git a/builtin/index-pack.c b/builtin/index-pack.c
index f8b0c64..f0d283b 100644
--- a/builtin/index-pack.c
+++ b/builtin/index-pack.c
@@ -1633,6 +1633,10 @@ int cmd_index_pack(int argc, const char **argv, const 
char *prefix)
} else if (!strcmp(arg, --strict)) {
strict = 1;
do_fsck_object = 1;
+   } else if (skip_prefix(arg, --strict=, arg)) {
+   strict = 1;
+   do_fsck_object = 1;
+   fsck_set_msg_types(fsck_options, arg);
} else if (!strcmp(arg, 
--check-self-contained-and-connected)) {
strict = 1;
check_self_contained_and_connected = 1;
diff --git a/builtin/receive-pack.c b/builtin/receive-pack.c
index 94d0571..3afe8f8 100644
--- a/builtin/receive-pack.c
+++ b/builtin/receive-pack.c
@@ -19,6 +19,7 @@
 #include tag.h
 #include gpg-interface.h
 #include sigchain.h
+#include fsck.h
 
 static const char receive_pack_usage[] = git receive-pack git-dir;
 
@@ -36,6 +37,7 @@ static enum deny_action deny_current_branch = 
DENY_UNCONFIGURED;
 static enum deny_action deny_delete_current = DENY_UNCONFIGURED;
 static int receive_fsck_objects = -1;
 static int transfer_fsck_objects = -1;
+static struct strbuf fsck_msg_types = STRBUF_INIT;
 static int receive_unpack_limit = -1;
 static int transfer_unpack_limit = -1;
 static int advertise_atomic_push = 1;
@@ -115,6 +117,15 @@ static int receive_pack_config(const char *var, const char 
*value, void *cb)
return 0;
}
 
+   if (skip_prefix(var, receive.fsck., var)) {
+   if (is_valid_msg_type(var, value))
+   strbuf_addf(fsck_msg_types, %c%s=%s,
+   fsck_msg_types.len ? ',' : '=', var, value);
+   else
+   warning(Skipping unknown msg id '%s', var);
+   return 0;
+   }
+
if (strcmp(var, receive.fsckobjects) == 0) {
receive_fsck_objects = git_config_bool(var, value);
return 0;
@@ -1490,7 +1501,8 @@ static const char *unpack(int err_fd, struct shallow_info 
*si)
if (quiet)
argv_array_push(child.args, -q);
if (fsck_objects)
-   argv_array_push(child.args, --strict);
+   argv_array_pushf(child.args, --strict%s,
+   fsck_msg_types.buf);
child.no_stdout = 1;
child.err = err_fd;
child.git_cmd = 1;
@@ -1508,7 +1520,8 @@ static const char *unpack(int err_fd, struct shallow_info 
*si)
argv_array_pushl(child.args, index-pack,
 --stdin, hdr_arg, keep_arg, NULL);
if (fsck_objects)
-   argv_array_push(child.args, --strict);
+   argv_array_pushf(child.args, --strict%s,
+   fsck_msg_types.buf);
if (fix_thin)
argv_array_push(child.args, --fix-thin);
child.out = -1;
diff --git a/builtin/unpack-objects.c b/builtin/unpack-objects.c
index 6d17040..7cc086f 100644
--- a/builtin/unpack-objects.c
+++ b/builtin/unpack-objects.c
@@ -530,6 +530,11 @@ int cmd_unpack_objects(int argc, const char **argv, const 
char *prefix)
strict = 1;
continue;
}
+   if (skip_prefix(arg, --strict=, arg)) {
+   strict = 1;
+   fsck_set_msg_types(fsck_options, arg);
+   continue;
+   }
if (starts_with(arg, --pack_header=)) {
struct pack_header *hdr;
char *c;
diff --git a/fsck.c b/fsck.c
index e81a342..02af3ed 100644
--- a/fsck.c

[PATCH v7 01/19] fsck: Introduce fsck options

2015-06-22 Thread Johannes Schindelin
Just like the diff machinery, we are about to introduce more settings,
therefore it makes sense to carry them around as a (pointer to a) struct
containing all of them.

Signed-off-by: Johannes Schindelin johannes.schinde...@gmx.de
---
 builtin/fsck.c   |  20 +--
 builtin/index-pack.c |   9 +--
 builtin/unpack-objects.c |  11 ++--
 fsck.c   | 150 +++
 fsck.h   |  17 +-
 5 files changed, 114 insertions(+), 93 deletions(-)

diff --git a/builtin/fsck.c b/builtin/fsck.c
index 2679793..981dca5 100644
--- a/builtin/fsck.c
+++ b/builtin/fsck.c
@@ -25,6 +25,8 @@ static int include_reflogs = 1;
 static int check_full = 1;
 static int check_strict;
 static int keep_cache_objects;
+static struct fsck_options fsck_walk_options = FSCK_OPTIONS_DEFAULT;
+static struct fsck_options fsck_obj_options = FSCK_OPTIONS_DEFAULT;
 static struct object_id head_oid;
 static const char *head_points_at;
 static int errors_found;
@@ -76,7 +78,7 @@ static int fsck_error_func(struct object *obj, int type, 
const char *err, ...)
 
 static struct object_array pending;
 
-static int mark_object(struct object *obj, int type, void *data)
+static int mark_object(struct object *obj, int type, void *data, struct 
fsck_options *options)
 {
struct object *parent = data;
 
@@ -119,7 +121,7 @@ static int mark_object(struct object *obj, int type, void 
*data)
 
 static void mark_object_reachable(struct object *obj)
 {
-   mark_object(obj, OBJ_ANY, NULL);
+   mark_object(obj, OBJ_ANY, NULL, NULL);
 }
 
 static int traverse_one_object(struct object *obj)
@@ -132,7 +134,7 @@ static int traverse_one_object(struct object *obj)
if (parse_tree(tree)  0)
return 1; /* error already displayed */
}
-   result = fsck_walk(obj, mark_object, obj);
+   result = fsck_walk(obj, obj, fsck_walk_options);
if (tree)
free_tree_buffer(tree);
return result;
@@ -158,7 +160,7 @@ static int traverse_reachable(void)
return !!result;
 }
 
-static int mark_used(struct object *obj, int type, void *data)
+static int mark_used(struct object *obj, int type, void *data, struct 
fsck_options *options)
 {
if (!obj)
return 1;
@@ -296,9 +298,9 @@ static int fsck_obj(struct object *obj)
fprintf(stderr, Checking %s %s\n,
typename(obj-type), sha1_to_hex(obj-sha1));
 
-   if (fsck_walk(obj, mark_used, NULL))
+   if (fsck_walk(obj, NULL, fsck_obj_options))
objerror(obj, broken links);
-   if (fsck_object(obj, NULL, 0, check_strict, fsck_error_func))
+   if (fsck_object(obj, NULL, 0, fsck_obj_options))
return -1;
 
if (obj-type == OBJ_TREE) {
@@ -638,6 +640,12 @@ int cmd_fsck(int argc, const char **argv, const char 
*prefix)
 
argc = parse_options(argc, argv, prefix, fsck_opts, fsck_usage, 0);
 
+   fsck_walk_options.walk = mark_object;
+   fsck_obj_options.walk = mark_used;
+   fsck_obj_options.error_func = fsck_error_func;
+   if (check_strict)
+   fsck_obj_options.strict = 1;
+
if (show_progress == -1)
show_progress = isatty(2);
if (verbose)
diff --git a/builtin/index-pack.c b/builtin/index-pack.c
index 48fa472..f8b0c64 100644
--- a/builtin/index-pack.c
+++ b/builtin/index-pack.c
@@ -75,6 +75,7 @@ static int nr_threads;
 static int from_stdin;
 static int strict;
 static int do_fsck_object;
+static struct fsck_options fsck_options = FSCK_OPTIONS_STRICT;
 static int verbose;
 static int show_stat;
 static int check_self_contained_and_connected;
@@ -192,7 +193,7 @@ static void cleanup_thread(void)
 #endif
 
 
-static int mark_link(struct object *obj, int type, void *data)
+static int mark_link(struct object *obj, int type, void *data, struct 
fsck_options *options)
 {
if (!obj)
return -1;
@@ -838,10 +839,9 @@ static void sha1_object(const void *data, struct 
object_entry *obj_entry,
if (!obj)
die(_(invalid %s), typename(type));
if (do_fsck_object 
-   fsck_object(obj, buf, size, 1,
-   fsck_error_function))
+   fsck_object(obj, buf, size, fsck_options))
die(_(Error in object));
-   if (fsck_walk(obj, mark_link, NULL))
+   if (fsck_walk(obj, NULL, fsck_options))
die(_(Not all child objects of %s are 
reachable), sha1_to_hex(obj-sha1));
 
if (obj-type == OBJ_TREE) {
@@ -1615,6 +1615,7 @@ int cmd_index_pack(int argc, const char **argv, const 
char *prefix)
usage(index_pack_usage);
 
check_replace_refs = 0;
+   fsck_options.walk = mark_link;
 

[PATCH v7 02/19] fsck: Introduce identifiers for fsck messages

2015-06-22 Thread Johannes Schindelin
Instead of specifying whether a message by the fsck machinery constitutes
an error or a warning, let's specify an identifier relating to the
concrete problem that was encountered. This is necessary for upcoming
support to be able to demote certain errors to warnings.

In the process, simplify the requirements on the calling code: instead of
having to handle full-blown varargs in every callback, we now send a
string buffer ready to be used by the callback.

We could use a simple enum for the message IDs here, but we want to
guarantee that the enum values are associated with the appropriate
message types (i.e. error or warning?). Besides, we want to introduce a
parser in the next commit that maps the string representation to the
enum value, hence we use the slightly ugly preprocessor construct that
is extensible for use with said parser.

Signed-off-by: Johannes Schindelin johannes.schinde...@gmx.de
---
 builtin/fsck.c |  26 +++-
 fsck.c | 201 +
 fsck.h |   5 +-
 3 files changed, 154 insertions(+), 78 deletions(-)

diff --git a/builtin/fsck.c b/builtin/fsck.c
index 981dca5..fff38fe 100644
--- a/builtin/fsck.c
+++ b/builtin/fsck.c
@@ -46,33 +46,23 @@ static int show_dangling = 1;
 #define DIRENT_SORT_HINT(de) ((de)-d_ino)
 #endif
 
-static void objreport(struct object *obj, const char *severity,
-  const char *err, va_list params)
+static void objreport(struct object *obj, const char *msg_type,
+   const char *err)
 {
-   fprintf(stderr, %s in %s %s: ,
-   severity, typename(obj-type), sha1_to_hex(obj-sha1));
-   vfprintf(stderr, err, params);
-   fputs(\n, stderr);
+   fprintf(stderr, %s in %s %s: %s\n,
+   msg_type, typename(obj-type), sha1_to_hex(obj-sha1), err);
 }
 
-__attribute__((format (printf, 2, 3)))
-static int objerror(struct object *obj, const char *err, ...)
+static int objerror(struct object *obj, const char *err)
 {
-   va_list params;
-   va_start(params, err);
errors_found |= ERROR_OBJECT;
-   objreport(obj, error, err, params);
-   va_end(params);
+   objreport(obj, error, err);
return -1;
 }
 
-__attribute__((format (printf, 3, 4)))
-static int fsck_error_func(struct object *obj, int type, const char *err, ...)
+static int fsck_error_func(struct object *obj, int type, const char *message)
 {
-   va_list params;
-   va_start(params, err);
-   objreport(obj, (type == FSCK_WARN) ? warning : error, err, params);
-   va_end(params);
+   objreport(obj, (type == FSCK_WARN) ? warning : error, message);
return (type == FSCK_WARN) ? 0 : 1;
 }
 
diff --git a/fsck.c b/fsck.c
index d83b811..ab24618 100644
--- a/fsck.c
+++ b/fsck.c
@@ -9,6 +9,98 @@
 #include refs.h
 #include utf8.h
 
+#define FOREACH_MSG_ID(FUNC) \
+   /* errors */ \
+   FUNC(BAD_DATE, ERROR) \
+   FUNC(BAD_DATE_OVERFLOW, ERROR) \
+   FUNC(BAD_EMAIL, ERROR) \
+   FUNC(BAD_NAME, ERROR) \
+   FUNC(BAD_OBJECT_SHA1, ERROR) \
+   FUNC(BAD_PARENT_SHA1, ERROR) \
+   FUNC(BAD_TAG_OBJECT, ERROR) \
+   FUNC(BAD_TIMEZONE, ERROR) \
+   FUNC(BAD_TREE, ERROR) \
+   FUNC(BAD_TREE_SHA1, ERROR) \
+   FUNC(BAD_TYPE, ERROR) \
+   FUNC(DUPLICATE_ENTRIES, ERROR) \
+   FUNC(MISSING_AUTHOR, ERROR) \
+   FUNC(MISSING_COMMITTER, ERROR) \
+   FUNC(MISSING_EMAIL, ERROR) \
+   FUNC(MISSING_GRAFT, ERROR) \
+   FUNC(MISSING_NAME_BEFORE_EMAIL, ERROR) \
+   FUNC(MISSING_OBJECT, ERROR) \
+   FUNC(MISSING_PARENT, ERROR) \
+   FUNC(MISSING_SPACE_BEFORE_DATE, ERROR) \
+   FUNC(MISSING_SPACE_BEFORE_EMAIL, ERROR) \
+   FUNC(MISSING_TAG, ERROR) \
+   FUNC(MISSING_TAG_ENTRY, ERROR) \
+   FUNC(MISSING_TAG_OBJECT, ERROR) \
+   FUNC(MISSING_TREE, ERROR) \
+   FUNC(MISSING_TYPE, ERROR) \
+   FUNC(MISSING_TYPE_ENTRY, ERROR) \
+   FUNC(NUL_IN_HEADER, ERROR) \
+   FUNC(TAG_OBJECT_NOT_TAG, ERROR) \
+   FUNC(TREE_NOT_SORTED, ERROR) \
+   FUNC(UNKNOWN_TYPE, ERROR) \
+   FUNC(UNTERMINATED_HEADER, ERROR) \
+   FUNC(ZERO_PADDED_DATE, ERROR) \
+   /* warnings */ \
+   FUNC(BAD_FILEMODE, WARN) \
+   FUNC(BAD_TAG_NAME, WARN) \
+   FUNC(EMPTY_NAME, WARN) \
+   FUNC(FULL_PATHNAME, WARN) \
+   FUNC(HAS_DOT, WARN) \
+   FUNC(HAS_DOTDOT, WARN) \
+   FUNC(HAS_DOTGIT, WARN) \
+   FUNC(MISSING_TAGGER_ENTRY, WARN) \
+   FUNC(NULL_SHA1, WARN) \
+   FUNC(ZERO_PADDED_FILEMODE, WARN)
+
+#define MSG_ID(id, msg_type) FSCK_MSG_##id,
+enum fsck_msg_id {
+   FOREACH_MSG_ID(MSG_ID)
+   FSCK_MSG_MAX
+};
+#undef MSG_ID
+
+#define MSG_ID(id, msg_type) { FSCK_##msg_type },
+static struct {
+   int msg_type;
+} msg_id_info[FSCK_MSG_MAX + 1] = {
+   FOREACH_MSG_ID(MSG_ID)
+   { -1 }
+};
+#undef MSG_ID
+
+static int fsck_msg_type(enum fsck_msg_id msg_id,
+   struct fsck_options *options)
+{
+   

Re: [PATCH v4 05/19] ref-filter: add parse_opt_merge_filter()

2015-06-22 Thread Junio C Hamano
Matthieu Moy matthieu@grenoble-inp.fr writes:

 Junio C Hamano gits...@pobox.com writes:

 This is copied from 'builtin/branch.c' which will eventually be removed
 when we port 'branch.c' to use ref-filter APIs.

 Hmph. I somehow thought Matthieu's instruction was to finish tag.c
 side first

 I would call in advice rather than instruction. 

Hmph, the choice of the word came from my understanding that a
mentor is like a teacher, but advice is fine by me ;-)

 I still think we
 should prioritize the tag.c side, but if this patch is ready, it makes
 sense to keep it in the series.

OK.
--
To unsubscribe from this list: send the line unsubscribe git in


[PATCH v7 13/19] fsck: Optionally ignore specific fsck issues completely

2015-06-22 Thread Johannes Schindelin
An fsck issue in a legacy repository might be so common that one would
like not to bother the user with mentioning it at all. With this change,
that is possible by setting the respective message type to ignore.

This change abuses the missingEmail=warn test to verify that ignore
is also accepted and works correctly. And while at it, it makes sure
that multiple options work, too (they are passed to unpack-objects or
index-pack as a comma-separated list via the --strict=... command-line
option).

Signed-off-by: Johannes Schindelin johannes.schinde...@gmx.de
---
 fsck.c  | 5 +
 fsck.h  | 1 +
 t/t5504-fetch-receive-strict.sh | 9 -
 3 files changed, 14 insertions(+), 1 deletion(-)

diff --git a/fsck.c b/fsck.c
index 85535b1..cbfff1f 100644
--- a/fsck.c
+++ b/fsck.c
@@ -131,6 +131,8 @@ static int parse_msg_type(const char *str)
return FSCK_ERROR;
else if (!strcmp(str, warn))
return FSCK_WARN;
+   else if (!strcmp(str, ignore))
+   return FSCK_IGNORE;
else
die(Unknown fsck message type: '%s', str);
 }
@@ -221,6 +223,9 @@ static int report(struct fsck_options *options, struct 
object *object,
struct strbuf sb = STRBUF_INIT;
int msg_type = fsck_msg_type(id, options), result;
 
+   if (msg_type == FSCK_IGNORE)
+   return 0;
+
if (msg_type == FSCK_FATAL)
msg_type = FSCK_ERROR;
 
diff --git a/fsck.h b/fsck.h
index 3ef92a3..1dab276 100644
--- a/fsck.h
+++ b/fsck.h
@@ -3,6 +3,7 @@
 
 #define FSCK_ERROR 1
 #define FSCK_WARN 2
+#define FSCK_IGNORE 3
 
 struct fsck_options;
 
diff --git a/t/t5504-fetch-receive-strict.sh b/t/t5504-fetch-receive-strict.sh
index f5d6d0d..af373ba 100755
--- a/t/t5504-fetch-receive-strict.sh
+++ b/t/t5504-fetch-receive-strict.sh
@@ -133,7 +133,14 @@ test_expect_success 'push with 
receive.fsck.missingEmail=warn' '
git --git-dir=dst/.git config \
receive.fsck.missingEmail warn 
git push --porcelain dst bogus act 21 
-   grep missingEmail act
+   grep missingEmail act 
+   git --git-dir=dst/.git branch -D bogus 
+   git  --git-dir=dst/.git config --add \
+   receive.fsck.missingEmail ignore 
+   git  --git-dir=dst/.git config --add \
+   receive.fsck.badDate warn 
+   git push --porcelain dst bogus act 21 
+   test_must_fail grep missingEmail act
 '
 
 test_expect_success \
-- 
2.3.1.windows.1.9.g8c01ab4



--
To unsubscribe from this list: send the line unsubscribe git in


[PATCH v7 14/19] fsck: Allow upgrading fsck warnings to errors

2015-06-22 Thread Johannes Schindelin
The 'invalid tag name' and 'missing tagger entry' warnings can now be
upgraded to errors by specifying `invalidTagName` and
`missingTaggerEntry` in the receive.fsck.msg-id config setting.

Incidentally, the missing tagger warning is now really shown as a warning
(as opposed to being reported with the error: prefix, as it used to be
the case before this commit).

Signed-off-by: Johannes Schindelin johannes.schinde...@gmx.de
---
 fsck.c| 24 +---
 t/t5302-pack-index.sh |  2 +-
 2 files changed, 18 insertions(+), 8 deletions(-)

diff --git a/fsck.c b/fsck.c
index cbfff1f..f6fc384 100644
--- a/fsck.c
+++ b/fsck.c
@@ -10,6 +10,7 @@
 #include utf8.h
 
 #define FSCK_FATAL -1
+#define FSCK_INFO -2
 
 #define FOREACH_MSG_ID(FUNC) \
/* fatal errors */ \
@@ -50,15 +51,16 @@
FUNC(ZERO_PADDED_DATE, ERROR) \
/* warnings */ \
FUNC(BAD_FILEMODE, WARN) \
-   FUNC(BAD_TAG_NAME, WARN) \
FUNC(EMPTY_NAME, WARN) \
FUNC(FULL_PATHNAME, WARN) \
FUNC(HAS_DOT, WARN) \
FUNC(HAS_DOTDOT, WARN) \
FUNC(HAS_DOTGIT, WARN) \
-   FUNC(MISSING_TAGGER_ENTRY, WARN) \
FUNC(NULL_SHA1, WARN) \
-   FUNC(ZERO_PADDED_FILEMODE, WARN)
+   FUNC(ZERO_PADDED_FILEMODE, WARN) \
+   /* infos (reported as warnings, but ignored by default) */ \
+   FUNC(BAD_TAG_NAME, INFO) \
+   FUNC(MISSING_TAGGER_ENTRY, INFO)
 
 #define MSG_ID(id, msg_type) FSCK_MSG_##id,
 enum fsck_msg_id {
@@ -228,6 +230,8 @@ static int report(struct fsck_options *options, struct 
object *object,
 
if (msg_type == FSCK_FATAL)
msg_type = FSCK_ERROR;
+   else if (msg_type == FSCK_INFO)
+   msg_type = FSCK_WARN;
 
append_msg_id(sb, msg_id_info[id].id_string);
 
@@ -686,15 +690,21 @@ static int fsck_tag_buffer(struct tag *tag, const char 
*data,
goto done;
}
strbuf_addf(sb, refs/tags/%.*s, (int)(eol - buffer), buffer);
-   if (check_refname_format(sb.buf, 0))
-   report(options, tag-object, FSCK_MSG_BAD_TAG_NAME,
+   if (check_refname_format(sb.buf, 0)) {
+   ret = report(options, tag-object, FSCK_MSG_BAD_TAG_NAME,
   invalid 'tag' name: %.*s,
   (int)(eol - buffer), buffer);
+   if (ret)
+   goto done;
+   }
buffer = eol + 1;
 
-   if (!skip_prefix(buffer, tagger , buffer))
+   if (!skip_prefix(buffer, tagger , buffer)) {
/* early tags do not contain 'tagger' lines; warn only */
-   report(options, tag-object, FSCK_MSG_MISSING_TAGGER_ENTRY, 
invalid format - expected 'tagger' line);
+   ret = report(options, tag-object, 
FSCK_MSG_MISSING_TAGGER_ENTRY, invalid format - expected 'tagger' line);
+   if (ret)
+   goto done;
+   }
else
ret = fsck_ident(buffer, tag-object, options);
 
diff --git a/t/t5302-pack-index.sh b/t/t5302-pack-index.sh
index 61bc8da..3dc5ec4 100755
--- a/t/t5302-pack-index.sh
+++ b/t/t5302-pack-index.sh
@@ -259,7 +259,7 @@ EOF
 thirtyeight=${tag#??} 
 rm -f .git/objects/${tag%$thirtyeight}/$thirtyeight 
 git index-pack --strict tag-test-${pack1}.pack 2err 
-grep ^error:.* expected .tagger. line err
+grep ^warning:.* expected .tagger. line err
 '
 
 test_done
-- 
2.3.1.windows.1.9.g8c01ab4



--
To unsubscribe from this list: send the line unsubscribe git in


Re: 'eol' documentation confusion

2015-06-22 Thread Junio C Hamano
Torsten Bögershausen tbo...@web.de writes:

 eol=lf or eol=crlf are the only useful settings.
 Everything else is ignored because it does not make sense.

 See convert.c:
 static enum eol git_path_check_eol()

That makes me wonder...

The original reasoning behind the current behaviour that we ignore
unknown values given to configuration variables and attributes is so
that people can use the same file that has values that are
understood by newer versions of Git with older versions of Git.

You may be trying the eol=cleverLF setting introduced in Git version
47-prerelease by adding it to .git/info/attributes, and may have
found it useful.  But you may also have to use the same repository
on another machine that you didn't install that future version of
Git over the network filesystem.  Barfing and not proceeding when we
see unknown eol=cleverLF does not sound like a nice thing to do,
which is why we just ignore and behave as if the setting was not
there.

Ideally, however, I think we should ignore an unknown setting as
long as it does not matter (i.e. we do not come to the codepath that
wants to know eol settings for the path, e.g. running git log to
show only the commit log messages and the topology of the history),
but we should error out when the unknown setting possibly matters
(i.e. we do need the eol setting for the path in order to correctly
convert the contents to end-user's liking).

Thoughts (and patches ;-)?


--
To unsubscribe from this list: send the line unsubscribe git in


Re: [Question] Is it normal for accented characters to be shown as decomposed Unicode on GNU/Linux?

2015-06-22 Thread Bastien Traverse
Le 22/06/2015 17:04, Charles Bailey a écrit :
 Note that these aren't decomposed (in the unicode decomposition
 sense) but are merely octal escaped representations of the utf-8
 encoded file names.

Thanks, I had read that term in similar context (German umlaut) and
thought it was correctly describing the phenomenon. Key words octal
escape return more precise results :)

 My understanding that this is normal and probably dates back (at least
 for status as far as:
 
   commit a734d0b10bd0f5554abb3acdf11426040cfc4df0
   Author: Dmitry Potapov dpota...@gmail.com
   Date:   Fri Mar 7 05:30:58 2008 +0300
 
   Make private quote_path() in wt-status.c available as
 quote_path_relative()
 
   [...]
 
 The behaviour can be changed by setting the git config variable
 core.quotePath to false.

This is awesome, thank you. Indeed I just tried my test case with this
config option set to false and accented characters appear normally.

Thank you!
--
To unsubscribe from this list: send the line unsubscribe git in


Re: Submodule and proxy server.

2015-06-22 Thread Johannes Löthberg

On 22/06, Jamie Archibald wrote:

fatal: unable to access
'http://http://path/to/submodule/MySubmodule.git/': The requested URL
returned error: 502



Did you copy this error verbatim?

--
Sincerely,
 Johannes Löthberg
 PGP Key ID: 0x50FB9B273A9D0BB5
 https://theos.kyriasis.com/~kyrias/


signature.asc
Description: PGP signature


Re: apply --cached --whitespace=fix now failing on items added with add -N

2015-06-22 Thread Junio C Hamano
Duy Nguyen pclo...@gmail.com writes:

 On Mon, Jun 22, 2015 at 9:29 PM, Patrick Higgins phigg...@google.com wrote:
 I like to use git to remove trailing whitespace from my files. I use
 the following ~/.gitconfig to make this convenient:

 [alias]
 wsadd = !sh -c 'git diff -- \$@\ | git apply --cached
 --whitespace=fix;\
 git checkout -- ${1-.} \$@\' -

 The wsadd alias doesn't work with new files, so I have to use git add
 -N on them first. As of a week or two ago, the apply --cached step
 now fails with the following, assuming a new file named bar containing
 foo  has been added with add -N:

 $ git diff -- $@ | git apply --cached --whitespace=fix
 stdin:7: trailing whitespace.
 foo
 error: bar: already exists in index

 The final git checkout at the end of wsadd truncates my file. I've
 changed the ; to a  to fix the truncation.

 Were there any recent changes to git apply that might have caused this?

 Probably fallout from this one, merged to 'master' about 5 weeks ago.
 I'll have a look at it tomorrow unless somebody does it first

 d95d728 (diff-lib.c: adjust position of i-t-a entries in diff - 2015-03-16)

Yeah, the world order has changed by that commit, and I would expect
to see some more fallouts.

After add -N, git diff used to pretend that an empty blob was in
the index, and showed a modification between an existing empty and
the full file contents, and git apply --cached was happy to take
that modification.  In the new world order, git diff is instructed
to pretend as if the path that was added by add -N does not exist
yet in the index at all, but the index still knows about the path,
which is a strange half-born state.  It shows an addition of the
full file contents as a new file.  git apply --cached sees an
addition of a new path, which requires that the path does not exist
in the index.  In the new world order, it should be taught to
pretend that these add -N paths do not exist in the index, but
that was not done.

Something like the attached (totally untested) may be a good
starting point.

For another potential fallout, try this:

$ cp COPYING RENAMING
$ cp COPYING UNTRACKED
$ EMPTY
$ git add -N RENAMING EMPTY
$ git rm UNTRACKED
fatal: pathspec 'UNTRACKED' did not match any files
$ git rm EMPTY RENAMING
error: the following file has staged content different from both the
file and the HEAD:
EMPTY
RENAMING
(use -f to force removal)

One could argue that these three should behave the same way, if the
new world order is path added by 'add -N' does not exist in the
index.

I however think the new world order should be slightly different
from does not exist in the index, but should be more like the
index is aware of its presence but has not been told about its
contents yet.  The consequences of this are:

 - git rm RENAMING shouldn't say 'did not match any files'.

 - git rm RENAMING does not know about 'staged content', so
   complaining about staged contents different from file and HEAD
   feels wrong.

Having said that, I do think erroring out and requiring -f is the
right thing when remiving RENAMING and EMPTY.  Unlike contents added
by git add without -N, we do not know what is in the working
tree file at all.  Given that we check and refuse when the contents
are different between the working tree file, the index and the HEAD
even when we know what was in the working tree when git add
without -N was done, we should keep the safety to prevent
accidental removal of the working tree file, which has the only
existing copy of the user content.

On the other hand, I am also OK if the behaviour was like this:

$ git rm --cached RENAMING
... removed without complaints ...
$ git rm RENAMING
error: file 'RENAMING' does not have staged content yet.
(use -f to force removal)

In any case, here is a how about this weather-balloon patch for
git apply

 builtin/apply.c | 13 -
 1 file changed, 12 insertions(+), 1 deletion(-)

diff --git a/builtin/apply.c b/builtin/apply.c
index 146be97..653191e 100644
--- a/builtin/apply.c
+++ b/builtin/apply.c
@@ -3546,12 +3546,23 @@ static int check_preimage(struct patch *patch, struct 
cache_entry **ce, struct s
 #define EXISTS_IN_INDEX 1
 #define EXISTS_IN_WORKTREE 2
 
+static int exists_in_index(const char *new_name)
+{
+   int pos = cache_name_pos(new_name, strlen(new_name));
+
+   if (pos  0)
+   return 0;
+   if (active_cache[pos]-ce_flags  CE_INTENT_TO_ADD)
+   return 0;
+   return 1;
+}
+
 static int check_to_create(const char *new_name, int ok_if_exists)
 {
struct stat nst;
 
if (check_index 
-   cache_name_pos(new_name, strlen(new_name)) = 0 
+   exists_in_index(new_name) 
!ok_if_exists)
return EXISTS_IN_INDEX;
if (cached)
--
To unsubscribe from this list: send the line unsubscribe git in


Re: [PATCH v4 08/19] parse-option: rename parse_opt_with_commit()

2015-06-22 Thread Karthik Nayak
On Mon, Jun 22, 2015 at 6:34 AM, Junio C Hamano gits...@pobox.com wrote:
 On Sun, Jun 21, 2015 at 1:48 PM, Karthik Nayak karthik@gmail.com wrote:
 Rename parse_opt_with_commit() to parse_opt_commit_object_name()
 to show that it can be used to obtain a list of commits and is
 not constricted to usage of '--contains' option.

 I think that is a brilliant idea, but unlike the other function you
 added earlier
 that can do only one object and adopts last one wins rule, this is 
 cumulative,
 and that fact should be made clear to the developers in some way, no?

Will add a comment :) Thanks!



-- 
Regards,
Karthik Nayak
--
To unsubscribe from this list: send the line unsubscribe git in


Re: [PATCH v4 08/19] parse-option: rename parse_opt_with_commit()

2015-06-22 Thread Karthik Nayak
On Mon, Jun 22, 2015 at 9:57 PM, Junio C Hamano gits...@pobox.com wrote:
 On Mon, Jun 22, 2015 at 8:38 AM, Karthik Nayak karthik@gmail.com wrote:
 On Mon, Jun 22, 2015 at 6:34 AM, Junio C Hamano gits...@pobox.com wrote:
 On Sun, Jun 21, 2015 at 1:48 PM, Karthik Nayak karthik@gmail.com 
 wrote:
 Rename parse_opt_with_commit() to parse_opt_commit_object_name()
 to show that it can be used to obtain a list of commits and is
 not constricted to usage of '--contains' option.

 I think that is a brilliant idea, but unlike the other function you
 added earlier
 that can do only one object and adopts last one wins rule, this is 
 cumulative,
 and that fact should be made clear to the developers in some way, no?

 Will add a comment

 I didn't mean that. Can't we indicate this with plural somewhere in the name?
 parse_opt_commits(), parse_opt_commit_into_list(), etc.?

Definitely! thanks :)

-- 
Regards,
Karthik Nayak
--
To unsubscribe from this list: send the line unsubscribe git in


Re: [PATCH v4 00/11] add options to for-each-ref

2015-06-22 Thread Karthik Nayak

 Just FYI, you can git format-patch -11 my-work~8 or something like that
 and get 01/11 to 11/11 even if you have more commits that are not yet ready
 near the tip.

I usually do a `git format-patch a..b` but I missed out the b it seems ;-)
Thanks!


-- 
Regards,
Karthik Nayak
--
To unsubscribe from this list: send the line unsubscribe git in


Re: [PATCH v7 13/19] fsck: Optionally ignore specific fsck issues completely

2015-06-22 Thread Junio C Hamano
Johannes Schindelin johannes.schinde...@gmx.de writes:

 + git --git-dir=dst/.git branch -D bogus 
 + git  --git-dir=dst/.git config --add \
 + receive.fsck.missingEmail ignore 
 + git  --git-dir=dst/.git config --add \
 + receive.fsck.badDate warn 

Funny double-SP (will locally fix).

There are a few other minor style nits (not in this patch but in
other patches in the series) like

s/free((char *) var)/free((char *)var)/;

that I locally added SQUASH, and I may also have tweaked some log
messages.  Please check what you will see in 'pu' when I push the
day's integration out later.

Thanks.
--
To unsubscribe from this list: send the line unsubscribe git in


Submodule and proxy server.

2015-06-22 Thread Jamie Archibald
I am behind a proxy at work and I've setup a git repo with a submodule.


Here are the commands I'm executing.


$ mkdir MyProject
$ cd MyProject
$ git init
$ git remote add origin http://path/to/repo/MyProject.git
$ git config —add remote.origin.proxy 
$ git pull origin master


MyProject has a submodule called MySubmodule

The above commands work and have no problem pulling down the files to
local. The issue occurs when I attempt to update the submodule

$ git submodule init

Submodule 'MySubmodule' (http://path/to/submodule/MySubmodule.git)
registered for path 'MySubmodule'

$ git submodule update

Cloning into 'MySubmodule'...

fatal: unable to access
'http://http://path/to/submodule/MySubmodule.git/': The requested URL
returned error: 502

Clone of 'http://path/to/submodule/MySubmodule' into sub module path
'MySubmodule' failed

It seems I am getting a 502. I have a funny feeling that perhaps the
submodule command is not inheriting the proxy settings I've configured
above.

Any help would be appreciated.

GIT rocks!


-- 
Jamie
--
To unsubscribe from this list: send the line unsubscribe git in


Re: apply --cached --whitespace=fix now failing on items added with add -N

2015-06-22 Thread Duy Nguyen
On Mon, Jun 22, 2015 at 9:29 PM, Patrick Higgins phigg...@google.com wrote:
 I like to use git to remove trailing whitespace from my files. I use
 the following ~/.gitconfig to make this convenient:

 [alias]
 wsadd = !sh -c 'git diff -- \$@\ | git apply --cached
 --whitespace=fix;\
 git checkout -- ${1-.} \$@\' -

 The wsadd alias doesn't work with new files, so I have to use git add
 -N on them first. As of a week or two ago, the apply --cached step
 now fails with the following, assuming a new file named bar containing
 foo  has been added with add -N:

 $ git diff -- $@ | git apply --cached --whitespace=fix
 stdin:7: trailing whitespace.
 foo
 error: bar: already exists in index

 The final git checkout at the end of wsadd truncates my file. I've
 changed the ; to a  to fix the truncation.

 Were there any recent changes to git apply that might have caused this?

Probably fallout from this one, merged to 'master' about 5 weeks ago.
I'll have a look at it tomorrow unless somebody does it first

d95d728 (diff-lib.c: adjust position of i-t-a entries in diff - 2015-03-16)
-- 
Duy
--
To unsubscribe from this list: send the line unsubscribe git in


Re: [Question] Is it normal for accented characters to be shown as decomposed Unicode on GNU/Linux?

2015-06-22 Thread Charles Bailey
On Mon, Jun 22, 2015 at 03:17:40PM +0200, Bastien Traverse wrote:
 test case:
 $ mkdir accent-test  cd !$
 $ git init
 $ touch rêve réunion
 $ git status
 On branch master
 
 Initial commit
 
 Untracked files:
   (use git add file... to include in what will be committed)
 
   r\303\251union
   r\303\252ve

Note that these aren't decomposed (in the unicode decomposition
sense) but are merely octal escaped representations of the utf-8
encoded file names.

My understanding that this is normal and probably dates back (at least
for status as far as:

commit a734d0b10bd0f5554abb3acdf11426040cfc4df0
Author: Dmitry Potapov dpota...@gmail.com
Date:   Fri Mar 7 05:30:58 2008 +0300

Make private quote_path() in wt-status.c available as
quote_path_relative()

[...]

The behaviour can be changed by setting the git config variable
core.quotePath to false.
--
To unsubscribe from this list: send the line unsubscribe git in


Re: [PATCH v4 06/19] ref-filter: implement '--merged' and '--no-merged' options

2015-06-22 Thread Karthik Nayak
On Mon, Jun 22, 2015 at 6:30 AM, Junio C Hamano gits...@pobox.com wrote:
 On Sun, Jun 21, 2015 at 1:48 PM, Karthik Nayak karthik@gmail.com wrote:
 +static void do_merge_filter(struct ref_filter_cbdata *ref_cbdata)
 +{
 +   struct rev_info revs;
 +   int i, old_nr;
 +   struct ref_filter *filter = ref_cbdata-filter;
 +   struct ref_array *array = ref_cbdata-array;
 +   struct commit_list *p, *to_clear = NULL;
 +
 +   init_revisions(revs, NULL);
 +
 +   for (i = 0; i  array-nr; i++) {
 +   struct ref_array_item *item = array-items[i];
 +   add_pending_object(revs, item-commit-object, 
 item-refname);
 +   commit_list_insert(item-commit, to_clear);

 Use of commit_list for an array of known number of commits feels somewhat
 wasteful. Couldn't to_clear be

 struct commit **to_clear = xcalloc(sizeof(struct commit *), array-nr);

 instread?

Awesome! you're right, will drop the commit_list.
--
To unsubscribe from this list: send the line unsubscribe git in


Re: [PATCH v4 04/19] for-each-ref: add '--points-at' option

2015-06-22 Thread Karthik Nayak
On Tue, Jun 23, 2015 at 12:54 AM, Junio C Hamano gits...@pobox.com wrote:
 Matthieu Moy matthieu@grenoble-inp.fr writes:

 Karthik Nayak karthik@gmail.com writes:

 On Mon, Jun 22, 2015 at 6:15 AM, Junio C Hamano gits...@pobox.com wrote:
  3  4 as a single patch may make more sense, if we were to tolerate the
 let's copy  paste first and then later remove the duplicate as a way to
 postpone touching tag.c side in order to first concentrate on 
 for-each-ref.

 I have not formed a firm opinion on what the right split of the series is, 
 but
 so far (assuming that the temporary duplication is the best we can do) what
 I am seeing in this series makes sense to me.

 Thanks.

 That would mean squashing 34, 67 and 1011 also on similar lines.

 I have a slight preference for keeping the pairs not squashed. This way,
 we have a clear separation write reusable library code / use it. But
 I'm fine with squashing if others prefer.

 As I cannot firmly say that copy  paste first and then later
 clean-up is bad and we should split them in different way, I
 am fine with leaving them separate as they are.

Even I think it's better to leave them separate, on the lines of what
Matthieu said.

-- 
Regards,
Karthik Nayak
--
To unsubscribe from this list: send the line unsubscribe git in


Re: Fast enumeration of objects

2015-06-22 Thread Junio C Hamano
Jeff King p...@peff.net writes:

 ...
 So my conclusions are:

   1. Yes, the pipe/parsing overhead of a separate processor really is
  measurable. That's hidden in the wall-clock time if you have
  multiple cores, but you may care more about CPU time. I still think
  the flexibility is worth it.

   2. Cutting out the pipe to cat-file is worth doing, as it saves a few
  seconds. Cutting out %(objecttype) saves a lot, too, and is worth
  doing. We should teach list-all-objects -v to use cat-file's
  custom formatters (alternatively, we could just teach cat-file a
  --batch-all-objects option rather than add a new command).

   3. We should teach cat-file a --buffer option to use fwrite. Even if
  we end up with list-all-objects --format='%(objectsize)' for this
  task, it would help all the other uses of cat-file.

All sounds very sensible.
--
To unsubscribe from this list: send the line unsubscribe git in


Re: [PATCH] Add list-all-objects command

2015-06-22 Thread Charles Bailey
On Mon, Jun 22, 2015 at 06:33:21AM -0400, Jeff King wrote:
 On Mon, Jun 22, 2015 at 04:38:22AM -0400, Jeff King wrote:
 
   + prepare_packed_git();
   + for (p = packed_git; p; p = p-next) {
   + open_pack_index(p);
   + }
  
  Yikes. The fact that you need to do this means that
  for_each_packed_object is buggy, IMHO. I'll send a patch.
 
 Here's that patch. And since I did not want to pile work on Charles, I
 went ahead and just implemented the patches I suggested in the other
 email.

I have to say that I think that adding this functionality to cat-file
makes a lot of sense. If it only catted files it might be a stretch but
as it's already grown --batch-check functionality, it now seems a
reasonable extension. I'm not particularly attached to having a
standalone list-all-objects command per se.
--
To unsubscribe from this list: send the line unsubscribe git in


Re: [PATCH v3 12/19] initial_ref_transaction_commit(): check for duplicate refs

2015-06-22 Thread Junio C Hamano
Michael Haggerty mhag...@alum.mit.edu writes:

 Error out if the ref_transaction includes more than one update for any
 refname.

 Signed-off-by: Michael Haggerty mhag...@alum.mit.edu
 ---
  refs.c | 11 +++
  1 file changed, 11 insertions(+)

This somehow feels like ehh, I now know better and this function
should have been like this from the beginning to me.

But that is OK.

Is the initial creation logic too fragile to deserve its own
function to force callers to think about it, by the way?

What I am wondering is if we could turn the safety logic that appear
here (i.e. no existing refs must be assumed by the set of updates,
etc.)  into an optimization cue and implement this as a special case
helper to ref_transaction_commit(), i.e.

ref_transaction_commit(...)
{
if (updates are all initial creation 
no existing refs in repository)
return initial_ref_transaction_commit(...);
/* otherwise we do the usual thing */
...
}

and have clone call ref_transaction_commit() as usual.
--
To unsubscribe from this list: send the line unsubscribe git in


Re: [PATCH v3 19/19] delete_ref(): use the usual convention for old_sha1

2015-06-22 Thread Junio C Hamano
Michael Haggerty mhag...@alum.mit.edu writes:

 The ref_transaction_update() family of functions use the following
 convention for their old_sha1 parameters:

 * old_sha1 == NULL: Don't check the old value at all.
 * is_null_sha1(old_sha1): Ensure that the reference didn't exist
   before the transaction.
 * otherwise: Ensure that the reference had the specified value before
   the transaction.

 delete_ref() had a different convention, namely treating
 is_null_sha1(old_sha1) as don't care. Change it to adhere to the
 standard convention to reduce the scope for confusion.

 Please note that it is now a bug to pass old_sha1=NULL_SHA1 to
 delete_ref() (because it doesn't make sense to delete a reference that
 you already know doesn't exist). This is consistent with the behavior
 of ref_transaction_delete().

Nice.

Everything I read in this round of changes makes sense (except for
the ones I had minor comments on, which were separately sent).

Thanks.
--
To unsubscribe from this list: send the line unsubscribe git in


Re: [PATCH v7 13/19] fsck: Optionally ignore specific fsck issues completely

2015-06-22 Thread Johannes Schindelin
Hi Junio,

On 2015-06-22 20:04, Junio C Hamano wrote:
 Johannes Schindelin johannes.schinde...@gmx.de writes:
 
 +git --git-dir=dst/.git branch -D bogus 
 +git  --git-dir=dst/.git config --add \
 +receive.fsck.missingEmail ignore 
 +git  --git-dir=dst/.git config --add \
 +receive.fsck.badDate warn 
 
 Funny double-SP (will locally fix).
 
 There are a few other minor style nits (not in this patch but in
 other patches in the series) like
 
   s/free((char *) var)/free((char *)var)/;
 
 that I locally added SQUASH,

I fixed all of this locally, ready to push out v8, but...

 and I may also have tweaked some log messages.  Please check what you will 
 see in 'pu' when I push the day's integration out later.

... I did not see that yet.

For the record, my current state is available at

https://github.com/dscho/git/compare/fsck-api~19...fsck-api

I could wait for tomorrow to adjust my commit messages... or what do you want 
me to do?

Ciao,
Dscho
--
To unsubscribe from this list: send the line unsubscribe git in


Re: [PATCH v7 00/19] Introduce an internal API to interact with the fsck machinery

2015-06-22 Thread Johannes Schindelin
Hi Junio,

On 2015-06-22 20:02, Junio C Hamano wrote:
 Johannes Schindelin johannes.schinde...@gmx.de writes:
 
 Changes since v6:

 - camelCased message IDs

 - multiple author checking now as suggested by Junio

 - renamed `--quick` to `--connectivity-only`, better commit message

 - `fsck.skipList` is now handled correctly (and not mistaken for a message
   type setting)

 - `fsck.skipList` can handle user paths now

 - index-pack configures the walk function in a more logical place now

 - simplified code by avoiding working on partial strings (i.e. removed
   `substrcmp()`). This saves 10 lines. To accomodate parsing config
   variables directly, we now work on lowercased message IDs; unfortunately
   this means that we cannot use them in append_msg_id() because that
   function wants to append camelCased message IDs.

 Interdiff below diffstat.
 
 Except for minor nits I sent separate messages, this round looks
 very nicely done (I however admit that I haven't read the skiplist
 parsing code carefully at all, expecting that you wouldn't screw up
 with something simple like that ;-))
 
 Thanks, will replace what is queued.  Let's start thinking about
 moving it down to 'next' (meaning: we _could_ still accept a reroll,
 but I think we are in a good shape and minor incremental refinements
 would suffice), cooking it for the remainder of the cycle and having
 it graduate to 'master' at the beginning of the next cycle.

Let me submit a v8 with the borked fixup fixed (i.e. part of 04/19 moved to 
03/19, where it really belongs), the `for` style fix, the fixed double space 
and the cast style, too.

Ciao,
Dscho
--
To unsubscribe from this list: send the line unsubscribe git in


Re: [PATCH] Add list-all-objects command

2015-06-22 Thread Junio C Hamano
Jeff King p...@peff.net writes:

 On Mon, Jun 22, 2015 at 04:38:22AM -0400, Jeff King wrote:

  +  prepare_packed_git();
  +  for (p = packed_git; p; p = p-next) {
  +  open_pack_index(p);
  +  }
 
 Yikes. The fact that you need to do this means that
 for_each_packed_object is buggy, IMHO. I'll send a patch.

 Here's that patch. And since I did not want to pile work on Charles, I
 went ahead and just implemented the patches I suggested in the other
 email.

 We may want to take patch 1 separately for the maint-track, as it is
 really a bug-fix (albeit one that I do not think actually affects anyone
 in practice right now).

Hmph, add_unseen_recent_objects_to_traversal() is the only existing
user, and before d3038d22 (prune: keep objects reachable from recent
objects, 2014-10-15) added that function, for-each-packed-object
existed but had no callers.

And the objects not beeing seen by that function (due to lack of
open) would matter only for pruning purposes, which would mean
you have to be calling into the codepath when running a full repack,
so you would have opened all the packs that matter anyway (if you
have a old cruft archive pack that contains only objects that
are unreachable, you may not have opened that pack, though, and you
may prune the thing away prematurely).

So, I think I can agree that this would unlikely affect anybody in
practice.

 Patches 2-5 are useful even if we go with Charles' command, as they make
 cat-file better (cleanups and he new buffer option).

 Patches 6-7 implement the cat-file option that would be redundant with
 list-all-objects.

 By the way, in addition to not showing objects in order,
 list-all-objects (and my cat-file option) may show duplicates. Do we
 want to sort -u for the user? It might be nice for them to always get
 a de-duped and sorted list. Aside from the CPU cost of sorting, it does
 mean we'll allocate ~80MB for the kernel to store the sha1s. I guess
 that's not too much when you are talking about the kernel repo. I took
 the coward's way out and just mentioned the limitation in the
 documentation, but I'm happy to be persuaded.

   [1/7]: for_each_packed_object: automatically open pack index
   [2/7]: cat-file: minor style fix in options list
   [3/7]: cat-file: move batch_options definition to top of file
   [4/7]: cat-file: add --buffer option
   [5/7]: cat-file: stop returning value from batch_one_object
   [6/7]: cat-file: split batch_one_object into two stages
   [7/7]: cat-file: add --batch-all-objects option

 -Peff
--
To unsubscribe from this list: send the line unsubscribe git in


[PATCHv6 1/3] git-rebase -i: add command drop to remove a commit

2015-06-22 Thread Galan Rémi
Instead of removing a line to remove the commit, you can use the
command drop (just like pick or edit). It has the same effect as
deleting the line (removing the commit) except that you keep a visual
trace of your actions, allowing a better control and reducing the
possibility of removing a commit by mistake.

Signed-off-by: Galan Rémi remi.galan-alfo...@ensimag.grenoble-inp.fr
---
 Documentation/git-rebase.txt  |  3 +++
 git-rebase--interactive.sh|  3 ++-
 t/lib-rebase.sh   |  4 ++--
 t/t3404-rebase-interactive.sh | 16 
 4 files changed, 23 insertions(+), 3 deletions(-)

diff --git a/Documentation/git-rebase.txt b/Documentation/git-rebase.txt
index 1d01baa..34bd070 100644
--- a/Documentation/git-rebase.txt
+++ b/Documentation/git-rebase.txt
@@ -514,6 +514,9 @@ rebasing.
 If you just want to edit the commit message for a commit, replace the
 command pick with the command reword.
 
+To drop a commit, replace the command pick with drop, or just
+delete the matching line.
+
 If you want to fold two or more commits into one, replace the command
 pick for the second and subsequent commits with squash or fixup.
 If the commits had different authors, the folded commit will be
diff --git a/git-rebase--interactive.sh b/git-rebase--interactive.sh
index dc3133f..72abf90 100644
--- a/git-rebase--interactive.sh
+++ b/git-rebase--interactive.sh
@@ -152,6 +152,7 @@ Commands:
  s, squash = use commit, but meld into previous commit
  f, fixup = like squash, but discard this commit's log message
  x, exec = run command (the rest of the line) using shell
+ d, drop = remove commit
 
 These lines can be re-ordered; they are executed from top to bottom.
 
@@ -505,7 +506,7 @@ do_next () {
rm -f $msg $author_script $amend $state_dir/stopped-sha || exit
read -r command sha1 rest  $todo
case $command in
-   $comment_char*|''|noop)
+   $comment_char*|''|noop|drop|d)
mark_action_done
;;
pick|p)
diff --git a/t/lib-rebase.sh b/t/lib-rebase.sh
index 6bd2522..fdbc900 100644
--- a/t/lib-rebase.sh
+++ b/t/lib-rebase.sh
@@ -14,7 +14,7 @@
 #   specified line.
 #
 #   cmd lineno -- add a line with the specified command
-#   (squash, fixup, edit, or reword) and the SHA1 taken
+#   (squash, fixup, edit, reword or drop) and the SHA1 taken
 #   from the specified line.
 #
 #   exec_cmd_with_args -- add an exec cmd with args line.
@@ -46,7 +46,7 @@ set_fake_editor () {
action=pick
for line in $FAKE_LINES; do
case $line in
-   squash|fixup|edit|reword)
+   squash|fixup|edit|reword|drop)
action=$line;;
exec*)
echo $line | sed 's/_/ /g'  $1;;
diff --git a/t/t3404-rebase-interactive.sh b/t/t3404-rebase-interactive.sh
index ac429a0..ecd277c 100755
--- a/t/t3404-rebase-interactive.sh
+++ b/t/t3404-rebase-interactive.sh
@@ -1102,4 +1102,20 @@ test_expect_success 'rebase -i commits that overwrite 
untracked files (no ff)' '
test $(git cat-file commit HEAD | sed -ne \$p) = I
 '
 
+test_rebase_end () {
+   test_when_finished git checkout master 
+   git branch -D $1 
+   test_might_fail git rebase --abort 
+   git checkout -b $1 master
+}
+
+test_expect_success 'drop' '
+   test_rebase_end dropTest 
+   set_fake_editor 
+   FAKE_LINES=1 drop 2 3 drop 4 5 git rebase -i --root 
+   test E = $(git cat-file commit HEAD | sed -ne \$p) 
+   test C = $(git cat-file commit HEAD^ | sed -ne \$p) 
+   test A = $(git cat-file commit HEAD^^ | sed -ne \$p)
+'
+
 test_done
-- 
2.4.3.371.g8992f2a

--
To unsubscribe from this list: send the line unsubscribe git in


[PATCHv6 2/3] git rebase -i: warn about removed commits

2015-06-22 Thread Galan Rémi
Check if commits were removed (i.e. a line was deleted) and print
warnings or stop git rebase depending on the value of the
configuration variable rebase.missingCommitsCheck.

This patch gives the user the possibility to avoid silent loss of
information (losing a commit through deleting the line in this case)
if he wants.

Add the configuration variable rebase.missingCommitsCheck.
- When unset or set to ignore, no checking is done.
- When set to warn, the commits are checked, warnings are
  displayed but git rebase still proceeds.
- When set to error, the commits are checked, warnings are
  displayed and the rebase is stopped.
  (The user can then use 'git rebase --edit-todo' and
  'git rebase --continue', or 'git rebase --abort')

rebase.missingCommitsCheck defaults to ignore.

Signed-off-by: Galan Rémi remi.galan-alfo...@ensimag.grenoble-inp.fr
---
 On the contrary of v5, the SHA-1 that are read are the short one.
 (the todo-list is read directly after the user closes his editor
 and not after expansion of ids)

 Documentation/config.txt  |  11 +
 Documentation/git-rebase.txt  |   6 +++
 git-rebase--interactive.sh| 104 --
 t/t3404-rebase-interactive.sh |  66 +++
 4 files changed, 184 insertions(+), 3 deletions(-)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index 4d21ce1..25b2a04 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -2160,6 +2160,17 @@ rebase.autoStash::
successful rebase might result in non-trivial conflicts.
Defaults to false.
 
+rebase.missingCommitsCheck::
+   If set to warn, git rebase -i will print a warning if some
+   commits are removed (e.g. a line was deleted), however the
+   rebase will still proceed. If set to error, it will print
+   the previous warning and stop the rebase, 'git rebase
+   --edit-todo' can then be used to correct the error. If set to
+   ignore, no checking is done.
+   To drop a commit without warning or error, use the `drop`
+   command in the todo-list.
+   Defaults to ignore.
+
 receive.advertiseAtomic::
By default, git-receive-pack will advertise the atomic push
capability to its clients. If you don't want to this capability
diff --git a/Documentation/git-rebase.txt b/Documentation/git-rebase.txt
index 34bd070..2ca3b8d 100644
--- a/Documentation/git-rebase.txt
+++ b/Documentation/git-rebase.txt
@@ -213,6 +213,12 @@ rebase.autoSquash::
 rebase.autoStash::
If set to true enable '--autostash' option by default.
 
+rebase.missingCommitsCheck::
+   If set to warn, print warnings about removed commits in
+   interactive mode. If set to error, print the warnings and
+   stop the rebase. If set to ignore, no checking is
+   done. ignore by default.
+
 OPTIONS
 ---
 --onto newbase::
diff --git a/git-rebase--interactive.sh b/git-rebase--interactive.sh
index 72abf90..66daacf 100644
--- a/git-rebase--interactive.sh
+++ b/git-rebase--interactive.sh
@@ -834,6 +834,104 @@ add_exec_commands () {
mv $1.new $1
 }
 
+# Print the list of the SHA-1 of the commits
+# from stdin to stdout
+todo_list_to_sha_list () {
+   git stripspace --strip-comments |
+   while read -r command sha1 rest
+   do
+   case $command in
+   $comment_char*|''|noop|x|exec)
+   ;;
+   *)
+   long_sha=$(git rev-list --no-walk $sha1 2/dev/null)
+   printf %s\n $long_sha
+   ;;
+   esac
+   done
+}
+
+# Use warn for each line in stdin
+warn_lines () {
+   while read -r line
+   do
+   warn  - $line
+   done
+}
+
+# Switch to the branch in $into and notify it in the reflog
+checkout_onto () {
+   GIT_REFLOG_ACTION=$GIT_REFLOG_ACTION: checkout $onto_name
+   output git checkout $onto || die_abort could not detach HEAD
+   git update-ref ORIG_HEAD $orig_head
+}
+
+# Check if the user dropped some commits by mistake
+# Behaviour determined by rebase.missingCommitsCheck.
+check_todo_list () {
+   raise_error=f
+
+   check_level=$(git config --get rebase.missingCommitsCheck)
+   check_level=${check_level:-ignore}
+   # Don't be case sensitive
+   check_level=$(printf '%s' $check_level | tr 'A-Z' 'a-z')
+
+   case $check_level in
+   warn|error)
+   # Get the SHA-1 of the commits
+   todo_list_to_sha_list $todo.backup $todo.oldsha1
+   todo_list_to_sha_list $todo $todo.newsha1
+
+   # Sort the SHA-1 and compare them
+   sort -u $todo.oldsha1 $todo.oldsha1+
+   mv $todo.oldsha1+ $todo.oldsha1
+   sort -u $todo.newsha1 $todo.newsha1+
+   mv $todo.newsha1+ $todo.newsha1
+   comm -2 -3 $todo.oldsha1 $todo.newsha1 $todo.miss
+
+   # Warn about 

[PATCHv6 3/3] git rebase -i: add static check for commands and SHA-1

2015-06-22 Thread Galan Rémi
Check before the start of the rebasing if the commands exists, and for
the commands expecting a SHA-1, check if the SHA-1 is present and
corresponds to a commit. In case of error, print the error, stop git
rebase and prompt the user to fix with 'git rebase --edit-todo' or to
abort.

This allows to avoid doing half of a rebase before finding an error
and giving back what's left of the todo list to the user and prompt
him to fix when it might be too late for him to do so (he might have
to abort and restart the rebase).

Signed-off-by: Galan Rémi remi.galan-alfo...@ensimag.grenoble-inp.fr
---
 I used:
   read -r command sha1 rest EOF
   $line
   EOF
 because
   printf '%s' $line | read -r command sha1 rest
 doesn't work (the 3 variables have no value as a result).
 There might be a better way to do this, but I don't have it right now.

 git-rebase--interactive.sh| 70 +++
 t/lib-rebase.sh   |  5 
 t/t3404-rebase-interactive.sh | 40 +
 3 files changed, 115 insertions(+)

diff --git a/git-rebase--interactive.sh b/git-rebase--interactive.sh
index 66daacf..6381686 100644
--- a/git-rebase--interactive.sh
+++ b/git-rebase--interactive.sh
@@ -834,6 +834,52 @@ add_exec_commands () {
mv $1.new $1
 }
 
+# Check if the SHA-1 passed as an argument is a
+# correct one, if not then print $2 in $todo.badsha
+# $1: the SHA-1 to test
+# $2: the line to display if incorrect SHA-1
+check_commit_sha () {
+   badsha=f
+   if test -z $1
+   then
+   badsha=t
+   else
+   sha1_verif=$(git rev-parse --verify --quiet $1^{commit})
+   if test -z $sha1_verif
+   then
+   badsha=t
+   fi
+   fi
+
+   if test $badsha = t
+   then
+   printf '%s\n' $2 $todo.badsha
+   fi
+}
+
+# prints the bad commits and bad commands
+# from the todolist in stdin
+check_bad_cmd_and_sha () {
+   git stripspace --strip-comments |
+   while read -r line
+   do
+   read -r command sha1 rest EOF
+$line
+EOF
+   case $command in
+   ''|noop|x|exec)
+   # Doesn't expect a SHA-1
+   ;;
+   pick|p|drop|d|reword|r|edit|e|squash|s|fixup|f)
+   check_commit_sha $sha1 $line
+   ;;
+   *)
+   printf '%s\n' $line $todo.badcmd
+   ;;
+   esac
+   done
+}
+
 # Print the list of the SHA-1 of the commits
 # from stdin to stdout
 todo_list_to_sha_list () {
@@ -868,6 +914,8 @@ checkout_onto () {
 
 # Check if the user dropped some commits by mistake
 # Behaviour determined by rebase.missingCommitsCheck.
+# Check if there is an unrecognized command or a
+# bad SHA-1 in a command.
 check_todo_list () {
raise_error=f
 
@@ -919,6 +967,28 @@ check_todo_list () {
;;
esac
 
+   check_bad_cmd_and_sha $todo
+
+   if test -s $todo.badsha
+   then
+   raise_error=t
+
+   warn Warning: the SHA-1 is missing or isn't \
+   a commit in the following line(s):
+   warn_lines $todo.badsha
+   warn
+   fi
+
+   if test -s $todo.badcmd
+   then
+   raise_error=t
+
+   warn Warning: the command isn't recognized \
+   in the following line(s):
+   warn_lines $todo.badcmd
+   warn
+   fi
+
if test $raise_error = t
then
# Checkout before the first commit of the
diff --git a/t/lib-rebase.sh b/t/lib-rebase.sh
index fdbc900..9a96e15 100644
--- a/t/lib-rebase.sh
+++ b/t/lib-rebase.sh
@@ -54,6 +54,11 @@ set_fake_editor () {
echo '# comment'  $1;;
)
echo  $1;;
+   bad)
+   action=badcmd;;
+   fakesha)
+   echo $action XXX False commit  $1
+   action=pick;;
*)
sed -n ${line}s/^pick/$action/p  $1.tmp  $1
action=pick;;
diff --git a/t/t3404-rebase-interactive.sh b/t/t3404-rebase-interactive.sh
index a92ae19..d691b1c 100755
--- a/t/t3404-rebase-interactive.sh
+++ b/t/t3404-rebase-interactive.sh
@@ -1184,4 +1184,44 @@ test_expect_success 'rebase -i respects 
rebase.missingCommitsCheck = error' '
test B = $(git cat-file commit HEAD^ | sed -ne \$p)
 '
 
+cat expect EOF
+Warning: the command isn't recognized in the following line(s):
+ - badcmd $(git rev-list --oneline -1 master~1)
+
+You can fix this with 'git rebase --edit-todo'.
+Or you can abort the rebase with 'git rebase --abort'.
+EOF
+
+test_expect_success 'static check of bad command' '
+   test_rebase_end tmp2 
+   set_fake_editor 
+   test_must_fail env FAKE_LINES=1 2 3 bad 4 5 \
+ 

Re: [PATCH v4 03/19] ref-filter: implement '--points-at' option

2015-06-22 Thread Eric Sunshine
On Sun, Jun 21, 2015 at 4:48 PM, Karthik Nayak karthik@gmail.com wrote:
 In 'tag -l' we have '--points-at' option which lets users
 list only tags which point to a particular commit. Implement
 this option in 'ref-filter.{c,h}' so that other commands can
 benefit from this.

 This is duplicated from tag.c, we will eventually remove that
 when we port tag.c to use ref-filter APIs.

 Based-on-patch-by: Jeff King p...@peff.net
 Mentored-by: Christian Couder christian.cou...@gmail.com
 Mentored-by: Matthieu Moy matthieu@grenoble-inp.fr
 Signed-off-by: Karthik Nayak karthik@gmail.com
 ---
 diff --git a/builtin/tag.c b/builtin/tag.c
 index e36c43e..280981f 100644
 --- a/builtin/tag.c
 +++ b/builtin/tag.c
 @@ -56,6 +56,10 @@ static int match_pattern(const char **patterns, const char 
 *ref)
 return 0;
  }

 +/*
 + * This is currently duplicated in ref-filter.c, and will eventually be
 + * removed as we port tag.c to use the ref-filter APIs.
 + */
  static const unsigned char *match_points_at(const char *refname,
 const unsigned char *sha1)
  {
 diff --git a/ref-filter.c b/ref-filter.c
 index 43502a4..591e281 100644
 --- a/ref-filter.c
 +++ b/ref-filter.c
 @@ -842,6 +842,29 @@ static int match_name_as_path(const char **pattern, 
 const char *refname)
 return 0;
  }

 +/*
 + * Given a ref (sha1, refname) see if it points to a one of the sha1s

s/a one/one/

 + * in a sha1_array.
 + */
 +static int match_points_at(struct sha1_array *points_at, const unsigned char 
 *sha1,
 +  const char *refname)
 +{
 +   struct object *obj;
 +
 +   if (!points_at || !points_at-nr)
 +   return 1;
 +
 +   if (sha1_array_lookup(points_at, sha1) = 0)
 +   return 1;
 +
 +   obj = parse_object_or_die(sha1, refname);
 +   if (obj-type == OBJ_TAG 
 +   sha1_array_lookup(points_at, ((struct tag *)obj)-tagged-sha1) 
 = 0)
 +   return 1;
 +
 +   return 0;
 +}
--
To unsubscribe from this list: send the line unsubscribe git in


Re: [PATCH v4 07/19] for-each-ref: add '--merged' and '--no-merged' options

2015-06-22 Thread Eric Sunshine
On Sun, Jun 21, 2015 at 4:48 PM, Karthik Nayak karthik@gmail.com wrote:
 Add the '--merged' and '--no-merged' options provided by 'ref-filter'.
 The '--merged' option lets the user to only list refs merged into the
 named commit. The '--no-merged' option lets the user to only list refs
 not merged into the named commit.

 Add documentation and tests for the same.

 Based-on-patch-by: Jeff King p...@peff.net
 Mentored-by: Christian Couder christian.cou...@gmail.com
 Mentored-by: Matthieu Moy matthieu@grenoble-inp.fr
 Signed-off-by: Karthik Nayak karthik@gmail.com
 ---
 diff --git a/Documentation/git-for-each-ref.txt 
 b/Documentation/git-for-each-ref.txt
 index 0ede41d..c6dcd99 100644
 --- a/Documentation/git-for-each-ref.txt
 +++ b/Documentation/git-for-each-ref.txt
 @@ -10,7 +10,7 @@ SYNOPSIS
  [verse]
  'git for-each-ref' [--count=count] [--shell|--perl|--python|--tcl]
[(--sort=key)...] [--format=format] [pattern...]
 -  [--points-at object]
 +  [--points-at object] [(--merged | --no-merged) object]

According to the documentation you added to the OPTIONS section, the
object following --merged and --no-merged is optional. Therefore,
shouldn't this be s/object/[object]/ ?

Also, in the OPTIONS section, you spelled it commit rather than object.

  DESCRIPTION
  ---
 @@ -66,6 +66,14 @@ OPTIONS
  --points-at object::
 Only list refs pointing to the given object.

 +--merged [commit]::
 +   Only list refs whose tips are reachable from the
 +   specified commit (HEAD if not specified).
 +
 +--no-merged [commit]::
 +   Only list refs whose tips are not reachable from the
 +   specified commit (HEAD if not specified).
 +
  FIELD NAMES
  ---

 diff --git a/builtin/for-each-ref.c b/builtin/for-each-ref.c
 index 46f9b05..09d48da 100644
 --- a/builtin/for-each-ref.c
 +++ b/builtin/for-each-ref.c
 @@ -8,6 +8,7 @@
  static char const * const for_each_ref_usage[] = {
 N_(git for-each-ref [options] [pattern]),
 N_(git for-each-ref [--points-at object]),
 +   N_(git for-each-ref [(--merged | --no-merged) object]),

Ditto: s/object/[object]/

 NULL
  };
--
To unsubscribe from this list: send the line unsubscribe git in


Re: [PATCH 2/2] Move unsigned long option parsing out of pack-objects.c

2015-06-22 Thread Junio C Hamano
Charles Bailey char...@hashpling.org writes:

 On Sun, Jun 21, 2015 at 07:25:44PM +0100, Charles Bailey wrote:
 From: Charles Bailey cbaile...@bloomberg.net
 
 diff --git a/parse-options.c b/parse-options.c
 index 80106c0..101b649 100644
 --- a/parse-options.c
 +++ b/parse-options.c
 @@ -180,6 +180,23 @@ static int get_value(struct parse_opt_ctx_t *p,
  return opterror(opt, expects a numerical value, 
 flags);
  return 0;
  
 +case OPTION_MAGNITUDE:
 +if (unset) {
 +*(unsigned long *)opt-value = 0;
 +return 0;
 +}
 +if (opt-flags  PARSE_OPT_OPTARG  !p-opt) {
 +*(unsigned long *)opt-value = opt-defval;
 +return 0;
 +}
 +if (get_arg(p, opt, flags, arg))
 +return -1;
 +if (!git_parse_ulong(arg, opt-value))
 +return opterror(opt,
 +expects a integer value with an optional k/m/g 
 suffix,
 +flags);
 +return 0;
 +

 Spotted after sending:
 s/expects a integer/expects an integer/

Thanks.
--
To unsubscribe from this list: send the line unsubscribe git in


Re: [PATCH 8/7] cat-file: sort and de-dup output of --batch-all-objects

2015-06-22 Thread Charles Bailey
On Mon, Jun 22, 2015 at 07:06:32AM -0400, Jeff King wrote:
 On Mon, Jun 22, 2015 at 06:33:21AM -0400, Jeff King wrote:
 
  By the way, in addition to not showing objects in order,
  list-all-objects (and my cat-file option) may show duplicates. Do we
  want to sort -u for the user? It might be nice for them to always get
  a de-duped and sorted list. Aside from the CPU cost of sorting, it does
  mean we'll allocate ~80MB for the kernel to store the sha1s. I guess
  that's not too much when you are talking about the kernel repo. I took
  the coward's way out and just mentioned the limitation in the
  documentation, but I'm happy to be persuaded.
 
 The patch below does the sort/de-dup. I'd probably just squash it into
 patch 7, though.

Woah, 8 out of 7! Did you get a chance to measure the performance hit of
the sort? If not, I may test it out when I next get the chance.
--
To unsubscribe from this list: send the line unsubscribe git in


Re: [PATCH v4 04/19] for-each-ref: add '--points-at' option

2015-06-22 Thread Eric Sunshine
On Sun, Jun 21, 2015 at 4:48 PM, Karthik Nayak karthik@gmail.com wrote:
 Add the '--points-at' option provided by 'ref-filter'. The
 option lets the user to pick only refs which point to a particular
 commit.

 Add documentation and tests for the same.

 Based-on-patch-by: Jeff King p...@peff.net
 Mentored-by: Christian Couder christian.cou...@gmail.com
 Mentored-by: Matthieu Moy matthieu@grenoble-inp.fr
 Signed-off-by: Karthik Nayak karthik@gmail.com
 ---
 diff --git a/t/t6301-for-each-ref-filter.sh b/t/t6301-for-each-ref-filter.sh
 index b1fa8d4..67de3a7 100755
 --- a/t/t6301-for-each-ref-filter.sh
 +++ b/t/t6301-for-each-ref-filter.sh
 @@ -16,4 +16,24 @@ test_expect_success 'setup some history and refs' '
 git update-ref refs/odd/spot master
  '

 +test_expect_success 'filtering with --points-at' '
 +   cat expect -\EOF 
 +   refs/heads/master
 +   refs/odd/spot
 +   refs/tags/three
 +   EOF
 +   git for-each-ref --format=%(refname) --points-at=master actual 
 +   test_cmp expect actual
 +'
 +
 +test_expect_success 'check signed tags with --points-at' '
 +   cat expect -\EOF 
 +   refs/heads/side
 +   refs/tags/four
 +   refs/tags/signed-tag four
 +   EOF
 +   git for-each-ref  --format=%(refname) %(*subject) --points-at=side 
 actual 

s/for-each-ref\s+/for-each-ref /

 +   test_cmp expect actual
 +'
 +
  test_done
 --
 2.4.3.439.gfea0c2a.dirty
--
To unsubscribe from this list: send the line unsubscribe git in


Re: [PATCH 1/3] contrib/subtree: Use tabs consitently for indentation in tests

2015-06-22 Thread Eric Sunshine
On Mon, Jun 22, 2015 at 9:53 AM, Charles Bailey char...@hashpling.org wrote:
 contrib/subtree: Use tabs consitently for indentation in tests

s/consitently/consistently/

 Although subtrees tests uses more spaces for indentation than tabs,
 there are still quite a lot of lines indented with tabs. As tabs conform
 with Git coding guidelines resolve the inconsistency in favour of tabs.

 Signed-off-by: Charles Bailey cbaile...@bloomberg.net
--
To unsubscribe from this list: send the line unsubscribe git in


Re: Improvements to integer option parsing

2015-06-22 Thread Charles Bailey

 On 22 Jun 2015, at 23:09, Junio C Hamano gits...@pobox.com wrote:
 
 Charles Bailey char...@hashpling.org writes:
 
 - marginally improved the opterror message on failed parses
 
 I'd queue with s/a integer/a non-negative integer/.

Ha! That's what I had before I submitted, but then the source line got quite 
long (which could have been split) and the generated message got quite long as 
well so I cropped it. This was probably the source of the grammar mistake.

If you're happy with the longer message, I am happy with it too.

--
To unsubscribe from this list: send the line unsubscribe git in


Re: [PATCH 8/7] cat-file: sort and de-dup output of --batch-all-objects

2015-06-22 Thread Jeff King
On Mon, Jun 22, 2015 at 11:03:50PM +0100, Charles Bailey wrote:

  The patch below does the sort/de-dup. I'd probably just squash it into
  patch 7, though.
 
 Woah, 8 out of 7! Did you get a chance to measure the performance hit of
 the sort? If not, I may test it out when I next get the chance.

No, that last patch was my eh, one more thing before bed patch. ;)

It's easy enough to time, though. Running:

  git cat-file --batch-all-objects \
   --batch-check='%(objectsize) %(objectname)' \
   --buffer /dev/null

on linux.git, my best-of-five goes from (no sorting):

  real0m3.604s
  user0m3.556s
  sys 0m0.048s

to (with sorting):

  real0m4.053s
  user0m4.004s
  sys 0m0.052s

So it does matter, but not too much. We could de-dup with a hash table,
which might be a little faster, but I doubt it would make much
difference.  It's also mostly in sorted order already; it's possible
that a merge sort would behave a little better. I'm not sure how deep
it's worth going into that rabbit hole.

-Peff
--
To unsubscribe from this list: send the line unsubscribe git in


config commands not working _Noobe question

2015-06-22 Thread Greg Ledger
after adding git config ‹global user.name Greg Ledger and git config
‹global user.email gled...@glcdelivers.com, when I run:
source ~/.gitconfig I get

-bash: [user]: command not found
-bash: name: command not found
-bash: email: command not found
-bash: [color]: command not found
-bash: ui: command not found

What gives? (I had to download xcode tools before I could do my first git
init

Greg Ledger 
Web Developer  E-Marketing Specialist | GLC
847.205.3125 
gled...@glcdelivers.com
www.glcdelivers.com http://www.glcdelivers.com

--
To unsubscribe from this list: send the line unsubscribe git in


Re: Improvements to integer option parsing

2015-06-22 Thread Junio C Hamano
Charles Bailey char...@hashpling.org writes:

 This is a re-roll of the first two patches in my previous series which used to
 include filter-objects which is now a separate topic.

 [PATCH 1/2] Correct test-parse-options to handle negative ints

 The first one has changed only in that I've moved the additional test to a 
 more
 logical place in the test file.

 [PATCH 2/2] Move unsigned long option parsing out of pack-objects.c

 I've made the following changes to the second commit:

 - renamed this OPT_MAGNITUDE to try and convey something that is
 both unsigned and might benefit from a 'scale' suffix. I'm expecting
 more discussion on the name!

I think that name is very sensible.

 - marginally improved the opterror message on failed parses

I'd queue with s/a integer/a non-negative integer/.

 - noted the change in behavior for the error messages generated for
 pack-objects' --max-pack-size and --window-memory in the commit message

Thanks.  Queued.
--
To unsubscribe from this list: send the line unsubscribe git in


Re: [PATCH 2/2] Move unsigned long option parsing out of pack-objects.c

2015-06-22 Thread Junio C Hamano
Charles Bailey char...@hashpling.org writes:

 From: Charles Bailey cbaile...@bloomberg.net

 The unsigned long option parsing (including 'k'/'m'/'g' suffix parsing)
 is more widely applicable. Add support for OPT_MAGNITUDE to
 parse-options.h and change pack-objects.c use this support.

 The error behavior on parse errors follows that of OPT_INTEGER.
 The name of the option that failed to parse is reported with a brief
 message describing the expect format for the option argument and then
 the full usage message for the command invoked.

 This is differs from the previous behavior for OPT_ULONG used in

s/is //; (locally fixed--no need to resend).
--
To unsubscribe from this list: send the line unsubscribe git in


What's cooking in git.git (Jun 2015, #05; Mon, 22)

2015-06-22 Thread Junio C Hamano
Here are the topics that have been cooking.  Commits prefixed with
'-' are only in 'pu' (proposed updates) while commits prefixed with
'+' are in 'next'.

Some of the topics in flight have overlaps with each other and have
been excluded from 'pu'; most notably, I think the remainder of
bc/object-id needs to wait until the for-each-ref topic from Karthik
settles and then rebased on it, or something.

Hopefully the pre-release freeze can start late this week, and the
next cycle would reopen mid next month.  In the meantime, let's
shift the focus on making sure that what has already been merged to
'master' are good (i.e. regression hunting and fixes).

You can find the changes described here in the integration branches
of the repositories listed at

http://git-blame.blogspot.com/p/git-public-repositories.html

--
[New Topics]

* js/rebase-i-clean-up-upon-continue-to-skip (2015-06-18) 3 commits
 - rebase -i: do not leave a CHERRY_PICK_HEAD file behind
 - SQUASH: test_must_fail is a shell function
 - t3404: demonstrate CHERRY_PICK_HEAD bug

 Abandoning an already applied change in git rebase -i with
 --continue left CHERRY_PICK_HEAD and confused later steps.

 Expecting a reroll.
 ($gmane/271856)


* me/fetch-into-shallow-safety (2015-06-17) 1 commit
 - fetch-pack: check for shallow if depth given

 git fetch --depth=depth and git clone --depth=depth issued
 a shallow transfer request even to an upload-pack that does not
 support the capability.

 Will merge to 'next'.


* mm/describe-doc (2015-06-16) 1 commit
 - Documentation/describe: improve one-line summary

 Will merge to 'next'.


* rh/test-color-avoid-terminfo-in-original-home (2015-06-17) 2 commits
 - test-lib.sh: fix color support when tput needs ~/.terminfo
 - Revert test-lib.sh: do tests for color support after changing HOME

 An ancient test framework enhancement to allow color was not
 entirely correct; this makes it work even when tput needs to read
 from the ~/.terminfo under the user's real HOME directory.

 Will merge to 'next'.


* tb/checkout-doc (2015-06-17) 1 commit
 - git-checkout.txt: document git checkout pathspec better

 $gmane/271812


* jk/pretty-encoding-doc (2015-06-17) 1 commit
 - docs: clarify that --encoding can produce invalid sequences

 $gmane/271879


* ak/format-patch-odir-config (2015-06-19) 1 commit
 - format-patch: introduce format.outputDirectory configuration

 Reroll exists but didn't pick it up as it seems to be still
 collecting review comments.


* bc/gpg-verify-raw (2015-06-22) 7 commits
 - verify-tag: add option to print raw gpg status information
 - verify-commit: add option to print raw gpg status information
 - gpg: centralize printing signature buffers
 - gpg: centralize signature check
 - verify-commit: add test for exit status on untrusted signature
 - verify-tag: share code with verify-commit
 - verify-tag: add tests

 git verify-tag and git verify-commit have been taught to share
 more code, and then learned to optionally show the verification
 message from the underlying GPG implementation.

 Will merge to 'next'.


* cb/parse-magnitude (2015-06-22) 2 commits
 - parse-options: move unsigned long option parsing out of pack-objects.c
 - test-parse-options: update to handle negative ints

 Move machinery to parse human-readable scaled numbers like 1k, 4M,
 and 2G as an option parameter's value from pack-objects to
 parse-options API, to make it available to other codepaths.

 Will merge to 'next'.


* cb/subtree-tests-update (2015-06-22) 3 commits
 - contrib/subtree: small tidy-up to test
 - contrib/subtree: fix broken -chains and revealed test error
 - contrib/subtree: use tabs consitently for indentation in tests

 Will merge to 'next'.


* da/mergetool-winmerge (2015-06-19) 1 commit
 - mergetool-lib: fix default tool selection

 An earlier change already in 'master' broke the default tool
 selection for mergetool.

 Will merge to 'next'.


* jk/cat-file-batch-all (2015-06-22) 8 commits
 - cat-file: sort and de-dup output of --batch-all-objects
 - cat-file: add --batch-all-objects option
 - cat-file: split batch_one_object into two stages
 - cat-file: stop returning value from batch_one_object
 - cat-file: add --buffer option
 - cat-file: move batch_options definition to top of file
 - cat-file: minor style fix in options list
 - Merge branch 'jk/maint-for-each-packed-object' into jk/cat-file-batch-all
 (this branch uses jk/maint-for-each-packed-object.)

 cat-file learned --batch-all-objects option to enumerate all
 available objects in the repository more quickly than rev-list
 --all --objects (the output includes unreachable objects, though).



* jk/maint-for-each-packed-object (2015-06-22) 1 commit
 - for_each_packed_object: automatically open pack index
 (this branch is used by jk/cat-file-batch-all.)

 The for_each_packed_object() API function did not iterate over
 objects in a packfile that hasn't been used yet.

 Will merge to 'next'.



Re: [PATCH] Add list-all-objects command

2015-06-22 Thread Jeff King
On Mon, Jun 22, 2015 at 02:50:10PM -0700, Junio C Hamano wrote:

  We may want to take patch 1 separately for the maint-track, as it is
  really a bug-fix (albeit one that I do not think actually affects anyone
  in practice right now).
 
 Hmph, add_unseen_recent_objects_to_traversal() is the only existing
 user, and before d3038d22 (prune: keep objects reachable from recent
 objects, 2014-10-15) added that function, for-each-packed-object
 existed but had no callers.

I think that is because it was added by d3038d22^. :)

 And the objects not beeing seen by that function (due to lack of
 open) would matter only for pruning purposes, which would mean
 you have to be calling into the codepath when running a full repack,
 so you would have opened all the packs that matter anyway (if you
 have a old cruft archive pack that contains only objects that
 are unreachable, you may not have opened that pack, though, and you
 may prune the thing away prematurely).

Exactly, that matches my analysis.

 So, I think I can agree that this would unlikely affect anybody in
 practice.

Yep. I am OK if we do not even worry about it for maint, then.

-Peff
--
To unsubscribe from this list: send the line unsubscribe git in


RFC/Pull Request: Refs db backend

2015-06-22 Thread David Turner
I've revived and modified Ronnie Sahlberg's work on the refs db
backend.  

The work is on top of be3c13e5564, Junio's First batch for 2.5 cycle.
I recognize that there have been changes to the refs code since then,
and that there are some further changes in-flight from e.g. Michael
Haggerty.  If there is interest in this, I can rebase once Michael's
changes land.

The changes can be found here:
https://github.com/dturner-tw/git.git on the dturner/pluggable-backends
branch

The db backend code was added in the penultimate commit; the rest is
just code rearrangement and minor changes to make alternate backends
possible.  There ended up being a fair amount of this rearrangement, but
the end result is that almost the entire git test suite runs under the
db backend without error (see below for details).

The db backend runs git for-each-ref about 30% faster than the files
backend with fully-packed refs on a repo with ~120k refs.  It's also
about 4x faster than using fully-unpacked refs.  In addition, and
perhaps more importantly, it avoids case-conflict issues on OS X.

I chose to use LMDB for the database.  LMDB has a few features that make
it suitable for usage in git:

1. It is relatively lightweight; it requires only one header file, and
the library itself is under 300k (as opposed to 700k for
e.g. sqlite).

2. It is well-tested: it's been used in OpenLDAP for years.

3. It's very fast.  LMDB's benchmarks show that it is among
the fastest key-value stores.

4. It has a relatively simple concurrency story; readers don't
block writers and writers don't block readers.

Ronnie Sahlberg's original version of this patchset used tdb.  The
advantage of tdb is that it's smaller (~125k).  The disadvantages are
that tdb is hard to build on OS X.  It's also not in homebrew.  So lmdb
seemed simpler.

To test this backend's correctness, I hacked test-lib.sh and
test-lib-functions.sh to run all tests under the refs backend. Dozens
of tests use manual ref/reflog reading/writing, or create submodules
without passing --refs-backend-type to git init.  If those tests are
changed to use the update-ref machinery or test-refs-be-db (or, in the
case of packed-refs, corrupt refs, and dumb fetch tests, are skipped),
the only remaining failing tests are the git-new-workdir tests and the
gitweb tests.  

Please let me know how it would be best to proceed. 

--
To unsubscribe from this list: send the line unsubscribe git in


Re: 'eol' documentation confusion

2015-06-22 Thread Torsten Bögershausen

On 06/21/2015 04:16 PM, Robert Dailey wrote:

On Sun, Jun 21, 2015 at 9:04 AM, Robert Dailey rcdailey.li...@gmail.com wrote:

Upon inspection of the gitattributes documentation page here:
https://git-scm.com/docs/gitattributes

When comparing the documentation for 'text' with 'eol', I see the
following missing explanations for 'eol':

* eol
* -eol

Maybe the fact that these are missing means they are not valid to use.
There is also the issue that `text` usually controls EOL anyway. Is
there ever any reason to set eol in a way differently than explained
in the documentation (that is, `eol=lf` or `eol=crlf`)?

No.
eol=lf or eol=crlf are the only useful settings.
Everything else is ignored because it does not make sense.

See convert.c:
static enum eol git_path_check_eol()


--
To unsubscribe from this list: send the line unsubscribe git in


Re: Sporadic test failures on OSX 10.10.3

2015-06-22 Thread Eric Sunshine
On Sat, Jun 20, 2015 at 5:47 AM, Heiko Voigt hvo...@hvoigt.net wrote:
 I am currently experiencing sporadic test failures on Mac OS X 10.10.3:

 Test Summary Report
 ---
 t7503-pre-commit-hook.sh (Wstat: 256 Tests: 11 
 Failed: 1)
   Failed test:  9
   Non-zero exit status: 1
 t7502-commit.sh  (Wstat: 256 Tests: 64 
 Failed: 1)
   Failed test:  59
   Non-zero exit status: 1
 t7407-submodule-foreach.sh   (Wstat: 256 Tests: 17 
 Failed: 1)
   Failed test:  14
   Non-zero exit status: 1
 t7406-submodule-update.sh(Wstat: 256 Tests: 43 
 Failed: 2)
   Failed tests:  36-37
   Non-zero exit status: 1
 Files=702, Tests=12559, 618 wallclock secs ( 6.56 usr  2.08 sys + 716.50 cusr 
 932.03 csys = 1657.17 CPU)
 Result: FAIL
 make[1]: *** [prove] Error 1
 make: *** [test] Error 2

 When I execute the tests individually they succeed. Sometimes running the
 testsuite succeeds sometimes it does not. For example now running the
 testsuite the first and second time showed no failures. Only the third run
 revealed the above.

 Is anyone else experiencing this as well? I am seeing this on Junios
 master (f86f31ab33c3). The list of failed tests varies sometimes.

Confirmed.
--
To unsubscribe from this list: send the line unsubscribe git in


[PATCH 6/7] cat-file: split batch_one_object into two stages

2015-06-22 Thread Jeff King
There are really two things going on in this function:

  1. We convert the name we got on stdin to a sha1.

  2. We look up and print information on the sha1.

Let's split out the second half so that we can call it
separately.

Signed-off-by: Jeff King p...@peff.net
---
 builtin/cat-file.c | 39 +++
 1 file changed, 23 insertions(+), 16 deletions(-)

diff --git a/builtin/cat-file.c b/builtin/cat-file.c
index 7d99c15..499ccda 100644
--- a/builtin/cat-file.c
+++ b/builtin/cat-file.c
@@ -251,10 +251,31 @@ static void print_object_or_die(struct batch_options 
*opt, struct expand_data *d
}
 }
 
+static void batch_object_write(const char *obj_name, struct batch_options *opt,
+  struct expand_data *data)
+{
+   struct strbuf buf = STRBUF_INIT;
+
+   if (sha1_object_info_extended(data-sha1, data-info, 
LOOKUP_REPLACE_OBJECT)  0) {
+   printf(%s missing\n, obj_name);
+   fflush(stdout);
+   return;
+   }
+
+   strbuf_expand(buf, opt-format, expand_format, data);
+   strbuf_addch(buf, '\n');
+   batch_write(opt, buf.buf, buf.len);
+   strbuf_release(buf);
+
+   if (opt-print_contents) {
+   print_object_or_die(opt, data);
+   batch_write(opt, \n, 1);
+   }
+}
+
 static void batch_one_object(const char *obj_name, struct batch_options *opt,
 struct expand_data *data)
 {
-   struct strbuf buf = STRBUF_INIT;
struct object_context ctx;
int flags = opt-follow_symlinks ? GET_SHA1_FOLLOW_SYMLINKS : 0;
enum follow_symlinks_result result;
@@ -294,21 +315,7 @@ static void batch_one_object(const char *obj_name, struct 
batch_options *opt,
return;
}
 
-   if (sha1_object_info_extended(data-sha1, data-info, 
LOOKUP_REPLACE_OBJECT)  0) {
-   printf(%s missing\n, obj_name);
-   fflush(stdout);
-   return;
-   }
-
-   strbuf_expand(buf, opt-format, expand_format, data);
-   strbuf_addch(buf, '\n');
-   batch_write(opt, buf.buf, buf.len);
-   strbuf_release(buf);
-
-   if (opt-print_contents) {
-   print_object_or_die(opt, data);
-   batch_write(opt, \n, 1);
-   }
+   batch_object_write(obj_name, opt, data);
 }
 
 static int batch_objects(struct batch_options *opt)
-- 
2.4.4.719.g3984bc6

--
To unsubscribe from this list: send the line unsubscribe git in


Re: [PATCH] Add list-all-objects command

2015-06-22 Thread Jeff King
On Mon, Jun 22, 2015 at 04:38:22AM -0400, Jeff King wrote:

  +   prepare_packed_git();
  +   for (p = packed_git; p; p = p-next) {
  +   open_pack_index(p);
  +   }
 
 Yikes. The fact that you need to do this means that
 for_each_packed_object is buggy, IMHO. I'll send a patch.

Here's that patch. And since I did not want to pile work on Charles, I
went ahead and just implemented the patches I suggested in the other
email.

We may want to take patch 1 separately for the maint-track, as it is
really a bug-fix (albeit one that I do not think actually affects anyone
in practice right now).

Patches 2-5 are useful even if we go with Charles' command, as they make
cat-file better (cleanups and he new buffer option).

Patches 6-7 implement the cat-file option that would be redundant with
list-all-objects.

By the way, in addition to not showing objects in order,
list-all-objects (and my cat-file option) may show duplicates. Do we
want to sort -u for the user? It might be nice for them to always get
a de-duped and sorted list. Aside from the CPU cost of sorting, it does
mean we'll allocate ~80MB for the kernel to store the sha1s. I guess
that's not too much when you are talking about the kernel repo. I took
the coward's way out and just mentioned the limitation in the
documentation, but I'm happy to be persuaded.

  [1/7]: for_each_packed_object: automatically open pack index
  [2/7]: cat-file: minor style fix in options list
  [3/7]: cat-file: move batch_options definition to top of file
  [4/7]: cat-file: add --buffer option
  [5/7]: cat-file: stop returning value from batch_one_object
  [6/7]: cat-file: split batch_one_object into two stages
  [7/7]: cat-file: add --batch-all-objects option

-Peff
--
To unsubscribe from this list: send the line unsubscribe git in


[PATCH 1/7] for_each_packed_object: automatically open pack index

2015-06-22 Thread Jeff King
When for_each_packed_object is called, we call
prepare_packed_git() to make sure we have the actual list of
packs. But the latter does not actually open the pack
indices, meaning that pack-nr_objects may simply be 0 if
the pack has not otherwise been used since the program
started.

In practice, this didn't come up for the current callers,
because they iterate the packed objects only after iterating
all reachable objects (so for it to matter you would have to
have a pack consisting only of unreachable objects). But it
is a dangerous and confusing interface that should be fixed
for future callers.

Note that we do not end the iteration when a pack cannot be
opened, but we do return an error. That lets you complete
the iteration even in actively-repacked repository where an
.idx file may racily go away, but it also lets callers know
that they may not have gotten the complete list (which the
current reachability-check caller does care about).

We have to tweak one of the prune tests due to the changed
return value; an earlier test creates bogus .idx files and
does not clean them up. Having to make this tweak is a good
thing; it means we will not prune in a broken repository,
and the test confirms that we do not negatively impact a
more lenient caller, count-objects.

Signed-off-by: Jeff King p...@peff.net
---
 sha1_file.c  | 7 ++-
 t/t5304-prune.sh | 1 +
 2 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/sha1_file.c b/sha1_file.c
index 5038475..f1f0efb 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -3573,14 +3573,19 @@ int for_each_packed_object(each_packed_object_fn cb, 
void *data, unsigned flags)
 {
struct packed_git *p;
int r = 0;
+   int pack_errors = 0;
 
prepare_packed_git();
for (p = packed_git; p; p = p-next) {
if ((flags  FOR_EACH_OBJECT_LOCAL_ONLY)  !p-pack_local)
continue;
+   if (open_pack_index(p)) {
+   pack_errors = 1;
+   continue;
+   }
r = for_each_object_in_pack(p, cb, data);
if (r)
break;
}
-   return r;
+   return r ? r : pack_errors;
 }
diff --git a/t/t5304-prune.sh b/t/t5304-prune.sh
index 0794d33..023d7c6 100755
--- a/t/t5304-prune.sh
+++ b/t/t5304-prune.sh
@@ -218,6 +218,7 @@ test_expect_success 'gc: prune old objects after local 
clone' '
 '
 
 test_expect_success 'garbage report in count-objects -v' '
+   test_when_finished rm -f .git/objects/pack/fake* 
: .git/objects/pack/foo 
: .git/objects/pack/foo.bar 
: .git/objects/pack/foo.keep 
-- 
2.4.4.719.g3984bc6

--
To unsubscribe from this list: send the line unsubscribe git in


[PATCH 3/7] cat-file: move batch_options definition to top of file

2015-06-22 Thread Jeff King
That way all of the functions can make use of it.

Signed-off-by: Jeff King p...@peff.net
---
 builtin/cat-file.c | 13 +++--
 1 file changed, 7 insertions(+), 6 deletions(-)

diff --git a/builtin/cat-file.c b/builtin/cat-file.c
index 6cb..d4101b7 100644
--- a/builtin/cat-file.c
+++ b/builtin/cat-file.c
@@ -10,6 +10,13 @@
 #include streaming.h
 #include tree-walk.h
 
+struct batch_options {
+   int enabled;
+   int follow_symlinks;
+   int print_contents;
+   const char *format;
+};
+
 static int cat_one_file(int opt, const char *exp_type, const char *obj_name,
int unknown_type)
 {
@@ -232,12 +239,6 @@ static void print_object_or_die(int fd, struct expand_data 
*data)
}
 }
 
-struct batch_options {
-   int enabled;
-   int follow_symlinks;
-   int print_contents;
-   const char *format;
-};
 
 static int batch_one_object(const char *obj_name, struct batch_options *opt,
struct expand_data *data)
-- 
2.4.4.719.g3984bc6

--
To unsubscribe from this list: send the line unsubscribe git in


[PATCH 2/7] cat-file: minor style fix in options list

2015-06-22 Thread Jeff King
We do not put extra whitespace before the first macro
argument.

Signed-off-by: Jeff King p...@peff.net
---
 builtin/cat-file.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/builtin/cat-file.c b/builtin/cat-file.c
index 049a95f..6cb 100644
--- a/builtin/cat-file.c
+++ b/builtin/cat-file.c
@@ -412,7 +412,7 @@ int cmd_cat_file(int argc, const char **argv, const char 
*prefix)
OPT_CMDMODE('p', NULL, opt, N_(pretty-print object's 
content), 'p'),
OPT_CMDMODE(0, textconv, opt,
N_(for blob objects, run textconv on object's 
content), 'c'),
-   OPT_BOOL( 0, allow-unknown-type, unknown_type,
+   OPT_BOOL(0, allow-unknown-type, unknown_type,
  N_(allow -s and -t to work with broken/corrupt 
objects)),
{ OPTION_CALLBACK, 0, batch, batch, format,
N_(show info and content of objects fed from the 
standard input),
-- 
2.4.4.719.g3984bc6

--
To unsubscribe from this list: send the line unsubscribe git in


[PATCH 8/7] cat-file: sort and de-dup output of --batch-all-objects

2015-06-22 Thread Jeff King
On Mon, Jun 22, 2015 at 06:33:21AM -0400, Jeff King wrote:

 By the way, in addition to not showing objects in order,
 list-all-objects (and my cat-file option) may show duplicates. Do we
 want to sort -u for the user? It might be nice for them to always get
 a de-duped and sorted list. Aside from the CPU cost of sorting, it does
 mean we'll allocate ~80MB for the kernel to store the sha1s. I guess
 that's not too much when you are talking about the kernel repo. I took
 the coward's way out and just mentioned the limitation in the
 documentation, but I'm happy to be persuaded.

The patch below does the sort/de-dup. I'd probably just squash it into
patch 7, though.

I did have one additional thought, though. We are treating this as two
separate operations: what are the sha1s in the repo and show me
information about this sha1. But by integrating with cat-file, we could
actually show information not just about a particular sha1, but about a
particular on-disk object.

E.g., if there are duplicates of a particular object, some formatters
like %(objectsize:disk) and %(deltabase) pick one arbitrarily to
show. I don't know if anybody actually cares about that in practice, but
if we show duplicates, we could give the accurate information for each
instance (and in fact we could give other information like loose vs
packed, which file contains the object, etc).

I tend to think that the lack of de-duping is sufficiently confusing
that it should be the default, and we can always add a no really, show
me the duplicates option later. It is not as simple as skipping the
de-dup step. We'd have to actually avoid calling sha1_object_info, and
use the information found in the loose/pack traversal (which would in
turn require exposing the low-level bits of sha1_object_info).

-- 8 --
Subject: cat-file: sort and de-dup output of --batch-all-objects

The sorting we could probably live without, but printing
duplicates is just a hassle for the user, who must then
de-dup themselves (or risk a wrong answer if they are doing
something like counting objects with a particular property).

Signed-off-by: Jeff King p...@peff.net
---
 Documentation/git-cat-file.txt |  3 +--
 builtin/cat-file.c | 22 +++---
 t/t1006-cat-file.sh|  3 +--
 3 files changed, 17 insertions(+), 11 deletions(-)

diff --git a/Documentation/git-cat-file.txt b/Documentation/git-cat-file.txt
index 6831b08..3105fc0 100644
--- a/Documentation/git-cat-file.txt
+++ b/Documentation/git-cat-file.txt
@@ -74,8 +74,7 @@ OPTIONS
requested batch operation on all objects in the repository and
any alternate object stores (not just reachable objects).
Requires `--batch` or `--batch-check` be specified. Note that
-   the order of the objects is unspecified, and there may be
-   duplicate entries.
+   the objects are visited in order sorted by their hashes.
 
 --buffer::
Normally batch output is flushed after each object is output, so
diff --git a/builtin/cat-file.c b/builtin/cat-file.c
index 95604c4..07baad1 100644
--- a/builtin/cat-file.c
+++ b/builtin/cat-file.c
@@ -9,6 +9,7 @@
 #include userdiff.h
 #include streaming.h
 #include tree-walk.h
+#include sha1-array.h
 
 struct batch_options {
int enabled;
@@ -324,19 +325,19 @@ struct object_cb_data {
struct expand_data *expand;
 };
 
-static int batch_object_cb(const unsigned char *sha1,
-  struct object_cb_data *data)
+static void batch_object_cb(const unsigned char sha1[20], void *vdata)
 {
+   struct object_cb_data *data = vdata;
hashcpy(data-expand-sha1, sha1);
batch_object_write(NULL, data-opt, data-expand);
-   return 0;
 }
 
 static int batch_loose_object(const unsigned char *sha1,
  const char *path,
  void *data)
 {
-   return batch_object_cb(sha1, data);
+   sha1_array_append(data, sha1);
+   return 0;
 }
 
 static int batch_packed_object(const unsigned char *sha1,
@@ -344,7 +345,8 @@ static int batch_packed_object(const unsigned char *sha1,
   uint32_t pos,
   void *data)
 {
-   return batch_object_cb(sha1, data);
+   sha1_array_append(data, sha1);
+   return 0;
 }
 
 static int batch_objects(struct batch_options *opt)
@@ -375,11 +377,17 @@ static int batch_objects(struct batch_options *opt)
data.info.typep = data.type;
 
if (opt-all_objects) {
+   struct sha1_array sa = SHA1_ARRAY_INIT;
struct object_cb_data cb;
+
+   for_each_loose_object(batch_loose_object, sa, 0);
+   for_each_packed_object(batch_packed_object, sa, 0);
+
cb.opt = opt;
cb.expand = data;
-   for_each_loose_object(batch_loose_object, cb, 0);
-   for_each_packed_object(batch_packed_object, cb, 0);
+   sha1_array_for_each_unique(sa, 

Re: [PATCH v4 05/19] ref-filter: add parse_opt_merge_filter()

2015-06-22 Thread Matthieu Moy
Junio C Hamano gits...@pobox.com writes:

 This is copied from 'builtin/branch.c' which will eventually be removed
 when we port 'branch.c' to use ref-filter APIs.

 Hmph. I somehow thought Matthieu's instruction was to finish tag.c
 side first

I would call in advice rather than instruction. I still think we
should prioritize the tag.c side, but if this patch is ready, it makes
sense to keep it in the series.

-- 
Matthieu Moy
http://www-verimag.imag.fr/~moy/
--
To unsubscribe from this list: send the line unsubscribe git in


[PATCH 7/7] cat-file: add --batch-all-objects option

2015-06-22 Thread Jeff King
It can sometimes be useful to examine all objects in the
repository. Normally this is done with git rev-list --all
--objects, but:

  1. That shows only reachable objects. You may want to look
 at all available objects.

  2. It's slow. We actually open each object to walk the
 graph. If your operation is OK with seeing unreachable
 objects, it's an order of magnitude faster to just
 enumerate the loose directories and pack indices.

You can do this yourself using ls and git show-index,
but it's non-obvious.  This patch adds an option to
cat-file --batch-check to operate on all available
objects (rather than reading names from stdin).

This is based on a proposal by Charles Bailey to provide a
separate git list-all-objects command. That is more
orthogonal, as it splits enumerating the objects from
getting information about them. However, in practice you
will either:

  a. Feed the list of objects directly into cat-file anyway,
 so you can find out information about them. Keeping it
 in a single process is more efficient.

  b. Ask the listing process to start telling you more
 information about the objects, in which case you will
 reinvent cat-file's batch-check formatter.

Adding a cat-file option is simple and efficient. And if you
really do want just the object names, you can always do:

  git cat-file --batch-check='%(objectname)' --batch-all-objects

Signed-off-by: Jeff King p...@peff.net
---
 Documentation/git-cat-file.txt |  8 
 builtin/cat-file.c | 44 --
 t/t1006-cat-file.sh| 27 ++
 3 files changed, 77 insertions(+), 2 deletions(-)

diff --git a/Documentation/git-cat-file.txt b/Documentation/git-cat-file.txt
index 0058bd4..6831b08 100644
--- a/Documentation/git-cat-file.txt
+++ b/Documentation/git-cat-file.txt
@@ -69,6 +69,14 @@ OPTIONS
not be combined with any other options or arguments.  See the
section `BATCH OUTPUT` below for details.
 
+--batch-all-objects::
+   Instead of reading a list of objects on stdin, perform the
+   requested batch operation on all objects in the repository and
+   any alternate object stores (not just reachable objects).
+   Requires `--batch` or `--batch-check` be specified. Note that
+   the order of the objects is unspecified, and there may be
+   duplicate entries.
+
 --buffer::
Normally batch output is flushed after each object is output, so
that a process can interactively read and write from
diff --git a/builtin/cat-file.c b/builtin/cat-file.c
index 499ccda..95604c4 100644
--- a/builtin/cat-file.c
+++ b/builtin/cat-file.c
@@ -15,6 +15,7 @@ struct batch_options {
int follow_symlinks;
int print_contents;
int buffer_output;
+   int all_objects;
const char *format;
 };
 
@@ -257,7 +258,7 @@ static void batch_object_write(const char *obj_name, struct 
batch_options *opt,
struct strbuf buf = STRBUF_INIT;
 
if (sha1_object_info_extended(data-sha1, data-info, 
LOOKUP_REPLACE_OBJECT)  0) {
-   printf(%s missing\n, obj_name);
+   printf(%s missing\n, obj_name ? obj_name : 
sha1_to_hex(data-sha1));
fflush(stdout);
return;
}
@@ -318,6 +319,34 @@ static void batch_one_object(const char *obj_name, struct 
batch_options *opt,
batch_object_write(obj_name, opt, data);
 }
 
+struct object_cb_data {
+   struct batch_options *opt;
+   struct expand_data *expand;
+};
+
+static int batch_object_cb(const unsigned char *sha1,
+  struct object_cb_data *data)
+{
+   hashcpy(data-expand-sha1, sha1);
+   batch_object_write(NULL, data-opt, data-expand);
+   return 0;
+}
+
+static int batch_loose_object(const unsigned char *sha1,
+ const char *path,
+ void *data)
+{
+   return batch_object_cb(sha1, data);
+}
+
+static int batch_packed_object(const unsigned char *sha1,
+  struct packed_git *pack,
+  uint32_t pos,
+  void *data)
+{
+   return batch_object_cb(sha1, data);
+}
+
 static int batch_objects(struct batch_options *opt)
 {
struct strbuf buf = STRBUF_INIT;
@@ -345,6 +374,15 @@ static int batch_objects(struct batch_options *opt)
if (opt-print_contents)
data.info.typep = data.type;
 
+   if (opt-all_objects) {
+   struct object_cb_data cb;
+   cb.opt = opt;
+   cb.expand = data;
+   for_each_loose_object(batch_loose_object, cb, 0);
+   for_each_packed_object(batch_packed_object, cb, 0);
+   return 0;
+   }
+
/*
 * We are going to call get_sha1 on a potentially very large number of
 * objects. In most large cases, these will be actual object sha1s. The
@@ -436,6 

[PATCH 4/7] cat-file: add --buffer option

2015-06-22 Thread Jeff King
We use a direct write() to output the results of --batch and
--batch-check. This is good for processes feeding the input
and reading the output interactively, but it introduces
measurable overhead if you do not want this feature. For
example, on linux.git:

  $ git rev-list --objects --all | cut -d' ' -f1 objects
  $ time git cat-file --batch-check='%(objectsize)' \
  objects /dev/null
  real0m5.440s
  user0m5.060s
  sys 0m0.384s

This patch adds an option to use regular stdio buffering:

  $ time git cat-file --batch-check='%(objectsize)' \
  --buffer objects /dev/null
  real0m4.975s
  user0m4.888s
  sys 0m0.092s

Signed-off-by: Jeff King p...@peff.net
---
This selectively uses fwrite or write_or_die, depending on the buffer
setting. Another option would be to just always use fwrite(), and then
selectively fflush(). It feels kind of wasteful in the non-buffered
case, as it's just another layer to write through. OTOH, the cost of
writing a line into the buffer only to flush is probably dwarfed by the
system call of actually flushing.

If we went that direction, we could probably simplify the code a bit
(both getting rid of the batch_write function I call here, and dropping
a bunch of existing fflush() calls, where we must flush any time we use
printf for its formatting capabilities).

I also considered that the --buffer case is likely to be the common
one. We cannot flip the default, though, as it would break any existing
callers (who would need to specify --no-buffer). We can do the usual
deprecation dance, but I don't know if it is worth it for a plumbing
command like this.

 Documentation/git-cat-file.txt |  7 +++
 builtin/cat-file.c | 26 +++---
 2 files changed, 26 insertions(+), 7 deletions(-)

diff --git a/Documentation/git-cat-file.txt b/Documentation/git-cat-file.txt
index 319ab4c..0058bd4 100644
--- a/Documentation/git-cat-file.txt
+++ b/Documentation/git-cat-file.txt
@@ -69,6 +69,13 @@ OPTIONS
not be combined with any other options or arguments.  See the
section `BATCH OUTPUT` below for details.
 
+--buffer::
+   Normally batch output is flushed after each object is output, so
+   that a process can interactively read and write from
+   `cat-file`. With this option, the output uses normal stdio
+   buffering; this is much more efficient when invoking
+   `--batch-check` on a large number of objects.
+
 --allow-unknown-type::
Allow -s or -t to query broken/corrupt objects of unknown type.
 
diff --git a/builtin/cat-file.c b/builtin/cat-file.c
index d4101b7..741e100 100644
--- a/builtin/cat-file.c
+++ b/builtin/cat-file.c
@@ -14,6 +14,7 @@ struct batch_options {
int enabled;
int follow_symlinks;
int print_contents;
+   int buffer_output;
const char *format;
 };
 
@@ -211,14 +212,25 @@ static size_t expand_format(struct strbuf *sb, const char 
*start, void *data)
return end - start + 1;
 }
 
-static void print_object_or_die(int fd, struct expand_data *data)
+static void batch_write(struct batch_options *opt, const void *data, int len)
+{
+   if (opt-buffer_output) {
+   if (fwrite(data, 1, len, stdout) != len)
+   die_errno(unable to write to stdout);
+   } else
+   write_or_die(1, data, len);
+}
+
+static void print_object_or_die(struct batch_options *opt, struct expand_data 
*data)
 {
const unsigned char *sha1 = data-sha1;
 
assert(data-info.typep);
 
if (data-type == OBJ_BLOB) {
-   if (stream_blob_to_fd(fd, sha1, NULL, 0)  0)
+   if (opt-buffer_output)
+   fflush(stdout);
+   if (stream_blob_to_fd(1, sha1, NULL, 0)  0)
die(unable to stream %s to stdout, sha1_to_hex(sha1));
}
else {
@@ -234,12 +246,11 @@ static void print_object_or_die(int fd, struct 
expand_data *data)
if (data-info.sizep  size != data-size)
die(object %s changed size!?, sha1_to_hex(sha1));
 
-   write_or_die(fd, contents, size);
+   batch_write(opt, contents, size);
free(contents);
}
 }
 
-
 static int batch_one_object(const char *obj_name, struct batch_options *opt,
struct expand_data *data)
 {
@@ -294,12 +305,12 @@ static int batch_one_object(const char *obj_name, struct 
batch_options *opt,
 
strbuf_expand(buf, opt-format, expand_format, data);
strbuf_addch(buf, '\n');
-   write_or_die(1, buf.buf, buf.len);
+   batch_write(opt, buf.buf, buf.len);
strbuf_release(buf);
 
if (opt-print_contents) {
-   print_object_or_die(1, data);
-   write_or_die(1, \n, 1);
+   print_object_or_die(opt, data);
+   batch_write(opt, \n, 1);
}
return 0;
 }
@@ -415,6 +426,7 @@ int cmd_cat_file(int 

[PATCH 5/7] cat-file: stop returning value from batch_one_object

2015-06-22 Thread Jeff King
If batch_one_object returns an error code, we stop reading
input.  However, it will only do so if we feed it NULL,
which cannot happen; we give it the buf member of a
strbuf, which is always non-NULL.

We did originally stop on other errors (like a missing
object), but this was changed in 3c076db (cat-file --batch /
--batch-check: do not exit if hashes are missing,
2008-06-09). These days we keep going for any per-object
error (and print missing when necessary).

Signed-off-by: Jeff King p...@peff.net
---
 builtin/cat-file.c | 18 ++
 1 file changed, 6 insertions(+), 12 deletions(-)

diff --git a/builtin/cat-file.c b/builtin/cat-file.c
index 741e100..7d99c15 100644
--- a/builtin/cat-file.c
+++ b/builtin/cat-file.c
@@ -251,17 +251,14 @@ static void print_object_or_die(struct batch_options 
*opt, struct expand_data *d
}
 }
 
-static int batch_one_object(const char *obj_name, struct batch_options *opt,
-   struct expand_data *data)
+static void batch_one_object(const char *obj_name, struct batch_options *opt,
+struct expand_data *data)
 {
struct strbuf buf = STRBUF_INIT;
struct object_context ctx;
int flags = opt-follow_symlinks ? GET_SHA1_FOLLOW_SYMLINKS : 0;
enum follow_symlinks_result result;
 
-   if (!obj_name)
-  return 1;
-
result = get_sha1_with_context(obj_name, flags, data-sha1, ctx);
if (result != FOUND) {
switch (result) {
@@ -286,7 +283,7 @@ static int batch_one_object(const char *obj_name, struct 
batch_options *opt,
break;
}
fflush(stdout);
-   return 0;
+   return;
}
 
if (ctx.mode == 0) {
@@ -294,13 +291,13 @@ static int batch_one_object(const char *obj_name, struct 
batch_options *opt,
   (uintmax_t)ctx.symlink_path.len,
   ctx.symlink_path.buf);
fflush(stdout);
-   return 0;
+   return;
}
 
if (sha1_object_info_extended(data-sha1, data-info, 
LOOKUP_REPLACE_OBJECT)  0) {
printf(%s missing\n, obj_name);
fflush(stdout);
-   return 0;
+   return;
}
 
strbuf_expand(buf, opt-format, expand_format, data);
@@ -312,7 +309,6 @@ static int batch_one_object(const char *obj_name, struct 
batch_options *opt,
print_object_or_die(opt, data);
batch_write(opt, \n, 1);
}
-   return 0;
 }
 
 static int batch_objects(struct batch_options *opt)
@@ -367,9 +363,7 @@ static int batch_objects(struct batch_options *opt)
data.rest = p;
}
 
-   retval = batch_one_object(buf.buf, opt, data);
-   if (retval)
-   break;
+   batch_one_object(buf.buf, opt, data);
}
 
strbuf_release(buf);
-- 
2.4.4.719.g3984bc6

--
To unsubscribe from this list: send the line unsubscribe git in


Re: [PATCH] Add list-all-objects command

2015-06-22 Thread Jeff King
On Sun, Jun 21, 2015 at 08:20:31PM +0100, Charles Bailey wrote:

 +OPTIONS
 +---
 +
 +-v::
 +--verbose::
 + Output in the followin format instead of just printing object ids:
 + sha1 SP type SP size

s/followin/g/

 +int cmd_list_all_objects(int argc, const char **argv, const char *prefix)
 +{
 + struct packed_git *p;
 +
 + argc = parse_options(argc, argv, prefix, builtin_filter_objects_options,
 +  NULL, 0);
 +
 + for_each_loose_object(check_loose_object, NULL, 0);
 +
 + prepare_packed_git();
 + for (p = packed_git; p; p = p-next) {
 + open_pack_index(p);
 + }

Yikes. The fact that you need to do this means that
for_each_packed_object is buggy, IMHO. I'll send a patch.

-Peff
--
To unsubscribe from this list: send the line unsubscribe git in


Re: [PATCH] Add list-all-objects command

2015-06-22 Thread Duy Nguyen
On Mon, Jun 22, 2015 at 2:20 AM, Charles Bailey char...@hashpling.org wrote:
 From: Charles Bailey cbaile...@bloomberg.net

 list-all-objects is a command to print the ids of all objects in the
 object database of a repository. It is designed as a low overhead
 interface for scripts that want to analyse all objects but don't require
 the ordering implied by a revision walk.

 It will list all objects, loose and packed, and will include unreachable
 objects.

Nit picking, but perhaps we should allow to select object source:
loose, packed, alternates.. These info are available now and cheap to
get. It's ok not to do it now though.

Personally I would name this command find-objects (after unix
command find) where we could still filter objects _not_ based on
object content.
-- 
Duy
--
To unsubscribe from this list: send the line unsubscribe git in


Re: broken repo after power cut

2015-06-22 Thread Richard Weinberger
Am 22.06.2015 um 02:35 schrieb Theodore Ts'o:
 On Sun, Jun 21, 2015 at 03:07:41PM +0200, Richard Weinberger wrote:
 
 I was then shocked to learn that ext4 apparently has a default
 setting that allows it to truncate files upon power failure
 (something about a full journal vs a fast journal or some such)
 
 s/ext4/all modern file systems/
 
 POSIX makes **no guarantees** about what happens after a power failure
 unless you use fsync() --- which git does not do by default (see below).

Thanks for pointing this out.

 The bottome lins is that if you care about files being written, you
 need to use fsync().  Should git use fsync() by default?  Well, if you
 are willing to accept that if your system crashes within a second or
 so of your last git operation, you might need to run git fsck and
 potentially recover from a busted repo, maybe speed is more important
 for you (and git is known for its speed/performance, after all. :-)
 
 The actual state of the source tree would have been written using a
 text editor which tends to be paranoid about using fsync (at least, if
 you use a real editor like Emacs or Vi, as opposed to the toy notepad
 editors shipped with GNOME or KDE :-).  So as long as you know what
 you're doing, it's unlikely that you will actually lose any work.
 
 Personally, I have core.fsyncobjectfiles set to yes in my .gitconfig.
 Part of this is because I have an SSD, so the speed hit really doesn't
 bother me, and needing to recover a corrupted git repository is a pain
 (although I have certainly done it in the past).

I think core.fsyncObjectFiles documentation really needs an update.
What about this one?

diff --git a/Documentation/config.txt b/Documentation/config.txt
index 43bb53c..b08fa11 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -693,10 +693,16 @@ core.whitespace::
 core.fsyncObjectFiles::
This boolean will enable 'fsync()' when writing object files.
 +
-This is a total waste of time and effort on a filesystem that orders
-data writes properly, but can be useful for filesystems that do not use
-journalling (traditional UNIX filesystems) or that only journal metadata
-and not file contents (OS X's HFS+, or Linux ext3 with data=writeback).
+For performance reasons git does not call 'fsync()' after writing object
+files. This means that after a power cut your git repository can get
+corrupted as not all data hit the storage media. Especially on modern
+filesystems like ext4, xfs or btrfs this can happen very easily.
+If you have to face power cuts and care about your data it is strongly
+recommended to enable this setting.
+Please note that git's behavior used to be safe on ext3 with data=ordered,
+for any other filesystems or mount settings this is not the case as
+POSIX clearly states that you have to call 'fsync()' to make sure that
+all data is written.

 core.preloadIndex::
Enable parallel index preload for operations like 'git diff'

--
Thanks,
//richard
--
To unsubscribe from this list: send the line unsubscribe git in


Re: [PATCH] Add list-all-objects command

2015-06-22 Thread Charles Bailey
On Mon, Jun 22, 2015 at 04:38:22AM -0400, Jeff King wrote:
 On Sun, Jun 21, 2015 at 08:20:31PM +0100, Charles Bailey wrote:
 
  +   prepare_packed_git();
  +   for (p = packed_git; p; p = p-next) {
  +   open_pack_index(p);
  +   }
 
 Yikes. The fact that you need to do this means that
 for_each_packed_object is buggy, IMHO. I'll send a patch.

I'm glad you said that; the interface did seem a bit warty at the time
but as I fixed this early in my hacking I didn't remeber to revisit
this and ask if it was actually intentional.
--
To unsubscribe from this list: send the line unsubscribe git in


Re: [PATCH v6 18/19] fsck: git receive-pack: support excluding objects from fsck'ing

2015-06-22 Thread Johannes Schindelin
Hi Junio,

On 2015-06-22 06:21, Junio C Hamano wrote:
 On Fri, Jun 19, 2015 at 6:35 AM, Johannes Schindelin
 johannes.schinde...@gmx.de wrote:

 @@ -227,6 +277,10 @@ static int report(struct fsck_options *options, struct 
 object *object,
 if (msg_type == FSCK_IGNORE)
 return 0;

 +   if (options-skiplist  object 
 +   sha1_array_lookup(options-skiplist, object-sha1) 
 = 0)
 +   return 0;
 +
 if (msg_type == FSCK_FATAL)
 msg_type = FSCK_ERROR;
 else if (msg_type == FSCK_INFO)
 
 I just double checked this patch because I wanted to make sure this
 was applied in the
 report() function (i.e. behave as if FSCK_IGNORE was specified for
 specific objects on
 the skip list), and I am happy to see that it indeed is the case.
 
 That was because I briefly feared that skip could be done before going
 through the usual
 verification chain, which would have been very wrong (e.g. we may want
 not to hear about
 missing tagger in v2.6.11-tree tag, but nevertheless we would want to
 check all the tree
 contents pointed at by that tag, as that tree may not be reachable by
 any other way).

To be honest, an earlier iteration actually did have that test much earlier in 
the call chain, but I had changed it to the current location in v5.

My rationale was slightly different from yours: I wanted to affect the 
performance as little as possible. So looking up each and every object in the 
skip list (which I expect to be relatively small) seemed to be wasteful. And 
then it occurred to me that it would make much more sense to just make the 
skip-list functionality equivalent to the ignore message type.

It just occurred to me, however, that one thing is possibly surprising with 
either version of the skip list functionality: if a certain object is corrupt 
on disk, it cannot be skipped via the skip-list, as the object is *still* 
unpacked (which would fail in the case of a corrupt object).

I will document that in the man page.

Ciao,
Dscho
--
To unsubscribe from this list: send the line unsubscribe git in


Re: Fast enumeration of objects

2015-06-22 Thread Jeff King
On Sun, Jun 21, 2015 at 08:20:30PM +0100, Charles Bailey wrote:

 I performed some test timings of some different commands on a clone of
 the Linux kernel which was completely packed.

Thanks for timing things. I think we can fairly easily improve a bit on
what you have here. I'll go through my full analysis, but see the
conclusions at the end.

   $ time git rev-list --all --objects |
   cut -d  -f1 |
   git cat-file --batch-check |
   awk '{if ($3 = 512000) { print $1 }}' |
   wc -l
   958
 
   real0m30.823s
   user0m41.904s
   sys 0m7.728s
 
 list-all-objects gives a significant improvement:
 
   $ time git list-all-objects |
   git cat-file --batch-check |
   awk '{if ($3 = 512000) { print $1 }}' |
   wc -l
   958
 
   real0m9.585s
   user0m10.820s
   sys 0m4.960s

That makes sense; of course these two are not necessarily producing the
same answer (they do in your case because it's a fresh clone, and all of
the objects are reachable). I think that's an acceptable caveat.

You can speed up the second one by asking batch-check only for the parts
you care about:

  git list-all-objects |
  git cat-file --batch-check='%(objectsize) %(objectname)' |
  awk '{if ($1 = 512000) { print $2 }}' |
  wc -l

That dropped my best-of-five timings for the same test down from 9.5s to
7.0s. The answer should be the same. The reason is that cat-file will
only compute the items it needs to show, and the object-type is more
expensive to get than the size[1].

Replacing awk with:

  perl -alne 'print $F[0] if $F[1]  512000'

dropped that to 6.0s. That mostly means my awk sucks, but it is
interesting to note that not all of the extra time is pipe overhead
inherent to this approach; your choice of processor matters, too.

If you're willing to get a slightly different answer, but one that is
often just as useful, you can replace the %(objectsize) in the
cat-file invocation with %(objectsize:disk). That gives you the actual
on-disk size of the object, which includes delta and zlib compression.
For 512K, that produces very different results (because files of that
size may actually be text file). But for most truly huge files, they
typically do not delta or compress at all, and the on-disk size is
roughly the same.

That only shaves off 100-200 milliseconds, though.

[1] If you are wondering why the size is cheaper than the type, it is
because of deltas. For base objects, we can get either immediately
from the pack entry's header. For a delta, to get the size we have
to open the object data; the expected size is part of the delta
data. So we pay the extra cost to zlib-inflate the first few bytes.
But finding the type works differently; the type in the pack header
is OFS_DELTA, so we have to walk back to the parent entry to find
the real type.  If that parent is a delta, we walk back recursively
until we hit a base object.

You'd think that would also make %(objectsize:disk) much cheaper
than %(objectsize), too. But the disk sizes require computing a
the pack revindex on the fly, which takes a few hundred milliseconds
on linux.git.

 skipping the cat-filter filter is a lesser but still significant
 improvement:
 
   $ time git list-all-objects -v |
   awk '{if ($3 = 512000) { print $1 }}' |
   wc -l
   958
 
   real0m5.637s
   user0m6.652s
   sys 0m0.156s

That's a pretty nice improvement over the piped version. But we cannot
do the same custom-format optimization there, because -v does not
support it. It would be nice if it supported the full range of cat-file
formatters.

I did a hacky proof-of-concept, and that brought my 6.0s time down to
4.9s.

I also noticed that cat-file doesn't do any output buffering; this is
because it may be used interactively, line by line, by a caller
controlling both pipes. Replacing write_or_die() with fwrite in my
proof-of-concept dropped the time to 3.7s.

That's faster still than your original (different machines, obviously,
but your times are similar to mine):

 The old filter-objects could do the size filter a little be faster, but
 not by much:
 
   $ time git filter-objects --min-size=500k |
   wc -l
   958
 
   real0m4.564s
   user0m4.496s
   sys 0m0.064s

This is likely caused by your use of sha1_object_info(), which always
computes the type. Switching to the extended form would probably buy you
another 2 seconds or so.

Also, all my numbers are wall-clock times. The CPU time for my 3.7s time
is actually 6.8s. Whereas doing it all in one process would probably
require 3.0s or so of actual CPU time.

So my conclusions are:

  1. Yes, the pipe/parsing overhead of a separate processor really is
 measurable. That's hidden in the wall-clock time if you have
 multiple cores, but you may care more 

Re: [PATCH] Add list-all-objects command

2015-06-22 Thread Jeff King
On Mon, Jun 22, 2015 at 04:57:28PM +0700, Duy Nguyen wrote:

 On Mon, Jun 22, 2015 at 2:20 AM, Charles Bailey char...@hashpling.org wrote:
  From: Charles Bailey cbaile...@bloomberg.net
 
  list-all-objects is a command to print the ids of all objects in the
  object database of a repository. It is designed as a low overhead
  interface for scripts that want to analyse all objects but don't require
  the ordering implied by a revision walk.
 
  It will list all objects, loose and packed, and will include unreachable
  objects.
 
 Nit picking, but perhaps we should allow to select object source:
 loose, packed, alternates.. These info are available now and cheap to
 get. It's ok not to do it now though.

There is already plumbing to do those individual operations if you want.
Although some of the plumbing involves for i in objects/pack/*.pack,
which is perhaps a little less abstract than we'd like. :)

 Personally I would name this command find-objects (after unix
 command find) where we could still filter objects _not_ based on
 object content.

I like that better than ls, too, but I propose that we actually add
this as a feature to cat-file. I'll send patches in a moment.

-Peff
--
To unsubscribe from this list: send the line unsubscribe git in


[PATCH v3 3/6] bisect: simplify the addition of new bisect terms

2015-06-22 Thread Antoine Delaite
We create a file BISECT_TERMS in the repository .git to be read during a
bisection. The fonctions to be changed if we add new terms are quite
few.
In git-bisect.sh :
check_and_set_terms
bisect_voc

Signed-off-by: Antoine Delaite antoine.dela...@ensimag.grenoble-inp.fr
Signed-off-by: Louis Stuber stub...@ensimag.grenoble-inp.fr
Signed-off-by: Valentin Duperray valentin.duper...@ensimag.imag.fr
Signed-off-by: Franck Jonas franck.jo...@ensimag.imag.fr
Signed-off-by: Lucien Kong lucien.k...@ensimag.imag.fr
Signed-off-by: Thomas Nguy thomas.n...@ensimag.imag.fr
Signed-off-by: Huynh Khoi Nguyen Nguyen 
huynh-khoi-nguyen.ngu...@ensimag.imag.fr
Signed-off-by: Matthieu Moy matthieu@grenoble-inp.fr
---
 bisect.c  |   38 --
 git-bisect.sh |   70 +++-
 2 files changed, 98 insertions(+), 10 deletions(-)

diff --git a/bisect.c b/bisect.c
index cda30fa..2fc8a78 100644
--- a/bisect.c
+++ b/bisect.c
@@ -747,7 +747,10 @@ static void handle_bad_merge_base(void)
between %s and [%s].\n,
bad_hex, bad_hex, good_hex);
} else {
-   die(BUG: terms %s/%s not managed, name_bad, 
name_good);
+   fprintf(stderr, The merge base %s is %s.\n
+   This means the first commit marked %s is 
+   between %s and [%s].\n,
+   bad_hex, name_bad, name_bad, bad_hex, good_hex);
}
exit(3);
}
@@ -902,6 +905,36 @@ static void show_diff_tree(const char *prefix, struct 
commit *commit)
 }
 
 /*
+ * The terms used for this bisect session are stored in BISECT_TERMS.
+ * We read them and store them to adapt the messages accordingly.
+ * Default is bad/good.
+ */
+void read_bisect_terms(const char **read_bad, const char **read_good)
+{
+   struct strbuf str = STRBUF_INIT;
+   const char *filename = git_path(BISECT_TERMS);
+   FILE *fp = fopen(filename, r);
+
+   if (!fp) {
+   if (errno==2) {
+   *read_bad = bad;
+   *read_good = good;
+   return;
+   } else {
+   die(could not read file '%s': %s, filename,
+   strerror(errno));
+   }
+   } else {
+   strbuf_getline(str, fp, '\n');
+   *read_bad = strbuf_detach(str, NULL);
+   strbuf_getline(str, fp, '\n');
+   *read_good = strbuf_detach(str, NULL);
+   }
+   strbuf_release(str);
+   fclose(fp);
+}
+
+/*
  * We use the convention that exiting with an exit code 10 means that
  * the bisection process finished successfully.
  * In this case the calling shell script should exit 0.
@@ -917,8 +950,7 @@ int bisect_next_all(const char *prefix, int no_checkout)
const unsigned char *bisect_rev;
char bisect_rev_hex[GIT_SHA1_HEXSZ + 1];
 
-   name_bad=bad;
-   name_good=good;
+   read_bisect_terms(name_bad, name_good);
if (read_bisect_refs())
die(reading bisect refs failed);
 
diff --git a/git-bisect.sh b/git-bisect.sh
index ce6412f..55b9ebd 100644
--- a/git-bisect.sh
+++ b/git-bisect.sh
@@ -77,6 +77,9 @@ bisect_start() {
orig_args=$(git rev-parse --sq-quote $@)
bad_seen=0
eval=''
+   # revision_seen is true if a git bisect start
+   # has revision as arguments
+   revision_seen=0
if test z$(git rev-parse --is-bare-repository) != zfalse
then
mode=--no-checkout
@@ -101,6 +104,9 @@ bisect_start() {
die $(eval_gettext '\$arg' does not appear to 
be a valid revision)
break
}
+
+   revision_seen=1
+
case $bad_seen in
0) state=$NAME_BAD ; bad_seen=1 ;;
*) state=$NAME_GOOD ;;
@@ -172,6 +178,11 @@ bisect_start() {
} 
git rev-parse --sq-quote $@ $GIT_DIR/BISECT_NAMES 
eval $eval true 
+   if test $revision_seen -eq 1  test ! -s $GIT_DIR/BISECT_TERMS
+   then
+   echo $NAME_BAD $GIT_DIR/BISECT_TERMS 
+   echo $NAME_GOOD $GIT_DIR/BISECT_TERMS
+   fi 
echo git bisect start$orig_args $GIT_DIR/BISECT_LOG || exit
#
# Check if we can proceed to the next bisect state.
@@ -232,6 +243,7 @@ bisect_skip() {
 bisect_state() {
bisect_autostart
state=$1
+   check_and_set_terms $state
case $#,$state in
0,*)
die $(gettext Please call 'bisect_state' with at least one 
argument.) ;;
@@ -291,15 +303,17 @@ bisect_next_check() {
: bisect without $NAME_GOOD...
;;
*)
-
+   bad_syn=$(bisect_voc bad)
+ 

[PATCH v3 2/6] bisect: replace hardcoded bad|good by variables

2015-06-22 Thread Antoine Delaite
To add new tags like old/new and have keywords less confusing, the
first step is to avoid hardcoding the keywords.

The default mode is still bad/good.

Signed-off-by: Antoine Delaite antoine.dela...@ensimag.grenoble-inp.fr
Signed-off-by: Louis Stuber stub...@ensimag.grenoble-inp.fr
Signed-off-by: Valentin Duperray valentin.duper...@ensimag.imag.fr
Signed-off-by: Franck Jonas franck.jo...@ensimag.imag.fr
Signed-off-by: Lucien Kong lucien.k...@ensimag.imag.fr
Signed-off-by: Thomas Nguy thomas.n...@ensimag.imag.fr
Signed-off-by: Huynh Khoi Nguyen Nguyen 
huynh-khoi-nguyen.ngu...@ensimag.imag.fr
Signed-off-by: Matthieu Moy matthieu@grenoble-inp.fr
---
 bisect.c  |   51 ++-
 git-bisect.sh |   57 +++--
 2 files changed, 65 insertions(+), 43 deletions(-)
 mode change 100755 = 100644 git-bisect.sh

diff --git a/bisect.c b/bisect.c
index 5b8357d..cda30fa 100644
--- a/bisect.c
+++ b/bisect.c
@@ -21,6 +21,9 @@ static const char *argv_checkout[] = {checkout, -q, NULL, 
--, NULL};
 static const char *argv_show_branch[] = {show-branch, NULL, NULL};
 static const char *argv_update_ref[] = {update-ref, --no-deref, 
BISECT_HEAD, NULL, NULL};
 
+static const char *name_bad;
+static const char *name_good;
+
 /* Remember to update object flag allocation in object.h */
 #define COUNTED(1u16)
 
@@ -403,15 +406,21 @@ struct commit_list *find_bisection(struct commit_list 
*list,
 static int register_ref(const char *refname, const struct object_id *oid,
int flags, void *cb_data)
 {
-   if (!strcmp(refname, bad)) {
+   struct strbuf good_prefix = STRBUF_INIT;
+   strbuf_addstr(good_prefix, name_good);
+   strbuf_addstr(good_prefix, -);
+
+   if (!strcmp(refname, name_bad)) {
current_bad_oid = xmalloc(sizeof(*current_bad_oid));
oidcpy(current_bad_oid, oid);
-   } else if (starts_with(refname, good-)) {
+   } else if (starts_with(refname, good_prefix.buf)) {
sha1_array_append(good_revs, oid-hash);
} else if (starts_with(refname, skip-)) {
sha1_array_append(skipped_revs, oid-hash);
}
 
+   strbuf_release(good_prefix);
+
return 0;
 }
 
@@ -634,7 +643,7 @@ static void exit_if_skipped_commits(struct commit_list 
*tried,
return;
 
printf(There are only 'skip'ped commits left to test.\n
-  The first bad commit could be any of:\n);
+  The first %s commit could be any of:\n, name_bad);
print_commit_list(tried, %s\n, %s\n);
if (bad)
printf(%s\n, oid_to_hex(bad));
@@ -732,18 +741,21 @@ static void handle_bad_merge_base(void)
if (is_expected_rev(current_bad_oid)) {
char *bad_hex = oid_to_hex(current_bad_oid);
char *good_hex = join_sha1_array_hex(good_revs, ' ');
-
-   fprintf(stderr, The merge base %s is bad.\n
-   This means the bug has been fixed 
-   between %s and [%s].\n,
-   bad_hex, bad_hex, good_hex);
-
+   if (!strcmp(name_bad, bad)) {
+   fprintf(stderr, The merge base %s is bad.\n
+   This means the bug has been fixed 
+   between %s and [%s].\n,
+   bad_hex, bad_hex, good_hex);
+   } else {
+   die(BUG: terms %s/%s not managed, name_bad, 
name_good);
+   }
exit(3);
}
 
-   fprintf(stderr, Some good revs are not ancestor of the bad rev.\n
+   fprintf(stderr, Some %s revs are not ancestor of the %s rev.\n
git bisect cannot work properly in this case.\n
-   Maybe you mistook good and bad revs?\n);
+   Maybe you mistook %s and %s revs?\n,
+   name_good, name_bad, name_good, name_bad);
exit(1);
 }
 
@@ -755,10 +767,10 @@ static void handle_skipped_merge_base(const unsigned char 
*mb)
 
warning(the merge base between %s and [%s] 
must be skipped.\n
-   So we cannot be sure the first bad commit is 
+   So we cannot be sure the first %s commit is 
between %s and %s.\n
We continue anyway.,
-   bad_hex, good_hex, mb_hex, bad_hex);
+   bad_hex, good_hex, name_bad, mb_hex, bad_hex);
free(good_hex);
 }
 
@@ -839,7 +851,7 @@ static void check_good_are_ancestors_of_bad(const char 
*prefix, int no_checkout)
int fd;
 
if (!current_bad_oid)
-   die(a bad revision is needed);
+   die(a %s revision is needed, name_bad);
 
/* Check if file BISECT_ANCESTORS_OK exists. */
if (!stat(filename, st)  S_ISREG(st.st_mode))
@@ -905,6 +917,8 @@ int bisect_next_all(const char *prefix, int 

[PATCH v3 6/6] bisect: allows any terms set by user

2015-06-22 Thread Antoine Delaite
Introduction of the git bisect terms function.
The user can set its own terms.
It will work exactly like before. The terms must be set
before the start.

Signed-off-by: Antoine Delaite antoine.dela...@ensimag.grenoble-inp.fr
Signed-off-by: Louis Stuber stub...@ensimag.grenoble-inp.fr
---
 Documentation/git-bisect.txt |   19 
 git-bisect.sh|   68 ++---
 t/t6030-bisect-porcelain.sh  |   43 ++
 3 files changed, 125 insertions(+), 5 deletions(-)

diff --git a/Documentation/git-bisect.txt b/Documentation/git-bisect.txt
index 3c3021a..ef0c03c 100644
--- a/Documentation/git-bisect.txt
+++ b/Documentation/git-bisect.txt
@@ -133,6 +133,25 @@ You must run `git bisect start` without commits as 
argument and run
 `git bisect new rev`/`git bisect old rev...` after to add the
 commits.
 
+Alternative terms: use your own terms
+
+
+If the builtins terms bad/good and new/old do not satisfy you, you can
+set your own terms.
+
+
+git bisect terms term1 term2
+
+
+This command has to be used before a bisection has started.
+The term1 must be associated with the latest revisions and term2 with the
+ancestors of term1.
+
+Only the first bisection following the 'git bisect terms' will use the terms.
+If you mistyped one of the terms you can do again 'git bisect terms term1
+term2'.
+
+
 Bisect visualize
 
 
diff --git a/git-bisect.sh b/git-bisect.sh
index a11ca06..7da22b1 100644
--- a/git-bisect.sh
+++ b/git-bisect.sh
@@ -1,6 +1,6 @@
 #!/bin/sh
 
-USAGE='[help|start|bad|good|new|old|skip|next|reset|visualize|replay|log|run]'
+USAGE='[help|start|bad|good|new|old|terms|skip|next|reset|visualize|replay|log|run]'
 LONG_USAGE='git bisect help
print this long help message.
 git bisect start [--no-checkout] [bad [good...]] [--] [pathspec...]
@@ -11,6 +11,8 @@ git bisect (bad|new) [rev]
 git bisect (good|old) [rev...]
mark rev... known-good revisions/
revisions before change in a given property.
+git bisect terms term1 term2
+   set up term1 and term2 as bisection terms.
 git bisect skip [(rev|range)...]
mark rev... untestable revisions.
 git bisect next
@@ -82,6 +84,14 @@ bisect_start() {
# revision_seen is true if a git bisect start
# has revision as arguments
revision_seen=0
+   # terms_defined is used to detect if the user
+   # defined his own terms with git bisect terms
+   terms_defined=0
+   if test -s $GIT_DIR/TERMS_DEFINED
+   then
+   terms_defined=1
+   get_terms
+   fi
if test z$(git rev-parse --is-bare-repository) != zfalse
then
mode=--no-checkout
@@ -180,10 +190,14 @@ bisect_start() {
} 
git rev-parse --sq-quote $@ $GIT_DIR/BISECT_NAMES 
eval $eval true 
-   if test $revision_seen -eq 1  test ! -s $GIT_DIR/BISECT_TERMS
+   if test $revision_seen -eq 1  test ! -s $GIT_DIR/BISECT_TERMS || 
test $terms_defined -eq 1
then
echo $NAME_BAD $GIT_DIR/BISECT_TERMS 
-   echo $NAME_GOOD $GIT_DIR/BISECT_TERMS
+   echo $NAME_GOOD $GIT_DIR/BISECT_TERMS 
+   if test $terms_defined -eq 1
+   then
+   echo git bisect terms $NAME_BAD $NAME_GOOD 
$GIT_DIR/BISECT_LOG || exit
+   fi
fi 
echo git bisect start$orig_args $GIT_DIR/BISECT_LOG || exit
#
@@ -419,6 +433,7 @@ bisect_clean_state() {
rm -f $GIT_DIR/BISECT_NAMES 
rm -f $GIT_DIR/BISECT_RUN 
rm -f $GIT_DIR/BISECT_TERMS 
+   rm -f $GIT_DIR/TERMS_DEFINED 
# Cleanup head-name if it got left by an old version of git-bisect
rm -f $GIT_DIR/head-name 
git update-ref -d --no-deref BISECT_HEAD 
@@ -447,6 +462,8 @@ bisect_replay () {
eval $cmd ;;
$NAME_GOOD|$NAME_BAD|skip)
bisect_write $command $rev ;;
+   terms)
+   bisect_terms $rev ;;
*)
die $(gettext ?? what are you talking about?) ;;
esac
@@ -531,7 +548,8 @@ get_terms () {
 check_and_set_terms () {
cmd=$1
case $cmd in
-   bad|good|new|old)
+   skip|start|terms) ;;
+   *)
if test -s $GIT_DIR/BISECT_TERMS  test $cmd != 
$NAME_BAD  test $cmd != $NAME_GOOD
then
die $(eval_gettext Invalid command: you're currently 
in a \$NAME_BAD/\$NAME_GOOD bisect.)
@@ -564,6 +582,44 @@ bisect_voc () {
esac
 }
 
+bisect_terms () {
+   case $# in
+   0)
+   if test -s $GIT_DIR/BISECT_TERMS
+   then
+   {
+   read term1
+   read term2
+

[PATCH v3 5/6] revision: fix rev-list --bisect in old/new mode

2015-06-22 Thread Antoine Delaite
From: Louis Stuber stub...@ensimag.grenoble-inp.fr

Calling git rev-list --bisect when an old/new mode bisection was started
shows the help notice. This has been fixed by reading BISECT_TERMS in
revision.c to find the correct bisect refs path (which was always
refs/bisect/bad (or good) before and can be refs/bisect/new (old) now).

Signed-off-by: Louis Stuber stub...@ensimag.grenoble-inp.fr
Signed-off-by: Antoine Delaite antoine.dela...@ensimag.grenoble-inp.fr
---
 revision.c |   16 ++--
 1 files changed, 14 insertions(+), 2 deletions(-)

diff --git a/revision.c b/revision.c
index 3ff8723..27750ac 100644
--- a/revision.c
+++ b/revision.c
@@ -21,6 +21,9 @@
 
 volatile show_early_output_fn_t show_early_output;
 
+static const char *name_bad;
+static const char *name_good;
+
 char *path_name(const struct name_path *path, const char *name)
 {
const struct name_path *p;
@@ -2076,14 +2079,22 @@ void parse_revision_opt(struct rev_info *revs, struct 
parse_opt_ctx_t *ctx,
ctx-argc -= n;
 }
 
+extern void read_bisect_terms(const char **bad, const char **good);
+
 static int for_each_bad_bisect_ref(const char *submodule, each_ref_fn fn, void 
*cb_data)
 {
-   return for_each_ref_in_submodule(submodule, refs/bisect/bad, fn, 
cb_data);
+   char bisect_refs_path[256];
+   strcpy(bisect_refs_path, refs/bisect/);
+   strcat(bisect_refs_path, name_bad);
+   return for_each_ref_in_submodule(submodule, bisect_refs_path, fn, 
cb_data);
 }
 
 static int for_each_good_bisect_ref(const char *submodule, each_ref_fn fn, 
void *cb_data)
 {
-   return for_each_ref_in_submodule(submodule, refs/bisect/good, fn, 
cb_data);
+   char bisect_refs_path[256];
+   strcpy(bisect_refs_path, refs/bisect/);
+   strcat(bisect_refs_path, name_good);
+   return for_each_ref_in_submodule(submodule, bisect_refs_path, fn, 
cb_data);
 }
 
 static int handle_revision_pseudo_opt(const char *submodule,
@@ -2112,6 +2123,7 @@ static int handle_revision_pseudo_opt(const char 
*submodule,
handle_refs(submodule, revs, *flags, 
for_each_branch_ref_submodule);
clear_ref_exclusion(revs-ref_excludes);
} else if (!strcmp(arg, --bisect)) {
+   read_bisect_terms(name_bad, name_good);
handle_refs(submodule, revs, *flags, for_each_bad_bisect_ref);
handle_refs(submodule, revs, *flags ^ (UNINTERESTING | BOTTOM), 
for_each_good_bisect_ref);
revs-bisect = 1;
-- 
1.7.1

--
To unsubscribe from this list: send the line unsubscribe git in


[PATCH v3 4/6] bisect: add the terms old/new

2015-06-22 Thread Antoine Delaite
When not looking for a regression during a bisect but for a fix or a
change in another given property, it can be confusing to use 'good'
and 'bad'.

This patch introduce `git bisect new` and `git bisect old` as an
alternative to 'bad' and good': the commits which have a certain
property must be marked as `new` and the ones which do not as `old`.

The output will be the first commit after the change in the property.
During a new/old bisect session you cannot use bad/good commands and
vice-versa.

Some commands are still not available for old/new:
 * git rev-list --bisect does not treat the revs/bisect/new and
   revs/bisect/old-SHA1 files.

Old discussions:
- http://thread.gmane.org/gmane.comp.version-control.git/86063
introduced bisect fix unfixed to find fix.
- http://thread.gmane.org/gmane.comp.version-control.git/182398
discussion around bisect yes/no or old/new.
- http://thread.gmane.org/gmane.comp.version-control.git/199758
last discussion and reviews
New discussions:
- http://thread.gmane.org/gmane.comp.version-control.git/271320
( v2 1/7-4/7 )
- http://comments.gmane.org/gmane.comp.version-control.git/271343
( v2 5/7-7/7 )

Signed-off-by: Antoine Delaite antoine.dela...@ensimag.grenoble-inp.fr
Signed-off-by: Louis Stuber stub...@ensimag.grenoble-inp.fr
Signed-off-by: Valentin Duperray valentin.duper...@ensimag.imag.fr
Signed-off-by: Franck Jonas franck.jo...@ensimag.imag.fr
Signed-off-by: Lucien Kong lucien.k...@ensimag.imag.fr
Signed-off-by: Thomas Nguy thomas.n...@ensimag.imag.fr
Signed-off-by: Huynh Khoi Nguyen Nguyen 
huynh-khoi-nguyen.ngu...@ensimag.imag.fr
Signed-off-by: Matthieu Moy matthieu@grenoble-inp.fr
---
 Documentation/git-bisect.txt |   48 -
 bisect.c |   11 +++--
 git-bisect.sh|   30 +
 t/t6030-bisect-porcelain.sh  |   38 +
 4 files changed, 112 insertions(+), 15 deletions(-)

diff --git a/Documentation/git-bisect.txt b/Documentation/git-bisect.txt
index 4cb52a7..3c3021a 100644
--- a/Documentation/git-bisect.txt
+++ b/Documentation/git-bisect.txt
@@ -18,8 +18,8 @@ on the subcommand:
 
  git bisect help
  git bisect start [--no-checkout] [bad [good...]] [--] [paths...]
- git bisect bad [rev]
- git bisect good [rev...]
+ git bisect (bad|new) [rev]
+ git bisect (good|old) [rev...]
  git bisect skip [(rev|range)...]
  git bisect reset [commit]
  git bisect visualize
@@ -104,6 +104,35 @@ For example, `git bisect reset HEAD` will leave you on the 
current
 bisection commit and avoid switching commits at all, while `git bisect
 reset bisect/bad` will check out the first bad revision.
 
+
+Alternative terms: bisect new and bisect old
+
+
+If you are not at ease with the terms bad and good, perhaps
+because you are looking for the commit that introduced a fix, you can
+alternatively use new and old instead.
+But note that you cannot mix bad and good with new and old.
+
+
+git bisect new [rev]
+
+
+Marks the commit as new, e.g. the bug is no longer there, if you are looking
+for a commit that fixed a bug, or the feature that used to work is now broken
+at this point, if you are looking for a commit that introduced a bug.
+It is the equivalent of git bisect bad [rev].
+
+
+git bisect old [rev...]
+
+
+Marks the commit as old, as the opposite of 'git bisect new'.
+It is the equivalent of git bisect good [rev...].
+
+You must run `git bisect start` without commits as argument and run
+`git bisect new rev`/`git bisect old rev...` after to add the
+commits.
+
 Bisect visualize
 
 
@@ -379,6 +408,21 @@ In this case, when 'git bisect run' finishes, bisect/bad 
will refer to a commit
 has at least one parent whose reachable graph is fully traversable in the sense
 required by 'git pack objects'.
 
+* Look for a fix instead of a regression in the code
++
+
+$ git bisect start
+$ git bisect new HEAD# current commit is marked as new
+$ git bisect old HEAD~10 # the tenth commit from now is marked as old
+
++
+Let's consider the last commit has a given property, and that we are looking
+for the commit which introduced this property. For each commit the bisection
+guide us to, we will test if the property is present. If it is we will mark
+the commit as new with 'git bisect new', otherwise we will mark it as old.
+At the end of the bisect session, the result will be the first new commit (e.g
+the first one with the property).
+
 
 SEE ALSO
 
diff --git a/bisect.c b/bisect.c
index 2fc8a78..7492fdc 100644
--- a/bisect.c
+++ b/bisect.c
@@ -746,6 +746,11 @@ static 

Re: [PATCH v7 04/19] fsck: Offer a function to demote fsck errors to warnings

2015-06-22 Thread Johannes Schindelin
Hi Junio,

On 2015-06-22 19:37, Junio C Hamano wrote:
 Johannes Schindelin johannes.schinde...@gmx.de writes:
 
 diff --git a/fsck.c b/fsck.c
 index 1a3f7ce..e81a342 100644
 --- a/fsck.c
 +++ b/fsck.c
 @@ -64,30 +64,29 @@ enum fsck_msg_id {
  #undef MSG_ID

  #define STR(x) #x
 -#define MSG_ID(id, msg_type) { STR(id), FSCK_##msg_type },
 +#define MSG_ID(id, msg_type) { STR(id), NULL, FSCK_##msg_type },
  static struct {
  const char *id_string;
 +const char *lowercased;
  int msg_type;
  } msg_id_info[FSCK_MSG_MAX + 1] = {
  FOREACH_MSG_ID(MSG_ID)
 -{ NULL, -1 }
 +{ NULL, NULL, -1 }
  };
  #undef MSG_ID

  static int parse_msg_id(const char *text)
  {
 -static char **lowercased;
  int i;

 -if (!lowercased) {
 +if (!msg_id_info[0].lowercased) {
  /* convert id_string to lower case, without underscores. */
 -lowercased = xmalloc(FSCK_MSG_MAX * sizeof(*lowercased));
  for (i = 0; i  FSCK_MSG_MAX; i++) {
  const char *p = msg_id_info[i].id_string;
  int len = strlen(p);
  char *q = xmalloc(len);

 -lowercased[i] = q;
 +msg_id_info[i].lowercased = q;
  while (*p)
  if (*p == '_')
  p++;
 @@ -98,7 +97,7 @@ static int parse_msg_id(const char *text)
  }

  for (i = 0; i  FSCK_MSG_MAX; i++)
 -if (!strcmp(text, lowercased[i]))
 +if (!strcmp(text, msg_id_info[i].lowercased))
  return i;

  return -1;
 
 Heh, this was the first thing that came to my mind when I saw 03/19
 that lazily prepares downcased version (which is good) but do so in
 a separately allocated buffer (which is improved by this change) ;-)
 
 IOW, I think all of the above should have been part of 03/19, not
 oops I belatedly realized that this way is better fixup here.

Gh. Wrong commit fixed up. Sorry. Will be fixed in v8.

 +void fsck_set_msg_types(struct fsck_options *options, const char *values)
 +{
 +char *buf = xstrdup(values), *to_free = buf;
 +int done = 0;
 +
 +while (!done) {
 +int len = strcspn(buf,  ,|), equal;
 +
 +done = !buf[len];
 +if (!len) {
 +buf++;
 +continue;
 +}
 +buf[len] = '\0';
 +
 +for (equal = 0; equal  len 
 +buf[equal] != '='  buf[equal] != ':'; equal++)
 
 Style.  I'd format this more like so:
 
   for (equal = 0;
  equal  len  buf[equal] != '='  buf[equal] != ':';
equal++)

Will be fixed.

 +buf[equal] = tolower(buf[equal]);
 +buf[equal] = '\0';
 +
 +if (equal == len)
 +die(Missing '=': '%s', buf);
 +
 +fsck_set_msg_type(options, buf, buf + equal + 1);
 +buf += len + 1;
 +}
 +free(to_free);
 +}
 
 Overall, the change is good (and it was good in v6, too), and I
 think it has become simpler to follow the logic with the upfront
 downcasing.

Yep, I agree. I did not expect that, but it was worth the effort to compare the 
two versions.

Ciao,
Dscho
--
To unsubscribe from this list: send the line unsubscribe git in


[PATCH v3 1/6] bisect: correction of typo

2015-06-22 Thread Antoine Delaite
Signed-off-by: Antoine Delaite antoine.dela...@ensimag.grenoble-inp.fr
---
 bisect.c|2 +-
 t/t6030-bisect-porcelain.sh |2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/bisect.c b/bisect.c
index 03d5cd9..5b8357d 100644
--- a/bisect.c
+++ b/bisect.c
@@ -743,7 +743,7 @@ static void handle_bad_merge_base(void)
 
fprintf(stderr, Some good revs are not ancestor of the bad rev.\n
git bisect cannot work properly in this case.\n
-   Maybe you mistake good and bad revs?\n);
+   Maybe you mistook good and bad revs?\n);
exit(1);
 }
 
diff --git a/t/t6030-bisect-porcelain.sh b/t/t6030-bisect-porcelain.sh
index 06b4868..9e2c203 100755
--- a/t/t6030-bisect-porcelain.sh
+++ b/t/t6030-bisect-porcelain.sh
@@ -362,7 +362,7 @@ test_expect_success 'bisect starting with a detached HEAD' '
 test_expect_success 'bisect errors out if bad and good are mistaken' '
git bisect reset 
test_must_fail git bisect start $HASH2 $HASH4 2 rev_list_error 
-   grep mistake good and bad rev_list_error 
+   grep mistook good and bad rev_list_error 
git bisect reset
 '
 
-- 
1.7.1

--
To unsubscribe from this list: send the line unsubscribe git in


Re: [PATCH v4 04/19] for-each-ref: add '--points-at' option

2015-06-22 Thread Junio C Hamano
Junio C Hamano gits...@pobox.com writes:

 I have a slight preference for keeping the pairs not squashed. This way,
 we have a clear separation write reusable library code / use it. But
 I'm fine with squashing if others prefer.

 As I cannot firmly say that copy  paste first and then later
 clean-up is bad and we should split them in different way, I
 am fine with leaving them separate as they are.

Having said that, I have a slight preference that a split does not
break my build in the middle of the series by introducing an
unused function, which is noticed by the compiler as a warning, and
turned into an error with -Werror.
--
To unsubscribe from this list: send the line unsubscribe git in


Re: [PATCH v4 05/19] ref-filter: add parse_opt_merge_filter()

2015-06-22 Thread Karthik Nayak
On Mon, Jun 22, 2015 at 6:25 AM, Junio C Hamano gits...@pobox.com wrote:

 Why SHOUT here?


Just used to typing macros in caps. Will change!

 This is copied from 'builtin/branch.c' which will eventually be removed
 when we port 'branch.c' to use ref-filter APIs.

 Hmph. I somehow thought Matthieu's instruction was to finish tag.c
 side first and then
 do branch (i.e. with 3 and 4 you brought things from tag to
 for-each-ref, now it is a time
 to rewrite tag by using what you wrote for for-each-ref with 3 and 4,
 before moving to
 this patch)? Was that plan scrapped or found inappropriate or something?


 I would call in advice rather than instruction. I still think we
 should prioritize the tag.c side, but if this patch is ready, it makes
 sense to keep it in the series.

Yes, Matthieu did advice that.
But I had already started working on this. But if you guys think thats
a better option
I can do that also, as I already have tag.c ported over to use
ref-filter on my local branch.
But that'll include a lot of changes.

Also I found this more systematic as we will have a complete
ref-filter library ready and
only porting of tag.c and branch.c would be left.


 When does this trigger? You have lastarg-default with HEAD, and I am
 having trouble guessing when arg upon entry to this function can ever
 be NULL.


This is redundant. will remove.


 Can --merged (or --no-merged) be given more than once? Is the rule
 the last one wins?
 Does the old value of rf-merge_commit leak (no, it does not, but I am
 just asking for
 completeness)?

Yes! currently for e.g. `git for-each-ref --merged master --merged ref_filter`
would mean that ref_filter would be considered for the --merge option.

Does the old value of rf-merge_commit leak From my understanding of
object.c/commit.c, It just points to an object in the obj_hash array, so when
multiple options are given the pointer just points to another part of obj_hash,
I'm sure there's more to it. This is what I gathered from an over the top view.

-- 
Regards,
Karthik Nayak
--
To unsubscribe from this list: send the line unsubscribe git in


Re: [PATCH v4 04/19] for-each-ref: add '--points-at' option

2015-06-22 Thread Junio C Hamano
Matthieu Moy matthieu@grenoble-inp.fr writes:

 Karthik Nayak karthik@gmail.com writes:

 On Mon, Jun 22, 2015 at 6:15 AM, Junio C Hamano gits...@pobox.com wrote:
  3  4 as a single patch may make more sense, if we were to tolerate the
 let's copy  paste first and then later remove the duplicate as a way to
 postpone touching tag.c side in order to first concentrate on 
 for-each-ref.

 I have not formed a firm opinion on what the right split of the series is, 
 but
 so far (assuming that the temporary duplication is the best we can do) what
 I am seeing in this series makes sense to me.

 Thanks.

 That would mean squashing 34, 67 and 1011 also on similar lines.

 I have a slight preference for keeping the pairs not squashed. This way,
 we have a clear separation write reusable library code / use it. But
 I'm fine with squashing if others prefer.

As I cannot firmly say that copy  paste first and then later
clean-up is bad and we should split them in different way, I
am fine with leaving them separate as they are.
--
To unsubscribe from this list: send the line unsubscribe git in


Re: [PATCH v4 04/19] for-each-ref: add '--points-at' option

2015-06-22 Thread Karthik Nayak
On Mon, Jun 22, 2015 at 6:15 AM, Junio C Hamano gits...@pobox.com wrote:
  3  4 as a single patch may make more sense, if we were to tolerate the
 let's copy  paste first and then later remove the duplicate as a way to
 postpone touching tag.c side in order to first concentrate on for-each-ref.

 I have not formed a firm opinion on what the right split of the series is, but
 so far (assuming that the temporary duplication is the best we can do) what
 I am seeing in this series makes sense to me.

 Thanks.

That would mean squashing 34, 67 and 1011 also on similar lines.

-- 
Regards,
Karthik Nayak
--
To unsubscribe from this list: send the line unsubscribe git in


Re: [PATCH v4 04/19] for-each-ref: add '--points-at' option

2015-06-22 Thread Matthieu Moy
Karthik Nayak karthik@gmail.com writes:

 On Mon, Jun 22, 2015 at 6:15 AM, Junio C Hamano gits...@pobox.com wrote:
  3  4 as a single patch may make more sense, if we were to tolerate the
 let's copy  paste first and then later remove the duplicate as a way to
 postpone touching tag.c side in order to first concentrate on for-each-ref.

 I have not formed a firm opinion on what the right split of the series is, 
 but
 so far (assuming that the temporary duplication is the best we can do) what
 I am seeing in this series makes sense to me.

 Thanks.

 That would mean squashing 34, 67 and 1011 also on similar lines.

I have a slight preference for keeping the pairs not squashed. This way,
we have a clear separation write reusable library code / use it. But
I'm fine with squashing if others prefer.

-- 
Matthieu Moy
http://www-verimag.imag.fr/~moy/
--
To unsubscribe from this list: send the line unsubscribe git in


Re: [PATCH v7 00/19] Introduce an internal API to interact with the fsck machinery

2015-06-22 Thread Junio C Hamano
Johannes Schindelin johannes.schinde...@gmx.de writes:

 Changes since v6:
 
 - camelCased message IDs

 - multiple author checking now as suggested by Junio

 - renamed `--quick` to `--connectivity-only`, better commit message

 - `fsck.skipList` is now handled correctly (and not mistaken for a message
   type setting)

 - `fsck.skipList` can handle user paths now

 - index-pack configures the walk function in a more logical place now

 - simplified code by avoiding working on partial strings (i.e. removed
   `substrcmp()`). This saves 10 lines. To accomodate parsing config
   variables directly, we now work on lowercased message IDs; unfortunately
   this means that we cannot use them in append_msg_id() because that
   function wants to append camelCased message IDs.

 Interdiff below diffstat.

Except for minor nits I sent separate messages, this round looks
very nicely done (I however admit that I haven't read the skiplist
parsing code carefully at all, expecting that you wouldn't screw up
with something simple like that ;-))

Thanks, will replace what is queued.  Let's start thinking about
moving it down to 'next' (meaning: we _could_ still accept a reroll,
but I think we are in a good shape and minor incremental refinements
would suffice), cooking it for the remainder of the cycle and having
it graduate to 'master' at the beginning of the next cycle.

Thanks.
--
To unsubscribe from this list: send the line unsubscribe git in


Re: config commands not working _Noobe question

2015-06-22 Thread Christian Couder
On Tue, Jun 23, 2015 at 1:31 AM, Greg Ledger gled...@glcdelivers.com wrote:
 after adding git config ‹global user.name Greg Ledger and git config
 ‹global user.email gled...@glcdelivers.com, when I run:
 source ~/.gitconfig

The ~/.gitconfig file is not a shell script. You should not source it.
It is a text file that is read by Git commands when those commands are run.
--
To unsubscribe from this list: send the line unsubscribe git in


Re: RFC/Pull Request: Refs db backend

2015-06-22 Thread Junio C Hamano
David Turner dtur...@twopensource.com writes:

 I've revived and modified Ronnie Sahlberg's work on the refs db
 backend.  

 The work is on top of be3c13e5564, Junio's First batch for 2.5 cycle.
 I recognize that there have been changes to the refs code since then,
 and that there are some further changes in-flight from e.g. Michael
 Haggerty.  If there is interest in this, I can rebase once Michael's
 changes land.
 ...
 The db backend runs git for-each-ref about 30% faster than the files
 backend with fully-packed refs on a repo with ~120k refs.  It's also
 about 4x faster than using fully-unpacked refs.  In addition, and
 perhaps more importantly, it avoids case-conflict issues on OS X.

 I chose to use LMDB for the database...
 ...
 Ronnie Sahlberg's original version of this patchset used tdb.  The
 advantage of tdb is that it's smaller (~125k).  The disadvantages are
 that tdb is hard to build on OS X.  It's also not in homebrew.  So lmdb
 seemed simpler.

If there is interest?  Shut up and take my money ;-)

More seriously, that's great that you stepped up to resurrect this
topic.  In a sense, the choice of sample database backend does not
matter.  I do not care if it is tdb, lmdb, or even Berkeley DB as
long as it functions. ;-)

As long as the interface between ref-transaction system on the Git
side and the database backend is designed right, your lmdb thing can
serve as a reference implementation for other people to plug other
database backends to the same interface, right?  As one step to
validate the interface to the database backends, it would be nice to
eventually have at least two backends that talk to meaningfully
different systems, but we have to start somewhere, and for now we
have lmdb is as good a place to start as any other db backend.

I wonder if we can do a filesystem backend on top of the same
backend interface---is that too much impedance mismatch to make it
unpractical?

Thanks.
--
To unsubscribe from this list: send the line unsubscribe git in


[PATCH v2 2/7] bisect: replace hardcoded bad|good by variables

2015-06-22 Thread Antoine Delaite
Matthieu Moy matthieu@grenoble-inp.fr wrote: 

Being an acceptable ref name is a constraint you have to check (Junio 
already mentionned check-ref-format). I think quoting variables makes 
sense too 

I don't get how 'git check-ref-format' works exactly. It says it needs 
at least one slash in the ref name. So it would not be good for bisect 
terms.
Then do we take the same format as branches ? 
( i.e 'git check-ref-format --branch' )

Thanks.
--
To unsubscribe from this list: send the line unsubscribe git in


Re: [PATCH v6 17/19] fsck: Introduce `git fsck --quick`

2015-06-22 Thread Johannes Schindelin
Hi Junio,

On 2015-06-21 22:35, Junio C Hamano wrote:
 Johannes Schindelin johannes.schinde...@gmx.de writes:
 
 On 2015-06-21 19:15, Junio C Hamano wrote:
 Michael Haggerty mhag...@alum.mit.edu writes:
 That's brilliant.

 Just to make sure I am reading you correctly, you mean the current
 overall structure:

 [...]

 The way I read Michael's mail, he actually meant something different:
 if all of the blob-related errors/warnings are switched to ignore,
 simply skip unpacking the blobs.
 
 That is how I read his mail, too.
 
 But because IIRC we do not check anything special with blob other
 than we can read it correctly, my description of overall structure
 stayed at a very high conceptual level.  The unpacking may happen at
 a much higher level in the code, i.e. it comes way before this part
 of the logic flow:
 
 if (is bad_blob ignored?)
   ;
   else if (! is the blob loadable and well-formed?) {
 
 in which case is bad blobs ignored? check may have to happen
 before we unpack the object.
 
 And I do not suggest introducing yet another BAD_BLOB error class; I
 would have guessed that you already have an error class for objects
 that are not stored correctly (be it truncated loose object, checksum
 mismatch in the packed base object, or corrupt delta in pack).

Sadly, there is no BAD_BLOB class. The reason is that we actually perform no 
test on blobs, as you pointed out, except for the implicit one: read it as a 
blob object.

And reading them even only partially would still imply a lot of I/O, taking 
away much of the performance improvement I wanted to achieve here.

Further, please note that the `--quick` option *solely* impacts `git fsck`, not 
`git receive-pack`, because we actually really skipped everything except the 
connectivity test.

To allow this discussion to be resolved without further ado, I therefore 
renamed the `--quick` option to `--connectivity-only`, as even I realize that 
there is not much of a check left if not even author or committer lines are 
tested.

Ciao,
Dscho
--
To unsubscribe from this list: send the line unsubscribe git in


Re: broken repo after power cut

2015-06-22 Thread Theodore Ts'o
On Mon, Jun 22, 2015 at 01:19:59PM +0200, Richard Weinberger wrote:
 
  The bottome lins is that if you care about files being written, you
  need to use fsync().  Should git use fsync() by default?  Well, if you
  are willing to accept that if your system crashes within a second or
  so of your last git operation, you might need to run git fsck and
  potentially recover from a busted repo, maybe speed is more important
  for you (and git is known for its speed/performance, after all. :-)

I made a typo in the above.  s/second/minute/.  (Linux's writeback
timer is 30 seconds, but if the disk is busy it might take a bit
longer to get all of the data blocks written out to disk and
committed.)

 I think core.fsyncObjectFiles documentation really needs an update.
 What about this one?
 
 diff --git a/Documentation/config.txt b/Documentation/config.txt
 index 43bb53c..b08fa11 100644
 --- a/Documentation/config.txt
 +++ b/Documentation/config.txt
 @@ -693,10 +693,16 @@ core.whitespace::
  core.fsyncObjectFiles::
   This boolean will enable 'fsync()' when writing object files.
  +
 -This is a total waste of time and effort on a filesystem that orders
 -data writes properly, but can be useful for filesystems that do not use
 -journalling (traditional UNIX filesystems) or that only journal metadata
 -and not file contents (OS X's HFS+, or Linux ext3 with data=writeback).
 +For performance reasons git does not call 'fsync()' after writing object
 +files. This means that after a power cut your git repository can get
 +corrupted as not all data hit the storage media. Especially on modern
 +filesystems like ext4, xfs or btrfs this can happen very easily.
 +If you have to face power cuts and care about your data it is strongly
 +recommended to enable this setting.
 +Please note that git's behavior used to be safe on ext3 with data=ordered,
 +for any other filesystems or mount settings this is not the case as
 +POSIX clearly states that you have to call 'fsync()' to make sure that
 +all data is written.


My main complaint about this is that it's a bit Linux-centric.  For
example, the fact that fsync(2) is needed to push data out of the
cache is also true for MacOS (and indeed all other Unix systems going
back three decades) as well as Windows.  In fact, it's not a matter of
POSIX says, but POSIX documented, but since standards are held in
high esteem, it's sometimes a bit more convenient to use them as an
appeal to authority.  :-)

(Ext3's data=ordered behaviour is an outlier, and in fact, the reason
why it mostly safe to skip fsync(2) calls when using ext3 data=ordered
was an accidental side effect of another problem which was trying to
solve based on the relatively primitive way it handled block
allocation.)

Cheers,

- Ted
--
To unsubscribe from this list: send the line unsubscribe git in


[Question] Is it normal for accented characters to be shown as decomposed Unicode on GNU/Linux?

2015-06-22 Thread Bastien Traverse
Hi everybody,

I have a repository where some files and folders contain accented
characters due to being in French. Such names include rêve (dream),
réunion (meeting) etc.

Whether already in version control or not, git tools only show their
*decomposed* representation (I use a UTF-8 locale, see below), but don't
accept those representations as input (and auto-completion is broken for
those), which is a bit misleading (test case follows).

I've seen the threads about accented characters on OSX and the use of
'core.precomposeunicode', but as I'm running on GNU/Linux I thought this
shouldn't apply.

Since I've already had a problem in git with a weirdly encoded character
(see http://thread.gmane.org/gmane.comp.version-control.git/269710), I
wanted to get some feedback to determine whether my setup was the cause
of it or if it was normal to see decomposed file names in git. I found
in man git-status:

 If a filename contains whitespace or other nonprintable
 characters, that field will be quoted in the manner of a C string
 literal: surrounded by ASCII double quote (34) characters, and with
 interior special characters backslash-escaped.

So do everybody using accented characters see those in decomposed form
in git? And if so why some softwares built on top of it (like gitit [1])
don't inherit those decomposed representations?

[1] http://gitit.net/

Thanks!

---
test case:
$ mkdir accent-test  cd !$
$ git init
$ touch rêve réunion
$ git status
On branch master

Initial commit

Untracked files:
  (use git add file... to include in what will be committed)

r\303\251union
r\303\252ve
$ git add .
$ git commit -m accent test
[master (root commit) 0d776b7] accent test
 2 files changed, 0 insertions(+), 0 deletions(-)
 create mode 100644 r\303\251union
 create mode 100644 r\303\252ve
$ git log --summary
commit 0d776b7a09d5384a76066999431507018e292efe
Author: Bastien Traverse bastien@traverse.email
Date:   2015-06-22 14:13:46 +0200

accent test

 create mode 100644 r\303\251union
 create mode 100644 r\303\252ve
$ mv rêve reve
$ git status
On branch master
Changes not staged for commit:
  (use git add/rm file... to update what will be committed)
  (use git checkout -- file... to discard changes in working directory)

deleted:r\303\252ve

Untracked files:
  (use git add file... to include in what will be committed)

reve

no changes added to commit (use git add and/or git commit -a)
$ git add [[TAB-TAB]]
r\303\252ve  reve
$ git add [[TAB]] -- git add \r\\303\\252ve\
fatal: pathspec 'r\303\252ve' did not match any files
$ git add r\303\252ve
fatal: pathspec 'r\303\252ve' did not match any files
$ git add rêve reve OR git add .
$ git status
On branch master
Changes to be committed:
  (use git reset HEAD file... to unstage)

renamed:r\303\252ve - reve

I'm running an up-to-date Arch linux with following software versions
and locale config:

$ uname -a
Linux xxx 4.0.5-1-ARCH #1 SMP PREEMPT Sat Jun 6 18:37:49 CEST 2015
x86_64 GNU/Linux
$ bash --version
GNU bash, version 4.3.39(1)-release (x86_64-unknown-linux-gnu)
$ git --version
git version 2.4.3
$ locale
LANG=fr_FR.utf8
LC_CTYPE=fr_FR.utf8
LC_NUMERIC=fr_FR.utf8
LC_TIME=fr_FR.utf8
LC_COLLATE=fr_FR.utf8
LC_MONETARY=fr_FR.utf8
LC_MESSAGES=fr_FR.utf8
LC_PAPER=fr_FR.utf8
LC_NAME=fr_FR.utf8
LC_ADDRESS=fr_FR.utf8
LC_TELEPHONE=fr_FR.utf8
LC_MEASUREMENT=fr_FR.utf8
LC_IDENTIFICATION=fr_FR.utf8
LC_ALL=
$ localectl
   System Locale: LANG=fr_FR.UTF8
   VC Keymap: fr
  X11 Layout: fr
 X11 Variant: oss

Cheers
--
To unsubscribe from this list: send the line unsubscribe git in


[PATCH 2/3] contrib/subtree: Fix broken -chains and revealed test error

2015-06-22 Thread Charles Bailey
From: Charles Bailey cbaile...@bloomberg.net

This fixes two instances where a -chain was broken in the subtree
tests and fixes a test error that was revealed because of this.

Many tests in t7900-subtree.sh make a commit and then use 'undo' to
reset the state for the next test. In the 'check hash of split' test,
an 'undo' was being invoked after a 'subtree split' even though the
particular invocation of 'subtree split' did not actually make a commit.
The subsequent check_equal was failing, but this failure was masked by
that broken -chain.

Removing this undo causes the failing check_equal to succeed but breaks
the a check_equal later on in the same test.

It turns out that an earlier test ('check if --message for merge works
with squash too') makes a commit but doesn't 'undo' to the state
expected by the remaining tests. None of the intervening tests cared
enough about the state of the test repo to fail and the spurious 'undo'
in 'check hash of split' restored the expected state for any remaining
test that might care.

Adding the missing 'undo' to 'check if --message for merge works
with squash too' and removing the spurious one from 'check hash of
split' fixes all tests once the -chains are completed.

Signed-off-by: Charles Bailey cbaile...@bloomberg.net
---
 contrib/subtree/t/t7900-subtree.sh | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/contrib/subtree/t/t7900-subtree.sh 
b/contrib/subtree/t/t7900-subtree.sh
index 2c5bfc1..001c604 100755
--- a/contrib/subtree/t/t7900-subtree.sh
+++ b/contrib/subtree/t/t7900-subtree.sh
@@ -177,7 +177,8 @@ test_expect_success 'check if --message for merge works 
with squash too' '
 test_expect_success 'merge new subproj history into subdir' '
git subtree merge --prefix=subdir FETCH_HEAD 
git branch pre-split 
-   check_equal ''$(last_commit_message)'' Merge commit '''$(git 
rev-parse sub2)''' into mainline
+   check_equal ''$(last_commit_message)'' Merge commit '''$(git 
rev-parse sub2)''' into mainline 
+   undo
 '
 
 test_expect_success 'Check that prefix argument is required for split' '
@@ -218,9 +219,8 @@ test_expect_success 'check split with --branch' '
 
 test_expect_success 'check hash of split' '
spl1=$(git subtree split --prefix subdir) 
-   undo 
git subtree split --prefix subdir --branch splitbr1test 
-   check_equal ''$(git rev-parse splitbr1test)'' $spl1
+   check_equal ''$(git rev-parse splitbr1test)'' $spl1 
git checkout splitbr1test 
new_hash=$(git rev-parse HEAD~2) 
git checkout mainline 
@@ -269,7 +269,7 @@ test_expect_success 'add sub9' '
 cd ..
 
 test_expect_success 'split for sub8' '
-   split2=''$(git subtree split --annotate=''*'' --prefix subdir/ 
--rejoin)''
+   split2=''$(git subtree split --annotate=''*'' --prefix subdir/ 
--rejoin)'' 
git branch split2 $split2
 '
 
-- 
2.4.0.53.g8440f74

--
To unsubscribe from this list: send the line unsubscribe git in


  1   2   >