[PATCH v7 00/19] Introduce an internal API to interact with the fsck machinery

2015-06-22 Thread Johannes Schindelin
At the moment, git-fsck's integrity checks are targeted toward the
end user, i.e. the error messages are really just messages, intended for
human consumption.

Under certain circumstances, some of those errors should be allowed to
be turned into mere warnings, though, because the cost of fixing the
issues might well be larger than the cost of carrying those flawed
objects. For example, when an already-public repository has contained a
commit object with two authors for years, it does not make sense to
force the maintainer to rewrite the history, affecting all contributors
negatively by forcing them to update.

This branch introduces an internal fsck API to be able to turn some of
the errors into warnings, and to make it easier to call the fsck
machinery from elsewhere in general.

I am proud to report that this work has been sponsored by GitHub.

Changes since v6:

- camelCased message IDs

- multiple-author checking is now done as suggested by Junio

- renamed `--quick` to `--connectivity-only`, better commit message

- `fsck.skipList` is now handled correctly (and not mistaken for a message
  type setting)

- `fsck.skipList` can handle user paths now

- index-pack configures the walk function in a more logical place now

- simplified code by avoiding working on partial strings (i.e. removed
  `substrcmp()`). This saves 10 lines. To accommodate parsing config
  variables directly, we now work on lowercased message IDs; unfortunately
  this means that we cannot use them in append_msg_id() because that
  function wants to append camelCased message IDs.
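The last item is easier to see in code. git's config machinery hands callbacks the variable name in lowercase, so `fsck.missingEmail` arrives as `fsck.missingemail`; a lookup therefore has to compare lowercased forms, while anything shown to the user should keep the camelCased spelling. What follows is a minimal sketch of that split under stated assumptions, not the code in this series: the ID table and the `parse_msg_id()` signature are hypothetical, and the `append_msg_id()` below is only an illustrative stand-in that shares the name mentioned above.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

#define MAX_ID_LEN 64

/* Illustrative camelCased IDs as they would appear in fsck's output;
 * hypothetical table, not git's actual list. */
static const char *const msg_ids[] = {
	"missingEmail",
	"multipleAuthors",
	"badTimezone",
};
#define NR_IDS (sizeof(msg_ids) / sizeof(msg_ids[0]))

/* Copy src into dst (of size n), lowercasing every character. */
static void lowercase_copy(char *dst, const char *src, size_t n)
{
	size_t i;

	for (i = 0; i + 1 < n && src[i]; i++)
		dst[i] = (char)tolower((unsigned char)src[i]);
	dst[i] = '\0';
}

/* Look up a message ID given the variable name as the config parser
 * delivers it, i.e. already lowercased ("missingemail").
 * Returns the table index, or -1 if the ID is unknown. */
int parse_msg_id(const char *lowercased)
{
	char buf[MAX_ID_LEN];

	for (size_t i = 0; i < NR_IDS; i++) {
		lowercase_copy(buf, msg_ids[i], sizeof(buf));
		if (!strcmp(buf, lowercased))
			return (int)i;
	}
	return -1;
}

/* Prefix a message with its camelCased ID; the lowercased config key
 * cannot be used here because users should see "missingEmail". */
void append_msg_id(char *out, size_t n, int id, const char *text)
{
	snprintf(out, n, "%s: %s", msg_ids[id], text);
}
```

Recomputing the lowercased form on every lookup keeps a single table of camelCased names as the source of truth; a real implementation might cache the lowercased copies instead.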

Interdiff below diffstat.

Johannes Schindelin (19):
  fsck: Introduce fsck options
  fsck: Introduce identifiers for fsck messages
  fsck: Provide a function to parse fsck message IDs
  fsck: Offer a function to demote fsck errors to warnings
  fsck (receive-pack): Allow demoting errors to warnings
  fsck: Report the ID of the error/warning
  fsck: Make fsck_ident() warn-friendly
  fsck: Make fsck_commit() warn-friendly
  fsck: Handle multiple authors in commits specially
  fsck: Make fsck_tag() warn-friendly
  fsck: Add a simple test for receive.fsck.msg-id
  fsck: Disallow demoting grave fsck errors to warnings
  fsck: Optionally ignore specific fsck issues completely
  fsck: Allow upgrading fsck warnings to errors
  fsck: Document the new receive.fsck.msg-id options
  fsck: Support demoting errors to warnings
  fsck: Introduce `git fsck --connectivity-only`
  fsck: git receive-pack: support excluding objects from fsck'ing
  fsck: support ignoring objects in `git fsck` via fsck.skiplist
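Taken together, these patches give every fsck message an ID and a severity that configuration can lower or raise, except for grave problems that must stay errors. A minimal sketch of that model follows; the enum, the three-entry table and the function names are illustrative assumptions, not the declarations in `fsck.h`.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical severity levels: IGNORE/WARN/ERROR correspond to the
 * ignore/warn/error values documented for fsck.<msg-id>; FATAL marks
 * grave problems that may never be demoted. */
enum msg_type { MSG_IGNORE, MSG_WARN, MSG_ERROR, MSG_FATAL };

struct msg_info {
	const char *id;      /* camelCased message ID */
	enum msg_type type;  /* current severity */
};

/* Illustrative table only; not git's actual list of fsck messages. */
static struct msg_info msgs[] = {
	{ "badObjectSha1", MSG_FATAL },
	{ "missingEmail", MSG_ERROR },
	{ "multipleAuthors", MSG_ERROR },
};
#define NR_MSGS (sizeof(msgs) / sizeof(msgs[0]))

static struct msg_info *find_msg(const char *id)
{
	for (size_t i = 0; i < NR_MSGS; i++)
		if (!strcmp(msgs[i].id, id))
			return &msgs[i];
	return NULL;
}

/* Demote or upgrade one message. Returns -1 for unknown IDs and for
 * attempts to demote a grave (FATAL) message, 0 otherwise. */
int set_msg_type(const char *id, enum msg_type type)
{
	struct msg_info *msg = find_msg(id);

	if (!msg)
		return -1;
	if (msg->type == MSG_FATAL && type != MSG_FATAL)
		return -1;
	msg->type = type;
	return 0;
}

/* Print a problem prefixed with its message ID. Returns 1 if it counts
 * as an error (unknown IDs included), 0 if it was a warning or ignored. */
int report(const char *id, const char *text)
{
	struct msg_info *msg = find_msg(id);
	enum msg_type type = msg ? msg->type : MSG_ERROR;

	if (type == MSG_IGNORE)
		return 0;
	fprintf(stderr, "%s: %s: %s\n",
		type == MSG_WARN ? "warning" : "error", id, text);
	return type == MSG_WARN ? 0 : 1;
}
```

Only messages that end up as errors count as failures, so demoting `missingEmail` to a warning still prints the problem but lets the check pass, while an attempt to demote `badObjectSha1` is refused.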

 Documentation/config.txt        |  41 +++
 Documentation/git-fsck.txt      |   7 +-
 builtin/fsck.c                  |  78 --
 builtin/index-pack.c            |  13 +-
 builtin/receive-pack.c          |  28 +-
 builtin/unpack-objects.c        |  16 +-
 fsck.c                          | 554 +++-
 fsck.h                          |  30 ++-
 t/t1450-fsck.sh                 |  37 ++-
 t/t5302-pack-index.sh           |   2 +-
 t/t5504-fetch-receive-strict.sh |  51
 11 files changed, 692 insertions(+), 165 deletions(-)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index 5aba63a..69dda93 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -1252,11 +1252,11 @@ filter.driver.smudge::
 
 fsck.msg-id::
    Allows overriding the message type (error, warn or ignore) of a
-   specific message ID such as `missingemail`.
+   specific message ID such as `missingEmail`.
 +
 For convenience, fsck prefixes the error/warning with the message ID,
-e.g.  missingemail: invalid author/committer line - missing email means
-that setting `fsck.missingemail = ignore` will hide that issue.
+e.g.  missingEmail: invalid author/committer line - missing email means
+that setting `fsck.missingEmail = ignore` will hide that issue.
 +
 This feature is intended to support working with legacy repositories
 which cannot be repaired without disruptive changes.
@@ -1267,6 +1267,7 @@ fsck.skipList::
    be ignored. This feature is useful when an established project
    should be accepted despite early commits containing errors that
    can be safely ignored such as invalid committer email addresses.
+   Note: corrupt objects cannot be skipped with this setting.
 
 gc.aggressiveDepth::
    The depth parameter used in the delta compression
@@ -2228,9 +2229,9 @@ receive.fsck.msg-id::
    to warnings and vice versa by configuring the `receive.fsck.msg-id`
    setting where the `msg-id` is the fsck message ID and the value
    is one of `error`, `warn` or `ignore`. For convenience, fsck prefixes
-   the error/warning with the message ID, e.g. missingemail: invalid
+   the error/warning with the message ID, e.g. missingEmail: invalid
    author/committer line - missing email means that setting
-   `receive.fsck.missingemail = ignore` will hide that issue.
+   

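`fsck.skipList`, documented in the interdiff above, names a file of objects whose fsck problems should be ignored. One plausible way to hold such a list is a sorted array searched with `bsearch()`; the sketch below assumes 40-digit hex SHA-1 names, one per line, and all of its names (`struct skiplist`, `skiplist_add()` and friends) are hypothetical rather than taken from the patches. Such a lookup can only suppress problems that are reported about a parsed object; per the note above, it does not help with objects that are corrupt.

```c
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define HEXSZ 40  /* assumption: hex SHA-1 object names */

struct skiplist {
	char (*names)[HEXSZ + 1];  /* NUL-terminated lowercase hex names */
	size_t nr, alloc;
	int sorted;
};

static int cmp_name(const void *a, const void *b)
{
	return strcmp(a, b);
}

/* Add one object name; returns -1 if it is not 40 hex digits. */
int skiplist_add(struct skiplist *list, const char *hex)
{
	if (strlen(hex) != HEXSZ)
		return -1;
	for (size_t i = 0; i < HEXSZ; i++)
		if (!isxdigit((unsigned char)hex[i]))
			return -1;
	if (list->nr == list->alloc) {
		size_t alloc = list->alloc ? 2 * list->alloc : 16;
		void *p = realloc(list->names, alloc * sizeof(*list->names));

		if (!p)
			return -1;
		list->names = p;
		list->alloc = alloc;
	}
	for (size_t i = 0; i < HEXSZ; i++)
		list->names[list->nr][i] = (char)tolower((unsigned char)hex[i]);
	list->names[list->nr][HEXSZ] = '\0';
	list->nr++;
	list->sorted = 0;
	return 0;
}

/* Read one object name per line, skipping blank lines.
 * Returns the number of malformed lines. */
int skiplist_load(struct skiplist *list, FILE *fp)
{
	char line[256];
	int bad = 0;

	while (fgets(line, sizeof(line), fp)) {
		line[strcspn(line, "\r\n")] = '\0';
		if (!*line)
			continue;
		if (skiplist_add(list, line))
			bad++;
	}
	return bad;
}

/* Binary search (expects lowercase hex), sorting lazily on first use. */
int skiplist_contains(struct skiplist *list, const char *hex)
{
	if (!list->nr)
		return 0;
	if (!list->sorted) {
		qsort(list->names, list->nr, sizeof(*list->names), cmp_name);
		list->sorted = 1;
	}
	return bsearch(hex, list->names, list->nr,
		       sizeof(*list->names), cmp_name) != NULL;
}

void skiplist_clear(struct skiplist *list)
{
	free(list->names);
	list->names = NULL;
	list->nr = list->alloc = 0;
	list->sorted = 0;
}
```

Sorting once before the first lookup keeps loading cheap and makes each query logarithmic; the patches themselves may well use a different structure.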
Re: [PATCH v7 00/19] Introduce an internal API to interact with the fsck machinery

2015-06-22 Thread Johannes Schindelin
Hi Junio,

On 2015-06-22 20:02, Junio C Hamano wrote:
 Except for the minor nits I sent in separate messages, this round looks
 very nicely done (I however admit that I haven't read the skiplist
 parsing code carefully at all, expecting that you wouldn't screw up
 with something simple like that ;-))
 
 Thanks, will replace what is queued.  Let's start thinking about
 moving it down to 'next' (meaning: we _could_ still accept a reroll,
 but I think we are in a good shape and minor incremental refinements
 would suffice), cooking it for the remainder of the cycle and having
 it graduate to 'master' at the beginning of the next cycle.

Let me submit a v8 with the borked fixup fixed (i.e. part of 04/19 moved to
03/19, where it really belongs), the `for` style fix, the double-space fix
and the cast-style fix, too.

Ciao,
Dscho
--
To unsubscribe from this list: send the line unsubscribe git in


Re: [PATCH v7 00/19] Introduce an internal API to interact with the fsck machinery

2015-06-22 Thread Junio C Hamano
Johannes Schindelin johannes.schinde...@gmx.de writes:

 Changes since v6:
 
 - camelCased message IDs

 - multiple author checking now as suggested by Junio

 - renamed `--quick` to `--connectivity-only`, better commit message

 - `fsck.skipList` is now handled correctly (and not mistaken for a message
   type setting)

 - `fsck.skipList` can handle user paths now

 - index-pack configures the walk function in a more logical place now

 - simplified code by avoiding working on partial strings (i.e. removed
   `substrcmp()`). This saves 10 lines. To accommodate parsing config
   variables directly, we now work on lowercased message IDs; unfortunately
   this means that we cannot use them in append_msg_id() because that
   function wants to append camelCased message IDs.

 Interdiff below diffstat.

Except for the minor nits I sent in separate messages, this round looks
very nicely done (I however admit that I haven't read the skiplist
parsing code carefully at all, expecting that you wouldn't screw up
with something simple like that ;-))

Thanks, will replace what is queued.  Let's start thinking about
moving it down to 'next' (meaning: we _could_ still accept a reroll,
but I think we are in a good shape and minor incremental refinements
would suffice), cooking it for the remainder of the cycle and having
it graduate to 'master' at the beginning of the next cycle.

Thanks.