Re: [PATCH v2 2/5] config doc: unify the description of fsck.* and receive.fsck.*

2018-05-31 Thread Junio C Hamano
Ævar Arnfjörð Bjarmason  writes:

> It's our documentation that should be clearly stating those reasons. If
> we're not saying anything about these being historical bugs, then e.g. I
> (not knowing the implementation) wouldn't have turned this on globally
> on my site knowing that because I have none of these now I'm *very*
> unlikely to have them in the future.
>
> That's different from something that just happens rarely, because a rare
> non-historical event can be expected to happen in the future.

Interesting.  If I did not know Git at all, I would decide
completely opposite---because I have none of these now, I'm very
unlikely to have them in the future, so I would leave fsck.
alone to the generally recommended state (i.e. not customizing
without understanding what I am doing).  That way, (1) if that
unlikely thing happens, I would notice and have a chance to deal
with it, and (2) otherwise, I wouldn't have to worry about that
unlikely event at all.

And that decision would not change even if I _knew_ these knobs'
categories were invented to align with bugs and anomalies in older
implementations of Git.

>
>> Between "fsck. makes sense only when you use these rare and
>> you-probably-never-heard-of tools ongoing basis" and "when you
>> already have (slightly)broken objects, naming each of them in
>> skiplist, rather than covering the class, is better because you want
>> *new* instances of the same breakage", I'd imagine the latter would be
>> more helpful.
>>
>> In any case, let's see if there are more input to this topic and
>> then wrap it up in v3 ;-)
>>
>> Thanks.


Re: [PATCH v2 2/5] config doc: unify the description of fsck.* and receive.fsck.*

2018-05-31 Thread Ævar Arnfjörð Bjarmason


On Wed, May 30 2018, Junio C Hamano wrote:

> Ævar Arnfjörð Bjarmason  writes:
>
>> On Mon, May 28, 2018 at 11:45 AM, Junio C Hamano  wrote:
>>> If the project has some tool constraints and have to accept new
>>> "broken" objects on ongoing basis, then fsck. facility may
>>> make sense, but that is probably a very narrow special use case.
>>
>> That makes sense. I'll reword this bit.
>> ...
>> I'll try to clarify this, but I think we really should have some bit
>> there about historical tools. Realistically no new git tools produce
>> these, so the user needs to be made aware of what the trade-off of
>> turning these on is.
>>
>> The reality of that is that these objects are exceedingly rare, and
>> mostly found in various old repositories. Something like that need to
>> be mentioned so the user can weigh the trade-off of turning this on.
>
> Rare or not, once we say "avoid fsck. unless you have a good
> reason not to", wouldn't that be sufficient?

It's our documentation that should be clearly stating those reasons. If
we're not saying anything about these being historical bugs, then e.g. I
(not knowing the implementation) wouldn't have turned this on globally
on my site knowing that because I have none of these now I'm *very*
unlikely to have them in the future.

That's different from something that just happens rarely, because a rare
non-historical event can be expected to happen in the future.

> Between "fsck. makes sense only when you use these rare and
> you-probably-never-heard-of tools ongoing basis" and "when you
> already have (slightly)broken objects, naming each of them in
> skiplist, rather than covering the class, is better because you want
> *new* instances of the same breakage", I'd imagine the latter would be
> more helpful.
>
> In any case, let's see if there are more input to this topic and
> then wrap it up in v3 ;-)
>
> Thanks.


Re: [PATCH v2 2/5] config doc: unify the description of fsck.* and receive.fsck.*

2018-05-29 Thread Junio C Hamano
Junio C Hamano  writes:

> Between "fsck. makes sense only when you use these rare and
> you-probably-never-heard-of tools ongoing basis" and "when you
> already have (slightly)broken objects, naming each of them in
> skiplist, rather than covering the class, is better because you want
> *new* instances of the same breakage", I'd imagine the latter would be

s/breakage/& caught/; obviously, otherwise what I typed does not
make much sense.  Sorry about the premature .

> more helpful.
>
> In any case, let's see if there are more input to this topic and
> then wrap it up in v3 ;-)
>
> Thanks.


Re: [PATCH v2 2/5] config doc: unify the description of fsck.* and receive.fsck.*

2018-05-29 Thread Junio C Hamano
Ævar Arnfjörð Bjarmason  writes:

> On Mon, May 28, 2018 at 11:45 AM, Junio C Hamano  wrote:
>> If the project has some tool constraints and have to accept new
>> "broken" objects on ongoing basis, then fsck. facility may
>> make sense, but that is probably a very narrow special use case.
>
> That makes sense. I'll reword this bit.
> ...
> I'll try to clarify this, but I think we really should have some bit
> there about historical tools. Realistically no new git tools produce
> these, so the user needs to be made aware of what the trade-off of
> turning these on is.
>
> The reality of that is that these objects are exceedingly rare, and
> mostly found in various old repositories. Something like that need to
> be mentioned so the user can weigh the trade-off of turning this on.

Rare or not, once we say "avoid fsck. unless you have a good
reason not to", wouldn't that be sufficient?  

Between "fsck. makes sense only when you use these rare and
you-probably-never-heard-of tools ongoing basis" and "when you
already have (slightly)broken objects, naming each of them in
skiplist, rather than covering the class, is better because you want
*new* instances of the same breakage", I'd imagine the latter would be
more helpful.

In any case, let's see if there are more input to this topic and
then wrap it up in v3 ;-)

Thanks.


Re: [PATCH v2 2/5] config doc: unify the description of fsck.* and receive.fsck.*

2018-05-28 Thread Ævar Arnfjörð Bjarmason
On Mon, May 28, 2018 at 11:45 AM, Junio C Hamano  wrote:
> Eric Sunshine  writes:
>
>>> @@ -1554,23 +1554,42 @@ filter..smudge::
>>> +Depending on the circumstances it might be better to use
>>> +`fsck.skipList` instead to explicitly whitelist those objects that
>>> +have issues, otherwise new occurrences of the same issue will be
>
> I had to read the "instead to ..." part three times before
> convincing myself that this is a good description, and I agree with
> the assessment.  Perhaps we would want to make the recommendation a
> bit stronger, even.
>
> In general, it is better to enumerate existing objects with
> problems with skipList, instead of listing the kind of
> breakages these problematic objects share to be ignored, as
> doing the latter will allow new instances of the same
> breakages go unnoticed.
>
> If the project has some tool constraints and have to accept new
> "broken" objects on ongoing basis, then fsck. facility may
> make sense, but that is probably a very narrow special use case.

That makes sense. I'll reword this bit.

>>> +hidden going forward, although that's unlikely to happen in practice
>>> +unless someone is being deliberately malicious.
>>
>> Is it worth mentioning buggy tools and/or other buggy Git
>> implementations as sources?
>
> Or old Git implementations we ourselves shipped.  I do not think it
> is worth mentioning it, but I do agree that "unless somebody is
> being deliberatly malicious" is misleading, if that is what you are
> getting at.  One use of fsck is about noticing that you are under
> attack, so "unless someone is being malicious" there in the sentence
> does not make sense.  When somebody is attacking you, you do not
> want to use fsck. to ignore it.
>
> But that becomes a moot point, if we were to follow the line of
> rewrite I suggested above.

I'll try to clarify this, but I think we really should have some bit
there about historical tools. Realistically no new git tools produce
these, so the user needs to be made aware of what the trade-off of
turning these on is.

The reality of that is that these objects are exceedingly rare, and
mostly found in various old repositories. Something like that need to
be mentioned so the user can weigh the trade-off of turning this on.


Re: [PATCH v2 2/5] config doc: unify the description of fsck.* and receive.fsck.*

2018-05-28 Thread Junio C Hamano
Eric Sunshine  writes:

>> @@ -1554,23 +1554,42 @@ filter..smudge::
>> +Depending on the circumstances it might be better to use
>> +`fsck.skipList` instead to explicitly whitelist those objects that
>> +have issues, otherwise new occurrences of the same issue will be

I had to read the "instead to ..." part three times before
convincing myself that this is a good description, and I agree with
the assessment.  Perhaps we would want to make the recommendation a
bit stronger, even.

In general, it is better to enumerate existing objects with
problems with skipList, instead of listing the kind of
breakages these problematic objects share to be ignored, as
doing the latter will allow new instances of the same
breakages go unnoticed.

If the project has some tool constraints and have to accept new
"broken" objects on ongoing basis, then fsck. facility may
make sense, but that is probably a very narrow special use case.

>> +hidden going forward, although that's unlikely to happen in practice
>> +unless someone is being deliberately malicious.
>
> Is it worth mentioning buggy tools and/or other buggy Git
> implementations as sources?

Or old Git implementations we ourselves shipped.  I do not think it
is worth mentioning it, but I do agree that "unless somebody is
being deliberatly malicious" is misleading, if that is what you are
getting at.  One use of fsck is about noticing that you are under
attack, so "unless someone is being malicious" there in the sentence
does not make sense.  When somebody is attacking you, you do not
want to use fsck. to ignore it.

But that becomes a moot point, if we were to follow the line of
rewrite I suggested above.


Re: [PATCH v2 2/5] config doc: unify the description of fsck.* and receive.fsck.*

2018-05-25 Thread Eric Sunshine
On Fri, May 25, 2018 at 3:28 PM, Ævar Arnfjörð Bjarmason
 wrote:
> The documentation for the fsck. and receive.fsck.
> variables was mostly duplicated in two places, with fsck.
> making no mention of the corresponding receive.fsck., and the
> same for fsck.skipList.
> [...]
> Rectify this situation by describing the feature in general terms
> under the fsck.* documentation, and make the receive.fsck.*
> documentation refer to those variables instead.
> [...]
> Signed-off-by: Ævar Arnfjörð Bjarmason 
> ---
> diff --git a/Documentation/config.txt b/Documentation/config.txt
> @@ -1554,23 +1554,42 @@ filter..smudge::
> +Depending on the circumstances it might be better to use
> +`fsck.skipList` instead to explicitly whitelist those objects that
> +have issues, otherwise new occurrences of the same issue will be
> +hidden going forward, although that's unlikely to happen in practice
> +unless someone is being deliberately malicious.

Is it worth mentioning buggy tools and/or other buggy Git
implementations as sources?

>  fsck.skipList::
> The path to a sorted list of object names (i.e. one SHA-1 per
> -   line) that are known to be broken in a non-fatal way and should
> -   be ignored. This feature is useful when an established project
> -   should be accepted despite early commits containing errors that
> -   can be safely ignored such as invalid committer email addresses.
> -   Note: corrupt objects cannot be skipped with this setting.
> +   line) that are known to be broken in a non-fatal way and
> +   should be ignored. This feature is useful when an established
> +   project should be accepted despite early commits containing
> +   errors that can be safely ignored such as invalid committer
> +   email addresses. Note: corrupt objects cannot be skipped with
> +   this setting.

Not sure why this paragraph got re-wrapped (without, as far as I can
see, any other changes), thus making the patch unnecessarily noisy.

> +Like `fsck.` this variable has a corresponding
> +`receive.fsck.skipList` variant.


[PATCH v2 2/5] config doc: unify the description of fsck.* and receive.fsck.*

2018-05-25 Thread Ævar Arnfjörð Bjarmason
The documentation for the fsck. and receive.fsck.
variables was mostly duplicated in two places, with fsck.
making no mention of the corresponding receive.fsck., and the
same for fsck.skipList.

I spent quite a lot of time today wondering why setting the
fsck. variant wasn't working to clone a legacy repository (not
that that would have worked anyway, but a subsequent patch implements
fetch.fsck.).

Rectify this situation by describing the feature in general terms
under the fsck.* documentation, and make the receive.fsck.*
documentation refer to those variables instead.

This documentation was initially added in 2becf00ff7 ("fsck: support
demoting errors to warnings", 2015-06-22) and 4b55b9b479 ("fsck:
document the new receive.fsck. options", 2015-06-22).

Signed-off-by: Ævar Arnfjörð Bjarmason 
---
 Documentation/config.txt | 73 ++--
 1 file changed, 41 insertions(+), 32 deletions(-)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index 623dffd198..af7311e73f 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -1554,23 +1554,42 @@ filter..smudge::
linkgit:gitattributes[5] for details.
 
 fsck.::
-   Allows overriding the message type (error, warn or ignore) of a
-   specific message ID such as `missingEmail`.
-+
-For convenience, fsck prefixes the error/warning with the message ID,
-e.g.  "missingEmail: invalid author/committer line - missing email" means
-that setting `fsck.missingEmail = ignore` will hide that issue.
-+
-This feature is intended to support working with legacy repositories
-which cannot be repaired without disruptive changes.
+   During fsck git may find issues with legacy data which
+   wouldn't be generated by current versions of git, and which
+   wouldn't be sent over the wire if `transfer.fsckObjects` was
+   set. This feature is intended to support working with legacy
+   repositories containing such data.
++
+Setting `fsck.` will be picked up by linkgit:git-fsck[1], but
+to accept pushes of such data set `receive.fsck.` instead. The
+rest of the documentation discusses `fsck.*` for brevity, but the same
+applies for the corresponding `receive.fsck.*` variables.
++
+When `fsck.` is set, errors can be switched to warnings and
+vice versa by configuring the `fsck.` setting where the
+`` is the fsck message ID and the value is one of `error`,
+`warn` or `ignore`. For convenience, fsck prefixes the error/warning
+with the message ID, e.g. "missingEmail: invalid author/committer line
+- missing email" means that setting `fsck.missingEmail = ignore` will
+hide that issue.
++
+Depending on the circumstances it might be better to use
+`fsck.skipList` instead to explicitly whitelist those objects that
+have issues, otherwise new occurrences of the same issue will be
+hidden going forward, although that's unlikely to happen in practice
+unless someone is being deliberately malicious.
 
 fsck.skipList::
The path to a sorted list of object names (i.e. one SHA-1 per
-   line) that are known to be broken in a non-fatal way and should
-   be ignored. This feature is useful when an established project
-   should be accepted despite early commits containing errors that
-   can be safely ignored such as invalid committer email addresses.
-   Note: corrupt objects cannot be skipped with this setting.
+   line) that are known to be broken in a non-fatal way and
+   should be ignored. This feature is useful when an established
+   project should be accepted despite early commits containing
+   errors that can be safely ignored such as invalid committer
+   email addresses. Note: corrupt objects cannot be skipped with
+   this setting.
++
+Like `fsck.` this variable has a corresponding
+`receive.fsck.skipList` variant.
 
 gc.aggressiveDepth::
The depth parameter used in the delta compression
@@ -2849,26 +2868,16 @@ receive.fsckObjects::
`transfer.fsckObjects` is used instead.
 
 receive.fsck.::
-   When `receive.fsckObjects` is set to true, errors can be switched
-   to warnings and vice versa by configuring the `receive.fsck.`
-   setting where the `` is the fsck message ID and the value
-   is one of `error`, `warn` or `ignore`. For convenience, fsck prefixes
-   the error/warning with the message ID, e.g. "missingEmail: invalid
-   author/committer line - missing email" means that setting
-   `receive.fsck.missingEmail = ignore` will hide that issue.
-+
-This feature is intended to support working with legacy repositories
-which would not pass pushing when `receive.fsckObjects = true`, allowing
-the host to accept repositories with certain known issues but still catch
-other issues.
+   Acts like `fsck.`, but is used by
+   linkgit:git-receive-pack[1] instead of
+   linkgit:git-fsck[1]. See the `fsck.` documentation for
+   details.
 
 receive.fsck.skipList::
-