Re: [PATCH] fsck: check skiplist for object in fsck_blob()

2018-07-13 Thread Ramsay Jones



On 13/07/18 20:46, Jeff King wrote:
> On Fri, Jul 13, 2018 at 03:41:19PM -0400, Jeff King wrote:
> 
>>>   not ok 18 - push rejects corrupt .gitmodules (policy)
>>>   # 
>>>   # rm -rf dst.git &&
>>>   # git init --bare dst.git &&
>>>   # git -C dst.git config transfer.fsckObjects true &&
>>>   # git -C dst.git config fsck.gitmodulesParse error &&
>>>   # test_must_fail git -C corrupt push ../dst.git HEAD 2>output &&
>>>   # grep gitmodulesParse output &&
>>>   # test_i18ngrep ! "bad config" output
>>
>> There are separate config slots for local fsck versus receiving objects.
>> So I think you need to be setting receive.fsck.gitmodulesParse.
> 
> I confirmed that s/fsck/receive.fsck/ in your patch makes the tests
> pass.

Doh! Thanks for catching my stupid mistake! I was rushing a bit
just before going out (yes, I'm going to be late now!).

> I didn't bother adding extra push tests in the patch I just sent, since
> upgrading of config error classes is already covered elsewhere in t5504.

yeah, I like to 'test' by adding tests if I can (makes repeating
the steps less effort ...). So, I was just 'showing my working',
as it were.

> That said, I'm not opposed to adding more tests on top even if they are
> slightly redundant (well, not redundant if you're into black-box
> testing, but our current tests are usually written with an assumption of
> where the module boundaries are, and what is likely to be a problem).

I don't mind either way. I will let you and Junio decide.

Thanks!

ATB,
Ramsay Jones




Re: [PATCH] fsck: check skiplist for object in fsck_blob()

2018-07-13 Thread Jeff King
On Fri, Jul 13, 2018 at 03:41:19PM -0400, Jeff King wrote:

> >   not ok 18 - push rejects corrupt .gitmodules (policy)
> >   # 
> >   # rm -rf dst.git &&
> >   # git init --bare dst.git &&
> >   # git -C dst.git config transfer.fsckObjects true &&
> >   # git -C dst.git config fsck.gitmodulesParse error &&
> >   # test_must_fail git -C corrupt push ../dst.git HEAD 2>output &&
> >   # grep gitmodulesParse output &&
> >   # test_i18ngrep ! "bad config" output
> 
> There are separate config slots for local fsck versus receiving objects.
> So I think you need to be setting receive.fsck.gitmodulesParse.

I confirmed that s/fsck/receive.fsck/ in your patch makes the tests
pass.

I didn't bother adding extra push tests in the patch I just sent, since
upgrading of config error classes is already covered elsewhere in t5504.

That said, I'm not opposed to adding more tests on top even if they are
slightly redundant (well, not redundant if you're into black-box
testing, but our current tests are usually written with an assumption of
where the module boundaries are, and what is likely to be a problem).

-Peff


Re: [PATCH] fsck: check skiplist for object in fsck_blob()

2018-07-13 Thread Jeff King
On Fri, Jul 13, 2018 at 08:37:46PM +0100, Ramsay Jones wrote:

> OK, so I found some time to test this tonight. It is not good
> news (assuming that I haven't messed up the testing, of course). :(

I think you may have. :)

>   not ok 18 - push rejects corrupt .gitmodules (policy)
>   #   
>   #   rm -rf dst.git &&
>   #   git init --bare dst.git &&
>   #   git -C dst.git config transfer.fsckObjects true &&
>   #   git -C dst.git config fsck.gitmodulesParse error &&
>   #   test_must_fail git -C corrupt push ../dst.git HEAD 2>output &&
>   #   grep gitmodulesParse output &&
>   #   test_i18ngrep ! "bad config" output

There are separate config slots for local fsck versus receiving objects.
So I think you need to be setting receive.fsck.gitmodulesParse.

-Peff


Re: [PATCH] fsck: check skiplist for object in fsck_blob()

2018-07-13 Thread Jeff King
On Wed, Jul 11, 2018 at 08:31:34PM +0100, Ramsay Jones wrote:

> >> Simply, I have found (for many different reasons) that, if there
> >> is no good reason to execute some code, it is _far_ better to not
> >> do so! ;-)
> > 
> > Heh. I also agree with that as a guiding principle. But I _also_ like
> > the principle of "if you do not need to do add this code, do not add
> > it". So the two are a little at odds here. :)
> 
> I agree with that also! ;-) However, in this case, I can't
> imagine having to do less, to do nothing - if you see what
> I mean! So, I think "don't execute code you don't need to"
> trumps "don't add code you don't need to" here.

Fair enough. I'm OK with it either way, then.

> > @@ -76,6 +75,7 @@ static struct oidset gitmodules_done = OIDSET_INIT;
> > FUNC(NUL_IN_COMMIT, WARN) \
> > /* infos (reported as warnings, but ignored by default) */ \
> > FUNC(BAD_TAG_NAME, INFO) \
> > +   FUNC(GITMODULES_PARSE, INFO) \
> > FUNC(MISSING_TAGGER_ENTRY, INFO)
> >  
> >  #define MSG_ID(id, msg_type) FSCK_MSG_##id,
> > 
> 
> So, just squinting at this in my email client, if this allowed
> a push/fetch to succeed (along with an 'info' message), while
> providing an admin the means to configure it to loudly deny
> the push/fetch - then I think we have a winner! ;-)
> 
> Sorry for not testing the patch.

No problem. I didn't get back to it until today. And indeed, the patch
works as advertised, but there's one additional bit needed (in the
preparatory patch below).

So here's what I came up with, which I think is pretty reasonable. The
commit message for the second one is quite long, but I tried to lay out
the pros and cons from our discussion. And I think what we discussed
here may end up being the blueprint for how we consider similar cases in
the future, so I tried to be exhaustive.

I built these two on top of my earlier four patches (which Junio has
queued as jk/fsck-gitmodules-gently). The code change itself is
orthogonal to silencing the config code, but I built on top of the test
added earlier. If we want to back-port this to v2.17 or earlier, I can
build it the other way around.

If this is what we decide to do upstream, then I'll lobby to flip
GitHub's defaults to match. Other hosters (especially ones using other
implementations) may consider doing the same.

  [1/2]: fsck: split ".gitmodules too large" error from parse failure
  [2/2]: fsck: downgrade gitmodulesParse default to "info"

 fsck.c | 5 +++--
 t/t7415-submodule-names.sh | 2 +-
 2 files changed, 4 insertions(+), 3 deletions(-)

-Peff


Re: [PATCH] fsck: check skiplist for object in fsck_blob()

2018-07-13 Thread Ramsay Jones



On 11/07/18 20:31, Ramsay Jones wrote:
> On 07/07/18 02:32, Jeff King wrote:
[snip]
>> Hmm, we seem to have "info" these days, so maybe that would do what I
>> want. I.e., I wonder if the patch below does everything we'd want. It's
>> late here and I probably won't get back to this until Monday, but you
>> may want to play with it in the meantime.
> 
> Sorry, I've been busy with other things and have not had the
> time to try the patch below (still trying to catch up with
> the mailing-list emails!).
> 
>> diff --git a/fsck.c b/fsck.c
>> index 48e7e36869..0b0003055e 100644
>> --- a/fsck.c
>> +++ b/fsck.c
>> @@ -61,7 +61,6 @@ static struct oidset gitmodules_done = OIDSET_INIT;
>>  FUNC(ZERO_PADDED_DATE, ERROR) \
>>  FUNC(GITMODULES_MISSING, ERROR) \
>>  FUNC(GITMODULES_BLOB, ERROR) \
>> -FUNC(GITMODULES_PARSE, ERROR) \
>>  FUNC(GITMODULES_NAME, ERROR) \
>>  FUNC(GITMODULES_SYMLINK, ERROR) \
>>  /* warnings */ \
>> @@ -76,6 +75,7 @@ static struct oidset gitmodules_done = OIDSET_INIT;
>>  FUNC(NUL_IN_COMMIT, WARN) \
>>  /* infos (reported as warnings, but ignored by default) */ \
>>  FUNC(BAD_TAG_NAME, INFO) \
>> +FUNC(GITMODULES_PARSE, INFO) \
>>  FUNC(MISSING_TAGGER_ENTRY, INFO)
>>  
>>  #define MSG_ID(id, msg_type) FSCK_MSG_##id,
>>
> 
> So, just squinting at this in my email client, if this allowed
> a push/fetch to succeed (along with an 'info' message), while
> providing an admin the means to configure it to loudly deny
> the push/fetch - then I think we have a winner! ;-)
> 
> Sorry for not testing the patch.

OK, so I found some time to test this tonight. It is not good
news (assuming that I haven't messed up the testing, of course). :(

On top of 'pu' (@9026cfc855), I reverted commit d4c5675233
("fsck: silence stderr when parsing .gitmodules", 2018-06-28) and
added the patch given below.

Unfortunately, the final test fails, thus:

  $ cd t
  $ ./t7415-submodule-names.sh
  ok 1 - check names
  ok 2 - create innocent subrepo
  ok 3 - submodule add refuses invalid names
  ok 4 - add evil submodule
  ok 5 - add other submodule
  ok 6 - clone evil superproject
  ok 7 - fsck detects evil superproject
  ok 8 - transfer.fsckObjects detects evil superproject (unpack)
  ok 9 - transfer.fsckObjects detects evil superproject (index)
  ok 10 - create oddly ordered pack
  ok 11 - transfer.fsckObjects handles odd pack (unpack)
  ok 12 - transfer.fsckObjects handles odd pack (index)
  ok 13 - index-pack --strict works for non-repo pack
  ok 14 - fsck detects symlinked .gitmodules file
  ok 15 - fsck detects non-blob .gitmodules
  ok 16 - fsck detects corrupt .gitmodules
  ok 17 - push warns about corrupt .gitmodules
  not ok 18 - push rejects corrupt .gitmodules (policy)
  # 
  # rm -rf dst.git &&
  # git init --bare dst.git &&
  # git -C dst.git config transfer.fsckObjects true &&
  # git -C dst.git config fsck.gitmodulesParse error &&
  # test_must_fail git -C corrupt push ../dst.git HEAD 2>output &&
  # grep gitmodulesParse output &&
  # test_i18ngrep ! "bad config" output
  # 
  # failed 1 among 18 test(s)
  1..18
  $ 

i.e. the test_must_fail doesn't! ;-)

I have to go out now, but I will hopefully take a look at
this again tomorrow. (Do the test additions look OK?)

ATB,
Ramsay Jones

-- >8 --
Subject: [PATCH] WIP: try jeff's last patch

Signed-off-by: Ramsay Jones 
---
 fsck.c |  2 +-
 t/t7415-submodule-names.sh | 26 ++
 2 files changed, 27 insertions(+), 1 deletion(-)

diff --git a/fsck.c b/fsck.c
index 17b3a51fa..b74856cee 100644
--- a/fsck.c
+++ b/fsck.c
@@ -64,7 +64,6 @@ static struct oidset gitmodules_done = OIDSET_INIT;
FUNC(ZERO_PADDED_DATE, ERROR) \
FUNC(GITMODULES_MISSING, ERROR) \
FUNC(GITMODULES_BLOB, ERROR) \
-   FUNC(GITMODULES_PARSE, ERROR) \
FUNC(GITMODULES_NAME, ERROR) \
FUNC(GITMODULES_SYMLINK, ERROR) \
/* warnings */ \
@@ -79,6 +78,7 @@ static struct oidset gitmodules_done = OIDSET_INIT;
FUNC(NUL_IN_COMMIT, WARN) \
/* infos (reported as warnings, but ignored by default) */ \
FUNC(BAD_TAG_NAME, INFO) \
+   FUNC(GITMODULES_PARSE, INFO) \
FUNC(MISSING_TAGGER_ENTRY, INFO)
 
 #define MSG_ID(id, msg_type) FSCK_MSG_##id,
diff --git a/t/t7415-submodule-names.sh b/t/t7415-submodule-names.sh
index ba8af785a..ef9535a44 100755
--- a/t/t7415-submodule-names.sh
+++ b/t/t7415-submodule-names.sh
@@ -185,10 +185,36 @@ test_expect_success 'fsck detects corrupt .gitmodules' '
git add .gitmodules &&
git commit -m "broken gitmodules" &&
 
+   # issues an "info" warning, but does not fail
+   git fsck 2>output &&
+   grep gitmodulesParse output &&
+   test_i18ngrep ! "bad config" output &&
+
+   # up-rate gitmodulesParse to error
+   git 

Re: [PATCH] fsck: check skiplist for object in fsck_blob()

2018-07-11 Thread Ramsay Jones



On 07/07/18 02:32, Jeff King wrote:
[snip]
>> I'm not interested in any savings - it would have to be a pretty
>> wacky repo for there to be much in the way of savings!
>>
>> Simply, I have found (for many different reasons) that, if there
>> is no good reason to execute some code, it is _far_ better to not
>> do so! ;-)
> 
> Heh. I also agree with that as a guiding principle. But I _also_ like
> the principle of "if you do not need to do add this code, do not add
> it". So the two are a little at odds here. :)

I agree with that also! ;-) However, in this case, I can't
imagine having to do less, to do nothing - if you see what
I mean! So, I think "don't execute code you don't need to"
trumps "don't add code you don't need to" here.

[snip]
> What next? I was kind of leaning towards loosening, but it sounded like
> Junio thought the opposite. One thing I didn't like about the patch I
> sent earlier is that it removes the option for the admin to say "no, I
> really do want to enforce this". I don't know if that's of value or not.

Yes, it would be good to let the admin set the policy.

> In an ideal world, I think we'd detect the problem and then react
> according to the receiver's receive.fsck.* config. And we could just
> flip the default for receive.fsck.gitmodulesParse to "warning". But
> IIRC, the fsck code in index-pack does not bother distinguishing between
> warnings and errors. I think it ought to, but that's too big a change to
> go on maint.
> 
> It _might_ work to just flip the default to "ignore". That leaves the
> paranoid admin with a way to re-enable it if they want, and I _think_ it
> would be respected by index-pack.

Ah, that would be good, if it works.

> [goes and looks at the code]
> 
> Hmm, we seem to have "info" these days, so maybe that would do what I
> want. I.e., I wonder if the patch below does everything we'd want. It's
> late here and I probably won't get back to this until Monday, but you
> may want to play with it in the meantime.

Sorry, I've been busy with other things and have not had the
time to try the patch below (still trying to catch up with
the mailing-list emails!).

> diff --git a/fsck.c b/fsck.c
> index 48e7e36869..0b0003055e 100644
> --- a/fsck.c
> +++ b/fsck.c
> @@ -61,7 +61,6 @@ static struct oidset gitmodules_done = OIDSET_INIT;
>   FUNC(ZERO_PADDED_DATE, ERROR) \
>   FUNC(GITMODULES_MISSING, ERROR) \
>   FUNC(GITMODULES_BLOB, ERROR) \
> - FUNC(GITMODULES_PARSE, ERROR) \
>   FUNC(GITMODULES_NAME, ERROR) \
>   FUNC(GITMODULES_SYMLINK, ERROR) \
>   /* warnings */ \
> @@ -76,6 +75,7 @@ static struct oidset gitmodules_done = OIDSET_INIT;
>   FUNC(NUL_IN_COMMIT, WARN) \
>   /* infos (reported as warnings, but ignored by default) */ \
>   FUNC(BAD_TAG_NAME, INFO) \
> + FUNC(GITMODULES_PARSE, INFO) \
>   FUNC(MISSING_TAGGER_ENTRY, INFO)
>  
>  #define MSG_ID(id, msg_type) FSCK_MSG_##id,
> 

So, just squinting at this in my email client, if this allowed
a push/fetch to succeed (along with an 'info' message), while
providing an admin the means to configure it to loudly deny
the push/fetch - then I think we have a winner! ;-)

Sorry for not testing the patch.

ATB,
Ramsay Jones



Re: [PATCH] fsck: check skiplist for object in fsck_blob()

2018-07-06 Thread Jeff King
On Wed, Jul 04, 2018 at 01:12:40AM +0100, Ramsay Jones wrote:

> > True that we don't even bother doing the parsing with your patch. But I
> > think I talked myself out of that part being a significant savings
> > elsewhere.
> 
> [Sorry for the late reply - watching football again!]
> 
> I'm not interested in any savings - it would have to be a pretty
> wacky repo for there to be much in the way of savings!
> 
> Simply, I have found (for many different reasons) that, if there
> is no good reason to execute some code, it is _far_ better to not
> do so! ;-)

Heh. I also agree with that as a guiding principle. But I _also_ like
the principle of "if you do not need to do add this code, do not add
it". So the two are a little at odds here. :)

> > I'm not sure. This has been running on every push to GitHub for the past
> > 6 weeks, and this is the first report. It's hard to say what that means,
> > and technically speaking of course this _is_ a regression.
> 
> Hmm, are you concerned about old clients 'transmitting' the
> bad data via large hosting sites? (New clients would complain
> about a dodgy .gitmodules file, no matter were it came from,
> right?)

I just meant above that anybody with a broken .gitmodules would have had
their push rejected, and we haven't gotten any such reports beyond yours
and Mike's. So that's some evidence that it's relatively rare.

As far as why those fsck checks are there in the first place, yes, it's
about transmitting bad data to unpatched clients. Ultimately the
responsibility for not being vulnerable is on the clients themselves.
But in practice large hosting sites can help by not being vectors.

> Has the definition of the config file syntax changed recently?
> If not, then old client versions will see the same syntax errors
> as recently 'fixed' versions. So they should error out without
> 'looking' at the bad data, right? (ignoring the 'lets fix the
> obvious syntax error' idea).

No, it hasn't change. So as far as I know, loosening the syntax check
does not impact _this_ vulnerability, because it requires a file that
can be parsed, and the parser is the same for both cases. It might
affect future vulnerabilities. We could also tighten when those come to
light, or tighten in a way that blocks the specific bug. But there's
potentially some value in putting our foot down _now_ and saying that
we're going to enforce certain properties of special files, before it
gets any further out-of-hand.

And of course I could be wrong. We do occasionally fix bugs in the
parser, so I won't be surprised if there's some obscure case where git
<2.0 would not be protected or something like that. But frankly,
anything that old is probably already vulnerable to other ancient bugs,
too.

> > Thanks. If we're going to do any loosening, I think we may want to
> > address that _first_, so it can go directly on top of the patches in
> > v2.17.1 (because it's a bigger issue than the stray message, IMHO).
> 
> Agreed. I probably haven't given it sufficient thought, but my
> immediate reaction is to loosen the check - I don't see how
> this could be exploited to significantly reduce security. (My lack
> of imagination has been noted several times in the past, however!)
> 
> Having said that, I am no security expert, so I will let those who
> have security expertise decide what is best to do in this situation.

I think you have a grasp on the situation from what you wrote above.

What next? I was kind of leaning towards loosening, but it sounded like
Junio thought the opposite. One thing I didn't like about the patch I
sent earlier is that it removes the option for the admin to say "no, I
really do want to enforce this". I don't know if that's of value or not.

In an ideal world, I think we'd detect the problem and then react
according to the receiver's receive.fsck.* config. And we could just
flip the default for receive.fsck.gitmodulesParse to "warning". But
IIRC, the fsck code in index-pack does not bother distinguishing between
warnings and errors. I think it ought to, but that's too big a change to
go on maint.

It _might_ work to just flip the default to "ignore". That leaves the
paranoid admin with a way to re-enable it if they want, and I _think_ it
would be respected by index-pack.

[goes and looks at the code]

Hmm, we seem to have "info" these days, so maybe that would do what I
want. I.e., I wonder if the patch below does everything we'd want. It's
late here and I probably won't get back to this until Monday, but you
may want to play with it in the meantime.

diff --git a/fsck.c b/fsck.c
index 48e7e36869..0b0003055e 100644
--- a/fsck.c
+++ b/fsck.c
@@ -61,7 +61,6 @@ static struct oidset gitmodules_done = OIDSET_INIT;
FUNC(ZERO_PADDED_DATE, ERROR) \
FUNC(GITMODULES_MISSING, ERROR) \
FUNC(GITMODULES_BLOB, ERROR) \
-   FUNC(GITMODULES_PARSE, ERROR) \
FUNC(GITMODULES_NAME, ERROR) \
FUNC(GITMODULES_SYMLINK, ERROR) \
/* warnings */ \
@@ -76,6 

Re: [PATCH] fsck: check skiplist for object in fsck_blob()

2018-07-03 Thread Ramsay Jones



On 03/07/18 15:34, Jeff King wrote:
> On Fri, Jun 29, 2018 at 02:10:59AM +0100, Ramsay Jones wrote:
> 
>> On 28/06/18 23:03, Jeff King wrote:
>>> On Thu, Jun 28, 2018 at 07:53:27PM +0100, Ramsay Jones wrote:
>> [snip]
>>> Yes, it can go in quickly. But I'd prefer not to keep it in the long
>>> term if it's literally doing nothing.
>>
>> Hmm, I don't think you can say its doing nothing!
>>
>> "Yeah, this solution seems sensible. Given that we would
>>  never report any error for that blob, there is no point
>>  in even looking at it."
>>
>> ... is no less true, with or without additional patches! ;-)
> 
> True that we don't even bother doing the parsing with your patch. But I
> think I talked myself out of that part being a significant savings
> elsewhere.

[Sorry for the late reply - watching football again!]

I'm not interested in any savings - it would have to be a pretty
wacky repo for there to be much in the way of savings!

Simply, I have found (for many different reasons) that, if there
is no good reason to execute some code, it is _far_ better to not
do so! ;-)

> I guess it would be OK to leave it in. It just feels like it would be
> vestigial after the rest of the patches.
> 
[snip]

>>> Yes, it would include any syntax error. I also have a slight worry about
>>> that, but nobody seems to have screamed _yet_. :)
>>
>> Hmm, I don't think we can ignore this. :(
> 
> I'm not sure. This has been running on every push to GitHub for the past
> 6 weeks, and this is the first report. It's hard to say what that means,
> and technically speaking of course this _is_ a regression.

Hmm, are you concerned about old clients 'transmitting' the
bad data via large hosting sites? (New clients would complain
about a dodgy .gitmodules file, no matter were it came from,
right?)

Has the definition of the config file syntax changed recently?
If not, then old client versions will see the same syntax errors
as recently 'fixed' versions. So they should error out without
'looking' at the bad data, right? (ignoring the 'lets fix the
obvious syntax error' idea).

> There's a nearby thread of interest, too, which I cc'd you on:
> 
>   https://public-inbox.org/git/20180703070650.b3drk5a6kb4k4...@glandium.org/

Yeah, I don't quite follow what's going on there - I would have
to read up some more. ;-)

>> So, FWIW, Ack!
>>
>> [I still think my original patch, with the 'to_be_skipped'
>> function name changed to 'object_on_skiplist', should be
>> the first patch of the series!]
> 
> Thanks. If we're going to do any loosening, I think we may want to
> address that _first_, so it can go directly on top of the patches in
> v2.17.1 (because it's a bigger issue than the stray message, IMHO).

Agreed. I probably haven't given it sufficient thought, but my
immediate reaction is to loosen the check - I don't see how
this could be exploited to significantly reduce security. (My lack
of imagination has been noted several times in the past, however!)

Having said that, I am no security expert, so I will let those who
have security expertise decide what is best to do in this situation.

Thanks!

ATB,
Ramsay Jones






Re: [PATCH] fsck: check skiplist for object in fsck_blob()

2018-07-03 Thread Jeff King
On Fri, Jun 29, 2018 at 02:10:59AM +0100, Ramsay Jones wrote:

> On 28/06/18 23:03, Jeff King wrote:
> > On Thu, Jun 28, 2018 at 07:53:27PM +0100, Ramsay Jones wrote:
> [snip]
> > Yes, it can go in quickly. But I'd prefer not to keep it in the long
> > term if it's literally doing nothing.
> 
> Hmm, I don't think you can say its doing nothing!
> 
> "Yeah, this solution seems sensible. Given that we would
>  never report any error for that blob, there is no point
>  in even looking at it."
> 
> ... is no less true, with or without additional patches! ;-)

True that we don't even bother doing the parsing with your patch. But I
think I talked myself out of that part being a significant savings
elsewhere.

I guess it would be OK to leave it in. It just feels like it would be
vestigial after the rest of the patches.

> > I have some patches which I think solve your problem. They apply on
> > v2.18.0, but not on v2.17.1 (because they rely on Dscho's increased
> > passing of config_options in v2.18). Is that good enough?
> 
> Heh, I was also writing patches to address this tonight (but
> I was also watching the football, so I was somewhat behind you).
> My patches were not too dissimilar to yours, except I was aiming
> to allow even do_config_from_file() etc., to suppress errors.

I think this should work via do_config_from_file(). The thing it really
misses is that git_config_with_options() will not respect it, but the
handling of options there is already a bug (well, I don't think there's
anything triggerable either before or after my patches, but it feels
like a bug waiting to happen).

> Your patches were cleaner and more focused than mine. (Instead of
> turning die_on_error into an enum, I added an additional 'quiet'
> flag. When pushing the stack (eg. for include files), I had to
> copy the quiet flag from the parent struct, etc, ... ;-) ).

Yes, I think that's what you have to do pre-v2.18, where we don't pass
the options struct around.

> > Yes, it would include any syntax error. I also have a slight worry about
> > that, but nobody seems to have screamed _yet_. :)
> 
> Hmm, I don't think we can ignore this. :(

I'm not sure. This has been running on every push to GitHub for the past
6 weeks, and this is the first report. It's hard to say what that means,
and technically speaking of course this _is_ a regression.

There's a nearby thread of interest, too, which I cc'd you on:

  https://public-inbox.org/git/20180703070650.b3drk5a6kb4k4...@glandium.org/

> > Here are the patches I came up with.
> 
> Yes, I applied these locally and tested them. All OK here.
> 
> So, FWIW, Ack!
> 
> [I still think my original patch, with the 'to_be_skipped'
> function name changed to 'object_on_skiplist', should be
> the first patch of the series!]

Thanks. If we're going to do any loosening, I think we may want to
address that _first_, so it can go directly on top of the patches in
v2.17.1 (because it's a bigger issue than the stray message, IMHO).

-Peff


Re: [PATCH] fsck: check skiplist for object in fsck_blob()

2018-06-28 Thread Ramsay Jones



On 28/06/18 23:03, Jeff King wrote:
> On Thu, Jun 28, 2018 at 07:53:27PM +0100, Ramsay Jones wrote:
[snip]
> Yes, it can go in quickly. But I'd prefer not to keep it in the long
> term if it's literally doing nothing.

Hmm, I don't think you can say its doing nothing!

"Yeah, this solution seems sensible. Given that we would
 never report any error for that blob, there is no point
 in even looking at it."

... is no less true, with or without additional patches! ;-)

> I have some patches which I think solve your problem. They apply on
> v2.18.0, but not on v2.17.1 (because they rely on Dscho's increased
> passing of config_options in v2.18). Is that good enough?

Heh, I was also writing patches to address this tonight (but
I was also watching the football, so I was somewhat behind you).
My patches were not too dissimilar to yours, except I was aiming
to allow even do_config_from_file() etc., to suppress errors.

Your patches were cleaner and more focused than mine. (Instead of
turning die_on_error into an enum, I added an additional 'quiet'
flag. When pushing the stack (eg. for include files), I had to
copy the quiet flag from the parent struct, etc, ... ;-) ).

> Yes, it would include any syntax error. I also have a slight worry about
> that, but nobody seems to have screamed _yet_. :)

Hmm, I don't think we can ignore this. :(

> Here are the patches I came up with.

Yes, I applied these locally and tested them. All OK here.

So, FWIW, Ack!

[I still think my original patch, with the 'to_be_skipped'
function name changed to 'object_on_skiplist', should be
the first patch of the series!]

ATB,
Ramsay Jones


Re: [PATCH] fsck: check skiplist for object in fsck_blob()

2018-06-28 Thread Jeff King
On Thu, Jun 28, 2018 at 07:53:27PM +0100, Ramsay Jones wrote:

> > Yes, though I don't think it's too bad. We already have a "die_on_error"
> > flag in the config code. I think it just needs to become a tristate:
> > die/error/silent (and probably get passed in via config_options, since I
> > think we tie it right now to the file/blob source).
> 
> Yes, but this code is already very crufty - and I'm just
> waiting for someone to want to have per-repo/submodule
> config parsing i... ;-)

It is crufty, but I think we actually handle that part OK; the flag gets
attached to the stack.

> > Hmm, if we end up doing the config thing above, then this patch would
> > become unnecessary.
> 
> I was thinking of timing - the current patch could go
> in quickly to solve the immediate problem (eg. in maint).
> 
> Also, it does not hurt to do this _as well as_ suppress
> the config errors.

Yes, it can go in quickly. But I'd prefer not to keep it in the long
term if it's literally doing nothing.

I have some patches which I think solve your problem. They apply on
v2.18.0, but not on v2.17.1 (because they rely on Dscho's increased
passing of config_options in v2.18). Is that good enough?

> > And I think doing that would help _all_ cases, even ones without a
> > skiplist. They don't need to see random config error messages either,
> > even if we do eventually report an fsck error.
> 
> Oh, yes, I agree. You will have noticed that it was my
> first suggested solution. (I have started writing that
> patch a few times, but it just makes me want to throw
> the current code away and start again)!
> 
> Hmm, BTW, the 'rejected push' problem would include *any*
> '.gitmodules' blob that contained a syntax error, right?
> 
> Maybe it won't be as rare as all that! (Imagine not being
> able to push due to a compiler error/warning in source files.
> How irritating would that be! :-P ).

Yes, it would include any syntax error. I also have a slight worry about
that, but nobody seems to have screamed _yet_. :)

> (if we fix this, you could hide some nefarious settings
> after an obvious syntax error - then get the victim to
> fix the syntax error ...)

You can also usually get the victim to type "make", which is even
simpler. :)

Here are the patches I came up with.

Note that the config_options struct has a bit of a dual-nature. Some
options are respected only via config_from_options(), and some only from
git_config_from_file(). I think this should probably be remedied in the
long run, but I stopped here in the interest of keeping this
maint-worthy.

  [1/4]: config: turn die_on_error into caller-facing enum
  [2/4]: config: add CONFIG_ERROR_SILENT handler
  [3/4]: config: add options parameter to git_config_from_mem
  [4/4]: fsck: silence stderr when parsing .gitmodules

 config.c   | 32 +++-
 config.h   | 13 +++--
 fsck.c |  4 +++-
 submodule-config.c |  2 +-
 t/t7415-submodule-names.sh | 15 +++
 5 files changed, 53 insertions(+), 13 deletions(-)

-Peff


Re: [PATCH] fsck: check skiplist for object in fsck_blob()

2018-06-28 Thread Ramsay Jones



On 28/06/18 18:45, Jeff King wrote:
> On Thu, Jun 28, 2018 at 05:56:18PM +0100, Ramsay Jones wrote:
[snip]
>>> One thing we could do is turn the parse failure into a noop. The main
>>> point of the check is to protect people against the malicious
>>> .gitmodules bug. If the file can't be parsed, then it also can't be an
>>> attack vector (assuming the parser used for the fsck check and the
>>> parser used by the victim behave identically).
>>
>> Hmm, yeah, but I would have to provide a means of suppressing
>> the config parser error messages. Something I wanted to avoid. :(
> 
> Yes, though I don't think it's too bad. We already have a "die_on_error"
> flag in the config code. I think it just needs to become a tristate:
> die/error/silent (and probably get passed in via config_options, since I
> think we tie it right now to the file/blob source).

Yes, but this code is already very crufty - and I'm just
waiting for someone to want to have per-repo/submodule
config parsing i... ;-)

>> Junio, do you want me to address the above 'rejected push'
>> issue in this patch, or with a follow-up patch? (It should
>> be a pretty rare problem - famous last words!)
> 
> Hmm, if we end up doing the config thing above, then this patch would
> become unnecessary.

I was thinking of timing - the current patch could go
in quickly to solve the immediate problem (eg. in maint).

Also, it does not hurt to do this _as well as_ suppress
the config errors.

> And I think doing that would help _all_ cases, even ones without a
> skiplist. They don't need to see random config error messages either,
> even if we do eventually report an fsck error.

Oh, yes, I agree. You will have noticed that it was my
first suggested solution. (I have started writing that
patch a few times, but it just makes me want to throw
the current code away and start again)!

Hmm, BTW, the 'rejected push' problem would include *any*
'.gitmodules' blob that contained a syntax error, right?

Maybe it won't be as rare as all that! (Imagine not being
able to push due to a compiler error/warning in source files.
How irritating would that be! :-P ).

(if we fix this, you could hide some nefarious settings
after an obvious syntax error - then get the victim to
fix the syntax error ...)

ATB,
Ramsay Jones



Re: [PATCH] fsck: check skiplist for object in fsck_blob()

2018-06-28 Thread Jeff King
On Thu, Jun 28, 2018 at 05:56:18PM +0100, Ramsay Jones wrote:

> > Yeah, this solution seems sensible. Given that we would never report any
> > error for that blob, there is no point in even looking at it. I wonder
> > if we ought to do the same for other types, too. Is there any point in
> > opening a tree that is in the skiplist?
> 
> Note that the 'blob' object has already been 'loaded' at this
> point anyway (and the basic 'object' checks have been done).

Yeah, you're right. If we wanted to avoid that, we should prevent it
from entering the gitmodules_found list in the first place.

Of course, we'd generally still load it once anyway in order to check
the sha1. So really, the most we could do is prevent it from being
loaded a _second_ time for no reason during the fsck_finish() stage.

But for the reasons I gave in the fsck_finish() patches (like pack
ordering), we will quite often end up hitting it in the first pass
anyway. So it's probably not worth spending too much time trying to
optimize it.

And thinking on that, my "should we do this for trees" is pretty dumb,
too. We load them and compute their sha1 anyway (I _think_ bitrot checks
like that are totally independent of skiplist). So all we'd be saving is
walking the buffer entries.

> I did think about this, briefly, but decided that we should
> only 'skip' the leaf nodes (blobs). (So, when processing
> commits, trees and tags, we should not report an error for
> that object-id, but that should not stop us from checking
> the tree object associated with a commit, just because of
> a problem with the commit message).
> 
> [Oh, wait - Hmm, each object could be checked independently
> of all others and not used for any object traversal right?
> (e.g. using packfile index). I saw fsck_walk() and didn't
> look any further ... Ah, broken link check, ... I obviously
> need to read the code some more ... :-D ]

Yes, I've been confused by this code before. I'm still not sure I
totally understand it. ;)

> > One thing we could do is turn the parse failure into a noop. The main
> > point of the check is to protect people against the malicious
> > .gitmodules bug. If the file can't be parsed, then it also can't be an
> > attack vector (assuming the parser used for the fsck check and the
> > parser used by the victim behave identically).
> 
> Hmm, yeah, but I would have to provide a means of suppressing
> the config parser error messages. Something I wanted to avoid. :(

Yes, though I don't think it's too bad. We already have a "die_on_error"
flag in the config code. I think it just needs to become a tristate:
die/error/silent (and probably get passed in via config_options, since I
think we tie it right now to the file/blob source).

> Junio, do you want me to address the above 'rejected push'
> issue in this patch, or with a follow-up patch? (It should
> be a pretty rare problem - famous last words!)

Hmm, if we end up doing the config thing above, then this patch would
become unnecessary.

And I think doing that would help _all_ cases, even ones without a
skiplist. They don't need to see random config error messages either,
even if we do eventually report an fsck error.

-Peff


Re: [PATCH] fsck: check skiplist for object in fsck_blob()

2018-06-28 Thread Jeff King
On Thu, Jun 28, 2018 at 09:39:39AM -0700, Junio C Hamano wrote:

> Jeff King  writes:
> 
> > Yeah, this solution seems sensible. Given that we would never report any
> > error for that blob, there is no point in even looking at it. I wonder
> > if we ought to do the same for other types, too. Is there any point in
> > opening a tree that is in the skiplist?
> 
> Suppose the tree is listed there only because it has one entry for a
> subtree with leading 0 in its mode.  We do want to ignore that
> format violation, but we still want to learn the fact that the
> subtree it points at and its contents are connected and not
> dangling, so we do need to open it.  Is that open done in a separate
> phase?

To be honest, I'm not sure. There _is_ a separate phase for checking
reachability, but I think there may be some dependencies between the
phases. Once upon a time they were communicated by the existence of
entries in obj_hash (blech!) but I think these days it happens using a
a bit in object->flags.

There is at least one case of interest just in this phase, though: we
have to read the surrounding tree to find out that a particular blob is
a .gitmodules file. So if you skiplist'd a tree, that would also mean we
fail to mark its .gitmodules (if any) as such. I'm not sure if that
would be a bug or a feature, though.

-Peff


Re: [PATCH] fsck: check skiplist for object in fsck_blob()

2018-06-28 Thread Junio C Hamano
Ramsay Jones  writes:

> Junio, do you want me to address the above 'rejected push'
> issue in this patch, or with a follow-up patch? (It should
> be a pretty rare problem - famous last words!)

If you feel the need to say "famous last words", it is an indication
that it is better done as a follow-up, I would think ;-)

Thanks for spotting and addressing this issue.


Re: [PATCH] fsck: check skiplist for object in fsck_blob()

2018-06-28 Thread Ramsay Jones



On 28/06/18 12:49, Jeff King wrote:
> On Wed, Jun 27, 2018 at 07:39:53PM +0100, Ramsay Jones wrote:
> 
>> Since commit ed8b10f631 ("fsck: check .gitmodules content", 2018-05-02),
>> fsck will issue an error message for '.gitmodules' content that cannot
>> be parsed correctly. This is the case, even when the corresponding blob
>> object has been included on the skiplist. For example, using the cgit
>> repository, we see the following:
>>
>>   $ git fsck
>>   Checking object directories: 100% (256/256), done.
>>   error: bad config line 5 in blob .gitmodules
>>   error in blob 51dd1eff1edc663674df9ab85d2786a40f7ae3a5: gitmodulesParse: 
>> could not parse gitmodules blob
>>   Checking objects: 100% (6626/6626), done.
>>   $
>>
>>   $ git config fsck.skiplist '.git/skip'
>>   $ echo 51dd1eff1edc663674df9ab85d2786a40f7ae3a5 >.git/skip
>>   $
>>
>>   $ git fsck
>>   Checking object directories: 100% (256/256), done.
>>   error: bad config line 5 in blob .gitmodules
>>   Checking objects: 100% (6626/6626), done.
>>   $
>>
>> Note that the error message issued by the config parser is still
>> present, despite adding the object-id of the blob to the skiplist.
>>
>> One solution would be to provide a means of suppressing the messages
>> issued by the config parser. However, given that (logically) we are
>> asking fsck to ignore this object, a simpler approach is to just not
>> call the config parser if the object is to be skipped. Add a check to
>> the 'fsck_blob()' processing function, to determine if the object is
>> on the skiplist and, if so, exit the function early.
> 
> Yeah, this solution seems sensible. Given that we would never report any
> error for that blob, there is no point in even looking at it. I wonder
> if we ought to do the same for other types, too. Is there any point in
> opening a tree that is in the skiplist?

Note that the 'blob' object has already been 'loaded' at this
point anyway (and the basic 'object' checks have been done).

I did think about this, briefly, but decided that we should
only 'skip' the leaf nodes (blobs). (So, when processing
commits, trees and tags, we should not report an error for
that object-id, but that should not stop us from checking
the tree object associated with a commit, just because of
a problem with the commit message).

[Oh, wait - Hmm, each object could be checked independently
of all others and not used for any object traversal right?
(e.g. using packfile index). I saw fsck_walk() and didn't
look any further ... Ah, broken link check, ... I obviously
need to read the code some more ... :-D ]

>> I noticed recently that the 'cgit.git' repo was complaining when
>> doing an 'git fsck' ...
>>
>> Note that 'cgit' had a '.gitmodules' file and a 'submodule.sh'
>> script back in 2007, which had nothing to do with the current
>> git submodule facilities, ... ;-)
> 
> Yikes. I worried about that sort of regression when adding the
> .gitmodules checks. But this _is_ a pretty unique case: somebody was
> implementing their own version of submodules (pre-git-submodule) and
> decided to use the same name. So I'm not sure if this is a sign that we
> need to think through the regression, or a sign that it really is rare.

I don't have any numbers, but my gut tells me that this would
be very rare indeed. Of course, my gut has been wrong before ... :-D

> One thing we could do is turn the parse failure into a noop. The main
> point of the check is to protect people against the malicious
> .gitmodules bug. If the file can't be parsed, then it also can't be an
> attack vector (assuming the parser used for the fsck check and the
> parser used by the victim behave identically).

Hmm, yeah, but I would have to provide a means of suppressing
the config parser error messages. Something I wanted to avoid. :(

> That wouldn't help with your stray message, of course, but it would
> eliminate the need to deal with the skiplist here (and skiplists aren't
> always easy to do -- for instance, pushing up a non-fork of cgit to
> GitHub would now be rejected because of this old file, and you'd have to
> contact support to resolve it).

Good point.

>> I just remembered that I had intended to review the name of the
>> function that this patch introduces before sending, but totally
>> forgot! :(
>>
>> [Hmm, 'to_be_skipped' -> object_to_be_skipped, object_on_skiplist,
>> etc., ... suggestions welcome!]
> 
> The current name is OK to be, but object_on_skiplist() also seems quite
> obvious.

object_on_skiplist() it is!

Junio, do you want me to address the above 'rejected push'
issue in this patch, or with a follow-up patch? (It should
be a pretty rare problem - famous last words!)

ATB,
Ramsay Jones





Re: [PATCH] fsck: check skiplist for object in fsck_blob()

2018-06-28 Thread Junio C Hamano
Jeff King  writes:

> Yeah, this solution seems sensible. Given that we would never report any
> error for that blob, there is no point in even looking at it. I wonder
> if we ought to do the same for other types, too. Is there any point in
> opening a tree that is in the skiplist?

Suppose the tree is listed there only because it has one entry for a
subtree with leading 0 in its mode.  We do want to ignore that
format violation, but we still want to learn the fact that the
subtree it points at and its contents are connected and not
dangling, so we do need to open it.  Is that open done in a separate
phase?


Re: [PATCH] fsck: check skiplist for object in fsck_blob()

2018-06-28 Thread Jeff King
On Wed, Jun 27, 2018 at 07:39:53PM +0100, Ramsay Jones wrote:

> Since commit ed8b10f631 ("fsck: check .gitmodules content", 2018-05-02),
> fsck will issue an error message for '.gitmodules' content that cannot
> be parsed correctly. This is the case, even when the corresponding blob
> object has been included on the skiplist. For example, using the cgit
> repository, we see the following:
> 
>   $ git fsck
>   Checking object directories: 100% (256/256), done.
>   error: bad config line 5 in blob .gitmodules
>   error in blob 51dd1eff1edc663674df9ab85d2786a40f7ae3a5: gitmodulesParse: 
> could not parse gitmodules blob
>   Checking objects: 100% (6626/6626), done.
>   $
> 
>   $ git config fsck.skiplist '.git/skip'
>   $ echo 51dd1eff1edc663674df9ab85d2786a40f7ae3a5 >.git/skip
>   $
> 
>   $ git fsck
>   Checking object directories: 100% (256/256), done.
>   error: bad config line 5 in blob .gitmodules
>   Checking objects: 100% (6626/6626), done.
>   $
> 
> Note that the error message issued by the config parser is still
> present, despite adding the object-id of the blob to the skiplist.
> 
> One solution would be to provide a means of suppressing the messages
> issued by the config parser. However, given that (logically) we are
> asking fsck to ignore this object, a simpler approach is to just not
> call the config parser if the object is to be skipped. Add a check to
> the 'fsck_blob()' processing function, to determine if the object is
> on the skiplist and, if so, exit the function early.

Yeah, this solution seems sensible. Given that we would never report any
error for that blob, there is no point in even looking at it. I wonder
if we ought to do the same for other types, too. Is there any point in
opening a tree that is in the skiplist?

> I noticed recently that the 'cgit.git' repo was complaining when
> doing an 'git fsck' ...
> 
> Note that 'cgit' had a '.gitmodules' file and a 'submodule.sh'
> script back in 2007, which had nothing to do with the current
> git submodule facilities, ... ;-)

Yikes. I worried about that sort of regression when adding the
.gitmodules checks. But this _is_ a pretty unique case: somebody was
implementing their own version of submodules (pre-git-submodule) and
decided to use the same name. So I'm not sure if this is a sign that we
need to think through the regression, or a sign that it really is rare.

One thing we could do is turn the parse failure into a noop. The main
point of the check is to protect people against the malicious
.gitmodules bug. If the file can't be parsed, then it also can't be an
attack vector (assuming the parser used for the fsck check and the
parser used by the victim behave identically).

That wouldn't help with your stray message, of course, but it would
eliminate the need to deal with the skiplist here (and skiplists aren't
always easy to do -- for instance, pushing up a non-fork of cgit to
GitHub would now be rejected because of this old file, and you'd have to
contact support to resolve it).

> I just remembered that I had intended to review the name of the
> function that this patch introduces before sending, but totally
> forgot! :(
> 
> [Hmm, 'to_be_skipped' -> object_to_be_skipped, object_on_skiplist,
> etc., ... suggestions welcome!]

The current name is OK to be, but object_on_skiplist() also seems quite
obvious.

-Peff


[PATCH] fsck: check skiplist for object in fsck_blob()

2018-06-27 Thread Ramsay Jones


Since commit ed8b10f631 ("fsck: check .gitmodules content", 2018-05-02),
fsck will issue an error message for '.gitmodules' content that cannot
be parsed correctly. This is the case, even when the corresponding blob
object has been included on the skiplist. For example, using the cgit
repository, we see the following:

  $ git fsck
  Checking object directories: 100% (256/256), done.
  error: bad config line 5 in blob .gitmodules
  error in blob 51dd1eff1edc663674df9ab85d2786a40f7ae3a5: gitmodulesParse: 
could not parse gitmodules blob
  Checking objects: 100% (6626/6626), done.
  $

  $ git config fsck.skiplist '.git/skip'
  $ echo 51dd1eff1edc663674df9ab85d2786a40f7ae3a5 >.git/skip
  $

  $ git fsck
  Checking object directories: 100% (256/256), done.
  error: bad config line 5 in blob .gitmodules
  Checking objects: 100% (6626/6626), done.
  $

Note that the error message issued by the config parser is still
present, despite adding the object-id of the blob to the skiplist.

One solution would be to provide a means of suppressing the messages
issued by the config parser. However, given that (logically) we are
asking fsck to ignore this object, a simpler approach is to just not
call the config parser if the object is to be skipped. Add a check to
the 'fsck_blob()' processing function, to determine if the object is
on the skiplist and, if so, exit the function early.

Signed-off-by: Ramsay Jones 
---

Hi Junio,

I noticed recently that the 'cgit.git' repo was complaining when
doing an 'git fsck' ...

Note that 'cgit' had a '.gitmodules' file and a 'submodule.sh'
script back in 2007, which had nothing to do with the current
git submodule facilities, ... ;-)

Viz:

  $ git show 51dd1eff1e
  # This file maps a submodule path to an url from where the submodule
  # can be obtained. The script "submodules.sh" finds the url in this file
  # when invoked with -i to clone the submodules.

  git git://git.kernel.org/pub/scm/git/git.git
  $ 

I just remembered that I had intended to review the name of the
function that this patch introduces before sending, but totally
forgot! :(

[Hmm, 'to_be_skipped' -> object_to_be_skipped, object_on_skiplist,
etc., ... suggestions welcome!]

ATB,
Ramsay Jones


 fsck.c | 13 +++--
 1 file changed, 11 insertions(+), 2 deletions(-)

diff --git a/fsck.c b/fsck.c
index 48e7e3686..702ceb629 100644
--- a/fsck.c
+++ b/fsck.c
@@ -281,6 +281,13 @@ static void append_msg_id(struct strbuf *sb, const char 
*msg_id)
strbuf_addstr(sb, ": ");
 }
 
+static int to_be_skipped(struct fsck_options *opts, struct object *obj)
+{
+   if (opts && opts->skiplist && obj)
+   return oid_array_lookup(opts->skiplist, >oid) >= 0;
+   return 0;
+}
+
 __attribute__((format (printf, 4, 5)))
 static int report(struct fsck_options *options, struct object *object,
enum fsck_msg_id id, const char *fmt, ...)
@@ -292,8 +299,7 @@ static int report(struct fsck_options *options, struct 
object *object,
if (msg_type == FSCK_IGNORE)
return 0;
 
-   if (options->skiplist && object &&
-   oid_array_lookup(options->skiplist, >oid) >= 0)
+   if (to_be_skipped(options, object))
return 0;
 
if (msg_type == FSCK_FATAL)
@@ -963,6 +969,9 @@ static int fsck_blob(struct blob *blob, const char *buf,
return 0;
oidset_insert(_done, >object.oid);
 
+   if (to_be_skipped(options, >object))
+   return 0;
+
if (!buf) {
/*
 * A missing buffer here is a sign that the caller found the
-- 
2.18.0