Re: [PATCH v20 43/48] refs.c: move the check for valid refname to lock_ref_sha1_basic

2014-08-22 Thread Michael Haggerty
On 08/20/2014 11:47 PM, Ronnie Sahlberg wrote:
 [...]
 Since we already display broken/unresolvable refs, I think the most
 consistent path is to also allow showing the refs broken/illegal-names
 too in the list. (when DO_FOR_EACH_INCLUDE_BROKEN is specified)
 Of course, an end user could fix this by deleting the file but since
 it is easy to add the special casing to 'git branch -D' to handle this
 case I think this would be more userfriendly since then the user can
 use git branch -D regardless of the reason why the ref is broken.

My concern with this idea is that some code relies on at least some of
the reference name constraints for its proper functioning; for example,

* The ref caching code would likely be confused by ill-formed refnames
like refs/heads//foo or /refs/heads/foo or refs/heads/foo/.  (I
understand that such references cannot exist as loose refs, but they
could be represented in the packed-refs file.)

* Any code that might try to read or write a loose reference would
likely be confused by refs/heads//foo or refs/heads/./foo or
refs/heads/../foo or /refs/heads/foo or refs/heads/foo/.  On
Windows there might also be problems with refs/heads\foo or
d:refs/heads/foo or prn:refs/heads/foo or //refs/heads/foo.

* The locking code could easily be confused by a reference named
refs/heads/foo.lock.

So to the extent that we loosen the checks on refnames when they are
read, we would have to re-vet any code that touches them to make sure
that it doesn't break in a horrible (and possibly security-compromising)
way.  This is why I would prefer to quarantine broken reference names in
the smallest possible part of the code.

I *think* that the biggest problems would be related to reference names
that do not map straightforwardly to relative filenames, so an
alternative would be to do some minimal checks in any case, but make it
possible to turn off the stricter checks (those that mostly exist to
make reference expression parsing possible) when necessary.

Michael

-- 
Michael Haggerty
mhag...@alum.mit.edu

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v20 43/48] refs.c: move the check for valid refname to lock_ref_sha1_basic

2014-08-21 Thread Ronnie Sahlberg
On Wed, Aug 20, 2014 at 11:34 AM, Michael Haggerty mhag...@alum.mit.edu wrote:
 On 08/20/2014 06:28 PM, Ronnie Sahlberg wrote:
 On Wed, Aug 20, 2014 at 7:52 AM, Michael Haggerty mhag...@alum.mit.edu 
 wrote:
 I'm a little worried that abandoning *all* refname checks could open us
 up to somehow trying to delete a reference with a name like
 ../../../../etc/passwd.  Either such names have to be prohibited
 somehow, or we have to be very sure that they can only come from trusted
 sources.

 I only set this flag from builtin/branch.c so it should only be used
 when a user runs 'git branch -D' from the command line.
 All other places where we delete branches we should still be checking
 the rename for badness.

 That said, unless the rules for good refname changes in the future,
 which is unlikely, is should be exceptionally rare that a user ends up
 with a bad refname in the first place.
 Perhaps my example I gave was bad since if you manually create bad
 refs using echo  .git/refs/heads/...  then you should probably know
 how to fix it too and thus maybe we do not need this patch in the
 first place.

 Do you want me to delete this patch and resend this part of the series
 ? Or is the 'only works for branch -D from the commandline' sufficient
 ?
 I have no strong feelings either way so I will just follow what you decide.

 I think that if you run the refname through normalize_path_copy_len()
 and that function returns (1) without an error, (2) without modifying
 its argument, and (3) the result does not begin with a
 has_dos_drive_prefix() or is_dir_sep(), then we should be safe against
 directory traversal attacks.  I suggest doing this kind of check even if
 not doing the full check_refname_format() check.

Good idea.
Let me add this.



 Michael

 --
 Michael Haggerty
 mhag...@alum.mit.edu

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v20 43/48] refs.c: move the check for valid refname to lock_ref_sha1_basic

2014-08-20 Thread Michael Haggerty
On 07/15/2014 10:58 PM, Ronnie Sahlberg wrote:
 On Tue, Jul 15, 2014 at 12:34 PM, Ronnie Sahlberg sahlb...@google.com wrote:
 On Tue, Jul 15, 2014 at 11:04 AM, Jonathan Nieder jrnie...@gmail.com wrote:
 Michael Haggerty wrote:

 So...I like the idea of enforcing refname checks at the lowest level
 possible, but I think that the change you propose is too abrupt.  I
 think it needs either more careful analysis showing that it won't hurt
 anybody, or some kind of tooling or non-strict mode that people can use
 to fix their repositories.

 The recovery code has been broken for a while, so I don't see harm
 in this change.

 How to take care of the recovery use case is another question.  FWIW I
 also would prefer if git update-ref -d or git branch -D could be
 used to delete corrupt refs instead of having to use fsck (since a
 fsck run can take a while), but that's a question for a later series.

 In an ideal world, the low-level functions would allow *reading* and
 *deleting* poorly named refs (even without any special flag) but not
 creating them.  Is that doable?

 That should be doable. Ill add these repairs as 3-4 new patches at the
 end of the current patch series.

 The main complication I can see is
 iteration: would iteration skip poorly named refs and warn, or would
 something more complicated be needed?

I think we can get away with not including broken refnames when
iterating.  After all, the main goal of tolerating them is to let them
be deleted, right?  Then as long as iteration is not needed to implement
the command git update-ref -d, it seems to me that it is OK if
iterating over a reference with a broken name causes a die().

 Right now,  git branch  will error and abort right away when it
 finds the first bad ref. I haven't checked the exact spot it does this
 yet but I suspect it is when building the ref-cache.
 I will solve the cases for create and delete for now.


 What/how to handle iterables will require more thought.
 Right now, since these refs will be filtered out and never end up in
 ref-cache, either from loose refs or from packed refs
 it does mean that anyone that uses an iterator is guaranteed to only
 get refs with valid names passed back to them.
 We would need to audit all code that uses iterators and make sure it
 can handle the case where the callback is suddenly
 invoked with a bad refname.


 Thanks,
 Jonathan
 
 The following seems to do the trick to allow deleting a bad ref. We
 would need something for the iterator too.
 Since this touches the same areas that my ~100 other ref transaction
 patches that are queued up do, I
 would like to wait applying this patch until once the next few series
 are finished and merged.
 (to avoid having to do a lot of rebases and fix legio of merge
 conflicts that this would introduce in my branches).
 
 
 Treat this as an approximation on what the fix to repair git branch -D
 will look like once the time comes.

I'm a little worried that abandoning *all* refname checks could open us
up to somehow trying to delete a reference with a name like
../../../../etc/passwd.  Either such names have to be prohibited
somehow, or we have to be very sure that they can only come from trusted
sources.

Michael

-- 
Michael Haggerty
mhag...@alum.mit.edu

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v20 43/48] refs.c: move the check for valid refname to lock_ref_sha1_basic

2014-08-20 Thread Ronnie Sahlberg
On Wed, Aug 20, 2014 at 7:52 AM, Michael Haggerty mhag...@alum.mit.edu wrote:
 On 07/15/2014 10:58 PM, Ronnie Sahlberg wrote:
 On Tue, Jul 15, 2014 at 12:34 PM, Ronnie Sahlberg sahlb...@google.com 
 wrote:
 On Tue, Jul 15, 2014 at 11:04 AM, Jonathan Nieder jrnie...@gmail.com 
 wrote:
 Michael Haggerty wrote:

 So...I like the idea of enforcing refname checks at the lowest level
 possible, but I think that the change you propose is too abrupt.  I
 think it needs either more careful analysis showing that it won't hurt
 anybody, or some kind of tooling or non-strict mode that people can use
 to fix their repositories.

 The recovery code has been broken for a while, so I don't see harm
 in this change.

 How to take care of the recovery use case is another question.  FWIW I
 also would prefer if git update-ref -d or git branch -D could be
 used to delete corrupt refs instead of having to use fsck (since a
 fsck run can take a while), but that's a question for a later series.

 In an ideal world, the low-level functions would allow *reading* and
 *deleting* poorly named refs (even without any special flag) but not
 creating them.  Is that doable?

 That should be doable. Ill add these repairs as 3-4 new patches at the
 end of the current patch series.

 The main complication I can see is
 iteration: would iteration skip poorly named refs and warn, or would
 something more complicated be needed?

 I think we can get away with not including broken refnames when
 iterating.  After all, the main goal of tolerating them is to let them
 be deleted, right?  Then as long as iteration is not needed to implement
 the command git update-ref -d, it seems to me that it is OK if
 iterating over a reference with a broken name causes a die().

 Right now,  git branch  will error and abort right away when it
 finds the first bad ref. I haven't checked the exact spot it does this
 yet but I suspect it is when building the ref-cache.
 I will solve the cases for create and delete for now.


 What/how to handle iterables will require more thought.
 Right now, since these refs will be filtered out and never end up in
 ref-cache, either from loose refs or from packed refs
 it does mean that anyone that uses an iterator is guaranteed to only
 get refs with valid names passed back to them.
 We would need to audit all code that uses iterators and make sure it
 can handle the case where the callback is suddenly
 invoked with a bad refname.


 Thanks,
 Jonathan

 The following seems to do the trick to allow deleting a bad ref. We
 would need something for the iterator too.
 Since this touches the same areas that my ~100 other ref transaction
 patches that are queued up do, I
 would like to wait applying this patch until once the next few series
 are finished and merged.
 (to avoid having to do a lot of rebases and fix legio of merge
 conflicts that this would introduce in my branches).


 Treat this as an approximation on what the fix to repair git branch -D
 will look like once the time comes.

 I'm a little worried that abandoning *all* refname checks could open us
 up to somehow trying to delete a reference with a name like
 ../../../../etc/passwd.  Either such names have to be prohibited
 somehow, or we have to be very sure that they can only come from trusted
 sources.


I only set this flag from builtin/branch.c so it should only be used
when a user runs 'git branch -D' from the command line.
All other places where we delete branches we should still be checking
the rename for badness.

That said, unless the rules for good refname changes in the future,
which is unlikely, is should be exceptionally rare that a user ends up
with a bad refname in the first place.
Perhaps my example I gave was bad since if you manually create bad
refs using echo  .git/refs/heads/...  then you should probably know
how to fix it too and thus maybe we do not need this patch in the
first place.

Do you want me to delete this patch and resend this part of the series
? Or is the 'only works for branch -D from the commandline' sufficient
?
I have no strong feelings either way so I will just follow what you decide.


regards
ronnie sahlberg
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v20 43/48] refs.c: move the check for valid refname to lock_ref_sha1_basic

2014-08-20 Thread Jonathan Nieder
Hi,

Ronnie Sahlberg wrote:
 On Wed, Aug 20, 2014 at 7:52 AM, Michael Haggerty mhag...@alum.mit.edu 
 wrote:

 I'm a little worried that abandoning *all* refname checks could open us
 up to somehow trying to delete a reference with a name like
 ../../../../etc/passwd.  Either such names have to be prohibited
 somehow, or we have to be very sure that they can only come from trusted
 sources.

 I only set this flag from builtin/branch.c so it should only be used
 when a user runs 'git branch -D' from the command line.
 All other places where we delete branches we should still be checking
 the rename for badness.

Right, this should be safe for 'git branch -D' and 'git update-ref -d'.

But if we wanted to open it up in the future for 'git push --delete',
too, then it would be a way to break out of the repository on hosts where
people use git-shell instead of relying on filesystem permissions.  And
that wouldn't be good.

I think elsewhere git has some checks for does this pathname fall in
this directory.  Could that be reused here, too, to make sure the
resolved path is under the resolved $GIT_DIR/refs directory?

Alternatively, when a ref being deleted doesn't meet the
'check-ref-format' checks, would it make sense to check that it is one
of the refs you can get by iteration?

Jonathan
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v20 43/48] refs.c: move the check for valid refname to lock_ref_sha1_basic

2014-08-20 Thread Ronnie Sahlberg
On Wed, Aug 20, 2014 at 10:49 AM, Jonathan Nieder jrnie...@gmail.com wrote:
 Hi,

 Ronnie Sahlberg wrote:
 On Wed, Aug 20, 2014 at 7:52 AM, Michael Haggerty mhag...@alum.mit.edu 
 wrote:

 I'm a little worried that abandoning *all* refname checks could open us
 up to somehow trying to delete a reference with a name like
 ../../../../etc/passwd.  Either such names have to be prohibited
 somehow, or we have to be very sure that they can only come from trusted
 sources.

 I only set this flag from builtin/branch.c so it should only be used
 when a user runs 'git branch -D' from the command line.
 All other places where we delete branches we should still be checking
 the rename for badness.

 Right, this should be safe for 'git branch -D' and 'git update-ref -d'.

 But if we wanted to open it up in the future for 'git push --delete',
 too, then it would be a way to break out of the repository on hosts where
 people use git-shell instead of relying on filesystem permissions.  And
 that wouldn't be good.

 I think elsewhere git has some checks for does this pathname fall in
 this directory.  Could that be reused here, too, to make sure the
 resolved path is under the resolved $GIT_DIR/refs directory?

 Alternatively, when a ref being deleted doesn't meet the
 'check-ref-format' checks, would it make sense to check that it is one
 of the refs you can get by iteration?

Good idea! I will add such a check using the iterator.
That means that we can git branch -D anything that shows up in the
iterator regardless if the ref is badly named or unresolvable which
sounds like fairly sane semantics.

Thanks!

Ronnie
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v20 43/48] refs.c: move the check for valid refname to lock_ref_sha1_basic

2014-08-20 Thread Michael Haggerty
On 08/20/2014 06:28 PM, Ronnie Sahlberg wrote:
 On Wed, Aug 20, 2014 at 7:52 AM, Michael Haggerty mhag...@alum.mit.edu 
 wrote:
 I'm a little worried that abandoning *all* refname checks could open us
 up to somehow trying to delete a reference with a name like
 ../../../../etc/passwd.  Either such names have to be prohibited
 somehow, or we have to be very sure that they can only come from trusted
 sources.
 
 I only set this flag from builtin/branch.c so it should only be used
 when a user runs 'git branch -D' from the command line.
 All other places where we delete branches we should still be checking
 the rename for badness.
 
 That said, unless the rules for good refname changes in the future,
 which is unlikely, is should be exceptionally rare that a user ends up
 with a bad refname in the first place.
 Perhaps my example I gave was bad since if you manually create bad
 refs using echo  .git/refs/heads/...  then you should probably know
 how to fix it too and thus maybe we do not need this patch in the
 first place.
 
 Do you want me to delete this patch and resend this part of the series
 ? Or is the 'only works for branch -D from the commandline' sufficient
 ?
 I have no strong feelings either way so I will just follow what you decide.

I think that if you run the refname through normalize_path_copy_len()
and that function returns (1) without an error, (2) without modifying
its argument, and (3) the result does not begin with a
has_dos_drive_prefix() or is_dir_sep(), then we should be safe against
directory traversal attacks.  I suggest doing this kind of check even if
not doing the full check_refname_format() check.

Michael

-- 
Michael Haggerty
mhag...@alum.mit.edu

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v20 43/48] refs.c: move the check for valid refname to lock_ref_sha1_basic

2014-08-20 Thread Junio C Hamano
Michael Haggerty mhag...@alum.mit.edu writes:

 I think we can get away with not including broken refnames when
 iterating.  After all, the main goal of tolerating them is to let them
 be deleted, right?

Or read from a ref whose name has retroactively made invalid, in
order to create a similar but validly named ref to serve as its
replacement?  So at least we need to give the users some way of
reading from them, in addition to deleting them, no?
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v20 43/48] refs.c: move the check for valid refname to lock_ref_sha1_basic

2014-08-20 Thread Michael Haggerty
On 08/20/2014 09:45 PM, Junio C Hamano wrote:
 Michael Haggerty mhag...@alum.mit.edu writes:
 
 I think we can get away with not including broken refnames when
 iterating.  After all, the main goal of tolerating them is to let them
 be deleted, right?
 
 Or read from a ref whose name has retroactively made invalid, in
 order to create a similar but validly named ref to serve as its
 replacement?  So at least we need to give the users some way of
 reading from them, in addition to deleting them, no?

The die() error message would presumably include the name of the invalid
reference.

If we change the rules for reference names and a bunch of names become
invalid, then it would admittedly be cumbersome to run git N times to
find the N invalid names.  But if we change the rules, then *at that
time* we could always build in iteration over broken reference names.

It's not that I have something against building it iteration over broken
reference names now, as long as it is clearly segregated from normal
iteration to prevent illegal names from getting loose in the code.

Michael

-- 
Michael Haggerty
mhag...@alum.mit.edu

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v20 43/48] refs.c: move the check for valid refname to lock_ref_sha1_basic

2014-08-20 Thread Junio C Hamano
Michael Haggerty mhag...@alum.mit.edu writes:

 On 08/20/2014 09:45 PM, Junio C Hamano wrote:
 Michael Haggerty mhag...@alum.mit.edu writes:
 
 I think we can get away with not including broken refnames when
 iterating.  After all, the main goal of tolerating them is to let them
 be deleted, right?
 
 Or read from a ref whose name has retroactively made invalid, in
 order to create a similar but validly named ref to serve as its
 replacement?  So at least we need to give the users some way of
 reading from them, in addition to deleting them, no?

 The die() error message would presumably include the name of the invalid
 reference.

 If we change the rules for reference names and a bunch of names become
 invalid, then it would admittedly be cumbersome to run git N times to
 find the N invalid names.  But if we change the rules, then *at that
 time* we could always build in iteration over broken reference names.

 It's not that I have something against building it iteration over broken
 reference names now, as long as it is clearly segregated from normal
 iteration to prevent illegal names from getting loose in the code.

Oh, I don't think it is that important for iterator to show broken
ones.  If refs/heads/foo/$brokenname exists but is skipped from any
and all iterations, nobody will get hurt until the end user wonders
where foo/$brokenname went, at which time rev-parse can be used to
grab the value from it before update-ref -d can be used to remove it.

If refs/heads/foo/$brokenname, which is skipped from iterations,
prevents a valid ref refs/heads/foo from getting created, that would
give a bit of surprise to the user who long forgot foo/$brokenname
existed, but the recovery procedure is exactly the same as the case
where he has a branch foo/$koshername and wants to create foo; first
'branch -D' the D/F-conflicting one and then create foo.

So the primary things I care about are when the user has a string
that is the name of an existing misnamed ref, the value of the ref
can be obtained, and the ref can be removed.

Thanks.

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v20 43/48] refs.c: move the check for valid refname to lock_ref_sha1_basic

2014-08-20 Thread Ronnie Sahlberg
On Wed, Aug 20, 2014 at 1:11 PM, Michael Haggerty mhag...@alum.mit.edu wrote:
 On 08/20/2014 09:45 PM, Junio C Hamano wrote:
 Michael Haggerty mhag...@alum.mit.edu writes:

 I think we can get away with not including broken refnames when
 iterating.  After all, the main goal of tolerating them is to let them
 be deleted, right?

 Or read from a ref whose name has retroactively made invalid, in
 order to create a similar but validly named ref to serve as its
 replacement?  So at least we need to give the users some way of
 reading from them, in addition to deleting them, no?

 The die() error message would presumably include the name of the invalid
 reference.

 If we change the rules for reference names and a bunch of names become
 invalid, then it would admittedly be cumbersome to run git N times to
 find the N invalid names.  But if we change the rules, then *at that
 time* we could always build in iteration over broken reference names.

 It's not that I have something against building it iteration over broken
 reference names now, as long as it is clearly segregated from normal
 iteration to prevent illegal names from getting loose in the code.

We already have iterators that show also bad refs.
For example, git branch uses
DO_FOR_EACH_INCLUDE_BROKEN
which is a flag to include also broken refs that can not be resolved
which allows them to be listed :



$ echo this is not a valid sha1 .git/refs/heads/broken
$ git branch
  broken
  foo
* master
$ git branch -D broken
error: branch 'broken' not found.

(allowing -D to remove it is in a different patch further down my series)


Since we already display broken/unresolvable refs, I think the most
consistent path is to also allow showing the refs broken/illegal-names
too in the list. (when DO_FOR_EACH_INCLUDE_BROKEN is specified)
Of course, an end user could fix this by deleting the file but since
it is easy to add the special casing to 'git branch -D' to handle this
case I think this would be more userfriendly since then the user can
use git branch -D regardless of the reason why the ref is broken.


Ronnie
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v20 43/48] refs.c: move the check for valid refname to lock_ref_sha1_basic

2014-07-15 Thread Ronnie Sahlberg
On Tue, Jul 8, 2014 at 8:02 AM, Michael Haggerty mhag...@alum.mit.edu wrote:
 On 06/20/2014 04:43 PM, Ronnie Sahlberg wrote:
 Move the check for check_refname_format from lock_any_ref_for_update
 to lock_ref_sha1_basic. At some later stage we will get rid of
 lock_any_ref_for_update completely.

 If lock_ref_sha1_basic fails the check_refname_format test, set errno to
 EINVAL before returning NULL. This to guarantee that we will not return an
 error without updating errno.

 This leaves lock_any_ref_for_updates as a no-op wrapper which could be 
 removed.
 But this wrapper is also called from an external caller and we will soon
 make changes to the signature to lock_ref_sha1_basic that we do not want to
 expose to that caller.

 This changes semantics for lock_ref_sha1_basic slightly. With this change
 it is no longer possible to open a ref that has a badly name which breaks

 s/badly name/bad name,/

 any codepaths that tries to open and repair badly named refs. The normal refs

 s/tries/try/

 API should not allow neither creating nor accessing refs with invalid names.

 s/not allow neither/allow neither/

 If we need such recovery code we could add it as an option to git fsck and 
 have
 git fsck be the only sanctioned way of bypassing the normal API and checks.

 I like the sentiment, but in the real world I'm not sure we can take
 such a step based only on good intentions.  Which callers would be
 affected?  Where is this git fsck code that would be needed to help
 people rescue their repos?


I think the repair things are already busted since a while, so I don't
think this will make things worse,
but I will change it to make it better than this patch and better than
current master,  please see below.

current git
$ cp .git/refs/heads/master .git/refs/heads/master.@\*@\\.
$ git branch
   fatal: Reference has invalid format: 'refs/heads/master.@*@\.'
$ git branch -D master.@\*@\\.
  error: branch 'master.@*@\.' not found.

git fsck does report that there is a problem with a broken ref
  fatal: Reference has invalid format: 'refs/heads/master.@*@\.'

But I don't think there is any way to fix this other than manually
deleting the file from the command line.
(this is because we need to do a resolve_ref_unsafe which will not
work and if we can not do a resolve_ref_unsafe we can not delete the
ref,
 there is also the issue where we can not read the ref into ref-cache)


What I suggest doing here is to create a new patch towards the end of
the series that will :
* change the resolve_ref_unsafe(... , int reading, ...) argument to be
a bitmask of flags with
#define RESOLVE_REF_READING 0x01  (== current flag)
#define RESOLVE_REF_ALLOW_BAD_NAME 0x02  (== disable checking the
refname format during resolve)
* then provide the flag for RESOLVE_REF_ALLOW_BAD_NAME for the cases
where we try to resolve a ref in builtin/branch.c where we try to
delete a ref

* then also pass the same flag to lock_ref_sha1_basic when we are
deleting a ref from transaction_commit so we can disable the check
refname check there too.

I think that is all that is needed but I will see if there are
additional changes needed once I implement it.



This will mean that the semantics will become :
* you can not create or access a branch with a bad name since both
resolving it and locking it will fail.
* but you can always delete a branch regardless of whether the name is
good or bad.
(this will also mean that you will be able to rename a bad branch name
to a new good name)

which I think would be pretty sane semantics.


I will implement these changes as a separate patch.

Comments?


regards
ronnie sahlberg



 I can also imagine that we will tighten up the check_refname_format
 checks in the future; for example, I think it would be a good idea to
 prohibit reference names that start with '-' because it is almost
 impossible to work with them (their names look like command-line
 options).  If we ever make a change like that, we will need some amount
 of tolerance in git versions around the transition.

 So...I like the idea of enforcing refname checks at the lowest level
 possible, but I think that the change you propose is too abrupt.  I
 think it needs either more careful analysis showing that it won't hurt
 anybody, or some kind of tooling or non-strict mode that people can use
 to fix their repositories.

 Michael

 Signed-off-by: Ronnie Sahlberg sahlb...@google.com
 ---
  refs.c | 7 +--
  1 file changed, 5 insertions(+), 2 deletions(-)

 diff --git a/refs.c b/refs.c
 index 389a55f..bccf8c3 100644
 --- a/refs.c
 +++ b/refs.c
 @@ -2088,6 +2088,11 @@ static struct ref_lock *lock_ref_sha1_basic(const 
 char *refname,
   int missing = 0;
   int attempts_remaining = 3;

 + if (check_refname_format(refname, REFNAME_ALLOW_ONELEVEL)) {
 + errno = EINVAL;
 + return NULL;
 + }
 +
   lock = xcalloc(1, sizeof(struct ref_lock));
   lock-lock_fd = -1;

 @@ -2179,8 +2184,6 

Re: [PATCH v20 43/48] refs.c: move the check for valid refname to lock_ref_sha1_basic

2014-07-15 Thread Jonathan Nieder
Michael Haggerty wrote:

 So...I like the idea of enforcing refname checks at the lowest level
 possible, but I think that the change you propose is too abrupt.  I
 think it needs either more careful analysis showing that it won't hurt
 anybody, or some kind of tooling or non-strict mode that people can use
 to fix their repositories.

The recovery code has been broken for a while, so I don't see harm
in this change.

How to take care of the recovery use case is another question.  FWIW I
also would prefer if git update-ref -d or git branch -D could be
used to delete corrupt refs instead of having to use fsck (since a
fsck run can take a while), but that's a question for a later series.

In an ideal world, the low-level functions would allow *reading* and
*deleting* poorly named refs (even without any special flag) but not
creating them.  Is that doable?  The main complication I can see is
iteration: would iteration skip poorly named refs and warn, or would
something more complicated be needed?

Thanks,
Jonathan
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v20 43/48] refs.c: move the check for valid refname to lock_ref_sha1_basic

2014-07-15 Thread Jonathan Nieder
Ronnie Sahlberg wrote:

 What I suggest doing here is to create a new patch towards the end of
 the series that will :
 * change the resolve_ref_unsafe(... , int reading, ...) argument to be
 a bitmask of flags with
 #define RESOLVE_REF_READING 0x01  (== current flag)
 #define RESOLVE_REF_ALLOW_BAD_NAME 0x02  (== disable checking the
 refname format during resolve)
 * then provide the flag for RESOLVE_REF_ALLOW_BAD_NAME for the cases
 where we try to resolve a ref in builtin/branch.c where we try to
 delete a ref

 * then also pass the same flag to lock_ref_sha1_basic when we are
 deleting a ref from transaction_commit so we can disable the check
 refname check there too.

Yeah, that sounds lovely.

I wasn't able to reproduce a regression or convince myself there is
one so I think it's okay if that happens in a separate series.  But
doing it now would be fine, too.

Thanks,
Jonathan
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v20 43/48] refs.c: move the check for valid refname to lock_ref_sha1_basic

2014-07-15 Thread Junio C Hamano
Jonathan Nieder jrnie...@gmail.com writes:

 How to take care of the recovery use case is another question.  FWIW I
 also would prefer if git update-ref -d or git branch -D could be
 used to delete corrupt refs instead of having to use fsck (since a
 fsck run can take a while), but that's a question for a later series.

Good thinking.

 In an ideal world, the low-level functions would allow *reading* and
 *deleting* poorly named refs (even without any special flag) but not
 creating them.  Is that doable?  The main complication I can see is
 iteration: would iteration skip poorly named refs and warn, or would
 something more complicated be needed?

I somehow thought that was what we have always designed for, which
DO_FOR_EACH_INCLUDE_BROKEN was a part of.
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v20 43/48] refs.c: move the check for valid refname to lock_ref_sha1_basic

2014-07-15 Thread Ronnie Sahlberg
On Tue, Jul 15, 2014 at 11:04 AM, Jonathan Nieder jrnie...@gmail.com wrote:
 Michael Haggerty wrote:

 So...I like the idea of enforcing refname checks at the lowest level
 possible, but I think that the change you propose is too abrupt.  I
 think it needs either more careful analysis showing that it won't hurt
 anybody, or some kind of tooling or non-strict mode that people can use
 to fix their repositories.

 The recovery code has been broken for a while, so I don't see harm
 in this change.

 How to take care of the recovery use case is another question.  FWIW I
 also would prefer if git update-ref -d or git branch -D could be
 used to delete corrupt refs instead of having to use fsck (since a
 fsck run can take a while), but that's a question for a later series.

 In an ideal world, the low-level functions would allow *reading* and
 *deleting* poorly named refs (even without any special flag) but not
 creating them.  Is that doable?

That should be doable. Ill add these repairs as 3-4 new patches at the
end of the current patch series.

 The main complication I can see is
 iteration: would iteration skip poorly named refs and warn, or would
 something more complicated be needed?

Right now,  git branch  will error and abort right away when it
finds the first bad ref. I haven't checked the exact spot it does this
yet but I suspect it is when building the ref-cache.
I will solve the cases for create and delete for now.


What/how to handle iterables will require more thought.
Right now, since these refs will be filtered out and never end up in
ref-cache, either from loose refs or from packed refs
it does mean that anyone that uses an iterator is guaranteed to only
get refs with valid names passed back to them.
We would need to audit all code that uses iterators and make sure it
can handle the case where the callback is suddenly
invoked with a bad refname.


 Thanks,
 Jonathan
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v20 43/48] refs.c: move the check for valid refname to lock_ref_sha1_basic

2014-07-15 Thread Ronnie Sahlberg
On Tue, Jul 15, 2014 at 11:34 AM, Junio C Hamano gits...@pobox.com wrote:
 Jonathan Nieder jrnie...@gmail.com writes:

 How to take care of the recovery use case is another question.  FWIW I
 also would prefer if git update-ref -d or git branch -D could be
 used to delete corrupt refs instead of having to use fsck (since a
 fsck run can take a while), but that's a question for a later series.

 Good thinking.

 In an ideal world, the low-level functions would allow *reading* and
 *deleting* poorly named refs (even without any special flag) but not
 creating them.  Is that doable?  The main complication I can see is
 iteration: would iteration skip poorly named refs and warn, or would
 something more complicated be needed?

 I somehow thought that was what we have always designed for, which
 DO_FOR_EACH_INCLUDE_BROKEN was a part of.

I think that include broken only handles the case where the ref itself
is bad, not when the refname is bad.
I.e. it affects cases where the sha1 does not exist or the symref
points to nowhere.
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v20 43/48] refs.c: move the check for valid refname to lock_ref_sha1_basic

2014-07-15 Thread Ronnie Sahlberg
On Tue, Jul 15, 2014 at 12:34 PM, Ronnie Sahlberg sahlb...@google.com wrote:
 On Tue, Jul 15, 2014 at 11:04 AM, Jonathan Nieder jrnie...@gmail.com wrote:
 Michael Haggerty wrote:

 So...I like the idea of enforcing refname checks at the lowest level
 possible, but I think that the change you propose is too abrupt.  I
 think it needs either more careful analysis showing that it won't hurt
 anybody, or some kind of tooling or non-strict mode that people can use
 to fix their repositories.

 The recovery code has been broken for a while, so I don't see harm
 in this change.

 How to take care of the recovery use case is another question.  FWIW I
 also would prefer if git update-ref -d or git branch -D could be
 used to delete corrupt refs instead of having to use fsck (since a
 fsck run can take a while), but that's a question for a later series.

 In an ideal world, the low-level functions would allow *reading* and
 *deleting* poorly named refs (even without any special flag) but not
 creating them.  Is that doable?

 That should be doable. Ill add these repairs as 3-4 new patches at the
 end of the current patch series.

 The main complication I can see is
 iteration: would iteration skip poorly named refs and warn, or would
 something more complicated be needed?

 Right now,  git branch  will error and abort right away when it
 finds the first bad ref. I haven't checked the exact spot it does this
 yet but I suspect it is when building the ref-cache.
 I will solve the cases for create and delete for now.


 What/how to handle iterables will require more thought.
 Right now, since these refs will be filtered out and never end up in
 ref-cache, either from loose refs or from packed refs
 it does mean that anyone that uses an iterator is guaranteed to only
 get refs with valid names passed back to them.
 We would need to audit all code that uses iterators and make sure it
 can handle the case where the callback is suddenly
 invoked with a bad refname.


 Thanks,
 Jonathan

The following seems to do the trick to allow deleting a bad ref. We
would need something for the iterator too.
Since this touches the same areas that my ~100 other ref transaction
patches that are queued up do, I
would like to wait applying this patch until once the next few series
are finished and merged.
(to avoid having to do a lot of rebases and fix legio of merge
conflicts that this would introduce in my branches).


Treat this as an approximation on what the fix to repair git branch -D
will look like once the time comes.

regards
ronnie sahlberg



===

commit a2514213999a192c9995a3a5e1479d7d09e2c083
Author: Ronnie Sahlberg sahlb...@google.com
Date:   Tue Jul 15 12:59:36 2014 -0700

refs.c: fix handling of badly named refs

We currently do not handle badly named refs well :
$ cp .git/refs/heads/master .git/refs/heads/master.@\*@\\.
$ git branch
   fatal: Reference has invalid format: 'refs/heads/master.@*@\.'
$ git branch -D master.@\*@\\.
  error: branch 'master.@*@\.' not found.

But we can not really recover from a badly named ref with less than
manually deleting the .git/refs/heads/refname file.

Change resolve_ref_unsafe to take a flags field instead of a 'reading'
boolean and update all callers that used a non-zero value for reading
to pass the flag RESOLVE_REF_READING instead.
Add another flag RESOLVE_REF_ALLOW_BAD_NAME that will make
resolve_ref_unsafe skip checking the refname for sanity and use this
from branch.c so that we will be able to call resolve_ref_unsafe on such
refs when trying to delete it.
Add checks for refname sanity when updating (not deleting) a ref in
ref_transaction_update and in ref_transaction_create to make the transaction
fail if an attempt is made to create/update a badly named ref.
Since all ref changes will later go through the transaction layer this means
we no longer need to check for and fail for bad refnames in
lock_ref_sha1_basic.

Change lock_ref_sha1_basic to not fail for bad refnames. Just check the
refname, and print an error, and remember that the refname is bad so that
we can skip calling verify_lock().

Signed-off-by: Ronnie Sahlberg sahlb...@google.com

diff --git a/builtin/blame.c b/builtin/blame.c
index 662e3fe..76340e2 100644
--- a/builtin/blame.c
+++ b/builtin/blame.c
@@ -2278,7 +2278,7 @@ static struct commit
*fake_working_tree_commit(struct diff_options *opt,
  commit-object.type = OBJ_COMMIT;
  parent_tail = commit-parents;

- if (!resolve_ref_unsafe(HEAD, head_sha1, 1, NULL))
+ if (!resolve_ref_unsafe(HEAD, head_sha1, RESOLVE_REF_READING, NULL))
  die(no such ref: HEAD);

  parent_tail = append_parent(parent_tail, head_sha1);
diff --git a/builtin/branch.c b/builtin/branch.c
index 652b1d2..5c95656 100644
--- a/builtin/branch.c
+++ b/builtin/branch.c
@@ -129,7 +129,8 @@ static int branch_merged(int kind, const char *name,
 branch-merge[0] 
 

Re: [PATCH v20 43/48] refs.c: move the check for valid refname to lock_ref_sha1_basic

2014-07-08 Thread Michael Haggerty
On 06/20/2014 04:43 PM, Ronnie Sahlberg wrote:
 Move the check for check_refname_format from lock_any_ref_for_update
 to lock_ref_sha1_basic. At some later stage we will get rid of
 lock_any_ref_for_update completely.
 
 If lock_ref_sha1_basic fails the check_refname_format test, set errno to
 EINVAL before returning NULL. This to guarantee that we will not return an
 error without updating errno.
 
 This leaves lock_any_ref_for_updates as a no-op wrapper which could be 
 removed.
 But this wrapper is also called from an external caller and we will soon
 make changes to the signature to lock_ref_sha1_basic that we do not want to
 expose to that caller.
 
 This changes semantics for lock_ref_sha1_basic slightly. With this change
 it is no longer possible to open a ref that has a badly name which breaks

s/badly name/bad name,/

 any codepaths that tries to open and repair badly named refs. The normal refs

s/tries/try/

 API should not allow neither creating nor accessing refs with invalid names.

s/not allow neither/allow neither/

 If we need such recovery code we could add it as an option to git fsck and 
 have
 git fsck be the only sanctioned way of bypassing the normal API and checks.

I like the sentiment, but in the real world I'm not sure we can take
such a step based only on good intentions.  Which callers would be
affected?  Where is this git fsck code that would be needed to help
people rescue their repos?

I can also imagine that we will tighten up the check_refname_format
checks in the future; for example, I think it would be a good idea to
prohibit reference names that start with '-' because it is almost
impossible to work with them (their names look like command-line
options).  If we ever make a change like that, we will need some amount
of tolerance in git versions around the transition.

So...I like the idea of enforcing refname checks at the lowest level
possible, but I think that the change you propose is too abrupt.  I
think it needs either more careful analysis showing that it won't hurt
anybody, or some kind of tooling or non-strict mode that people can use
to fix their repositories.

Michael

 Signed-off-by: Ronnie Sahlberg sahlb...@google.com
 ---
  refs.c | 7 +--
  1 file changed, 5 insertions(+), 2 deletions(-)
 
 diff --git a/refs.c b/refs.c
 index 389a55f..bccf8c3 100644
 --- a/refs.c
 +++ b/refs.c
 @@ -2088,6 +2088,11 @@ static struct ref_lock *lock_ref_sha1_basic(const char 
 *refname,
   int missing = 0;
   int attempts_remaining = 3;
  
 + if (check_refname_format(refname, REFNAME_ALLOW_ONELEVEL)) {
 + errno = EINVAL;
 + return NULL;
 + }
 +
   lock = xcalloc(1, sizeof(struct ref_lock));
   lock-lock_fd = -1;
  
 @@ -2179,8 +2184,6 @@ struct ref_lock *lock_any_ref_for_update(const char 
 *refname,
const unsigned char *old_sha1,
int flags, int *type_p)
  {
 - if (check_refname_format(refname, REFNAME_ALLOW_ONELEVEL))
 - return NULL;
   return lock_ref_sha1_basic(refname, old_sha1, flags, type_p);
  }
  
 


-- 
Michael Haggerty
mhag...@alum.mit.edu

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v20 43/48] refs.c: move the check for valid refname to lock_ref_sha1_basic

2014-06-20 Thread Ronnie Sahlberg
Move the check for check_refname_format from lock_any_ref_for_update
to lock_ref_sha1_basic. At some later stage we will get rid of
lock_any_ref_for_update completely.

If lock_ref_sha1_basic fails the check_refname_format test, set errno to
EINVAL before returning NULL. This to guarantee that we will not return an
error without updating errno.

This leaves lock_any_ref_for_updates as a no-op wrapper which could be removed.
But this wrapper is also called from an external caller and we will soon
make changes to the signature to lock_ref_sha1_basic that we do not want to
expose to that caller.

This changes semantics for lock_ref_sha1_basic slightly. With this change
it is no longer possible to open a ref that has a badly name which breaks
any codepaths that tries to open and repair badly named refs. The normal refs
API should not allow neither creating nor accessing refs with invalid names.
If we need such recovery code we could add it as an option to git fsck and have
git fsck be the only sanctioned way of bypassing the normal API and checks.

Signed-off-by: Ronnie Sahlberg sahlb...@google.com
---
 refs.c | 7 +--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/refs.c b/refs.c
index 389a55f..bccf8c3 100644
--- a/refs.c
+++ b/refs.c
@@ -2088,6 +2088,11 @@ static struct ref_lock *lock_ref_sha1_basic(const char 
*refname,
int missing = 0;
int attempts_remaining = 3;
 
+   if (check_refname_format(refname, REFNAME_ALLOW_ONELEVEL)) {
+   errno = EINVAL;
+   return NULL;
+   }
+
lock = xcalloc(1, sizeof(struct ref_lock));
lock-lock_fd = -1;
 
@@ -2179,8 +2184,6 @@ struct ref_lock *lock_any_ref_for_update(const char 
*refname,
 const unsigned char *old_sha1,
 int flags, int *type_p)
 {
-   if (check_refname_format(refname, REFNAME_ALLOW_ONELEVEL))
-   return NULL;
return lock_ref_sha1_basic(refname, old_sha1, flags, type_p);
 }
 
-- 
2.0.0.420.g181e020.dirty

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html