Re: [PATCH 07/18] link_alt_odb_entry: handle normalize_path errors

2016-11-08 Thread Bryan Turner
>
>> Is there anything I can do to help? I'm happy to test out changes.
>
> The patch at the end of his mail obviously passes the newly-added tests
> for me, but please confirm that it fixes your test suite.
>
> I gather your suite is about noticing behavior changes between different
> versions. For cases where we know there is an obvious right behavior, it
> would be nice if you could contribute them as patches to git's test
> suite. This case was overlooked because there was no test coverage at
> all.
>
> Barring that, running your suite and giving easily-reproducible problem
> reports is valuable. The earlier the better. So I am happy to see this
> on -rc0, and not on the final release. Periodically running it on
> "master" during the development cycle would have caught it even sooner.

I've applied your patch to the tip of the 2.11.0-rc0 tag (just to make
sure I don't accidentally pick up anything else on master; I'll test
that separately) and my full test suite passes without issue.

I'm going to investigate whether I can setup a version of this build
that runs "periodically" (I'm not sure what that period will be)
against git/git master. I've got a lot of the infrastructure in place,
but I'm going to need to automate a few things to make it really
viable.

As for contributing extensions to the test suite, that's a good idea.
I need to fast track getting a development environment setup.


Re: [PATCH 07/18] link_alt_odb_entry: handle normalize_path errors

2016-11-07 Thread Jeff King
On Mon, Nov 07, 2016 at 05:12:43PM -0800, Bryan Turner wrote:

> > The obvious solution is one of:
> >
> >   1. Stop calling normalize() at all when we do not have a relative base
> >  and the path is not absolute. This restores the original quirky
> >  behavior (plus makes the "foo/../../bar" case work).

Actually, I think we want to keep normalizing, as it is possible for
relative paths to normalize correctly (e.g., "foo/../bar"). We just need
to ignore the error, which is easy.

The patch is below, and is the absolute minimum I think we'd need for
v2.11.

Beyond that, we could go further:

  a. Actually make a real absolute path based on getcwd(), which would
 protect against later chdir() calls, and possibly help with
 duplicate suppression. I'm not sure there are actually any
 triggerable bugs here, so I went for the minimal fix.

  b. Pick a more sane base for the absolute path, like $GIT_DIR. If we
 did so, then people using relative paths in
 GIT_ALTERNATE_OBJECT_DIRECTORIES from a bare repo would continue to
 work, and people in non-bare repositories would have to add an
 extra ".." to most of their paths. So a slight regression, but
 saner overall semantics.

 Making it relative to the object directory ($GIT_DIR/objects, or
 even whatever is in $GIT_OBJECT_DIRECTORY) makes even more sense
 to me, but would regress even the bare case (and would probably be
 "interesting" along with the tmp-objdir stuff, which sets
 $GIT_OBJECT_DIRECTORY on the fly, as that would invalidate
 $GIT_ALTERNATE_OBJECT_DIRECTORIES).

I'm inclined to leave those to anybody interested post-v2.11 (or never,
if nobody cares). But it would be pretty trivial to do (a) as part of
this initial fix if anybody feels strongly.

> Is there anything I can do to help? I'm happy to test out changes.

The patch at the end of his mail obviously passes the newly-added tests
for me, but please confirm that it fixes your test suite.

I gather your suite is about noticing behavior changes between different
versions. For cases where we know there is an obvious right behavior, it
would be nice if you could contribute them as patches to git's test
suite. This case was overlooked because there was no test coverage at
all.

Barring that, running your suite and giving easily-reproducible problem
reports is valuable. The earlier the better. So I am happy to see this
on -rc0, and not on the final release. Periodically running it on
"master" during the development cycle would have caught it even sooner.

> At the moment I have limited ability to actually try to submit patches
> myself. I really need to sit down and setup a working development
> environment for Git. (My current dream, if I could get such an
> environment running, is to follow up on your git blame-tree work.

I would be happy for somebody to pick that up, too. It has been powering
GitHub's tree-view for several years now, but I know there are some
rough edges as well as opportunities to optimize it.

-- >8 --
Subject: [PATCH] alternates: re-allow relative paths from environment

Commit 670c359da (link_alt_odb_entry: handle normalize_path
errors, 2016-10-03) regressed the handling of relative paths
in the GIT_ALTERNATE_OBJECT_DIRECTORIES variable. It's not
entirely clear this was ever meant to work, but it _has_
worked for several years, so this commit restores the
original behavior.

When we get a path in GIT_ALTERNATE_OBJECT_DIRECTORIES, we
add it the path to the list of alternate object directories
as if it were found in objects/info/alternates, but with one
difference: we do not provide the link_alt_odb_entry()
function with a base for relative paths. That function
doesn't turn it into an absolute path, and we end up feeding
the relative path to the strbuf_normalize_path() function.

Most relative paths break out of the top-level directory
(e.g., "../foo.git/objects"), and thus normalizing fails.
Prior to 670c359da, we simply ignored the error, and due to
the way normalize_path_copy() was implemented it happened to
return the original path in this case. We then accessed the
alternate objects using this relative path.

By storing the relative path in the alt_odb list, the path
is relative to wherever we happen to be at the time we do an
object lookup. That means we look from $GIT_DIR in a bare
repository, and from the top of the worktree in a non-bare
repository.

If this were being designed from scratch, it would make
sense to pick a stable location (probably $GIT_DIR, or even
the object directory) and use that as the relative base,
turning the result into an absolute path.  However, given
the history, at this point the minimal fix is to match the
pre-670c359da behavior.

We can do this simply by ignoring the error when we have no
relative base and using the original value (which we now
reliably have, thanks to strbuf_normalize_path()).

That still leaves us with a relative path that foils our
duplicate detection, and 

Re: [PATCH 07/18] link_alt_odb_entry: handle normalize_path errors

2016-11-07 Thread Bryan Turner
On Mon, Nov 7, 2016 at 4:30 PM, Jeff King  wrote:
> On Mon, Nov 07, 2016 at 03:42:35PM -0800, Bryan Turner wrote:
>
>> > @@ -335,7 +340,9 @@ static void link_alt_odb_entries(const char *alt, int 
>> > len, int sep,
>> > }
>> >
>> > strbuf_add_absolute_path(, get_object_directory());
>> > -   normalize_path_copy(objdirbuf.buf, objdirbuf.buf);
>> > +   if (strbuf_normalize_path() < 0)
>> > +   die("unable to normalize object directory: %s",
>> > +   objdirbuf.buf);
>>
>> This appears to break the ability to use a relative alternate via an
>> environment variable, since normalize_path_copy_len is explicitly
>> documented "Returns failure (non-zero) if a ".." component appears as
>> first path"
>
> That shouldn't happen, though, because the path we are normalizing has
> been converted to an absolute path via strbuf_add_absolute_path. IOW, if
> your relative path is "../../../foo", we should be feeding something
> like "/path/to/repo/.git/objects/../../../foo" and normalizing that to
> "/path/to/foo".
>
> But in your example, you see:
>
>   error: unable to normalize alternate object path: ../0/objects
>
> which cannot come from the code above, which calls die(). It should be
> coming from the call in link_alt_odb_entry().

Ah, of course. I should have looked more closely at the call.



> No, I had no intention of disallowing relative alternates (and as you
> noticed, a commit from the same series actually expands the use of
> relative alternates). My use has been entirely within info/alternates
> files, though, not via the environment.
>
> As I said, I'm not sure this was ever meant to work, but as far as I can
> tell it mostly _has_ worked, modulo some quirks. So I think we should
> consider it a regression for it to stop working in v2.11.
>
> The obvious solution is one of:
>
>   1. Stop calling normalize() at all when we do not have a relative base
>  and the path is not absolute. This restores the original quirky
>  behavior (plus makes the "foo/../../bar" case work).
>
>  If we want to do the minimum before releasing v2.11, it would be
>  that. I'm not sure it leaves things in a very sane state, but at
>  least v2.11 does no harm, and anybody who cares can build saner
>  semantics for v2.12.
>
>   2. Fix it for real. Pass a real relative_base when linking from the
>  environment. The question is: what is the correct relative base? I
>  suppose "getcwd() at the time we prepare the alt odb" is
>  reasonable, and would behave similarly to older versions ($GIT_DIR
>  for bare repos, top of the working tree otherwise).
>
>  If we were designing from scratch, I think saner semantics would
>  probably be always relative from $GIT_DIR, or even always relative
>  from the object directory (i.e., behave as if the paths were given
>  in objects/info/alternates). But that breaks compatibility with
>  older versions. If we are treating this as a regression, it is not
>  very friendly to say "you are still broken, but you might just need
>  to add an extra '..' to your path".
>
> So I dunno. I guess that inclines me towards (1), as it lets us punt on
> the harder question.

Is there anything I can do to help? I'm happy to test out changes.
I've got a set of ~1,040 tests that verify all sorts of different Git
behaviors (those tests flagged this change, for example, and found a
regression in git diff-tree in 2.0.2/2.0.3, among other things). I run
them on the "newest" patch release for every feature-bearing line of
Git from 1.8.x up to 2.10 (currently 1.8.0.3, 1.8.1.5, 1.8.2.3,
1.8.3.4, 1.8.4.5, 1.8.5.6, 1.9.5, 2.0.5, 2.1.4, 2.2.3, 2.3.10, 2.4.11,
2.5.5, 2.6.6, 2.7.4, 2.8.4, 2.9.3 and 2.10.2), and I add in RCs of new
as soon as they become available. (I also test Git for Windows; at the
moment I've got 1.8.0, 1.8.1.2, 1.8.3, 1.8.4, 1.8.5.2 and 1.9.5.1 from
msysgit and 2.3.7.1, 2.4.6, 2.5.3, 2.6.4, 2.7.4, 2.8.4, 2.9.3 and
2.10.2 from Git for Windows. 2.11.0-rc0 on Windows passes my test
suite; it looks like it's not tagging the same git/git commit as
v2.11.0-rc0 is.) I wish there was an easy way for me to open this up.
At the moment, it's something I can really only run in-house, as it
were.

At the moment I have limited ability to actually try to submit patches
myself. I really need to sit down and setup a working development
environment for Git. (My current dream, if I could get such an
environment running, is to follow up on your git blame-tree work.

>
> -Peff


Re: [PATCH 07/18] link_alt_odb_entry: handle normalize_path errors

2016-11-07 Thread Jeff King
On Mon, Nov 07, 2016 at 03:42:35PM -0800, Bryan Turner wrote:

> > @@ -335,7 +340,9 @@ static void link_alt_odb_entries(const char *alt, int 
> > len, int sep,
> > }
> >
> > strbuf_add_absolute_path(, get_object_directory());
> > -   normalize_path_copy(objdirbuf.buf, objdirbuf.buf);
> > +   if (strbuf_normalize_path() < 0)
> > +   die("unable to normalize object directory: %s",
> > +   objdirbuf.buf);
> 
> This appears to break the ability to use a relative alternate via an
> environment variable, since normalize_path_copy_len is explicitly
> documented "Returns failure (non-zero) if a ".." component appears as
> first path"

That shouldn't happen, though, because the path we are normalizing has
been converted to an absolute path via strbuf_add_absolute_path. IOW, if
your relative path is "../../../foo", we should be feeding something
like "/path/to/repo/.git/objects/../../../foo" and normalizing that to
"/path/to/foo".

But in your example, you see:

  error: unable to normalize alternate object path: ../0/objects

which cannot come from the code above, which calls die(). It should be
coming from the call in link_alt_odb_entry().

I think what is happening is that relative paths via environment
variables have always been slightly broken, but happened to mostly work.
In prepare_alt_odb(), we call link_alt_odb_entries() with a NULL
relative_base. That function does two things with it:

  - it may unconditionally dereference it for an error message, which
would cause a segfault. This is impossible to trigger in practice,
though, because the error message is related to the depth, which we
know will always be 0 here.

  - we pass the NULL along to the singular link_alt_odb_entry().
That function only creates an absolute path if given a non-NULL
relative_base; otherwise we have always fed the path to
normalize_path_copy, which is nonsense for a relative path.

So normalize_path_copy() was _always_ returning an error there, but
we ignored it and used whatever happened to be left in the buffer
anyway. And because of the way normalize_path_copy() is implemented,
that happened to be the untouched original string in most cases. But
that's mostly an accident. I think it would not be for something
like "foo/../../bar", which is technically valid (if done from a
relative base that has at least one path component).

Moreover, it means we don't have an absolute path to our alternate
odb. So the path is taken as relative whenever we do an object
lookup, meaning it will behave differently between a bare repository
(where we chdir to $GIT_DIR) and one with a working tree (where we
are generally in the root of the working tree). It can even behave
differently in the same process if we chdir between object lookups.

So it did happen to work, but I'm not sure it was planned (and obviously
we have no test coverage for it). More on that below.

> Other commits, like [1], suggest the ability to use relative paths in
> alternates is something still actively developed and enhanced. Is it
> intentional that this breaks the ability to use relative alternates?
> If this is to be the "new normal", is there any other option when
> using environment variables besides using absolute paths?

No, I had no intention of disallowing relative alternates (and as you
noticed, a commit from the same series actually expands the use of
relative alternates). My use has been entirely within info/alternates
files, though, not via the environment.

As I said, I'm not sure this was ever meant to work, but as far as I can
tell it mostly _has_ worked, modulo some quirks. So I think we should
consider it a regression for it to stop working in v2.11.

The obvious solution is one of:

  1. Stop calling normalize() at all when we do not have a relative base
 and the path is not absolute. This restores the original quirky
 behavior (plus makes the "foo/../../bar" case work).

 If we want to do the minimum before releasing v2.11, it would be
 that. I'm not sure it leaves things in a very sane state, but at
 least v2.11 does no harm, and anybody who cares can build saner
 semantics for v2.12.

  2. Fix it for real. Pass a real relative_base when linking from the
 environment. The question is: what is the correct relative base? I
 suppose "getcwd() at the time we prepare the alt odb" is
 reasonable, and would behave similarly to older versions ($GIT_DIR
 for bare repos, top of the working tree otherwise).

 If we were designing from scratch, I think saner semantics would
 probably be always relative from $GIT_DIR, or even always relative
 from the object directory (i.e., behave as if the paths were given
 in objects/info/alternates). But that breaks compatibility with
 older versions. If we are treating this as a regression, it is not
 very friendly to say "you are still 

Re: [PATCH 07/18] link_alt_odb_entry: handle normalize_path errors

2016-11-07 Thread Bryan Turner
On Mon, Oct 3, 2016 at 1:34 PM, Jeff King  wrote:
> When we add a new alternate to the list, we try to normalize
> out any redundant "..", etc. However, we do not look at the
> return value of normalize_path_copy(), and will happily
> continue with a path that could not be normalized. Worse,
> the normalizing process is done in-place, so we are left
> with whatever half-finished working state the normalizing
> function was in.
>



> @@ -335,7 +340,9 @@ static void link_alt_odb_entries(const char *alt, int 
> len, int sep,
> }
>
> strbuf_add_absolute_path(, get_object_directory());
> -   normalize_path_copy(objdirbuf.buf, objdirbuf.buf);
> +   if (strbuf_normalize_path() < 0)
> +   die("unable to normalize object directory: %s",
> +   objdirbuf.buf);

This appears to break the ability to use a relative alternate via an
environment variable, since normalize_path_copy_len is explicitly
documented "Returns failure (non-zero) if a ".." component appears as
first path"

For example, when trying to run a rev-list over commits in two
repositories using GIT_ALTERNATE_OBJECT_DIRECTORIES, in 2.10.x and
prior the following command works. I know the alternate worked
previously because I'm passing a commit that does not exist in the
repository I'm running the command in; it only exists in a repository
linked by alternate, as shown by the "fatal: bad object" when the
alternates are rejected.

Before, using Git 2.7.4 (but I've verified this behavior through to
and including 2.10.2):

bturner@elysoun /tmp/1478561282706-0/shared/data/repositories/3 $
GIT_ALTERNATE_OBJECT_DIRECTORIES=../0/objects:../1/objects git
rev-list --format="%H" 2d8897c9ac29ce42c3442cf80ac977057045e7f6
74de5497dfca9731e455d60552f9a8906e5dc1ac
^6053a1eaa1c009dd11092d09a72f3c41af1b59ad
^017caf31eca7c46eb3d1800fcac431cfa7147a01 --
commit 74de5497dfca9731e455d60552f9a8906e5dc1ac
74de5497dfca9731e455d60552f9a8906e5dc1ac
commit 3528cf690cb37f6adb85b7bd40cc7a6118d4b598
3528cf690cb37f6adb85b7bd40cc7a6118d4b598
commit 2d8897c9ac29ce42c3442cf80ac977057045e7f6
2d8897c9ac29ce42c3442cf80ac977057045e7f6
commit 9c05f43f859375e392d90d23a13717c16d0fdcda
9c05f43f859375e392d90d23a13717c16d0fdcda

Now, using Git 2.11.0-rc0

bturner@elysoun /tmp/1478561282706-0/shared/data/repositories/3 $
GIT_ALTERNATE_OBJECT_DIRECTORIES=../0/objects:../1/objects
/opt/git/2.11.0-rc0/bin/git rev-list --format="%H"
2d8897c9ac29ce42c3442cf80ac977057045e7f6
74de5497dfca9731e455d60552f9a8906e5dc1ac
^6053a1eaa1c009dd11092d09a72f3c41af1b59ad
^017caf31eca7c46eb3d1800fcac431cfa7147a01 --
error: unable to normalize alternate object path: ../0/objects
error: unable to normalize alternate object path: ../1/objects
fatal: bad object 74de5497dfca9731e455d60552f9a8906e5dc1ac

Other commits, like [1], suggest the ability to use relative paths in
alternates is something still actively developed and enhanced. Is it
intentional that this breaks the ability to use relative alternates?
If this is to be the "new normal", is there any other option when
using environment variables besides using absolute paths?

Best regards,
Bryan Turner

[1]: https://github.com/git/git/commit/087b6d584062f5b704356286d6445bcc84d686fb
-- Also newly tagged in 2.11.0-rc0


Re: [PATCH 07/18] link_alt_odb_entry: handle normalize_path errors

2016-10-05 Thread Jeff King
On Wed, Oct 05, 2016 at 08:47:29PM +0200, René Scharfe wrote:

> > Instead, let's provide a strbuf helper that does an in-place
> > normalize, but restores the original contents on error. This
> > uses a second buffer under the hood, which is slightly less
> > efficient, but this is not a performance-critical code path.
> 
> Hmm, in-place functions are quite rare in the strbuf collection.  It looks
> like a good fit for the two callers and makes sense in general, though.

Yeah, I almost wrote "strbuf_add_normalized_path()" instead. But then
the callers end up having to do the allocate-and-swap thing themselves.
And I think we're still set in the future to add that if somebody wants
it (and we can then implement the in-place version in terms of it).

Another alternative is to observe that the strbuf is generally used in
the first place to make the path absolute. So another interface is
perhaps something like:

  strbuf_add_path(struct strbuf *sb, const char *path,
  const char *relative_base)
  {
struct strbuf scratch = STRBUF_INIT;
int ret;

if (is_absolute_path(path))
strbuf_grow(sb, strlen(path));
else {
if (relative_path)
strbuf_addstr(, path);
else {
if (strbuf_getcwd())
return -1;
}
strbuf_addch(, '/');
strbuf_addstr(, path);

strbuf_grow(sb, scratch.len);
path = scratch.buf;
}

ret = normalize_path_copy(sb.buf + sb.len, path);
strbuf_release();
return ret;
  }

I don't think its worth the complexity of interface for the spots in
this series, but maybe there are other places that could use it. I'll
leave that to somebody else to explore if the ywant to.

-Peff


Re: [PATCH 07/18] link_alt_odb_entry: handle normalize_path errors

2016-10-05 Thread René Scharfe

Am 03.10.2016 um 22:34 schrieb Jeff King:

When we add a new alternate to the list, we try to normalize
out any redundant "..", etc. However, we do not look at the
return value of normalize_path_copy(), and will happily
continue with a path that could not be normalized. Worse,
the normalizing process is done in-place, so we are left
with whatever half-finished working state the normalizing
function was in.

Fortunately, this cannot cause us to read past the end of
our buffer, as that working state will always leave the
NUL from the original path in place. And we do tend to
notice problems when we check is_directory() on the path.
But you can see the nonsense that we feed to is_directory
with an entry like:

  this/../../is/../../way/../../too/../../deep/../../to/../../resolve

in your objects/info/alternates, which yields:

  error: object directory
  
/to/e/deep/too/way//ects/this/../../is/../../way/../../too/../../deep/../../to/../../resolve
  does not exist; check .git/objects/info/alternates.

We can easily fix this just by checking the return value.
But that makes it hard to generate a good error message,
since we're normalizing in-place and our input value has
been overwritten by cruft.

Instead, let's provide a strbuf helper that does an in-place
normalize, but restores the original contents on error. This
uses a second buffer under the hood, which is slightly less
efficient, but this is not a performance-critical code path.


Hmm, in-place functions are quite rare in the strbuf collection.  It 
looks like a good fit for the two callers and makes sense in general, 
though.


Re: [PATCH 07/18] link_alt_odb_entry: handle normalize_path errors

2016-10-04 Thread Junio C Hamano
Jeff King  writes:

> +int strbuf_normalize_path(struct strbuf *src)
> +{
> + struct strbuf dst = STRBUF_INIT;
> +
> + strbuf_grow(, src->len);
> + if (normalize_path_copy(dst.buf, src->buf) < 0) {
> + strbuf_release();
> + return -1;
> + }
> +
> + /*
> +  * normalize_path does not tell us the new length, so we have to
> +  * compute it by looking for the new NUL it placed
> +  */
> + strbuf_setlen(, strlen(dst.buf));
> + strbuf_swap(src, );
> + strbuf_release();
> + return 0;
> +}

Makes sense.



Re: [PATCH 07/18] link_alt_odb_entry: handle normalize_path errors

2016-10-04 Thread Jacob Keller
On Mon, Oct 3, 2016 at 1:34 PM, Jeff King  wrote:
> When we add a new alternate to the list, we try to normalize
> out any redundant "..", etc. However, we do not look at the
> return value of normalize_path_copy(), and will happily
> continue with a path that could not be normalized. Worse,
> the normalizing process is done in-place, so we are left
> with whatever half-finished working state the normalizing
> function was in.
>
> Fortunately, this cannot cause us to read past the end of
> our buffer, as that working state will always leave the
> NUL from the original path in place. And we do tend to
> notice problems when we check is_directory() on the path.
> But you can see the nonsense that we feed to is_directory
> with an entry like:
>
>   this/../../is/../../way/../../too/../../deep/../../to/../../resolve
>
> in your objects/info/alternates, which yields:
>
>   error: object directory
>   
> /to/e/deep/too/way//ects/this/../../is/../../way/../../too/../../deep/../../to/../../resolve
>   does not exist; check .git/objects/info/alternates.
>

Yikes, that doesn't seem helpful.

> We can easily fix this just by checking the return value.
> But that makes it hard to generate a good error message,
> since we're normalizing in-place and our input value has
> been overwritten by cruft.

Right. Definitely want to check the return value here...

>
> Instead, let's provide a strbuf helper that does an in-place
> normalize, but restores the original contents on error. This
> uses a second buffer under the hood, which is slightly less
> efficient, but this is not a performance-critical code path.
>

I agree, I don't think this duplication is really a big deal, since it
helps ensure that the function doesn't modify its arguments on error.

> The strbuf helper can also properly set the "len" parameter
> of the strbuf before returning. Just doing:
>
>   normalize_path_copy(buf.buf, buf.buf);
>
> will shorten the string, but leave buf.len at the original
> length. That may be confusing to later code which uses the
> strbuf.
>

Makes sense here. Properly setting the length will help prevent future issues.

Thanks,
Jake

> Signed-off-by: Jeff King 
> ---
>  sha1_file.c | 11 +--
>  strbuf.c| 20 
>  strbuf.h|  8 
>  3 files changed, 37 insertions(+), 2 deletions(-)
>
> diff --git a/sha1_file.c b/sha1_file.c
> index b9c1fa3..68571bd 100644
> --- a/sha1_file.c
> +++ b/sha1_file.c
> @@ -263,7 +263,12 @@ static int link_alt_odb_entry(const char *entry, const 
> char *relative_base,
> }
> strbuf_addstr(, entry);
>
> -   normalize_path_copy(pathbuf.buf, pathbuf.buf);
> +   if (strbuf_normalize_path() < 0) {
> +   error("unable to normalize alternate object path: %s",
> + pathbuf.buf);
> +   strbuf_release();
> +   return -1;
> +   }
>
> pfxlen = strlen(pathbuf.buf);
>
> @@ -335,7 +340,9 @@ static void link_alt_odb_entries(const char *alt, int 
> len, int sep,
> }
>
> strbuf_add_absolute_path(, get_object_directory());
> -   normalize_path_copy(objdirbuf.buf, objdirbuf.buf);
> +   if (strbuf_normalize_path() < 0)
> +   die("unable to normalize object directory: %s",
> +   objdirbuf.buf);
>
> alt_copy = xmemdupz(alt, len);
> string_list_split_in_place(, alt_copy, sep, -1);
> diff --git a/strbuf.c b/strbuf.c
> index b839be4..8fec657 100644
> --- a/strbuf.c
> +++ b/strbuf.c
> @@ -870,3 +870,23 @@ void strbuf_stripspace(struct strbuf *sb, int 
> skip_comments)
>
> strbuf_setlen(sb, j);
>  }
> +
> +int strbuf_normalize_path(struct strbuf *src)
> +{
> +   struct strbuf dst = STRBUF_INIT;
> +
> +   strbuf_grow(, src->len);
> +   if (normalize_path_copy(dst.buf, src->buf) < 0) {
> +   strbuf_release();
> +   return -1;
> +   }
> +
> +   /*
> +* normalize_path does not tell us the new length, so we have to
> +* compute it by looking for the new NUL it placed
> +*/

And we can't correctly set the length inside normalize_path_copy
because it just takes C strings directly and not actually a strbuf. Ok
so it makes sense that we have to set it here.

Thanks,
Jake

> +   strbuf_setlen(, strlen(dst.buf));
> +   strbuf_swap(src, );
> +   strbuf_release();
> +   return 0;
> +}
> diff --git a/strbuf.h b/strbuf.h
> index ba8d5f1..2262b12 100644
> --- a/strbuf.h
> +++ b/strbuf.h
> @@ -443,6 +443,14 @@ extern int strbuf_getcwd(struct strbuf *sb);
>   */
>  extern void strbuf_add_absolute_path(struct strbuf *sb, const char *path);
>


[PATCH 07/18] link_alt_odb_entry: handle normalize_path errors

2016-10-03 Thread Jeff King
When we add a new alternate to the list, we try to normalize
out any redundant "..", etc. However, we do not look at the
return value of normalize_path_copy(), and will happily
continue with a path that could not be normalized. Worse,
the normalizing process is done in-place, so we are left
with whatever half-finished working state the normalizing
function was in.

Fortunately, this cannot cause us to read past the end of
our buffer, as that working state will always leave the
NUL from the original path in place. And we do tend to
notice problems when we check is_directory() on the path.
But you can see the nonsense that we feed to is_directory
with an entry like:

  this/../../is/../../way/../../too/../../deep/../../to/../../resolve

in your objects/info/alternates, which yields:

  error: object directory
  
/to/e/deep/too/way//ects/this/../../is/../../way/../../too/../../deep/../../to/../../resolve
  does not exist; check .git/objects/info/alternates.

We can easily fix this just by checking the return value.
But that makes it hard to generate a good error message,
since we're normalizing in-place and our input value has
been overwritten by cruft.

Instead, let's provide a strbuf helper that does an in-place
normalize, but restores the original contents on error. This
uses a second buffer under the hood, which is slightly less
efficient, but this is not a performance-critical code path.

The strbuf helper can also properly set the "len" parameter
of the strbuf before returning. Just doing:

  normalize_path_copy(buf.buf, buf.buf);

will shorten the string, but leave buf.len at the original
length. That may be confusing to later code which uses the
strbuf.

Signed-off-by: Jeff King 
---
 sha1_file.c | 11 +--
 strbuf.c| 20 
 strbuf.h|  8 
 3 files changed, 37 insertions(+), 2 deletions(-)

diff --git a/sha1_file.c b/sha1_file.c
index b9c1fa3..68571bd 100644
--- a/sha1_file.c
+++ b/sha1_file.c
@@ -263,7 +263,12 @@ static int link_alt_odb_entry(const char *entry, const 
char *relative_base,
}
strbuf_addstr(, entry);
 
-   normalize_path_copy(pathbuf.buf, pathbuf.buf);
+   if (strbuf_normalize_path() < 0) {
+   error("unable to normalize alternate object path: %s",
+ pathbuf.buf);
+   strbuf_release();
+   return -1;
+   }
 
pfxlen = strlen(pathbuf.buf);
 
@@ -335,7 +340,9 @@ static void link_alt_odb_entries(const char *alt, int len, 
int sep,
}
 
strbuf_add_absolute_path(, get_object_directory());
-   normalize_path_copy(objdirbuf.buf, objdirbuf.buf);
+   if (strbuf_normalize_path() < 0)
+   die("unable to normalize object directory: %s",
+   objdirbuf.buf);
 
alt_copy = xmemdupz(alt, len);
string_list_split_in_place(, alt_copy, sep, -1);
diff --git a/strbuf.c b/strbuf.c
index b839be4..8fec657 100644
--- a/strbuf.c
+++ b/strbuf.c
@@ -870,3 +870,23 @@ void strbuf_stripspace(struct strbuf *sb, int 
skip_comments)
 
strbuf_setlen(sb, j);
 }
+
+int strbuf_normalize_path(struct strbuf *src)
+{
+   struct strbuf dst = STRBUF_INIT;
+
+   strbuf_grow(, src->len);
+   if (normalize_path_copy(dst.buf, src->buf) < 0) {
+   strbuf_release();
+   return -1;
+   }
+
+   /*
+* normalize_path does not tell us the new length, so we have to
+* compute it by looking for the new NUL it placed
+*/
+   strbuf_setlen(, strlen(dst.buf));
+   strbuf_swap(src, );
+   strbuf_release();
+   return 0;
+}
diff --git a/strbuf.h b/strbuf.h
index ba8d5f1..2262b12 100644
--- a/strbuf.h
+++ b/strbuf.h
@@ -443,6 +443,14 @@ extern int strbuf_getcwd(struct strbuf *sb);
  */
 extern void strbuf_add_absolute_path(struct strbuf *sb, const char *path);
 
+
+/**
+ * Normalize in-place the path contained in the strbuf. See
+ * normalize_path_copy() for details. If an error occurs, the contents of "sb"
+ * are left untouched, and -1 is returned.
+ */
+extern int strbuf_normalize_path(struct strbuf *sb);
+
 /**
  * Strip whitespace from a buffer. The second parameter controls if
  * comments are considered contents to be removed or not.
-- 
2.10.0.618.g82cc264