Re: [PATCH v2] fetch-pack: don't try to fetch peel values with --all

2018-06-13 Thread Kirill Smelkov
On Wed, Jun 13, 2018 at 05:13:02PM -0400, Jeff King wrote:
> On Tue, Jun 12, 2018 at 06:54:17PM +, Kirill Smelkov wrote:
> 
> > > If an extra connection isn't a problem, you might be better off with
> > > "git ls-remote", and then picking through the results for refs of
> > > interest, and then "git fetch-pack" to actually get the pack. That's how
> > > git-fetch worked when it was a shell script (e.g., see c3a200120d, the
> > > last shell version).
> > 
> > Yes, this is what I ended up doing:
> > 
> > https://lab.nexedi.com/kirr/git-backup/commit/899103bf
> > 
> > but for another reason - to avoid repeating for every fetched repository
> > slow (in case of my "big" destination backup repository) quickfetch()
> > checking in every spawned `git fetch`: git-backup can build index of
> > objects we already have ourselves only once at startup, and then in
> > fetch, after checking lsremote output, consult that index, and if we see
> > we already have everything for an advertised reference - just avoid
> > giving it to fetch-pack to process. It turns out for many pulled
> > repositories there is usually no references changed at all and this way
> > fetch-pack can be skipped completely:
> > 
> > https://lab.nexedi.com/kirr/git-backup/commit/3efed898
> 
> Thanks for sharing that, it's an interesting case. I'd hope that
> git-fetch is smart enough not to even bother with quickfetch() if there
> are no refs to update. But if we have even one change to fetch, then
> yeah, in the general case it makes sense to me that you could do better
> by amortizing the scan of local objects across many operations.

Thanks for feedback. For the reference in case of git-backup `git fetch`
or `git fetch-pack` would have to always do quickfetch scan or equivalent,
because in case of backup repo there is only one reference in it - its
master - and references of backed up repositories do not have anything
representing them in backup.git/refs/ .

Kirill


Re: [PATCH v2] fetch-pack: don't try to fetch peel values with --all

2018-06-13 Thread Jeff King
On Tue, Jun 12, 2018 at 06:54:17PM +, Kirill Smelkov wrote:

> > If an extra connection isn't a problem, you might be better off with
> > "git ls-remote", and then picking through the results for refs of
> > interest, and then "git fetch-pack" to actually get the pack. That's how
> > git-fetch worked when it was a shell script (e.g., see c3a200120d, the
> > last shell version).
> 
> Yes, this is what I ended up doing:
> 
> https://lab.nexedi.com/kirr/git-backup/commit/899103bf
> 
> but for another reason - to avoid repeating for every fetched repository
> slow (in case of my "big" destination backup repository) quickfetch()
> checking in every spawned `git fetch`: git-backup can build index of
> objects we already have ourselves only once at startup, and then in
> fetch, after checking lsremote output, consult that index, and if we see
> we already have everything for an advertised reference - just avoid
> giving it to fetch-pack to process. It turns out for many pulled
> repositories there is usually no references changed at all and this way
> fetch-pack can be skipped completely:
> 
> https://lab.nexedi.com/kirr/git-backup/commit/3efed898

Thanks for sharing that, it's an interesting case. I'd hope that
git-fetch is smart enough not to even bother with quickfetch() if there
are no refs to update. But if we have even one change to fetch, then
yeah, in the general case it makes sense to me that you could do better
by amortizing the scan of local objects across many operations.

-Peff


Re: [PATCH v2] fetch-pack: don't try to fetch peel values with --all

2018-06-12 Thread Kirill Smelkov
On Tue, Jun 12, 2018 at 05:48:49AM -0400, Jeff King wrote:
> On Mon, Jun 11, 2018 at 09:43:02AM +, Kirill Smelkov wrote:
> 
> > > Looking deeper, we do not need these trees and blobs at all. The problem
> > > is really just a tag that peels to an object that is not otherwise a ref
> > > tip, regardless of its type.
> > 
> > Thanks for feedback and for coming up with the fix. Sure, I'm ok with
> > moving the test into your patch. However, even if a test becomes
> > different - narrowing down root of _current_ problem, I suggest to also
> > keep explicitly testing tag-to-blob and tag-to-tree (and if we really
> > also want tag-to-commit and tag-to-tag) behaviour. Reason is: if we skip
> > those now, they can potentially break in the future.
> 
> Yeah, I have no problem testing these cases separately. There's no bug
> with them now, but it is a slightly uncommon case. My suggestion would
> be to submit a patch that goes on top of mine that covers these cases.

Ok, I will try to do it.


> > I would also suggest to fix upload-pack, as it is just not consistent to
> > reject sending objects that were advertised, and so can strike again
> > some way in the future. After all git.git's fetch-pack is not the only
> > git client that should be possible to interact with git.git's
> > upload-pack on remote side, right?
> 
> No, it's not the only client. At the same time, I am on the fence over
> whether upload-pack's behavior is wrong or not. It depends what you take
> a peeled advertisement line to mean. Does it mean: this object has been
> advertised and clients should be able to fetch it? Or does it mean: by
> the way, you may be interested to know the peeled value of this tag in
> case you want to do tag-following?
> 
> So far I think it has only meant the latter. I could see an argument for
> the former, but any client depending on that would never have worked,
> AFAICT. We could _make_ it work, but how would a client know which
> server version it's talking to (and therefore whether it is safe to make
> the request?). I think you'd have to add a capability to negotiate.

I see. I don't know the details of the exchange, just it was surprising
for outside observer that fetching what was advertised is rejected. For
the reference there is no strong need for me for this to work anymore
(please see below).


> > I'm not sure, but I would say that `fetch-pack --all` from an empty
> > repository should not fail and should just give empty output as fetch
> > does.
> 
> Yeah, that seems reasonable to me. The die() that catches this dates
> back to 2005-era, and we later taught the "fetch" porcelain to handle
> this. I don't _think_ anybody would be upset that the plumbing learned
> to treat this as a noop. It's probably a one-liner change in
> fetch_pack() to return early instead of dying.

Ok, I will try to send related testcase, and it is indeed easy to find
- the fix itself.


> > For the reference all the cases presented here are real - they appear in
> > our repositories on lab.nexedi.com for which I maintain the backup, and
> > I've noticed them in the process of switching git-backup from using
> > fetch to fetch-pack here:
> > 
> > https://lab.nexedi.com/kirr/git-backup/blob/0ab7bbb6/git-backup.go#L436
> 
> I applaud you using the porcelain for your scripts, but I suspect that
> fetch-pack by itself is not at all well-used or well-tested these days
> (certainly this --all bug has been around for almost 6 years and is not
> very hard to trigger in practice).

I see; thanks for the warning.


> If an extra connection isn't a problem, you might be better off with
> "git ls-remote", and then picking through the results for refs of
> interest, and then "git fetch-pack" to actually get the pack. That's how
> git-fetch worked when it was a shell script (e.g., see c3a200120d, the
> last shell version).

Yes, this is what I ended up doing:

https://lab.nexedi.com/kirr/git-backup/commit/899103bf

but for another reason - to avoid repeating for every fetched repository
slow (in case of my "big" destination backup repository) quickfetch()
checking in every spawned `git fetch`: git-backup can build index of
objects we already have ourselves only once at startup, and then in
fetch, after checking lsremote output, consult that index, and if we see
we already have everything for an advertised reference - just avoid
giving it to fetch-pack to process. It turns out for many pulled
repositories there is usually no references changed at all and this way
fetch-pack can be skipped completely:

https://lab.nexedi.com/kirr/git-backup/commit/3efed898

> It may also be sane to just use "git fetch", which I'd say is _fairly_
> safe to script. Of course I have no problem if you want to fix all of
> the corner cases in fetch-pack. Just giving you fair warning. :)

Thanks again for the warning. I'm happy the switch to fetch plumbing
happenned on my side, and so far it is working well. Like I said above I
cannot use `git fetch` as is, 

Re: [PATCH v2] fetch-pack: don't try to fetch peel values with --all

2018-06-12 Thread Jeff King
On Mon, Jun 11, 2018 at 09:43:02AM +, Kirill Smelkov wrote:

> > Looking deeper, we do not need these trees and blobs at all. The problem
> > is really just a tag that peels to an object that is not otherwise a ref
> > tip, regardless of its type.
> 
> Thanks for feedback and for coming up with the fix. Sure, I'm ok with
> moving the test into your patch. However, even if a test becomes
> different - narrowing down root of _current_ problem, I suggest to also
> keep explicitly testing tag-to-blob and tag-to-tree (and if we really
> also want tag-to-commit and tag-to-tag) behaviour. Reason is: if we skip
> those now, they can potentially break in the future.

Yeah, I have no problem testing these cases separately. There's no bug
with them now, but it is a slightly uncommon case. My suggestion would
be to submit a patch that goes on top of mine that covers these cases.

> I would also suggest to fix upload-pack, as it is just not consistent to
> reject sending objects that were advertised, and so can strike again
> some way in the future. After all git.git's fetch-pack is not the only
> git client that should be possible to interact with git.git's
> upload-pack on remote side, right?

No, it's not the only client. At the same time, I am on the fence over
whether upload-pack's behavior is wrong or not. It depends what you take
a peeled advertisement line to mean. Does it mean: this object has been
advertised and clients should be able to fetch it? Or does it mean: by
the way, you may be interested to know the peeled value of this tag in
case you want to do tag-following?

So far I think it has only meant the latter. I could see an argument for
the former, but any client depending on that would never have worked,
AFAICT. We could _make_ it work, but how would a client know which
server version it's talking to (and therefore whether it is safe to make
the request?). I think you'd have to add a capability to negotiate.

> I'm not sure, but I would say that `fetch-pack --all` from an empty
> repository should not fail and should just give empty output as fetch
> does.

Yeah, that seems reasonable to me. The die() that catches this dates
back to 2005-era, and we later taught the "fetch" porcelain to handle
this. I don't _think_ anybody would be upset that the plumbing learned
to treat this as a noop. It's probably a one-liner change in
fetch_pack() to return early instead of dying.

> For the reference all the cases presented here are real - they appear in
> our repositories on lab.nexedi.com for which I maintain the backup, and
> I've noticed them in the process of switching git-backup from using
> fetch to fetch-pack here:
> 
> https://lab.nexedi.com/kirr/git-backup/blob/0ab7bbb6/git-backup.go#L436

I applaud you using the porcelain for your scripts, but I suspect that
fetch-pack by itself is not at all well-used or well-tested these days
(certainly this --all bug has been around for almost 6 years and is not
very hard to trigger in practice).

If an extra connection isn't a problem, you might be better off with
"git ls-remote", and then picking through the results for refs of
interest, and then "git fetch-pack" to actually get the pack. That's how
git-fetch worked when it was a shell script (e.g., see c3a200120d, the
last shell version).

It may also be sane to just use "git fetch", which I'd say is _fairly_
safe to script. Of course I have no problem if you want to fix all of
the corner cases in fetch-pack. Just giving you fair warning. :)

-Peff


Re: [PATCH v2] fetch-pack: don't try to fetch peel values with --all

2018-06-11 Thread Kirill Smelkov
Jeff,

On Mon, Jun 11, 2018 at 01:53:57AM -0400, Jeff King wrote:
> On Mon, Jun 11, 2018 at 01:28:23AM -0400, Eric Sunshine wrote:
> 
> > On Mon, Jun 11, 2018 at 12:47 AM, Jeff King  wrote:
> > > Subject: fetch-pack: don't try to fetch peeled values with --all
> > > [...]
> > > Original report and test from Kirill Smelkov.
> > >
> > > Signed-off-by: Kirill Smelkov 
> > > Signed-off-by: Jeff King 
> > > ---
> > > diff --git a/t/t5500-fetch-pack.sh b/t/t5500-fetch-pack.sh
> > > @@ -506,30 +506,45 @@ test_expect_success 'test missing ref before 
> > > existing' '
> > > +test_expect_success 'test --all wrt tag to non-commits' '
> > > +   blob_sha1=$(echo "hello blob" | git hash-object -t blob -w 
> > > --stdin) &&
> > > +   git tag -a -m "tag -> blob" tag-to-blob $blob_sha1 &&
> > > +   tree_sha1=$(printf "100644 blob $blob_sha1\tfile\n" | git mktree) 
> > > &&
> > 
> > Perhaps modernize these names to 'blob_oid' and 'tree_oid', or even
> > simpler, just 'blob' and 'tree'.
> 
> Looking deeper, we do not need these trees and blobs at all. The problem
> is really just a tag that peels to an object that is not otherwise a ref
> tip, regardless of its type.

Thanks for feedback and for coming up with the fix. Sure, I'm ok with
moving the test into your patch. However, even if a test becomes
different - narrowing down root of _current_ problem, I suggest to also
keep explicitly testing tag-to-blob and tag-to-tree (and if we really
also want tag-to-commit and tag-to-tag) behaviour. Reason is: if we skip
those now, they can potentially break in the future.

I would also suggest to fix upload-pack, as it is just not consistent to
reject sending objects that were advertised, and so can strike again
some way in the future. After all git.git's fetch-pack is not the only
git client that should be possible to interact with git.git's
upload-pack on remote side, right?

By the way, another problem I noticed with fetch-pack is that fetching
with --all from completely empty repository also fails:

.../r$ git init --bare repo.git
Initialized empty Git repository in /home/kirr/tmp/trashme/r/repo.git/
.../r$ mkdir clone
.../r$ cd clone/
.../r/clone$ git init
Initialized empty Git repository in /home/kirr/tmp/trashme/r/clone/.git/
.../r/clone$ git fetch-pack --all ../repo.git/
fatal: no matching remote head
.../r/clone$ echo $?
128
.../r/clone$ git ls-remote ../repo.git/
.../r/clone$ echo $?
0
.../r/clone$ git fetch ../repo.git/ 'refs/*:refs/repo/*'
.../r/clone$ echo $?
0

I'm not sure, but I would say that `fetch-pack --all` from an empty
repository should not fail and should just give empty output as fetch
does.

For the reference all the cases presented here are real - they appear in
our repositories on lab.nexedi.com for which I maintain the backup, and
I've noticed them in the process of switching git-backup from using
fetch to fetch-pack here:

https://lab.nexedi.com/kirr/git-backup/blob/0ab7bbb6/git-backup.go#L436

Kirill



> So below is a patch that simplifies the test even further (the actual
> code change is the same).
> 
> > > +   git tag -a -m "tag -> tree" tag-to-tree $tree_sha1 &&
> > > +   mkdir fetchall &&
> > > +   (
> > > +   cd fetchall &&
> > > +   git init &&
> > > +   git fetch-pack --all .. &&
> > 
> > Simpler:
> > 
> > git init fetchall &&
> > (
> > cd fetchall &&
> > git fetch-pack --all .. &&
> > 
> > Although, I see that this script already has a mix of the two styles
> > (simpler and not-so-simple), so...
> 
> The nearby tests actually reuse the "client" directory. We can do that,
> too, if we simply create new objects for our test, to make sure they
> still need fetching. See below (we could also use "git -C" there, but
> the subshell matches the other tests).
> 
> -- >8 --
> Subject: fetch-pack: don't try to fetch peel values with --all
> 
> When "fetch-pack --all" sees a tag-to-blob on the remote, it
> tries to fetch both the tag itself ("refs/tags/foo") and the
> peeled value that the remote advertises ("refs/tags/foo^{}").
> Asking for the object pointed to by the latter can cause
> upload-pack to complain with "not our ref", since it does
> not mark the peeled objects with the OUR_REF (unless they
> were at the tip of some other ref).
> 
> Arguably upload-pack _should_ be marking those peeled
> objects. But it never has in the past, since clients would
> generally just ask for the tag and expect to get the peeled
> value along with it. And that's how "git fetch" works, as
> well as older versions of "fetch-pack --all".
> 
> The problem was introduced by 5f0fc64513 (fetch-pack:
> eliminate spurious error messages, 2012-09-09). Before then,
> the matching logic was something like:
> 
>   if (refname is ill-formed)
>  do nothing
>   else if (doing --all)
>  always consider it