Re: [PATCH] builtin/describe.c: describe a blob
On Sun, Nov 12, 2017 at 5:33 PM, Junio C Hamanowrote: > Stefan Beller writes: > >> Sometimes users are given a hash of an object and they want to >> identify it further (ex.: Use verify-pack to find the largest blobs, >> but what are these? or [1]) > > Thanks for finishing the topic you started. > >> @@ -11,6 +11,7 @@ SYNOPSIS >> [verse] >> 'git describe' [--all] [--tags] [--contains] [--abbrev=] >> [...] >> 'git describe' [--all] [--tags] [--contains] [--abbrev=] --dirty[=] >> +'git describe' > > OK. > >> diff --git a/builtin/describe.c b/builtin/describe.c >> index 9e9a5ed5d4..acfd853a30 100644 >> --- a/builtin/describe.c >> +++ b/builtin/describe.c >> ... >> static void describe(const char *arg, int last_one) >> { >> ... >> @@ -445,11 +497,18 @@ static void describe(const char *arg, int last_one) >> ... >> + cmit = lookup_commit_reference_gently(, 1); >> + >> + if (cmit) >> + describe_commit(, ); >> + else if (lookup_blob()) { >> + if (all || tags || longformat || first_parent || >> + patterns.nr || exclude_patterns.nr || >> + always || dirty || broken) >> + die(_("options not available for describing blobs")); >> + describe_blob(oid, ); > > I am not sure if I agree with some of them. > >> + } else >> + die(_("%s is neither a commit nor blob"), arg); > > This side I would agree with, though. > > The caller of the describe() function is either > > * 'git describe' makes a single call to it with "HEAD" and >exits; or > * 'git describe A B C...' makes one call to it for each of these >command line arguments. > > And 'git describe ' code is most likely trigger from the latter, > as it is not likely for HEAD to be pointing at a blob. > > $ blob=$(git rev-parse master:Makefile) > $ git describe --always master $blob > > and trigger the above check. Is the check against "always" useful, > or is it better to simply ignore it while describing $blob, but > still keeping it in effect while describing 'master'? > > The 'dirty' and 'broken' check is redundant because we would have > already errored out if either of them is set before calling describe() > on user-supplied object names. > > If I understand the way "describe " works correctly, it > traverses the history with objects, doing a moral equivalent of > "rev-list --objects", stops when it finds the blob object with the > given name, and when it stops, it knows the commit object that > contains the blob and path in that commit to the blob. Then the > commit is described to be a human-readable string, so that the path > can be concatenated after it. > > Aren't these options that affect how a commit object is described in > effect and useful when you do the "internal" describe of the commit > you found the blob object in? IOW, wouldn't this > > $ git describe --exclude='*-wip' $blob > > make sure that in its output $commit:$path, $commit is not described > in terms of any "wip" refs? yes, we would exclude those refs. But the name alone (without reading the docs) suggests anything given in --exclude is excluded, so what about the blob $commit:path-wip/test1.c which I might want to exclude from the search using the exclude pattern? After reading the docs, this is silly of course, and the exclusion only applies to the commit description part. --all is an interesting case, as we pass --all to the revision walking machinery for blobs, but that is slightly different than the --all flag given to describe, which also only relates to the commit name that should be produced. So, I'll go through all the options and see which we can drop. Thanks, Stefan >
Re: [PATCH] builtin/describe.c: describe a blob
Stefan Bellerwrites: > Sometimes users are given a hash of an object and they want to > identify it further (ex.: Use verify-pack to find the largest blobs, > but what are these? or [1]) Thanks for finishing the topic you started. > @@ -11,6 +11,7 @@ SYNOPSIS > [verse] > 'git describe' [--all] [--tags] [--contains] [--abbrev=] [...] > 'git describe' [--all] [--tags] [--contains] [--abbrev=] --dirty[=] > +'git describe' OK. > diff --git a/builtin/describe.c b/builtin/describe.c > index 9e9a5ed5d4..acfd853a30 100644 > --- a/builtin/describe.c > +++ b/builtin/describe.c > ... > static void describe(const char *arg, int last_one) > { > ... > @@ -445,11 +497,18 @@ static void describe(const char *arg, int last_one) > ... > + cmit = lookup_commit_reference_gently(, 1); > + > + if (cmit) > + describe_commit(, ); > + else if (lookup_blob()) { > + if (all || tags || longformat || first_parent || > + patterns.nr || exclude_patterns.nr || > + always || dirty || broken) > + die(_("options not available for describing blobs")); > + describe_blob(oid, ); I am not sure if I agree with some of them. > + } else > + die(_("%s is neither a commit nor blob"), arg); This side I would agree with, though. The caller of the describe() function is either * 'git describe' makes a single call to it with "HEAD" and exits; or * 'git describe A B C...' makes one call to it for each of these command line arguments. And 'git describe ' code is most likely trigger from the latter, as it is not likely for HEAD to be pointing at a blob. $ blob=$(git rev-parse master:Makefile) $ git describe --always master $blob and trigger the above check. Is the check against "always" useful, or is it better to simply ignore it while describing $blob, but still keeping it in effect while describing 'master'? The 'dirty' and 'broken' check is redundant because we would have already errored out if either of them is set before calling describe() on user-supplied object names. If I understand the way "describe " works correctly, it traverses the history with objects, doing a moral equivalent of "rev-list --objects", stops when it finds the blob object with the given name, and when it stops, it knows the commit object that contains the blob and path in that commit to the blob. Then the commit is described to be a human-readable string, so that the path can be concatenated after it. Aren't these options that affect how a commit object is described in effect and useful when you do the "internal" describe of the commit you found the blob object in? IOW, wouldn't this $ git describe --exclude='*-wip' $blob make sure that in its output $commit:$path, $commit is not described in terms of any "wip" refs?
[PATCH] builtin/describe.c: describe a blob
Sometimes users are given a hash of an object and they want to identify it further (ex.: Use verify-pack to find the largest blobs, but what are these? or [1]) When describing commits, we try to anchor them to tags or refs, as these are conceptually on a higher level than the commit. And if there is no ref or tag that matches exactly, we're out of luck. So we employ a heuristic to make up a name for the commit. These names are ambiguous, there might be different tags or refs to anchor to, and there might be different path in the DAG to travel to arrive at the commit precisely. When describing a blob, we want to describe the blob from a higher layer as well, which is a tuple of (commit, deep/path) as the tree objects involved are rather uninteresting. The same blob can be referenced by multiple commits, so how we decide which commit to use? This patch implements a rather naive approach on this: As there are no back pointers from blobs to commits in which the blob occurs, we'll start walking from any tips available, listing the blobs in-order of the commit and once we found the blob, we'll take the first commit that listed the blob. For source code this is likely not the first commit that introduced the blob, but rather the latest commit that contained the blob. For example: git describe v0.99:Makefile v0.99-5-gab6625e06a:Makefile tells us the latest commit that contained the Makefile as it was in tag v0.99 is commit v0.99-5-gab6625e06a (and at the same path), as the next commit on top v0.99-6-gb1de9de2b9 ([PATCH] Bootstrap "make dist", 2005-07-11) touches the Makefile. Let's see how this description turns out, if it is useful in day-to-day use as I have the intuition that we'd rather want to see the *first* commit that this blob was introduced to the repository (which can be achieved easily by giving the `--reverse` flag in the describe_blob rev walk). [1] https://stackoverflow.com/questions/223678/which-commit-has-this-blob Signed-off-by: Stefan BellerSigned-off-by: Junio C Hamano --- Documentation/git-describe.txt | 13 +++- builtin/describe.c | 71 ++ t/t6120-describe.sh| 15 + 3 files changed, 92 insertions(+), 7 deletions(-) diff --git a/Documentation/git-describe.txt b/Documentation/git-describe.txt index c924c945ba..a25443ca91 100644 --- a/Documentation/git-describe.txt +++ b/Documentation/git-describe.txt @@ -3,7 +3,7 @@ git-describe(1) NAME -git-describe - Describe a commit using the most recent tag reachable from it +git-describe - Describe a commit or blob using the graph relations SYNOPSIS @@ -11,6 +11,7 @@ SYNOPSIS [verse] 'git describe' [--all] [--tags] [--contains] [--abbrev=] [...] 'git describe' [--all] [--tags] [--contains] [--abbrev=] --dirty[=] +'git describe' DESCRIPTION --- @@ -24,6 +25,16 @@ By default (without --all or --tags) `git describe` only shows annotated tags. For more information about creating annotated tags see the -a and -s options to linkgit:git-tag[1]. +If the given object refers to a blob, it will be described +as `:`, such that the blob can be found +at `` in the ``. Note, that the commit is likely +not the commit that introduced the blob, but the one that was found +first; to find the commit that introduced the blob, you need to find +the commit that last touched the path, e.g. +`git log -- `. +As blobs do not point at the commits they are contained in, +describing blobs is slow as we have to walk the whole graph. + OPTIONS --- ...:: diff --git a/builtin/describe.c b/builtin/describe.c index 9e9a5ed5d4..acfd853a30 100644 --- a/builtin/describe.c +++ b/builtin/describe.c @@ -3,6 +3,7 @@ #include "lockfile.h" #include "commit.h" #include "tag.h" +#include "blob.h" #include "refs.h" #include "builtin.h" #include "exec_cmd.h" @@ -11,8 +12,9 @@ #include "hashmap.h" #include "argv-array.h" #include "run-command.h" +#include "revision.h" +#include "list-objects.h" -#define SEEN (1u << 0) #define MAX_TAGS (FLAG_BITS - 1) static const char * const describe_usage[] = { @@ -434,6 +436,56 @@ static void describe_commit(struct object_id *oid, struct strbuf *dst) strbuf_addstr(dst, suffix); } +struct process_commit_data { + struct object_id current_commit; + struct object_id looking_for; + struct strbuf *dst; + struct rev_info *revs; +}; + +static void process_commit(struct commit *commit, void *data) +{ + struct process_commit_data *pcd = data; + pcd->current_commit = commit->object.oid; +} + +static void process_object(struct object *obj, const char *path, void *data) +{ + struct process_commit_data *pcd = data; + + if (!oidcmp(>looking_for, >oid) && !pcd->dst->len) { + reset_revision_walk(); + describe_commit(>current_commit, pcd->dst); + strbuf_addf(pcd->dst,