Re: [PATCHv5 7/7] builtin/describe.c: describe a blob

2017-12-22 Thread Junio C Hamano
Stefan Beller  writes:

> On Tue, Dec 19, 2017 at 11:22 AM, Junio C Hamano  wrote:
>> I had to squash in the following to make 'pu' pass under
>> gettext-poison build.  Is this ready for 'next' otherwise?
>
> I saw that in pu, thanks for squashing. I should have spoken
> up earlier confirming it.

So is this ready for 'next' if it is squashed?

>> With the "log --find-object" thing, it may be that this no longer is
>> needed, but then again we haven't done anything with the other
>> Jonathan's idea to unify the --find-object thing into the --pickaxe
>> framework.
>
> I'll look into that after I finish looking at a submodule related bug.

Thanks.

>> It seems that we tend to open and then abandon new interests without
>> seeing them through completion, leaving too many loose ends untied.
>> This has to stop.
>
> I'll try to find my balance again.

Sorry, but this was not really about you, but was more about me.  

I probably should more aggressively drop a topic that stalled and
also merge a topic that reached the point of diminishing returns to
'next'.


Re: [PATCHv5 7/7] builtin/describe.c: describe a blob

2017-12-19 Thread Stefan Beller
On Tue, Dec 19, 2017 at 11:22 AM, Junio C Hamano  wrote:
> I had to squash in the following to make 'pu' pass under
> gettext-poison build.  Is this ready for 'next' otherwise?

I saw that in pu, thanks for squashing. I should have spoken
up earlier confirming it.

> With the "log --find-object" thing, it may be that this no longer is
> needed, but then again we haven't done anything with the other
> Jonathan's idea to unify the --find-object thing into the --pickaxe
> framework.

I'll look into that after I finish looking at a submodule related bug.

> It seems that we tend to open and then abandon new interests without
> seeing them through completion, leaving too many loose ends untied.
> This has to stop.

I'll try to find my balance again.

When working on too few topics at the same time, sometimes I get stalled
waiting for the mailing list (or internal reviewers) to respond, so I start many
topics, which then overwhelm me after a while once reviews catch up.

In this specific case it was not clear what the best way is, i.e. the
interest was
rather broad: "Hey I have this blob sha1, how do I get more information?",
which can be solved in many different ways, so I tried some of them.
(Another alternative just now considered: I could have wrapped that
script from stackoverflow into a new command and called it a day, though that
has very poor benefits for the rest of ecosystem, an addition to an established
powerful command gives more benefits IMHO)

Thanks,
Stefan


Re: [PATCHv5 7/7] builtin/describe.c: describe a blob

2017-12-19 Thread Junio C Hamano
I had to squash in the following to make 'pu' pass under
gettext-poison build.  Is this ready for 'next' otherwise?

With the "log --find-object" thing, it may be that this no longer is
needed, but then again we haven't done anything with the other
Jonathan's idea to unify the --find-object thing into the --pickaxe
framework.

It seems that we tend to open and then abandon new interests without
seeing them through completion, leaving too many loose ends untied.
This has to stop.

diff --git a/t/t6120-describe.sh b/t/t6120-describe.sh
index 4668f0058e..3e3fb462a0 100755
--- a/t/t6120-describe.sh
+++ b/t/t6120-describe.sh
@@ -341,7 +341,7 @@ test_expect_success 'describe directly tagged blob' '
 test_expect_success 'describe tag object' '
git tag test-blob-1 -a -m msg unique-file:file &&
test_must_fail git describe test-blob-1 2>actual &&
-   grep "fatal: test-blob-1 is neither a commit nor blob" actual
+   test_i18ngrep "fatal: test-blob-1 is neither a commit nor blob" actual
 '
 
 test_expect_failure ULIMIT_STACK_SIZE 'name-rev works in a deep repo' '


Re: [PATCHv5 7/7] builtin/describe.c: describe a blob

2017-11-27 Thread Junio C Hamano
Stefan Beller  writes:

>> I was reacting to your "fixed".  So will we see a rerolled series or
>> not?
>
> I was not planning on rerolling it.

OK.  Then I do not have to worry about this one and drop it perhaps.
One less topic on 'pu' is always good, if it is not active.

Enjoy your vacation.

Thanks.


Re: [PATCHv5 7/7] builtin/describe.c: describe a blob

2017-11-27 Thread Stefan Beller
On Thu, Nov 23, 2017 at 11:18 PM, Junio C Hamano  wrote:
> Stefan Beller  writes:
>
>> On Tue, Nov 21, 2017 at 11:55 PM, Junio C Hamano  wrote:
>>> Stefan Beller  writes:
>>>
 ...
 fixed.
 ...
 fixed the text...
 ...
 I am not fully convinced all descriptions are in recent history, but I
 tend to agree that most are, so probably the trade off is a wash.
>>>
>>> So what do we want with this topic?  I think the "teach 'git log' to
>>> highlight commits whose changes involve the given blob" is a more or
>>> less an orthogonal thing,
>>
>> Well, both of them solve our immediate needs, so I'd be fine with pursuing
>> just one of them, but I do not oppose taking both.
>>
>>> and I suspect that it is something users
>>> may (although I personally do not) find valuable to have a related
>>> but different feature in "git describe".
>>
>> agreed.
>
> I was reacting to your "fixed".  So will we see a rerolled series or
> not?

I was not planning on rerolling it.

FYI: I am on vacation until next Monday, so if there is anything
to be fixed here, I'd do it in a week, which may be enough time
to warrant incremental fixes?


Re: [PATCHv5 7/7] builtin/describe.c: describe a blob

2017-11-23 Thread Junio C Hamano
Stefan Beller  writes:

> On Tue, Nov 21, 2017 at 11:55 PM, Junio C Hamano  wrote:
>> Stefan Beller  writes:
>>
>>> ...
>>> fixed.
>>> ...
>>> fixed the text...
>>> ...
>>> I am not fully convinced all descriptions are in recent history, but I
>>> tend to agree that most are, so probably the trade off is a wash.
>>
>> So what do we want with this topic?  I think the "teach 'git log' to
>> highlight commits whose changes involve the given blob" is a more or
>> less an orthogonal thing,
>
> Well, both of them solve our immediate needs, so I'd be fine with pursuing
> just one of them, but I do not oppose taking both.
>
>> and I suspect that it is something users
>> may (although I personally do not) find valuable to have a related
>> but different feature in "git describe".
>
> agreed.

I was reacting to your "fixed".  So will we see a rerolled series or
not?


Re: [PATCHv5 7/7] builtin/describe.c: describe a blob

2017-11-22 Thread Stefan Beller
On Tue, Nov 21, 2017 at 11:55 PM, Junio C Hamano  wrote:
> Stefan Beller  writes:
>
>> ...
>> fixed.
>> ...
>> fixed the text...
>> ...
>> I am not fully convinced all descriptions are in recent history, but I
>> tend to agree that most are, so probably the trade off is a wash.
>
> So what do we want with this topic?  I think the "teach 'git log' to
> highlight commits whose changes involve the given blob" is a more or
> less an orthogonal thing,

Well, both of them solve our immediate needs, so I'd be fine with pursuing
just one of them, but I do not oppose taking both.

> and I suspect that it is something users
> may (although I personally do not) find valuable to have a related
> but different feature in "git describe".

agreed.


Re: [PATCHv5 7/7] builtin/describe.c: describe a blob

2017-11-21 Thread Junio C Hamano
Stefan Beller  writes:

> ...
> fixed.
> ...
> fixed the text...
> ...
> I am not fully convinced all descriptions are in recent history, but I
> tend to agree that most are, so probably the trade off is a wash.

So what do we want with this topic?  I think the "teach 'git log' to
highlight commits whose changes involve the given blob" is a more or
less an orthogonal thing, and I suspect that it is something users
may (although I personally do not) find valuable to have a related
but different feature in "git describe".




Re: [PATCHv5 7/7] builtin/describe.c: describe a blob

2017-11-20 Thread Philip Oakley

From: "Stefan Beller" 
Sent: Thursday, November 16, 2017 2:00 AM

[in catch up mode..]


Sometimes users are given a hash of an object and they want to
identify it further (ex.: Use verify-pack to find the largest blobs,
but what are these? or [1])

When describing commits, we try to anchor them to tags or refs, as these
are conceptually on a higher level than the commit. And if there is no ref
or tag that matches exactly, we're out of luck.  So we employ a heuristic
to make up a name for the commit. These names are ambiguous, there might
be different tags or refs to anchor to, and there might be different
path in the DAG to travel to arrive at the commit precisely.

When describing a blob, we want to describe the blob from a higher layer
as well, which is a tuple of (commit, deep/path) as the tree objects
involved are rather uninteresting.  The same blob can be referenced by
multiple commits, so how we decide which commit to use?  This patch
implements a rather naive approach on this: As there are no back pointers
from blobs to commits in which the blob occurs, we'll start walking from
any tips available, listing the blobs in-order of the commit and once we
found the blob, we'll take the first commit that listed the blob. For
example

 git describe --tags v0.99:Makefile
 conversion-901-g7672db20c2:Makefile

tells us the Makefile as it was in v0.99 was introduced in commit 
7672db20.


The walking is performed in reverse order to show the introduction of a
blob rather than its last occurrence.

[1] https://stackoverflow.com/questions/223678/which-commit-has-this-blob

Signed-off-by: Stefan Beller 
---
Documentation/git-describe.txt | 18 ++--
builtin/describe.c | 62 
++

t/t6120-describe.sh| 34 +++
3 files changed, 107 insertions(+), 7 deletions(-)

diff --git a/Documentation/git-describe.txt 
b/Documentation/git-describe.txt

index c924c945ba..e027fb8c4b 100644
--- a/Documentation/git-describe.txt
+++ b/Documentation/git-describe.txt
@@ -3,14 +3,14 @@ git-describe(1)

NAME

-git-describe - Describe a commit using the most recent tag reachable from 
it

-
+git-describe - Give an object a human readable name based on an available 
ref


SYNOPSIS

[verse]
'git describe' [--all] [--tags] [--contains] [--abbrev=] 
[...]
'git describe' [--all] [--tags] [--contains] 
[--abbrev=] --dirty[=]

+'git describe' 

DESCRIPTION
---
@@ -24,6 +24,12 @@ By default (without --all or --tags) `git describe` 
only shows

annotated tags.  For more information about creating annotated tags
see the -a and -s options to linkgit:git-tag[1].

+If the given object refers to a blob, it will be described
+as `:`, such that the blob can be found
+at `` in the ``, which itself describes the
+first commit in which this blob occurs in a reverse revision walk
+from HEAD.
+
OPTIONS
---
...::
@@ -186,6 +192,14 @@ selected and output.  Here fewest commits different 
is defined as

the number of commits which would be shown by `git log tag..input`
will be the smallest number of commits possible.

+BUGS
+
+
+Tree objects as well as tag objects not pointing at commits, cannot be 
described.


Is this true? Is it stand alone from the describing of a blob? If so should 
it be its own patchlet. - I thought I'd read that within the series there is 
now a tree / tag (of blob/trees) description capability.


I'd prefer that we don't start with the "can't" view (relative to the 
subsequent sentences of the paragraph). It puts off the reader - we are 
about to say what can be described but in a limited way - the limitation 
being the bug. Maybe just swap the line to form a second paragraph.


+When describing blobs, the lightweight tags pointing at blobs are 
ignored,
+but the blob is still described as : despite the 
lightweight

+tag being favorable.
+

--
Philip


GIT
---
Part of the linkgit:git[1] suite
diff --git a/builtin/describe.c b/builtin/describe.c
index 9e9a5ed5d4..5b4bfaba3f 100644
--- a/builtin/describe.c
+++ b/builtin/describe.c
@@ -3,6 +3,7 @@
#include "lockfile.h"
#include "commit.h"
#include "tag.h"
+#include "blob.h"
#include "refs.h"
#include "builtin.h"
#include "exec_cmd.h"
@@ -11,8 +12,9 @@
#include "hashmap.h"
#include "argv-array.h"
#include "run-command.h"
+#include "revision.h"
+#include "list-objects.h"

-#define SEEN (1u << 0)
#define MAX_TAGS (FLAG_BITS - 1)

static const char * const describe_usage[] = {
@@ -434,6 +436,53 @@ static void describe_commit(struct object_id *oid, 
struct strbuf *dst)

 strbuf_addstr(dst, suffix);
}

+struct process_commit_data {
+ struct object_id current_commit;
+ struct object_id looking_for;
+ struct strbuf *dst;
+ struct rev_info *revs;
+};
+
+static void process_commit(struct commit *commit, void *data)
+{
+ struct process_commit_data *pcd = data;
+ pcd->current_commit = commit->object.oid;
+}
+
+static void process_object(struct 

Re: [PATCHv5 7/7] builtin/describe.c: describe a blob

2017-11-16 Thread Stefan Beller
On Wed, Nov 15, 2017 at 7:24 PM, Junio C Hamano  wrote:

> I am not sure if "And if there is ..." is adding much value here (I
> do not think it is even technically correct for that matter).  If
> there are more than one tag that point at the commit the user is
> interested in, we use one of the tags, as tags conceptually sit at a
> higher level.  And we use a heuristic to use one or the other tag to
> make up a name for the commit, so the same commit can have two valid
> names. ---So what?  Neither of these two valid names is "ambigous";
> the commit object the user wanted to name _is_ correctly identified
> (I would assume that we are not discussing a hash collision).
>
> Lucikly, if we remove "And if...precisely", the logic still flows
> nicely, if not more, to the next paragraph.

fixed.

>> When describing a blob, we want to describe the blob from a higher layer
>> as well, which is a tuple of (commit, deep/path) as the tree objects
>> involved are rather uninteresting.  The same blob can be referenced by
>> multiple commits, so how we decide which commit to use?  This patch
>> implements a rather naive approach on this: As there are no back pointers
>> from blobs to commits in which the blob occurs, we'll start walking from
>> any tips available, listing the blobs in-order of the commit and once we
>
> Is "any tips" still the case?  I was wondering why you start your
> traversal at HEAD and nothing else in this iteration.  There seems
> to be no mention of this design decision in the documentation and no
> justification in the log.

fixed the text. The design decision to reverse walk HEAD is tied to
your observation below:

> The reversing may improve the chance of an older commit to be chosen
> rather than the newer one, but it does not even guarantee to show the
> "introduction".

This is what I realized when we started walking all refs including reflog.
The introduction can only be found when we take the commit-parents
into account and look at the diffs. I noticed you started
origin/jc/diff-blobfind
which may be helpful to find the introduction correctly, which I'll play around
with that and see if I can get that working.

> What this guarantees is that a long history will be traversed fully
> before we start considering which commit has the blob of interest, I
> am afraid.  Is this a sensible trade-off?

I am not fully convinced all descriptions are in recent history, but I
tend to agree that most are, so probably the trade off is a wash.


Re: [PATCHv5 7/7] builtin/describe.c: describe a blob

2017-11-15 Thread Junio C Hamano
Stefan Beller  writes:

> When describing commits, we try to anchor them to tags or refs, as these
> are conceptually on a higher level than the commit. And if there is no ref
> or tag that matches exactly, we're out of luck.  So we employ a heuristic
> to make up a name for the commit. These names are ambiguous, there might
> be different tags or refs to anchor to, and there might be different
> path in the DAG to travel to arrive at the commit precisely.

I am not sure if "And if there is ..." is adding much value here (I
do not think it is even technically correct for that matter).  If
there are more than one tag that point at the commit the user is
interested in, we use one of the tags, as tags conceptually sit at a
higher level.  And we use a heuristic to use one or the other tag to
make up a name for the commit, so the same commit can have two valid
names. ---So what?  Neither of these two valid names is "ambigous";
the commit object the user wanted to name _is_ correctly identified
(I would assume that we are not discussing a hash collision).

Lucikly, if we remove "And if...precisely", the logic still flows
nicely, if not more, to the next paragraph.

> When describing a blob, we want to describe the blob from a higher layer
> as well, which is a tuple of (commit, deep/path) as the tree objects
> involved are rather uninteresting.  The same blob can be referenced by
> multiple commits, so how we decide which commit to use?  This patch
> implements a rather naive approach on this: As there are no back pointers
> from blobs to commits in which the blob occurs, we'll start walking from
> any tips available, listing the blobs in-order of the commit and once we

Is "any tips" still the case?  I was wondering why you start your
traversal at HEAD and nothing else in this iteration.  There seems
to be no mention of this design decision in the documentation and no
justification in the log.

> found the blob, we'll take the first commit that listed the blob. For
> example
>
>   git describe --tags v0.99:Makefile
>   conversion-901-g7672db20c2:Makefile
>
> tells us the Makefile as it was in v0.99 was introduced in commit 7672db20.
>
> The walking is performed in reverse order to show the introduction of a
> blob rather than its last occurrence.

The reversing may improve the chance of an older commit to be chosen
rather than the newer one, but it does not even guarantee to show the
"introduction".

What this guarantees is that a long history will be traversed fully
before we start considering which commit has the blob of interest, I
am afraid.  Is this a sensible trade-off?

> + argv_array_pushl(, "internal: The first arg is not parsed",
> + "--objects", "--in-commit-order", "--reverse", "HEAD",
> + NULL);



[PATCHv5 7/7] builtin/describe.c: describe a blob

2017-11-15 Thread Stefan Beller
Sometimes users are given a hash of an object and they want to
identify it further (ex.: Use verify-pack to find the largest blobs,
but what are these? or [1])

When describing commits, we try to anchor them to tags or refs, as these
are conceptually on a higher level than the commit. And if there is no ref
or tag that matches exactly, we're out of luck.  So we employ a heuristic
to make up a name for the commit. These names are ambiguous, there might
be different tags or refs to anchor to, and there might be different
path in the DAG to travel to arrive at the commit precisely.

When describing a blob, we want to describe the blob from a higher layer
as well, which is a tuple of (commit, deep/path) as the tree objects
involved are rather uninteresting.  The same blob can be referenced by
multiple commits, so how we decide which commit to use?  This patch
implements a rather naive approach on this: As there are no back pointers
from blobs to commits in which the blob occurs, we'll start walking from
any tips available, listing the blobs in-order of the commit and once we
found the blob, we'll take the first commit that listed the blob. For
example

  git describe --tags v0.99:Makefile
  conversion-901-g7672db20c2:Makefile

tells us the Makefile as it was in v0.99 was introduced in commit 7672db20.

The walking is performed in reverse order to show the introduction of a
blob rather than its last occurrence.

[1] https://stackoverflow.com/questions/223678/which-commit-has-this-blob

Signed-off-by: Stefan Beller 
---
 Documentation/git-describe.txt | 18 ++--
 builtin/describe.c | 62 ++
 t/t6120-describe.sh| 34 +++
 3 files changed, 107 insertions(+), 7 deletions(-)

diff --git a/Documentation/git-describe.txt b/Documentation/git-describe.txt
index c924c945ba..e027fb8c4b 100644
--- a/Documentation/git-describe.txt
+++ b/Documentation/git-describe.txt
@@ -3,14 +3,14 @@ git-describe(1)
 
 NAME
 
-git-describe - Describe a commit using the most recent tag reachable from it
-
+git-describe - Give an object a human readable name based on an available ref
 
 SYNOPSIS
 
 [verse]
 'git describe' [--all] [--tags] [--contains] [--abbrev=] [...]
 'git describe' [--all] [--tags] [--contains] [--abbrev=] --dirty[=]
+'git describe' 
 
 DESCRIPTION
 ---
@@ -24,6 +24,12 @@ By default (without --all or --tags) `git describe` only 
shows
 annotated tags.  For more information about creating annotated tags
 see the -a and -s options to linkgit:git-tag[1].
 
+If the given object refers to a blob, it will be described
+as `:`, such that the blob can be found
+at `` in the ``, which itself describes the
+first commit in which this blob occurs in a reverse revision walk
+from HEAD.
+
 OPTIONS
 ---
 ...::
@@ -186,6 +192,14 @@ selected and output.  Here fewest commits different is 
defined as
 the number of commits which would be shown by `git log tag..input`
 will be the smallest number of commits possible.
 
+BUGS
+
+
+Tree objects as well as tag objects not pointing at commits, cannot be 
described.
+When describing blobs, the lightweight tags pointing at blobs are ignored,
+but the blob is still described as : despite the lightweight
+tag being favorable.
+
 GIT
 ---
 Part of the linkgit:git[1] suite
diff --git a/builtin/describe.c b/builtin/describe.c
index 9e9a5ed5d4..5b4bfaba3f 100644
--- a/builtin/describe.c
+++ b/builtin/describe.c
@@ -3,6 +3,7 @@
 #include "lockfile.h"
 #include "commit.h"
 #include "tag.h"
+#include "blob.h"
 #include "refs.h"
 #include "builtin.h"
 #include "exec_cmd.h"
@@ -11,8 +12,9 @@
 #include "hashmap.h"
 #include "argv-array.h"
 #include "run-command.h"
+#include "revision.h"
+#include "list-objects.h"
 
-#define SEEN   (1u << 0)
 #define MAX_TAGS   (FLAG_BITS - 1)
 
 static const char * const describe_usage[] = {
@@ -434,6 +436,53 @@ static void describe_commit(struct object_id *oid, struct 
strbuf *dst)
strbuf_addstr(dst, suffix);
 }
 
+struct process_commit_data {
+   struct object_id current_commit;
+   struct object_id looking_for;
+   struct strbuf *dst;
+   struct rev_info *revs;
+};
+
+static void process_commit(struct commit *commit, void *data)
+{
+   struct process_commit_data *pcd = data;
+   pcd->current_commit = commit->object.oid;
+}
+
+static void process_object(struct object *obj, const char *path, void *data)
+{
+   struct process_commit_data *pcd = data;
+
+   if (!oidcmp(>looking_for, >oid) && !pcd->dst->len) {
+   reset_revision_walk();
+   describe_commit(>current_commit, pcd->dst);
+   strbuf_addf(pcd->dst, ":%s", path);
+   free_commit_list(pcd->revs->commits);
+   pcd->revs->commits = NULL;
+   }
+}
+
+static void describe_blob(struct object_id oid, struct strbuf *dst)
+{
+   struct rev_info revs;
+   struct