Re: [PATCH v2] test: set the realpath of CWD as TRASH_DIRECTORY

2012-08-29 Thread Junio C Hamano
Michael Haggerty mhag...@alum.mit.edu writes:

 But it also changes almost 600 *other* tests from succeed even in the
 presence of symlinks to never tested in the presence of symlinks, and
 I think that is surrendering more ground than necessary.

Ouch.  I did not know have 600+ tests that cares about CEILING.

 I would rather
 see one of the following approaches:

 *If* the official policy is that GIT_CEILING_DIRECTORIES is not reliable
 in the presence of symlinks, then (a) that limitation should be
 mentioned in the documentation; (b) the affected tests should either be
 skipped in the case of symlinked directories or they (alone!) should
 take measures to work around the problem.

What exactly is broken in CEILING?

I somehow thought that Jiang's patch was to make sure any variables
that contain pathnames (and make sure future paths we might grab out
of $(pwd)) are realpath without symlinks early in the test set-up,
and with that arrangement, no symlink gotcha should come into
picture, with or without CEILING.

Perhaps the setting of the CEILING has not been correctly converted
with the patch?

Or is there something fundamentally broken, even if we do not have
any symlinks involved, with CEILING check?

Puzzled..
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2 0/3] revision (no-)walking in order

2012-08-29 Thread Martin von Zweigbergk
I'm still working on a re-roll of my rebase-range series, but I think
these three are quite unrelated and shouldn't be held up by that other
series.

Junio, thanks for all the help with explaining revision walking. It
was a little blurry for a long time, but at least I feel more
comfortable with these few patches now.

Btw, the rebase-range series seems to need (or be greatly simplified),
although I'm not 100% sure yet, by teaching patch-id --keep-empty,
which would be its first command line option. Let me know if you
(plural) sees a problem with that.

Btw2, I'm migrating my email to martinv...@gmail.com (not y...@google.com
;-) which saves a few keystrokes and matches some of my other
accounts, so these patches will be the first ones from the new
address.

Martin von Zweigbergk (3):
  teach log --no-walk=unsorted, which avoids sorting
  demonstrate broken 'git cherry-pick three one two'
  cherry-pick/revert: respect order of revisions to pick

 Documentation/rev-list-options.txt  | 12 
 builtin/log.c   |  2 +-
 builtin/revert.c|  2 +-
 revision.c  | 18 +++---
 revision.h  |  6 +-
 sequencer.c |  4 +++-
 t/t3508-cherry-pick-many-commits.sh | 15 +++
 t/t4202-log.sh  | 10 ++
 8 files changed, 58 insertions(+), 11 deletions(-)

-- 
1.7.11.1.104.ge7b44f1

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2 1/3] teach log --no-walk=unsorted, which avoids sorting

2012-08-29 Thread Martin von Zweigbergk
When 'git log' is passed the --no-walk option, no revision walk takes
place, naturally. Perhaps somewhat surprisingly, however, the provided
revisions still get sorted by commit date. So e.g 'git log --no-walk
HEAD HEAD~1' and 'git log --no-walk HEAD~1 HEAD' give the same result
(unless the two revisions share the commit date, in which case they
will retain the order given on the command line). As the commit that
introduced --no-walk (8e64006 (Teach revision machinery about
--no-walk, 2007-07-24)) points out, the sorting is intentional, to
allow things like

 git log --abbrev-commit --pretty=oneline --decorate --all --no-walk

to show all refs in order by commit date.

But there are also other cases where the sorting is not wanted, such
as

 command producing revisions in order |
   git log --oneline --no-walk --stdin

To accomodate both cases, leave the decision of whether or not to sort
up to the caller, by allowing --no-walk={sorted,unsorted}, defaulting
to 'sorted' for backward-compatibility reasons.

Signed-off-by: Martin von Zweigbergk martinv...@gmail.com
---
 Documentation/rev-list-options.txt | 12 
 builtin/log.c  |  2 +-
 builtin/revert.c   |  2 +-
 revision.c | 18 +++---
 revision.h |  6 +-
 t/t4202-log.sh | 10 ++
 6 files changed, 40 insertions(+), 10 deletions(-)

diff --git a/Documentation/rev-list-options.txt 
b/Documentation/rev-list-options.txt
index def1340..5436eba 100644
--- a/Documentation/rev-list-options.txt
+++ b/Documentation/rev-list-options.txt
@@ -636,10 +636,14 @@ These options are mostly targeted for packing of git 
repositories.
Only useful with '--objects'; print the object IDs that are not
in packs.
 
---no-walk::
-
-   Only show the given revs, but do not traverse their ancestors.
-   This has no effect if a range is specified.
+--no-walk[=(sorted|unsorted)]::
+
+   Only show the given commits, but do not traverse their ancestors.
+   This has no effect if a range is specified. If the argument
+   unsorted is given, the commits are show in the order they were
+   given on the command line. Otherwise (if sorted or no argument
+   was given), the commits are show in reverse chronological order
+   by commit time.
 
 --do-walk::
 
diff --git a/builtin/log.c b/builtin/log.c
index ecc2793..20838b1 100644
--- a/builtin/log.c
+++ b/builtin/log.c
@@ -456,7 +456,7 @@ int cmd_show(int argc, const char **argv, const char 
*prefix)
init_revisions(rev, prefix);
rev.diff = 1;
rev.always_show_header = 1;
-   rev.no_walk = 1;
+   rev.no_walk = REVISION_WALK_NO_WALK_SORTED;
rev.diffopt.stat_width = -1;/* Scale to real terminal size */
 
memset(opt, 0, sizeof(opt));
diff --git a/builtin/revert.c b/builtin/revert.c
index 82d1bf8..42ce399 100644
--- a/builtin/revert.c
+++ b/builtin/revert.c
@@ -193,7 +193,7 @@ static void parse_args(int argc, const char **argv, struct 
replay_opts *opts)
struct setup_revision_opt s_r_opt;
opts-revs = xmalloc(sizeof(*opts-revs));
init_revisions(opts-revs, NULL);
-   opts-revs-no_walk = 1;
+   opts-revs-no_walk = REVISION_WALK_NO_WALK_SORTED;
if (argc  2)
usage_with_options(usage_str, options);
memset(s_r_opt, 0, sizeof(s_r_opt));
diff --git a/revision.c b/revision.c
index 442a945..66ba2e6 100644
--- a/revision.c
+++ b/revision.c
@@ -1300,7 +1300,7 @@ static int handle_revision_opt(struct rev_info *revs, int 
argc, const char **arg
!strcmp(arg, --no-walk) || !strcmp(arg, --do-walk) ||
!strcmp(arg, --bisect) || !prefixcmp(arg, --glob=) ||
!prefixcmp(arg, --branches=) || !prefixcmp(arg, --tags=) ||
-   !prefixcmp(arg, --remotes=))
+   !prefixcmp(arg, --remotes=) || !prefixcmp(arg, --no-walk=))
{
unkv[(*unkc)++] = arg;
return 1;
@@ -1695,7 +1695,18 @@ static int handle_revision_pseudo_opt(const char 
*submodule,
} else if (!strcmp(arg, --not)) {
*flags ^= UNINTERESTING;
} else if (!strcmp(arg, --no-walk)) {
-   revs-no_walk = 1;
+   revs-no_walk = REVISION_WALK_NO_WALK_SORTED;
+   } else if (!prefixcmp(arg, --no-walk=)) {
+   /*
+* Detached form (--no-walk X as opposed to --no-walk=X)
+* not allowed, since the argument is optional.
+*/
+   if (!strcmp(arg + 10, sorted))
+   revs-no_walk = REVISION_WALK_NO_WALK_SORTED;
+   else if (!strcmp(arg + 10, unsorted))
+   revs-no_walk = REVISION_WALK_NO_WALK_UNSORTED;
+   else
+   return error(invalid argument to --no-walk);
} else if 

[PATCH v2 2/3] demonstrate broken 'git cherry-pick three one two'

2012-08-29 Thread Martin von Zweigbergk
Cherry-picking commits out of order (w.r.t. commit time stamp) doesn't
currently work. Add a test case to demonstrate it.

Signed-off-by: Martin von Zweigbergk martinv...@gmail.com
---
 t/t3508-cherry-pick-many-commits.sh | 15 +++
 1 file changed, 15 insertions(+)

diff --git a/t/t3508-cherry-pick-many-commits.sh 
b/t/t3508-cherry-pick-many-commits.sh
index 75f7ff4..fff20c3 100755
--- a/t/t3508-cherry-pick-many-commits.sh
+++ b/t/t3508-cherry-pick-many-commits.sh
@@ -44,6 +44,21 @@ test_expect_success 'cherry-pick first..fourth works' '
check_head_differs_from fourth
 '
 
+test_expect_failure 'cherry-pick three one two works' '
+   git checkout -f first 
+   test_commit one 
+   test_commit two 
+   test_commit three 
+   git checkout -f master 
+   git reset --hard first 
+   git cherry-pick three one two 
+   git diff --quiet three 
+   git diff --quiet HEAD three 
+   test $(git log --reverse --format=%s first..) == three
+one
+two
+'
+
 test_expect_success 'output to keep user entertained during multi-pick' '
cat -\EOF expected 
[master OBJID] second
-- 
1.7.11.1.104.ge7b44f1

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2 3/3] cherry-pick/revert: respect order of revisions to pick

2012-08-29 Thread Martin von Zweigbergk
When giving multiple individual revisions to cherry-pick or revert, as
in 'git cherry-pick A B' or 'git revert B A', one would expect them to
be picked/reverted in the order given on the command line. They are
instead ordered by their commit timestamp -- in chronological order
for cherry-pick and in reverse chronological order for
revert. This matches the order in which one would usually give them
on the command line, making this bug somewhat hard to notice. Still,
it has been reported at least once before [1].

It seems like the chronological sorting happened by accident because
the revision walker has traditionally always sorted commits in reverse
chronological order when rev_info.no_walk was enabled. In the case of
'git revert B A' where B is newer than A, this sorting is a no-op. For
'git cherry-pick A B', the sorting would reverse the arguments, but
because the sequencer also flips the rev_info.reverse flag when
picking (as opposed to reverting), the end result is a chronological
order. The rev_info.reverse flag was probably flipped so that the
revision walker emits B before C in 'git cherry-pick A..C'; that it
happened to effectively undo the unexpected sorting done when not
walking, was probably a coincidence that allowed this bug to happen at
all.

Fix the bug by telling the revision walker not to sort the commits
when not walking. The only case we want to reverse the order is now
when cherry-picking and walking revisions (rev_info.no_walk = 0).

 [1] http://thread.gmane.org/gmane.comp.version-control.git/164794

Signed-off-by: Martin von Zweigbergk martinv...@gmail.com
---
 builtin/revert.c| 2 +-
 sequencer.c | 4 +++-
 t/t3508-cherry-pick-many-commits.sh | 2 +-
 3 files changed, 5 insertions(+), 3 deletions(-)

diff --git a/builtin/revert.c b/builtin/revert.c
index 42ce399..98ad641 100644
--- a/builtin/revert.c
+++ b/builtin/revert.c
@@ -193,7 +193,7 @@ static void parse_args(int argc, const char **argv, struct 
replay_opts *opts)
struct setup_revision_opt s_r_opt;
opts-revs = xmalloc(sizeof(*opts-revs));
init_revisions(opts-revs, NULL);
-   opts-revs-no_walk = REVISION_WALK_NO_WALK_SORTED;
+   opts-revs-no_walk = REVISION_WALK_NO_WALK_UNSORTED;
if (argc  2)
usage_with_options(usage_str, options);
memset(s_r_opt, 0, sizeof(s_r_opt));
diff --git a/sequencer.c b/sequencer.c
index bf078f2..9f32104 100644
--- a/sequencer.c
+++ b/sequencer.c
@@ -543,7 +543,9 @@ static int do_pick_commit(struct commit *commit, struct 
replay_opts *opts)
 
 static void prepare_revs(struct replay_opts *opts)
 {
-   if (opts-action != REPLAY_REVERT)
+   // picking (but not reverting) ranges (but not individual revisions)
+   // should be done in reverse
+   if (opts-action == REPLAY_PICK  !opts-revs-no_walk)
opts-revs-reverse ^= 1;
 
if (prepare_revision_walk(opts-revs))
diff --git a/t/t3508-cherry-pick-many-commits.sh 
b/t/t3508-cherry-pick-many-commits.sh
index fff20c3..04b5ad4 100755
--- a/t/t3508-cherry-pick-many-commits.sh
+++ b/t/t3508-cherry-pick-many-commits.sh
@@ -44,7 +44,7 @@ test_expect_success 'cherry-pick first..fourth works' '
check_head_differs_from fourth
 '
 
-test_expect_failure 'cherry-pick three one two works' '
+test_expect_success 'cherry-pick three one two works' '
git checkout -f first 
test_commit one 
test_commit two 
-- 
1.7.11.1.104.ge7b44f1

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Funny 'git describe --contains' output

2012-08-29 Thread Junio C Hamano
Junio C Hamano gits...@pobox.com writes:

 Greg KH gre...@linuxfoundation.org writes:

 In the Linux kernel tree, commit 0136db586c028f71e7cc21cc183064ff0d5919
 is a bit odd.

 If I go to look to see what release it was in, I normally do:
  $ git describe --contains 0136db586c028f71e7cc21cc183064ff0d5919
  v3.6-rc1~59^2~56^2~76
 ...
 Any ideas?

 That is 59 + 1 + 56 + 1 + 76 = 193 steps away from the tag v3.6-rc1.

 $ git name-rev --refs=refs/tags/v3.5-rc1 0136db58
 0136db58 tags/v3.5-rc1~83^2~81^2~76

 which is 83 + 1 + 81 + 1 + 76 = 242 steps away from that tag.

 So it _is_ odd that the newly tagged tip merged a branch that had
 smaller development since it merged the commit, but name-rev seems
 to be measuring the steps it takes from the tags to reach the commit
 and giving us the one that gives the shortest path correctly.

 Obviously, that is not the same as which tag is the oldest one
 among the ones that can reach this commit?

As is usual for what I say, the above is an explanation of what we
are seeing, not necessarily a justification.

Given a history of this shape:

o---o---o---o TONS!!!
 \
 ---o--o--o--o--o--Y--o---o---Z
 \   /   /
  \ /   /
   X---o

where Y is v3.5-rc1 and Z is v3.6-rc1, name-rev X measures the
distance of the shortest path between Z and X (Z^^2^ = 3 steps away)
and between Y and X (Y~3^2 = 4 steps away), and uses the tag with
the shortest path.

But in order to answer which is the earlier tag that merges X,
what name-rev measures is not very interesting.

What we want to see is the tag whose weight (imagine these commits
are beads on strings, and you hold the tag between your fingers and
lift it, pulling all the commits behind it on the history) is the
smallest and reaches the commit X in question.  The distance on the
shortest path to X totally ignores tons of merges that went into the
mainline between Y and Z.  That is what makes name-rev not useful
for this purpose.

That weight is what Linus's rev-list | wc -l showed, but it is
fairly expensive to compute.  We do have a code that computes such
weight in the history bisection code (it computes this exact weight
for each and every commit that is still suspect, and picks the one
that is half-way).  We know how to compute it, but I suspect that
applying that code naively to name-rev would make it unusably slow.
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


relative objects/info/alternates doesn't work on remote SMB repo

2012-08-29 Thread Orgad and Raizel Shaneh
Hi,

I have a repo accessed through //server/share/foo/repo (Using msysgit).

.git/objects/info/alternates contains '../../../bare/objects'

Running 'git status' (or almost any other action) gives the following output:
error: object directory /server/share/foo/bare/objects does not exist;
check .git/objects/info/alternates.

Note that it tries to access /server instead of //server.

- Orgad
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2 0/3] revision (no-)walking in order

2012-08-29 Thread Junio C Hamano
Martin von Zweigbergk martinv...@gmail.com writes:

 Btw2, I'm migrating my email to martinv...@gmail.com (not y...@google.com
 ;-) which saves a few keystrokes and matches some of my other
 accounts, so these patches will be the first ones from the new
 address.

Please send in something like this, then.

 .mailmap | 1 +
 1 file changed, 1 insertion(+)

diff --git i/.mailmap w/.mailmap
index 6303782..2650f9e 100644
--- i/.mailmap
+++ w/.mailmap
@@ -43,6 +43,7 @@ Lars Doelle lars.doe...@on-line.de
 Li Hong leeh...@pku.edu.cn
 Lukas Sandström luk...@etek.chalmers.se
 Martin Langhoff mar...@laptop.org
+Martin von Zweigbergk martinv...@gmail.com martin.von.zweigbe...@gmail.com
 Michael Coleman tutu...@gmail.com
 Michael J Gruber g...@drmicha.warpmail.net michaeljgruber+gm...@fastmail.fm
 Michael W. Olson mwol...@gnu.org
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: GC of alternate object store (was: Bringing a bit more sanity to $GIT_DIR/objects/info/alternates?)

2012-08-29 Thread Oswald Buddenhagen
On Tue, Aug 28, 2012 at 09:19:53PM +0200, Hallvard Breien Furuseth wrote:
 Oswald Buddenhagen wrote:
  (...)so the second approach is the bare aggregator repo which adds
  all other repos as remotes, and the other repos link back via
  alternates. problems:
  
  - to actually share objects, one always needs to push to the aggregator
 
 Run a cron job which frequently does that?
 
nope. i also have separate repos which share the same code, so when i
develop it i need to pick between them live. of course it's unlikely
to get conflicts in this case, so the missing object sharing is not that
bad (the objects are transferred via format-patch, as i'm rewriting
paths anyway), but when it happens it's messy to get out again.

  - tags having a shared namespace doesn't actually work, because the
  repos have the same tags on different commits (they are independent
  repos, after all)
 
 Junio's proposal partially fixes that: It pushes refs/* instead of
 refs/heads/*, to refs/remotes/borrowing repo/.  However...
 
i did exacty that. the tags are *still* not populated - git just tries
very hard to treat them specially.
and the stash file is also ignored, unfortunately.

  - one still cannot safely garbage-collect the aggregator, as the refs
  don't include the stashes and the index, so rebasing may invalidate
  these more transient objects.
 
 Also if you copy a repo (e.g. making a backup) instead of cloning it,
 and then start using both, they'll push into the same namespace -
 overwriting each other's refs.

right. it's a clear user error, though - i wouldn't *expect* it to work.
anyway, i don't have *that* problem, as my aggregator actually pulls,
not the other way round.

anyway, the bottom line is that using alternates as-is for anything but
sharing refs/remotes/origin/* (which i'm assuming to be ff-only) is
a recipe for disaster.

anything which is supposed to be in any way safe must make the donor
object store aware of the sharing, which at the very least means setting
the proposed append-only flag _by the borrowing_ object store. which
means that the info/alternates file should be obfuscated, so people
can't edit it manually.

  i would re-propose hallvard's volatile alternates (at least i think that's
  what he was talking about two weeks ago): they can be used to obtain
  objects, but every object which is in any way referenced from the current
  clone must be available locally (or from a regular alternate). that means
  that diffing, etc.  would get objects only temporarily, while cherry-picking
  would actually copy (some of) the objects. this would make it possible to
  cross-link repositories, safely and without any 3rd parties.
 
 I'm afraid that idea by itself won't work:-(

 Either you borrow from a store or not.

correct. from regular alternates you borrow, in volatile ones you
only peek.
so apparently our definitions are different after all.

 If Git uses an object from the volatile store, it can't always know if
 the caller needs the object to be copied.
 
it doesn't have to. the distinction comes when creating objects: if an
object is only in a volatile alternate, it does not already exist for the
purpose of object creation and is thus created locally.

regards

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2] test: set the realpath of CWD as TRASH_DIRECTORY

2012-08-29 Thread Michael Haggerty
On 08/29/2012 08:06 AM, Junio C Hamano wrote:
 Michael Haggerty mhag...@alum.mit.edu writes:
 
 But it also changes almost 600 *other* tests from succeed even in the
 presence of symlinks to never tested in the presence of symlinks, and
 I think that is surrendering more ground than necessary.
 
 Ouch.  I did not know have 600+ tests that cares about CEILING.

No, I'm referring to the whole test suite, which is approximately 600
files :-)  Because the patch changes TEST_DIRECTORY, it affects the
environment of all tests that use that variable (one of which being via
GIT_CEILING_DIRECTORIES).

But really I shouldn't have made it sound like this patch is so
terrible, because it is just a logical continuation of the approach begun by

1bd9c648 t/test-lib.sh: resolve symlinks in working directory, for
pathname comparisons (2008-05-31)

It is in fact the whole approach that I object to.

 I would rather
 see one of the following approaches:

 *If* the official policy is that GIT_CEILING_DIRECTORIES is not reliable
 in the presence of symlinks, then (a) that limitation should be
 mentioned in the documentation; (b) the affected tests should either be
 skipped in the case of symlinked directories or they (alone!) should
 take measures to work around the problem.
 
 What exactly is broken in CEILING?

CEILING is used in setup_git_directory_gently_1() when trying to find
the GIT_DIR appropriate for the current directory.  The entries in
CEILING are compared textually to getcwd(), looking for the entry that
is the longest proper prefix of PWD.  This is then used to limit a loop
that is vaguely

while (!is_git_directory())
chdir(..);

The problem, as I understand it, is that when the tests are run with
root set to a path that includes a symlink, then test and
TRASH_DIRECTORY are set to the textual value of $root/trash
directory.t, but then a little bit later test-lib.sh chdirs into
the trash directory using cd -P $test (thereby resolving symlinks).
So, taking my concrete example --root=/dev/shm where /dev/shm is a
symlink to /run/shm, we have

test=/dev/shm/trash directory.t
TRASH_DIRECTORY=$test
...
cd -P $test

which results in PWD being /run/shm/trash directory.t.

Then (for example) in test t4035, we have stuff like

GIT_CEILING_DIRECTORIES=$TRASH_DIRECTORY/test-outside 
export GIT_CEILING_DIRECTORIES 
cd test-outside/non/git 
git diff

The problem is that because PWD and TRASH_DIRECTORY *look* different,
the code in setup_git_directory_gently_1() doesn't realize that the
value of GIT_CEILING_DIRECTORIES is a parent of PWD, so it doesn't abort
the search for GIT_DIR there, and this causes the test to fail.

The underlying problem is that setup_git_directory_gently_1() isn't
smart enough to realize that the directory in GIT_CEILING_DIRECTORIES is
in fact a parent of PWD even though it textually doesn't look like one.

The patch being discussed sets TEST_DIRECTORY *after* $test has been
canonicalized (through the use of cd -P $test), so that TEST_DIRECTORY
and later GIT_CEILING_DIRECTORIES start with /run/shm/ instead of
/dev/shm/.  This change makes setup_git_directory_gently_1()'s naive
textual prefix comparison succeed.

The same problem would occur in the real world whenever the user sets
GIT_CEILING_DIRECTORIES to a value that is *in fact* a parent of PWD but
due to symbolic links *textually* does not appear to be a parent.

So the first decision is whether this should be documented as a known
limitation of GIT_CEILING_DIRECTORIES, or whether it is a bug.  My
opinion is that it is a bug.

 I somehow thought that Jiang's patch was to make sure any variables
 that contain pathnames (and make sure future paths we might grab out
 of $(pwd)) are realpath without symlinks early in the test set-up,
 and with that arrangement, no symlink gotcha should come into
 picture, with or without CEILING.

Yes, this is correct.  But what you call a gotcha is actually a
real-world possibility.  Git might *really* be run in a PWD that
contains symlinks, or GIT_CEILING_DIRECTORIES might contain entries that
are symlinks.  So by resolving symlinks in PWD before running tests, we
are preventing the tests from ever encountering this situation, and
therefore failing to test whether git works correctly when PWD includes
a symlink.

 Perhaps the setting of the CEILING has not been correctly converted
 with the patch?

That's not the problem.

 Or is there something fundamentally broken, even if we do not have
 any symlinks involved, with CEILING check?

I believe that:

1. The handling of GIT_CEILING_DIRECTORIES in
setup_git_directory_gently_1() (i.e., in git itself, not how it is used
in the test suite) is correct if no symlinks are involved, but breaks in
the face of symlinks.

2. The proposed patch is a mistake because it masks the brokenness of
setup_git_directory_gently_1().

3. The old commit 1bd9c648 is an even bigger mistake because it might
mask other breakages 

dangling submodule references after rebase

2012-08-29 Thread Stijn Souffriau

Hi all,

I am using a repository that has a sub module which is being committed 
to frequently by myself as well as others. Because of the heavy 
concurrent development I need to do a lot of rebasing. Since the sub 
module commit hashes referenced by the parent repository can become 
dangling as a result of rebasing the sub module I am required to do lots 
of manual fixing of the references in the parent repository using an 
interactive rebase. This is a tedious, error-prone procedure which I 
would like to automate.


I was wondering if anyone has thought about solving this problem yet in 
the past and what might be a good solution?


I was thinking something along the lines of extending the add and commit 
commands so that a parent repository would signal to the sub modules 
that it's index or some if it's commits reference certain sub module 
commits; and also the rebase command so that it would update the parent 
repository commits with new hashes using the information stored by the 
add or commit commands. The procedure would have to be made recursive 
because changing commits in the parent repository might also require 
changing commits in it's parent repository as well.


I'm still no quite sure for which sub module rebase operations the 
referencing parent repository commits would actually have to be 
updated. The reason being that the rebased commits might still be 
referenced by another branch and so they might continue to exist after 
the rebase which raises the question if the parent repository commits 
need to be udated or not. I think this question would have to be 
answered by the add and commit commands which would also have to specify 
a referenced branch in addition to referenced commits so that the parent 
repo commits would only have to be updated if the commits on this branch 
are rebased. By default this could be the branch checked out in the sub 
module at the time the referencing commit was made.


For obvious reasons this should only be done for newly made, unpushed 
and unpulled commits in the repository. However, it might be interesting 
to also enable people to manually bind a parent repo commits to a 
submodule branch so that the commits in this parent repo branch are 
updated when the sub module branch is rebased.


I would like see this end up in the mainline and so I'm very interested 
in your opinions.


Thanks,

Stijn

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 5/5] (BROKEN) get_merge_bases_many(): walk from many tips in parallel

2012-08-29 Thread Jeff King
On Tue, Aug 28, 2012 at 04:39:11PM -0700, Junio C Hamano wrote:

  git rev-list --committer=torva...@linux-foundation.org \
  --max-parents=2 --min-parents=2 --parents v3.5..v3.6-rc2 RL
 
  cmd='
  while read result parent1 parent2
  do
  $GIT merge-base $parent1 $parent2
  done RL
  '
 
 I have this suspicion that it is spending most of its time in
 insert-by-date.  Luckily, merge_bases_many() is totally outside of
 the usual revision traversal and its use of the commit list is
 pretty well isolated.
 
 Peff, can I interest you into resurrecting your $gmane/174007 and
 $gmane/174008 (we do not need pop_commit_from_queue() in the latter;
 everything can stay as static to commit.c for now) to see how well
 priority queue based approach would perform?

I did a quick port of merge_bases_many and get_merge_bases_many to use
priority queues, but I didn't see any speedup at all on the above test
case.  According to perf, we spend most of our time in zlib, inflating
commits. Only 1% is spent on commit_list_insert_by_date, which is about
the same amount spent on queue insertion after my patches.

Patches follow, just for reference.

  [1/2]: basic priority queue implementation
  [2/2]: commit: use a priority queue in merge base functions

-Peff
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 1/2] basic priority queue implementation

2012-08-29 Thread Jeff King
This can provide better algorithmic complexity for some of
the date-sorted commit list uses, among other things.

The queue is stored as a heap, allowing O(log) insertion and
top-element removal, and O(1) peeking at the top element.
Current commit lists are sorted linked lists, giving O(1)
peeking and top-element removal, but O(n^2) insertion.

Signed-off-by: Jeff King p...@peff.net
---
Mostly the same as $gmane/174007, but rebased on top of jc/merge-bases,
and adding queue_clear() to avoid leaking memory.

 .gitignore   |  1 +
 Makefile |  3 +++
 queue.c  | 75 
 queue.h  | 18 ++
 t/t0007-queue.sh | 50 +
 test-queue.c | 39 +
 6 files changed, 186 insertions(+)
 create mode 100644 queue.c
 create mode 100644 queue.h
 create mode 100755 t/t0007-queue.sh
 create mode 100644 test-queue.c

diff --git a/.gitignore b/.gitignore
index 3b7680e..f4539e8 100644
--- a/.gitignore
+++ b/.gitignore
@@ -184,6 +184,7 @@
 /test-obj-pool
 /test-parse-options
 /test-path-utils
+/test-queue
 /test-run-command
 /test-sha1
 /test-sigchain
diff --git a/Makefile b/Makefile
index e4f8e0e..a37d962 100644
--- a/Makefile
+++ b/Makefile
@@ -476,6 +476,7 @@ TEST_PROGRAMS_NEED_X += test-path-utils
 TEST_PROGRAMS_NEED_X += test-obj-pool
 TEST_PROGRAMS_NEED_X += test-parse-options
 TEST_PROGRAMS_NEED_X += test-path-utils
+TEST_PROGRAMS_NEED_X += test-queue
 TEST_PROGRAMS_NEED_X += test-run-command
 TEST_PROGRAMS_NEED_X += test-sha1
 TEST_PROGRAMS_NEED_X += test-sigchain
@@ -597,6 +598,7 @@ LIB_H += prompt.h
 LIB_H += pkt-line.h
 LIB_H += progress.h
 LIB_H += prompt.h
+LIB_H += queue.h
 LIB_H += quote.h
 LIB_H += reflog-walk.h
 LIB_H += refs.h
@@ -709,6 +711,7 @@ LIB_OBJS += prompt.o
 LIB_OBJS += pretty.o
 LIB_OBJS += progress.o
 LIB_OBJS += prompt.o
+LIB_OBJS += queue.o
 LIB_OBJS += quote.o
 LIB_OBJS += reachable.o
 LIB_OBJS += read-cache.o
diff --git a/queue.c b/queue.c
new file mode 100644
index 000..7be6b86
--- /dev/null
+++ b/queue.c
@@ -0,0 +1,75 @@
+#include queue.h
+#include cache.h
+
+static inline void queue_swap(struct queue *pq, unsigned i, unsigned j)
+{
+   void *tmp = pq-items[i];
+   pq-items[i] = pq-items[j];
+   pq-items[j] = tmp;
+}
+
+static void queue_heapify_up(struct queue *pq)
+{
+   unsigned i = pq-nr - 1;
+   while (i  0) {
+   int parent = (i-1)/2;
+
+   if (pq-cmp(pq-items[i], pq-items[parent]) = 0)
+   return;
+
+   queue_swap(pq, i, parent);
+   i = parent;
+   }
+}
+
+void queue_insert(struct queue *pq, void *item)
+{
+   ALLOC_GROW(pq-items, pq-nr + 1, pq-alloc);
+   pq-items[pq-nr++] = item;
+   queue_heapify_up(pq);
+}
+
+static void queue_heapify_down(struct queue *pq)
+{
+   unsigned i = 0;
+   while (1) {
+   int largest = i, left = 2*i + 1, right = 2*i + 2;
+
+   if (left  pq-nr 
+   pq-cmp(pq-items[left], pq-items[largest])  0)
+   largest = left;
+   if (right  pq-nr 
+   pq-cmp(pq-items[right], pq-items[largest])  0)
+   largest = right;
+
+   if (largest == i)
+   return;
+
+   queue_swap(pq, i, largest);
+   i = largest;
+   }
+}
+
+void *queue_peek(struct queue *pq)
+{
+   return pq-nr ? pq-items[0] : NULL;
+}
+
+void *queue_pop(struct queue *pq)
+{
+   void *ret;
+
+   if (!pq-nr)
+   return NULL;
+   ret = pq-items[0];
+
+   pq-items[0] = pq-items[--pq-nr];
+   queue_heapify_down(pq);
+
+   return ret;
+}
+
+void queue_clear(struct queue *pq)
+{
+   free(pq-items);
+}
diff --git a/queue.h b/queue.h
new file mode 100644
index 000..cc471b5
--- /dev/null
+++ b/queue.h
@@ -0,0 +1,18 @@
+#ifndef QUEUE_H
+#define QUEUE_H
+
+typedef int (*queue_comparison_func_t)(const void *, const void *);
+
+struct queue {
+   queue_comparison_func_t cmp;
+   void **items;
+   unsigned nr;
+   unsigned alloc;
+};
+
+void queue_insert(struct queue *pq, void *item);
+void *queue_peek(struct queue *pq);
+void *queue_pop(struct queue *pq);
+void queue_clear(struct queue *pq);
+
+#endif /* QUEUE_H */
diff --git a/t/t0007-queue.sh b/t/t0007-queue.sh
new file mode 100755
index 000..ee357bb
--- /dev/null
+++ b/t/t0007-queue.sh
@@ -0,0 +1,50 @@
+#!/bin/sh
+
+test_description='basic tests for priority queue implementation'
+. ./test-lib.sh
+
+cat expect 'EOF'
+10
+9
+8
+7
+6
+5
+5
+4
+3
+2
+1
+EOF
+test_expect_success 'basic ordering' '
+   test-queue 2 6 3 10 9 5 7 4 5 8 1 dump actual 
+   test_cmp expect actual
+'
+
+cat expect 'EOF'
+3
+5
+4
+6
+2
+1
+EOF
+test_expect_success 'mixed insert and pop' '
+   test-queue 1 2 3 pop 4 5 pop pop 6 dump actual 
+   test_cmp expect actual
+'
+
+cat expect 'EOF'

Re: [PATCH 2/3] checkout: reorder option handling

2012-08-29 Thread Nguyen Thai Ngoc Duy
On Wed, Aug 29, 2012 at 3:45 AM, Junio C Hamano gits...@pobox.com wrote:
 + /* Easy mode-independent incompatible checks */
   if (opts.force_detach  (opts.new_branch || opts.new_orphan_branch))
   die(_(--detach cannot be used with -b/-B/--orphan));

 Did you catch --detach -B combination without checking new_branch_force?

I did not. Will move this check further down once opts.new_branch is final.

   if (opts.force_detach  0  opts.track)
   die(_(--detach cannot be used with -t));
 + if (opts.force  opts.merge)
 + die(_(git checkout: -f and -m are incompatible));
 +
 + if (conflict_style) {
 + opts.merge = 1; /* implied */
 + git_xmerge_config(merge.conflictstyle, conflict_style, NULL);
 + }

 checkout --conflict=diff3 -f branch would imply combination of
 -m and -f, which is supposed to be forbidden in the previous
 check, no?

Yep. That last if must be moved up.

 I very much like the idea of separating things in phases like your
 proposed log message explains.  But I wonder if the order should be
 to do dwimming and parameter canonicalization first, then decide the
 mode (these might have to be swapped, as the same parameter may dwim
 down to different things depending on the mode), and finally check
 for sanity before performing.

I do a few sanity checks after the mode has been decided and obviously
missed some as you pointed out above. Will move them all down. So
parameter canonicalization, dwim, then separate code paths for each
mode, which includes sanity checks.

 To avoid confusion, it also might not be a bad idea to remove
 new_branch_force and new_orphan_branch from the structure and
 introduce enum branch_creation_type or something, and always have
 the new branch name in new_branch field (this needs to get various
 pointers into opts out of the parseopt options[] array; parse into
 separate variables and decide what to put in struct checkout_opts),
 independent from how that branch is going to be created (either -b,
 -B, or --orphan).

OK
-- 
Duy
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/2] commit: use a priority queue in merge base functions

2012-08-29 Thread Junio C Hamano
Jeff King p...@peff.net writes:

 The merge-base functions internally keep a commit lists and
 insert by date, which causes a linear search of the commit
 list for each insertion. Let's use a priority queue instead.

 Unfortunately, from my experiments, this didn't actually
 cause any speedup.

 Signed-off-by: Jeff King p...@peff.net
 ---
 I'd probably split this into a few commits if we were really going to
 apply it, but since it doesn't actually make anything faster, I doubt
 the code churn is worth it.

Thanks.  This seems to break t6010-merge-base.sh for me, though...




  commit.c | 78 
 ++--
  1 file changed, 47 insertions(+), 31 deletions(-)

 diff --git a/commit.c b/commit.c
 index 380f4ec..c64ef94 100644
 --- a/commit.c
 +++ b/commit.c
 @@ -7,6 +7,7 @@
  #include revision.h
  #include notes.h
  #include gpg-interface.h
 +#include queue.h
  
  int save_commit_buffer = 1;
  
 @@ -360,6 +361,15 @@ struct commit_list *commit_list_insert(struct commit 
 *item, struct commit_list *
   return new_list;
  }
  
 +static void commit_list_append(struct commit *item, struct commit_list 
 ***tail)
 +{
 + struct commit_list *new_list = xmalloc(sizeof(*new_list));
 + new_list-item = item;
 + new_list-next = NULL;
 + **tail = new_list;
 + *tail = new_list-next;
 +}
 +
  unsigned commit_list_count(const struct commit_list *l)
  {
   unsigned c = 0;
 @@ -422,6 +432,16 @@ struct commit *pop_most_recent_commit(struct commit_list 
 **list,
   return ret;
  }
  
 +static int commit_datecmp(const void *va, const void *vb)
 +{
 + const struct commit *a = va, *b = vb;
 + if (a-date  b-date)
 + return -1;
 + else if (a-date  b-date)
 + return 1;
 + return 0;
 +}
 +
  void clear_commit_marks(struct commit *commit, unsigned int mark)
  {
   while (commit) {
 @@ -569,22 +589,23 @@ static struct commit_list *merge_bases_many(struct 
 commit *one, int n, struct co
  
  static const unsigned all_flags = (PARENT1 | PARENT2 | STALE | RESULT);
  
 -static struct commit *interesting(struct commit_list *list)
 +static int interesting(struct queue *q)
  {
 - while (list) {
 - struct commit *commit = list-item;
 - list = list-next;
 + int i;
 + for (i = 0; i  q-nr; i++) {
 + struct commit *commit = q-items[i];
   if (commit-object.flags  STALE)
   continue;
 - return commit;
 + return 1;
   }
 - return NULL;
 + return 0;
  }
  
  static struct commit_list *merge_bases_many(struct commit *one, int n, 
 struct commit **twos)
  {
 - struct commit_list *list = NULL;
 - struct commit_list *result = NULL;
 + struct queue result = { commit_datecmp };
 + struct queue list = { commit_datecmp };
 + struct commit_list *ret = NULL, **tail = ret;
   int i;
  
   for (i = 0; i  n; i++) {
 @@ -593,7 +614,7 @@ static struct commit_list *merge_bases_many(struct commit 
 *one, int n, struct co
* We do not mark this even with RESULT so we do not
* have to clean it up.
*/
 - return commit_list_insert(one, result);
 + return commit_list_insert(one, ret);
   }
  
   if (parse_commit(one))
 @@ -604,28 +625,24 @@ static struct commit_list *merge_bases_many(struct 
 commit *one, int n, struct co
   }
  
   one-object.flags |= PARENT1;
 - commit_list_insert_by_date(one, list);
 + queue_insert(list, one);
   for (i = 0; i  n; i++) {
   twos[i]-object.flags |= PARENT2;
 - commit_list_insert_by_date(twos[i], list);
 + queue_insert(list, twos[i]);
   }
  
 - while (interesting(list)) {
 + while (interesting(list)) {
   struct commit *commit;
   struct commit_list *parents;
 - struct commit_list *next;
   int flags;
  
 - commit = list-item;
 - next = list-next;
 - free(list);
 - list = next;
 + commit = queue_pop(list);
  
   flags = commit-object.flags  (PARENT1 | PARENT2 | STALE);
   if (flags == (PARENT1 | PARENT2)) {
   if (!(commit-object.flags  RESULT)) {
   commit-object.flags |= RESULT;
 - commit_list_insert_by_date(commit, result);
 + queue_insert(result, commit);
   }
   /* Mark parents of a found merge stale */
   flags |= STALE;
 @@ -639,21 +656,16 @@ static struct commit_list *merge_bases_many(struct 
 commit *one, int n, struct co
   if (parse_commit(p))
   return NULL;
   p-object.flags |= flags;
 - 

Re: dangling submodule references after rebase

2012-08-29 Thread Jens Lehmann
Am 29.08.2012 11:55, schrieb Stijn Souffriau:
 I am using a repository that has a sub module which is being committed to 
 frequently by myself as well as others. Because of the heavy concurrent 
 development I need to do a lot of rebasing. Since the sub module commit 
 hashes referenced by the parent repository can become dangling as a result of 
 rebasing the sub module I am required to do lots of manual fixing of the 
 references in the parent repository using an interactive rebase. This is a 
 tedious, error-prone procedure which I would like to automate.

I am very interested to hear how you intend to do that. I can't think of
a sane way to do that but would really like to be convinced otherwise.

 I was wondering if anyone has thought about solving this problem yet in the 
 past and what might be a good solution?

We don't solve this problem, but try to avoid it by having the best
practice that only commits which are on (or merged into) a stable branch
in the submodule may be committed on a branch in the superproject. This
works really well for us.

 I was thinking something along the lines of extending the add and commit 
 commands so that a parent repository would signal to the sub modules that 
 it's index or some if it's commits reference certain sub module commits; and 
 also the rebase command so that it would update the parent repository commits 
 with new hashes using the information stored by the add or commit commands. 
 The procedure would have to be made recursive because changing commits in the 
 parent repository might also require changing commits in it's parent 
 repository as well.

Submodules aren't aware that they have a superproject (which is a feature),
so maybe you could issue something like a rebase that branch in the
submodule and update all commits in this branch here referencing those
rebased commits in the topmost superproject and iterate down from there.

 I'm still no quite sure for which sub module rebase operations the 
 referencing parent repository commits would actually have to be updated.

Me thinks it could make sense to have a branch in the superproject which
follows the development in the submodule, which would answer what commits
you'd have to check.

 The reason being that the rebased commits might still be referenced by 
 another branch and so they might continue to exist after the rebase which 
 raises the question if the parent repository commits need to be udated or 
 not. I think this question would have to be answered by the add and commit 
 commands which would also have to specify a referenced branch in addition to 
 referenced commits so that the parent repo commits would only have to be 
 updated if the commits on this branch are rebased. By default this could be 
 the branch checked out in the sub module at the time the
 referencing commit was made.

And the distributed nature of git leaves plenty of room to have other
branches in other repos referencing the commits you just rebased. And for
lots of submodules there is more than a single superproject, making that
problem even more interesting.

 For obvious reasons this should only be done for newly made, unpushed and 
 unpulled commits in the repository.

So that would make this operation impossible once you pushed a submodule
branch e.g. just to back it up on the server? I think that would make
this much less useful. When you rewrite history in a submodule you always
have the potential problem of removing the commits used by another branch
in your repo, somebody else's copy of the repo or even in a completely
different superproject, so you'll have to be careful there anyway.

 However, it might be interesting to also enable people to manually bind a 
 parent repo commits to a submodule branch so that the commits in this parent 
 repo branch are updated when the sub module branch is rebased.
 
 I would like see this end up in the mainline and so I'm very interested in 
 your opinions.

I think it the first step should be to line out the solution and then
create one or more scripts demonstrating how that approach could work.
Then we can decide how to move on from there.
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/2] commit: use a priority queue in merge base functions

2012-08-29 Thread Jeff King
On Wed, Aug 29, 2012 at 09:36:43AM -0700, Junio C Hamano wrote:

 Jeff King p...@peff.net writes:
 
  The merge-base functions internally keep a commit lists and
  insert by date, which causes a linear search of the commit
  list for each insertion. Let's use a priority queue instead.
 
  Unfortunately, from my experiments, this didn't actually
  cause any speedup.
 
  Signed-off-by: Jeff King p...@peff.net
  ---
  I'd probably split this into a few commits if we were really going to
  apply it, but since it doesn't actually make anything faster, I doubt
  the code churn is worth it.
 
 Thanks.  This seems to break t6010-merge-base.sh for me, though...

Interesting. It works fine here, even under --valgrind. Did you apply
the patches directly on top of 1251cc7?

Not that it matters _too_ much if we are just going to scrap it anyway,
but maybe it is an indication that I screwed up something that could
impact the timing (I did check that the timed merge-base calculations on
linux-2.6 yielded the same results though).

-Peff
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/2] commit: use a priority queue in merge base functions

2012-08-29 Thread Jeff King
On Wed, Aug 29, 2012 at 04:53:32PM -0400, Jeff King wrote:

  Thanks.  This seems to break t6010-merge-base.sh for me, though...
 
 Interesting. It works fine here, even under --valgrind. Did you apply
 the patches directly on top of 1251cc7?

Hmm, this does seem to break t6024 for me, though.

-Peff
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/2] commit: use a priority queue in merge base functions

2012-08-29 Thread Jeff King
On Wed, Aug 29, 2012 at 04:55:25PM -0400, Jeff King wrote:

 On Wed, Aug 29, 2012 at 04:53:32PM -0400, Jeff King wrote:
 
   Thanks.  This seems to break t6010-merge-base.sh for me, though...
  
  Interesting. It works fine here, even under --valgrind. Did you apply
  the patches directly on top of 1251cc7?
 
 Hmm, this does seem to break t6024 for me, though.

Probably because:

   /* Clean up the result to remove stale ones */
 - free_commit_list(list);
 - list = result; result = NULL;
 - while (list) {
 - struct commit_list *next = list-next;
 - if (!(list-item-object.flags  STALE))
 - commit_list_insert_by_date(list-item, result);
 - free(list);
 - list = next;
 - }
 - return result;
 + while (result.nr)
 + commit_list_append(queue_pop(result), tail);
 + queue_clear(result);
 + queue_clear(list);
 + return ret;

I forgot to to port the STALE flag handling here.

-Peff
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/2] commit: use a priority queue in merge base functions

2012-08-29 Thread Jeff King
On Wed, Aug 29, 2012 at 05:00:32PM -0400, Jeff King wrote:

  Hmm, this does seem to break t6024 for me, though.
 
 Probably because:
 
  /* Clean up the result to remove stale ones */
  -   free_commit_list(list);
  -   list = result; result = NULL;
  -   while (list) {
  -   struct commit_list *next = list-next;
  -   if (!(list-item-object.flags  STALE))
  -   commit_list_insert_by_date(list-item, result);
  -   free(list);
  -   list = next;
  -   }
  -   return result;
  +   while (result.nr)
  +   commit_list_append(queue_pop(result), tail);
  +   queue_clear(result);
  +   queue_clear(list);
  +   return ret;
 
 I forgot to to port the STALE flag handling here.

You would want this on top:

diff --git a/commit.c b/commit.c
index c64ef94..8259871 100644
--- a/commit.c
+++ b/commit.c
@@ -661,8 +661,11 @@ static struct commit_list *merge_bases_many(struct commit 
*one, int n, struct co
}
 
/* Clean up the result to remove stale ones */
-   while (result.nr)
-   commit_list_append(queue_pop(result), tail);
+   while (result.nr) {
+   struct commit *commit = queue_pop(result);
+   if (!(commit-object.flags  STALE))
+   commit_list_append(commit, tail);
+   }
queue_clear(result);
queue_clear(list);
return ret;

but t6024 still fails (it clearly is finding a different merge base than
the test expects).  I'll trace through it, but it will have to be later
tonight.

-Peff
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


warning: There are too many unreachable loose objects; run 'git prune' to remove them.

2012-08-29 Thread Dun Peal
Hi,

I am getting this error every time I pull. All the following have been
executed, but failed to remove this warning:

git prune
git prune --expire now
git gc
git gc --aggressive

What should I do?

Thanks, D.
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 0/3] git name-rev --weight

2012-08-29 Thread Junio C Hamano
So here is an attempt to teach name-rev a mode that tries to base
its name on oldest tag that can reach the commit.  It needs the
reset_revision_walk() call recently added to the revision traversal
API, and applies to bcc0a3e (v1.7.11-rc0~111^2~2) or newer.

Note that this can benefit from caching, as the weight of the tag
(rather, the commit that is tagged) will never change once a history
is made, but that part is left as an exercise to the reader.

It correctly names 0136db586c in the kernel history as based on
v3.5-rc1 as tags/v3.5-rc1~83^2~81^2~76, not on v3.6-rc1, as we saw
on the list recently.

Once it is verified to operate correctly and updated to perform
properly, we can start passing --weight when describe --contains
runs the command.

Junio C Hamano (3):
  name-rev: lose unnecessary typedef
  name_rev: clarify when a new tip-name is assigned to a commit
  name-rev: --weight option (WIP)

 builtin/name-rev.c | 142 -
 1 file changed, 120 insertions(+), 22 deletions(-)

-- 
1.7.12.285.ga3d5fc0

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 3/3] name-rev: --weight option (WIP)

2012-08-29 Thread Junio C Hamano
Instead of naming a rev after a tip that is topologically closest,
use the tip that is the oldest one among those which contain the
rev.

The semantics name-rev --weight would give is closer to what
people expect from describe --contains.

Note that this is fairly expensive (see NEEDSWORK comment in the
code).

Signed-off-by: Junio C Hamano gits...@pobox.com
---
 builtin/name-rev.c | 97 --
 1 file changed, 95 insertions(+), 2 deletions(-)

diff --git a/builtin/name-rev.c b/builtin/name-rev.c
index ebbf541..69da41d 100644
--- a/builtin/name-rev.c
+++ b/builtin/name-rev.c
@@ -4,6 +4,8 @@
 #include tag.h
 #include refs.h
 #include parse-options.h
+#include diff.h
+#include revision.h
 
 #define CUTOFF_DATE_SLOP 86400 /* one day */
 
@@ -11,8 +13,85 @@ struct rev_name {
const char *tip_name;
int generation;
int distance;
+   int weight;
 };
 
+/*
+ * Historically, name-rev named a rev based on the tip that is
+ * closest to it.
+ *
+ * It does not give a good answer to what is the earliest tag that
+ * contains the commit?, however, because you can build a new commit
+ * on top of an ancient commit X, merge it to the tip and tag the
+ * result, which would make X reachable from the new tag in two hops,
+ * even though it appears in the part of the history that is contained
+ * in other ancient tags.
+ *
+ * In order to answer that question, name-rev can be told to name a
+ * rev based on the tip that has smallest number of commits behind it.
+ */
+static int use_weight;
+
+/*
+ * NEEDSWORK: the result of this computation must be cached to
+ * a dedicated notes tree, keyed by the commit object name.
+ */
+static int compute_tip_weight(struct commit *commit)
+{
+   struct rev_info revs;
+   int weight = 1; /* give root the weight of 1 */
+
+   reset_revision_walk();
+   init_revisions(revs, NULL);
+   add_pending_object(revs, (struct object *)commit, NULL);
+   prepare_revision_walk(revs);
+   while (get_revision(revs))
+   weight++;
+   return weight;
+}
+
+static int tip_weight(const char *tip, size_t reflen)
+{
+   struct strbuf buf = STRBUF_INIT;
+   unsigned char sha1[20];
+   struct commit *commit;
+   struct rev_name *name;
+
+   strbuf_add(buf, tip, reflen);
+   if (get_sha1(buf.buf, sha1))
+   die(Internal error: cannot parse tip '%s', tip);
+   strbuf_release(buf);
+
+   commit = lookup_commit_reference_gently(sha1, 0);
+   if (!commit)
+   die(Internal error: cannot look up commit '%s', tip);
+   name = commit-util;
+   if (!name)
+   die(Internal error: a tip without name '%s', tip);
+   if (!name-weight)
+   name-weight = compute_tip_weight(commit);
+   return name-weight;
+}
+
+static int tip_weight_cmp(const char *a, const char *b)
+{
+   size_t reflen_a, reflen_b;
+   static const char traversal[] = ^~;
+
+   /*
+* A tip may look like refname followed by traversal
+* instruction (e.g. ^2~74).  We only are interested in
+* the weight of the ref part.
+*/
+   reflen_a = strcspn(a, traversal);
+   reflen_b = strcspn(b, traversal);
+
+   if (reflen_a == reflen_b  !memcmp(a, b, reflen_a))
+   return 0;
+
+   return tip_weight(a, reflen_a) - tip_weight(b, reflen_b);
+}
+
 static long cutoff = LONG_MAX;
 
 /* How many generations are maximally preferred over _one_ merge traversal? */
@@ -49,8 +128,20 @@ static void name_rev(struct commit *commit,
use_this_tip = 1;
}
 
-   if (distance  name-distance)
-   use_this_tip = 1;
+   if (!use_weight) {
+   if (distance  name-distance)
+   use_this_tip = 1;
+   } else {
+   if (!name-tip_name)
+   use_this_tip = 1;
+   else {
+   int cmp = tip_weight_cmp(name-tip_name, tip_name);
+   if (0  cmp)
+   use_this_tip = 1;
+   else if (!cmp  distance  name-distance)
+   use_this_tip = 1;
+   }
+   }
 
if (!use_this_tip)
return;
@@ -241,6 +332,8 @@ int cmd_name_rev(int argc, const char **argv, const char 
*prefix)
OPT_BOOLEAN(0, undefined, allow_undefined, allow to print 
`undefined` names),
OPT_BOOLEAN(0, always, always,
   show abbreviated commit object as fallback),
+   OPT_BOOLEAN(0, weight, use_weight,
+   name revs based on the oldest tip that contain 
them),
OPT_END(),
};
 
-- 
1.7.12.285.ga3d5fc0

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  

[PATCH 1/3] name-rev: lose unnecessary typedef

2012-08-29 Thread Junio C Hamano
Just spell it struct rev_name; it makes it more clear what is
going on.

Signed-off-by: Junio C Hamano gits...@pobox.com
---
 builtin/name-rev.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/builtin/name-rev.c b/builtin/name-rev.c
index 1b37458..8af2cfa 100644
--- a/builtin/name-rev.c
+++ b/builtin/name-rev.c
@@ -7,11 +7,11 @@
 
 #define CUTOFF_DATE_SLOP 86400 /* one day */
 
-typedef struct rev_name {
+struct rev_name {
const char *tip_name;
int generation;
int distance;
-} rev_name;
+};
 
 static long cutoff = LONG_MAX;
 
@@ -43,7 +43,7 @@ static void name_rev(struct commit *commit,
}
 
if (name == NULL) {
-   name = xmalloc(sizeof(rev_name));
+   name = xmalloc(sizeof(struct rev_name));
commit-util = name;
goto copy_data;
} else if (name-distance  distance) {
-- 
1.7.12.285.ga3d5fc0

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 2/3] name_rev: clarify when a new tip-name is assigned to a commit

2012-08-29 Thread Junio C Hamano
In preparation for the later changes, restructure the logic a little
bit to separate how the code decides to use the new tip for naming
a particular commit, and what happens based on the decision.

Also re-indent and correct style of this function while we are at it.

Signed-off-by: Junio C Hamano gits...@pobox.com
---
 builtin/name-rev.c | 45 +
 1 file changed, 25 insertions(+), 20 deletions(-)

diff --git a/builtin/name-rev.c b/builtin/name-rev.c
index 8af2cfa..ebbf541 100644
--- a/builtin/name-rev.c
+++ b/builtin/name-rev.c
@@ -19,12 +19,13 @@ static long cutoff = LONG_MAX;
 #define MERGE_TRAVERSAL_WEIGHT 65535
 
 static void name_rev(struct commit *commit,
-   const char *tip_name, int generation, int distance,
-   int deref)
+const char *tip_name, int generation, int distance,
+int deref)
 {
struct rev_name *name = (struct rev_name *)commit-util;
struct commit_list *parents;
-   int parent_number = 1;
+   int parent_number;
+   int use_this_tip = 0;
 
if (!commit-object.parsed)
parse_commit(commit);
@@ -42,21 +43,26 @@ static void name_rev(struct commit *commit,
die(generation: %d, but deref?, generation);
}
 
-   if (name == NULL) {
-   name = xmalloc(sizeof(struct rev_name));
+   if (!name) {
+   name = xcalloc(1, sizeof(struct rev_name));
commit-util = name;
-   goto copy_data;
-   } else if (name-distance  distance) {
-copy_data:
-   name-tip_name = tip_name;
-   name-generation = generation;
-   name-distance = distance;
-   } else
+   use_this_tip = 1;
+   }
+
+   if (distance  name-distance)
+   use_this_tip = 1;
+
+   if (!use_this_tip)
return;
 
-   for (parents = commit-parents;
-   parents;
-   parents = parents-next, parent_number++) {
+   name-tip_name = tip_name;
+   name-generation = generation;
+   name-distance = distance;
+
+   /* Propagate our name to our parents */
+   for (parents = commit-parents, parent_number = 1;
+parents;
+parents = parents-next, parent_number++) {
if (parent_number  1) {
int len = strlen(tip_name);
char *new_name = xmalloc(len +
@@ -68,16 +74,15 @@ copy_data:
len -= 2;
if (generation  0)
sprintf(new_name, %.*s~%d^%d, len, tip_name,
-   generation, parent_number);
+   generation, parent_number);
else
sprintf(new_name, %.*s^%d, len, tip_name,
-   parent_number);
-
+   parent_number);
name_rev(parents-item, new_name, 0,
-   distance + MERGE_TRAVERSAL_WEIGHT, 0);
+distance + MERGE_TRAVERSAL_WEIGHT, 0);
} else {
name_rev(parents-item, tip_name, generation + 1,
-   distance + 1, 0);
+distance + 1, 0);
}
}
 }
-- 
1.7.12.285.ga3d5fc0

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 2/2] commit: use a priority queue in merge base functions

2012-08-29 Thread Junio C Hamano
Jeff King p...@peff.net writes:

 +while (result.nr)
 +commit_list_append(queue_pop(result), tail);
 +queue_clear(result);
 +queue_clear(list);
 +return ret;

 I forgot to to port the STALE flag handling here.

Figures..  Thanks.
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/3] name-rev: --weight option (WIP)

2012-08-29 Thread Junio C Hamano
Junio C Hamano gits...@pobox.com writes:

 Note that this is fairly expensive (see NEEDSWORK comment in the
 code).

And this is with the notes-cache.

(priming the cache from scratch)
$ rm .git/refs/notes/name-rev-weight 
$ /usr/bin/time ../git.git/git-name-rev --weight --tags 0136db586c
0136db586c tags/v3.5-rc1~83^2~81^2~76
6.06user 0.46system 0:06.54elapsed 99%CPU (0avgtext+0avgdata 
1861456maxresident)k
8inputs+16outputs (0major+128576minor)pagefaults 0swaps

(with valid cache)
$ /usr/bin/time ../git.git/git-name-rev --weight --tags 0136db586c
0136db586c tags/v3.5-rc1~83^2~81^2~76
0.50user 0.22system 0:00.72elapsed 100%CPU (0avgtext+0avgdata 
244224maxresident)k
0inputs+0outputs (0major+16062minor)pagefaults 0swaps

(the old shortest path version)
$ /usr/bin/time git name-rev --tags 0136db586c
0136db586c tags/v3.6-rc1~59^2~56^2~76
0.31user 0.01system 0:00.32elapsed 100%CPU (0avgtext+0avgdata 
243488maxresident)k
0inputs+0outputs (0major+16000minor)pagefaults 0swaps


 builtin/name-rev.c | 38 +-
 1 file changed, 33 insertions(+), 5 deletions(-)

diff --git c/builtin/name-rev.c w/builtin/name-rev.c
index 69da41d..fdd087c 100644
--- c/builtin/name-rev.c
+++ w/builtin/name-rev.c
@@ -6,6 +6,7 @@
 #include parse-options.h
 #include diff.h
 #include revision.h
+#include notes-cache.h
 
 #define CUTOFF_DATE_SLOP 86400 /* one day */
 
@@ -32,10 +33,6 @@ struct rev_name {
  */
 static int use_weight;
 
-/*
- * NEEDSWORK: the result of this computation must be cached to
- * a dedicated notes tree, keyed by the commit object name.
- */
 static int compute_tip_weight(struct commit *commit)
 {
struct rev_info revs;
@@ -50,6 +47,31 @@ static int compute_tip_weight(struct commit *commit)
return weight;
 }
 
+static struct notes_cache weight_cache;
+static int weight_cache_updated;
+
+static int get_tip_weight(struct commit *commit)
+{
+   struct strbuf buf = STRBUF_INIT;
+   size_t sz;
+   int weight;
+   char *note = notes_cache_get(weight_cache, commit-object.sha1, sz);
+
+   if (note  !strtol_i(note, 10, weight)) {
+   free(note);
+   return weight;
+   }
+   free(note);
+
+   weight = compute_tip_weight(commit);
+   strbuf_addf(buf, %d, weight);
+   notes_cache_put(weight_cache, commit-object.sha1,
+   buf.buf, buf.len);
+   strbuf_release(buf);
+   weight_cache_updated = 1;
+   return weight;
+}
+
 static int tip_weight(const char *tip, size_t reflen)
 {
struct strbuf buf = STRBUF_INIT;
@@ -69,7 +91,7 @@ static int tip_weight(const char *tip, size_t reflen)
if (!name)
die(Internal error: a tip without name '%s', tip);
if (!name-weight)
-   name-weight = compute_tip_weight(commit);
+   name-weight = get_tip_weight(commit);
return name-weight;
 }
 
@@ -346,6 +368,9 @@ int cmd_name_rev(int argc, const char **argv, const char 
*prefix)
if (all || transform_stdin)
cutoff = 0;
 
+   if (use_weight)
+   notes_cache_init(weight_cache, name-rev-weight, 
2012-08-29);
+
for (; argc; argc--, argv++) {
unsigned char sha1[20];
struct object *o;
@@ -401,5 +426,8 @@ int cmd_name_rev(int argc, const char **argv, const char 
*prefix)
  always, allow_undefined, data.name_only);
}
 
+   if (use_weight  weight_cache_updated)
+   notes_cache_write(weight_cache);
+
return 0;
 }
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/3] name-rev: --weight option (WIP)

2012-08-29 Thread Jeff King
On Wed, Aug 29, 2012 at 04:37:06PM -0700, Junio C Hamano wrote:

 Junio C Hamano gits...@pobox.com writes:
 
  Note that this is fairly expensive (see NEEDSWORK comment in the
  code).
 
 And this is with the notes-cache.
 [...]
 +static int get_tip_weight(struct commit *commit)
 +{
 + struct strbuf buf = STRBUF_INIT;
 + size_t sz;
 + int weight;
 + char *note = notes_cache_get(weight_cache, commit-object.sha1, sz);
 +
 + if (note  !strtol_i(note, 10, weight)) {
 + free(note);
 + return weight;
 + }
 + free(note);
 +
 + weight = compute_tip_weight(commit);
 + strbuf_addf(buf, %d, weight);
 + notes_cache_put(weight_cache, commit-object.sha1,
 + buf.buf, buf.len);
 + strbuf_release(buf);
 + weight_cache_updated = 1;
 + return weight;
 +}

It looks like you didn't update compute_tip_weight at all, so it will
still do the full traversal down to the roots. I wonder if you can
define the weight as a recursive function of the parents. Using the sum
of the weights of the parents is not right, because you would
double-count in this situation:

  A--B--C--D---M
  \   /
   E--F--G

That would double-count A and B in this example. But maybe there is
a clever way to define it that avoids that.

The advantage would be that you could cheaply find the weights of new
commits by only traversing back to the last cached one. I did something
similar with the generation number cache (but the recursive definition
is easier there).

 + if (use_weight)
 + notes_cache_init(weight_cache, name-rev-weight, 
 2012-08-29);

Is that a sufficient validity field? What about grafts or replace
objects? For the generation cache, I used a hash of the graft and
replace fields.

-Peff
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2 0/6] describe --contains / name-rev --weight

2012-08-29 Thread Junio C Hamano
Greg KH noticed that the commit 0136db586c that was merged to the
mainline back in v3.5-rc1 days was merged again as part of a
different branch to the mainline before v3.6-rc1.

git describe --contains gives the commit a name based on the newer
v3.6-rc1 tag, which was surprising.

This is because describe --contains calls name-rev, which tries
to use the tag that minimizes the steps between the tag and the
commit being named.  The branch merged recently to the mainline did
not have as much work done on top of the commit as the old branch
that was merged earlier, and name-rev chose to use the newer tag
to base the name on.

The new --weight option introduced by this series tells it to use
the oldest tag that contains the commit being named instead.  This
matches what people expect from describe --contains better.

Junio C Hamano (6):
  name-rev: lose unnecessary typedef
  name_rev: clarify the logic to assign a new tip-name to a commit
  name-rev: --weight option
  name-rev --weight: cache the computed weight in notes
  name-rev --weight: tests and documentation
  describe --contains: use name-rev --weight

 Documentation/git-name-rev.txt |  14 ++-
 builtin/describe.c |   3 +-
 builtin/name-rev.c | 195 -
 t/t6039-name-rev.sh|  62 +
 4 files changed, 248 insertions(+), 26 deletions(-)
 create mode 100755 t/t6039-name-rev.sh

-- 
1.7.12.286.g9df01f7

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2 1/6] name-rev: lose unnecessary typedef

2012-08-29 Thread Junio C Hamano
Just spell it struct rev_name; it makes it more clear what is
going on.

Signed-off-by: Junio C Hamano gits...@pobox.com
---
 builtin/name-rev.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/builtin/name-rev.c b/builtin/name-rev.c
index 1b37458..8af2cfa 100644
--- a/builtin/name-rev.c
+++ b/builtin/name-rev.c
@@ -7,11 +7,11 @@
 
 #define CUTOFF_DATE_SLOP 86400 /* one day */
 
-typedef struct rev_name {
+struct rev_name {
const char *tip_name;
int generation;
int distance;
-} rev_name;
+};
 
 static long cutoff = LONG_MAX;
 
@@ -43,7 +43,7 @@ static void name_rev(struct commit *commit,
}
 
if (name == NULL) {
-   name = xmalloc(sizeof(rev_name));
+   name = xmalloc(sizeof(struct rev_name));
commit-util = name;
goto copy_data;
} else if (name-distance  distance) {
-- 
1.7.12.286.g9df01f7

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2 2/6] name_rev: clarify the logic to assign a new tip-name to a commit

2012-08-29 Thread Junio C Hamano
In preparation for later changes, restructure the logic a little bit
to separate how the code decides to use the new tip for naming a
particular commit, and what happens based on the decision.

Also re-indent and correct style of this function while we are at it.

Signed-off-by: Junio C Hamano gits...@pobox.com
---
 builtin/name-rev.c | 45 +
 1 file changed, 25 insertions(+), 20 deletions(-)

diff --git a/builtin/name-rev.c b/builtin/name-rev.c
index 8af2cfa..ebbf541 100644
--- a/builtin/name-rev.c
+++ b/builtin/name-rev.c
@@ -19,12 +19,13 @@ static long cutoff = LONG_MAX;
 #define MERGE_TRAVERSAL_WEIGHT 65535
 
 static void name_rev(struct commit *commit,
-   const char *tip_name, int generation, int distance,
-   int deref)
+const char *tip_name, int generation, int distance,
+int deref)
 {
struct rev_name *name = (struct rev_name *)commit-util;
struct commit_list *parents;
-   int parent_number = 1;
+   int parent_number;
+   int use_this_tip = 0;
 
if (!commit-object.parsed)
parse_commit(commit);
@@ -42,21 +43,26 @@ static void name_rev(struct commit *commit,
die(generation: %d, but deref?, generation);
}
 
-   if (name == NULL) {
-   name = xmalloc(sizeof(struct rev_name));
+   if (!name) {
+   name = xcalloc(1, sizeof(struct rev_name));
commit-util = name;
-   goto copy_data;
-   } else if (name-distance  distance) {
-copy_data:
-   name-tip_name = tip_name;
-   name-generation = generation;
-   name-distance = distance;
-   } else
+   use_this_tip = 1;
+   }
+
+   if (distance  name-distance)
+   use_this_tip = 1;
+
+   if (!use_this_tip)
return;
 
-   for (parents = commit-parents;
-   parents;
-   parents = parents-next, parent_number++) {
+   name-tip_name = tip_name;
+   name-generation = generation;
+   name-distance = distance;
+
+   /* Propagate our name to our parents */
+   for (parents = commit-parents, parent_number = 1;
+parents;
+parents = parents-next, parent_number++) {
if (parent_number  1) {
int len = strlen(tip_name);
char *new_name = xmalloc(len +
@@ -68,16 +74,15 @@ copy_data:
len -= 2;
if (generation  0)
sprintf(new_name, %.*s~%d^%d, len, tip_name,
-   generation, parent_number);
+   generation, parent_number);
else
sprintf(new_name, %.*s^%d, len, tip_name,
-   parent_number);
-
+   parent_number);
name_rev(parents-item, new_name, 0,
-   distance + MERGE_TRAVERSAL_WEIGHT, 0);
+distance + MERGE_TRAVERSAL_WEIGHT, 0);
} else {
name_rev(parents-item, tip_name, generation + 1,
-   distance + 1, 0);
+distance + 1, 0);
}
}
 }
-- 
1.7.12.286.g9df01f7

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH v2 3/6] name-rev: --weight option

2012-08-29 Thread Junio C Hamano
Instead of naming a rev after a tip that is topologically closest,
use the tip that is the oldest one among those which contain the
rev.

The semantics name-rev --weight would give us is closer to what
people expect from describe --contains.

Note that this is fairly expensive to compute; a later change in the
series will cache the weight value using notes-cache.

Signed-off-by: Junio C Hamano gits...@pobox.com
---
 builtin/name-rev.c | 104 +++--
 1 file changed, 102 insertions(+), 2 deletions(-)

diff --git a/builtin/name-rev.c b/builtin/name-rev.c
index ebbf541..7cdb758 100644
--- a/builtin/name-rev.c
+++ b/builtin/name-rev.c
@@ -4,6 +4,8 @@
 #include tag.h
 #include refs.h
 #include parse-options.h
+#include diff.h
+#include revision.h
 
 #define CUTOFF_DATE_SLOP 86400 /* one day */
 
@@ -11,8 +13,85 @@ struct rev_name {
const char *tip_name;
int generation;
int distance;
+   int weight;
 };
 
+/*
+ * Historically, name-rev named a rev based on the tip that is
+ * topologically closest to it.
+ *
+ * It does not give a good answer to what is the earliest tag that
+ * contains the commit?, however, because you can build a new commit
+ * on top of an ancient commit X, merge it to the tip and tag the
+ * result, which would make X reachable from the new tag in two hops,
+ * even though it appears in the part of the history that is contained
+ * in other ancient tags.
+ *
+ * In order to answer that question, name-rev can be told to name a
+ * rev based on the tip that has smallest number of commits behind it.
+ */
+static int use_weight;
+
+/*
+ * NEEDSWORK: the result of this computation must be cached to
+ * a dedicated notes tree, keyed by the commit object name.
+ */
+static int compute_ref_weight(struct commit *commit)
+{
+   struct rev_info revs;
+   int weight = 1; /* give root the weight of 1 */
+
+   reset_revision_walk();
+   init_revisions(revs, NULL);
+   add_pending_object(revs, (struct object *)commit, NULL);
+   prepare_revision_walk(revs);
+   while (get_revision(revs))
+   weight++;
+   return weight;
+}
+
+static int ref_weight(const char *refname, size_t reflen)
+{
+   struct strbuf buf = STRBUF_INIT;
+   unsigned char sha1[20];
+   struct commit *commit;
+   struct rev_name *name;
+
+   strbuf_add(buf, refname, reflen);
+   if (get_sha1(buf.buf, sha1))
+   die(Internal error: cannot parse tip '%s', buf.buf);
+   strbuf_release(buf);
+
+   commit = lookup_commit_reference_gently(sha1, 0);
+   if (!commit)
+   die(Internal error: cannot look up commit '%s', buf.buf);
+   name = commit-util;
+   if (!name)
+   die(Internal error: a tip without name '%s', buf.buf);
+   if (!name-weight)
+   name-weight = compute_ref_weight(commit);
+   return name-weight;
+}
+
+static int tip_weight_cmp(const char *a, const char *b)
+{
+   size_t reflen_a, reflen_b;
+   static const char traversal[] = ^~;
+
+   /*
+* A tip may look like refname followed by traversal
+* instruction (e.g. ^2~74).  We only are interested in
+* the weight of the ref part.
+*/
+   reflen_a = strcspn(a, traversal);
+   reflen_b = strcspn(b, traversal);
+
+   if (reflen_a == reflen_b  !memcmp(a, b, reflen_a))
+   return 0;
+
+   return ref_weight(a, reflen_a) - ref_weight(b, reflen_b);
+}
+
 static long cutoff = LONG_MAX;
 
 /* How many generations are maximally preferred over _one_ merge traversal? */
@@ -49,8 +128,27 @@ static void name_rev(struct commit *commit,
use_this_tip = 1;
}
 
-   if (distance  name-distance)
-   use_this_tip = 1;
+   if (!use_weight) {
+   if (distance  name-distance)
+   use_this_tip = 1;
+   } else {
+   if (!name-tip_name)
+   use_this_tip = 1;
+   else {
+   /*
+* Pick a name based on the ref that is older,
+* i.e. having smaller number of commits
+* behind it.  Break the tie by picking the
+* path with smaller numer of steps to reach
+* that ref from the commit.
+*/
+   int cmp = tip_weight_cmp(name-tip_name, tip_name);
+   if (0  cmp)
+   use_this_tip = 1;
+   else if (!cmp  distance  name-distance)
+   use_this_tip = 1;
+   }
+   }
 
if (!use_this_tip)
return;
@@ -241,6 +339,8 @@ int cmd_name_rev(int argc, const char **argv, const char 
*prefix)
OPT_BOOLEAN(0, undefined, allow_undefined, allow to print 
`undefined` names),
OPT_BOOLEAN(0, 

[PATCH v2 4/6] name-rev --weight: cache the computed weight in notes

2012-08-29 Thread Junio C Hamano
With the weight assigned to tip of each ref, we can give more
intuitive results from name-rev that is more suitable to answer
what is the oldest tag that contains this commit?  However, this
number is very expensive to compute.

Use the notes-cache mechanism to cache this value.  The result
becomes usable again.

(priming the cache from scratch)
$ rm .git/refs/notes/name-rev-weight
$ /usr/bin/time git name-rev --weight --tags 0136db586c
0136db586c tags/v3.5-rc1~83^2~81^2~76
6.06user 0.46system 0:06.54elapsed 99%CPU (0avgtext+0avgdata 
1861456maxresident)k
8inputs+16outputs (0major+128576minor)pagefaults 0swaps

(with cache already primed)
$ /usr/bin/time git name-rev --weight --tags 0136db586c
0136db586c tags/v3.5-rc1~83^2~81^2~76
0.50user 0.22system 0:00.72elapsed 100%CPU (0avgtext+0avgdata 
244224maxresident)k
0inputs+0outputs (0major+16062minor)pagefaults 0swaps

(the old shortest path version)
$ /usr/bin/time git name-rev --tags 0136db586c
0136db586c tags/v3.6-rc1~59^2~56^2~76
0.31user 0.01system 0:00.32elapsed 100%CPU (0avgtext+0avgdata 
243488maxresident)k
0inputs+0outputs (0major+16000minor)pagefaults 0swaps

This version does not invalidate the cache in the presense of
modified grafts and object replacement, but in a later version we
can compute a hash of the grafts and replacement data and pass it as
the validity token to automatically invalidate the cache when these
data that affects the perceived topology change.

Signed-off-by: Junio C Hamano gits...@pobox.com
---
 builtin/name-rev.c | 56 +-
 1 file changed, 51 insertions(+), 5 deletions(-)

diff --git a/builtin/name-rev.c b/builtin/name-rev.c
index 7cdb758..d78eedd 100644
--- a/builtin/name-rev.c
+++ b/builtin/name-rev.c
@@ -6,6 +6,7 @@
 #include parse-options.h
 #include diff.h
 #include revision.h
+#include notes-cache.h
 
 #define CUTOFF_DATE_SLOP 86400 /* one day */
 
@@ -32,10 +33,6 @@ struct rev_name {
  */
 static int use_weight;
 
-/*
- * NEEDSWORK: the result of this computation must be cached to
- * a dedicated notes tree, keyed by the commit object name.
- */
 static int compute_ref_weight(struct commit *commit)
 {
struct rev_info revs;
@@ -50,6 +47,32 @@ static int compute_ref_weight(struct commit *commit)
return weight;
 }
 
+static struct notes_cache weight_cache;
+static int weight_cache_updated;
+
+static int get_ref_weight(struct commit *commit)
+{
+   struct strbuf buf = STRBUF_INIT;
+   size_t sz;
+   int weight;
+   char *note;
+
+   note = notes_cache_get(weight_cache, commit-object.sha1, sz);
+   if (note  !strtol_i(note, 10, weight)) {
+   free(note);
+   return weight;
+   }
+   free(note);
+
+   weight = compute_ref_weight(commit);
+   strbuf_addf(buf, %d, weight);
+   notes_cache_put(weight_cache, commit-object.sha1,
+   buf.buf, buf.len);
+   strbuf_release(buf);
+   weight_cache_updated = 1;
+   return weight;
+}
+
 static int ref_weight(const char *refname, size_t reflen)
 {
struct strbuf buf = STRBUF_INIT;
@@ -69,7 +92,7 @@ static int ref_weight(const char *refname, size_t reflen)
if (!name)
die(Internal error: a tip without name '%s', buf.buf);
if (!name-weight)
-   name-weight = compute_ref_weight(commit);
+   name-weight = get_ref_weight(commit);
return name-weight;
 }
 
@@ -323,6 +346,22 @@ static void name_rev_line(char *p, struct name_ref_data 
*data)
fwrite(p_start, p - p_start, 1, stdout);
 }
 
+static const char *get_validity_token(void)
+{
+   /*
+* In future versions, we may want to automatically invalidate
+* the cached weight data whenever grafts and replacement
+* changes.  We could do so by (1) reading the contents of the
+* grafts file, (2) enumerating the replacement data (original
+* object name and replacement object name) and sorting the
+* result, and (3) concatenating (1) and (2) and hashing it,
+* to come up with dynamic validity: [0-9a-f]{40} or something.
+*
+* In this verison, we simply do not bother ;-).
+*/
+   return static validity token;
+}
+
 int cmd_name_rev(int argc, const char **argv, const char *prefix)
 {
struct object_array revs = OBJECT_ARRAY_INIT;
@@ -353,6 +392,10 @@ int cmd_name_rev(int argc, const char **argv, const char 
*prefix)
if (all || transform_stdin)
cutoff = 0;
 
+   if (use_weight)
+   notes_cache_init(weight_cache, name-rev-weight,
+get_validity_token());
+
for (; argc; argc--, argv++) {
unsigned char sha1[20];
struct object *o;
@@ -408,5 +451,8 @@ int cmd_name_rev(int argc, const char **argv, const char 
*prefix)
 

[PATCH v2 6/6] describe --contains: use name-rev --weight

2012-08-29 Thread Junio C Hamano
And this is the logical conclusion of the series, to
allow 0136db586c in the kernel history to be described
as v3.5-rc1~83^2~81^2~76, not as v3.6-rc1~59^2~56^2~76.

Signed-off-by: Junio C Hamano gits...@pobox.com
---
 builtin/describe.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/builtin/describe.c b/builtin/describe.c
index 9f63067..fbd5be5 100644
--- a/builtin/describe.c
+++ b/builtin/describe.c
@@ -436,11 +436,12 @@ int cmd_describe(int argc, const char **argv, const char 
*prefix)
die(_(--long is incompatible with --abbrev=0));
 
if (contains) {
-   const char **args = xmalloc((7 + argc) * sizeof(char *));
+   const char **args = xmalloc((8 + argc) * sizeof(char *));
int i = 0;
args[i++] = name-rev;
args[i++] = --name-only;
args[i++] = --no-undefined;
+   args[i++] = --weight;
if (always)
args[i++] = --always;
if (!all) {
-- 
1.7.12.286.g9df01f7

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/3] name-rev: --weight option (WIP)

2012-08-29 Thread Junio C Hamano
Jeff King p...@peff.net writes:

 OK. I didn't think too hard about it, so I'll trust you that it is not
 easy. I wonder if using the generation number would be another way of
 defining oldest that would be easier to calculate.

Go back to my illustration to Greg and think about the implication
of TONS! side branch in the picture has on the generation numbers.
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 3/3] name-rev: --weight option (WIP)

2012-08-29 Thread Junio C Hamano
Junio C Hamano gits...@pobox.com writes:

 Jeff King p...@peff.net writes:

 OK. I didn't think too hard about it, so I'll trust you that it is not
 easy. I wonder if using the generation number would be another way of
 defining oldest that would be easier to calculate.

 Go back to my illustration to Greg and think about the implication
 of TONS! side branch in the picture has on the generation numbers.

That is, we want some number (I called it weight) that grows when
TONS! side branch is heavy.  Generation numbers are about giving
larger numbers for _longer_ chains, but a long and thin side branch
does not contribute as much as a medium length but a very fat side
branch when merged.

So generation numbers might be a candidate for approximation, but I
do not think it will give us a _good_ approximation.
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2] test: set the realpath of CWD as TRASH_DIRECTORY

2012-08-29 Thread Michael Haggerty
On 08/29/2012 06:33 PM, Junio C Hamano wrote:
 Michael Haggerty mhag...@alum.mit.edu writes:
 
 On 08/29/2012 08:06 AM, Junio C Hamano wrote:
 Michael Haggerty mhag...@alum.mit.edu writes:

 It is in fact the whole approach that I object to.
 ...
 What exactly is broken in CEILING?

 CEILING is used in setup_git_directory_gently_1() when trying to find
 the GIT_DIR appropriate for the current directory.  The entries in
 CEILING are compared textually to getcwd(), looking for the entry that
 is the longest proper prefix of PWD.  This is then used to limit a loop
 that is vaguely

 while (!is_git_directory())
 chdir(..);

 The problem, as I understand it, is that when the tests are run with
 root set to a path that includes a symlink, then test and
 TRASH_DIRECTORY are set to the textual value of $root/trash
 directory.t, but then a little bit later test-lib.sh chdirs into
 the trash directory using cd -P $test (thereby resolving symlinks).
 So, taking my concrete example --root=/dev/shm where /dev/shm is a
 symlink to /run/shm, we have

 test=/dev/shm/trash directory.t
 TRASH_DIRECTORY=$test
 ...
 cd -P $test

 which results in PWD being /run/shm/trash directory.t.
 
 The components of CEILING should be adjusted to strip the symlink so
 that it begins with /run/shm/ before it is used; otherwise it
 would not work.  As the current code does not do that at runtime (I
 am describing the situation, not justifying), it won't match if
 CEILING is built out of $TRASH_DIRECTORY.  In the above, setting of
 $test and $TRASH is wrong; it does not match the realpath.
 
 So I somehow thought that Jiang's patch was to make sure any
 variables that contain pathnames (and make sure future paths we
 might grab out of $(pwd)) are realpath without symlinks early in the
 test set-up, in my previous message was not correct.  The patch is
 not doing that, and that is what breaks your setup.

I've confused things by not being clear what I was describing.  The
description that you quoted above was the state *before* Jiang's patch.
 Jiang's patch makes sure that $TRASH_DIRECTORY is a realpath.  The
working directory was already a realpath before his patch (due to commit
1bd9c648).  There are some other variables that are not necessarily
realpaths, even after Jiang's patch; for example (via a casual look at
test-lib.sh): $remove_trash, $HOME, $test, $TEST_DIRECTORY,
$test_results_dir, $GIT_BUILD_DIR.  I haven't checked whether these
variables are used in ways that could cause other problems.

When Jiang's patch is applied the test suite runs to completion without
any failures on my system even when I pass --root=/dev/shm (a symlink).

 My preference is to set things up in such a way that most of the
 tests do not have to worry about these symlink aliases.  I know you
 said you do not like the whole approach, but I'd like to see our
 tests run in a stable and reproducible testing environment.
 
 We should have specific tests (on symlink enabled platforms) that
 creates a directory and an alias to it via a symlink, goes into it
 and checks the CEILING (and whatever else) behaviour.  We know that
 the current code does not account for the alias introduced by
 symlinks, and setup_git_directory_gently() may want to be patched to
 fix _that_ specific test.

Yes, stable and reproducible is good.  And explicit tests for
symlink-sensitive code would be good.  But how do we find the code that
needs explicit testing for symlink tolerance?

Therefore (in addition to specific tests) I think it would be good to
have an easy way to run the *whole* test suite in a symlink environment.
 For example, we could add a test suite option that *explicitly* makes
all tests run in a directory containing symlinks.  Or we could even do
that *all* of the time on systems that support symlinks [1]--that would
be a stable and reproducible testing environment that *does* detect many
symlink problems rather than hiding them.  (It seems unlikely the the
use of symlinks would *hide* other bugs.)  Something along the lines of

mkdir -p $test.dir
ln -s $test.dir $test

By the way, is the use of realpath(3) permissible in git code?
GIT_CEILING_DIRECTORIES handling could be fixed relatively easily by
using this function to canonicalize pathnames before comparison.

Michael

[1] This would be analogous to the inclusion of a space in trash
directory.*, which I presume was done to detect space-handling problems
quickly.

-- 
Michael Haggerty
mhag...@alum.mit.edu
http://softwareswirl.blogspot.com/
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH v2] test: set the realpath of CWD as TRASH_DIRECTORY

2012-08-29 Thread Junio C Hamano
Michael Haggerty mhag...@alum.mit.edu writes:

 By the way, is the use of realpath(3) permissible in git code?
 GIT_CEILING_DIRECTORIES handling could be fixed relatively easily by
 using this function to canonicalize pathnames before comparison.

As long as we can add a compat/realpath.c (perhaps lift one from
glibc before they went GPLv3) for platforms that matter, I do not
see it as a huge problem.  How close is abspath.c::real_path() to
what you need?

 [1] This would be analogous to the inclusion of a space in trash
 directory.*, which I presume was done to detect space-handling problems
 quickly.

Yeah, understood where you are coming from, and I think I can agree
with where you are trying to go.
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html