from:"Elijah Newren"

[PATCH 1/1] Documentation: fix a bunch of typos, both old and new

2019-10-22 Thread Elijah Newren via GitGitGadget

From: Elijah Newren 

Signed-off-by: Elijah Newren 
---
 Documentation/CodingGuidelines |  2 +-
 Documentation/RelNotes/1.7.0.2.txt |  2 +-
 Documentation/RelNotes/1.7.10.4.txt|  2 +-
 Documentation/RelNotes/1.7.12.3.txt|  2 +-
 Documentation/RelNotes/1.7.5.3.txt |  2 +-
 Documentation/RelNotes/1.8.0.txt   |  2 +-
 Documentation/RelNotes/2.1.3.txt   |  2 +-
 Documentation/RelNotes/2.10.0.txt  |  2 +-
 Documentation/RelNotes/2.10.2.txt  |  2 +-
 Documentation/RelNotes/2.11.1.txt  |  2 +-
 Documentation/RelNotes/2.12.0.txt  |  2 +-
 Documentation/RelNotes/2.13.3.txt  |  4 ++--
 Documentation/RelNotes/2.14.0.txt  |  4 ++--
 Documentation/RelNotes/2.16.0.txt  |  2 +-
 Documentation/RelNotes/2.16.3.txt  |  2 +-
 Documentation/RelNotes/2.17.0.txt  |  2 +-
 Documentation/RelNotes/2.18.0.txt  |  2 +-
 Documentation/RelNotes/2.19.0.txt  |  2 +-
 Documentation/RelNotes/2.24.0.txt  |  2 +-
 Documentation/RelNotes/2.3.3.txt   |  2 +-
 Documentation/RelNotes/2.3.7.txt   |  2 +-
 Documentation/RelNotes/2.4.3.txt   |  2 +-
 Documentation/RelNotes/2.8.0.txt   |  2 +-
 Documentation/RelNotes/2.9.3.txt   |  2 +-
 Documentation/config/tag.txt   |  2 +-
 Documentation/git-bisect-lk2009.txt|  2 +-
 Documentation/git-check-attr.txt   |  2 +-
 Documentation/git-check-ignore.txt |  2 +-
 Documentation/git-filter-branch.txt|  2 +-
 Documentation/git-range-diff.txt   |  2 +-
 Documentation/git-tag.txt  |  2 +-
 Documentation/gitattributes.txt|  2 +-
 Documentation/gitmodules.txt   |  2 +-
 Documentation/technical/api-trace2.txt | 14 +++---
 Documentation/technical/commit-graph.txt   | 12 ++--
 .../technical/hash-function-transition.txt |  2 +-
 Documentation/technical/index-format.txt   |  4 ++--
 Documentation/technical/partial-clone.txt  |  2 +-
 Documentation/technical/protocol-v2.txt|  2 +-
 Documentation/technical/rerere.txt |  2 +-
 40 files changed, 54 insertions(+), 54 deletions(-)

diff --git a/Documentation/CodingGuidelines b/Documentation/CodingGuidelines
index f45db5b727..d05a80fe9d 100644
--- a/Documentation/CodingGuidelines
+++ b/Documentation/CodingGuidelines
@@ -75,7 +75,7 @@ For shell scripts specifically (not exhaustive):
 
  - If you want to find out if a command is available on the user's
    $PATH, you should use 'type <command>', instead of 'which <command>'.
-   The output of 'which' is not machine parseable and its exit code
+   The output of 'which' is not machine parsable and its exit code
    is not reliable across platforms.
 
  - We use POSIX compliant parameter substitutions and avoid bashisms;
diff --git a/Documentation/RelNotes/1.7.0.2.txt b/Documentation/RelNotes/1.7.0.2.txt
index fcb46ca6a4..73ed2b5278 100644
--- a/Documentation/RelNotes/1.7.0.2.txt
+++ b/Documentation/RelNotes/1.7.0.2.txt
@@ -34,7 +34,7 @@ Fixes since v1.7.0.1
  * "git status" in 1.7.0 lacked the optimization we used to have in 1.6.X series
    to speed up scanning of large working tree.
 
- * "gitweb" did not diagnose parsing errors properly while reading tis configuration
+ * "gitweb" did not diagnose parsing errors properly while reading its configuration
    file.
 
 And other minor fixes and documentation updates.
diff --git a/Documentation/RelNotes/1.7.10.4.txt b/Documentation/RelNotes/1.7.10.4.txt
index 326670df6e..57597f2bf3 100644
--- a/Documentation/RelNotes/1.7.10.4.txt
+++ b/Documentation/RelNotes/1.7.10.4.txt
@@ -7,7 +7,7 @@ Fixes since v1.7.10.3
  * The message file for Swedish translation has been updated a bit.
 
  * A name taken from mailmap was copied into an internal buffer
-   incorrectly and could overun the buffer if it is too long.
+   incorrectly and could overrun the buffer if it is too long.
 
  * A malformed commit object that has a header line chomped in the
middle could kill git with a NULL pointer dereference.
diff --git a/Documentation/RelNotes/1.7.12.3.txt b/Documentation/RelNotes/1.7.12.3.txt
index ecda427a35..4b822976b8 100644
--- a/Documentation/RelNotes/1.7.12.3.txt
+++ b/Documentation/RelNotes/1.7.12.3.txt
@@ -25,7 +25,7 @@ Fixes since v1.7.12.2
its Accept-Encoding header.
 
  * "git receive-pack" (the counterpart to "git push") did not give
-   progress output while processing objects it received to the puser
+   progress output while processing objects it received to

[PATCH 0/1] Thyme two ficks sum Documentaton tyops and speling erors!

2019-10-22 Thread Elijah Newren via GitGitGadget

We have a number of typos and spelling errors that I spotted under
Documentation/.

It'd be nice if someone could double check that I placed the missing right
parenthesis correctly in Documentation/technical/api-trace2.txt. Also, not
sure if folks would be happy or unhappy with me un-splitting a word in
commit-graph.txt.

Elijah Newren (1):
  Documentation: fix a bunch of typos, both old and new

 Documentation/CodingGuidelines |  2 +-
 Documentation/RelNotes/1.7.0.2.txt |  2 +-
 Documentation/RelNotes/1.7.10.4.txt|  2 +-
 Documentation/RelNotes/1.7.12.3.txt|  2 +-
 Documentation/RelNotes/1.7.5.3.txt |  2 +-
 Documentation/RelNotes/1.8.0.txt   |  2 +-
 Documentation/RelNotes/2.1.3.txt   |  2 +-
 Documentation/RelNotes/2.10.0.txt  |  2 +-
 Documentation/RelNotes/2.10.2.txt  |  2 +-
 Documentation/RelNotes/2.11.1.txt  |  2 +-
 Documentation/RelNotes/2.12.0.txt  |  2 +-
 Documentation/RelNotes/2.13.3.txt  |  4 ++--
 Documentation/RelNotes/2.14.0.txt  |  4 ++--
 Documentation/RelNotes/2.16.0.txt  |  2 +-
 Documentation/RelNotes/2.16.3.txt  |  2 +-
 Documentation/RelNotes/2.17.0.txt  |  2 +-
 Documentation/RelNotes/2.18.0.txt  |  2 +-
 Documentation/RelNotes/2.19.0.txt  |  2 +-
 Documentation/RelNotes/2.24.0.txt  |  2 +-
 Documentation/RelNotes/2.3.3.txt   |  2 +-
 Documentation/RelNotes/2.3.7.txt   |  2 +-
 Documentation/RelNotes/2.4.3.txt   |  2 +-
 Documentation/RelNotes/2.8.0.txt   |  2 +-
 Documentation/RelNotes/2.9.3.txt   |  2 +-
 Documentation/config/tag.txt   |  2 +-
 Documentation/git-bisect-lk2009.txt|  2 +-
 Documentation/git-check-attr.txt   |  2 +-
 Documentation/git-check-ignore.txt |  2 +-
 Documentation/git-filter-branch.txt|  2 +-
 Documentation/git-range-diff.txt   |  2 +-
 Documentation/git-tag.txt  |  2 +-
 Documentation/gitattributes.txt|  2 +-
 Documentation/gitmodules.txt   |  2 +-
 Documentation/technical/api-trace2.txt | 14 +++---
 Documentation/technical/commit-graph.txt   | 12 ++--
 .../technical/hash-function-transition.txt |  2 +-
 Documentation/technical/index-format.txt   |  4 ++--
 Documentation/technical/partial-clone.txt  |  2 +-
 Documentation/technical/protocol-v2.txt|  2 +-
 Documentation/technical/rerere.txt |  2 +-
 40 files changed, 54 insertions(+), 54 deletions(-)


base-commit: d966095db01190a2196e31195ea6fa0c722aa732
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-418%2Fnewren%2Ftypo-fixes-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-418/newren/typo-fixes-v1
Pull-Request: https://github.com/gitgitgadget/git/pull/418
-- 
gitgitgadget

[PATCH v2 0/3] Dir rename fixes

2019-10-22 Thread Elijah Newren via GitGitGadget

This series improves a couple things found after looking into things Dscho
flagged:

 * clarify and slightly restructure code in the get_renamed_dir_portion()
   function
 * extend support of detecting renaming/merging of one directory into
   another to support the root directory as a target directory

First patch best viewed with a --histogram diff (sorry, gitgitgadget does
not yet know how to generate those).

Changes since v1:

 * Incorporated code cleanups suggested by Dscho
 * Fixed to work with an alternate rename-to-root-directory case (end_of_new
   == NULL), with new testcase
 * Added a new patch to the end of the series to stop making setup tests be
   part of a separate test_expect_success block.

Elijah Newren (3):
  merge-recursive: clean up get_renamed_dir_portion()
  merge-recursive: fix merging a subdirectory into the root directory
  t604[236]: do not run setup in separate tests

 merge-recursive.c  | 104 -
 t/t6042-merge-rename-corner-cases.sh   | 111 +++--
 t/t6043-merge-rename-directories.sh| 568 -
 t/t6046-merge-skip-unneeded-updates.sh | 135 +++---
 4 files changed, 582 insertions(+), 336 deletions(-)


base-commit: 08da6496b61341ec45eac36afcc8f94242763468
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-390%2Fnewren%2Fdir-rename-fixes-v2
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-390/newren/dir-rename-fixes-v2
Pull-Request: https://github.com/gitgitgadget/git/pull/390

Range-diff vs v1:

 1:  8ae78679c9 = 1:  8ae78679c9 merge-recursive: clean up get_renamed_dir_portion()
 2:  37aee862e1 ! 2:  a1e80e8fbb merge-recursive: fix merging a subdirectory into the root directory
 @@ -34,9 +34,41 @@
strbuf_grow(&new_path, newlen);
strbuf_addbuf(&new_path, &entry->new_dir);
  @@
 -  *end_of_old == *end_of_new)
 -  return; /* We haven't modified *old_dir or *new_dir yet. */
 +   */
 +  end_of_old = strrchr(old_path, '/');
 +  end_of_new = strrchr(new_path, '/');
 +- if (end_of_old == NULL || end_of_new == NULL)
 +- return; /* We haven't modified *old_dir or *new_dir yet. */
 ++
 ++ /*
 ++  * If end_of_old is NULL, old_path wasn't in a directory, so there
 ++  * could not be a directory rename (our rule elsewhere that a
 ++  * directory which still exists is not considered to have been
 ++  * renamed means the root directory can never be renamed -- because
 ++  * the root directory always exists).
 ++  */
 ++ if (end_of_old == NULL)
 ++ return; /* Note: *old_dir and *new_dir are still NULL */
 ++
 ++ /*
 ++  * If new_path contains no directory (end_of_new is NULL), then we
 ++  * have a rename of old_path's directory to the root directory.
 ++  */
 ++ if (end_of_new == NULL) {
 ++ *old_dir = xstrndup(old_path, end_of_old - old_path);
 ++ *new_dir = xstrdup("");
 ++ return;
 ++ }
   
 +  /* Find the first non-matching character traversing backwards */
 +  while (*--end_of_new == *--end_of_old &&
 +@@
 +   */
 +  if (end_of_old == old_path && end_of_new == new_path &&
 +  *end_of_old == *end_of_new)
 +- return; /* We haven't modified *old_dir or *new_dir yet. */
 ++ return; /* Note: *old_dir and *new_dir are still NULL */
 ++
  + /*
  +  * If end_of_new got back to the beginning of its string, and
  +  * end_of_old got back to the beginning of some subdirectory, then
 @@ -44,21 +76,19 @@
  +  * needs slightly special handling.
  +  *
  +  * Note: There is no need to consider the opposite case, with a
 -+  * rename/merge of the root directory into some subdirectory.
 -+  * Our rule elsewhere that a directory which still exists is not
 -+  * considered to have been renamed means the root directory can
 -+  * never be renamed (because the root directory always exists).
 ++  * rename/merge of the root directory into some subdirectory
 ++  * because as noted above the root directory always exists so it
 ++  * cannot be considered to be renamed.
  +  */
  + if (end_of_new == new_path &&
  + end_of_old != old_path && end_of_old[-1] == '/') {
 -+ *old_dir = xstrndup(old_path, end_of_old-1 - old_path);
 -+ *new_dir = xstrndup(new_path, end_of_new - new_path);
 ++ *old_dir = xstrndup(old_path, --end_of_old - old_path);
 ++ *new_dir = xstrdup("");
  + return;
  + }
 -+
 + 
/*
 * We've found the first non-matching character in the directory
 -   * paths.  That means the current characters we were looking at
  
   diff --git a/t/t6043-merge-rename-directories.sh 
b/t/t6043-m

[PATCH v2 2/3] merge-recursive: fix merging a subdirectory into the root directory

2019-10-22 Thread Elijah Newren via GitGitGadget

From: Elijah Newren 

We allow renaming all entries in e.g. a directory named z/ into a
directory named y/ to be detected as a z/ -> y/ rename, so that if the
other side of history adds any files to the directory z/ in the mean
time, we can provide the hint that they should be moved to y/.

There is no reason to not allow 'y/' to be the root directory, but the
code did not handle that case correctly.  Add a testcase and the
necessary special checks to support this case.

Signed-off-by: Elijah Newren 
---
 merge-recursive.c   |  52 -
 t/t6043-merge-rename-directories.sh | 114 
 2 files changed, 163 insertions(+), 3 deletions(-)
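
Illustration only, not part of the patch: a minimal standalone sketch of
the oldlen adjustment in apply_dir_rename(), assuming plain C strings in
place of git's strbuf and struct dir_rename_entry.  The function name
apply_dir_rename_sketch and its three-argument signature are hypothetical.
It shows why oldlen has to skip the '/' when the target directory is the
root:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical simplification of apply_dir_rename(): map old_path, which
 * must start with old_dir followed by '/', to its location under new_dir.
 * An empty new_dir stands for the root directory.
 */
static char *apply_dir_rename_sketch(const char *old_dir, const char *new_dir,
                                     const char *old_path)
{
    size_t oldlen = strlen(old_dir);
    size_t new_dir_len = strlen(new_dir);
    char *new_path;

    /*
     * For a rename into the root, '' + '/filename' would yield a path
     * with a leading slash, so also skip the '/' that follows old_dir.
     */
    if (new_dir_len == 0)
        oldlen++;

    new_path = malloc(new_dir_len + strlen(old_path) - oldlen + 1);
    if (!new_path)
        abort();
    memcpy(new_path, new_dir, new_dir_len);
    strcpy(new_path + new_dir_len, old_path + oldlen);
    return new_path; /* caller frees */
}
```

With old_dir "a/b" and an empty new_dir, as in testcase 12d, "a/b/bar"
maps to "bar"; with old_dir "z" and new_dir "y", "z/file" maps to
"y/file".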

diff --git a/merge-recursive.c b/merge-recursive.c
index f80e48f623..ec60715368 100644
--- a/merge-recursive.c
+++ b/merge-recursive.c
@@ -1931,6 +1931,16 @@ static char *apply_dir_rename(struct dir_rename_entry *entry,
return NULL;
 
oldlen = strlen(entry->dir);
+   if (entry->new_dir.len == 0)
+   /*
+* If someone renamed/merged a subdirectory into the root
+* directory (e.g. 'some/subdir' -> ''), then we want to
+* avoid returning
+* '' + '/filename'
+* as the rename; we need to make old_path + oldlen advance
+* past the '/' character.
+*/
+   oldlen++;
newlen = entry->new_dir.len + (strlen(old_path) - oldlen) + 1;
strbuf_grow(&new_path, newlen);
strbuf_addbuf(&new_path, &entry->new_dir);
@@ -1963,8 +1973,26 @@ static void get_renamed_dir_portion(const char *old_path, const char *new_path,
 */
end_of_old = strrchr(old_path, '/');
end_of_new = strrchr(new_path, '/');
-   if (end_of_old == NULL || end_of_new == NULL)
-   return; /* We haven't modified *old_dir or *new_dir yet. */
+
+   /*
+* If end_of_old is NULL, old_path wasn't in a directory, so there
+* could not be a directory rename (our rule elsewhere that a
+* directory which still exists is not considered to have been
+* renamed means the root directory can never be renamed -- because
+* the root directory always exists).
+*/
+   if (end_of_old == NULL)
+   return; /* Note: *old_dir and *new_dir are still NULL */
+
+   /*
+* If new_path contains no directory (end_of_new is NULL), then we
+* have a rename of old_path's directory to the root directory.
+*/
+   if (end_of_new == NULL) {
+   *old_dir = xstrndup(old_path, end_of_old - old_path);
+   *new_dir = xstrdup("");
+   return;
+   }
 
/* Find the first non-matching character traversing backwards */
while (*--end_of_new == *--end_of_old &&
@@ -1978,7 +2006,25 @@ static void get_renamed_dir_portion(const char *old_path, const char *new_path,
 */
if (end_of_old == old_path && end_of_new == new_path &&
*end_of_old == *end_of_new)
-   return; /* We haven't modified *old_dir or *new_dir yet. */
+   return; /* Note: *old_dir and *new_dir are still NULL */
+
+   /*
+* If end_of_new got back to the beginning of its string, and
+* end_of_old got back to the beginning of some subdirectory, then
+* we have a rename/merge of a subdirectory into the root, which
+* needs slightly special handling.
+*
+* Note: There is no need to consider the opposite case, with a
+* rename/merge of the root directory into some subdirectory
+* because as noted above the root directory always exists so it
+* cannot be considered to be renamed.
+*/
+   if (end_of_new == new_path &&
+   end_of_old != old_path && end_of_old[-1] == '/') {
+   *old_dir = xstrndup(old_path, --end_of_old - old_path);
+   *new_dir = xstrdup("");
+   return;
+   }
 
/*
 * We've found the first non-matching character in the directory
diff --git a/t/t6043-merge-rename-directories.sh b/t/t6043-merge-rename-directories.sh
index c966147d5d..32cdd1f493 100755
--- a/t/t6043-merge-rename-directories.sh
+++ b/t/t6043-merge-rename-directories.sh
@@ -4051,6 +4051,120 @@ test_expect_success '12c-check: Moving one directory hierarchy into another w/ c
)
 '
 
+# Testcase 12d, Rename/merge of subdirectory into the root
+#   Commit O: a/b/subdir/foo
+#   Commit A: subdir/foo
+#   Commit B: a/b/subdir/foo, a/b/bar
+#   Expected: subdir/foo, bar
+
+test_expect_success '12d-setup: Rename/merge subdir into the root, variant 1' '
+   test_create_repo 12d &&
+

[PATCH v2 3/3] t604[236]: do not run setup in separate tests

2019-10-22 Thread Elijah Newren via GitGitGadget

From: Elijah Newren 

Transform the setup "tests" to setup functions, and have the actual
tests call the setup functions.  Advantages:

  * Should make life easier for people working with webby CI/PR builds
who have to abuse mice (and their own index finger as well) in
order to switch from viewing one testcase to another.  Sounds
awful; hopefully this will improve things for them.

  * Improves re-runnability: any failed test in any of these three
files can now be re-run in isolation, e.g.
   ./t6042* --ver --imm -x --run=21
whereas before it would require two tests to be specified to the
--run argument, the other needing to be picked out as the relevant
setup test from one or two tests before.

  * Importantly, this still keeps the "setup" and "test" sections
somewhat separate to make it easier for readers to discern what is
just ancillary setup and what the intent of the test is.

Signed-off-by: Elijah Newren 
---
 t/t6042-merge-rename-corner-cases.sh   | 111 +++---
 t/t6043-merge-rename-directories.sh| 466 ++---
 t/t6046-merge-skip-unneeded-updates.sh | 135 ---
 3 files changed, 393 insertions(+), 319 deletions(-)

diff --git a/t/t6042-merge-rename-corner-cases.sh b/t/t6042-merge-rename-corner-cases.sh
index c5b57f40c3..b047cf1c1c 100755
--- a/t/t6042-merge-rename-corner-cases.sh
+++ b/t/t6042-merge-rename-corner-cases.sh
@@ -5,7 +5,7 @@ test_description="recursive merge corner cases w/ renames but not criss-crosses"
 
 . ./test-lib.sh
 
-test_expect_success 'setup rename/delete + untracked file' '
+test_setup_rename_delete_untracked () {
test_create_repo rename-delete-untracked &&
(
cd rename-delete-untracked &&
@@ -29,9 +29,10 @@ test_expect_success 'setup rename/delete + untracked file' '
git commit -m track-people-instead-of-objects &&
echo "Myyy PRECIOUSSS" >ring
)
-'
+}
 
 test_expect_success "Does git preserve Gollum's precious artifact?" '
+   test_setup_rename_delete_untracked &&
(
cd rename-delete-untracked &&
 
@@ -49,7 +50,7 @@ test_expect_success "Does git preserve Gollum's precious artifact?" '
 #
 # We should be able to merge B & C cleanly
 
-test_expect_success 'setup rename/modify/add-source conflict' '
+test_setup_rename_modify_add_source () {
test_create_repo rename-modify-add-source &&
(
cd rename-modify-add-source &&
@@ -70,9 +71,10 @@ test_expect_success 'setup rename/modify/add-source conflict' '
git add a &&
git commit -m C
)
-'
+}
 
 test_expect_failure 'rename/modify/add-source conflict resolvable' '
+   test_setup_rename_modify_add_source &&
(
cd rename-modify-add-source &&
 
@@ -88,7 +90,7 @@ test_expect_failure 'rename/modify/add-source conflict resolvable' '
)
 '
 
-test_expect_success 'setup resolvable conflict missed if rename missed' '
+test_setup_break_detection_1 () {
test_create_repo break-detection-1 &&
(
cd break-detection-1 &&
@@ -110,9 +112,10 @@ test_expect_success 'setup resolvable conflict missed if rename missed' '
git add a &&
git commit -m C
)
-'
+}
 
 test_expect_failure 'conflict caused if rename not detected' '
+   test_setup_break_detection_1 &&
(
cd break-detection-1 &&
 
@@ -135,7 +138,7 @@ test_expect_failure 'conflict caused if rename not detected' '
)
 '
 
-test_expect_success 'setup conflict resolved wrong if rename missed' '
+test_setup_break_detection_2 () {
test_create_repo break-detection-2 &&
(
cd break-detection-2 &&
@@ -160,9 +163,10 @@ test_expect_success 'setup conflict resolved wrong if rename missed' '
git add a &&
git commit -m E
)
-'
+}
 
 test_expect_failure 'missed conflict if rename not detected' '
+   test_setup_break_detection_2 &&
(
cd break-detection-2 &&
 
@@ -182,7 +186,7 @@ test_expect_failure 'missed conflict if rename not detected' '
 #   Commit B: rename a->b
 #   Commit C: rename a->b, add unrelated a
 
-test_expect_success 'setup undetected rename/add-source causes data loss' '
+test_setup_break_detection_3 () {
test_create_repo break-detection-3 &&
(
cd break-detection-3 &

[PATCH v2 1/3] merge-recursive: clean up get_renamed_dir_portion()

2019-10-22 Thread Elijah Newren via GitGitGadget

From: Elijah Newren 

Dscho noted a few things making this function hard to follow.
Restructure it a bit and add comments to make it easier to follow.  The
restructurings include:

  * There was a special case if-check at the end of the function
checking whether someone just renamed a file within its original
directory, meaning that there could be no directory rename involved.
That check was slightly convoluted; it could be done in a more
straightforward fashion earlier in the function, and can be done
more cheaply too (no call to strncmp).

  * The conditions for advancing end_of_old and end_of_new before
calling strchr were both confusing and unnecessary.  If either
points at a '/', then they need to be advanced in order to find the
next '/'.  If either doesn't point at a '/', then advancing them one
char before calling strchr() doesn't hurt.  So, just rip out the
if conditions and advance both before calling strchr().

Signed-off-by: Elijah Newren 
---
 merge-recursive.c | 60 ---
 1 file changed, 36 insertions(+), 24 deletions(-)
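
Illustration only, not part of the patch: a minimal standalone sketch of
the restructured control flow described above, assuming relative paths
and using a hypothetical helper dup_prefix() in place of git's
xstrndup().  The name renamed_dir_portion_sketch is also hypothetical,
and the last step (advance one character, then search for the next '/')
follows the commit message rather than quoting the final code:

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for git's xstrndup(): copy the first n bytes of s. */
static char *dup_prefix(const char *s, size_t n)
{
    char *out = malloc(n + 1);
    if (!out)
        abort();
    memcpy(out, s, n);
    out[n] = '\0';
    return out;
}

/* Sketch only; assumes relative paths such as "a/b/file". */
static void renamed_dir_portion_sketch(const char *old_path,
                                       const char *new_path,
                                       char **old_dir, char **new_dir)
{
    const char *end_of_old, *end_of_new;

    /* Default return values: NULL, meaning no directory rename. */
    *old_dir = NULL;
    *new_dir = NULL;

    /* Drop the basenames; only the directory portions matter. */
    end_of_old = strrchr(old_path, '/');
    end_of_new = strrchr(new_path, '/');
    if (end_of_old == NULL || end_of_new == NULL)
        return;

    /* Find the first non-matching character traversing backwards. */
    while (*--end_of_new == *--end_of_old &&
           end_of_old != old_path &&
           end_of_new != new_path)
        ; /* Do nothing; all in the while loop */

    /* Both back at the start and still equal: same directory. */
    if (end_of_old == old_path && end_of_new == new_path &&
        *end_of_old == *end_of_new)
        return;

    /* Step forward one character, then stop at the NEXT '/'. */
    end_of_old = strchr(end_of_old + 1, '/');
    end_of_new = strchr(end_of_new + 1, '/');

    *old_dir = dup_prefix(old_path, end_of_old - old_path);
    *new_dir = dup_prefix(new_path, end_of_new - new_path);
}
```

For a/b/star/foo/whatever.c -> a/b/tar/foo/random.c this yields
"a/b/star" and "a/b/tar"; for a/b/x.c -> a/b/y.c both outputs stay NULL
because only the basename changed.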

diff --git a/merge-recursive.c b/merge-recursive.c
index 22a12cfeba..f80e48f623 100644
--- a/merge-recursive.c
+++ b/merge-recursive.c
@@ -1943,8 +1943,8 @@ static void get_renamed_dir_portion(const char *old_path, const char *new_path,
char **old_dir, char **new_dir)
 {
char *end_of_old, *end_of_new;
-   int old_len, new_len;
 
+   /* Default return values: NULL, meaning no rename */
*old_dir = NULL;
*new_dir = NULL;
 
@@ -1955,43 +1955,55 @@ static void get_renamed_dir_portion(const char *old_path, const char *new_path,
 *"a/b/c/d" was renamed to "a/b/some/thing/else"
 * so, for this example, this function returns "a/b/c/d" in
 * *old_dir and "a/b/some/thing/else" in *new_dir.
-*
-* Also, if the basename of the file changed, we don't care.  We
-* want to know which portion of the directory, if any, changed.
+*/
+
+   /*
+* If the basename of the file changed, we don't care.  We want
+* to know which portion of the directory, if any, changed.
 */
end_of_old = strrchr(old_path, '/');
end_of_new = strrchr(new_path, '/');
-
if (end_of_old == NULL || end_of_new == NULL)
-   return;
+   return; /* We haven't modified *old_dir or *new_dir yet. */
+
+   /* Find the first non-matching character traversing backwards */
while (*--end_of_new == *--end_of_old &&
   end_of_old != old_path &&
   end_of_new != new_path)
; /* Do nothing; all in the while loop */
+
/*
-* We've found the first non-matching character in the directory
-* paths.  That means the current directory we were comparing
-* represents the rename.  Move end_of_old and end_of_new back
-* to the full directory name.
+* If both got back to the beginning of their strings, then the
+* directory didn't change at all, only the basename did.
 */
-   if (*end_of_old == '/')
-   end_of_old++;
-   if (*end_of_old != '/')
-   end_of_new++;
-   end_of_old = strchr(end_of_old, '/');
-   end_of_new = strchr(end_of_new, '/');
+   if (end_of_old == old_path && end_of_new == new_path &&
+   *end_of_old == *end_of_new)
+   return; /* We haven't modified *old_dir or *new_dir yet. */
 
/*
-* It may have been the case that old_path and new_path were the same
-* directory all along.  Don't claim a rename if they're the same.
+* We've found the first non-matching character in the directory
+* paths.  That means the current characters we were looking at
+* were part of the first non-matching subdir name going back from
+* the end of the strings.  Get the whole name by advancing both
+* end_of_old and end_of_new to the NEXT '/' character.  That will
+* represent the entire directory rename.
+*
+* The reason for the increment is cases like
+*a/b/star/foo/whatever.c -> a/b/tar/foo/random.c
+* After dropping the basename and going back to the first
+* non-matching character, we're now comparing:
+*a/b/s  and a/b/
+* and we want to be comparing:
+*a/b/star/  and a/b/tar/
+* but without the pre-increment, the one on the right would stay
+* a/b/.
 */
-   old_len = end_of_old - old_path;
-   new_len = end_of_new - new_path;
+   end_of_old = strchr(

Re: [PATCH 2/2] merge-recursive: fix merging a subdirectory into the root directory

2019-10-22 Thread Elijah Newren

Sorry for the long delay before getting back to this; the other stuff
I was working on took longer than expected.

On Mon, Oct 14, 2019 at 3:42 AM Johannes Schindelin
 wrote:
> On Sat, 12 Oct 2019, Elijah Newren wrote:
> > On Sat, Oct 12, 2019 at 1:37 PM Johannes Schindelin
> >  wrote:
> > >
> > > For the record: I am still a huge anti-fan of splitting `setup` test
> > > cases from the test cases that do actual things, _unless_ it is
> > > _one_, and _only one_, big, honking `setup` test case that is the
> > > very first one in the test script.
[...]
> > The one thing I do agree with you on is test cases need to be
> > optimized for when they report breakages, but that is precisely what
> > led me to splitting setup and testing.
>
> To me, it is so not helpful _not_ to see the output of a `setup` that
> succeeded, and only the output of the actual test that actually failed.
>
> It removes context.
>
> I need to understand the scenario where the breakage happens, and the
> only way I can understand is when I understand the context.
>
> So the context needs to be as close as possible.

I've updated the patch series with a change that I hope helps while
still allowing the setup "steps" to be visibly differentiated from the
testing steps.

> > Way too many tests in the testsuite intersperse several setup and test
> > cases together making it really hard to disentangle, understand what
> > is going on, or even reverse engineer what is relevant.  The absolute
> > worst tests are the ones which just keep making additional changes to
> > some existing repo to provide extra setup, causing all sorts of
> > problems for skipping and resuming and understanding (to say nothing
> > of test prerequisites that aren't always met).
>
> I agree with this sentiment, and have to point out that this is yet
> another fallout of the way our test suite is implemented. If you look
> e.g. at JUnit, there are no "setup test cases". There are specific setup
> steps that you can define, there is even a teardown step you can define,
> and those individual test cases? They can run in parallel, or
> randomized, and they run in their own sandbox, to make sure that they
> don't rely on side effects of unrelated test cases.
>
> We don't do that. We don't enforce the absence of side effects, and
> therefore we have a megaton of them.
>
> But note how this actually speaks _against_ separating the setup from
> the test? Because then you _explicitly_ want those test cases to rely on
> one another. Which flies in the _face_ of trying to disentangle them.

I agree that it is desirable to avoid side effects in the tests, but
I'd like to point out that I'm not at all sure that your conclusion
here is the only logical one to draw here in comparing to JUnit.  As
you point out, JUnit has clearly delineated setup steps for a test (as
well as teardown steps), providing a place to keep them separate.  Our
testsuite lacks that, so how do folks try to get it?  One logical way
would be just inlining the setup steps in the test outside a
test_expect_* block (which has been done in the past), but that has
even worse problems.  Another way, even if suboptimal, is placing
those steps in their own test_expect_* block.  You say just throw the
setup and test together, but that breaks the separation.

I think it's a case of the testsuite not providing the right
abstractions and enough capability, leaving us to argue over which
aspects of a more featureful test harness are most important to
emulate.  You clearly picked one, while I was focusing on another.
Anyway, all that said, I think I have a nice compromise that I'll send
out with V2.

[...]
> > > Makes sense, but the part that I am missing is
> > >
> > > test_path_is_file bar.c.t
> > >
> > > I.e. the _most_ important outcome of this test is: the rename was
> > > detected, and the added file was correctly placed into the target
> > > directory of the rename.
> >
> > That's a useful thing to add to the test, I'll add it.  (It's kind of
> > included in the 'git hash-object bar.c.t' half a dozen or so lines
> > above, but this line helps document the expectation a bit better.)
> >
> > I'd be very reticent to include only this test_path_is_file check, as
> > it makes no guarantee that it has the right contents or that we didn't
> > also keep around another copy in a/b/bar.c.t, etc.
>
> I agree that it has to strike a balance. There are multiple aspects you
> need to consider:
>
> - It needs to be easy to understand what the test case tries to ensure.
>
> - While it is important

Re: [ANNOUNCE] Git v2.24.0-rc0

2019-10-21 Thread Elijah Newren

 On Mon, Oct 21, 2019 at 1:50 PM Derrick Stolee  wrote:
> I ran a few of the performance tests against the Linux repository
> using v2.22.0, v2.23.0, and the new v2.24.0-rc0. I thought it worth
> pointing out that the drastic performance improvements are due to
> turning on the commit-graph by default. I had computed a commit-graph
> for my Linux repo, but used my global config to enable core.commitGraph.
> The global config is ignored by perf tests, so v2.22.0 and v2.23.0 were
> operating without looking at the commit-graph.
>
> (These were run on my old dev machine, which is now running Ubuntu on
> bare metal. No VM this time!)
>
> Test                                          v2.22.0              v2.23.0                    v2.24.0-rc0
> ----------------------------------------------------------------------------------------------------------------------
> 0001.1: rev-list --all                        6.01(5.73+0.28)      5.99(5.73+0.25) -0.3%      0.97(0.80+0.16) -83.9%
> 0001.2: rev-list --all --objects              40.40(39.86+0.54)    40.22(39.59+0.62) -0.4%    35.28(34.75+0.52) -12.7%
> 0001.3: rev-list --parents                    6.11(5.83+0.27)      6.07(5.82+0.25) -0.7%      1.03(0.86+0.16) -83.1%
> 0001.5: rev-list -- dummy                     0.64(0.58+0.06)      0.66(0.59+0.07) +3.1%      0.34(0.29+0.05) -46.9%
> 0001.6: rev-list --parents -- dummy           0.66(0.60+0.05)      0.67(0.62+0.05) +1.5%      0.36(0.32+0.03) -45.5%
[...]
> 4211.2: git rev-list --topo-order (baseline)  6.32(6.04+0.28)      6.30(6.09+0.21) -0.3%      1.15(0.96+0.19) -81.8%
> 4211.3: git log --follow (baseline for -M)    8.58(8.43+0.14)      8.56(8.41+0.15) -0.2%      3.67(3.53+0.13) -57.2%
> 4211.4: git log -L (renames off)              32.79(30.68+2.10)    32.80(30.69+2.11) +0.0%    27.17(25.24+1.93) -17.1%
> 4211.5: git log -L (renames on)               212.64(210.39+2.24)  213.48(211.26+2.20) +0.4%  27.38(25.53+1.84) -87.1%

Many nice speedups here, not just commit-graph (the rev-list cases)
but also log -L (from sg/line-log-tree-diff-optim, I believe), and log
--follow.  I'm curious if the log --follow speedup comes from sg's
series or something else...

> 0001.9: rev-list --objects $commit --not --all  0.08(0.05+0.03)  0.08(0.05+0.03) +0.0%  0.09(0.07+0.02) +12.5%

Looks like this one increased too, with a similar magnitude to the
7300.2 you pointed out.  But the base is kinda small; is this just
noise?

> The tests below are some that I don't run very often, but seemed
> interesting. Interesting that rebase got a lot faster!
>
> Test                                                           v2.22.0            v2.23.0                  v2.24.0-rc0
> -------------------------------------------------------------------------------------------------------------------------------
> 3400.2: rebase on top of a lot of unrelated changes            18.86(17.80+1.71)  18.80(17.80+1.66) -0.3%  2.63(2.49+0.79) -86.1%
> 3400.4: rebase a lot of unrelated changes without split-index  68.00(62.32+5.04)  68.50(62.34+5.30) +0.7%  45.25(41.37+4.18) -33.5%
> 3400.6: rebase a lot of unrelated changes with split-index     46.39(44.89+2.19)  46.24(44.66+2.30) -0.3%  25.00(24.49+1.23) -46.1%

I'm also curious what change it was that made these rebase tests faster.

> 7300.2: clean many untracked sub dirs, check for nested git  1.36(0.54+0.81)  1.35(0.51+0.82) -0.7%  1.53(0.62+0.90) +12.5%
[...]
> Any thoughts on 7300.2? Seems to not just be noise, or maybe it is?

Well, en/clean-nested-with-ignored is very likely the cause of any
performance difference here, but given the nasty bug it was fixing
(see sg/clean-nested-repo-with-ignored topic), the performance change
is totally warranted if necessary for the fix.  And it looks like that
test is exercising one of the areas of logic that my series was
modifying (namely the clean -fd case in conjunction with the
possibility of nested .git dirs).

That's enough for me to accept the performance change.  If someone
else wants to dig a little further to determine whether this perf
change was part of the important fix or just due to a separate change,
I'll provide a few pointers.  Assuming it's one of my commits, I think
it has to be one of the following three:

404ebceda01c ("dir: also check directories for matching pathspecs",
2019-09-17): if this one causes the perf change, I think we just suck
it up.

89a1f4aaf765 ("dir: if our pathspec might match files under a dir,
recurse into it", 2019-09-17): if this one causes the perf change, we
might be able to do something by somehow rearranging the if-block
logic.  Checking bits is going to be faster than calling the
get_dtype() f

Re: [PATCH v4 00/17] New sparse-checkout builtin and "cone" mode

2019-10-16 Thread Elijah Newren

On Tue, Oct 15, 2019 at 6:56 AM Derrick Stolee via GitGitGadget wrote:
> Updates in V4:
>
>  * Updated hashmap API usage to respond to ew/hashmap
>
>
>  * Responded to detailed review by Elijah. Thanks!
>
>
>  * Marked the feature as experimental in git-sparse-checkout.txt the same
>way that git-switch.txt does.

I read through the range-diff, and it all looks good to me other than
one issue I flagged on patch 1.

Nice work!

Re: [PATCH v4 01/17] sparse-checkout: create builtin with 'list' subcommand

2019-10-16 Thread Elijah Newren

On Tue, Oct 15, 2019 at 6:56 AM Derrick Stolee via GitGitGadget wrote:
> +DESCRIPTION
> +---
> +
> +Initialize and modify the sparse-checkout configuration, which reduces
> +the checkout to a set of directories given by a list of prefixes.
> +
> +THIS COMMAND IS EXPERIMENTAL. THE BEHAVIOR MAY CHANGE.

I think the wording needs to be a bit more detailed; you copied the
wording from git-switch.txt, but usage of git-switch is not expected
to modify the behavior of other commands.  sparse-checkout, by
contrast, is designed to affect other commands: at the very least
checkout & switch, and likely will affect grep, diff, log, and a host
of others.  Perhaps something like:

THIS COMMAND IS EXPERIMENTAL.  ITS BEHAVIOR, AND THE BEHAVIOR OF OTHER
COMMANDS IN THE PRESENCE OF SPARSE-CHECKOUTS, WILL LIKELY CHANGE IN
THE FUTURE.

Re: What's cooking in git.git (Oct 2019, #04; Tue, 15)

2019-10-15 Thread Elijah Newren

On Tue, Oct 15, 2019 at 6:25 PM Junio C Hamano  wrote:
>
> Elijah Newren  writes:
>
> >> * en/merge-recursive-directory-rename-fixes (2019-10-12) 2 commits
> >>   (merged to 'next' on 2019-10-15 at ebfdc3ff7b)
> >>  + merge-recursive: fix merging a subdirectory into the root directory
> >>  + merge-recursive: clean up get_renamed_dir_portion()
> >>
> >>  A few glitches in the heuristic in merge-recursive to infer file
> >>  movements based on movements of other files in the same directory
> >>  have been corrected.
> >>
> >>  Will merge to 'master'.
> >
> > I'm surprised this one was merged straight down to next; perhaps I
> > should have highlighted my plans a bit clearer in the thread?
>
> My mistake.  I am willing to revert the merge to give the topic a
> clean slate.  Just tell me so.

Yeah, let's revert it.

> > Also, a very minor point but "glitches" may be misleading; it suggests
> > (to me at least) a malfunction rather than a failure to trigger,...
>
> I used the word to mean a failure to trigger (after all, a heuristic
> that fails to trigger when most people would naturally expect it to
> is showing a glitch in that case).  A better phrasing, please?

Oh, I guess I just had a different connotation for glitch.  I guess
what you had is fine then, but alternatively we could spell it out
just a little more:

When all files from some subdirectory were renamed to the root
directory, the directory rename heuristics would fail to detect that
as a rename/merge of the subdirectory to the root directory, which has
been corrected.
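
To make the case concrete, here is a rough, self-contained C sketch of
the idea behind inferring a directory rename from a single file rename.
It is not the merge-recursive.c code: sketch_renamed_dir() and its two
helpers are names made up for this illustration, and the fixed-size
output buffers are an assumption to keep it short.  It strips the
trailing directory components that the old and new paths share and
reports what is left as the renamed directory; an empty new directory
means the subdirectory was merged into the root.

```c
#include <stddef.h>
#include <string.h>

/* Length of the directory part of path (0 if the path has no '/'). */
static size_t dir_len(const char *path)
{
	const char *slash = strrchr(path, '/');
	return slash ? (size_t)(slash - path) : 0;
}

/* Offset at which the last component of path[0..len) starts. */
static size_t last_component_start(const char *path, size_t len)
{
	while (len > 0 && path[len - 1] != '/')
		len--;
	return len;
}

/*
 * Hypothetical helper: returns 1 and fills old_dir/new_dir if the
 * rename old_path -> new_path implies a directory rename, else 0.
 * new_dir is "" when the directory was moved into the root.
 */
int sketch_renamed_dir(const char *old_path, const char *new_path,
		       char *old_dir, char *new_dir, size_t bufsize)
{
	size_t old_len = dir_len(old_path);
	size_t new_len = dir_len(new_path);

	/* Drop trailing directory components that both paths share. */
	while (old_len > 0 && new_len > 0) {
		size_t os = last_component_start(old_path, old_len);
		size_t ns = last_component_start(new_path, new_len);

		if (old_len - os != new_len - ns ||
		    memcmp(old_path + os, new_path + ns, old_len - os))
			break;
		/* Also drop the '/' in front of the component, if any. */
		old_len = os ? os - 1 : 0;
		new_len = ns ? ns - 1 : 0;
	}

	/* An empty old directory would mean "the root was renamed". */
	if (old_len == 0 || old_len >= bufsize || new_len >= bufsize)
		return 0;

	memcpy(old_dir, old_path, old_len);
	old_dir[old_len] = '\0';
	memcpy(new_dir, new_path, new_len);
	new_dir[new_len] = '\0';
	return 1;
}
```

With this sketch, a/b/foo.c renamed to foo.c reports a/b as the old
directory and the empty string as the new one, while a file that starts
out in the root never yields a directory rename.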

Re: What's cooking in git.git (Oct 2019, #04; Tue, 15)

2019-10-15 Thread Elijah Newren

On Tue, Oct 15, 2019 at 10:39 AM Johannes Schindelin wrote:
>
> Hi Elijah,
>
> On Tue, 15 Oct 2019, Elijah Newren wrote:
>
> > On Tue, Oct 15, 2019 at 2:04 AM Junio C Hamano  wrote:
> > > * en/fast-imexport-nested-tags (2019-10-04) 8 commits
> > >   (merged to 'next' on 2019-10-07 at 3e75779e10)
> > >  + fast-export: handle nested tags
> > >  + t9350: add tests for tags of things other than a commit
> > >  + fast-export: allow user to request tags be marked with --mark-tags
> > >  + fast-export: add support for --import-marks-if-exists
> > >  + fast-import: add support for new 'alias' command
> > >  + fast-import: allow tags to be identified by mark labels
> > >  + fast-import: fix handling of deleted tags
> > >  + fast-export: fix exporting a tag and nothing else
> > >
> > >  Updates to fast-import/export.
> >
> > Thanks!
> >
> > > * en/merge-recursive-directory-rename-fixes (2019-10-12) 2 commits
> > >   (merged to 'next' on 2019-10-15 at ebfdc3ff7b)
> > >  + merge-recursive: fix merging a subdirectory into the root directory
> > >  + merge-recursive: clean up get_renamed_dir_portion()
> > >
> > >  A few glitches in the heuristic in merge-recursive to infer file
> > >  movements based on movements of other files in the same directory
> > >  have been corrected.
> > >
> > >  Will merge to 'master'.
> >
> > I'm surprised this one was merged straight down to next; perhaps I
> > should have highlighted my plans a bit clearer in the thread?  I did
> > mention (at the end of an email) at [1], that
> >
> > "Oh, and I think there's another place in the code that needs to be
> > tweaked to make sure we handle renaming subdirectories into the root
> > directory that I missed (and just wasn't tested by this testcase), so
> > I'll check into it and if so fix the code and add another testcase,
> > and include the fixups I agreed to above and send out a v2.  Probably
> > won't get to it until the middle of next week, though."
> >
> > So, I guess I'll submit a fixup patch on top instead, either later
> > today or tomorrow.
> >
> > Also, a very minor point but "glitches" may be misleading; it suggests
> > (to me at least) a malfunction rather than a failure to trigger, and
> > it's really only the special case of renaming/merging of a directory
> > into the root directory that the previous heuristics failed to detect.
> > The rest of the fixes were make-the-code-clearer (there were a couple
> > places in the code that were technically correct but quite misleading
> > and hard to reason about).
>
> I also offered several comments that the regression tests could be
> condensed into easier-to-understand ones.

Part of that was (obliquely) referenced in my above quote ("the fixups
I agreed to above"), and I was also going to respond to your follow up
and add a few more changes based on it.

I thought I'd be doing that while this series was still in pu, and
that this series would probably not make it into 2.24.0, but since
Junio quickly merged this down to next and says he plans to merge
down, I'm thinking right now it may make more sense to make a minimal
change to what he has merged down to get the functionality right, and
then start a new topic that addresses testcase restructuring overhauls
of t6042, t6043, and t6046 -- restructurings that we'll probably
continue to argue about but may be able to find some common ground on.

I'll respond in more detail on the testcase restructuring stuff in a
few days after I get a few other things out of the way.

> Ciao,
> Dscho
>
> > [1] 
> > https://public-inbox.org/git/CABPp-BFNCLJnt4NgFKVxURBGD1Z00gastc5q4ZPjcHmwS=k...@mail.gmail.com/
> >

Re: What's cooking in git.git (Oct 2019, #04; Tue, 15)

2019-10-15 Thread Elijah Newren

On Tue, Oct 15, 2019 at 2:04 AM Junio C Hamano  wrote:
> * en/fast-imexport-nested-tags (2019-10-04) 8 commits
>   (merged to 'next' on 2019-10-07 at 3e75779e10)
>  + fast-export: handle nested tags
>  + t9350: add tests for tags of things other than a commit
>  + fast-export: allow user to request tags be marked with --mark-tags
>  + fast-export: add support for --import-marks-if-exists
>  + fast-import: add support for new 'alias' command
>  + fast-import: allow tags to be identified by mark labels
>  + fast-import: fix handling of deleted tags
>  + fast-export: fix exporting a tag and nothing else
>
>  Updates to fast-import/export.

Thanks!

> * en/merge-recursive-directory-rename-fixes (2019-10-12) 2 commits
>   (merged to 'next' on 2019-10-15 at ebfdc3ff7b)
>  + merge-recursive: fix merging a subdirectory into the root directory
>  + merge-recursive: clean up get_renamed_dir_portion()
>
>  A few glitches in the heuristic in merge-recursive to infer file
>  movements based on movements of other files in the same directory
>  have been corrected.
>
>  Will merge to 'master'.

I'm surprised this one was merged straight down to next; perhaps I
should have highlighted my plans a bit clearer in the thread?  I did
mention (at the end of an email) at [1], that

"Oh, and I think there's another place in the code that needs to be
tweaked to make sure we handle renaming subdirectories into the root
directory that I missed (and just wasn't tested by this testcase), so
I'll check into it and if so fix the code and add another testcase,
and include the fixups I agreed to above and send out a v2.  Probably
won't get to it until the middle of next week, though."

So, I guess I'll submit a fixup patch on top instead, either later
today or tomorrow.

Also, a very minor point but "glitches" may be misleading; it suggests
(to me at least) a malfunction rather than a failure to trigger, and
it's really only the special case of renaming/merging of a directory
into the root directory that the previous heuristics failed to detect.
The rest of the fixes were make-the-code-clearer (there were a couple
places in the code that were technically correct but quite misleading
and hard to reason about).

[1] 
https://public-inbox.org/git/CABPp-BFNCLJnt4NgFKVxURBGD1Z00gastc5q4ZPjcHmwS=k...@mail.gmail.com/

Re: [PATCH 2/2] merge-recursive: fix merging a subdirectory into the root directory

2019-10-12 Thread Elijah Newren

Hi Dscho,

Thanks for the reviews!

On Sat, Oct 12, 2019 at 1:37 PM Johannes Schindelin wrote:
> On Fri, 11 Oct 2019, Elijah Newren via GitGitGadget wrote:
>
[...]
> > @@ -1980,6 +1990,25 @@ static void get_renamed_dir_portion(const char 
> > *old_path, const char *new_path,
> >   *end_of_old == *end_of_new)
> >   return; /* We haven't modified *old_dir or *new_dir yet. */
> >
> > + /*
> > +  * If end_of_new got back to the beginning of its string, and
> > +  * end_of_old got back to the beginning of some subdirectory, then
> > +  * we have a rename/merge of a subdirectory into the root, which
> > +  * needs slightly special handling.
> > +  *
> > +  * Note: There is no need to consider the opposite case, with a
> > +  * rename/merge of the root directory into some subdirectory.
> > +  * Our rule elsewhere that a directory which still exists is not
> > +  * considered to have been renamed means the root directory can
> > +  * never be renamed (because the root directory always exists).
> > +  */
> > + if (end_of_new == new_path &&
> > + end_of_old != old_path && end_of_old[-1] == '/') {
> > + *old_dir = xstrndup(old_path, end_of_old-1 - old_path);
> > + *new_dir = xstrndup(new_path, end_of_new - new_path);
>
> However, here we write something convoluted that essentially amounts to
> `xstrdup("")`. I would rather have that simple call than the convoluted
> one that would puzzle me every time I have to look at this part of the
> code.

Makes sense; I can switch it over.

>
> While at it, would you mind either surrounding the `-` and the `1` by
> spaces, or even write `--end_of_old - old_path`?

Sounds good to me; I'll make the change.

> > diff --git a/t/t6043-merge-rename-directories.sh 
> > b/t/t6043-merge-rename-directories.sh
> > index c966147d5d..b920bb0850 100755
> > --- a/t/t6043-merge-rename-directories.sh
> > +++ b/t/t6043-merge-rename-directories.sh
> > @@ -4051,6 +4051,62 @@ test_expect_success '12c-check: Moving one directory 
> > hierarchy into another w/ c
> >   )
> >  '
> >
> > +# Testcase 12d, Rename/merge of subdirectory into the root
> > +#   Commit O: a/b/{foo.c}
> > +#   Commit A: foo.c
> > +#   Commit B: a/b/{foo.c,bar.c}
> > +#   Expected: a/b/{foo.c,bar.c}

Note the nice explanation of the testcase setup at the beginning of
every test within this file...

> > +
> > +test_expect_success '12d-setup: Rename (merge) of subdirectory into the 
> > root' '
> > + test_create_repo 12d &&
> > + (
> > + cd 12d &&
> > +
> > + mkdir -p a/b/subdir &&
> > + test_commit a/b/subdir/foo.c &&
>
> Why `.c`? That's a little distracting.

I can toss it.

> > +
> > + git branch O &&
>
> Might be simpler just to use `master` subsequently and not "waste" a new
> ref on that.

I could do so, but then this makes the testcase description comment
earlier harder to read comparing "master", "A", and "B".  Having the
same length simplifies it a bit, and the triple of O, A, and B are
also used quite a bit in merge-recursive.c (e.g. in process_entry()
and merge_threeway() and other places).

Also, behaving differently for this test than the other 50+ tests in
the testfile would break the comment at the beginning of t6043 which
explains how *every* test in the file is of a certain form, using O,
A, and B.

>
> > + git branch A &&
>
> Might make more sense to create it below, via the `-b` option of `git
> checkout`.
>
> Or, for extra brownie points, via the `-c` option of `git switch`.
>
> > + git branch B &&
>
> Likewise, this might want to be created below, via replacing `git
> checkout B` with `git switch -c B master`.

I'm not sure I see why it'd be beneficial to switch this, though in
isolation I also don't see any drawbacks with your suggestion either.
It looks entirely reasonable, so I'd probably just do it if it weren't
for the fact that there are four dozen or so other tests in the same
file that already do it this way.  I'd rather keep the file internally
consistent, and there's a bit too much inertia for me to want to
switch all the tests over...unless you can provide a reason to
strongly prefer one style over the other?

> > +
> > + git checkout A &&
> > + mkdir subdir &&
> > +

Re: [PATCH v3 00/17] New sparse-checkout builtin and "cone" mode

2019-10-12 Thread Elijah Newren

On Mon, Oct 7, 2019 at 1:08 PM Derrick Stolee via GitGitGadget wrote:
>
> This series makes the sparse-checkout feature more user-friendly. While
> there, I also present a way to use a limited set of patterns to gain a
> significant performance boost in very large repositories.
>
> Sparse-checkout is only documented as a subsection of the read-tree docs
> [1], which makes the feature hard to discover. Users have trouble navigating
> the feature, especially at clone time [2], and have even resorted to
> creating their own helper tools [3].
>
> This series attempts to solve these problems using a new builtin. Here is a
> sample workflow to give a feeling for how it can work:
>
> In an existing repo:
>
> $ git sparse-checkout init
> $ ls
> myFile1.txt myFile2.txt
> $ git sparse-checkout set "/*" "!/*/" /myFolder/
> $ ls
> myFile1.txt myFile2.txt myFolder
> $ ls myFolder
> a.c a.h
> $ git sparse-checkout disable
> $ ls
> hiddenFolder myFile1.txt myFile2.txt myFolder
>
> At clone time:
>
> $ git clone --sparse origin repo
> $ cd repo
> $ ls
> myFile1.txt myFile2.txt
> $ git sparse-checkout set "/*" "!/*/" /myFolder/
> $ ls
> myFile1.txt myFile2.txt myFolder
>
> Here are some more specific details:
>
>  * git sparse-checkout init enables core.sparseCheckout and populates the
>sparse-checkout file with patterns that match only the files at root.
>
>
>  * git clone learns the --sparse argument to run git sparse-checkout init
>before the first checkout.
>
>
>  * git sparse-checkout set reads patterns from the arguments, or with
>--stdin reads patterns from stdin one per line, then writes them to the
>sparse-checkout file and refreshes the working directory.
>
>
>  * git sparse-checkout disable removes the patterns from the sparse-checkout
>file, disables core.sparseCheckout, and refills the working directory.
>
>
>  * git sparse-checkout list lists the contents of the sparse-checkout file.
>
>
>
> The documentation for the sparse-checkout feature can now live primarily
> with the git-sparse-checkout documentation.
>
> Cone Mode
> =
>
> What really got me interested in this area is a performance problem. If we
> have N patterns in the sparse-checkout file and M entries in the index, then
> we can perform up to O(N * M) pattern checks in clear_ce_flags(). This
> quadratic growth is not sustainable in a repo with 1,000+ patterns and
> 1,000,000+ index entries.
>
> To solve this problem, I propose a new, more restrictive mode to
> sparse-checkout: "cone mode". In this mode, all patterns are based on prefix
> matches at a directory level. This can then use hashsets for fast
> performance -- O(M) instead of O(N*M). My hashset implementation is based on
> the virtual filesystem hook in the VFS for Git custom code [4].
>
> In cone mode, a user specifies a list of folders which the user wants every
> file inside. In addition, the cone adds all blobs that are siblings of the
> folders in the directory path to that folder. This makes the directories
> look "hydrated" as a user drills down to those recursively-closed folders.
> These directories are called "parent" folders, as a file matches them only
> if the file's immediate parent is that directory.
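
As a rough illustration of the matching rule described above (and not
the code from this series), a cone-mode check could look like the C
sketch below.  sketch_cone_match() and set_has() are hypothetical
names, and the linear lookup in set_has() merely stands in for the
hashsets mentioned above to keep the example short; a real
implementation would want a constant-time lookup there.

```c
#include <stddef.h>
#include <string.h>

/* Stand-in for a hashset lookup: is s[0..len) a member of set? */
static int set_has(const char *const *set, size_t nr,
		   const char *s, size_t len)
{
	size_t i;

	for (i = 0; i < nr; i++)
		if (strlen(set[i]) == len && !memcmp(set[i], s, len))
			return 1;
	return 0;
}

/*
 * Hypothetical helper: returns 1 if path is inside the cone.
 * "recursive" holds the recursively-closed folders, "parents" holds
 * the folders on the way down to them ("" is the root).
 */
int sketch_cone_match(const char *path,
		      const char *const *recursive, size_t nr_recursive,
		      const char *const *parents, size_t nr_parents)
{
	const char *slash = strrchr(path, '/');
	size_t len = slash ? (size_t)(slash - path) : 0;

	/* The file's immediate parent is a "parent" folder. */
	if (set_has(parents, nr_parents, path, len))
		return 1;

	/* Otherwise some ancestor must be recursively included. */
	while (len > 0) {
		if (set_has(recursive, nr_recursive, path, len))
			return 1;
		/* Step up one directory level. */
		while (len > 0 && path[len - 1] != '/')
			len--;
		if (len > 0)
			len--; /* drop the '/' itself */
	}
	return 0;
}
```

With A/B/C as the only recursive folder, the parent set would be the
root, A, and A/B; then A/x.txt and A/B/C/d/e.c match, but A/D/x.txt
does not.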
>
> When building a prototype of this feature, I used a separate file to contain
> the list of recursively-closed folders and built the hashsets dynamically
> based on that file. In this implementation, I tried to maximize the amount
> of backwards-compatibility by storing all data in the sparse-checkout file
> using patterns recognized by earlier Git versions.
>
> For example, if we add A/B/C as a recursive folder, then we add the
> following patterns to the sparse-checkout file:
>
> /*
> !/*/
> /A/
> !/A/*/
> /A/B/
> !/A/B/*/
> /A/B/C/
>
> The alternating positive/negative patterns say "include everything in this
> folder, but exclude everything another level deeper". The final pattern has
> no matching negation, so is a recursively closed pattern.
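
The expansion above is mechanical enough to sketch in a few lines of C.
This is only an illustration under stated assumptions, not the
write_cone_to_file() from the series: sketch_cone_patterns() is a
made-up name, the folder is assumed to have no leading or trailing
slash and no empty components, and the caller supplies the output
buffer.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: write the sparse-checkout patterns for one
 * recursively-closed folder into out.  Returns 0 on success and -1
 * if the buffer is too small.
 */
int sketch_cone_patterns(const char *folder, char *out, size_t outsize)
{
	size_t len = strlen(folder);
	size_t used;
	size_t i;
	int n;

	/* Everything directly in the root, but no subdirectories. */
	n = snprintf(out, outsize, "/*\n!/*/\n");
	if (n < 0 || (size_t)n >= outsize)
		return -1;
	used = (size_t)n;

	for (i = 1; i <= len; i++) {
		/* Only act on component boundaries and the full path. */
		if (i < len && folder[i] != '/')
			continue;

		if (i < len)
			/* Parent folder: include it, exclude one level deeper. */
			n = snprintf(out + used, outsize - used,
				     "/%.*s/\n!/%.*s/*/\n",
				     (int)i, folder, (int)i, folder);
		else
			/* Final folder: no negation, so it is recursive. */
			n = snprintf(out + used, outsize - used,
				     "/%.*s/\n", (int)i, folder);
		if (n < 0 || (size_t)n >= outsize - used)
			return -1;
		used += (size_t)n;
	}
	return 0;
}
```

Calling it with "A/B/C" produces exactly the seven patterns listed
above, one per line.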
>
> Note that I have some basic warnings to try and check that the
> sparse-checkout file doesn't match what would be written by a cone-mode add.
> In such a case, Git writes a warning to stderr and continues with the old
> pattern matching algorithm. These checks are currently very barebones, and
> would need to be updated with more robust checks for things like regex
> characters in the middle of the pattern. As review moves forward (and if we
> don't change the data storage) then we could spend more time on this.
>
> Thanks, -Stolee
>
> Updates in v2, relative to the RFC:
>
>  * Instead of an 'add' subcommand, use a 'set' subcommand. We can consider
>adding 'add' and/or 'remove' subcommands later.
>
>
>  * 'set' reads from the arguments by default. '--stdin' option is available.
>
>
>  * A new performance-oriented commit is added at the end.
>
>
>  * Patterns no longer end with a trailing asterisk except for the first "/*"
>pattern.
>
>
>  * References to a "bug" (that was really a strange GVF

Re: [PATCH v3 17/17] sparse-checkout: cone mode should not interact with .gitignore

2019-10-12 Thread Elijah Newren

On Mon, Oct 7, 2019 at 1:08 PM Derrick Stolee via GitGitGadget wrote:
>
> From: Derrick Stolee 
>
> During the development of the sparse-checkout "cone mode" feature,
> an incorrect placement of the initializer for "use_cone_patterns = 1"
> caused warnings to show up when a .gitignore file was present with
> non-cone-mode patterns. This was fixed in the original commit
> introducing the cone mode, but now we should add a test to avoid
> hitting this problem again in the future.
>
> Signed-off-by: Derrick Stolee 
> ---
>  t/t1091-sparse-checkout-builtin.sh | 7 +++
>  1 file changed, 7 insertions(+)
>
> diff --git a/t/t1091-sparse-checkout-builtin.sh 
> b/t/t1091-sparse-checkout-builtin.sh
> index f22a4afbea..ed9355384a 100755
> --- a/t/t1091-sparse-checkout-builtin.sh
> +++ b/t/t1091-sparse-checkout-builtin.sh
> @@ -269,4 +269,11 @@ test_expect_success 'fail when lock is taken' '
> test_i18ngrep "File exists" err
>  '
>
> +test_expect_success '.gitignore should not warn about cone mode' '
> +   git -C repo config --worktree core.sparseCheckoutCone true &&
> +   echo "**/bin/*" >repo/.gitignore &&
> +   git -C repo reset --hard 2>err &&
> +   test_i18ngrep ! "disabling cone patterns" err
> +'
> +
>  test_done
> --

Makes sense; thanks for adding good preventative tests.

Re: [PATCH v3 16/17] sparse-checkout: write using lockfile

2019-10-12 Thread Elijah Newren

On Mon, Oct 7, 2019 at 1:08 PM Derrick Stolee via GitGitGadget wrote:
>
> From: Derrick Stolee 
>
> If two 'git sparse-checkout set' subcommands are launched at the
> same time, the behavior can be unexpected as they compete to write
> the sparse-checkout file and update the working directory.
>
> Take a lockfile around the writes to the sparse-checkout file. In
> addition, acquire this lock around the working directory update
> to avoid two commands updating the working directory in different
> ways.

Wow, there's something I never would have thought to check.  Did you
have folks run into this, or is this just some defensive programming?
Either way, I'm impressed.
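
For anyone unfamiliar with the pattern, the general lockfile technique
can be sketched in plain POSIX C as below.  This is an illustration
under assumptions, not git's lockfile API: sketch_locked_write() is a
made-up helper, the ".lock" suffix mirrors the sparse-checkout.lock
name used in the test, and a short write is simply treated as failure.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Hypothetical helper: replace path with contents, using "<path>.lock"
 * as the lock.  Returns 0 on success and -1 on failure; errno is
 * EEXIST when another writer already holds the lock.
 */
int sketch_locked_write(const char *path, const char *contents)
{
	char lock_path[4096];
	size_t len = strlen(contents);
	int fd;
	int n;

	n = snprintf(lock_path, sizeof(lock_path), "%s.lock", path);
	if (n < 0 || (size_t)n >= sizeof(lock_path))
		return -1;

	/* O_EXCL makes creation fail if the lock file already exists. */
	fd = open(lock_path, O_WRONLY | O_CREAT | O_EXCL, 0666);
	if (fd < 0)
		return -1;

	if (write(fd, contents, len) != (ssize_t)len) {
		/* Roll back: drop the lock without touching path. */
		close(fd);
		unlink(lock_path);
		return -1;
	}

	/* Commit: renaming the lock over path publishes the new file. */
	if (close(fd) < 0 || rename(lock_path, path) < 0) {
		unlink(lock_path);
		return -1;
	}
	return 0;
}
```

Because the new contents only become visible through the final
rename(), a second writer either fails up front on the existing lock
file or sees a complete file, never a half-written one.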

>
> Signed-off-by: Derrick Stolee 
> ---
>  builtin/sparse-checkout.c  | 15 ---
>  t/t1091-sparse-checkout-builtin.sh |  7 +++
>  2 files changed, 19 insertions(+), 3 deletions(-)
>
> diff --git a/builtin/sparse-checkout.c b/builtin/sparse-checkout.c
> index 542d57fac6..9b313093cd 100644
> --- a/builtin/sparse-checkout.c
> +++ b/builtin/sparse-checkout.c
> @@ -308,6 +308,8 @@ static int write_patterns_and_update(struct pattern_list 
> *pl)
>  {
> char *sparse_filename;
> FILE *fp;
> +   int fd;
> +   struct lock_file lk = LOCK_INIT;
> int result;
>
> if (!core_apply_sparse_checkout) {
> @@ -317,21 +319,28 @@ static int write_patterns_and_update(struct 
> pattern_list *pl)
>
> result = update_working_directory(pl);
>
> +   sparse_filename = get_sparse_checkout_filename();
> +   fd = hold_lock_file_for_update(&lk, sparse_filename,
> + LOCK_DIE_ON_ERROR);
> +
> +   result = update_working_directory(pl);
> if (result) {
> +   rollback_lock_file(&lk);
> +   free(sparse_filename);
> clear_pattern_list(pl);
> update_working_directory(NULL);
> return result;
> }
>
> -   sparse_filename = get_sparse_checkout_filename();
> -   fp = fopen(sparse_filename, "w");
> +   fp = fdopen(fd, "w");
>
> if (core_sparse_checkout_cone)
> write_cone_to_file(fp, pl);
> else
> write_patterns_to_file(fp, pl);
>
> -   fclose(fp);
> +   fflush(fp);
> +   commit_lock_file(&lk);
>
> free(sparse_filename);
> clear_pattern_list(pl);
> diff --git a/t/t1091-sparse-checkout-builtin.sh 
> b/t/t1091-sparse-checkout-builtin.sh
> index 82eb5fb2f8..f22a4afbea 100755
> --- a/t/t1091-sparse-checkout-builtin.sh
> +++ b/t/t1091-sparse-checkout-builtin.sh
> @@ -262,4 +262,11 @@ test_expect_success 'revert to old sparse-checkout on 
> bad update' '
> test_cmp dir expect
>  '
>
> +test_expect_success 'fail when lock is taken' '
> +   test_when_finished rm -rf repo/.git/info/sparse-checkout.lock &&
> +   touch repo/.git/info/sparse-checkout.lock &&
> +   test_must_fail git -C repo sparse-checkout set deep 2>err &&
> +   test_i18ngrep "File exists" err
> +'
> +
>  test_done
> --
> gitgitgadget
>

Re: [PATCH v3 15/17] sparse-checkout: update working directory in-process

2019-10-12 Thread Elijah Newren

On Mon, Oct 7, 2019 at 1:08 PM Derrick Stolee via GitGitGadget wrote:
>
> From: Derrick Stolee 
>
> The sparse-checkout builtin used 'git read-tree -mu HEAD' to update the
> skip-worktree bits in the index and to update the working directory.
> This extra process is overly complex, and prone to failure. It also
> requires that we write our changes to the sparse-checkout file before
> trying to update the index.
>
> Remove this extra process call by creating a direct call to
> unpack_trees() in the same way 'git read-tree -mu HEAD' does. In
> adition, provide an in-memory list of patterns so we can avoid

s/adition/addition/

> reading from the sparse-checkout file. This allows us to test a
> proposed change to the file before writing to it.
>
> Signed-off-by: Derrick Stolee 
> ---
>  builtin/read-tree.c|  2 +-
>  builtin/sparse-checkout.c  | 85 +-
>  t/t1091-sparse-checkout-builtin.sh | 17 ++
>  unpack-trees.c |  5 +-
>  unpack-trees.h |  3 +-
>  5 files changed, 95 insertions(+), 17 deletions(-)
>
> diff --git a/builtin/read-tree.c b/builtin/read-tree.c
> index 69963d83dc..d7eeaa26ec 100644
> --- a/builtin/read-tree.c
> +++ b/builtin/read-tree.c
> @@ -186,7 +186,7 @@ int cmd_read_tree(int argc, const char **argv, const char 
> *cmd_prefix)
>
> if (opts.reset || opts.merge || opts.prefix) {
> if (read_cache_unmerged() && (opts.prefix || opts.merge))
> -   die("You need to resolve your current index first");
> +   die(_("You need to resolve your current index 
> first"));

A good change, but isn't this unrelated to the current commit?

> stage = opts.merge = 1;
> }
> resolve_undo_clear();
> diff --git a/builtin/sparse-checkout.c b/builtin/sparse-checkout.c
> index 25786f8bb0..542d57fac6 100644
> --- a/builtin/sparse-checkout.c
> +++ b/builtin/sparse-checkout.c
> @@ -7,6 +7,11 @@
>  #include "run-command.h"
>  #include "strbuf.h"
>  #include "string-list.h"
> +#include "cache.h"
> +#include "cache-tree.h"
> +#include "lockfile.h"
> +#include "resolve-undo.h"
> +#include "unpack-trees.h"
>
>  static char const * const builtin_sparse_checkout_usage[] = {
> N_("git sparse-checkout [init|list|set|disable] <options>"),
> @@ -60,18 +65,53 @@ static int sparse_checkout_list(int argc, const char 
> **argv)
> return 0;
>  }
>
> -static int update_working_directory(void)
> +static int update_working_directory(struct pattern_list *pl)
>  {
> -   struct argv_array argv = ARGV_ARRAY_INIT;
> int result = 0;
> -   argv_array_pushl(&argv, "read-tree", "-m", "-u", "HEAD", NULL);
> +   struct unpack_trees_options o;
> +   struct lock_file lock_file = LOCK_INIT;
> +   struct object_id oid;
> +   struct tree *tree;
> +   struct tree_desc t;
> +   struct repository *r = the_repository;
>
> -   if (run_command_v_opt(argv.argv, RUN_GIT_CMD)) {
> -   error(_("failed to update index with new sparse-checkout 
> paths"));
> -   result = 1;
> +   if (repo_read_index_unmerged(r))
> +   die(_("You need to resolve your current index first"));

Well, at least that ensures that the user gets a good error message.
I'm not sure I like the error, because e.g. if a user hits a conflict
while merging in a sparse checkout and wants to return to a non-sparse
checkout because they think other files might help them resolve the
conflicts, then they ought to be able to do it.  Basically, unless
they are trying use sparsification to remove entries from the working
directory that differ from the index (and conflicted entries always
differ), then it seems like we should be able to support
sparsification despite the presence of conflicts.

Your series is long enough, doesn't make this problem any worse (and
appears to make it slightly better), and so you really don't need to
tackle that problem in this series. I'm just stating a gripe with
sparse checkouts again.  :-)

[...]

>  static void insert_recursive_pattern(struct pattern_list *pl, struct strbuf 
> *path)
>  {
> -   struct pattern_entry *e = xmalloc(sizeof(struct pattern_entry));
> +   struct pattern_entry *e = xmalloc(sizeof(*e));

This is a good fix, but shouldn't it be squashed into the
"sparse-checkout: init and set in cone mode" commit from earlier in
your series?

> @@ -262,12 +308,21 @@ static int write_patterns_and_update(struct 
> pattern_list *pl)
>  {
> char *sparse_filename;
> FILE *fp;
> -
> +   int result;
> +

Trailing whitespace that should be cleaned up.

> if (!core_apply_sparse_checkout) {
> warning(_("core.sparseCheckout is disabled, so changes to the 
> sparse-checkout file will have no effect"));
> warning(_("run 'git sparse-checkout init' to enable the 
> sparse-checkout feature"));
> }
>
> +   result = update_working

Re: [PATCH v3 13/17] read-tree: show progress by default

2019-10-12 Thread Elijah Newren

On Mon, Oct 7, 2019 at 1:08 PM Derrick Stolee via GitGitGadget wrote:
>
> From: Derrick Stolee 
>
> The read-tree builtin has a --verbose option that signals to show
> progress and other data while updating the index. Update this to
> be on by default when stderr is a terminal window.
>
> This will help tools like 'git sparse-checkout' to automatically
> benefit from progress indicators when a user runs these commands.

This change seems fine, but in patch 2 you said:

> The use of running another process for 'git read-tree' is sub-
> optimal. This will be removed in a later change.

leaving me slightly confused about the goal/plan.

Re: [PATCH v3 04/17] sparse-checkout: 'set' subcommand

2019-10-11 Thread Elijah Newren

On Fri, Oct 11, 2019 at 3:26 PM Elijah Newren  wrote:
>
> On Mon, Oct 7, 2019 at 1:08 PM Derrick Stolee via GitGitGadget

> Looks good, thanks for the fixes.  I'm still slightly worried about
> folks not looking at the docs and calling sparse-checkout set without
> calling init, and then being negatively surprised.  It's a minor
> issue, but a warning might be helpful.

Looks like you added that to patch 5, so never mind.

Re: [PATCH v3 05/17] sparse-checkout: add '--stdin' option to set subcommand

2019-10-11 Thread Elijah Newren

On Mon, Oct 7, 2019 at 1:08 PM Derrick Stolee via GitGitGadget wrote:
>
> From: Derrick Stolee 
>
> The 'git sparse-checkout set' subcommand takes a list of patterns
> and places them in the sparse-checkout file. Then, it updates the
> working directory to match those patterns. For a large list of
> patterns, the command-line call can get very cumbersome.
>
> Add a '--stdin' option to instead read patterns over standard in.
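
The one-pattern-per-line input format is simple to consume.  A minimal
standalone C sketch follows; it uses fgets() rather than git's strbuf
API, and sketch_read_patterns() and SKETCH_MAX_PATTERN are names
invented for this example.  Lines longer than the fixed buffer would be
split, which a real implementation would need to avoid.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define SKETCH_MAX_PATTERN 256

/*
 * Hypothetical helper: read up to max_patterns lines from in, strip
 * the line terminator, and store each line as one pattern.  Returns
 * the number of patterns read.
 */
size_t sketch_read_patterns(FILE *in,
			    char patterns[][SKETCH_MAX_PATTERN],
			    size_t max_patterns)
{
	size_t nr = 0;

	while (nr < max_patterns &&
	       fgets(patterns[nr], SKETCH_MAX_PATTERN, in)) {
		/* Cut the line at the first CR or LF, if there is one. */
		patterns[nr][strcspn(patterns[nr], "\r\n")] = '\0';
		nr++;
	}
	return nr;
}
```

Passing stdin as the stream gives the behavior described in the commit
message: each input line becomes one pattern, in order.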
>
> Signed-off-by: Derrick Stolee 
> ---
>  builtin/sparse-checkout.c  | 40 --
>  t/t1091-sparse-checkout-builtin.sh | 27 
>  2 files changed, 65 insertions(+), 2 deletions(-)
>
> diff --git a/builtin/sparse-checkout.c b/builtin/sparse-checkout.c
> index 52d4f832f3..68f3d8433e 100644
> --- a/builtin/sparse-checkout.c
> +++ b/builtin/sparse-checkout.c
> @@ -145,6 +145,11 @@ static int write_patterns_and_update(struct pattern_list 
> *pl)
> char *sparse_filename;
> FILE *fp;
>
> +   if (!core_apply_sparse_checkout) {
> +   warning(_("core.sparseCheckout is disabled, so changes to the 
> sparse-checkout file will have no effect"));
> +   warning(_("run 'git sparse-checkout init' to enable the 
> sparse-checkout feature"));
> +   }
> +
> sparse_filename = get_sparse_checkout_filename();
> fp = fopen(sparse_filename, "w");
> write_patterns_to_file(fp, pl);
> @@ -154,16 +159,47 @@ static int write_patterns_and_update(struct 
> pattern_list *pl)
> return update_working_directory();
>  }
>
> +static char const * const builtin_sparse_checkout_set_usage[] = {
> +   N_("git sparse-checkout set [--stdin|<patterns>]"),
> +   NULL
> +};
> +
> +static struct sparse_checkout_set_opts {
> +   int use_stdin;
> +} set_opts;
> +
>  static int sparse_checkout_set(int argc, const char **argv, const char 
> *prefix)
>  {
> static const char *empty_base = "";
> int i;
> struct pattern_list pl;
> int result;
> +
> +   static struct option builtin_sparse_checkout_set_options[] = {
> +   OPT_BOOL(0, "stdin", &set_opts.use_stdin,
> +N_("read patterns from standard in")),
> +   OPT_END(),
> +   };
> +
> memset(&pl, 0, sizeof(pl));
>
> -   for (i = 1; i < argc; i++)
> -   add_pattern(argv[i], empty_base, 0, &pl, 0);
> +   argc = parse_options(argc, argv, prefix,
> +builtin_sparse_checkout_set_options,
> +builtin_sparse_checkout_set_usage,
> +PARSE_OPT_KEEP_UNKNOWN);

Does this mean users can also spell it 'git sparse-checkout --stdin
set', instead of the expected 'git sparse-checkout set --stdin'?

> +
> +   if (set_opts.use_stdin) {
> +   struct strbuf line = STRBUF_INIT;
> +
> +   while (!strbuf_getline(&line, stdin)) {
> +   size_t len;
> +   char *buf = strbuf_detach(&line, &len);
> +   add_pattern(buf, empty_base, 0, &pl, 0);
> +   }
> +   } else {
> +   for (i = 0; i < argc; i++)
> +   add_pattern(argv[i], empty_base, 0, &pl, 0);
> +   }
>
> result = write_patterns_and_update(&pl);
>
> diff --git a/t/t1091-sparse-checkout-builtin.sh b/t/t1091-sparse-checkout-builtin.sh
> index 19e8673c6b..2a0137fde3 100755
> --- a/t/t1091-sparse-checkout-builtin.sh
> +++ b/t/t1091-sparse-checkout-builtin.sh
> @@ -101,6 +101,13 @@ test_expect_success 'clone --sparse' '
> test_cmp expect dir
>  '
>
> +test_expect_success 'warn if core.sparseCheckout is disabled' '
> +   test_when_finished git -C repo config --worktree core.sparseCheckout true &&
> +   git -C repo config --worktree core.sparseCheckout false &&
> +   git -C repo sparse-checkout set folder1 2>err &&
> +   test_i18ngrep "core.sparseCheckout is disabled" err
> +'
> +
>  test_expect_success 'set sparse-checkout using builtin' '
> git -C repo sparse-checkout set "/*" "!/*/" "*folder*" &&
> cat >expect <<-EOF &&
> @@ -120,4 +127,24 @@ test_expect_success 'set sparse-checkout using builtin' '
> test_cmp expect dir
>  '
>
> +test_expect_success 'set sparse-checkout using --stdin' '
> +   cat >expect <<-EOF &&
> +   /*
> +   !/*/
> +   /folder1/
> +   /folder2/
> +   EOF
> +   git -C repo sparse-checkout set --stdin <expect &&
> +   git -C repo sparse-checkout list >actual &&
> +   test_cmp expect actual &&
> +   test_cmp expect repo/.git/info/sparse-checkout &&
> +   ls repo >dir  &&
> +   cat >expect <<-EOF &&
> +   a
> +   folder1
> +   folder2
> +   EOF
> +   test_cmp expect dir
> +'
> +
>  test_done
> --
> gitgitgadget
>

Re: [PATCH v3 04/17] sparse-checkout: 'set' subcommand

2019-10-11 Thread Elijah Newren

On Mon, Oct 7, 2019 at 1:08 PM Derrick Stolee via GitGitGadget
 wrote:
>
> From: Derrick Stolee 
>
> The 'git sparse-checkout set' subcommand takes a list of patterns
> as arguments and writes them to the sparse-checkout file. Then, it
> updates the working directory using 'git read-tree -mu HEAD'.
>
> The 'set' subcommand will replace the entire contents of the
> sparse-checkout file. The write_patterns_and_update() method is
> extracted from cmd_sparse_checkout() to make it easier to implement
> 'add' and/or 'remove' subcommands in the future.
>
> Signed-off-by: Derrick Stolee 
> ---
>  Documentation/git-sparse-checkout.txt |  5 
>  builtin/sparse-checkout.c | 35 ++-
>  t/t1091-sparse-checkout-builtin.sh| 19 +++
>  3 files changed, 58 insertions(+), 1 deletion(-)
>
> diff --git a/Documentation/git-sparse-checkout.txt b/Documentation/git-sparse-checkout.txt
> index e095c4a98b..f4bd951550 100644
> --- a/Documentation/git-sparse-checkout.txt
> +++ b/Documentation/git-sparse-checkout.txt
> @@ -39,6 +39,11 @@ and sets the `core.sparseCheckout` setting in the worktree-specific config
>  file. This prevents the sparse-checkout feature from interfering with other
>  worktrees.
>
> +'set'::
> +   Write a set of patterns to the sparse-checkout file, as given as
> +   a list of arguments following the 'set' subcommand. Update the
> +   working directory to match the new patterns.
> +
>  SPARSE CHECKOUT
>  
>
> diff --git a/builtin/sparse-checkout.c b/builtin/sparse-checkout.c
> index 3ecb7ac2e7..52d4f832f3 100644
> --- a/builtin/sparse-checkout.c
> +++ b/builtin/sparse-checkout.c
> @@ -8,7 +8,7 @@
>  #include "strbuf.h"
>
>  static char const * const builtin_sparse_checkout_usage[] = {
> -   N_("git sparse-checkout [init|list]"),
> +   N_("git sparse-checkout [init|list|set] "),
> NULL
>  };
>
> @@ -140,6 +140,37 @@ static int sparse_checkout_init(int argc, const char **argv)
> return update_working_directory();
>  }
>
> +static int write_patterns_and_update(struct pattern_list *pl)
> +{
> +   char *sparse_filename;
> +   FILE *fp;
> +
> +   sparse_filename = get_sparse_checkout_filename();
> +   fp = fopen(sparse_filename, "w");
> +   write_patterns_to_file(fp, pl);
> +   fclose(fp);
> +   free(sparse_filename);
> +
> +   return update_working_directory();
> +}
> +
> +static int sparse_checkout_set(int argc, const char **argv, const char *prefix)
> +{
> +   static const char *empty_base = "";
> +   int i;
> +   struct pattern_list pl;
> +   int result;
> +   memset(&pl, 0, sizeof(pl));
> +
> +   for (i = 1; i < argc; i++)
> +   add_pattern(argv[i], empty_base, 0, &pl, 0);
> +
> +   result = write_patterns_and_update(&pl);
> +
> +   clear_pattern_list(&pl);
> +   return result;
> +}
> +
>  int cmd_sparse_checkout(int argc, const char **argv, const char *prefix)
>  {
> static struct option builtin_sparse_checkout_options[] = {
> @@ -162,6 +193,8 @@ int cmd_sparse_checkout(int argc, const char **argv, const char *prefix)
> return sparse_checkout_list(argc, argv);
> if (!strcmp(argv[0], "init"))
> return sparse_checkout_init(argc, argv);
> +   if (!strcmp(argv[0], "set"))
> +   return sparse_checkout_set(argc, argv, prefix);
> }
>
> usage_with_options(builtin_sparse_checkout_usage,
> diff --git a/t/t1091-sparse-checkout-builtin.sh b/t/t1091-sparse-checkout-builtin.sh
> index d4c145a3af..19e8673c6b 100755
> --- a/t/t1091-sparse-checkout-builtin.sh
> +++ b/t/t1091-sparse-checkout-builtin.sh
> @@ -101,4 +101,23 @@ test_expect_success 'clone --sparse' '
> test_cmp expect dir
>  '
>
> +test_expect_success 'set sparse-checkout using builtin' '
> +   git -C repo sparse-checkout set "/*" "!/*/" "*folder*" &&
> +   cat >expect <<-EOF &&
> +   /*
> +   !/*/
> +   *folder*
> +   EOF
> +   git -C repo sparse-checkout list >actual &&
> +   test_cmp expect actual &&
> +   test_cmp expect repo/.git/info/sparse-checkout &&
> +   ls repo >dir  &&
> +   cat >expect <<-EOF &&
> +   a
> +   folder1
> +   folder2
> +   EOF
> +   test_cmp expect dir
> +'
> +
>  test_done
> --

Looks good, thanks for the fixes.  I'm still slightly worried about
folks not looking at the docs and calling sparse-checkout set without
calling init, and then being negatively surprised.  It's a minor
issue, but a warning might be helpful.

Re: [PATCH v2 04/11] sparse-checkout: 'set' subcommand

2019-10-11 Thread Elijah Newren

On Mon, Oct 7, 2019 at 11:26 AM Derrick Stolee  wrote:
>
> On 10/5/2019 8:30 PM, Elijah Newren wrote:
> > On Sat, Oct 5, 2019 at 3:44 PM Elijah Newren  wrote:
> >>
> >> On Thu, Sep 19, 2019 at 3:07 PM Derrick Stolee via GitGitGadget
> >>  wrote:
> >>> +static int write_patterns_and_update(struct pattern_list *pl)
> >>> +{
> >>> +   char *sparse_filename;
> >>> +   FILE *fp;
> >>> +
> >>> +   sparse_filename = get_sparse_checkout_filename();
> >>> +   fp = fopen(sparse_filename, "w");
> >>> +   write_patterns_to_file(fp, pl);
> >>> +   fclose(fp);
> >>> +   free(sparse_filename);
> >>> +
> >>> +   clear_pattern_list(pl);
> >>
> >> It seems slightly odd that pl is passed in but cleared in this
> >> function rather than in the caller that created pl.  Should this be
> >> moved to the caller, or, alternatively, a comment added to explain
> >> this side-effect for future callers of the function?
> >>
> >> The rest of the patch looked good to me.
> >
> > Actually, thought of something else.  What if the user calls 'git
> > sparse-checkout set ...' without first calling 'git sparse-checkout
> > init'?  Should that report an error to the user, a suggestion to
> > follow it up with 'sparse-checkout init', or should it just call
> > sc_set_config() behind the scenes and allow bypassing the init
> > subcommand?
>
> Maybe a warning would suffice. I still think the workflow of the
> following is most correct, and not difficult to recommend:
>
> * "git sparse-checkout init [--cone]" -OR- "git clone --sparse"
> * git sparse-checkout set [stuff]
> * git sparse-checkout disable

Recommending the right thing is easy, but users will call things out
of order despite documentation.  If they call disable before init, I
see no problems that will lead to confusion.  If they call set without
calling init, I can see them being surprised...so I commented on it
and asked if we want a warning or whatever.

Re: [PATCH v3 03/17] clone: add --sparse mode

2019-10-11 Thread Elijah Newren

On Mon, Oct 7, 2019 at 1:08 PM Derrick Stolee via GitGitGadget
 wrote:
> During the 'git sparse-checkout init' call, we must first look
> to see if HEAD is valid, since 'git clone' does not have a valid
> HEAD.

...does not have a valid HEAD by the time git_sparse_checkout_init() is called?

> The first checkout will create the HEAD ref and update the
> working directory correctly.

Is this checkout you reference a manually initiated user checkout after
the clone, or the checkout performed as part of the clone?  (I'm
almost certain it's the latter, but your wording makes me question.)

Re: [PATCH v3 02/17] sparse-checkout: create 'init' subcommand

2019-10-11 Thread Elijah Newren

On Mon, Oct 7, 2019 at 1:08 PM Derrick Stolee via GitGitGadget
 wrote:
> ++
> +The init subcommand also enables the 'extensions.worktreeConfig' setting
> +and sets the `core.sparseCheckout` setting in the worktree-specific config
> +file. This prevents the sparse-checkout feature from interfering with other
> +worktrees.

I'm afraid that might be mis-parsed by future readers.  Perhaps something like:

The init subcommand also enables the `core.sparseCheckout` setting.
To avoid interfering with other worktrees, it first enables the
`extensions.worktreeConfig` setting and makes sure to set the
`core.sparseCheckout` setting in the worktree-specific config file.

> +enum sparse_checkout_mode {
> +   MODE_NONE = 0,
> +   MODE_FULL = 1,
> +};

So MODE_FULL is "true" and MODE_NONE is "false".  MODE_NONE seems
confusing to me, but let's keep reading...

> +
> +static int sc_set_config(enum sparse_checkout_mode mode)
> +{
> +   struct argv_array argv = ARGV_ARRAY_INIT;
> +
> +   if (git_config_set_gently("extensions.worktreeConfig", "true")) {
> +   error(_("failed to set extensions.worktreeConfig setting"));
> +   return 1;
> +   }
> +
> +   argv_array_pushl(&argv, "config", "--worktree", "core.sparseCheckout", NULL);
> +
> +   if (mode)
> +   argv_array_pushl(&argv, "true", NULL);
> +   else
> +   argv_array_pushl(&argv, "false", NULL);

Wait, what?  MODE_FULL is used to specify that you want a sparse
checkout, and MODE_NONE is used to denote that you want a full (i.e.
non-sparse) checkout?  These are *very* confusing names.
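
To make the naming concern concrete, here is a minimal standalone sketch of one possible renaming. The names MODE_SPARSE_DISABLED, MODE_SPARSE_ENABLED and sparse_config_value() are hypothetical and appear nowhere in the patch; the sketch only assumes that the enum's sole job here is to pick the value written to `core.sparseCheckout`.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical names, not from the patch under review (which uses
 * MODE_NONE and MODE_FULL).  The idea is that each name says what
 * happens to core.sparseCheckout.
 */
enum sparse_checkout_mode {
	MODE_SPARSE_DISABLED = 0,	/* core.sparseCheckout=false */
	MODE_SPARSE_ENABLED = 1,	/* core.sparseCheckout=true */
};

/* Map a mode to the string that would be written to the config. */
static const char *sparse_config_value(enum sparse_checkout_mode mode)
{
	switch (mode) {
	case MODE_SPARSE_ENABLED:
		return "true";
	case MODE_SPARSE_DISABLED:
		return "false";
	}
	assert(!"unknown sparse_checkout_mode");
	return NULL;
}
```

With names along these lines, a call such as sc_set_config(MODE_SPARSE_ENABLED) in the init path would read as what it does, instead of suggesting a "full" checkout.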

> +static int sparse_checkout_init(int argc, const char **argv)
> +{
> +   struct pattern_list pl;
> +   char *sparse_filename;
> +   FILE *fp;
> +   int res;
> +
> +   if (sc_set_config(MODE_FULL))
> +   return 1;

Seems confusing here too.

Everything else in the patch looks good, though.

Re: [PATCH v3 01/17] sparse-checkout: create builtin with 'list' subcommand

2019-10-11 Thread Elijah Newren

On Mon, Oct 7, 2019 at 1:08 PM Derrick Stolee via GitGitGadget
 wrote:
> +SPARSE CHECKOUT
> +
> +
> +"Sparse checkout" allows populating the working directory sparsely.
> +It uses the skip-worktree bit (see linkgit:git-update-index[1]) to tell
> +Git whether a file in the working directory is worth looking at. If
> +the skip-worktree bit is set, then the file is ignored in the working
> +directory. Git will not populate the contents of those files, which
> +makes a sparse checkout helpful when working in a repository with many
> +files, but only a few are important to the current user.
> +
> +The `$GIT_DIR/info/sparse-checkout` file is used to define the
> +skip-worktree reference bitmap. When Git updates the working
> +directory, it updates the skip-worktree bits in the index based
> +ont this file. The files matching the patterns in the file will

s/ont/on/

> +appear in the working directory, and the rest will not.
> +
> +## FULL PATTERN SET
> +
> +By default, the sparse-checkout file uses the same syntax as `.gitignore`
> +files.
> +
> +While `$GIT_DIR/info/sparse-checkout` is usually used to specify what
> +files are included, you can also specify what files are _not_ included,
> +using negative patterns. For example, to remove the file `unwanted`:
> +
> +
> +/*
> +!unwanted
> +
> +
> +Another tricky thing is fully repopulating the working directory when you
> +no longer want sparse checkout. You cannot just disable "sparse
> +checkout" because skip-worktree bits are still in the index and your working
> +directory is still sparsely populated. You should re-populate the working
> +directory with the `$GIT_DIR/info/sparse-checkout` file content as
> +follows:
> +
> +
> +/*
> +
> +
> +Then you can disable sparse checkout. Sparse checkout support in 'git
> +checkout' and similar commands is disabled by default. You need to
> +set `core.sparseCheckout` to `true` in order to have sparse checkout
> +support.

Looks like these disappear by the end of the series, so no need to
comment on them.  Thanks for all the fixes; other than the trivial
typo above, this patch looks good.

[PATCH 2/2] merge-recursive: fix merging a subdirectory into the root directory

2019-10-11 Thread Elijah Newren via GitGitGadget

From: Elijah Newren 

We allow renaming all entries in e.g. a directory named z/ into a
directory named y/ to be detected as a z/ -> y/ rename, so that if the
other side of history adds any files to the directory z/ in the
meantime, we can provide the hint that they should be moved to y/.

There is no reason to not allow 'y/' to be the root directory, but the
code did not handle that case correctly.  Add a testcase and the
necessary special checks to support this case.

Signed-off-by: Elijah Newren 
---
 merge-recursive.c   | 29 +++
 t/t6043-merge-rename-directories.sh | 56 +
 2 files changed, 85 insertions(+)
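
As an illustration of the root-directory special case, the following is a standalone sketch rather than the code from merge-recursive.c. The helper apply_dir_rename_sketch() is hypothetical; it uses plain C strings where the real function works with a strbuf, and it assumes old_path really does live under old_dir.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical stand-in for apply_dir_rename(): move old_path, which
 * lives under old_dir, so that it lives under new_dir instead.  An
 * empty new_dir denotes the root directory.  Returns a malloc'd string.
 */
static char *apply_dir_rename_sketch(const char *old_dir,
				     const char *new_dir,
				     const char *old_path)
{
	size_t oldlen = strlen(old_dir);
	size_t newdirlen = strlen(new_dir);
	size_t taillen;
	char *result;

	/* Assumption: old_path starts with old_dir followed by '/'. */
	assert(!strncmp(old_path, old_dir, oldlen) && old_path[oldlen] == '/');

	/*
	 * Renaming into the root: skip the '/' as well, so the result is
	 * "filename" rather than "" + "/filename".
	 */
	if (newdirlen == 0)
		oldlen++;

	taillen = strlen(old_path) - oldlen;
	result = malloc(newdirlen + taillen + 1);
	if (!result)
		abort();
	memcpy(result, new_dir, newdirlen);
	/* Copy the remainder of old_path including its trailing NUL. */
	memcpy(result + newdirlen, old_path + oldlen, taillen + 1);
	return result;
}
```

For the shape of testcase 12d, where a/b is merged into the root and the other side adds a/b/bar.c, this sketch maps "a/b/bar.c" to "bar.c". An ordinary z/ -> y/ rename still maps "z/file" to "y/file".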

diff --git a/merge-recursive.c b/merge-recursive.c
index f80e48f623..7bd4a7cf10 100644
--- a/merge-recursive.c
+++ b/merge-recursive.c
@@ -1931,6 +1931,16 @@ static char *apply_dir_rename(struct dir_rename_entry *entry,
return NULL;
 
oldlen = strlen(entry->dir);
+   if (entry->new_dir.len == 0)
+   /*
+* If someone renamed/merged a subdirectory into the root
+* directory (e.g. 'some/subdir' -> ''), then we want to
+* avoid returning
+* '' + '/filename'
+* as the rename; we need to make old_path + oldlen advance
+* past the '/' character.
+*/
+   oldlen++;
newlen = entry->new_dir.len + (strlen(old_path) - oldlen) + 1;
strbuf_grow(&new_path, newlen);
strbuf_addbuf(&new_path, &entry->new_dir);
@@ -1980,6 +1990,25 @@ static void get_renamed_dir_portion(const char *old_path, const char *new_path,
*end_of_old == *end_of_new)
return; /* We haven't modified *old_dir or *new_dir yet. */
 
+   /*
+* If end_of_new got back to the beginning of its string, and
+* end_of_old got back to the beginning of some subdirectory, then
+* we have a rename/merge of a subdirectory into the root, which
+* needs slightly special handling.
+*
+* Note: There is no need to consider the opposite case, with a
+* rename/merge of the root directory into some subdirectory.
+* Our rule elsewhere that a directory which still exists is not
+* considered to have been renamed means the root directory can
+* never be renamed (because the root directory always exists).
+*/
+   if (end_of_new == new_path &&
+   end_of_old != old_path && end_of_old[-1] == '/') {
+   *old_dir = xstrndup(old_path, end_of_old-1 - old_path);
+   *new_dir = xstrndup(new_path, end_of_new - new_path);
+   return;
+   }
+
/*
 * We've found the first non-matching character in the directory
 * paths.  That means the current characters we were looking at
diff --git a/t/t6043-merge-rename-directories.sh b/t/t6043-merge-rename-directories.sh
index c966147d5d..b920bb0850 100755
--- a/t/t6043-merge-rename-directories.sh
+++ b/t/t6043-merge-rename-directories.sh
@@ -4051,6 +4051,62 @@ test_expect_success '12c-check: Moving one directory hierarchy into another w/ c
)
 '
 
+# Testcase 12d, Rename/merge of subdirectory into the root
+#   Commit O: a/b/{foo.c}
+#   Commit A: foo.c
+#   Commit B: a/b/{foo.c,bar.c}
+#   Expected: a/b/{foo.c,bar.c}
+
+test_expect_success '12d-setup: Rename (merge) of subdirectory into the root' '
+   test_create_repo 12d &&
+   (
+   cd 12d &&
+
+   mkdir -p a/b/subdir &&
+   test_commit a/b/subdir/foo.c &&
+
+   git branch O &&
+   git branch A &&
+   git branch B &&
+
+   git checkout A &&
+   mkdir subdir &&
+   git mv a/b/subdir/foo.c.t subdir/foo.c.t &&
+   test_tick &&
+   git commit -m "A" &&
+
+   git checkout B &&
+   test_commit a/b/bar.c
+   )
+'
+
+test_expect_success '12d-check: Rename (merge) of subdirectory into the root' '
+   (
+   cd 12d &&
+
+   git checkout A^0 &&
+
+   git -c merge.directoryRenames=true merge -s recursive B^0 &&
+
+   git ls-files -s >out &&
+   test_line_count = 2 out &&
+
+   git rev-parse >actual \
+   HEAD:subdir/foo.c.t   HEAD:bar.c.t &&
+   git rev-parse >expect \
+   O:a/b/subdir/foo.c.t  B:a/b/bar.c.t &&
+   tes

[PATCH 1/2] merge-recursive: clean up get_renamed_dir_portion()

2019-10-11 Thread Elijah Newren via GitGitGadget

From: Elijah Newren 

Dscho noted a few things making this function hard to follow.
Restructure it a bit and add comments to make it easier to follow.  The
restructurings include:

  * There was a special case if-check at the end of the function
checking whether someone just renamed a file within its original
directory, meaning that there could be no directory rename involved.
That check was slightly convoluted; it could be done in a more
straightforward fashion earlier in the function, and can be done
more cheaply too (no call to strncmp).

  * The conditions for advancing end_of_old and end_of_new before
calling strchr were both confusing and unnecessary.  If either
points at a '/', then they need to be advanced in order to find the
next '/'.  If either doesn't point at a '/', then advancing them one
char before calling strchr() doesn't hurt.  So, just rip out the
if conditions and advance both before calling strchr().

Signed-off-by: Elijah Newren 
---
 merge-recursive.c | 60 ---
 1 file changed, 36 insertions(+), 24 deletions(-)
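
The restructuring described above can be sketched as a standalone function. This is an approximation under stated assumptions, not the patched code: renamed_dir_portion_sketch() and dup_prefix() are hypothetical names (dup_prefix() stands in for git's xstrndup()), and the inputs are assumed to be canonical paths with no leading '/' and no doubled slashes.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in for git's xstrndup(): copy the first len bytes of s. */
static char *dup_prefix(const char *s, size_t len)
{
	char *copy = malloc(len + 1);
	if (!copy)
		abort();
	memcpy(copy, s, len);
	copy[len] = '\0';
	return copy;
}

static void renamed_dir_portion_sketch(const char *old_path,
				       const char *new_path,
				       char **old_dir, char **new_dir)
{
	const char *end_of_old, *end_of_new;

	/* Assumption: canonical paths, so no leading '/'. */
	assert(old_path[0] != '/' && new_path[0] != '/');

	/* Default return values: NULL, meaning no rename. */
	*old_dir = NULL;
	*new_dir = NULL;

	/* Drop the basenames; only the directory portion matters. */
	end_of_old = strrchr(old_path, '/');
	end_of_new = strrchr(new_path, '/');
	if (!end_of_old || !end_of_new)
		return;

	/* Find the first non-matching character traversing backwards. */
	while (*--end_of_new == *--end_of_old &&
	       end_of_old != old_path &&
	       end_of_new != new_path)
		; /* all the work happens in the loop condition */

	/* Both directories matched entirely: only the basename changed. */
	if (end_of_old == old_path && end_of_new == new_path &&
	    *end_of_old == *end_of_new)
		return;

	/*
	 * Advance both unconditionally, then find the next '/'.  The
	 * pointers never move past the '/' found by strrchr(), so
	 * strchr() cannot return NULL here.
	 */
	end_of_old = strchr(++end_of_old, '/');
	end_of_new = strchr(++end_of_new, '/');

	*old_dir = dup_prefix(old_path, end_of_old - old_path);
	*new_dir = dup_prefix(new_path, end_of_new - new_path);
}
```

For "a/b/star/foo/whatever.c" -> "a/b/tar/foo/random.c" the sketch reports "a/b/star" and "a/b/tar". For a pure basename change such as "a/b/x.c" -> "a/b/y.c" both outputs stay NULL.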

diff --git a/merge-recursive.c b/merge-recursive.c
index 22a12cfeba..f80e48f623 100644
--- a/merge-recursive.c
+++ b/merge-recursive.c
@@ -1943,8 +1943,8 @@ static void get_renamed_dir_portion(const char *old_path, const char *new_path,
char **old_dir, char **new_dir)
 {
char *end_of_old, *end_of_new;
-   int old_len, new_len;
 
+   /* Default return values: NULL, meaning no rename */
*old_dir = NULL;
*new_dir = NULL;
 
@@ -1955,43 +1955,55 @@ static void get_renamed_dir_portion(const char *old_path, const char *new_path,
 *"a/b/c/d" was renamed to "a/b/some/thing/else"
 * so, for this example, this function returns "a/b/c/d" in
 * *old_dir and "a/b/some/thing/else" in *new_dir.
-*
-* Also, if the basename of the file changed, we don't care.  We
-* want to know which portion of the directory, if any, changed.
+*/
+
+   /*
+* If the basename of the file changed, we don't care.  We want
+* to know which portion of the directory, if any, changed.
 */
end_of_old = strrchr(old_path, '/');
end_of_new = strrchr(new_path, '/');
-
if (end_of_old == NULL || end_of_new == NULL)
-   return;
+   return; /* We haven't modified *old_dir or *new_dir yet. */
+
+   /* Find the first non-matching character traversing backwards */
while (*--end_of_new == *--end_of_old &&
   end_of_old != old_path &&
   end_of_new != new_path)
; /* Do nothing; all in the while loop */
+
/*
-* We've found the first non-matching character in the directory
-* paths.  That means the current directory we were comparing
-* represents the rename.  Move end_of_old and end_of_new back
-* to the full directory name.
+* If both got back to the beginning of their strings, then the
+* directory didn't change at all, only the basename did.
 */
-   if (*end_of_old == '/')
-   end_of_old++;
-   if (*end_of_old != '/')
-   end_of_new++;
-   end_of_old = strchr(end_of_old, '/');
-   end_of_new = strchr(end_of_new, '/');
+   if (end_of_old == old_path && end_of_new == new_path &&
+   *end_of_old == *end_of_new)
+   return; /* We haven't modified *old_dir or *new_dir yet. */
 
/*
-* It may have been the case that old_path and new_path were the same
-* directory all along.  Don't claim a rename if they're the same.
+* We've found the first non-matching character in the directory
+* paths.  That means the current characters we were looking at
+* were part of the first non-matching subdir name going back from
+* the end of the strings.  Get the whole name by advancing both
+* end_of_old and end_of_new to the NEXT '/' character.  That will
+* represent the entire directory rename.
+*
+* The reason for the increment is cases like
+*a/b/star/foo/whatever.c -> a/b/tar/foo/random.c
+* After dropping the basename and going back to the first
+* non-matching character, we're now comparing:
+*a/b/s  and a/b/
+* and we want to be comparing:
+*a/b/star/  and a/b/tar/
+* but without the pre-increment, the one on the right would stay
+* a/b/.
 */
-   old_len = end_of_old - old_path;
-   new_len = end_of_new - new_path;
+   end_of_old = strchr(

[PATCH 0/2] Dir rename fixes

2019-10-11 Thread Elijah Newren via GitGitGadget

This series improves a couple things found after looking into things Dscho
flagged:

 * clarify and slightly restructure code in the get_renamed_dir_portion()
   function
 * extend support of detecting renaming/merging of one directory into
   another to support the root directory as a target directory

First patch best viewed with a --histogram diff, which I sadly don't know
how to make gitgitgadget generate.

Elijah Newren (2):
  merge-recursive: clean up get_renamed_dir_portion()
  merge-recursive: fix merging a subdirectory into the root directory

 merge-recursive.c   | 89 +
 t/t6043-merge-rename-directories.sh | 56 ++
 2 files changed, 121 insertions(+), 24 deletions(-)


base-commit: 08da6496b61341ec45eac36afcc8f94242763468
Published-As: https://github.com/gitgitgadget/git/releases/tag/pr-390%2Fnewren%2Fdir-rename-fixes-v1
Fetch-It-Via: git fetch https://github.com/gitgitgadget/git pr-390/newren/dir-rename-fixes-v1
Pull-Request: https://github.com/gitgitgadget/git/pull/390
-- 
gitgitgadget

Re: What's cooking in git.git (Oct 2019, #03; Fri, 11)

2019-10-11 Thread Elijah Newren

On Fri, Oct 11, 2019 at 9:42 AM Junio C Hamano  wrote:
>
> Elijah Newren  writes:
>
> >> * en/fast-imexport-nested-tags (2019-10-04) 8 commits
> >>   (merged to 'next' on 2019-10-07 at 3e75779e10)
> >>  + fast-export: handle nested tags
> >>  ...
> >>  + fast-export: fix exporting a tag and nothing else
> >>
> >>  Updates to fast-import/export.
> >>
> >>  Will merge to 'master'.
> >
> > Any chance this will merge down before 2.24.0?  I'd really like to see
> > and use it within filter-repo.
>
> A few general guidelines I use are that a typical topic spends 1
> week in 'next' (a trivial one-liner may be there for much shorter
> time, an involved multi-patch topic that touches the core part of
> the system may have to spend more), that an involved topic that is
> not in 'master' by rc0 would not appear in the next release, and
> that any topic that is not in 'master' by rc1 needs compelling
> reason to be in the next release.  So it is cutting a bit too close
> for this topic, it seems, but we'll see.

:-(

Did I shoot myself in the foot by being quick to jump on Rene's couple
of cosmetic touch-up suggestions he posted over a week after the
series was originally posted?  In other words, if I hadn't stopped you
from merging the series down to next to incorporate those
clean-ups[1], would the story have been different?

[1] 
https://public-inbox.org/git/CABPp-BHvzyLf=wwhv45qkdkjitvwhtsmdfa0hd5ejf5fmhh...@mail.gmail.com/

Re: [PATCH v10 18/36] merge-recursive: add get_directory_renames()

2019-10-11 Thread Elijah Newren

// Dropping a few folks from the cc list as the thread is so old that
I think it should just be the normal git mailing list.

Hi Dscho,

On Wed, Oct 9, 2019 at 1:39 PM Johannes Schindelin
 wrote:
>
> Hi Elijah,
>
> sorry about the blast from the past, but I just stumbled over something
> I could not even find any discussion about:

I'm curious what brought you to this part of the codebase, but either
way, thanks for sending an email with your findings.

More comments below...

[...]
> > @@ -1357,6 +1395,169 @@ static struct diff_queue_struct *get_diffpairs(struct merge_options *o,
> >   return ret;
> >  }
> >
> > +static void get_renamed_dir_portion(const char *old_path, const char *new_path,
> > + char **old_dir, char **new_dir)
> > +{
> > + char *end_of_old, *end_of_new;
> > + int old_len, new_len;
> > +
> > + *old_dir = NULL;
> > + *new_dir = NULL;
> > +
> > + /*
> > +  * For
> > +  *"a/b/c/d/e/foo.c" -> "a/b/some/thing/else/e/foo.c"
> > +  * the "e/foo.c" part is the same, we just want to know that
> > +  *"a/b/c/d" was renamed to "a/b/some/thing/else"
> > +  * so, for this example, this function returns "a/b/c/d" in
> > +  * *old_dir and "a/b/some/thing/else" in *new_dir.
> > +  *
> > +  * Also, if the basename of the file changed, we don't care.  We
> > +  * want to know which portion of the directory, if any, changed.
> > +  */
> > + end_of_old = strrchr(old_path, '/');
> > + end_of_new = strrchr(new_path, '/');
> > +
> > + if (end_of_old == NULL || end_of_new == NULL)
> > + return;
> > + while (*--end_of_new == *--end_of_old &&
> > +end_of_old != old_path &&
> > +end_of_new != new_path)
> > + ; /* Do nothing; all in the while loop */
> > + /*
> > +  * We've found the first non-matching character in the directory
> > +  * paths.  That means the current directory we were comparing
> > +  * represents the rename.  Move end_of_old and end_of_new back
> > +  * to the full directory name.
> > +  */
> > + if (*end_of_old == '/')
> > + end_of_old++;
> > + if (*end_of_old != '/')
> > + end_of_new++;
>
> Is this intentional? Even after thinking about it for fifteen minutes, I
> > think it was probably meant to test for `*end_of_new == '/'` instead of
> `*end_of_old != '/'`. And...

Yeah, looks like a mess-up, and yes your suspicion is correct about
what was intended.

Hilariously, though, no bug results from this.  Since these are paths,
as canonicalized by git (i.e. not as specified by the user where they
might accidentally type multiple consecutive slashes), there will
never be two slashes in a row (because we can't have directories with
an empty name).  Thus, it is guaranteed at this point that *end_of_old
!= '/', and end_of_new is thus unconditionally advanced.  Further,
we wanted to find the _next_ '/' character after end_of_new, and
there were two cases: (1) end_of_new already pointed at a slash
character in which case we needed it to be advanced, or (2) end_of_new
didn't point to a slash character so it wouldn't hurt at all to
advance it.

> > + end_of_old = strchr(end_of_old, '/');
> > + end_of_new = strchr(end_of_new, '/');
>
> ... while I satisfied myself that these calls cannot return `NULL` at
> this point, it took quite a few minutes of reasoning.
>
> So I think we might want to rewrite these past 6 lines, to make
> everything quite a bit more obvious, like this:
>
> if (end_of_old != old_path)
> while (*(++end_of_old) != '/')
> ; /* keep looking */
> if (end_of_new != new_path)
> while (*(++end_of_new) != '/')
> ; /* keep looking */

I think your if-checks here are not correct.  Let's say that old_path
was "tar/foo.c" and new_path was "star/foo.c".  The initial strrchr
will bring both end_of_* variables back to the slash.  The moving left
while equal will move end_of_old back to old_path (i.e. pointing to
the "t") and end_of_new back to pointing at "t" as well.  Here's where
your six alternate lines would kick in, and would leave end_of_old at
old_path, while moving end_of_new to the '/', making it look like we
had a rename of "" (the empty string or root directory) to "star"
instead of a rename of "tar" to "star".  If you dropped your if-checks
(just having the while loops), then I think it does the right thing.
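
The "tar/foo.c" -> "star/foo.c" case can be checked with a small standalone sketch. The function renamed_dir_lengths() and its guard flag are hypothetical, introduced only to compare the two variants side by side; the sketch assumes canonical paths that contain a '/' but do not start with one.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: compute the lengths of the old and new
 * directory names that would be reported as the rename.  With
 * guard != 0, a pointer that reached the start of its string is not
 * advanced (the suggested if-checks); with guard == 0, both pointers
 * are always advanced to the next '/'.
 */
static void renamed_dir_lengths(const char *old_path, const char *new_path,
				int guard, size_t *old_len, size_t *new_len)
{
	const char *end_of_old = strrchr(old_path, '/');
	const char *end_of_new = strrchr(new_path, '/');

	/* Assumption: both paths contain a '/' that is not the first char. */
	assert(end_of_old && end_of_new);
	assert(end_of_old != old_path && end_of_new != new_path);

	/* Walk backwards to the first mismatch or the start of a string. */
	while (*--end_of_new == *--end_of_old &&
	       end_of_old != old_path &&
	       end_of_new != new_path)
		; /* all the work happens in the loop condition */

	if (!guard || end_of_old != old_path)
		while (*(++end_of_old) != '/')
			; /* keep looking */
	if (!guard || end_of_new != new_path)
		while (*(++end_of_new) != '/')
			; /* keep looking */

	*old_len = (size_t)(end_of_old - old_path);
	*new_len = (size_t)(end_of_new - new_path);
}
```

With the guards, the old directory comes out with length 0 (the empty string) and the new one with length 4 ("star"), which looks like a rename of the root directory. Without them, the lengths are 3 and 4, i.e. "tar" -> "star".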

> There is _still_ one thing that makes this harder than trivial to reason
> about: the case where one of `*end_of_old` and `*end_of_new` is a slash.
> At this point, we assume that `*end_of_old != *end_of_new` (more about
> that assumption in the next paragraph), therefore only one of them can
> be a slash, and we want to advance beyond it. But even if the pointer
> does not point at a slash, we want to look for one, so we want to
> advance beyond

Re: What's cooking in git.git (Oct 2019, #03; Fri, 11)

2019-10-11 Thread Elijah Newren

On Fri, Oct 11, 2019 at 12:35 AM Junio C Hamano  wrote:
> [Cooking]

[...]

> * en/fast-imexport-nested-tags (2019-10-04) 8 commits
>   (merged to 'next' on 2019-10-07 at 3e75779e10)
>  + fast-export: handle nested tags
>  + t9350: add tests for tags of things other than a commit
>  + fast-export: allow user to request tags be marked with --mark-tags
>  + fast-export: add support for --import-marks-if-exists
>  + fast-import: add support for new 'alias' command
>  + fast-import: allow tags to be identified by mark labels
>  + fast-import: fix handling of deleted tags
>  + fast-export: fix exporting a tag and nothing else
>
>  Updates to fast-import/export.
>
>  Will merge to 'master'.

Any chance this will merge down before 2.24.0?  I'd really like to see
and use it within filter-repo.

Re: Raise your hand to Ack jk/code-of-conduct if your Ack fell thru cracks

2019-10-09 Thread Elijah Newren

On Tue, Oct 8, 2019 at 5:20 PM Junio C Hamano  wrote:
>
> Johannes Schindelin  writes:
>
> > In other words, the commit message can be augmented by this:
> >
> > Acked-by: Johannes Schindelin 
> > Acked-by: Derrick Stolee 
> > Acked-by: Garima Singh 
> > Acked-by: Jonathan Tan 
> > Acked-by: Thomas Gummerer 
> > Acked-by: brian m. carlson 
> > Acked-by: Elijah Newren 
> >
> > Junio, would you mind picking it up, please?
>
> I trust you enough that I won't go back to the cited messages to
> double check that these acks are real, but I'd still wait for a few
> days for people who expressed their Acks but your scan missed, or
> those who wanted to give their Acks but forgot to do so, to raise
> their hands on this thread.
>
> Thanks for starting the concluding move on this topic.

Agreed, thanks.  There is one super minor issue, that I probably
shouldn't even bring up but... Looking at jk/coc, it ends with:

Signed-off-by: Jeff King 
Acked-by: Christian Couder 
Acked-by: Emily Shaffer 
Acked-by: Garima Singh 
Acked-by: Junio C Hamano 
Acked-by: Johannes Schindelin 
Acked-by: Jonathan Tan 
Acked-by: Jonathan Nieder 
Acked-by: Taylor Blau 
Acked-by: Elijah Newren 
Acked-by: brian m. carlson 
Acked-by: Derrick Stolee 
Acked-by: Thomas Gummerer 
Signed-off-by: Junio C Hamano 

Those Acked-by's are nearly in alphabetical order (at least at first
glance) until Brian, Derrick, and me.  I know it's a trivial thing,
but for the OCD among us, could it either be randomized, ordered by
when people acked, or alphabetized?  Sorry if this sounds really
trivial, but it's kind of like trying to hold a conversation with
someone at their office when the cord to their phone was all twisted
up; it's really difficult to pay attention to anything of substance
until that problem is fixed (though, thankfully, a job change combined
with cell phone prevalence almost completely eradicated the phone cord
problem years ago).

I'm kinda worried that sending this will result in someone
alphabetizing everyone in the list except me, but oh well...

Elijah

Re: [PATCH] merge-recursive: fix the fix to the diff3 common ancestor label

2019-10-08 Thread Elijah Newren

On Mon, Oct 7, 2019 at 7:36 PM Junio C Hamano  wrote:
>
> Elijah Newren  writes:
>
> > In commit 208e69a4ebce ("merge-recursive: fix the diff3 common ancestor
>
> I think the above was an earlier incarnation of what is now known as
> 8e4ec337 ("merge-recursive: fix the diff3 common ancestor label for
> virtual commits", 2019-10-01).

Oops, yes.

> > label for virtual commits", 2019-09-30) which was a fix to commit
> > 743474cbfa8b ("merge-recursive: provide a better label for diff3 common
> > ...
> > The handling for "constructed merge base" worked by allowing
> > opt->ancestor to be set in merge_recursive_generic(), so we payed
>
> s/payed/paid/

Ugh, two simple mistakes in the commit message.  I see you've not only
proofread the commit but fixed up the commit message for me in pu;
thanks.

Elijah

Re: Why is "Sparse checkout leaves no entry on working directory" a fatal error?

2019-10-08 Thread Elijah Newren

On Mon, Oct 7, 2019 at 11:52 PM Josef Wolf  wrote:
>
> Hello,
>
> This is a repost, since the original message seems to have been lost somehow.
>
>
> I am trying to add a file to an arbitrary branch without touching the current
> worktree with as little overhead as possible. This should work no matter in
> which state the current worktree is in. And it should not touch the current WT
> in any way.
>
> For this, the sparse-checkout feature in conjunction with the "shared
> repository" feature seems to be perfect.

I can see the logical progression that a sparse worktree would be less
overhead than a full worktree, and that a bare worktree would be even
better.  But you're still dealing with unnecessary overhead; you don't
need a worktree at all to achieve what you want.

Traditionally, if you wanted to modify another branch without touching
the worktree at all, you would use a combination of hash-object,
mktree, commit-tree, and update-ref.  That would be a better solution
to your problem than trying to approximate it with a sparse checkout.
However, that's at least four invocations of git, and you said as
little overhead as possible, so I'd recommend you use fast-import.
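
A minimal sketch of the fast-import route, for illustration only.  The
helper name add_file_to_branch and its arguments are hypothetical; the
sketch assumes the branch already exists, that a committer identity is
configured, and that the path needs no quoting:

```sh
# Hypothetical helper, not a git command.
# Usage: add_file_to_branch <existing-branch> <path> <content>
add_file_to_branch() {
	branch=$1 path=$2 content=$3
	# "git var" prints "Name <email> <timestamp> <tz>", which is already in
	# the raw date format that fast-import expects after "committer".
	ident=$(git var GIT_COMMITTER_IDENT) || return 1
	# One git invocation: build the commit on top of the current branch
	# tip ("^0") with a single inline file modification.
	git fast-import --quiet <<EOF
commit refs/heads/$branch
committer $ident
data <<END_OF_MESSAGE
Add $path
END_OF_MESSAGE
from refs/heads/$branch^0
M 100644 inline $path
data <<END_OF_CONTENT
$content
END_OF_CONTENT

EOF
}
```

Something like `add_file_to_branch some-branch path/of/file huhuh` would
then create the commit without any worktree or index.  The target should
not be the branch that is currently checked out, since its worktree and
index would not follow the updated ref.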

But, since you asked some other questions about sparse checkouts...

> The basic idea goes like this:
>
>
>TMP=`mktemp -d /var/tmp/test-X`
>GD=$TMP/git
>WD=$TMP/wd
>
>git --work-tree $WD --git-dir $GD clone -qns -n . $GD
>git --work-tree $WD --git-dir $GD config core.sparsecheckout true
>echo path/of/file/which/I/want/to/create >>$GD/info/sparse-checkout
>
>git --work-tree $WD --git-dir $GD checkout -b some-branch remotes/origin/some-branch  # !!!
>
>( cd $WD
>  mkdir -p path/of/file/which/I/want/to
>  echo huhuh >path/of/file/which/I/want/to/create
>  git --work-tree $WD --git-dir $GD add path/of/file/which/I/want/to/create
>  git --work-tree $WD --git-dir $GD commit
>  git --work-tree $WD --git-dir $GD push
>)
>
>rm -rf $TMP
>
>
> Unfortunately, the marked command errors out with
>
>"error: Sparse checkout leaves no entry on working directory"
>
> and won't create/switch to the branch that is to be modified.
>
> Why is this an error? Since there are no matching files, an empty worktree
> is EXACTLY what I wanted. Why will the "git checkout -b" command error out?

It is very easy to mess up the sparse specifications.  We can't check
for all errors, but a pretty obvious one is when people specify
restrictions that match no path.  We can at least give an error in
that case.  There are times when folks might intentionally specify
paths that don't match anything, but they are quite rare.  The ones I
can think of:

1) When they are doing something exotic where they are just trying to
approximate something else rather than actually use sparse checkouts as
intended.
2) When they've learned about sparse checkouts and just want to test
what things are like in extreme situations.

Case 1 consists of stuff like what you are doing here, for which there
are better solutions, or when I was attempting to simulate the
performance issues Microsoft folks were having with a really large
repo, knowing they used sparse checkouts as part of VFS-for-git (I
created a very large index and had no entries checked out at first,
but then ran into these errors, so I added one file to the index and
had a sparse specification match it).

For case 2, people learn that an empty working tree is too extreme a
situation, one that we'll throw an error at, and so they adjust and
make sure to match at least one path.

> Strange enough, I have some repositories at this machine where the
> .git/info/sparse-checkout file contains only non-existing files and git
> happily executes this "git checkout -b XXX remotes/origin/XXX" command leaving
> the working tree totally empty all the time.

I can't reproduce:

$ git config core.sparseCheckout true
$ echo 'non-existent' > .git/info/sparse-checkout
$ git checkout -b next origin/next
error: Sparse checkout leaves no entry on working directory

Can you provide any more details about how you get into this state?

> Someone understands this inconsistent behaviour?

No, but I wouldn't be surprised if there are bugs and edge cases.  I
think I ran into one or two when testing things out, didn't take good
enough notes, and had trouble reproducing later.  The sparse checkout
stuff has been under-tested and not well documented, something Stolee
is trying to fix right now.

Re: log -m output

2019-10-07 Thread Elijah Newren

On Mon, Oct 7, 2019 at 10:05 AM Semyon Kirnosenko  wrote:
>
> On 2019-10-07 20:43, SZEDER Gábor wrote:
> > On Mon, Oct 07, 2019 at 07:14:25PM +0400, Semyon Kirnosenko wrote:
> >> I have a question about log command.
> >> Probably I'm just missing something but anyway.
> >> I can illustrate the question on the repository of Git.
> >> Let's look at revision 1ed91937
> >> It is a merge based on pair of revisions a9572072 and 294c695d.
> >> According to blame these parent revisions have different content for
> >> delta.h file.
> >
> > I'm not sure what you mean by this statement; what blame command did
> > you run?
> >
> >> But when I get log with -m flag for merge revision, I can't see that
> >> file in the list of changed files.
> >> Why?
> >
> > The contents of 'delta.h' is identical in both parents of that merge:
> >
> >$ git diff a9572072 294c695d delta.h
> >$
> ># no difference
> >
> > So 'git log -m' does the right thing by not showing 'delta.h'.
> >
> > .
> >
>
> But blame shows different results:
>
> git blame a9572072 delta.h
> git blame 294c695d delta.h

blame does not at all claim those two revisions have different
versions of delta.h:

$ diff -u <(git blame a9572072 delta.h) <(git blame 294c695d delta.h)
--- /dev/fd/63  2019-10-07 10:16:43.092356078 -0700
+++ /dev/fd/62  2019-10-07 10:16:43.092356078 -0700
@@ -9,8 +9,8 @@
 a310d434946 (Nicolas Pitre  2005-05-19 10:27:14 -0400  9) void *delta_buf, unsigned long delta_size,
 a310d434946 (Nicolas Pitre  2005-05-19 10:27:14 -0400 10) unsigned long *dst_size);
 d1af002dc60 (Nicolas Pitre  2005-05-20 16:59:17 -0400 11)
-dcde55bc58a (Nicolas Pitre  2005-06-29 02:49:56 -0400 12) /* the smallest possible delta size is 4 bytes */
-dcde55bc58a (Nicolas Pitre  2005-06-29 02:49:56 -0400 13) #define DELTA_SIZE_MIN 4
+c7a45bd20e4 (Junio C Hamano 2005-12-12 16:42:38 -0800 12) /* the smallest possible delta size is 4 bytes */
+c7a45bd20e4 (Junio C Hamano 2005-12-12 16:42:38 -0800 13) #define DELTA_SIZE_MIN 4
 dcde55bc58a (Nicolas Pitre  2005-06-29 02:49:56 -0400 14)
 dcde55bc58a (Nicolas Pitre  2005-06-29 02:49:56 -0400 15) /*
 dcde55bc58a (Nicolas Pitre  2005-06-29 02:49:56 -0400 16)  * This must be called twice on the delta data buffer, first to get t

It does say that _how_ those two arrived at the *same* version of the
file differed, but if you compare the portions of the differing lines
corresponding to the actual file contents you see that they are the
same...just as SZEDER pointed out.
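
Another way to check this without reading blame output is to compare
the blob object names recorded for the path in each revision.  A small
sketch; same_content is a hypothetical helper name, and the
`<rev>:<path>` syntax is the standard way to name a blob:

```sh
# Hypothetical helper: succeeds when <path> is byte-for-byte identical
# in both revisions, returns 2 if the path is missing in either one.
same_content() {
	rev_a=$1 rev_b=$2 path=$3
	# Equal object names mean equal file contents, regardless of which
	# commits (and authors) produced them.
	blob_a=$(git rev-parse --verify -q "$rev_a:$path") || return 2
	blob_b=$(git rev-parse --verify -q "$rev_b:$path") || return 2
	[ "$blob_a" = "$blob_b" ]
}
```

In git.git, `same_content a9572072 294c695d delta.h` should succeed if
the empty diff shown earlier in the thread is accurate.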

[PATCH] merge-recursive: fix the fix to the diff3 common ancestor label

2019-10-07 Thread Elijah Newren

In commit 208e69a4ebce ("merge-recursive: fix the diff3 common ancestor
label for virtual commits", 2019-09-30) which was a fix to commit
743474cbfa8b ("merge-recursive: provide a better label for diff3 common
ancestor", 2019-08-17), the label for the common ancestor was changed
from always being

 "merged common ancestors"

to instead be based on the number of merge bases and whether the merge
base was a real commit or a virtual one:

>=2: "merged common ancestors"
  1, via merge_recursive_generic: "constructed merge base"
  1, otherwise: abbreviated commit hash of the merge base
  0: "empty tree"

The handling for "constructed merge base" worked by allowing
opt->ancestor to be set in merge_recursive_generic(), so we payed
attention to the setting of that variable in merge_recursive_internal().
Now, for the outer merge, the code flow was simply the following:

ancestor_name = "merged merge bases"
loop over merge_bases: merge_recursive_internal()

The first merge base not needing recursion would determine its own
ancestor_name however necessary and thus run

ancestor_name = $SOMETHING
empty loop over merge_bases...
opt->ancestor = ancestor_name
merge_trees_internal()

Now, the next set of merge_bases that would need to be merged after this
particular merge had completed would note that opt->ancestor has been
set to something (to a local ancestor_name variable that has since been
popped off the stack), and thus it would run:

... else if (opt->ancestor) {
ancestor_name = opt->ancestor;  /* OOPS! */
loop over merge_bases: merge_recursive_internal()
opt->ancestor = ancestor_name
merge_trees_internal()

This resulted in garbage strings being printed for the virtual merge
bases, which was visible in git.git by just merging commit b744c3af07
into commit 6d8cb22a4f.  There are two ways to fix this: set
opt->ancestor to NULL after using it to avoid re-use, or add a
!opt->priv->call_depth check to the if block for using a pre-defined
opt->ancestor.  Apply both fixes.

Signed-off-by: Elijah Newren 
---
 merge-recursive.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/merge-recursive.c b/merge-recursive.c
index e12d91f48a..2653ba9a50 100644
--- a/merge-recursive.c
+++ b/merge-recursive.c
@@ -3550,7 +3550,7 @@ static int merge_recursive_internal(struct merge_options *opt,
merged_merge_bases = make_virtual_commit(opt->repo, tree,
 "ancestor");
ancestor_name = "empty tree";
-   } else if (opt->ancestor) {
+   } else if (opt->ancestor && !opt->priv->call_depth) {
ancestor_name = opt->ancestor;
} else if (merge_bases) {
ancestor_name = "merged common ancestors";
@@ -3600,6 +3600,7 @@ static int merge_recursive_internal(struct merge_options *opt,
  merged_merge_bases),
 &result_tree);
strbuf_release(&merge_base_abbrev);
+   opt->ancestor = NULL;  /* avoid accidental re-use of opt->ancestor */
if (clean < 0) {
flush_output(opt);
return clean;
-- 
2.23.0.26.gfc82117b87

Re: [PATCH v2 08/11] sparse-checkout: add 'cone' mode

2019-10-05 Thread Elijah Newren

On Thu, Sep 19, 2019 at 1:45 PM Derrick Stolee via GitGitGadget
 wrote:
>
> From: Derrick Stolee 
>
> The sparse-checkout feature can have quadratic performance as
> the number of patterns and number of entries in the index grow.
> If there are 1,000 patterns and 1,000,000 entries, this time can
> be very significant.
>
> Create a new Boolean config option, core.sparseCheckoutCone, to
> indicate that we expect the sparse-checkout file to contain a
> more limited set of patterns. This is a separate config setting
> from core.sparseCheckout to avoid breaking older clients by
> introcuding a tri-state option.

s/introcuding/introducing/

> The config option does nothing right now, but will be expanded
> upon in a later commit.
>
> Signed-off-by: Derrick Stolee 
> ---
>  Documentation/config/core.txt |  7 ++--
>  Documentation/git-sparse-checkout.txt | 50 +++
>  cache.h   |  4 ++-
>  config.c  |  5 +++
>  environment.c |  1 +
>  t/t1091-sparse-checkout-builtin.sh| 14 
>  6 files changed, 78 insertions(+), 3 deletions(-)
>
> diff --git a/Documentation/config/core.txt b/Documentation/config/core.txt
> index 75538d27e7..9b8ab2a6d4 100644
> --- a/Documentation/config/core.txt
> +++ b/Documentation/config/core.txt
> @@ -591,8 +591,11 @@ core.multiPackIndex::
> multi-pack-index design document].
>
>  core.sparseCheckout::
> -   Enable "sparse checkout" feature. See section "Sparse checkout" in
> -   linkgit:git-read-tree[1] for more information.
> +   Enable "sparse checkout" feature. If "false", then sparse-checkout
> +   is disabled. If "true", then sparse-checkout is enabled with the full
> +   .gitignore pattern set. If "cone", then sparse-checkout is enabled with
> +   a restricted pattern set. See linkgit:git-sparse-checkout[1] for more
> +   information.

This isn't consistent with the commit message that suggests it's a new
option rather than a new possible value for an old option.

>  core.abbrev::
> Set the length object names are abbreviated to.  If
> diff --git a/Documentation/git-sparse-checkout.txt b/Documentation/git-sparse-checkout.txt
> index da95b28b1c..757326618d 100644
> --- a/Documentation/git-sparse-checkout.txt
> +++ b/Documentation/git-sparse-checkout.txt
> @@ -87,6 +87,56 @@ using negative patterns. For example, to remove the file `unwanted`:
>  
>
>
> +## CONE PATTERN SET
> +
> +The full pattern set allows for arbitrary pattern matches and complicated
> +inclusion/exclusion rules. These can result in O(N*M) pattern matches when
> +updating the index, where N is the number of patterns and M is the number
> +of paths in the index. To combat this performance issue, a more restricted
> +pattern set is allowed when `core.sparseCheckoutCone` is enabled.
> +
> +The accepted patterns in the cone pattern set are:
> +
> +1. *Recursive:* All paths inside a directory are included.
> +
> +2. *Parent:* All files immediately inside a directory are included.
> +
> +In addition to the above two patterns, we also expect that all files in the
> +root directory are included. If a recursive pattern is added, then all
> +leading directories are added as parent patterns.
> +
> +By default, when running `git sparse-checkout init`, the root directory is
> +added as a parent pattern. At this point, the sparse-checkout file contains
> +the following patterns:
> +
> +```
> +/*
> +!/*/
> +```
> +
> +This says "include everything in root, but nothing two levels below root."
> +If we then add the folder `A/B/C` as a recursive pattern, the folders `A` and
> +`A/B` are added as parent patterns. The resulting sparse-checkout file is
> +now
> +
> +```
> +/*
> +!/*/
> +/A/
> +!/A/*/
> +/A/B/
> +!/A/B/*/
> +/A/B/C/
> +```
> +
> +Here, order matters, so the negative patterns are overridden by the positive
> +patterns that appear lower in the file.
> +
> +If `core.sparseCheckoutCone=true`, then Git will parse the sparse-checkout file
> +expecting patterns of these types. Git will warn if the patterns do not match.
> +If the patterns do match the expected format, then Git will use faster hash-
> +based algorithms to compute inclusion in the sparse-checkout.
> +
>  SEE ALSO
>  
>
> diff --git a/cache.h b/cache.h
> index cf5d70c196..8e8ea67efa 100644
> --- a/cache.h
> +++ b/cache.h
> @@ -911,12 +911,14 @@ extern char *git_replace_ref_base;
>
>  extern int fsync_object_files;
>  extern int core_preload_index;
> -extern int core_apply_sparse_checkout;
>  extern int precomposed_unicode;
>  extern int protect_hfs;
>  extern int protect_ntfs;
>  extern const char *core_fsmonitor;
>
> +int core_apply_sparse_checkout;
> +int core_sparse_checkout_cone;
> +
>  /*
>   * Include broken refs in all ref iterations, which will
>   * generally choke dangerous operations rather than letting
> diff --git a/config.c b/config.c
> index 296a6d9cc4..
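
To make the cone pattern rules quoted above concrete, here is a small
sketch that prints the sparse-checkout contents for a single recursive
directory.  cone_patterns is a hypothetical helper written only for
illustration; it is not how the patch series generates patterns, and it
assumes one directory argument with no leading or trailing slash:

```sh
# Hypothetical helper: print cone-mode patterns for one directory.
cone_patterns() {
	dir=$1
	# Root is always a parent pattern: files in root, no subdirectories.
	printf '%s\n' '/*' '!/*/'
	prefix=
	rest=$dir
	while :; do
		case $rest in
		*/*)
			# Each leading directory becomes a parent pattern:
			# include it, then exclude its subdirectories.
			component=${rest%%/*}
			rest=${rest#*/}
			prefix=$prefix/$component
			printf '%s/\n!%s/*/\n' "$prefix" "$prefix"
			;;
		*)
			# The final component is the recursive pattern.
			printf '%s/%s/\n' "$prefix" "$rest"
			break
			;;
		esac
	done
}
```

Running `cone_patterns A/B/C` reproduces the seven-line example from the
quoted documentation; each negative pattern is overridden by the
positive pattern for the next level that follows it.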

Re: [PATCH v2 07/11] trace2: add region in clear_ce_flags

2019-10-05 Thread Elijah Newren

On Thu, Sep 19, 2019 at 10:15 AM Jeff Hostetler via GitGitGadget
 wrote:
>
> From: Jeff Hostetler 
>
> When Git updates the working directory with the sparse-checkout
> feature enabled, the unpack_trees() method calls clear_ce_flags()
> to update the skip-worktree bits on the cache entries. This
> check can be expensive, depending on the patterns used.
>
> Add trace2 regions around the method, including some flag
> information, so we can get granular performance data during
> experiments. This data will be used to measure improvements
> to the pattern-matching algorithms for sparse-checkout.
>
> Signed-off-by: Jeff Hostetler 
> Signed-off-by: Derrick Stolee 
> ---
>  unpack-trees.c | 10 +-
>  1 file changed, 9 insertions(+), 1 deletion(-)
>
> diff --git a/unpack-trees.c b/unpack-trees.c
> index cd548f4fa2..26be8f3569 100644
> --- a/unpack-trees.c
> +++ b/unpack-trees.c
> @@ -1404,15 +1404,23 @@ static int clear_ce_flags(struct index_state *istate,
>   struct pattern_list *pl)
>  {
> static struct strbuf prefix = STRBUF_INIT;
> +   char label[100];
> +   int rval;
>
> strbuf_reset(&prefix);
>
> -   return clear_ce_flags_1(istate,
> +   xsnprintf(label, sizeof(label), "clear_ce_flags(0x%08lx,0x%08lx)",
> + (unsigned long)select_mask, (unsigned long)clear_mask);
> +   trace2_region_enter("unpack_trees", label, the_repository);
> +   rval = clear_ce_flags_1(istate,
> istate->cache,
> istate->cache_nr,
> &prefix,
> select_mask, clear_mask,
> pl, 0);
> +   trace2_region_leave("unpack_trees", label, the_repository);
> +
> +   return rval;
>  }
>
>  /*
> --
> gitgitgadget

Thanks for the updates to the commit message, and the tweaks from
"exp" to "unpack_trees" in the patch.  I still don't know trace2, but
it's much clearer how this relates to the series now.

Re: [PATCH v2 06/11] sparse-checkout: create 'disable' subcommand

2019-10-05 Thread Elijah Newren

On Thu, Sep 19, 2019 at 1:46 PM Derrick Stolee via GitGitGadget
 wrote:
>
> From: Derrick Stolee 
>
> The instructions for disabling a sparse-checkout to a full
> working directory are complicated and non-intuitive. Add a
> subcommand, 'git sparse-checkout disable', to perform those
> steps for the user.
>
> Signed-off-by: Derrick Stolee 
> ---
>  Documentation/git-sparse-checkout.txt | 26 ---
>  builtin/sparse-checkout.c | 37 ---
>  t/t1091-sparse-checkout-builtin.sh| 15 +++
>  3 files changed, 59 insertions(+), 19 deletions(-)
>
> diff --git a/Documentation/git-sparse-checkout.txt b/Documentation/git-sparse-checkout.txt
> index 87813e5797..da95b28b1c 100644
> --- a/Documentation/git-sparse-checkout.txt
> +++ b/Documentation/git-sparse-checkout.txt
> @@ -39,6 +39,10 @@ COMMANDS
> a list of arguments following the 'set' subcommand. Update the
> working directory to match the new patterns.
>
> +'disable'::
> +   Remove the sparse-checkout file, set `core.sparseCheckout` to
> +   `false`, and restore the working directory to include all files.

Good, so 'init' (and maybe 'set'?) will set core.sparseCheckout, and
disable will unset it, so the user doesn't have to worry about it...

> +
>  SPARSE CHECKOUT
>  
>
> @@ -61,6 +65,13 @@ Then it compares the new skip-worktree value with the previous one. If
>  skip-worktree turns from set to unset, it will add the corresponding
>  file back. If it turns from unset to set, that file will be removed.
>
> +To repopulate the working directory with all files, use the
> +`git sparse-checkout disable` command.

Good.

> +Sparse checkout support in 'git checkout' and similar commands is
> +disabled by default. You need to set `core.sparseCheckout` to `true`
> +in order to have sparse checkout support.

Aren't we having the user use 'git sparse-checkout init' to do that?
Why guide them to the core.sparseCheckout option?  And why mention it
without extensions.worktreeConfig?

> +
>  ## FULL PATTERN SET
>
>  By default, the sparse-checkout file uses the same syntax as `.gitignore`
> @@ -75,21 +86,6 @@ using negative patterns. For example, to remove the file `unwanted`:
>  !unwanted
>  
>
> -Another tricky thing is fully repopulating the working directory when you
> -no longer want sparse checkout. You cannot just disable "sparse
> -checkout" because skip-worktree bits are still in the index and your working
> -directory is still sparsely populated. You should re-populate the working
> -directory with the `$GIT_DIR/info/sparse-checkout` file content as
> -follows:
> -
> -
> -/*
> -

Yaay, glad to see this removed.
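
For comparison, the manual procedure that the removed text described
can be sketched as below.  The function name is hypothetical, and this
only mirrors the old documented steps; it is not the implementation of
'git sparse-checkout disable':

```sh
# Hypothetical helper following the steps from the removed documentation.
disable_sparse_checkout_by_hand() {
	sparse_file=$(git rev-parse --git-path info/sparse-checkout) || return 1
	# 1. Match every path so that all skip-worktree bits get cleared.
	mkdir -p "$(dirname "$sparse_file")" &&
	echo '/*' >"$sparse_file" &&
	# 2. Re-apply the patterns; this restores the missing files.
	git read-tree -mu HEAD &&
	# 3. Only now turn the feature off.
	git config core.sparseCheckout false
}
```

The ordering is the tricky part: turning the config off first would
leave the skip-worktree bits set and the working directory sparse.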

> -Then you can disable sparse checkout. Sparse checkout support in 'git
> -read-tree' and similar commands is disabled by default. You need to
> -set `core.sparseCheckout` to `true` in order to have sparse checkout
> -support.
>
>  SEE ALSO
>  
> diff --git a/builtin/sparse-checkout.c b/builtin/sparse-checkout.c
> index f726fcd6b8..f858f0b1b5 100644
> --- a/builtin/sparse-checkout.c
> +++ b/builtin/sparse-checkout.c
> @@ -8,7 +8,7 @@
>  #include "strbuf.h"
>
>  static char const * const builtin_sparse_checkout_usage[] = {
> -   N_("git sparse-checkout [init|list|set] "),
> +   N_("git sparse-checkout [init|list|set|disable] "),
> NULL
>  };
>
> @@ -74,7 +74,7 @@ static int update_working_directory(void)
> return result;
>  }
>
> -static int sc_enable_config(void)
> +static int sc_set_config(int mode)

Nice to see this change from the RFC round; do we want to use an enum
instead of an int, or is the int good enough?  (No strong opinion
here, just asking.)

>  {
> struct argv_array argv = ARGV_ARRAY_INIT;
>
> @@ -83,7 +83,12 @@ static int sc_enable_config(void)
> return 1;
> }
>
> -   argv_array_pushl(&argv, "config", "--worktree", "core.sparseCheckout", "true", NULL);
> +   argv_array_pushl(&argv, "config", "--worktree", "core.sparseCheckout", NULL);
> +
> +   if (mode)
> +   argv_array_pushl(&argv, "true", NULL);
> +   else
> +   argv_array_pushl(&argv, "false", NULL);
>
> if (run_command_v_opt(argv.argv, RUN_GIT_CMD)) {
> error(_("failed to enable core.sparseCheckout"));
> @@ -101,7 +106,7 @@ static int sparse_checkout_init(int argc, const char **argv)
> int res;
> struct object_id oid;
>
> -   if (sc_enable_config())
> +   if (sc_set_config(1))
> return 1;
>
> memset(&pl, 0, sizeof(pl));
> @@ -188,6 +193,28 @@ static int sparse_checkout_set(int argc, const char **argv, const char *prefix)
> return write_patterns_and_update(&pl);
>  }
>
> +static int sparse_checkout_disable(int argc, const char **argv)
> +{
> +   char *sparse_filename;
> +   FILE *fp;
> +
> +   if (sc_set_config(1))
> +   di

Re: [PATCH v2 04/11] sparse-checkout: 'set' subcommand

2019-10-05 Thread Elijah Newren

On Sat, Oct 5, 2019 at 3:44 PM Elijah Newren  wrote:
>
> On Thu, Sep 19, 2019 at 3:07 PM Derrick Stolee via GitGitGadget
>  wrote:
> > +static int write_patterns_and_update(struct pattern_list *pl)
> > +{
> > +   char *sparse_filename;
> > +   FILE *fp;
> > +
> > +   sparse_filename = get_sparse_checkout_filename();
> > +   fp = fopen(sparse_filename, "w");
> > +   write_patterns_to_file(fp, pl);
> > +   fclose(fp);
> > +   free(sparse_filename);
> > +
> > +   clear_pattern_list(pl);
>
> It seems slightly odd that pl is passed in but cleared in this
> function rather than in the caller that created pl.  Should this be
> moved to the caller, or, alternatively, a comment added to explain
> this side-effect for future callers of the function?
>
> The rest of the patch looked good to me.

Actually, thought of something else.  What if the user calls 'git
sparse-checkout set ...' without first calling 'git sparse-checkout
init'?  Should that report an error to the user, a suggestion to
follow it up with 'sparse-checkout init', or should it just call
sc_set_config() behind the scenes and allow bypassing the init
subcommand?

Re: [PATCH v2 04/11] sparse-checkout: 'set' subcommand

2019-10-05 Thread Elijah Newren

On Thu, Sep 19, 2019 at 3:07 PM Derrick Stolee via GitGitGadget
 wrote:
> +static int write_patterns_and_update(struct pattern_list *pl)
> +{
> +   char *sparse_filename;
> +   FILE *fp;
> +
> +   sparse_filename = get_sparse_checkout_filename();
> +   fp = fopen(sparse_filename, "w");
> +   write_patterns_to_file(fp, pl);
> +   fclose(fp);
> +   free(sparse_filename);
> +
> +   clear_pattern_list(pl);

It seems slightly odd that pl is passed in but cleared in this
function rather than in the caller that created pl.  Should this be
moved to the caller, or, alternatively, a comment added to explain
this side-effect for future callers of the function?

The rest of the patch looked good to me.

Re: [PATCH v2 03/11] clone: add --sparse mode

2019-10-05 Thread Elijah Newren

On Thu, Sep 19, 2019 at 3:06 PM Derrick Stolee via GitGitGadget
 wrote:

> During the 'git sparse-checkout init' call, we must first look
> to see if HEAD is valid, or else we will fail while trying to
> update the working directory. The first checkout will actually
> update the working directory correctly.

This is new since the RFC series, but I'm not sure I understand.  Is
the issue you're fixing here that a 'git init somerepo' would hit this
codepath and print funny errors because HEAD doesn't exist yet and
thus the whole `git read-tree -mu HEAD` stuff can't work?  Or that
when the remote has HEAD pointing at a bad commit that you get error
messages different than expected?
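
The first scenario is easy to try by hand.  A sketch of the guard at
the command level, assuming the unborn-HEAD case is the one being
handled; apply_sparse_patterns is a hypothetical name:

```sh
# Hypothetical helper: skip the working directory update when HEAD does
# not resolve to a commit yet (e.g. right after "git init").
apply_sparse_patterns() {
	if ! git rev-parse --verify --quiet HEAD >/dev/null; then
		# Unborn branch: there is nothing to check out, so
		# "git read-tree -mu HEAD" would only fail.
		return 0
	fi
	git read-tree -mu HEAD
}
```

In a freshly initialized repository a bare `git read-tree -mu HEAD`
fails because HEAD is not a valid object name, while the guarded
version returns success and leaves the update to the first checkout.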

> diff --git a/builtin/sparse-checkout.c b/builtin/sparse-checkout.c
> index 895479970d..656e6ebdd5 100644
> --- a/builtin/sparse-checkout.c
> +++ b/builtin/sparse-checkout.c
> @@ -99,6 +99,7 @@ static int sparse_checkout_init(int argc, const char **argv)
> char *sparse_filename;
> FILE *fp;
> int res;
> +   struct object_id oid;
>
> if (sc_enable_config())
> return 1;
> @@ -120,6 +121,11 @@ static int sparse_checkout_init(int argc, const char **argv)
> fprintf(fp, "/*\n!/*/\n");
> fclose(fp);
>
> +   if (get_oid("HEAD", &oid)) {
> +   /* assume we are in a fresh repo */
> +   return 0;
> +   }
> +
>  reset_dir:
> return update_working_directory();
>  }

Re: [PATCH v2 02/11] sparse-checkout: create 'init' subcommand

2019-10-05 Thread Elijah Newren

On Thu, Sep 19, 2019 at 3:06 PM Derrick Stolee via GitGitGadget
 wrote:
>
> From: Derrick Stolee 
>
> Getting started with a sparse-checkout file can be daunting. Help
> users start their sparse enlistment using 'git sparse-checkout init'.
> This will set 'core.sparseCheckout=true' in their config, write
> an initial set of patterns to the sparse-checkout file, and update
> their working directory.

...and ensure extensions.worktreeConfig is set to true.

> Using 'git read-tree' to clear directories does not work cleanly
> on Windows, so manually delete directories that are tracked by Git
> before running read-tree.

I thought you said you fixed this?  It appears to no longer be part of
the patch, so I'm guessing you just forgot to remove this comment from
the commit message?

> The use of running another process for 'git read-tree' is likely
> suboptimal, but that can be improved in a later change, if valuable.

I think it would also be worth mentioning that not only is a
subprocess suboptimal, but the behavior of `git read-tree -mu HEAD` is
itself suboptimal for a sparse-checkout.  (We either need more error
checking, e.g. when the user is in the middle of a rebase or merge or
cherry-pick and has conflicted entries, with a more focused error
message for the user, or we need a command that won't abort if the
conflicts aren't in the paths we're trying to remove from or bring
back to the working tree.)
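
As a rough illustration of the first option (more error checking), a
wrapper could refuse to run while the index has conflicted entries.
This is only a sketch; safe_sparse_update and its error message are
made up for the example:

```sh
# Hypothetical wrapper: give a focused error instead of letting
# "git read-tree -mu HEAD" abort on an unmerged index.
safe_sparse_update() {
	# "git ls-files -u" lists unmerged (conflicted) index entries.
	if [ -n "$(git ls-files -u)" ]; then
		echo "error: resolve conflicts before updating the sparse-checkout" >&2
		return 1
	fi
	git read-tree -mu HEAD
}
```

This check is coarser than the second option: it rejects any conflict,
even one in a path the sparse patterns would not touch.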

Patch looks good to me, assuming the caveats of using `git read-tree
-mu HEAD` are better documented -- and hopefully addressed at some
point.  You addressed all my other feedback on this patch from the RFC
series.

Re: [PATCH v2 01/11] sparse-checkout: create builtin with 'list' subcommand

2019-10-05 Thread Elijah Newren

On Thu, Sep 19, 2019 at 1:45 PM Derrick Stolee via GitGitGadget
 wrote:
>
> From: Derrick Stolee 
>
> The sparse-checkout feature is mostly hidden to users, as its
> only documentation is supplementary information in the docs for
> 'git read-tree'. In addition, users need to know how to edit the
> .git/info/sparse-checkout file with the right patterns, then run
> the appropriate 'git read-tree -mu HEAD' command. Keeping the
> working directory in sync with the sparse-checkout file requires
> care.
>
> Begin an effort to make the sparse-checkout feature a porcelain
> feature by creating a new 'git sparse-checkout' builtin. This
> builtin will be the preferred mechanism for manipulating the
> sparse-checkout file and syncing the working directory.

Sounds good.

> The `$GIT_DIR/info/sparse-checkout` file defines the skip-
> worktree reference bitmap. When Git updates the working
> directory, it updates the skip-worktree bits in the index
> based on this file and removes or restores files in the
> working copy to match.

Does this paragraph make sense in the commit message?  It's not
explaining anything new or changing with your patch, just pre-existing
behavior, but you don't seem to reference or expound on it.

> The documentation provided is adapted from the "git read-tree"
> documentation with a few edits for clarity in the new context.
> Extra sections are added to hint toward a future change to
> a more restricted pattern set.

I think it needs a few more adaptations, as noted below...

> +SPARSE CHECKOUT
> +
> +
> +"Sparse checkout" allows populating the working directory sparsely.
> +It uses the skip-worktree bit (see linkgit:git-update-index[1]) to tell
> +Git whether a file in the working directory is worth looking at. If
> +the skip-worktree bit is set, then the file is ignored in the working
> +directory. Git will not populate the contents of those files, which
> +makes a sparse checkout helpful when working in a repository with many
> +files, but only a few are important to the current user.
> +
> +The `$GIT_DIR/info/sparse-checkout` file is used to define the
> +skip-worktree reference bitmap. When Git updates the working
> +directory, it resets the skip-worktree bit in the index based on this
> +file. If an entry
> +matches a pattern in this file, skip-worktree will not be set on
> +that entry. Otherwise, skip-worktree will be set.
> +
> +Then it compares the new skip-worktree value with the previous one. If
> +skip-worktree turns from set to unset, it will add the corresponding
> +file back. If it turns from unset to set, that file will be removed.

I know this was just copied from elsewhere, but I still have the same
problem I mentioned last time with these paragraphs: the double
negations just make it confusing to follow.  I'd prefer e.g. replacing
the last two paragraphs above with the following (which I think you
did take but accidentally placed in the commit message instead of
using it to replace these confusing paragraphs?):

The `$GIT_DIR/info/sparse-checkout` file is used to define the
skip-worktree reference bitmap. When Git updates the working
directory, it updates the skip-worktree bits in the index based on this
file and removes or restores files in the working copy to match.

It doesn't have to be this precise wording, but something like it
which is way easier to follow than those two paragraphs you were
copying.
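
For readers who want to see the bits being described, they can be
inspected directly.  A small sketch; list_skipped_paths is a
hypothetical helper built on the status tags printed by
`git ls-files -t`:

```sh
# Hypothetical helper: print the paths whose skip-worktree bit is set.
list_skipped_paths() {
	# "git ls-files -t" prefixes each path with a status tag;
	# "S" marks entries with the skip-worktree bit.
	git ls-files -t | sed -n 's/^S //p'
}
```

Comparing this list before and after a sparse-checkout update shows
exactly which files were removed from or restored to the working copy.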

> +Another tricky thing is fully repopulating the working directory when you
> +no longer want sparse checkout. You cannot just disable "sparse
> +checkout" because skip-worktree bits are still in the index and your working
> +directory is still sparsely populated. You should re-populate the working
> +directory with the `$GIT_DIR/info/sparse-checkout` file content as
> +follows:
> +
> +
> +/*
> +
> +
> +Then you can disable sparse checkout.

I would comment on this section, but it appears you remove this
section later in your series when you add 'sparse-checkout disable',
which addresses my concern.

> Sparse checkout support in 'git
> +read-tree' and similar commands is disabled by default. You need to
> +set `core.sparseCheckout` to `true` in order to have sparse checkout
> +support.

I see you change `git read-tree` to `git checkout` later in the
series, which is good.  However, you keep the second sentence which
seems unhelpful.  Why have a 'git sparse-checkout init' command if the
user still has to manually set `core.sparseCheckout`?  Also, if we're
going to mention that setting, we should mention
extensions.worktreeConfig at the same time.  Not sure whether it'd be
better to drop the second sentence or restructure it to let the user
know that it depends on the core.sparseCheckout setting which the init
command runs, but something should probably be done.

The rest of the patch looks good.

Re: [PATCH v4 1/6] rebase -i: add --ignore-whitespace flag

2019-10-05 Thread Elijah Newren

On Fri, Oct 4, 2019 at 2:29 AM Phillip Wood  wrote:
>
> Hi Rohit
>
> On 07/09/2019 12:50, Rohit Ashiwal wrote:
> > There are two backends available for rebasing, viz, the am and the
> > interactive. Naturally, there shall be some features that are
> > implemented in one but not in the other. One such flag is
> > --ignore-whitespace which indicates merge mechanism to treat lines
> > with only whitespace changes as unchanged. Wire the interactive
> > rebase to also understand the --ignore-whitespace flag by
> > translating it to -Xignore-space-change.
> >
> > Signed-off-by: Rohit Ashiwal 
> > ---
> >   Documentation/git-rebase.txt| 13 -
> >   builtin/rebase.c| 22 +++--
> >   t/t3422-rebase-incompatible-options.sh  |  1 -
> >   t/t3433-rebase-options-compatibility.sh | 65 +
> >   4 files changed, 94 insertions(+), 7 deletions(-)
> >   create mode 100755 t/t3433-rebase-options-compatibility.sh
> >
> > diff --git a/Documentation/git-rebase.txt b/Documentation/git-rebase.txt
> > index 6156609cf7..873eb5768c 100644
> > --- a/Documentation/git-rebase.txt
> > +++ b/Documentation/git-rebase.txt
> > @@ -371,8 +371,16 @@ If either <upstream> or --root is given on the command line, then the
> >   default is `--no-fork-point`, otherwise the default is `--fork-point`.
> >
> >   --ignore-whitespace::
> > + Behaves differently depending on which backend is selected.
> > ++
> > +'am' backend: When applying a patch, ignore changes in whitespace in
> > +context lines if necessary.
> > ++
> > +'interactive' backend: Treat lines with only whitespace changes as
> > +unchanged for the sake of a three-way merge.
> > +
> >   --whitespace=<option>::
> > - These flag are passed to the 'git apply' program
> > + This flag is passed to the 'git apply' program
> >   (see linkgit:git-apply[1]) that applies the patch.
> >   +
> >   See also INCOMPATIBLE OPTIONS below.
> > @@ -520,7 +528,6 @@ The following options:
> >* --committer-date-is-author-date
> >* --ignore-date
> >* --whitespace
> > - * --ignore-whitespace
> >* -C
> >
> >   are incompatible with the following options:
> > @@ -543,6 +550,8 @@ In addition, the following pairs of options are incompatible:
> >* --preserve-merges and --interactive
> >* --preserve-merges and --signoff
> >* --preserve-merges and --rebase-merges
> > + * --preserve-merges and --ignore-whitespace
> > + * --rebase-merges and --ignore-whitespace
> >* --rebase-merges and --strategy
> >* --rebase-merges and --strategy-option
> >
> > diff --git a/builtin/rebase.c b/builtin/rebase.c
> > index 670096c065..f8a618d54c 100644
> > --- a/builtin/rebase.c
> > +++ b/builtin/rebase.c
> > @@ -79,6 +79,7 @@ struct rebase_options {
> >   int allow_rerere_autoupdate;
> >   int keep_empty;
> >   int autosquash;
> > + int ignore_whitespace;
> >   char *gpg_sign_opt;
> >   int autostash;
> >   char *cmd;
> > @@ -99,6 +100,7 @@ struct rebase_options {
> >
> >   static struct replay_opts get_replay_opts(const struct rebase_options *opts)
> >   {
> > + struct strbuf strategy_buf = STRBUF_INIT;
> >   struct replay_opts replay = REPLAY_OPTS_INIT;
> >
> >   replay.action = REPLAY_INTERACTIVE_REBASE;
> > @@ -114,9 +116,15 @@ static struct replay_opts get_replay_opts(const struct rebase_options *opts)
> >   replay.reschedule_failed_exec = opts->reschedule_failed_exec;
> >   replay.gpg_sign = xstrdup_or_null(opts->gpg_sign_opt);
> >   replay.strategy = opts->strategy;
> > +
> >   if (opts->strategy_opts)
> > - parse_strategy_opts(&replay, opts->strategy_opts);
> > + strbuf_addstr(&strategy_buf, opts->strategy_opts);
> > + if (opts->ignore_whitespace)
> > + strbuf_addstr(&strategy_buf, " --ignore-space-change");
> > + if (strategy_buf.len)
> > + parse_strategy_opts(&replay, strategy_buf.buf);
> >
> > + strbuf_release(&strategy_buf);
> >   return replay;
> >   }
> >
> > @@ -511,6 +519,8 @@ int cmd_rebase__interactive(int argc, const char **argv, const char *prefix)
> >   argc = parse_options(argc, argv, prefix, options,
> >   builtin_rebase_interactive_usage, PARSE_OPT_KEEP_ARGV0);
> >
> > + opts.strategy_opts = xstrdup_or_null(opts.strategy_opts);
> > +
> >   if (!is_null_oid(&squash_onto))
> >   opts.squash_onto = &squash_onto;
> >
> > @@ -964,6 +974,8 @@ static int run_am(struct rebase_options *opts)
> >   am.git_cmd = 1;
> >   argv_array_push(&am.args, "am");
> >
> > + if (opts->ignore_whitespace)
> > + argv_array_push(&am.args, "--ignore-whitespace");
> >   if (opts->action && !strcmp("continue", opts->action)) {
> >   argv_array_push(&am.args, "--resolved");
> >   argv_array_pushf(&am.args, "--resolvemsg=%s", resolvemsg);
> > @@ -1407,9 +1419,6 @@ int cmd_rebase(int argc, const char **argv, co

Re: What's cooking in git.git (Oct 2019, #01; Thu, 3)

2019-10-04 Thread Elijah Newren

On Fri, Oct 4, 2019 at 4:49 AM Phillip Wood  wrote:
>
> Hi Junio
>
> On 03/10/2019 06:04, Junio C Hamano wrote:
> > Here are the topics that have been cooking.  Commits prefixed with
> > '-' are only in 'pu' (proposed updates) while commits prefixed with
> > '+' are in 'next'.  The ones marked with '.' do not appear in any of
> > the integration branches, but I am still holding onto them.
> > [...]
> >
> >
> > * pw/rebase-i-show-HEAD-to-reword (2019-08-19) 3 commits
> >   - sequencer: simplify root commit creation
> >   - rebase -i: check for updated todo after squash and reword
> >   - rebase -i: always update HEAD before rewording
> >   (this branch is used by ra/rebase-i-more-options.)
> >
> >   "git rebase -i" showed a wrong HEAD while "reword" open the editor.
> >
> >   Will merge to 'next'.
>
> That's great, thanks
>
> >
> > * ra/rebase-i-more-options (2019-09-09) 6 commits
> >   - rebase: add --reset-author-date
> >   - rebase -i: support --ignore-date
> >   - sequencer: rename amend_author to author_to_rename
> >   - rebase -i: support --committer-date-is-author-date
> >   - sequencer: allow callers of read_author_script() to ignore fields
> >   - rebase -i: add --ignore-whitespace flag
> >   (this branch uses pw/rebase-i-show-HEAD-to-reword.)
> >
> >   "git rebase -i" learned a few options that are known by "git
> >   rebase" proper.
> >
> >   Is this ready for 'next'.
>
> Nearly, but not quite I think cf [1]. Also I'm still not convinced that
> having different behaviors for --ignore-whitespace depending on the
> backend is going to be helpful but maybe they are close enough not to
> matter too much in practice [2].

Sorry, I should have chimed in sooner; I can speak to the second point.
I would say that in practice it doesn't matter a lot; in most cases
the two overlap.  Both am's --ignore-whitespace and merge's
-Xignore-space-change are buggy (in different ways) and should be
fixed, but I'd consider them both to be buggy in edge cases.  I
recommended earlier this summer that Rohit submit the patches without
first attempting to fix apply or xdiff, and kept in my TODO list
that I'd go in and fix xdiff later if Rohit didn't have extra time
for it.  I did a little digging back then to find out the differences
and suggested some text to use to explain them and to argue that they
shouldn't block this feature:

"""
am's --ignore-space-change (an alias for am's --ignore-whitespace; see
git-apply's description of those two flags) not only shares the same
name with diff's --ignore-space-change and merge's
-Xignore-space-change, but the similarity in naming appears to have
been intentional with am's --ignore-space-change and merge's
-Xignore-space-change being designed to have the same functionality
(see e.g. the commit messages for f008cef4abb2 ("Merge branch
'jc/apply-ignore-whitespace'", 2014-06-03) and 4e5dd044c62f
("merge-recursive: options to ignore whitespace changes",
2010-08-26)).  For the most part, these options do provide the same
behavior.  However, there are some edge cases where both apply's
--ignore-space-change and merge's -Xignore-space-change fall short of
optimal behavior, and in different ways.  In particular,
--ignore-space-change for apply will handle whitespace changes in the
context region but not in the region the other side modified, and
-Xignore-space-change will delete whitespace changes even when the
other side had no changes (thus treating both sides as unmodified).
Fixing these differences in edge cases is left for future work; this
patch simply wires interactive rebase to also understand
--ignore-whitespace by translating it to -Xignore-space-change.
"""

I've got another email with even more detail if folks need it.
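
For readers who want the intended common behavior spelled out, here is a minimal sketch of the line comparison that diff's --ignore-space-change is documented to perform: runs of whitespace compare equal regardless of length, and whitespace at end of line is ignored. It is not the xdiff or apply implementation, and the function name is made up for illustration; it only models the rule, not the edge cases discussed above.

```c
#include <assert.h>
#include <ctype.h>

/* Return a pointer just past any run of whitespace starting at s. */
static const char *skip_ws(const char *s)
{
	while (*s && isspace((unsigned char)*s))
		s++;
	return s;
}

/*
 * Hypothetical helper: return 1 if lines a and b are equal when every
 * run of whitespace counts as one separator and trailing whitespace is
 * ignored, 0 otherwise.
 */
int lines_equal_ignoring_space_change(const char *a, const char *b)
{
	assert(a && b);
	while (*a && *b) {
		int wa = isspace((unsigned char)*a);
		int wb = isspace((unsigned char)*b);

		if (wa && wb) {
			/* whitespace on both sides: the amount is irrelevant */
			a = skip_ws(a);
			b = skip_ws(b);
		} else if (wa || wb) {
			/* whitespace on one side only: acceptable only at end of line */
			break;
		} else if (*a != *b) {
			return 0;
		} else {
			a++;
			b++;
		}
	}
	/* whatever remains on either side must be trailing whitespace */
	a = skip_ws(a);
	b = skip_ws(b);
	return !*a && !*b;
}
```

With this rule, "a  b" and "a b" compare equal, while "ab" and "a b" do not, since whitespace was introduced where there was none.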

Re: What's cooking in git.git (Oct 2019, #01; Thu, 3)

2019-10-03 Thread Elijah Newren

On Wed, Oct 2, 2019 at 10:22 PM Junio C Hamano  wrote:
>
> * en/fast-imexport-nested-tags (2019-10-02) 8 commits
>  - fast-export: handle nested tags
>  - t9350: add tests for tags of things other than a commit
>  - fast-export: allow user to request tags be marked with --mark-tags
>  - fast-export: add support for --import-marks-if-exists
>  - fast-import: add support for new 'alias' command
>  - fast-import: allow tags to be identified by mark labels
>  - fast-import: fix handling of deleted tags
>  - fast-export: fix exporting a tag and nothing else
>
>  Updates to fast-import/export.
>
>  Will merge to 'next'.

Actually, René posted a code cleanup suggestion for patch 2/8, so I
sent a V3 re-roll[1].  Could you pick up V3 instead of merging V2 down
to next?

[1] https://public-inbox.org/git/20191003202709.26279-1-new...@gmail.com/

[PATCH -v3 8/8] fast-export: handle nested tags

2019-10-03 Thread Elijah Newren

Signed-off-by: Elijah Newren 
---
 builtin/fast-export.c  | 30 ++
 t/t9350-fast-export.sh |  2 +-
 2 files changed, 19 insertions(+), 13 deletions(-)

diff --git a/builtin/fast-export.c b/builtin/fast-export.c
index d32e1e9327..58a74de42a 100644
--- a/builtin/fast-export.c
+++ b/builtin/fast-export.c
@@ -843,22 +843,28 @@ static void handle_tag(const char *name, struct tag *tag)
free(buf);
return;
case REWRITE:
-   if (tagged->type != OBJ_COMMIT) {
-   die("tag %s tags unexported %s!",
-   oid_to_hex(&tag->object.oid),
-   type_name(tagged->type));
-   }
-   p = rewrite_commit((struct commit *)tagged);
-   if (!p) {
-   printf("reset %s\nfrom %s\n\n",
-  name, oid_to_hex(&null_oid));
-   free(buf);
-   return;
+   if (tagged->type == OBJ_TAG && !mark_tags) {
+   die(_("Error: Cannot export nested tags unless --mark-tags is specified."));
+   } else if (tagged->type == OBJ_COMMIT) {
+   p = rewrite_commit((struct commit *)tagged);
+   if (!p) {
+   printf("reset %s\nfrom %s\n\n",
+  name, oid_to_hex(&null_oid));
+   free(buf);
+   return;
+   }
+   tagged_mark = get_object_mark(&p->object);
+   } else {
+   /* tagged->type is either OBJ_BLOB or OBJ_TAG */
+   tagged_mark = get_object_mark(tagged);
}
-   tagged_mark = get_object_mark(&p->object);
}
}
 
+   if (tagged->type == OBJ_TAG) {
+   printf("reset %s\nfrom %s\n\n",
+  name, oid_to_hex(&null_oid));
+   }
if (starts_with(name, "refs/tags/"))
name += 10;
printf("tag %s\n", name);
diff --git a/t/t9350-fast-export.sh b/t/t9350-fast-export.sh
index 9ab281e4b9..2e4e214815 100755
--- a/t/t9350-fast-export.sh
+++ b/t/t9350-fast-export.sh
@@ -567,7 +567,7 @@ test_expect_success 'handling tags of blobs' '
test_cmp expect actual
 '
 
-test_expect_failure 'handling nested tags' '
+test_expect_success 'handling nested tags' '
git tag -a -m "This is a nested tag" nested muss &&
git fast-export --mark-tags nested >output &&
grep "^from $ZERO_OID$" output &&
-- 
2.23.0.264.g3b9f7f2fc6

[PATCH -v3 5/8] fast-export: add support for --import-marks-if-exists

2019-10-03 Thread Elijah Newren

fast-import has support for both an --import-marks flag and an
--import-marks-if-exists flag; the latter of which will not die() if the
file does not exist.  fast-export only had support for an --import-marks
flag; add an --import-marks-if-exists flag for consistency.

Signed-off-by: Elijah Newren 
---
 builtin/fast-export.c  | 23 +++
 t/t9350-fast-export.sh | 10 --
 2 files changed, 23 insertions(+), 10 deletions(-)

diff --git a/builtin/fast-export.c b/builtin/fast-export.c
index 5822271c6b..575e47833b 100644
--- a/builtin/fast-export.c
+++ b/builtin/fast-export.c
@@ -1052,11 +1052,16 @@ static void export_marks(char *file)
error("Unable to write marks file %s.", file);
 }
 
-static void import_marks(char *input_file)
+static void import_marks(char *input_file, int check_exists)
 {
char line[512];
-   FILE *f = xfopen(input_file, "r");
+   FILE *f;
+   struct stat sb;
+
+   if (check_exists && stat(input_file, &sb))
+   return;
 
+   f = xfopen(input_file, "r");
while (fgets(line, sizeof(line), f)) {
uint32_t mark;
char *line_end, *mark_end;
@@ -1120,7 +1125,9 @@ int cmd_fast_export(int argc, const char **argv, const char *prefix)
struct rev_info revs;
struct object_array commits = OBJECT_ARRAY_INIT;
struct commit *commit;
-   char *export_filename = NULL, *import_filename = NULL;
+   char *export_filename = NULL,
+*import_filename = NULL,
+*import_filename_if_exists = NULL;
uint32_t lastimportid;
struct string_list refspecs_list = STRING_LIST_INIT_NODUP;
struct string_list paths_of_changed_objects = STRING_LIST_INIT_DUP;
@@ -1140,6 +1147,10 @@ int cmd_fast_export(int argc, const char **argv, const char *prefix)
 N_("Dump marks to this file")),
OPT_STRING(0, "import-marks", &import_filename, N_("file"),
 N_("Import marks from this file")),
+   OPT_STRING(0, "import-marks-if-exists",
+&import_filename_if_exists,
+N_("file"),
+N_("Import marks from this file if it exists")),
OPT_BOOL(0, "fake-missing-tagger", &fake_missing_tagger,
 N_("Fake a tagger when tags lack one")),
OPT_BOOL(0, "full-tree", &full_tree,
@@ -1187,8 +1198,12 @@ int cmd_fast_export(int argc, const char **argv, const char *prefix)
if (use_done_feature)
printf("feature done\n");
 
+   if (import_filename && import_filename_if_exists)
+   die(_("Cannot pass both --import-marks and --import-marks-if-exists"));
if (import_filename)
-   import_marks(import_filename);
+   import_marks(import_filename, 0);
+   else if (import_filename_if_exists)
+   import_marks(import_filename_if_exists, 1);
lastimportid = last_idnum;
 
if (import_filename && revs.prune_data.nr)
diff --git a/t/t9350-fast-export.sh b/t/t9350-fast-export.sh
index d32ff41859..ea84e2f173 100755
--- a/t/t9350-fast-export.sh
+++ b/t/t9350-fast-export.sh
@@ -580,17 +580,15 @@ test_expect_success 'fast-export quotes pathnames' '
 '
 
 test_expect_success 'test bidirectionality' '
-   >marks-cur &&
-   >marks-new &&
git init marks-test &&
-   git fast-export --export-marks=marks-cur --import-marks=marks-cur --branches | \
-   git --git-dir=marks-test/.git fast-import --export-marks=marks-new --import-marks=marks-new &&
+   git fast-export --export-marks=marks-cur --import-marks-if-exists=marks-cur --branches | \
+   git --git-dir=marks-test/.git fast-import --export-marks=marks-new --import-marks-if-exists=marks-new &&
(cd marks-test &&
git reset --hard &&
echo Wohlauf > file &&
git commit -a -m "back in time") &&
-   git --git-dir=marks-test/.git fast-export --export-marks=marks-new --import-marks=marks-new --branches | \
-   git fast-import --export-marks=marks-cur --import-marks=marks-cur
+   git --git-dir=marks-test/.git fast-export --export-marks=marks-new --import-marks-if-exists=marks-new --branches | \
+   git fast-import --export-marks=marks-cur --import-marks-if-exists=marks-cur
 '
 
 cat > expected << EOF
-- 
2.23.0.264.g3b9f7f2fc6
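
As a standalone illustration of the "if exists" behavior this patch adds, the sketch below reads a marks file and treats a missing file as empty when asked to. It is not the patch's import_marks(): the helper name is hypothetical, it only counts the `:<mark> <object id>` lines rather than loading them, and it returns -1 where the real code would die() via xfopen().

```c
#include <assert.h>
#include <stdio.h>
#include <sys/stat.h>

/*
 * Hypothetical helper: count the ":<mark> <object id>" lines in a
 * marks file.  When if_exists is nonzero, a file that cannot be
 * stat()ed is treated as empty (returns 0), mirroring the early
 * return in the patch.  Otherwise an unopenable file yields -1.
 */
long count_marks(const char *path, int if_exists)
{
	char line[512];
	struct stat sb;
	FILE *f;
	long n = 0;

	assert(path);
	if (if_exists && stat(path, &sb))
		return 0;

	f = fopen(path, "r");
	if (!f)
		return -1;
	while (fgets(line, sizeof(line), f))
		if (line[0] == ':')
			n++;
	fclose(f);
	return n;
}
```

This is what lets the 'test bidirectionality' test drop its `>marks-cur` and `>marks-new` setup lines: the first run no longer needs empty marks files to exist.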

[PATCH -v3 7/8] t9350: add tests for tags of things other than a commit

2019-10-03 Thread Elijah Newren

Multiple changes here:
  * add a test for a tag of a blob
  * add a test for a tag of a tag of a commit
  * add a comment to the tests for (possibly nested) tags of trees,
making it clear that these tests are doing much less than you might
expect

Signed-off-by: Elijah Newren 
---
 t/t9350-fast-export.sh | 31 +++
 1 file changed, 31 insertions(+)

diff --git a/t/t9350-fast-export.sh b/t/t9350-fast-export.sh
index b3fca6ffba..9ab281e4b9 100755
--- a/t/t9350-fast-export.sh
+++ b/t/t9350-fast-export.sh
@@ -540,10 +540,41 @@ test_expect_success 'tree_tag''
 '
 
 # NEEDSWORK: not just check return status, but validate the output
+# Note that these tests DO NOTHING other than print a warning that
+# they are omitting the one tag we asked them to export (because the
+# tags resolve to a tree).  They exist just to make sure we do not
+# abort but instead just warn.
 test_expect_success 'tree_tag-obj' 'git fast-export tree_tag-obj'
 test_expect_success 'tag-obj_tag' 'git fast-export tag-obj_tag'
 test_expect_success 'tag-obj_tag-obj' 'git fast-export tag-obj_tag-obj'
 
+test_expect_success 'handling tags of blobs' '
+   git tag -a -m "Tag of a blob" blobtag $(git rev-parse master:file) &&
+   git fast-export blobtag >actual &&
+   cat >expect <<-EOF &&
+   blob
+   mark :1
+   data 9
+   die Luft
+
+   tag blobtag
+   from :1
+   tagger $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> $GIT_COMMITTER_DATE
+   data 14
+   Tag of a blob
+
+   EOF
+   test_cmp expect actual
+'
+
+test_expect_failure 'handling nested tags' '
+   git tag -a -m "This is a nested tag" nested muss &&
+   git fast-export --mark-tags nested >output &&
+   grep "^from $ZERO_OID$" output &&
+   grep "^tag nested$" output >tag_lines &&
+   test_line_count = 2 tag_lines
+'
+
 test_expect_success 'directory becomes symlink' '
git init dirtosymlink &&
git init result &&
-- 
2.23.0.264.g3b9f7f2fc6

[PATCH -v3 3/8] fast-import: allow tags to be identified by mark labels

2019-10-03 Thread Elijah Newren

Mark identifiers are used in fast-export and fast-import to provide a
label to refer to earlier content.  Blobs are given labels because they
need to be referenced in the commits where they first appear with a
given filename, and commits are given labels because they can be the
parents of other commits.  Tags were never given labels, probably
because they were viewed as unnecessary, but that presents two problems:

   1. It leaves us without a way of referring to previous tags if we
  want to create a tag of a tag (or higher nestings).
   2. It leaves us with no way of recording that a tag has already been
  imported when using --export-marks and --import-marks.

Fix these problems by allowing an optional mark label for tags.

Signed-off-by: Elijah Newren 
---
 Documentation/git-fast-import.txt |  1 +
 fast-import.c |  3 ++-
 t/t9300-fast-import.sh| 19 +++
 3 files changed, 22 insertions(+), 1 deletion(-)

diff --git a/Documentation/git-fast-import.txt b/Documentation/git-fast-import.txt
index 0bb276269e..4977869465 100644
--- a/Documentation/git-fast-import.txt
+++ b/Documentation/git-fast-import.txt
@@ -774,6 +774,7 @@ lightweight (non-annotated) tags see the `reset` command below.
 
 
'tag' SP <name> LF
+   mark?
'from' SP <commit-ish> LF
original-oid?
'tagger' (SP <name>)? SP LT <email> GT SP <when> LF
diff --git a/fast-import.c b/fast-import.c
index caae0819f5..5b9e9e3b02 100644
--- a/fast-import.c
+++ b/fast-import.c
@@ -2713,6 +2713,7 @@ static void parse_new_tag(const char *arg)
first_tag = t;
last_tag = t;
read_next_command();
+   parse_mark();
 
/* from ... */
if (!skip_prefix(command_buf.buf, "from ", &from))
@@ -2769,7 +2770,7 @@ static void parse_new_tag(const char *arg)
strbuf_addbuf(&new_data, &msg);
free(tagger);
 
-   if (store_object(OBJ_TAG, &new_data, NULL, &t->oid, 0))
+   if (store_object(OBJ_TAG, &new_data, NULL, &t->oid, next_mark))
t->pack_id = MAX_PACK_ID;
else
t->pack_id = pack_id;
diff --git a/t/t9300-fast-import.sh b/t/t9300-fast-import.sh
index 74bc41333b..3ad2b2f1ba 100755
--- a/t/t9300-fast-import.sh
+++ b/t/t9300-fast-import.sh
@@ -94,6 +94,23 @@ test_expect_success 'A: create pack from stdin' '
reset refs/tags/to-be-deleted
from 
 
+   tag nested
+   mark :6
+   from :4
+   data <

[PATCH -v3 6/8] fast-export: allow user to request tags be marked with --mark-tags

2019-10-03 Thread Elijah Newren

Add a new option, --mark-tags, which will output mark identifiers with
each tag object.  This improves the incremental export story with
--export-marks since it will allow us to record that annotated tags have
been exported, and it is also needed as a step towards supporting nested
tags.

Signed-off-by: Elijah Newren 
---
 Documentation/git-fast-export.txt | 17 +
 builtin/fast-export.c |  7 +++
 t/t9350-fast-export.sh| 14 ++
 3 files changed, 34 insertions(+), 4 deletions(-)

diff --git a/Documentation/git-fast-export.txt b/Documentation/git-fast-export.txt
index cc940eb9ad..c522b34f7b 100644
--- a/Documentation/git-fast-export.txt
+++ b/Documentation/git-fast-export.txt
@@ -75,11 +75,20 @@ produced incorrect results if you gave these options.
Before processing any input, load the marks specified in
<file>.  The input file must exist, must be readable, and
must use the same format as produced by --export-marks.
+
+--mark-tags::
+   In addition to labelling blobs and commits with mark ids, also
+   label tags.  This is useful in conjunction with
+   `--export-marks` and `--import-marks`, and is also useful (and
+   necessary) for exporting of nested tags.  It does not hurt
+   other cases and would be the default, but many fast-import
+   frontends are not prepared to accept tags with mark
+   identifiers.
 +
-Any commits that have already been marked will not be exported again.
-If the backend uses a similar --import-marks file, this allows for
-incremental bidirectional exporting of the repository by keeping the
-marks the same across runs.
+Any commits (or tags) that have already been marked will not be
+exported again.  If the backend uses a similar --import-marks file,
+this allows for incremental bidirectional exporting of the repository
+by keeping the marks the same across runs.
 
 --fake-missing-tagger::
Some old repositories have tags without a tagger.  The
diff --git a/builtin/fast-export.c b/builtin/fast-export.c
index 575e47833b..d32e1e9327 100644
--- a/builtin/fast-export.c
+++ b/builtin/fast-export.c
@@ -40,6 +40,7 @@ static int no_data;
 static int full_tree;
 static int reference_excluded_commits;
 static int show_original_ids;
+static int mark_tags;
 static struct string_list extra_refs = STRING_LIST_INIT_NODUP;
 static struct string_list tag_refs = STRING_LIST_INIT_NODUP;
 static struct refspec refspecs = REFSPEC_INIT_FETCH;
@@ -861,6 +862,10 @@ static void handle_tag(const char *name, struct tag *tag)
if (starts_with(name, "refs/tags/"))
name += 10;
printf("tag %s\n", name);
+   if (mark_tags) {
+   mark_next_object(&tag->object);
+   printf("mark :%"PRIu32"\n", last_idnum);
+   }
if (tagged_mark)
printf("from :%d\n", tagged_mark);
else
@@ -1165,6 +1170,8 @@ int cmd_fast_export(int argc, const char **argv, const char *prefix)
 &reference_excluded_commits, N_("Reference parents which are not in fast-export stream by object id")),
OPT_BOOL(0, "show-original-ids", &show_original_ids,
N_("Show original object ids of blobs/commits")),
+   OPT_BOOL(0, "mark-tags", &mark_tags,
+   N_("Label tags with mark ids")),
 
OPT_END()
};
diff --git a/t/t9350-fast-export.sh b/t/t9350-fast-export.sh
index ea84e2f173..b3fca6ffba 100755
--- a/t/t9350-fast-export.sh
+++ b/t/t9350-fast-export.sh
@@ -66,6 +66,20 @@ test_expect_success 'fast-export ^muss^{commit} muss' '
test_cmp expected actual
 '
 
+test_expect_success 'fast-export --mark-tags ^muss^{commit} muss' '
+   git fast-export --mark-tags --tag-of-filtered-object=rewrite ^muss^{commit} muss >actual &&
+   cat >expected <<-EOF &&
+   tag muss
+   mark :1
+   from $(git rev-parse --verify muss^{commit})
+   $(git cat-file tag muss | grep tagger)
+   data 9
+   valentin
+
+   EOF
+   test_cmp expected actual
+'
+
 test_expect_success 'fast-export master~2..master' '
 
git fast-export master~2..master >actual &&
-- 
2.23.0.264.g3b9f7f2fc6

[PATCH -v3 2/8] fast-import: fix handling of deleted tags

2019-10-03 Thread Elijah Newren

If our input stream includes a tag which is later deleted, we were not
properly deleting it.  We did have a step which would delete it, but we
left a tag in the tag list noting that it needed to be updated, and the
updating of annotated tags occurred AFTER ref deletion.  So, when we
record that a tag needs to be deleted, also remove it from the list of
annotated tags to update.

While this has likely been something that has not happened in practice,
it will come up more in order to support nested tags.  For nested tags,
we either need to give temporary names to the intermediate tags and then
delete them, or else we need to use the final name for the intermediate
tags.  If we use the final name for the intermediate tags, then in order
to keep the sanity check that someone doesn't try to update the same tag
twice, we need to delete the ref after creating the intermediate tag.
So, either way nested tags imply the need to delete temporary inner tag
references.

Helped-by: René Scharfe 
Signed-off-by: Elijah Newren 
---
 fast-import.c  | 27 +++
 t/t9300-fast-import.sh | 13 +
 2 files changed, 40 insertions(+)

diff --git a/fast-import.c b/fast-import.c
index b44d6a467e..caae0819f5 100644
--- a/fast-import.c
+++ b/fast-import.c
@@ -2778,6 +2778,7 @@ static void parse_new_tag(const char *arg)
 static void parse_reset_branch(const char *arg)
 {
struct branch *b;
+   const char *tag_name;
 
b = lookup_branch(arg);
if (b) {
@@ -2793,6 +2794,32 @@ static void parse_reset_branch(const char *arg)
b = new_branch(arg);
read_next_command();
parse_from(b);
+   if (b->delete && skip_prefix(b->name, "refs/tags/", &tag_name)) {
+   /*
+* Elsewhere, we call dump_branches() before dump_tags(),
+* and dump_branches() will handle ref deletions first, so
+* in order to make sure the deletion actually takes effect,
+* we need to remove the tag from our list of tags to update.
+*
+* NEEDSWORK: replace list of tags with hashmap for faster
+* deletion?
+*/
+   struct tag *t, *prev = NULL;
+   for (t = first_tag; t; t = t->next_tag) {
+   if (!strcmp(t->name, tag_name))
+   break;
+   prev = t;
+   }
+   if (t) {
+   if (prev)
+   prev->next_tag = t->next_tag;
+   else
+   first_tag = t->next_tag;
+   if (!t->next_tag)
+   last_tag = prev;
+   /* There is no mem_pool_free(t) function to call. */
+   }
+   }
if (command_buf.len > 0)
unread_command_buf = 1;
 }
diff --git a/t/t9300-fast-import.sh b/t/t9300-fast-import.sh
index 141b7fa35e..74bc41333b 100755
--- a/t/t9300-fast-import.sh
+++ b/t/t9300-fast-import.sh
@@ -85,6 +85,15 @@ test_expect_success 'A: create pack from stdin' '
An annotated tag that annotates a blob.
EOF
 
+   tag to-be-deleted
+   from :3
+   data <expect <<-EOF &&
:2 $(git rev-parse --verify master:file2)
-- 
2.23.0.264.g3b9f7f2fc6
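
The list surgery in parse_reset_branch() is the usual removal from a singly linked list that tracks both its first and last node. A self-contained sketch of that technique follows; the struct and function names are invented for illustration and are not fast-import's types.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Illustrative stand-ins for fast-import's tag list; names are hypothetical. */
struct tag_node {
	const char *name;
	struct tag_node *next;
};

struct tag_list {
	struct tag_node *first;
	struct tag_node *last;
};

/* Append t at the end of the list. */
void tag_list_append(struct tag_list *l, struct tag_node *t)
{
	assert(l && t);
	t->next = NULL;
	if (l->last)
		l->last->next = t;
	else
		l->first = t;
	l->last = t;
}

/*
 * Unlink the first node called name and return it, or return NULL if
 * there is none.  Both the head and the tail pointer are kept
 * consistent, including when the removed node was the only one.
 */
struct tag_node *tag_list_remove(struct tag_list *l, const char *name)
{
	struct tag_node *t, *prev = NULL;

	assert(l && name);
	for (t = l->first; t; t = t->next) {
		if (!strcmp(t->name, name))
			break;
		prev = t;
	}
	if (!t)
		return NULL;
	if (prev)
		prev->next = t->next;
	else
		l->first = t->next;
	if (!t->next)
		l->last = prev;
	t->next = NULL;
	return t;
}
```

The lookup is linear in the number of tags, which is the cost the NEEDSWORK comment in the patch refers to when it suggests a hashmap.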

[PATCH -v3 4/8] fast-import: add support for new 'alias' command

2019-10-03 Thread Elijah Newren

fast-export and fast-import have nice --import-marks flags which allow
for incremental migrations.  However, if there is a mark in
fast-export's file of marks without a corresponding mark in the one for
fast-import, then we run the risk that fast-export tries to send new
objects relative to the mark it knows which fast-import does not,
causing fast-import to fail.

This arises in practice when there is a filter of some sort running
between the fast-export and fast-import processes which prunes some
commits programmatically.  Provide such a filter with the ability to
alias pruned commits to their most recent non-pruned ancestor.

Signed-off-by: Elijah Newren 
---
 Documentation/git-fast-import.txt | 22 +++
 fast-import.c | 62 ++-
 t/t9300-fast-import.sh|  5 +++
 3 files changed, 79 insertions(+), 10 deletions(-)

diff --git a/Documentation/git-fast-import.txt b/Documentation/git-fast-import.txt
index 4977869465..a3f1e0c5e4 100644
--- a/Documentation/git-fast-import.txt
+++ b/Documentation/git-fast-import.txt
@@ -337,6 +337,13 @@ and control the current import process.  More detailed discussion
`commit` command.  This command is optional and is not
needed to perform an import.
 
+`alias`::
+   Record that a mark refers to a given object without first
+   creating any new object.  Using --import-marks and referring
+   to missing marks will cause fast-import to fail, so aliases
+   can provide a way to set otherwise pruned commits to a valid
+   value (e.g. the nearest non-pruned ancestor).
+
 `checkpoint`::
Forces fast-import to close the current packfile, generate its
unique SHA-1 checksum and index, and start a new packfile.
@@ -914,6 +921,21 @@ a data chunk which does not have an LF as its last byte.
 +
 The `LF` after ` LF` is optional (it used to be required).
 
+`alias`
+~~~
+Record that a mark refers to a given object without first creating any
+new object.
+
+
+   'alias' LF
+   mark
+   'to' SP <commit-ish> LF
+   LF?
+
+
+For a detailed description of `<commit-ish>` see above under `from`.
+
+
 `checkpoint`
 
 Forces fast-import to close the current packfile, start a new one, and to
diff --git a/fast-import.c b/fast-import.c
index 5b9e9e3b02..ac368b3e2b 100644
--- a/fast-import.c
+++ b/fast-import.c
@@ -2491,18 +2491,14 @@ static void parse_from_existing(struct branch *b)
}
 }
 
-static int parse_from(struct branch *b)
+static int parse_objectish(struct branch *b, const char *objectish)
 {
-   const char *from;
struct branch *s;
struct object_id oid;
 
-   if (!skip_prefix(command_buf.buf, "from ", &from))
-   return 0;
-
oidcpy(&oid, &b->branch_tree.versions[1].oid);
 
-   s = lookup_branch(from);
+   s = lookup_branch(objectish);
if (b == s)
die("Can't create a branch from itself: %s", b->name);
else if (s) {
@@ -2510,8 +2506,8 @@ static int parse_from(struct branch *b)
oidcpy(&b->oid, &s->oid);
oidcpy(&b->branch_tree.versions[0].oid, t);
oidcpy(&b->branch_tree.versions[1].oid, t);
-   } else if (*from == ':') {
-   uintmax_t idnum = parse_mark_ref_eol(from);
+   } else if (*objectish == ':') {
+   uintmax_t idnum = parse_mark_ref_eol(objectish);
struct object_entry *oe = find_mark(idnum);
if (oe->type != OBJ_COMMIT)
die("Mark :%" PRIuMAX " not a commit", idnum);
@@ -2525,13 +2521,13 @@ static int parse_from(struct branch *b)
} else
parse_from_existing(b);
}
-   } else if (!get_oid(from, &b->oid)) {
+   } else if (!get_oid(objectish, &b->oid)) {
parse_from_existing(b);
if (is_null_oid(&b->oid))
b->delete = 1;
}
else
-   die("Invalid ref name or SHA1 expression: %s", from);
+   die("Invalid ref name or SHA1 expression: %s", objectish);
 
if (b->branch_tree.tree && !oideq(&oid, &b->branch_tree.versions[1].oid)) {
release_tree_content_recursive(b->branch_tree.tree);
@@ -2542,6 +2538,26 @@ static int parse_from(struct branch *b)
return 1;
 }
 
+static int parse_from(struct branch *b)
+{
+   const char *from;
+
+   if (!skip_prefix(command_buf.buf, "from ", &from))
+   return 0;
+
+   return parse_objectish(b, from);
+}
+
+static int parse_objectish_with_prefix(struct branch *b, const char *prefix)
+{
+   const char *base;
+
+   if (!skip_prefix(command_buf.buf, prefix, &base))
+

[PATCH -v3 0/8] fast export/import: handle nested tags, improve incremental exports

2019-10-03 Thread Elijah Newren

This series improves the incremental export story for fast-export and
fast-import (--export-marks and --import-marks fell a bit short),
fixes a couple small export/import bugs, and enables handling nested
tags.  In particular, the nested tags handling makes it so that
fast-export and fast-import can finally handle the git.git repo.

Changes since v2 (full range-diff below):
  - Code cleanup of patch 2 suggested by René

Elijah Newren (8):
  fast-export: fix exporting a tag and nothing else
  fast-import: fix handling of deleted tags
  fast-import: allow tags to be identified by mark labels
  fast-import: add support for new 'alias' command
  fast-export: add support for --import-marks-if-exists
  fast-export: allow user to request tags be marked with --mark-tags
  t9350: add tests for tags of things other than a commit
  fast-export: handle nested tags

 Documentation/git-fast-export.txt | 17 --
 Documentation/git-fast-import.txt | 23 
 builtin/fast-export.c | 67 --
 fast-import.c | 92 +++
 t/t9300-fast-import.sh| 37 +
 t/t9350-fast-export.sh| 68 +--
 6 files changed, 266 insertions(+), 38 deletions(-)

Range-diff:
1:  a30cfbbb50 = 1:  a30cfbbb50 fast-export: fix exporting a tag and nothing else
2:  1d19498bc6 ! 2:  36fbf15134 fast-import: fix handling of deleted tags
@@ Commit message
 So, either way nested tags imply the need to delete temporary inner tag
 references.
 
+Helped-by: René Scharfe 
 Signed-off-by: Elijah Newren 
 
  ## fast-import.c ##
+@@ fast-import.c: static void parse_new_tag(const char *arg)
+ static void parse_reset_branch(const char *arg)
+ {
+   struct branch *b;
++  const char *tag_name;
+ 
+   b = lookup_branch(arg);
+   if (b) {
 @@ fast-import.c: static void parse_reset_branch(const char *arg)
b = new_branch(arg);
read_next_command();
parse_from(b);
-+  if (b->delete && !strncmp(b->name, "refs/tags/", 10)) {
++  if (b->delete && skip_prefix(b->name, "refs/tags/", &tag_name)) {
 +  /*
 +   * Elsewhere, we call dump_branches() before dump_tags(),
 +   * and dump_branches() will handle ref deletions first, so
@@ fast-import.c: static void parse_reset_branch(const char *arg)
 +   * NEEDSWORK: replace list of tags with hashmap for faster
 +   * deletion?
 +   */
-+  struct strbuf tag_name = STRBUF_INIT;
 +  struct tag *t, *prev = NULL;
 +  for (t = first_tag; t; t = t->next_tag) {
-+  strbuf_reset(&tag_name);
-+  strbuf_addf(&tag_name, "refs/tags/%s", t->name);
-+  if (!strcmp(b->name, tag_name.buf))
++  if (!strcmp(t->name, tag_name))
 +  break;
 +  prev = t;
 +  }
3:  e1fd888e4a = 3:  3b5f4270f8 fast-import: allow tags to be identified by mark labels
4:  93175f28d9 = 4:  489c7fd854 fast-import: add support for new 'alias' command
5:  8c8743395c = 5:  38fd19caee fast-export: add support for --import-marks-if-exists
6:  eebc40df33 = 6:  2017b8d9f9 fast-export: allow user to request tags be marked with --mark-tags
7:  de39f703c6 = 7:  0efdbb81b1 t9350: add tests for tags of things other than a commit
8:  ac739dbb79 = 8:  fe7c27d786 fast-export: handle nested tags
-- 
2.23.0.264.g3b9f7f2fc6

[PATCH -v3 1/8] fast-export: fix exporting a tag and nothing else

2019-10-03 Thread Elijah Newren

fast-export allows specifying revision ranges, which can be used to
export a tag without exporting the commit it tags.  fast-export handled
this rather poorly: it would emit a "from :0" directive.  Since marks
start at 1 and increase, this means it refers to an unknown commit and
fast-import will choke on the input.

When we are unable to look up a mark for the object being tagged, use a
"from $HASH" directive instead to fix this problem.

Note that this is quite similar to the behavior fast-export exhibits
with commits and parents when --reference-excluded-parents is passed
along with an excluded commit range.  For tags of excluded commits we do
not require the --reference-excluded-parents flag because we always have
to tag something.  By contrast, when dealing with commits, pruning a
parent is always a viable option, so we need the flag to specify that
parent pruning is not wanted.  (It is slightly weird that
--reference-excluded-parents isn't the default with a separate
--prune-excluded-parents flag, but backward compatibility concerns
resulted in the current defaults.)
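
The choice described in this message can be sketched in isolation: a mark of 0 is treated as "no mark known", and the hex object name is emitted instead. This is only an illustration; format_tag_from() is a hypothetical helper, not a function in builtin/fast-export.c, and the hex string is passed in by the caller rather than derived from a real object.

```c
#include <stdio.h>

/*
 * Hypothetical helper: write the "from" line of a tag into buf.
 * Marks start at 1, so tagged_mark == 0 means the tagged object was
 * never given a mark (for example, it was excluded from the export).
 */
static int format_tag_from(char *buf, size_t len, int tagged_mark,
			   const char *tagged_hex)
{
	if (tagged_mark)
		return snprintf(buf, len, "from :%d\n", tagged_mark);
	/* No mark available: refer to the object by its hex name. */
	return snprintf(buf, len, "from %s\n", tagged_hex);
}
```

With a mark of 7 this produces "from :7"; with a mark of 0 it produces "from " followed by whatever hex name was supplied, so the importer never sees ":0".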

Signed-off-by: Elijah Newren 
---
 builtin/fast-export.c  |  7 ++-
 t/t9350-fast-export.sh | 13 +
 2 files changed, 19 insertions(+), 1 deletion(-)

diff --git a/builtin/fast-export.c b/builtin/fast-export.c
index f541f55d33..5822271c6b 100644
--- a/builtin/fast-export.c
+++ b/builtin/fast-export.c
@@ -860,7 +860,12 @@ static void handle_tag(const char *name, struct tag *tag)
 
if (starts_with(name, "refs/tags/"))
name += 10;
-   printf("tag %s\nfrom :%d\n", name, tagged_mark);
+   printf("tag %s\n", name);
+   if (tagged_mark)
+   printf("from :%d\n", tagged_mark);
+   else
+   printf("from %s\n", oid_to_hex(&tagged->oid));
+
if (show_original_ids)
printf("original-oid %s\n", oid_to_hex(&tag->object.oid));
printf("%.*s%sdata %d\n%.*s\n",
diff --git a/t/t9350-fast-export.sh b/t/t9350-fast-export.sh
index b4004e05c2..d32ff41859 100755
--- a/t/t9350-fast-export.sh
+++ b/t/t9350-fast-export.sh
@@ -53,6 +53,19 @@ test_expect_success 'fast-export | fast-import' '
 
 '
 
+test_expect_success 'fast-export ^muss^{commit} muss' '
+   git fast-export --tag-of-filtered-object=rewrite ^muss^{commit} muss >actual &&
+   cat >expected <<-EOF &&
+   tag muss
+   from $(git rev-parse --verify muss^{commit})
+   $(git cat-file tag muss | grep tagger)
+   data 9
+   valentin
+
+   EOF
+   test_cmp expected actual
+'
+
 test_expect_success 'fast-export master~2..master' '
 
git fast-export master~2..master >actual &&
-- 
2.23.0.264.g3b9f7f2fc6

Re: [PATCH v2 0/8] fast export/import: handle nested tags, improve incremental exports

2019-10-02 Thread Elijah Newren

On Wed, Oct 2, 2019 at 1:10 PM Junio C Hamano  wrote:
>
> Elijah Newren  writes:
>
> > I see you picked up this corrected series in pu; thanks.  However,
> > your merge commit, 2a99d6b6ff7c ("Merge branch
> > 'en/fast-imexport-nested-tags' into pu", 2019-10-02), claims "Seems to
> > break t9300 when merged to 'pu'.".  I know v1 did that and I could
> > reproduce, but I can't reproduce any failures here.  Was this message
> > just left over or is there some problem you are seeing?
>
> I thought that the latest What's cooking written after you sent the
> corrected version hasn't been sent yet.
>
> And the draft copy I locally have for the next issue of what's cooking
> has the comment removed already.
>
> Anything I missed?

I was looking at the commit message for the merge commit where you
merged in v2 of the series, and was surprised to see the commit
message claim there was still breakage with the new version of the
series:

$ git log -1 2a99d6b6ff7c
commit 2a99d6b6ff7cd29cc41d587ae737ffb8e7babea4
Merge: 7d81b61dd0 0e2eeb4cb4
Author: Junio C Hamano 
Date:   Wed Oct 2 16:08:06 2019 +0900

Merge branch 'en/fast-imexport-nested-tags' into pu

Updates to fast-import/export.

Seems to break t9300 when merged to 'pu'.

* en/fast-imexport-nested-tags:
  fast-export: handle nested tags
  t9350: add tests for tags of things other than a commit
  fast-export: allow user to request tags be marked with --mark-tags
  fast-export: add support for --import-marks-if-exists
  fast-import: add support for new 'alias' command
  fast-import: allow tags to be identified by mark labels
  fast-import: fix handling of deleted tags
  fast-export: fix exporting a tag and nothing else

Re: git-grep in sparse checkout

2019-10-02 Thread Elijah Newren

On Tue, Oct 1, 2019 at 11:33 PM Junio C Hamano  wrote:
>
> Elijah Newren  writes:
>
> > * other commands (archive, bisect, clean?, gitk, shortlog, blame,
> > fsck?, etc.) likely need to pay attention to sparsity patterns as
> > well, but there are some special cases:
>
> "git archive" falls into the same class as fast-(im|ex)port; it
> should ignore the sparse cone by default.  I suspect you threw
> "fsck" as a joke, but I do not think it should pay attention to the
> sparse cone, either (besides, most of the time in fsck the objects
> subject to checking do not know all the paths that reach them).

archive in the same category as fast-(im|ex)port makes sense.  I'm not
sure if "ignore the sparse cone" by default makes sense or if it
should be a case where we error out if --ignore-sparsity-patterns
isn't specified, especially if history is also sparse.

In terms of fsck, I agree that if history is dense and the worktree is
sparse that you want to walk all history.  I was thinking further
along the lines of when partial clones and sparse checkouts are combined
so that history is also sparse.  In cases where a partial clone is in
use, rather than download everything in order to walk it, wouldn't it
make more sense to have fsck walk over the bits that are already
downloaded?  I don't really know how that'd all work, but it seems
that if fsck walked over all history it'd be treated as a
useless/dangerous command by those who are doing partial clones
because the repo is just too big.

> > * merge, cherry-pick, and rebase (anything touching the merge
> > machinery) will need to expand the size of the non-sparse worktree if
> > there are files outside the sparsity patterns with conflicts.  (Though
> > merge should do a better job of not expanding the non-sparse worktree
> > when files can cleanly be resolved.)
>
> I think the important point is what is done to the result of
> operation.  Result of these operations that create new commits are
> meant to be consumed by other people, who may not share your
> definition of sparse cone.  And such a command (i.e. those whose
> results are consumed by others who may have different sparse cone)
> must be full-tree by default.
>
> > * fast-export and format-patch are not about viewing history but about
> > exporting it, and limiting to sparsity patterns would result in the
> > creation of an incompatible history.
>
> I agree with the conclusion; see above.
>
> > * New worktrees, by default, should copy the sparsity-patterns of the
> > worktree they were created from (much like a new shell inherits the
> > current working directory of its parent process)
>
> Sorry, but I do not share this view at all.
>
> In my mental model, "worktree new" is to attach a brand-new worktree
> to a bare repository that underlies the existing worktree I happen
> to be in, and that existing worktree that I happen to type "worktree
> new" in is no more or no less special than other worktrees.
>
> The above isn't to say that I'd veto your "a new worktree inherits
> traits from an existing worktree that 'git worktree add' was invoked
> in" idea.  I am just saying that I have a problem with that mode of
> operation and mental model being the default.

If worktrees are the only area we disagree on, then I'll happily take
the stuff we agree on and can overlook this piece.

But, perhaps some further explaining on worktrees might help us reach
some middle ground. If worktrees are dense by default and folks have
not only sparse checkouts but sparse history, then creating a new
worktree would suddenly mandate downloading a lot more of history --
which could be prohibitively expensive, forcing people to instead have
N clones without any shared history.  That may be fine (I tend to not
be a heavy worktree user, I just support some users who are), but is
it the route we want to push people with big repos towards?

Thanks for the feedback on the ideas,
Elijah

Re: [PATCH v2 0/8] fast export/import: handle nested tags, improve incremental exports

2019-10-02 Thread Elijah Newren

Hi Junio,

On Mon, Sep 30, 2019 at 2:10 PM Elijah Newren  wrote:
>
> This series improves the incremental export story for fast-export and
> fast-import (--export-marks and --import-marks fell a bit short),
> fixes a couple small export/import bugs, and enables handling nested
> tags.  In particular, the nested tags handling makes it so that
> fast-export and fast-import can finally handle the git.git repo.

I see you picked up this corrected series in pu; thanks.  However,
your merge commit, 2a99d6b6ff7c ("Merge branch
'en/fast-imexport-nested-tags' into pu", 2019-10-02), claims "Seems to
break t9300 when merged to 'pu'.".  I know v1 did that and I could
reproduce, but I can't reproduce any failures here.  Was this message
just left over or is there some problem you are seeing?

Thanks,
Elijah

>
> Changes since v1 (full range-diff below):
>   - Fixed an issue integrating with next/pu (in particular, with
> jk/fast-import-history-bugfix)
>
> Elijah Newren (8):
>   fast-export: fix exporting a tag and nothing else
>   fast-import: fix handling of deleted tags
>   fast-import: allow tags to be identified by mark labels
>   fast-import: add support for new 'alias' command
>   fast-export: add support for --import-marks-if-exists
>   fast-export: allow user to request tags be marked with --mark-tags
>   t9350: add tests for tags of things other than a commit
>   fast-export: handle nested tags
>
>  Documentation/git-fast-export.txt | 17 --
>  Documentation/git-fast-import.txt | 23 
>  builtin/fast-export.c | 67 --
>  fast-import.c | 94 +++
>  t/t9300-fast-import.sh| 37 
>  t/t9350-fast-export.sh| 68 --
>  6 files changed, 268 insertions(+), 38 deletions(-)
>
> Range-diff:
> 1:  b751d6c2d6 ! 1:  1d19498bc6 fast-import: fix handling of deleted tags
> @@ fast-import.c: static void parse_reset_branch(const char *arg)
> b = new_branch(arg);
> read_next_command();
> parse_from(b);
> -+  if (b->delete && !strncmp(arg, "refs/tags/", 10)) {
> ++  if (b->delete && !strncmp(b->name, "refs/tags/", 10)) {
>  +  /*
>  +   * Elsewhere, we call dump_branches() before dump_tags(),
>  +   * and dump_branches() will handle ref deletions first, so
> @@ fast-import.c: static void parse_reset_branch(const char *arg)
>  +  for (t = first_tag; t; t = t->next_tag) {
>  +  strbuf_reset(&tag_name);
>  +  strbuf_addf(&tag_name, "refs/tags/%s", t->name);
> -+  if (!strcmp(arg, tag_name.buf))
> ++  if (!strcmp(b->name, tag_name.buf))
>  +  break;
>  +  prev = t;
>  +  }
> 2:  26b77dde15 = 2:  e1fd888e4a fast-import: allow tags to be identified by mark labels
> 3:  e0d1a1d7aa = 3:  93175f28d9 fast-import: add support for new 'alias' command
> 4:  edea892661 = 4:  8c8743395c fast-export: add support for --import-marks-if-exists
> 5:  6af7e1fdd0 = 5:  eebc40df33 fast-export: allow user to request tags be marked with --mark-tags
> 6:  631ae9a63e = 6:  de39f703c6 t9350: add tests for tags of things other than a commit
> 7:  c0e932e4da = 7:  ac739dbb79 fast-export: handle nested tags
> --
> 2.23.0.264.gac739dbb79
>

Re: [PATCH v3] dir: special case check for the possibility that pathspec is NULL

2019-10-02 Thread Elijah Newren

On Tue, Oct 1, 2019 at 12:39 PM Elijah Newren  wrote:
>
> On Tue, Oct 1, 2019 at 12:35 PM Denton Liu  wrote:
> >
> > Hi Elijah,
> >
> > Sorry for dragging out this thread for so long...
> >
> > On Tue, Oct 01, 2019 at 11:55:24AM -0700, Elijah Newren wrote:
> >
> > [...]
> >
> > > diff --git a/t/t0050-filesystem.sh b/t/t0050-filesystem.sh
> > > index 192c94eccd..a840919967 100755
> > > --- a/t/t0050-filesystem.sh
> > > +++ b/t/t0050-filesystem.sh
> > > @@ -131,4 +131,25 @@ $test_unicode 'merge (silent unicode normalization)' 
> > > '
> >
> > I had to change the 25 to a 24 for this to apply cleanly.
> >
> > >   git merge topic
> > >  '
> > >
> > > +test_expect_success CASE_INSENSITIVE_FS 'checkout with no pathspec and a 
> > > case insensitive fs' '
> > > + git init repo &&
> > > + (
> > > + cd repo &&
> > > +
> > > + >Gitweb &&
> > > + git add Gitweb &&
> > > + git commit -m "add Gitweb" &&
> > > +
> > > + git checkout --orphan todo &&
> > > + git reset --hard &&
> > > + mkdir -p gitweb/subdir &&
> > > + >gitweb/subdir/file &&
> > > + git add gitweb &&
> > > + git commit -m "add gitweb/subdir/file" &&
> > > +
> > > + git checkout master
> > > + )
> > > +'
> > > +
> > >  test_done
> >
> > Just wondering, how did you generate this patch? Did you manually edit
> > the last patch and resend it or is this a bug in our diff machinery?
>
> I manually edited because it "was so simple" and of course just
> compounded the problem because I didn't fix the count, as you pointed
> out.  Gah.  Thanks for checking.  Clearly, I'm bouncing between too
> many things this morning, and need to wait until I'm not so distracted
> and rushing so I don't mess things up.  I'll send out a v4 in a few
> hours when I've cleaned a few other things off my plate.

I was going to send out a new version this morning, but it looks like
Junio already picked up the patch and fixed it up (the tip of
en/clean-nested-with-ignored already has what we want), so I won't
resend after all.  Thanks Denton, SZEDER, and Junio.

Re: [PATCH v3] dir: special case check for the possibility that pathspec is NULL

2019-10-01 Thread Elijah Newren

On Tue, Oct 1, 2019 at 12:35 PM Denton Liu  wrote:
>
> Hi Elijah,
>
> Sorry for dragging out this thread for so long...
>
> On Tue, Oct 01, 2019 at 11:55:24AM -0700, Elijah Newren wrote:
>
> [...]
>
> > diff --git a/t/t0050-filesystem.sh b/t/t0050-filesystem.sh
> > index 192c94eccd..a840919967 100755
> > --- a/t/t0050-filesystem.sh
> > +++ b/t/t0050-filesystem.sh
> > @@ -131,4 +131,25 @@ $test_unicode 'merge (silent unicode normalization)' '
>
> I had to change the 25 to a 24 for this to apply cleanly.
>
> >   git merge topic
> >  '
> >
> > +test_expect_success CASE_INSENSITIVE_FS 'checkout with no pathspec and a 
> > case insensitive fs' '
> > + git init repo &&
> > + (
> > + cd repo &&
> > +
> > + >Gitweb &&
> > + git add Gitweb &&
> > + git commit -m "add Gitweb" &&
> > +
> > + git checkout --orphan todo &&
> > + git reset --hard &&
> > + mkdir -p gitweb/subdir &&
> > + >gitweb/subdir/file &&
> > + git add gitweb &&
> > + git commit -m "add gitweb/subdir/file" &&
> > +
> > + git checkout master
> > + )
> > +'
> > +
> >  test_done
>
> Just wondering, how did you generate this patch? Did you manually edit
> the last patch and resend it or is this a bug in our diff machinery?

I manually edited because it "was so simple" and of course just
compounded the problem because I didn't fix the count, as you pointed
out.  Gah.  Thanks for checking.  Clearly, I'm bouncing between too
many things this morning, and need to wait until I'm not so distracted
and rushing so I don't mess things up.  I'll send out a v4 in a few
hours when I've cleaned a few other things off my plate.
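
For anyone hand-editing a patch, the counts in a unified-diff hunk header ("@@ -start,old +start,new @@") have to match the hunk body: context lines count toward both sides, '-' lines only toward the old side, and '+' lines only toward the new side. A minimal sketch of that bookkeeping follows; count_hunk_lines() and struct hunk_counts are hypothetical names for illustration, not part of git.

```c
#include <stddef.h>

struct hunk_counts {
	int old_lines;	/* the count after "-start," in the header */
	int new_lines;	/* the count after "+start," in the header */
};

/*
 * Hypothetical helper: tally the body lines of one hunk.  Each entry in
 * body is one line of the hunk, including its leading ' ', '-' or '+'.
 */
static struct hunk_counts count_hunk_lines(const char *const *body, size_t n)
{
	struct hunk_counts c = { 0, 0 };
	size_t i;

	for (i = 0; i < n; i++) {
		switch (body[i][0]) {
		case ' ':	/* context: present before and after */
			c.old_lines++;
			c.new_lines++;
			break;
		case '-':	/* removed: only in the old file */
			c.old_lines++;
			break;
		case '+':	/* added: only in the new file */
			c.new_lines++;
			break;
		}
	}
	return c;
}
```

Deleting a single '+' line by hand therefore lowers the new-side count by one, which is the adjustment Denton had to make from 25 to 24.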

[PATCH v3] dir: special case check for the possibility that pathspec is NULL

2019-10-01 Thread Elijah Newren

Commits 404ebceda01c ("dir: also check directories for matching
pathspecs", 2019-09-17) and 89a1f4aaf765 ("dir: if our pathspec might
match files under a dir, recurse into it", 2019-09-17) added calls to
match_pathspec() and do_match_pathspec() passing along their pathspec
parameter.  Both match_pathspec() and do_match_pathspec() assume the
pathspec argument they are given is non-NULL.  It turns out that
unpack-trees.c's verify_clean_subdirectory() calls read_directory() with
pathspec == NULL, and it is possible on case insensitive filesystems for
that NULL to make it to these new calls to match_pathspec() and
do_match_pathspec().  Add appropriate checks on the NULLness of pathspec
to avoid a segfault.

In case the negation throws anyone off (one of the calls was to
do_match_pathspec() while the other was to !match_pathspec(), yet no
negation of the NULLness of pathspec is used), there are two ways to
understand the differences:
  * The code already handled the pathspec == NULL cases before this
series, and this series only tried to change behavior when there was
a pathspec, thus we only want to go into the if-block if pathspec is
non-NULL.
  * One of the calls is for whether to recurse into a subdirectory, the
other is for after we've recursed into it for whether we want to
remove the subdirectory itself (i.e. the subdirectory didn't match
but something under it could have).  That difference in situation
leads to the slight differences in logic used (well, that and the
slightly unusual fact that we don't want empty pathspecs to remove
untracked directories by default).
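
The two guards discussed above have the same shape even though one wraps a positive test and the other a negated one: in both, a NULL pathspec must skip the matcher entirely. A simplified sketch follows, using a plain string in place of a pathspec and a hypothetical prefix matcher matches(); the real conditions in dir.c involve more state than is shown here.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical stand-in for a matcher that dereferences its pattern
 * and would crash if handed NULL.
 */
static int matches(const char *pattern, const char *path)
{
	return strncmp(path, pattern, strlen(pattern)) == 0;
}

/* Recurse into the directory only when a pattern exists and matches. */
static int should_recurse(const char *pattern, const char *path)
{
	return pattern && matches(pattern, path);
}

/*
 * After recursing, drop the directory itself only when a pattern exists
 * and fails to match.  With no pattern, prior behavior is kept.
 */
static int should_drop(const char *pattern, const char *path)
{
	return pattern && !matches(pattern, path);
}
```

In both helpers the NULL check is written as `pattern &&`, never `!pattern ||`, because the new behavior is only meant to apply when a pathspec was given.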

Denton found and analyzed one issue and provided the patch for the
match_pathspec() call, SZEDER figured out why the issue only reproduced
for some folks and not others and provided the testcase, and I looked
through the remainder of the series and noted the do_match_pathspec()
call that should have the same check.

Co-authored-by: Denton Liu 
Co-authored-by: SZEDER Gábor 
Signed-off-by: Elijah Newren 
---
Note: Applies on top of en/clean-nested-with-ignored, in next.

As with v1, the authorship is really mixed, so I don't know if I
should use Co-authored-by (highlighted as a possibility by Denton), or
the far more common Helped-by (as suggested by Junio but based on a
more limited summary of the different contributions), or if perhaps
Denton or SZEDER should be marked as the author and I be marked as
Helped-by or Co-authored-by.  Since Denton commented on round 1, I
used his suggestion for attribution in this round, but I'm open to
changing it to whatever works best.

Changes since v2:
  - This time actually removed the entire unnecessary comment

Range-diff:
1:  c495b9303c ! 1:  40392c6bba dir: special case check for the possibility 
that pathspec is NULL
@@ t/t0050-filesystem.sh: $test_unicode 'merge (silent unicode 
normalization)' '
 +  git reset --hard &&
 +  mkdir -p gitweb/subdir &&
 +  >gitweb/subdir/file &&
-+  # it is not strictly necessary to add and commit the
 +  git add gitweb &&
 +  git commit -m "add gitweb/subdir/file" &&
 +

 dir.c |  8 +---
 t/t0050-filesystem.sh | 21 +
 2 files changed, 26 insertions(+), 3 deletions(-)

diff --git a/dir.c b/dir.c
index 7ff79170fc..bd39b86be4 100644
--- a/dir.c
+++ b/dir.c
@@ -1962,8 +1962,9 @@ static enum path_treatment 
read_directory_recursive(struct dir_struct *dir,
((state == path_untracked) &&
 (get_dtype(cdir.de, istate, path.buf, path.len) == 
DT_DIR) &&
 ((dir->flags & DIR_SHOW_IGNORED_TOO) ||
- do_match_pathspec(istate, pathspec, path.buf, 
path.len,
-   baselen, NULL, 
DO_MATCH_LEADING_PATHSPEC) == MATCHED_RECURSIVELY_LEADING_PATHSPEC))) {
+ (pathspec &&
+  do_match_pathspec(istate, pathspec, path.buf, 
path.len,
+baselen, NULL, 
DO_MATCH_LEADING_PATHSPEC) == MATCHED_RECURSIVELY_LEADING_PATHSPEC {
struct untracked_cache_dir *ud;
ud = lookup_untracked(dir->untracked, untracked,
  path.buf + baselen,
@@ -1975,7 +1976,8 @@ static enum path_treatment 
read_directory_recursive(struct dir_struct *dir,
if (subdir_state > dir_state)
dir_state = subdir_state;
 
-   if (!match_pathspec(istate, pathspec, path.buf, 
path.len,
+   if (pathspec &&
+   !match_pathspec(istate, pathspec, path.buf, 
path.len,

Re: [PATCH v2] dir: special case check for the possibility that pathspec is NULL

2019-10-01 Thread Elijah Newren

On Tue, Oct 1, 2019 at 11:41 AM Denton Liu  wrote:
>
> Hi Elijah,
>
> On Tue, Oct 01, 2019 at 11:30:05AM -0700, Elijah Newren wrote:
>
> [...]
>
> > diff --git a/dir.c b/dir.c
> > index 7ff79170fc..bd39b86be4 100644
> > --- a/dir.c
> > +++ b/dir.c
> > @@ -1962,8 +1962,9 @@ static enum path_treatment 
> > read_directory_recursive(struct dir_struct *dir,
> >   ((state == path_untracked) &&
> >(get_dtype(cdir.de, istate, path.buf, path.len) == 
> > DT_DIR) &&
> >((dir->flags & DIR_SHOW_IGNORED_TOO) ||
> > -   do_match_pathspec(istate, pathspec, path.buf, 
> > path.len,
> > - baselen, NULL, 
> > DO_MATCH_LEADING_PATHSPEC) == MATCHED_RECURSIVELY_LEADING_PATHSPEC))) {
> > +   (pathspec &&
> > +do_match_pathspec(istate, pathspec, path.buf, 
> > path.len,
> > +  baselen, NULL, 
> > DO_MATCH_LEADING_PATHSPEC) == MATCHED_RECURSIVELY_LEADING_PATHSPEC {
> >   struct untracked_cache_dir *ud;
> >   ud = lookup_untracked(dir->untracked, untracked,
> > path.buf + baselen,
> > @@ -1975,7 +1976,8 @@ static enum path_treatment 
> > read_directory_recursive(struct dir_struct *dir,
> >   if (subdir_state > dir_state)
> >   dir_state = subdir_state;
> >
> > - if (!match_pathspec(istate, pathspec, path.buf, 
> > path.len,
> > + if (pathspec &&
> > + !match_pathspec(istate, pathspec, path.buf, 
> > path.len,
> >   0 /* prefix */, NULL,
> >   0 /* do NOT special case dirs */))
> >   state = path_none;
> > diff --git a/t/t0050-filesystem.sh b/t/t0050-filesystem.sh
> > index 192c94eccd..a840919967 100755
> > --- a/t/t0050-filesystem.sh
> > +++ b/t/t0050-filesystem.sh
> > @@ -131,4 +131,25 @@ $test_unicode 'merge (silent unicode normalization)' '
> >   git merge topic
> >  '
> >
> > +test_expect_success CASE_INSENSITIVE_FS 'checkout with no pathspec and a 
> > case insensitive fs' '
> > + git init repo &&
> > + (
> > + cd repo &&
> > +
> > + >Gitweb &&
> > + git add Gitweb &&
> > + git commit -m "add Gitweb" &&
> > +
> > + git checkout --orphan todo &&
> > + git reset --hard &&
> > + mkdir -p gitweb/subdir &&
> > + >gitweb/subdir/file &&
> > + # it is not strictly necessary to add and commit the
>
> Probably not worth a reroll but we're missing "gitweb directory" at the
> end of the comment. Other than that, it looks good to me.

Yuck, I accidentally only removed half the comment when I intended to
remove it all?  Whoops.  I think it's worth a reroll; it's just a
single patch.  I'll send it out.

[PATCH v2] dir: special case check for the possibility that pathspec is NULL

2019-10-01 Thread Elijah Newren

Commits 404ebceda01c ("dir: also check directories for matching
pathspecs", 2019-09-17) and 89a1f4aaf765 ("dir: if our pathspec might
match files under a dir, recurse into it", 2019-09-17) added calls to
match_pathspec() and do_match_pathspec() passing along their pathspec
parameter.  Both match_pathspec() and do_match_pathspec() assume the
pathspec argument they are given is non-NULL.  It turns out that
unpack-trees.c's verify_clean_subdirectory() calls read_directory() with
pathspec == NULL, and it is possible on case insensitive filesystems for
that NULL to make it to these new calls to match_pathspec() and
do_match_pathspec().  Add appropriate checks on the NULLness of pathspec
to avoid a segfault.

In case the negation throws anyone off (one of the calls was to
do_match_pathspec() while the other was to !match_pathspec(), yet no
negation of the NULLness of pathspec is used), there are two ways to
understand the differences:
  * The code already handled the pathspec == NULL cases before this
series, and this series only tried to change behavior when there was
a pathspec, thus we only want to go into the if-block if pathspec is
non-NULL.
  * One of the calls is for whether to recurse into a subdirectory, the
other is for after we've recursed into it for whether we want to
remove the subdirectory itself (i.e. the subdirectory didn't match
but something under it could have).  That difference in situation
leads to the slight differences in logic used (well, that and the
slightly unusual fact that we don't want empty pathspecs to remove
untracked directories by default).

Denton found and analyzed one issue and provided the patch for the
match_pathspec() call, SZEDER figured out why the issue only reproduced
for some folks and not others and provided the testcase, and I looked
through the remainder of the series and noted the do_match_pathspec()
call that should have the same check.

Co-authored-by: Denton Liu 
Co-authored-by: SZEDER Gábor 
Signed-off-by: Elijah Newren 
---
Note: Applies on top of en/clean-nested-with-ignored, in next.

As with v1, the authorship is really mixed, so I don't know if I
should use Co-authored-by (highlighted as a possibility by Denton), or
the far more common Helped-by (as suggested by Junio but based on a
more limited summary of the different contributions), or if perhaps
Denton or SZEDER should be marked as the author and I be marked as
Helped-by or Co-authored-by.  Since Denton commented on round 1, I
used his suggestion for attribution in this round, but I'm open to
changing it to whatever works best.

Changes since v1:
  - Removed comments that made sense in context of the original thread
but wouldn't be helpful to future readers.
  - s/Helped-by/Co-authored-by/

Range-diff:
1:  885c22d24b ! 1:  c495b9303c dir: special case check for the possibility 
that pathspec is NULL
@@ t/t0050-filesystem.sh: $test_unicode 'merge (silent unicode 
normalization)' '
 +
 +  git checkout --orphan todo &&
 +  git reset --hard &&
-+  # the subdir is crucial, without it there is no segfault
 +  mkdir -p gitweb/subdir &&
 +  >gitweb/subdir/file &&
 +  # it is not strictly necessary to add and commit the
-+  # gitweb directory, its presence is sufficient
 +  git add gitweb &&
 +  git commit -m "add gitweb/subdir/file" &&
 +

 dir.c |  8 +---
 t/t0050-filesystem.sh | 21 +
 2 files changed, 26 insertions(+), 3 deletions(-)

diff --git a/dir.c b/dir.c
index 7ff79170fc..bd39b86be4 100644
--- a/dir.c
+++ b/dir.c
@@ -1962,8 +1962,9 @@ static enum path_treatment 
read_directory_recursive(struct dir_struct *dir,
((state == path_untracked) &&
 (get_dtype(cdir.de, istate, path.buf, path.len) == 
DT_DIR) &&
 ((dir->flags & DIR_SHOW_IGNORED_TOO) ||
- do_match_pathspec(istate, pathspec, path.buf, 
path.len,
-   baselen, NULL, 
DO_MATCH_LEADING_PATHSPEC) == MATCHED_RECURSIVELY_LEADING_PATHSPEC))) {
+ (pathspec &&
+  do_match_pathspec(istate, pathspec, path.buf, 
path.len,
+baselen, NULL, 
DO_MATCH_LEADING_PATHSPEC) == MATCHED_RECURSIVELY_LEADING_PATHSPEC {
struct untracked_cache_dir *ud;
ud = lookup_untracked(dir->untracked, untracked,
  path.buf + baselen,
@@ -1975,7 +1976,8 @@ static enum path_treatment 
read_directory_recursive(struct dir_struct *dir,
if (subdir_state > dir_state)

[PATCH v3] merge-recursive: fix the diff3 common ancestor label for virtual commits

2019-10-01 Thread Elijah Newren

In commit 743474cbfa8b ("merge-recursive: provide a better label for
diff3 common ancestor", 2019-08-17), the label for the common ancestor
was changed from always being

 "merged common ancestors"

to instead be based on the number of merge bases:

>=2: "merged common ancestors"
  1: <abbreviated commit hash>
  0: "<empty tree>"

Unfortunately, this did not take into account that when we have a single
merge base, that merge base could be fake or constructed.  In such
cases, this resulted in a label of "".  Of course, the previous
label of "merged common ancestors" was also misleading for this case.
Since we have an API that is explicitly about creating fake merge base
commits in merge_recursive_generic(), we should provide a better label
when using that API with one merge base.  So, when
merge_recursive_generic() is called with one merge base, set the label
to:

 "constructed merge base"

Note that callers of merge_recursive_generic() include the builtin
commands git-am (in combination with git apply --build-fake-ancestor),
git-merge-recursive, and git-stash.
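
The label selection described in this message can be summarized as a small decision function. The sketch below is a simplification under stated assumptions: ancestor_label() is a hypothetical name, the number of merge bases is passed as a plain integer, and the abbreviated hash is supplied by the caller instead of being computed from a commit.

```c
#include <stddef.h>

/*
 * Hypothetical helper: pick the diff3 common-ancestor label.
 *   num_merge_bases: how many merge bases were found
 *   override:        label set by the caller, or NULL (the patch sets
 *                    "constructed merge base" for a single fake base)
 *   abbrev_hash:     abbreviated hash of the single real merge base
 */
static const char *ancestor_label(int num_merge_bases,
				  const char *override,
				  const char *abbrev_hash)
{
	if (num_merge_bases == 0)
		return "empty tree";
	if (override)
		return override;
	if (num_merge_bases >= 2)
		return "merged common ancestors";
	return abbrev_hash;
}
```

The override is checked before the count so that a single constructed base gets its descriptive label instead of a hash that names no real commit.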

Helped-by: Jeff King 
Signed-off-by: Elijah Newren 
---
Note: Applies to the top of en/merge-recursive-cleanup, which is in next.

Changes since v2:
  - Squashed in the testcase Peff provided and changed his attribution
from Reported-by to Helped-by due to the testcase

Range-diff:
1:  3fbfd7 ! 1:  208e69a4eb merge-recursive: fix the diff3 common ancestor 
label for virtual commits
@@ Commit message
 commands git-am (in combination with git apply --build-fake-ancestor),
 git-merge-recursive, and git-stash.
 
-Reported-by: Jeff King 
+Helped-by: Jeff King 
 Signed-off-by: Elijah Newren 
 
  ## merge-recursive.c ##
@@ merge-recursive.c: int merge_recursive_generic(struct merge_options *opt,
}
  
repo_hold_locked_index(opt->repo, &lock, LOCK_DIE_ON_ERROR);
+
+ ## t/t6047-diff3-conflict-markers.sh ##
+@@ t/t6047-diff3-conflict-markers.sh: test_expect_success 'check multiple 
merge bases' '
+   )
+ '
+ 
++test_expect_success 'rebase describes fake ancestor base' '
++  test_create_repo rebase &&
++  (
++  cd rebase &&
++  test_commit base file &&
++  test_commit master file &&
++  git checkout -b side HEAD^ &&
++  test_commit side file &&
++  test_must_fail git -c merge.conflictstyle=diff3 rebase master &&
++  grep "||| constructed merge base" file
++  )
++'
++
+ test_done

 merge-recursive.c |  7 ++-
 t/t6047-diff3-conflict-markers.sh | 13 +
 2 files changed, 19 insertions(+), 1 deletion(-)

diff --git a/merge-recursive.c b/merge-recursive.c
index b058741f00..e12d91f48a 100644
--- a/merge-recursive.c
+++ b/merge-recursive.c
@@ -3550,6 +3550,8 @@ static int merge_recursive_internal(struct merge_options 
*opt,
merged_merge_bases = make_virtual_commit(opt->repo, tree,
 "ancestor");
ancestor_name = "empty tree";
+   } else if (opt->ancestor) {
+   ancestor_name = opt->ancestor;
} else if (merge_bases) {
ancestor_name = "merged common ancestors";
} else {
@@ -3689,7 +3691,8 @@ int merge_recursive(struct merge_options *opt,
 {
int clean;
 
-   assert(opt->ancestor == NULL);
+   assert(opt->ancestor == NULL ||
+  !strcmp(opt->ancestor, "constructed merge base"));
 
if (merge_start(opt, repo_get_commit_tree(opt->repo, h1)))
return -1;
@@ -3741,6 +3744,8 @@ int merge_recursive_generic(struct merge_options *opt,
   oid_to_hex(merge_bases[i]));
commit_list_insert(base, &ca);
}
+   if (num_merge_bases == 1)
+   opt->ancestor = "constructed merge base";
}
 
repo_hold_locked_index(opt->repo, &lock, LOCK_DIE_ON_ERROR);
diff --git a/t/t6047-diff3-conflict-markers.sh b/t/t6047-diff3-conflict-markers.sh
index 3fb68e0aae..860542aad0 100755
--- a/t/t6047-diff3-conflict-markers.sh
+++ b/t/t6047-diff3-conflict-markers.sh
@@ -186,4 +186,17 @@ test_expect_success 'check multiple merge bases' '
)
 '
 
+test_expect_success 'rebase describes fake ancestor base' '
+   test_create_repo rebase &&
+   (
+   cd rebase &&
+   test_commit base file &&
+   test_commit master file &&
+   git checkout -b side HEAD^ &&
+   test_commit side file &&
+   test_must_fail git -c merge.conflictstyle=diff3 rebase master &&
+   grep "||| constructed merge base" file
+   )
+'
+
 test_done
-- 
2.23.0.25.g3fbfd7.dirty

Re: [PATCH v2 00/11] New sparse-checkout builtin and "cone" mode

2019-10-01 Thread Elijah Newren

On Tue, Oct 1, 2019 at 9:48 AM Derrick Stolee  wrote:
>
> On 9/19/2019 10:43 AM, Derrick Stolee via GitGitGadget wrote:
> > This series makes the sparse-checkout feature more user-friendly. While
> > there, I also present a way to use a limited set of patterns to gain a
> > significant performance boost in very large repositories.
> >
> > Sparse-checkout is only documented as a subsection of the read-tree docs
> > [1], which makes the feature hard to discover. Users have trouble navigating
> > the feature, especially at clone time [2], and have even resorted to
> > creating their own helper tools [3].
> >
> > This series attempts to solve these problems using a new builtin.
>
> I haven't heard anything about this series since Elijah's careful
> review of the RFC. There are definitely areas where this can be
> made more robust, but I'd like to save those for a follow-up series.
>
> Junio: I know you didn't track this in the recent "what's cooking"
> list, and I don't expect you to take it until I re-roll v3 to
> include the .gitignore interaction I already pointed out.

Oh, sorry, I missed this.  By the way, is there any reason I wasn't
cc'ed on this round after reviewing the RFC?

Re: git-grep in sparse checkout

2019-10-01 Thread Elijah Newren

On Tue, Oct 1, 2019 at 6:30 AM Bert Wesarg  wrote:
>
> Hi,
>
> On Tue, Oct 1, 2019 at 3:06 PM Matheus Tavares Bernardino
>  wrote:
> >
> > Hi,
> >
> > During Git Summit it was mentioned that git-grep searches outside
> > sparsity pattern which is not aligned with user expectation. I took a
> > quick look at it and it seems the reason is
> > builtin/grep.c:grep_cache() (which also greps worktree) will grep the
> > object store when a given index entry has the CE_SKIP_WORKTREE bit
> > turned on.
> >
>
> I also had once this problem and found that out and wrote a patch. I
> was just about to send this patch out.
>
> Btw, ls-files should also learn to skip worktree files.
>
> Stay tuned.

I too have a small patch for just grep without --cached or revisions
(it's only a few lines), but it's very incomplete, as that is the only
use case it handles.  For the use cases I'm closest to, what users have
reported they want is essentially a miniature repo where they work on
stuff they care about and ignore the rest.  As such, the desired
functionality for these users is:

* git grep, by default, should only search within the sparsity pattern
* git grep --cached and git grep $REVISION should also only search
within the sparsity pattern
* git diff $REV1 $REV2, git diff $REV1, etc., should by default only
search within sparsity patterns
* git log should by default only show commits modifying files matching
the sparsity patterns
* for all of these, there should be some kind of
--ignore-sparsity-patterns flag to allow searching outside the
sparsity pattern
* other commands (archive, bisect, clean?, gitk, shortlog, blame,
fsck?, etc.) likely need to pay attention to sparsity patterns as
well, but there are some special cases:

* merge, cherry-pick, and rebase (anything touching the merge
machinery) will need to expand the size of the non-sparse worktree if
there are files outside the sparsity patterns with conflicts.  (Though
merge should do a better job of not expanding the non-sparse worktree
when files can cleanly be resolved.)
* ls-files has a -t option which can be used to differentiate which
entries in the index are skip-worktree (S) and which are not.  As
such, use of that flag should probably imply
--ignore-sparsity-patterns
* fast-export and format-patch are not about viewing history but about
exporting it, and limiting to sparsity patterns would result in the
creation of an incompatible history.  As such, they should error out
without a --ignore-sparsity-patterns when invoked from a repository
that has a sparse checkout.
* We may want to augment status with additional information to remind
users they are in a sparse checkout
* New worktrees, by default, should copy the sparsity patterns of the
worktree they were created from (much like a new shell inherits the
current working directory of its parent process)

I should note here that Stolee wasn't so sure about having 'log' only
show commits that touched files within the sparse patterns, so we may
need some kind of config setting, along with a good usability story
for what each setting means and which use cases it serves, to guide
how we handle the weird cases.

Also, as if that weren't enough, there are two more challenges:
1) As pointed out by Dscho in the contributor summit, we want
intersection of pathspecs specified by the user and those in the
sparsity patterns; e.g. if the user says `git diff $REV -- '*.c' `, we
want to show them a diff against $REV of all .c files that are within
their sparsity patterns.
2) We have two different kinds of path patterns, one for .gitignore
and sparse-checkout, and the other for command-line pathspecs.  See
https://public-inbox.org/git/xmqq4l1qpiaw@gitster-ct.c.googlers.com/.
These differences might make the implementation more difficult, and
making the two types of path patterns have more overlap might be a
necessary first step.
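
The intersection idea in (1) can be sketched roughly as follows.  This is
only an illustration, not git code: path_in_sparse_view() is a hypothetical
helper, and plain fnmatch(3) with no flags stands in for both kinds of
pattern matching, which per (2) really follow different rules inside git.

```c
#include <fnmatch.h>
#include <stddef.h>

/*
 * Hypothetical helper, not part of git.  Returns 1 when path matches the
 * user's pattern AND at least one sparsity pattern, 0 otherwise.
 * fnmatch(3) with flags == 0 is an assumed stand-in for git's real
 * matchers, so '*' is allowed to match '/' here.
 */
static int path_in_sparse_view(const char *path, const char *user_pattern,
			       const char *const *sparse_patterns,
			       size_t nr_sparse)
{
	size_t i;

	/* The user's pathspec must match first. */
	if (fnmatch(user_pattern, path, 0) != 0)
		return 0;

	/* Then the path must also fall inside the sparse checkout. */
	for (i = 0; i < nr_sparse; i++)
		if (fnmatch(sparse_patterns[i], path, 0) == 0)
			return 1;
	return 0;
}
```

With sparsity patterns "src/*" and "include/*" and a user pattern of "*.c",
a path such as src/main.c would be shown, while a .c file under another
directory, or a non-.c file under src/, would not.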

However, the "work with a miniature repo" model probably makes the VFS
for Git and partial clone stuff easier -- we don't have to worry about the
incessant need to download more blobs after the initial partial clone
because git commands by default would avoid requesting them.  It also
would work quite nicely with a partial index -- we could have a
directory entry in the index and marked as skip-worktree and avoid
having all the paths under it show up in the index.  That would
accelerate many operations within git.

I'd love to work on this, but I've got plenty of other things on my
plate at the moment, so I probably won't get time for it at least
until the middle of next year.  But I thought I'd send out what I view
as the bigger picture.  Also, this is very much still idea stage; the
contributor summit refined some of the ideas and there may be more
refinement as more people in the list chime in.

Hope that helps,
Elijah

Re: [PATCH] dir: special case check for the possibility that pathspec is NULL

2019-10-01 Thread Elijah Newren

On Mon, Sep 30, 2019 at 3:31 PM Denton Liu  wrote:
>
> Hi Elijah,
>
> On Mon, Sep 30, 2019 at 12:11:06PM -0700, Elijah Newren wrote:
> > Commits 404ebceda01c ("dir: also check directories for matching
> > pathspecs", 2019-09-17) and 89a1f4aaf765 ("dir: if our pathspec might
> > match files under a dir, recurse into it", 2019-09-17) added calls to
> > match_pathspec() and do_match_pathspec() passing along their pathspec
> > parameter.  Both match_pathspec() and do_match_pathspec() assume the
> > pathspec argument they are given is non-NULL.  It turns out that
> > unpack-tree.c's verify_clean_subdirectory() calls read_directory() with
> > pathspec == NULL, and it is possible on case insensitive filesystems for
> > that NULL to make it to these new calls to match_pathspec() and
> > do_match_pathspec().  Add appropriate checks on the NULLness of pathspec
> > to avoid a segfault.
> >
> > In case the negation throws anyone off (one of the calls was to
> > do_match_pathspec() while the other was to !match_pathspec(), yet no
> > negation of the NULLness of pathspec is used), there are two ways to
> > understand the differences:
> >   * The code already handled the pathspec == NULL cases before this
> > series, and this series only tried to change behavior when there was
> > a pathspec, thus we only want to go into the if-block if pathspec is
> > non-NULL.
> >   * One of the calls is for whether to recurse into a subdirectory, the
> > other is for after we've recursed into it for whether we want to
> > remove the subdirectory itself (i.e. the subdirectory didn't match
> > but something under it could have).  That difference in situation
> > leads to the slight differences in logic used (well, that and the
> > slightly unusual fact that we don't want empty pathspecs to remove
> > untracked directories by default).
> >
> > Helped-by: Denton Liu 
> > Helped-by: SZEDER Gábor 
> > Signed-off-by: Elijah Newren 
> > ---
> > This patch applies on top of en/clean-nested-with-ignored, which is now
> > in next.
> >
> > Denton found and analyzed one issue and provided the patch for the
> > match_pathspec() call, SZEDER figured out why the issue only reproduced
> > for some folks and not others and provided the testcase, and I looked
> > through the remainder of the series and noted the do_match_pathspec()
> > call that should have the same check.
>
> Thanks for catching what I missed.
>
> >
> > So, I'm not sure who should be author and who should be helped-by; I
> > feel like their contributions are possibly bigger than mine.  While I
> > tried to reproduce and debug, they ended up doing the work, and I just
> > looked through the rest of the series for similar issues and wrote up
> > a commit message.  *shrug*
>
> Eh, it doesn't really matter to me. GitHub appears to have de facto
> standardised the Co-authored-by: trailer to allow credit to be split
> amongst multiple authors so _maybe_ we could use that, but I'm pretty
> impartial.
>
> >
> >  dir.c |  8 +---
> >  t/t0050-filesystem.sh | 23 +++
> >  2 files changed, 28 insertions(+), 3 deletions(-)
> >
> > diff --git a/dir.c b/dir.c
> > index 7ff79170fc..bd39b86be4 100644
> > --- a/dir.c
> > +++ b/dir.c
> > @@ -1962,8 +1962,9 @@ static enum path_treatment read_directory_recursive(struct dir_struct *dir,
> >   ((state == path_untracked) &&
> >    (get_dtype(cdir.de, istate, path.buf, path.len) == DT_DIR) &&
> >    ((dir->flags & DIR_SHOW_IGNORED_TOO) ||
> > -   do_match_pathspec(istate, pathspec, path.buf, path.len,
> > - baselen, NULL, DO_MATCH_LEADING_PATHSPEC) == MATCHED_RECURSIVELY_LEADING_PATHSPEC))) {
> > +   (pathspec &&
> > +do_match_pathspec(istate, pathspec, path.buf, path.len,
> > +  baselen, NULL, DO_MATCH_LEADING_PATHSPEC) == MATCHED_RECURSIVELY_LEADING_PATHSPEC)))) {
> >   struct untracked_cache_dir *ud;
> >   ud = lookup_untracked(dir->untracked, untracked,
> > path.buf + baselen,
> > @@ -1975,7 +1976,8 @@ static enum path_treatment read_directory_recursive(struct dir_struct *dir,
> >

[PATCH v2] merge-recursive: fix the diff3 common ancestor label for virtual commits

2019-09-30 Thread Elijah Newren

In commit 743474cbfa8b ("merge-recursive: provide a better label for
diff3 common ancestor", 2019-08-17), the label for the common ancestor
was changed from always being

 "merged common ancestors"

to instead be based on the number of merge bases:

>=2: "merged common ancestors"
  1: 
  0: ""

Unfortunately, this did not take into account that when we have a single
merge base, that merge base could be fake or constructed.  In such
cases, this resulted in a label of "".  Of course, the previous
label of "merged common ancestors" was also misleading for this case.
Since we have an API that is explicitly about creating fake merge base
commits in merge_recursive_generic(), we should provide a better label
when using that API with one merge base.  So, when
merge_recursive_generic() is called with one merge base, set the label
to:

 "constructed merge base"

Note that callers of merge_recursive_generic() include the builtin
commands git-am (in combination with git apply --build-fake-ancestor),
git-merge-recursive, and git-stash.
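
As a rough illustration of the label selection described above, here is a
minimal standalone sketch.  It is not the merge-recursive.c code:
pick_ancestor_label(), generic_override(), and the single_base_name
parameter are hypothetical names, and the caller is assumed to supply
whatever label a lone, real merge base would normally get.

```c
#include <stddef.h>

/*
 * Hypothetical helper, not part of git: choose the diff3 common-ancestor
 * label.  num_merge_bases is the number of merge bases, override plays
 * the role of opt->ancestor (NULL when unset), and single_base_name is
 * the label the caller would use for one real merge base.
 */
static const char *pick_ancestor_label(int num_merge_bases,
				       const char *override,
				       const char *single_base_name)
{
	if (num_merge_bases == 0)
		return "empty tree";
	if (override)
		return override;
	if (num_merge_bases >= 2)
		return "merged common ancestors";
	return single_base_name;
}

/*
 * Mirrors the rule from the commit message: only a call with exactly
 * one (possibly fake) merge base gets the override label.
 */
static const char *generic_override(int num_merge_bases)
{
	return num_merge_bases == 1 ? "constructed merge base" : NULL;
}
```

The override is checked after the zero-base case and before the
multiple-base case, which matches the order of the branches in the diff
below.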

Reported-by: Jeff King 
Signed-off-by: Elijah Newren 
---
Applies to the top of en/merge-recursive-cleanup, which is in next.

Changes since v1:
  - We only had a problem if the number of fake merge bases was exactly
one; update the patch to check for that and update the commit message
accordingly.

Range-diff:
1:  e3b5015985 ! 1:  3fbfd7 merge-recursive: fix the diff3 common ancestor label for virtual commits
@@ Commit message
 Unfortunately, this did not take into account that when we have a single
 merge base, that merge base could be fake or constructed.  In such
 cases, this resulted in a label of "".  Of course, the previous
-label of "merged common ancestors" was also misleading for these cases.
-Since we have an API that is explicitly about creating fake commits in
-merge_recursive_generic(), we should provide a better label when using
-that API.  So, when merge_recursive_generic() is called, set the label
+label of "merged common ancestors" was also misleading for this case.
+Since we have an API that is explicitly about creating fake merge base
+commits in merge_recursive_generic(), we should provide a better label
+when using that API with one merge base.  So, when
+merge_recursive_generic() is called with one merge base, set the label
 to:
 
  "constructed merge base"
 
-Note that users of merge_recursive_generic include the builtin commands
-git-am (in combination with git apply --build-fake-ancestor),
+Note that callers of merge_recursive_generic() include the builtin
+commands git-am (in combination with git apply --build-fake-ancestor),
 git-merge-recursive, and git-stash.
 
 Reported-by: Jeff King 
@@ merge-recursive.c: int merge_recursive_generic(struct merge_options *opt,
   oid_to_hex(merge_bases[i]));
commit_list_insert(base, &ca);
}
-+  opt->ancestor = "constructed merge base";
++  if (num_merge_bases == 1)
++  opt->ancestor = "constructed merge base";
}
  
repo_hold_locked_index(opt->repo, &lock, LOCK_DIE_ON_ERROR);

 merge-recursive.c | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/merge-recursive.c b/merge-recursive.c
index b058741f00..e12d91f48a 100644
--- a/merge-recursive.c
+++ b/merge-recursive.c
@@ -3550,6 +3550,8 @@ static int merge_recursive_internal(struct merge_options *opt,
merged_merge_bases = make_virtual_commit(opt->repo, tree,
 "ancestor");
ancestor_name = "empty tree";
+   } else if (opt->ancestor) {
+   ancestor_name = opt->ancestor;
} else if (merge_bases) {
ancestor_name = "merged common ancestors";
} else {
@@ -3689,7 +3691,8 @@ int merge_recursive(struct merge_options *opt,
 {
int clean;
 
-   assert(opt->ancestor == NULL);
+   assert(opt->ancestor == NULL ||
+  !strcmp(opt->ancestor, "constructed merge base"));
 
if (merge_start(opt, repo_get_commit_tree(opt->repo, h1)))
return -1;
@@ -3741,6 +3744,8 @@ int merge_recursive_generic(struct merge_options *opt,
   oid_to_hex(merge_bases[i]));
commit_list_insert(base, &ca);
}
+   if (num_merge_bases == 1)
+   opt->ancestor = "constructed merge base";
}
 
repo_hold_locked_index(opt->repo, &lock, LOCK_DIE_ON_ERROR);
-- 
2.23.0.25.ge3b5015985.dirty

Re: [PATCH] merge-recursive: fix the diff3 common ancestor label for virtual commits

2019-09-30 Thread Elijah Newren

On Mon, Sep 30, 2019 at 3:55 PM Elijah Newren  wrote:
>
> In commit 743474cbfa8b ("merge-recursive: provide a better label for
> diff3 common ancestor", 2019-08-17), the label for the common ancestor
> was changed from always being
>
>  "merged common ancestors"
>
> to instead be based on the number of merge bases:
>
> >=2: "merged common ancestors"
>   1: 
>   0: ""
>
> Unfortunately, this did not take into account that when we have a single
> merge base, that merge base could be fake or constructed.  In such
> cases, this resulted in a label of "".  Of course, the previous
> label of "merged common ancestors" was also misleading for these cases.
> Since we have an API that is explicitly about creating fake commits in
> merge_recursive_generic(), we should provide a better label when using
> that API.  So, when merge_recursive_generic() is called, set the label
> to:
>
>  "constructed merge base"
>
> Note that users of merge_recursive_generic include the builtin commands
> git-am (in combination with git apply --build-fake-ancestor),
> git-merge-recursive, and git-stash.
>
> Reported-by: Jeff King 
...
> @@ -3741,6 +3744,7 @@ int merge_recursive_generic(struct merge_options *opt,
>oid_to_hex(merge_bases[i]));
> commit_list_insert(base, &ca);
> }
> +   opt->ancestor = "constructed merge base";

This should have a 'if (num_merge_bases == 1)' check before it; I'll
be sending a v2 shortly and update the commit message slightly.

[PATCH] merge-recursive: fix the diff3 common ancestor label for virtual commits

2019-09-30 Thread Elijah Newren

In commit 743474cbfa8b ("merge-recursive: provide a better label for
diff3 common ancestor", 2019-08-17), the label for the common ancestor
was changed from always being

 "merged common ancestors"

to instead be based on the number of merge bases:

>=2: "merged common ancestors"
  1: 
  0: ""

Unfortunately, this did not take into account that when we have a single
merge base, that merge base could be fake or constructed.  In such
cases, this resulted in a label of "".  Of course, the previous
label of "merged common ancestors" was also misleading for these cases.
Since we have an API that is explicitly about creating fake commits in
merge_recursive_generic(), we should provide a better label when using
that API.  So, when merge_recursive_generic() is called, set the label
to:

 "constructed merge base"

Note that users of merge_recursive_generic include the builtin commands
git-am (in combination with git apply --build-fake-ancestor),
git-merge-recursive, and git-stash.

Reported-by: Jeff King 
Signed-off-by: Elijah Newren 
---
Applies to the top of en/merge-recursive-cleanup, which is in next.

 merge-recursive.c | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/merge-recursive.c b/merge-recursive.c
index b058741f00..2b7722e350 100644
--- a/merge-recursive.c
+++ b/merge-recursive.c
@@ -3550,6 +3550,8 @@ static int merge_recursive_internal(struct merge_options *opt,
merged_merge_bases = make_virtual_commit(opt->repo, tree,
 "ancestor");
ancestor_name = "empty tree";
+   } else if (opt->ancestor) {
+   ancestor_name = opt->ancestor;
} else if (merge_bases) {
ancestor_name = "merged common ancestors";
} else {
@@ -3689,7 +3691,8 @@ int merge_recursive(struct merge_options *opt,
 {
int clean;
 
-   assert(opt->ancestor == NULL);
+   assert(opt->ancestor == NULL ||
+  !strcmp(opt->ancestor, "constructed merge base"));
 
if (merge_start(opt, repo_get_commit_tree(opt->repo, h1)))
return -1;
@@ -3741,6 +3744,7 @@ int merge_recursive_generic(struct merge_options *opt,
   oid_to_hex(merge_bases[i]));
commit_list_insert(base, &ca);
}
+   opt->ancestor = "constructed merge base";
}
 
repo_hold_locked_index(opt->repo, &lock, LOCK_DIE_ON_ERROR);
-- 
2.23.0.25.ge3b5015985

[PATCH v2 7/8] t9350: add tests for tags of things other than a commit

2019-09-30 Thread Elijah Newren

Multiple changes here:
  * add a test for a tag of a blob
  * add a test for a tag of a tag of a commit
  * add a comment to the tests for (possibly nested) tags of trees,
making it clear that these tests are doing much less than you might
expect

Signed-off-by: Elijah Newren 
---
 t/t9350-fast-export.sh | 31 +++
 1 file changed, 31 insertions(+)

diff --git a/t/t9350-fast-export.sh b/t/t9350-fast-export.sh
index b3fca6ffba..9ab281e4b9 100755
--- a/t/t9350-fast-export.sh
+++ b/t/t9350-fast-export.sh
@@ -540,10 +540,41 @@ test_expect_success 'tree_tag''
 '
 
 # NEEDSWORK: not just check return status, but validate the output
+# Note that these tests DO NOTHING other than print a warning that
+# they are omitting the one tag we asked them to export (because the
+# tags resolve to a tree).  They exist just to make sure we do not
+# abort but instead just warn.
 test_expect_success 'tree_tag-obj''git fast-export tree_tag-obj'
 test_expect_success 'tag-obj_tag' 'git fast-export tag-obj_tag'
 test_expect_success 'tag-obj_tag-obj' 'git fast-export tag-obj_tag-obj'
 
+test_expect_success 'handling tags of blobs' '
+   git tag -a -m "Tag of a blob" blobtag $(git rev-parse master:file) &&
+   git fast-export blobtag >actual &&
+   cat >expect <<-EOF &&
+   blob
+   mark :1
+   data 9
+   die Luft
+
+   tag blobtag
+   from :1
+   tagger $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> $GIT_COMMITTER_DATE
+   data 14
+   Tag of a blob
+
+   EOF
+   test_cmp expect actual
+'
+
+test_expect_failure 'handling nested tags' '
+   git tag -a -m "This is a nested tag" nested muss &&
+   git fast-export --mark-tags nested >output &&
+   grep "^from $ZERO_OID$" output &&
+   grep "^tag nested$" output >tag_lines &&
+   test_line_count = 2 tag_lines
+'
+
 test_expect_success 'directory becomes symlink''
git init dirtosymlink &&
git init result &&
-- 
2.23.0.264.gac739dbb79

[PATCH v2 0/8] fast export/import: handle nested tags, improve incremental exports

2019-09-30 Thread Elijah Newren

This series improves the incremental export story for fast-export and
fast-import (--export-marks and --import-marks fell a bit short),
fixes a couple small export/import bugs, and enables handling nested
tags.  In particular, the nested tags handling makes it so that
fast-export and fast-import can finally handle the git.git repo.

Changes since v1 (full range-diff below):
  - Fixed an issue integrating with next/pu (in particular, with
jk/fast-import-history-bugfix)

Elijah Newren (8):
  fast-export: fix exporting a tag and nothing else
  fast-import: fix handling of deleted tags
  fast-import: allow tags to be identified by mark labels
  fast-import: add support for new 'alias' command
  fast-export: add support for --import-marks-if-exists
  fast-export: allow user to request tags be marked with --mark-tags
  t9350: add tests for tags of things other than a commit
  fast-export: handle nested tags

 Documentation/git-fast-export.txt | 17 --
 Documentation/git-fast-import.txt | 23 
 builtin/fast-export.c | 67 --
 fast-import.c | 94 +++
 t/t9300-fast-import.sh| 37 
 t/t9350-fast-export.sh| 68 --
 6 files changed, 268 insertions(+), 38 deletions(-)

Range-diff:
1:  b751d6c2d6 ! 1:  1d19498bc6 fast-import: fix handling of deleted tags
@@ fast-import.c: static void parse_reset_branch(const char *arg)
b = new_branch(arg);
read_next_command();
parse_from(b);
-+  if (b->delete && !strncmp(arg, "refs/tags/", 10)) {
++  if (b->delete && !strncmp(b->name, "refs/tags/", 10)) {
 +  /*
 +   * Elsewhere, we call dump_branches() before dump_tags(),
 +   * and dump_branches() will handle ref deletions first, so
@@ fast-import.c: static void parse_reset_branch(const char *arg)
 +  for (t = first_tag; t; t = t->next_tag) {
 +  strbuf_reset(&tag_name);
 +  strbuf_addf(&tag_name, "refs/tags/%s", t->name);
-+  if (!strcmp(arg, tag_name.buf))
++  if (!strcmp(b->name, tag_name.buf))
 +  break;
 +  prev = t;
 +  }
2:  26b77dde15 = 2:  e1fd888e4a fast-import: allow tags to be identified by mark labels
3:  e0d1a1d7aa = 3:  93175f28d9 fast-import: add support for new 'alias' command
4:  edea892661 = 4:  8c8743395c fast-export: add support for --import-marks-if-exists
5:  6af7e1fdd0 = 5:  eebc40df33 fast-export: allow user to request tags be marked with --mark-tags
6:  631ae9a63e = 6:  de39f703c6 t9350: add tests for tags of things other than a commit
7:  c0e932e4da = 7:  ac739dbb79 fast-export: handle nested tags
-- 
2.23.0.264.gac739dbb79

[PATCH v2 5/8] fast-export: add support for --import-marks-if-exists

2019-09-30 Thread Elijah Newren

fast-import has support for both an --import-marks flag and an
--import-marks-if-exists flag; the latter of which will not die() if the
file does not exist.  fast-export only had support for an --import-marks
flag; add an --import-marks-if-exists flag for consistency.
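
The difference between the two flags can be sketched in isolation as
follows.  This is a simplified illustration under stated assumptions, not
the code from the patch: read_marks_file() is a hypothetical name, it only
counts lines instead of parsing marks, and it returns -1 where the real
code would die via xfopen().

```c
#include <stdio.h>
#include <sys/stat.h>

/*
 * Hypothetical helper, not part of git.  Returns the number of lines in
 * the marks file, 0 if check_exists is set and the file is missing, or
 * -1 if the file cannot be opened (git itself would die() there).
 */
static int read_marks_file(const char *path, int check_exists)
{
	struct stat sb;
	FILE *f;
	int lines = 0;
	int c;

	/* --import-marks-if-exists: a missing file is silently skipped. */
	if (check_exists && stat(path, &sb))
		return 0;

	/* --import-marks: a missing file is an error. */
	f = fopen(path, "r");
	if (!f)
		return -1;

	while ((c = fgetc(f)) != EOF)
		if (c == '\n')
			lines++;
	fclose(f);
	return lines;
}
```

Passing the check as a flag keeps a single code path for reading the file,
which is the same shape as the import_marks(input_file, check_exists)
signature in the diff below.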

Signed-off-by: Elijah Newren 
---
 builtin/fast-export.c  | 23 +++
 t/t9350-fast-export.sh | 10 --
 2 files changed, 23 insertions(+), 10 deletions(-)

diff --git a/builtin/fast-export.c b/builtin/fast-export.c
index 5822271c6b..575e47833b 100644
--- a/builtin/fast-export.c
+++ b/builtin/fast-export.c
@@ -1052,11 +1052,16 @@ static void export_marks(char *file)
error("Unable to write marks file %s.", file);
 }
 
-static void import_marks(char *input_file)
+static void import_marks(char *input_file, int check_exists)
 {
char line[512];
-   FILE *f = xfopen(input_file, "r");
+   FILE *f;
+   struct stat sb;
+
+   if (check_exists && stat(input_file, &sb))
+   return;
 
+   f = xfopen(input_file, "r");
while (fgets(line, sizeof(line), f)) {
uint32_t mark;
char *line_end, *mark_end;
@@ -1120,7 +1125,9 @@ int cmd_fast_export(int argc, const char **argv, const char *prefix)
struct rev_info revs;
struct object_array commits = OBJECT_ARRAY_INIT;
struct commit *commit;
-   char *export_filename = NULL, *import_filename = NULL;
+   char *export_filename = NULL,
+*import_filename = NULL,
+*import_filename_if_exists = NULL;
uint32_t lastimportid;
struct string_list refspecs_list = STRING_LIST_INIT_NODUP;
struct string_list paths_of_changed_objects = STRING_LIST_INIT_DUP;
@@ -1140,6 +1147,10 @@ int cmd_fast_export(int argc, const char **argv, const char *prefix)
 N_("Dump marks to this file")),
OPT_STRING(0, "import-marks", &import_filename, N_("file"),
 N_("Import marks from this file")),
+   OPT_STRING(0, "import-marks-if-exists",
+&import_filename_if_exists,
+N_("file"),
+N_("Import marks from this file if it exists")),
OPT_BOOL(0, "fake-missing-tagger", &fake_missing_tagger,
 N_("Fake a tagger when tags lack one")),
OPT_BOOL(0, "full-tree", &full_tree,
@@ -1187,8 +1198,12 @@ int cmd_fast_export(int argc, const char **argv, const char *prefix)
if (use_done_feature)
printf("feature done\n");
 
+   if (import_filename && import_filename_if_exists)
+   die(_("Cannot pass both --import-marks and --import-marks-if-exists"));
if (import_filename)
-   import_marks(import_filename);
+   import_marks(import_filename, 0);
+   else if (import_filename_if_exists)
+   import_marks(import_filename_if_exists, 1);
lastimportid = last_idnum;
 
if (import_filename && revs.prune_data.nr)
diff --git a/t/t9350-fast-export.sh b/t/t9350-fast-export.sh
index d32ff41859..ea84e2f173 100755
--- a/t/t9350-fast-export.sh
+++ b/t/t9350-fast-export.sh
@@ -580,17 +580,15 @@ test_expect_success 'fast-export quotes pathnames' '
 '
 
 test_expect_success 'test bidirectionality' '
-   >marks-cur &&
-   >marks-new &&
git init marks-test &&
-   git fast-export --export-marks=marks-cur --import-marks=marks-cur --branches | \
-   git --git-dir=marks-test/.git fast-import --export-marks=marks-new --import-marks=marks-new &&
+   git fast-export --export-marks=marks-cur --import-marks-if-exists=marks-cur --branches | \
+   git --git-dir=marks-test/.git fast-import --export-marks=marks-new --import-marks-if-exists=marks-new &&
(cd marks-test &&
git reset --hard &&
echo Wohlauf > file &&
git commit -a -m "back in time") &&
-   git --git-dir=marks-test/.git fast-export --export-marks=marks-new --import-marks=marks-new --branches | \
-   git fast-import --export-marks=marks-cur --import-marks=marks-cur
+   git --git-dir=marks-test/.git fast-export --export-marks=marks-new --import-marks-if-exists=marks-new --branches | \
+   git fast-import --export-marks=marks-cur --import-marks-if-exists=marks-cur
 '
 
 cat > expected << EOF
-- 
2.23.0.264.gac739dbb79

[PATCH v2 6/8] fast-export: allow user to request tags be marked with --mark-tags

2019-09-30 Thread Elijah Newren

Add a new option, --mark-tags, which will output mark identifiers with
each tag object.  This improves the incremental export story with
--export-marks since it will allow us to record that annotated tags have
been exported, and it is also needed as a step towards supporting nested
tags.
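
To make the effect on the output stream concrete, here is a small sketch of
a tag header with and without a mark line.  It is an assumption-laden
illustration rather than the handle_tag() code: format_tag_header() is a
hypothetical helper that writes into a buffer, and it only covers the case
where the tagged object already has a mark.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical helper, not part of git.  Writes "tag <name>", an
 * optional "mark :<n>" line, and "from :<tagged_mark>" into buf.
 * When mark_tags is set, *last_idnum is incremented and used as the
 * tag's own mark.  Returns the number of bytes written, or -1 if buf
 * is too small.
 */
static int format_tag_header(char *buf, size_t len, const char *name,
			     int mark_tags, uint32_t *last_idnum,
			     uint32_t tagged_mark)
{
	int used, n;

	used = snprintf(buf, len, "tag %s\n", name);
	if (used < 0 || (size_t)used >= len)
		return -1;

	if (mark_tags) {
		/* The tag gets the next free mark id. */
		n = snprintf(buf + used, len - used, "mark :%" PRIu32 "\n",
			     ++*last_idnum);
		if (n < 0 || (size_t)n >= len - used)
			return -1;
		used += n;
	}

	n = snprintf(buf + used, len - used, "from :%" PRIu32 "\n",
		     tagged_mark);
	if (n < 0 || (size_t)n >= len - used)
		return -1;
	return used + n;
}
```

Because the tag itself now has a mark, a later tag could refer to it in a
"from" line, which is why the option is described as a step towards nested
tags.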

Signed-off-by: Elijah Newren 
---
 Documentation/git-fast-export.txt | 17 +
 builtin/fast-export.c |  7 +++
 t/t9350-fast-export.sh| 14 ++
 3 files changed, 34 insertions(+), 4 deletions(-)

diff --git a/Documentation/git-fast-export.txt b/Documentation/git-fast-export.txt
index cc940eb9ad..c522b34f7b 100644
--- a/Documentation/git-fast-export.txt
+++ b/Documentation/git-fast-export.txt
@@ -75,11 +75,20 @@ produced incorrect results if you gave these options.
Before processing any input, load the marks specified in
.  The input file must exist, must be readable, and
must use the same format as produced by --export-marks.
+
+--mark-tags::
+   In addition to labelling blobs and commits with mark ids, also
+   label tags.  This is useful in conjunction with
+   `--export-marks` and `--import-marks`, and is also useful (and
+   necessary) for exporting of nested tags.  It does not hurt
+   other cases and would be the default, but many fast-import
+   frontends are not prepared to accept tags with mark
+   identifiers.
 +
-Any commits that have already been marked will not be exported again.
-If the backend uses a similar --import-marks file, this allows for
-incremental bidirectional exporting of the repository by keeping the
-marks the same across runs.
+Any commits (or tags) that have already been marked will not be
+exported again.  If the backend uses a similar --import-marks file,
+this allows for incremental bidirectional exporting of the repository
+by keeping the marks the same across runs.
 
 --fake-missing-tagger::
Some old repositories have tags without a tagger.  The
diff --git a/builtin/fast-export.c b/builtin/fast-export.c
index 575e47833b..d32e1e9327 100644
--- a/builtin/fast-export.c
+++ b/builtin/fast-export.c
@@ -40,6 +40,7 @@ static int no_data;
 static int full_tree;
 static int reference_excluded_commits;
 static int show_original_ids;
+static int mark_tags;
 static struct string_list extra_refs = STRING_LIST_INIT_NODUP;
 static struct string_list tag_refs = STRING_LIST_INIT_NODUP;
 static struct refspec refspecs = REFSPEC_INIT_FETCH;
@@ -861,6 +862,10 @@ static void handle_tag(const char *name, struct tag *tag)
if (starts_with(name, "refs/tags/"))
name += 10;
printf("tag %s\n", name);
+   if (mark_tags) {
+   mark_next_object(&tag->object);
+   printf("mark :%"PRIu32"\n", last_idnum);
+   }
if (tagged_mark)
printf("from :%d\n", tagged_mark);
else
@@ -1165,6 +1170,8 @@ int cmd_fast_export(int argc, const char **argv, const char *prefix)
 &reference_excluded_commits, N_("Reference parents which are not in fast-export stream by object id")),
OPT_BOOL(0, "show-original-ids", &show_original_ids,
N_("Show original object ids of blobs/commits")),
+   OPT_BOOL(0, "mark-tags", &mark_tags,
+   N_("Label tags with mark ids")),
 
OPT_END()
};
diff --git a/t/t9350-fast-export.sh b/t/t9350-fast-export.sh
index ea84e2f173..b3fca6ffba 100755
--- a/t/t9350-fast-export.sh
+++ b/t/t9350-fast-export.sh
@@ -66,6 +66,20 @@ test_expect_success 'fast-export ^muss^{commit} muss' '
test_cmp expected actual
 '
 
+test_expect_success 'fast-export --mark-tags ^muss^{commit} muss' '
+   git fast-export --mark-tags --tag-of-filtered-object=rewrite ^muss^{commit} muss >actual &&
+   cat >expected <<-EOF &&
+   tag muss
+   mark :1
+   from $(git rev-parse --verify muss^{commit})
+   $(git cat-file tag muss | grep tagger)
+   data 9
+   valentin
+
+   EOF
+   test_cmp expected actual
+'
+
 test_expect_success 'fast-export master~2..master' '
 
git fast-export master~2..master >actual &&
-- 
2.23.0.264.gac739dbb79

[PATCH v2 8/8] fast-export: handle nested tags

2019-09-30 Thread Elijah Newren

Signed-off-by: Elijah Newren 
---
 builtin/fast-export.c  | 30 ++
 t/t9350-fast-export.sh |  2 +-
 2 files changed, 19 insertions(+), 13 deletions(-)

diff --git a/builtin/fast-export.c b/builtin/fast-export.c
index d32e1e9327..58a74de42a 100644
--- a/builtin/fast-export.c
+++ b/builtin/fast-export.c
@@ -843,22 +843,28 @@ static void handle_tag(const char *name, struct tag *tag)
free(buf);
return;
case REWRITE:
-   if (tagged->type != OBJ_COMMIT) {
-   die("tag %s tags unexported %s!",
-   oid_to_hex(&tag->object.oid),
-   type_name(tagged->type));
-   }
-   p = rewrite_commit((struct commit *)tagged);
-   if (!p) {
-   printf("reset %s\nfrom %s\n\n",
-  name, oid_to_hex(&null_oid));
-   free(buf);
-   return;
+   if (tagged->type == OBJ_TAG && !mark_tags) {
+   die(_("Error: Cannot export nested tags unless 
--mark-tags is specified."));
+   } else if (tagged->type == OBJ_COMMIT) {
+   p = rewrite_commit((struct commit *)tagged);
+   if (!p) {
+   printf("reset %s\nfrom %s\n\n",
+  name, oid_to_hex(&null_oid));
+   free(buf);
+   return;
+   }
+   tagged_mark = get_object_mark(&p->object);
+   } else {
+   /* tagged->type is either OBJ_BLOB or OBJ_TAG */
+   tagged_mark = get_object_mark(tagged);
}
-   tagged_mark = get_object_mark(&p->object);
}
}
 
+   if (tagged->type == OBJ_TAG) {
+   printf("reset %s\nfrom %s\n\n",
+  name, oid_to_hex(&null_oid));
+   }
if (starts_with(name, "refs/tags/"))
name += 10;
printf("tag %s\n", name);
diff --git a/t/t9350-fast-export.sh b/t/t9350-fast-export.sh
index 9ab281e4b9..2e4e214815 100755
--- a/t/t9350-fast-export.sh
+++ b/t/t9350-fast-export.sh
@@ -567,7 +567,7 @@ test_expect_success 'handling tags of blobs' '
test_cmp expect actual
 '
 
-test_expect_failure 'handling nested tags' '
+test_expect_success 'handling nested tags' '
git tag -a -m "This is a nested tag" nested muss &&
git fast-export --mark-tags nested >output &&
grep "^from $ZERO_OID$" output &&
-- 
2.23.0.264.gac739dbb79

[PATCH v2 1/8] fast-export: fix exporting a tag and nothing else

2019-09-30 Thread Elijah Newren

fast-export allows specifying revision ranges, which can be used to
export a tag without exporting the commit it tags.  fast-export handled
this rather poorly: it would emit a "from :0" directive.  Since marks
start at 1 and increase, this means it refers to an unknown commit and
fast-import will choke on the input.

When we are unable to look up a mark for the object being tagged, use a
"from $HASH" directive instead to fix this problem.

Note that this is quite similar to the behavior fast-export exhibits
with commits and parents when --reference-excluded-parents is passed
along with an excluded commit range.  For tags of excluded commits we do
not require the --reference-excluded-parents flag because we always have
to tag something.  By contrast, when dealing with commits, pruning a
parent is always a viable option, so we need the flag to specify that
parent pruning is not wanted.  (It is slightly weird that
--reference-excluded-parents isn't the default with a separate
--prune-excluded-parents flag, but backward compatibility concerns
resulted in the current defaults.)
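
The fallback described above can be sketched in isolation.  This is only
an illustration under stated assumptions, not the patch itself:
format_from_line() is a hypothetical helper, and it takes the tagged
object's id as a hex string instead of calling git's oid_to_hex().

```c
#include <stdio.h>
#include <stddef.h>

/*
 * Hypothetical helper: write the "from" line for a tag into buf.
 * Marks start at 1, so a mark of 0 means "no mark was assigned";
 * in that case fall back to the tagged object's hex id.
 * Returns whatever snprintf() returns.
 */
static int format_from_line(char *buf, size_t len,
			    unsigned int tagged_mark, const char *tagged_hex)
{
	if (tagged_mark)
		return snprintf(buf, len, "from :%u\n", tagged_mark);
	return snprintf(buf, len, "from %s\n", tagged_hex);
}
```

With a nonzero mark the helper emits a mark reference; with a mark of 0
it emits the hash, so the stream never contains "from :0".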

Signed-off-by: Elijah Newren 
---
 builtin/fast-export.c  |  7 ++-
 t/t9350-fast-export.sh | 13 +
 2 files changed, 19 insertions(+), 1 deletion(-)

diff --git a/builtin/fast-export.c b/builtin/fast-export.c
index f541f55d33..5822271c6b 100644
--- a/builtin/fast-export.c
+++ b/builtin/fast-export.c
@@ -860,7 +860,12 @@ static void handle_tag(const char *name, struct tag *tag)
 
if (starts_with(name, "refs/tags/"))
name += 10;
-   printf("tag %s\nfrom :%d\n", name, tagged_mark);
+   printf("tag %s\n", name);
+   if (tagged_mark)
+   printf("from :%d\n", tagged_mark);
+   else
+   printf("from %s\n", oid_to_hex(&tagged->oid));
+
if (show_original_ids)
printf("original-oid %s\n", oid_to_hex(&tag->object.oid));
printf("%.*s%sdata %d\n%.*s\n",
diff --git a/t/t9350-fast-export.sh b/t/t9350-fast-export.sh
index b4004e05c2..d32ff41859 100755
--- a/t/t9350-fast-export.sh
+++ b/t/t9350-fast-export.sh
@@ -53,6 +53,19 @@ test_expect_success 'fast-export | fast-import' '
 
 '
 
+test_expect_success 'fast-export ^muss^{commit} muss' '
+   git fast-export --tag-of-filtered-object=rewrite ^muss^{commit} muss 
>actual &&
+   cat >expected <<-EOF &&
+   tag muss
+   from $(git rev-parse --verify muss^{commit})
+   $(git cat-file tag muss | grep tagger)
+   data 9
+   valentin
+
+   EOF
+   test_cmp expected actual
+'
+
 test_expect_success 'fast-export master~2..master' '
 
git fast-export master~2..master >actual &&
-- 
2.23.0.264.gac739dbb79

[PATCH v2 3/8] fast-import: allow tags to be identified by mark labels

2019-09-30 Thread Elijah Newren

Mark identifiers are used in fast-export and fast-import to provide a
label to refer to earlier content.  Blobs are given labels because they
need to be referenced in the commits where they first appear with a
given filename, and commits are given labels because they can be the
parents of other commits.  Tags were never given labels, probably
because they were viewed as unnecessary, but that presents two problems:

   1. It leaves us without a way of referring to previous tags if we
  want to create a tag of a tag (or higher nestings).
   2. It leaves us with no way of recording that a tag has already been
  imported when using --export-marks and --import-marks.

Fix these problems by allowing an optional mark label for tags.
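
As a rough sketch of what the optional mark buys us, the snippet below
formats a "tag" command with the mark line placed between the tag name
and the "from" line, as in the grammar change in this patch.
format_tag_command() and its parameters are hypothetical names chosen
for illustration; none of this is code from fast-import.c.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: format one fast-import "tag" command into buf.
 * own_mark labels the new tag object (0 means "emit no mark line");
 * from_mark is the mark of the object being tagged.  tagger is the
 * full "Name <email> when" string.
 */
static int format_tag_command(char *buf, size_t len, const char *name,
			      unsigned int own_mark, unsigned int from_mark,
			      const char *tagger, const char *msg)
{
	char mark_line[32] = "";

	if (own_mark)
		snprintf(mark_line, sizeof(mark_line), "mark :%u\n", own_mark);
	return snprintf(buf, len,
			"tag %s\n%sfrom :%u\ntagger %s\ndata %zu\n%s\n",
			name, mark_line, from_mark, tagger, strlen(msg), msg);
}
```

An inner tag written with own_mark 5 can then be the target of an outer
tag written with from_mark 5, which is exactly the tag-of-a-tag case
that was previously impossible to express.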

Signed-off-by: Elijah Newren 
---
 Documentation/git-fast-import.txt |  1 +
 fast-import.c |  3 ++-
 t/t9300-fast-import.sh| 19 +++
 3 files changed, 22 insertions(+), 1 deletion(-)

diff --git a/Documentation/git-fast-import.txt 
b/Documentation/git-fast-import.txt
index 0bb276269e..4977869465 100644
--- a/Documentation/git-fast-import.txt
+++ b/Documentation/git-fast-import.txt
@@ -774,6 +774,7 @@ lightweight (non-annotated) tags see the `reset` command 
below.
 
 
'tag' SP  LF
+   mark?
'from' SP  LF
original-oid?
'tagger' (SP )? SP LT  GT SP  LF
diff --git a/fast-import.c b/fast-import.c
index 546da3a938..0042440487 100644
--- a/fast-import.c
+++ b/fast-import.c
@@ -2713,6 +2713,7 @@ static void parse_new_tag(const char *arg)
first_tag = t;
last_tag = t;
read_next_command();
+   parse_mark();
 
/* from ... */
if (!skip_prefix(command_buf.buf, "from ", &from))
@@ -2769,7 +2770,7 @@ static void parse_new_tag(const char *arg)
strbuf_addbuf(&new_data, &msg);
free(tagger);
 
-   if (store_object(OBJ_TAG, &new_data, NULL, &t->oid, 0))
+   if (store_object(OBJ_TAG, &new_data, NULL, &t->oid, next_mark))
t->pack_id = MAX_PACK_ID;
else
t->pack_id = pack_id;
diff --git a/t/t9300-fast-import.sh b/t/t9300-fast-import.sh
index 74bc41333b..3ad2b2f1ba 100755
--- a/t/t9300-fast-import.sh
+++ b/t/t9300-fast-import.sh
@@ -94,6 +94,23 @@ test_expect_success 'A: create pack from stdin' '
reset refs/tags/to-be-deleted
from 
 
+   tag nested
+   mark :6
+   from :4
+   data <

[PATCH v2 4/8] fast-import: add support for new 'alias' command

2019-09-30 Thread Elijah Newren

fast-export and fast-import have nice --import-marks flags which allow
for incremental migrations.  However, if there is a mark in
fast-export's file of marks without a corresponding mark in the one for
fast-import, then we run the risk that fast-export tries to send new
objects relative to the mark it knows which fast-import does not,
causing fast-import to fail.

This arises in practice when there is a filter of some sort running
between the fast-export and fast-import processes which prunes some
commits programmatically.  Provide such a filter with the ability to
alias pruned commits to their most recent non-pruned ancestor.
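
To make the intended use concrete, here is a minimal sketch of what such
a filter might do.  Everything in it is an assumption made for
illustration: the filter is imagined to track, per mark, the first
parent's mark and whether the commit was pruned, and the function names
are hypothetical.  Only the emitted "alias" syntax comes from this patch.

```c
#include <stdio.h>
#include <stddef.h>

/*
 * Hypothetical filter state, indexed by mark number (index 0 unused):
 * parent[m] is the mark of m's first parent (0 for a root commit) and
 * pruned[m] is nonzero if the filter dropped commit m.
 */
static unsigned int nearest_kept_ancestor(const unsigned int *parent,
					  const int *pruned, unsigned int m)
{
	while (m && pruned[m])
		m = parent[m];
	return m; /* 0 if every ancestor was pruned too */
}

/* Format an "alias" command making pruned_mark refer to target_mark. */
static int format_alias(char *buf, size_t len,
			unsigned int pruned_mark, unsigned int target_mark)
{
	return snprintf(buf, len, "alias\nmark :%u\nto :%u\n\n",
			pruned_mark, target_mark);
}
```

A filter would call nearest_kept_ancestor() for each commit it drops
and, when the result is nonzero, write the formatted alias into the
stream so that later references to the pruned mark still resolve.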

Signed-off-by: Elijah Newren 
---
 Documentation/git-fast-import.txt | 22 +++
 fast-import.c | 62 ++-
 t/t9300-fast-import.sh|  5 +++
 3 files changed, 79 insertions(+), 10 deletions(-)

diff --git a/Documentation/git-fast-import.txt 
b/Documentation/git-fast-import.txt
index 4977869465..a3f1e0c5e4 100644
--- a/Documentation/git-fast-import.txt
+++ b/Documentation/git-fast-import.txt
@@ -337,6 +337,13 @@ and control the current import process.  More detailed 
discussion
`commit` command.  This command is optional and is not
needed to perform an import.
 
+`alias`::
+   Record that a mark refers to a given object without first
+   creating any new object.  Using --import-marks and referring
+   to missing marks will cause fast-import to fail, so aliases
+   can provide a way to set otherwise pruned commits to a valid
+   value (e.g. the nearest non-pruned ancestor).
+
 `checkpoint`::
Forces fast-import to close the current packfile, generate its
unique SHA-1 checksum and index, and start a new packfile.
@@ -914,6 +921,21 @@ a data chunk which does not have an LF as its last byte.
 +
 The `LF` after `<raw> LF` is optional (it used to be required).
 
+`alias`
+~~~
+Record that a mark refers to a given object without first creating any
+new object.
+
+
+   'alias' LF
+   mark
+   'to' SP <commit-ish> LF
+   LF?
+
+
+For a detailed description of `<commit-ish>` see above under `from`.
+
+
 `checkpoint`
 
 Forces fast-import to close the current packfile, start a new one, and to
diff --git a/fast-import.c b/fast-import.c
index 0042440487..9a12850d16 100644
--- a/fast-import.c
+++ b/fast-import.c
@@ -2491,18 +2491,14 @@ static void parse_from_existing(struct branch *b)
}
 }
 
-static int parse_from(struct branch *b)
+static int parse_objectish(struct branch *b, const char *objectish)
 {
-   const char *from;
struct branch *s;
struct object_id oid;
 
-   if (!skip_prefix(command_buf.buf, "from ", &from))
-   return 0;
-
oidcpy(&oid, &b->branch_tree.versions[1].oid);
 
-   s = lookup_branch(from);
+   s = lookup_branch(objectish);
if (b == s)
die("Can't create a branch from itself: %s", b->name);
else if (s) {
@@ -2510,8 +2506,8 @@ static int parse_from(struct branch *b)
oidcpy(&b->oid, &s->oid);
oidcpy(&b->branch_tree.versions[0].oid, t);
oidcpy(&b->branch_tree.versions[1].oid, t);
-   } else if (*from == ':') {
-   uintmax_t idnum = parse_mark_ref_eol(from);
+   } else if (*objectish == ':') {
+   uintmax_t idnum = parse_mark_ref_eol(objectish);
struct object_entry *oe = find_mark(idnum);
if (oe->type != OBJ_COMMIT)
die("Mark :%" PRIuMAX " not a commit", idnum);
@@ -2525,13 +2521,13 @@ static int parse_from(struct branch *b)
} else
parse_from_existing(b);
}
-   } else if (!get_oid(from, &b->oid)) {
+   } else if (!get_oid(objectish, &b->oid)) {
parse_from_existing(b);
if (is_null_oid(&b->oid))
b->delete = 1;
}
else
-   die("Invalid ref name or SHA1 expression: %s", from);
+   die("Invalid ref name or SHA1 expression: %s", objectish);
 
if (b->branch_tree.tree && !oideq(&oid, 
&b->branch_tree.versions[1].oid)) {
release_tree_content_recursive(b->branch_tree.tree);
@@ -2542,6 +2538,26 @@ static int parse_from(struct branch *b)
return 1;
 }
 
+static int parse_from(struct branch *b)
+{
+   const char *from;
+
+   if (!skip_prefix(command_buf.buf, "from ", &from))
+   return 0;
+
+   return parse_objectish(b, from);
+}
+
+static int parse_objectish_with_prefix(struct branch *b, const char *prefix)
+{
+   const char *base;
+
+   if (!skip_prefix(command_buf.buf, prefix, &base))
+

[PATCH v2 2/8] fast-import: fix handling of deleted tags

2019-09-30 Thread Elijah Newren

If our input stream includes a tag which is later deleted, we were not
properly deleting it.  We did have a step which would delete it, but we
left a tag in the tag list noting that it needed to be updated, and the
updating of annotated tags occurred AFTER ref deletion.  So, when we
record that a tag needs to be deleted, also remove it from the list of
annotated tags to update.

While this has likely not happened in practice, it will come up more
often once we support nested tags.  For nested tags,
we either need to give temporary names to the intermediate tags and then
delete them, or else we need to use the final name for the intermediate
tags.  If we use the final name for the intermediate tags, then in order
to keep the sanity check that someone doesn't try to update the same tag
twice, we need to delete the ref after creating the intermediate tag.
So, either way nested tags imply the need to delete temporary inner tag
references.
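
The list surgery this requires is the usual "unlink from a singly linked
list that also keeps a tail pointer".  A standalone sketch follows; the
struct and function names are simplified stand-ins invented for this
example, not the real types in fast-import.c.

```c
#include <stddef.h>
#include <string.h>

/* Simplified stand-in for fast-import's tag list; not the real struct. */
struct tag_node {
	struct tag_node *next;
	const char *name;
};

struct tag_list {
	struct tag_node *first;
	struct tag_node *last;
};

/*
 * Unlink the first node called name, keeping both the head and the
 * tail pointer valid.  Returns the unlinked node (the caller owns its
 * memory) or NULL if no node matched.
 */
static struct tag_node *unlink_tag(struct tag_list *list, const char *name)
{
	struct tag_node *t, *prev = NULL;

	for (t = list->first; t; prev = t, t = t->next)
		if (!strcmp(t->name, name))
			break;
	if (!t)
		return NULL;
	if (prev)
		prev->next = t->next;
	else
		list->first = t->next;
	if (!t->next)
		list->last = prev;
	return t;
}
```

The two easy-to-miss cases are removing the head (update first) and
removing the tail (update last); removing the only element hits both.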

Signed-off-by: Elijah Newren 
---
 fast-import.c  | 29 +
 t/t9300-fast-import.sh | 13 +
 2 files changed, 42 insertions(+)

diff --git a/fast-import.c b/fast-import.c
index b44d6a467e..546da3a938 100644
--- a/fast-import.c
+++ b/fast-import.c
@@ -2793,6 +2793,35 @@ static void parse_reset_branch(const char *arg)
b = new_branch(arg);
read_next_command();
parse_from(b);
+   if (b->delete && !strncmp(b->name, "refs/tags/", 10)) {
+   /*
+* Elsewhere, we call dump_branches() before dump_tags(),
+* and dump_branches() will handle ref deletions first, so
+* in order to make sure the deletion actually takes effect,
+* we need to remove the tag from our list of tags to update.
+*
+* NEEDSWORK: replace list of tags with hashmap for faster
+* deletion?
+*/
+   struct strbuf tag_name = STRBUF_INIT;
+   struct tag *t, *prev = NULL;
+   for (t = first_tag; t; t = t->next_tag) {
+   strbuf_reset(&tag_name);
+   strbuf_addf(&tag_name, "refs/tags/%s", t->name);
+   if (!strcmp(b->name, tag_name.buf))
+   break;
+   prev = t;
+   }
+   if (t) {
+   if (prev)
+   prev->next_tag = t->next_tag;
+   else
+   first_tag = t->next_tag;
+   if (!t->next_tag)
+   last_tag = prev;
+   /* There is no mem_pool_free(t) function to call. */
+   }
+   }
if (command_buf.len > 0)
unread_command_buf = 1;
 }
diff --git a/t/t9300-fast-import.sh b/t/t9300-fast-import.sh
index 141b7fa35e..74bc41333b 100755
--- a/t/t9300-fast-import.sh
+++ b/t/t9300-fast-import.sh
@@ -85,6 +85,15 @@ test_expect_success 'A: create pack from stdin' '
An annotated tag that annotates a blob.
EOF
 
+   tag to-be-deleted
+   from :3
+   data <expect <<-EOF &&
:2 $(git rev-parse --verify master:file2)
-- 
2.23.0.264.gac739dbb79

[PATCH] dir: special case check for the possibility that pathspec is NULL

2019-09-30 Thread Elijah Newren

Commits 404ebceda01c ("dir: also check directories for matching
pathspecs", 2019-09-17) and 89a1f4aaf765 ("dir: if our pathspec might
match files under a dir, recurse into it", 2019-09-17) added calls to
match_pathspec() and do_match_pathspec() passing along their pathspec
parameter.  Both match_pathspec() and do_match_pathspec() assume the
pathspec argument they are given is non-NULL.  It turns out that
unpack-trees.c's verify_clean_subdirectory() calls read_directory() with
pathspec == NULL, and it is possible on case insensitive filesystems for
that NULL to make it to these new calls to match_pathspec() and
do_match_pathspec().  Add appropriate checks on the NULLness of pathspec
to avoid a segfault.

In case the negation throws anyone off (one of the calls was to
do_match_pathspec() while the other was to !match_pathspec(), yet no
negation of the NULLness of pathspec is used), there are two ways to
understand the differences:
  * The code already handled the pathspec == NULL cases before this
series, and this series only tried to change behavior when there was
a pathspec, thus we only want to go into the if-block if pathspec is
non-NULL.
  * One of the calls is for whether to recurse into a subdirectory, the
other is for after we've recursed into it for whether we want to
remove the subdirectory itself (i.e. the subdirectory didn't match
but something under it could have).  That difference in situation
leads to the slight differences in logic used (well, that and the
slightly unusual fact that we don't want empty pathspecs to remove
untracked directories by default).
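
For anyone who wants to see the shape of the guard without the rest of
dir.c, here is a reduced sketch of the second call site.  The struct and
both functions are hypothetical stand-ins: the stub matcher does a plain
prefix comparison and only imitates the fact that the real
match_pathspec() dereferences its pathspec argument.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for struct pathspec: a single path prefix. */
struct pathspec_stub {
	const char *prefix;
};

/* Stand-in matcher that, like match_pathspec(), dereferences ps. */
static int stub_match(const struct pathspec_stub *ps, const char *name)
{
	return strncmp(name, ps->prefix, strlen(ps->prefix)) == 0;
}

/*
 * Same shape as the guarded condition: only discard the directory
 * when there is a pathspec and it fails to match.  && short-circuits,
 * so stub_match() is never called with a NULL ps.
 */
static int should_discard_dir(const struct pathspec_stub *ps,
			      const char *name)
{
	return ps && !stub_match(ps, name);
}
```

With ps == NULL the function returns 0 without touching the matcher,
which is the pre-series behavior the first bullet refers to.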

Helped-by: Denton Liu 
Helped-by: SZEDER Gábor 
Signed-off-by: Elijah Newren 
---
This patch applies on top of en/clean-nested-with-ignored, which is now
in next.

Denton found and analyzed one issue and provided the patch for the
match_pathspec() call, SZEDER figured out why the issue only reproduced
for some folks and not others and provided the testcase, and I looked
through the remainder of the series and noted the do_match_pathspec()
call that should have the same check.

So, I'm not sure who should be author and who should be helped-by; I
feel like their contributions are possibly bigger than mine.  While I
tried to reproduce and debug, they ended up doing the work, and I just
looked through the rest of the series for similar issues and wrote up
a commit message.  *shrug*

 dir.c |  8 +---
 t/t0050-filesystem.sh | 23 +++
 2 files changed, 28 insertions(+), 3 deletions(-)

diff --git a/dir.c b/dir.c
index 7ff79170fc..bd39b86be4 100644
--- a/dir.c
+++ b/dir.c
@@ -1962,8 +1962,9 @@ static enum path_treatment 
read_directory_recursive(struct dir_struct *dir,
((state == path_untracked) &&
 (get_dtype(cdir.de, istate, path.buf, path.len) == 
DT_DIR) &&
 ((dir->flags & DIR_SHOW_IGNORED_TOO) ||
- do_match_pathspec(istate, pathspec, path.buf, 
path.len,
-   baselen, NULL, 
DO_MATCH_LEADING_PATHSPEC) == MATCHED_RECURSIVELY_LEADING_PATHSPEC))) {
+ (pathspec &&
+  do_match_pathspec(istate, pathspec, path.buf, 
path.len,
+baselen, NULL, 
DO_MATCH_LEADING_PATHSPEC) == MATCHED_RECURSIVELY_LEADING_PATHSPEC)))) {
struct untracked_cache_dir *ud;
ud = lookup_untracked(dir->untracked, untracked,
  path.buf + baselen,
@@ -1975,7 +1976,8 @@ static enum path_treatment 
read_directory_recursive(struct dir_struct *dir,
if (subdir_state > dir_state)
dir_state = subdir_state;
 
-   if (!match_pathspec(istate, pathspec, path.buf, 
path.len,
+   if (pathspec &&
+   !match_pathspec(istate, pathspec, path.buf, 
path.len,
0 /* prefix */, NULL,
0 /* do NOT special case dirs */))
state = path_none;
diff --git a/t/t0050-filesystem.sh b/t/t0050-filesystem.sh
index 192c94eccd..edb30f9eb2 100755
--- a/t/t0050-filesystem.sh
+++ b/t/t0050-filesystem.sh
@@ -131,4 +131,27 @@ $test_unicode 'merge (silent unicode normalization)' '
git merge topic
 '
 
+test_expect_success CASE_INSENSITIVE_FS 'checkout with no pathspec and a case 
insensitive fs' '
+   git init repo &&
+   (
+   cd repo &&
+
+   >Gitweb &&
+   git add Gitweb &&
+   git commit -m "add Gitweb" &&
+
+   git checkout --orphan todo &&

Re: [BUG] git is segfaulting, was [PATCH v4 04/12] dir: also check directories for matching pathspecs

2019-09-26 Thread Elijah Newren

Hi Denton,

On Thu, Sep 26, 2019 at 1:35 PM Denton Liu  wrote:
>
> On Wed, Sep 25, 2019 at 02:55:30PM -0700, Denton Liu wrote:
> > Looks correct to me. I don't see why this wouldn't reproduce. I'll send
> > you more information if I figure anything else out.
>
> I looked into it a little more and I think I know why it's being
> triggered.
>
> When we checkout 'todo' from 'master', since they're completely
> different trees, all of git's source files need to be removed. As a
> result, the checkout process at some point invokes check_ok_to_remove().
>
> This kicks off the following call chain:
>
> check_ok_to_remove()
> verify_clean_subdirectory()
> read_directory()
> read_directory_recursive() (this is called recursively, of course)
> match_pathspec()
> do_match_pathspec()
>
> Where we segfault in do_match_pathspec() because ps is NULL:
>
> GUARD_PATHSPEC(ps,
>PATHSPEC_FROMTOP |
>PATHSPEC_MAXDEPTH |
>PATHSPEC_LITERAL |
>PATHSPEC_GLOB |
>PATHSPEC_ICASE |
>PATHSPEC_EXCLUDE |
>PATHSPEC_ATTR);
>
> So why is ps == NULL? In verify_clean_subdirectory(), we call
> read_directory() like this:
>
> i = read_directory(&d, o->src_index, pathbuf, namelen+1, NULL);
>
> where we explicitly pass in a NULL and it is handed down the callstack. I
> guess this means that we should be expecting that pathspecs can be NULL
> in this path. So I've applied the patch at the bottom and it fixes the
> problem.
>
> I was wondering if we should stick a
>
> if (!ps)
> BUG("ps is NULL");
>
> into do_match_pathspec(), though, so we can avoid these situations in
> the future.
>
> Also, I'm still not sure why the issue wasn't reproducible on your
> side... I'm not too familiar with this area of the code, though.
>
> -- >8 --
> diff --git a/dir.c b/dir.c
> index 76a3c3894b..b7a6de58c6 100644
> --- a/dir.c
> +++ b/dir.c
> @@ -1952,7 +1952,7 @@ static enum path_treatment 
> read_directory_recursive(struct dir_struct *dir,
> if (subdir_state > dir_state)
> dir_state = subdir_state;
>
> -   if (!match_pathspec(istate, pathspec, path.buf, 
> path.len,
> +   if (pathspec && !match_pathspec(istate, pathspec, 
> path.buf, path.len,
> 0 /* prefix */, NULL,
> 0 /* do NOT special case dirs */))
> state = path_none;

The patch makes sense...but I'd really like to add a test, and
understand it better so I can check to see if there are any other bad
codepaths.  Sadly, I still have no idea how to reproduce the bug.  I
can put

char *oopsies = NULL;
printf("oopsies = %s\n", oopsies);

at the beginning of check_ok_to_remove() to verify that function is
never called and run the steps you gave with no problem.  However, I
do notice that your reproduction steps involve 'master' which may have
local changes for you that I don't have.  Is there any chance you can
reproduce this using a commit id that is already upstream instead of
'master'?  I've been poking around unpack-trees.c for a bit but I'm
having a hard time working out from it what's different about our
setups and how to trigger the bug.

Re: [PATCH] add a Code of Conduct document

2019-09-26 Thread Elijah Newren

On Wed, Sep 25, 2019 at 5:42 PM Jeff King  wrote:
>
> We've never had a formally written Code of Conduct document. Though it
> has been discussed off and on over the years, for the most part the
> behavior on the mailing list has been good enough that nobody felt the
> need to push one forward.
>
> However, even if there aren't specific problems now, it's a good idea to
> have a document:
>
>   - it puts everybody on the same page with respect to expectations.
> This might avoid poor behavior, but also makes it easier to handle
> it if it does happen.
>
>   - it publicly advertises that good conduct is important to us and will
> be enforced, which may make some people more comfortable with
> joining our community
>
>   - it may be a good time to cement our expectations when things are
> quiet, since it gives everybody some distance rather than focusing
> on a current contentious issue
>
> This patch adapts the Contributor Covenant Code of Conduct. As opposed
> to writing our own from scratch, this uses common and well-accepted
> language, and strikes a good balance between illustrating expectations
> and avoiding a laundry list of behaviors. It's also the same document
> used by the Git for Windows project.
>
> The text is taken mostly verbatim from:
>
>   https://www.contributor-covenant.org/version/1/4/code-of-conduct.html
>
> I also stole a very nice introductory paragraph from the Git for Windows
> version of the file.
>
> There are a few subtle points, though:
>
>   - the document refers to "the project maintainers". For the code, we
> generally only consider there to be one maintainer: Junio C Hamano.
> But for dealing with community issues, it makes sense to involve
> more people to spread the responsibility. I've listed the project
> committee address of g...@sfconservancy.org as the contact point.
>
>   - the document mentions banning from the community, both in the intro
> paragraph and in "Our Responsibilities". The exact mechanism here is
> left vague. I can imagine it might start with social enforcement
> (not accepting patches, ignoring emails) and could escalate to
> technical measures if necessary (asking vger admins to block an
> address). It probably makes sense _not_ to get too specific at this
> point, and deal with specifics as they come up.
>
> Signed-off-by: Jeff King 
> ---
> Obviously related to the discussion in:
>
>   https://public-inbox.org/git/71fba9e7-6314-6ef9-9959-6ae06843d...@gmail.com/
>
> After some poking around at various CoC options, this one seemed like
> the best fit to me. But I'm open to suggestions or more discussion. It
> seems to me that the important piece is having _some_ CoC, and picking
> something standard-ish seems a safe bet.
>
> I did find this nice set of guidelines in an old discussion:
>
>   
> https://github.com/mhagger/git/commit/c6e6196be8fab3d48b12c4e42eceae6937538dee
>
> I think it's missing some things that are "standard" in more modern CoCs
> (in particular, there's not much discussion of enforcement or
> responsibilities, and I think those are important for the "making people
> comfortable" goal). But maybe there are bits we'd like to pick out for
> other documents; not so much "_what_ we expect" as "here are some tips
> on _how_".
>
> If people are on board with this direction, it might be fun to pick up a
> bunch of "Acked-by" trailers from people in the community who agree with
> it. It might give it more weight if many members have publicly endorsed
> it.

Acked-by: Elijah Newren 

(including the small update you sent elsewhere to individually list
the members of the project leadership team.)

Re: [BUG] git is segfaulting, was [PATCH v4 04/12] dir: also check directories for matching pathspecs

2019-09-25 Thread Elijah Newren

Hi Denton,

On Wed, Sep 25, 2019 at 1:39 PM Denton Liu  wrote:
>
> Hi Elijah,
>
> I ran into a segfault on macOS. I managed to bisect it down to
> 404ebceda0 (dir: also check directories for matching pathspecs,
> 2019-09-17), which should be the patch in the parent thread. The test
> case below works fine without this patch applied but segfaults once it
> is applied.
>
> #!/bin/sh
>
> git worktree add testdir
> git -C testdir checkout master
> git -C testdir fetch https://github.com/git/git.git todo
> bin-wrappers/git -C testdir checkout FETCH_HEAD # segfault here
>
> Note that the worktree part isn't necessary to reproduce the problem but
> I didn't want my files to be constantly refreshed, triggering a rebuild
> each time.
>
> I also managed to get this backtrace from running lldb at the segfault
> but it is based on the latest "jch" commit, 1cc52d20df (Merge branch
> 'jt/merge-recursive-symlink-is-not-a-dir-in-way' into jch, 2019-09-20).
>
> * thread #1, queue = 'com.apple.main-thread', stop reason = 
> EXC_BAD_ACCESS (code=1, address=0x8)
>   * frame #0: 0x0001000f63a0 
> git`do_match_pathspec(istate=0x000100299940, ps=0x00010200aa80, 
> name="Gitweb/static/js/lib/", namelen=21, prefix=0, seen=0x, 
> flags=0) at dir.c:420:2 [opt]
> frame #1: 0x0001000f632c 
> git`match_pathspec(istate=0x000100299940, ps=0x, 
> name="Gitweb/static/js/lib/", namelen=21, prefix=0, seen=0x, 
> is_dir=0) at dir.c:490:13 [opt]
> frame #2: 0x0001000f8315 
> git`read_directory_recursive(dir=0x7ffeefbfe278, 
> istate=0x000100299940, base=, baselen=17, 
> untracked=, check_only=0, stop_at_first_file=0, 
> pathspec=0x) at dir.c:1990:9 [opt]
> frame #3: 0x0001000f82e9 
> git`read_directory_recursive(dir=0x7ffeefbfe278, 
> istate=0x000100299940, base=, baselen=14, 
> untracked=, check_only=0, stop_at_first_file=0, 
> pathspec=0x) at dir.c:1984:5 [opt]
> frame #4: 0x0001000f82e9 
> git`read_directory_recursive(dir=0x7ffeefbfe278, 
> istate=0x000100299940, base=, baselen=7, 
> untracked=, check_only=0, stop_at_first_file=0, 
> pathspec=0x) at dir.c:1984:5 [opt]
> frame #5: 0x0001000f60d1 
> git`read_directory(dir=0x7ffeefbfe278, istate=0x000100299940, 
> path="Gitweb/", len=7, pathspec=0x) at dir.c:2298:3 [opt]
> frame #6: 0x0001001bded1 
> git`verify_clean_subdirectory(ce=, o=0x7ffeefbfe8c0) at 
> unpack-trees.c:1846:6 [opt]
> frame #7: 0x0001001bdc1d 
> git`check_ok_to_remove(name="Gitweb", len=6, dtype=4, ce=0x000103e70de0, 
> st=0x7ffeefbfe438, error_type=ERROR_WOULD_LOSE_UNTRACKED_OVERWRITTEN, 
> o=0x7ffeefbfe8c0) at unpack-trees.c:1901:7 [opt]
> frame #8: 0x0001001bdb01 
> git`verify_absent_1(ce=, error_type=, 
> o=) at unpack-trees.c:1964:10 [opt]
> frame #9: 0x0001001bafc0 
> git`verify_absent(ce=, error_type=, 
> o=) at unpack-trees.c:1052:11 [opt] [artificial]
> frame #10: 0x0001001bbc3c 
> git`merged_entry(ce=0x000100605fb0, old=0x, 
> o=0x7ffeefbfe8c0) at unpack-trees.c:2013:7 [opt]
> frame #11: 0x0001001bd2b7 
> git`call_unpack_fn(src=, o=) at 
> unpack-trees.c:522:12 [opt]
> frame #12: 0x0001001bca16 git`unpack_nondirectories(n=2, 
> mask=2, dirmask=, src=0x7ffeefbfe5d0, names=, 
> info=0x7ffeefbfe718) at unpack-trees.c:1029:12 [opt]
> frame #13: 0x0001001bad1a git`unpack_callback(n=2, 
> mask=2, dirmask=0, names=0x000102007390, info=0x7ffeefbfe718) at 
> unpack-trees.c:1229:6 [opt]
> frame #14: 0x0001001b8be2 
> git`traverse_trees(istate=0x000100299940, n=2, t=, 
> info=) at tree-walk.c:497:17 [opt]
> frame #15: 0x0001001ba80f git`unpack_trees(len=2, 
> t=0x7ffeefbfebe0, o=0x7ffeefbfe8c0) at unpack-trees.c:1546:9 [opt]
> frame #16: 0x00010001a443 
> git`merge_working_tree(opts=0x7ffeefbfee38, 
> old_branch_info=0x7ffeefbfeca0, new_branch_info=0x7ffeefbfeda0, 
> writeout_error=0x7ffeefbfeccc) at checkout.c:704:9 [opt]
> frame #17: 0x00010001a08c 
> git`switch_branches(opts=0x7ffeefbfee38, 
> new_branch_info=0x7ffeefbfeda0) at checkout.c:1057:9 [opt]
> frame #18: 0x000100018df0 
> git`checkout_branch(opts=, new_branch_info=) at 
> checkout.c:1426:9 [opt]
> frame #19: 0x000100017b90 git`checkout_main(argc=0, 
> argv=0x7ffeefbff570, prefix=0x, opts=0x7ffeefbfee38, 
> options=, usagestr=) at checkout.c:1682:10 [opt]
> frame #20: 0x000100016f2d git`cmd_checkout(argc=2, 
> argv=0x7ffeefbff568, prefix=0x0

[PATCH 7/8] t9350: add tests for tags of things other than a commit

2019-09-24 Thread Elijah Newren

Multiple changes here:
  * add a test for a tag of a blob
  * add a test for a tag of a tag of a commit
  * add a comment to the tests for (possibly nested) tags of trees,
making it clear that these tests are doing much less than you might
expect

Signed-off-by: Elijah Newren 
---
 t/t9350-fast-export.sh | 31 +++
 1 file changed, 31 insertions(+)

diff --git a/t/t9350-fast-export.sh b/t/t9350-fast-export.sh
index b3fca6ffba..9ab281e4b9 100755
--- a/t/t9350-fast-export.sh
+++ b/t/t9350-fast-export.sh
@@ -540,10 +540,41 @@ test_expect_success 'tree_tag''
 '
 
 # NEEDSWORK: not just check return status, but validate the output
+# Note that these tests DO NOTHING other than print a warning that
+# they are omitting the one tag we asked them to export (because the
+# tags resolve to a tree).  They exist just to make sure we do not
+# abort but instead just warn.
 test_expect_success 'tree_tag-obj''git fast-export tree_tag-obj'
 test_expect_success 'tag-obj_tag' 'git fast-export tag-obj_tag'
 test_expect_success 'tag-obj_tag-obj' 'git fast-export tag-obj_tag-obj'
 
+test_expect_success 'handling tags of blobs' '
+   git tag -a -m "Tag of a blob" blobtag $(git rev-parse master:file) &&
+   git fast-export blobtag >actual &&
+   cat >expect <<-EOF &&
+   blob
+   mark :1
+   data 9
+   die Luft
+
+   tag blobtag
+   from :1
+   tagger $GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> $GIT_COMMITTER_DATE
+   data 14
+   Tag of a blob
+
+   EOF
+   test_cmp expect actual
+'
+
+test_expect_failure 'handling nested tags' '
+   git tag -a -m "This is a nested tag" nested muss &&
+   git fast-export --mark-tags nested >output &&
+   grep "^from $ZERO_OID$" output &&
+   grep "^tag nested$" output >tag_lines &&
+   test_line_count = 2 tag_lines
+'
+
 test_expect_success 'directory becomes symlink''
git init dirtosymlink &&
git init result &&
-- 
2.23.0.177.g8af0b3ca64

[PATCH 0/8] fast export/import: handle nested tags, improve incremental exports

2019-09-24 Thread Elijah Newren

This series improves the incremental export story for fast-export and
fast-import (--export-marks and --import-marks fell a bit short),
fixes a couple small export/import bugs, and enables handling nested
tags.  In particular, the nested tags handling makes it so that
fast-export and fast-import can finally handle the git.git repo.

Elijah Newren (8):
  fast-export: fix exporting a tag and nothing else
  fast-import: fix handling of deleted tags
  fast-import: allow tags to be identified by mark labels
  fast-import: add support for new 'alias' command
  fast-export: add support for --import-marks-if-exists
  fast-export: allow user to request tags be marked with --mark-tags
  t9350: add tests for tags of things other than a commit
  fast-export: handle nested tags

 Documentation/git-fast-export.txt | 17 --
 Documentation/git-fast-import.txt | 23 
 builtin/fast-export.c | 67 --
 fast-import.c | 94 +++
 t/t9300-fast-import.sh| 37 
 t/t9350-fast-export.sh| 68 --
 6 files changed, 268 insertions(+), 38 deletions(-)

-- 
2.23.0.177.g8af0b3ca64

[PATCH 8/8] fast-export: handle nested tags

2019-09-24 Thread Elijah Newren

Signed-off-by: Elijah Newren 
---
 builtin/fast-export.c  | 30 ++
 t/t9350-fast-export.sh |  2 +-
 2 files changed, 19 insertions(+), 13 deletions(-)

diff --git a/builtin/fast-export.c b/builtin/fast-export.c
index d32e1e9327..58a74de42a 100644
--- a/builtin/fast-export.c
+++ b/builtin/fast-export.c
@@ -843,22 +843,28 @@ static void handle_tag(const char *name, struct tag *tag)
free(buf);
return;
case REWRITE:
-   if (tagged->type != OBJ_COMMIT) {
-   die("tag %s tags unexported %s!",
-   oid_to_hex(&tag->object.oid),
-   type_name(tagged->type));
-   }
-   p = rewrite_commit((struct commit *)tagged);
-   if (!p) {
-   printf("reset %s\nfrom %s\n\n",
-  name, oid_to_hex(&null_oid));
-   free(buf);
-   return;
+   if (tagged->type == OBJ_TAG && !mark_tags) {
+   die(_("Error: Cannot export nested tags unless --mark-tags is specified."));
+   } else if (tagged->type == OBJ_COMMIT) {
+   p = rewrite_commit((struct commit *)tagged);
+   if (!p) {
+   printf("reset %s\nfrom %s\n\n",
+  name, oid_to_hex(&null_oid));
+   free(buf);
+   return;
+   }
+   tagged_mark = get_object_mark(&p->object);
+   } else {
+   /* tagged->type is either OBJ_BLOB or OBJ_TAG */
+   tagged_mark = get_object_mark(tagged);
}
-   tagged_mark = get_object_mark(&p->object);
}
}
 
+   if (tagged->type == OBJ_TAG) {
+   printf("reset %s\nfrom %s\n\n",
+  name, oid_to_hex(&null_oid));
+   }
if (starts_with(name, "refs/tags/"))
name += 10;
printf("tag %s\n", name);
diff --git a/t/t9350-fast-export.sh b/t/t9350-fast-export.sh
index 9ab281e4b9..2e4e214815 100755
--- a/t/t9350-fast-export.sh
+++ b/t/t9350-fast-export.sh
@@ -567,7 +567,7 @@ test_expect_success 'handling tags of blobs' '
test_cmp expect actual
 '
 
-test_expect_failure 'handling nested tags' '
+test_expect_success 'handling nested tags' '
git tag -a -m "This is a nested tag" nested muss &&
git fast-export --mark-tags nested >output &&
grep "^from $ZERO_OID$" output &&
-- 
2.23.0.177.g8af0b3ca64

[PATCH 5/8] fast-export: add support for --import-marks-if-exists

2019-09-24 Thread Elijah Newren

fast-import has support for both an --import-marks flag and an
--import-marks-if-exists flag; the latter of which will not die() if the
file does not exist.  fast-export only had support for an --import-marks
flag; add an --import-marks-if-exists flag for consistency.
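
As a rough illustration of the "skip quietly if the file is missing"
behavior described above, here is a minimal standalone sketch.  The
function name count_marks_lines() and its return convention are
hypothetical and not part of git; the real import_marks() parses each
line and uses xfopen(), which dies on failure instead of returning -1.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdio.h>
#include <sys/stat.h>

/*
 * Hypothetical stand-in for import_marks(): returns the number of lines
 * read (assuming each line is shorter than 512 bytes), 0 if the file is
 * absent and check_exists is set, and -1 if the file cannot be opened.
 */
static int count_marks_lines(const char *path, int check_exists)
{
	char line[512];
	struct stat sb;
	FILE *f;
	int n = 0;

	assert(path != NULL);

	/* In "if-exists" mode a missing file is simply not an error. */
	if (check_exists && stat(path, &sb))
		return 0;

	f = fopen(path, "r");
	if (!f)
		return -1;	/* git's xfopen() would die() here instead */
	while (fgets(line, sizeof(line), f))
		n++;
	fclose(f);
	return n;
}
```

With check_exists set, a first run against a marks file that has not
been created yet falls through to a normal export, which is what lets
the updated test drop its ">marks-cur" and ">marks-new" setup lines.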

Signed-off-by: Elijah Newren 
---
 builtin/fast-export.c  | 23 +++
 t/t9350-fast-export.sh | 10 --
 2 files changed, 23 insertions(+), 10 deletions(-)

diff --git a/builtin/fast-export.c b/builtin/fast-export.c
index 5822271c6b..575e47833b 100644
--- a/builtin/fast-export.c
+++ b/builtin/fast-export.c
@@ -1052,11 +1052,16 @@ static void export_marks(char *file)
error("Unable to write marks file %s.", file);
 }
 
-static void import_marks(char *input_file)
+static void import_marks(char *input_file, int check_exists)
 {
char line[512];
-   FILE *f = xfopen(input_file, "r");
+   FILE *f;
+   struct stat sb;
+
+   if (check_exists && stat(input_file, &sb))
+   return;
 
+   f = xfopen(input_file, "r");
while (fgets(line, sizeof(line), f)) {
uint32_t mark;
char *line_end, *mark_end;
@@ -1120,7 +1125,9 @@ int cmd_fast_export(int argc, const char **argv, const char *prefix)
struct rev_info revs;
struct object_array commits = OBJECT_ARRAY_INIT;
struct commit *commit;
-   char *export_filename = NULL, *import_filename = NULL;
+   char *export_filename = NULL,
+*import_filename = NULL,
+*import_filename_if_exists = NULL;
uint32_t lastimportid;
struct string_list refspecs_list = STRING_LIST_INIT_NODUP;
struct string_list paths_of_changed_objects = STRING_LIST_INIT_DUP;
@@ -1140,6 +1147,10 @@ int cmd_fast_export(int argc, const char **argv, const char *prefix)
 N_("Dump marks to this file")),
OPT_STRING(0, "import-marks", &import_filename, N_("file"),
 N_("Import marks from this file")),
+   OPT_STRING(0, "import-marks-if-exists",
+&import_filename_if_exists,
+N_("file"),
+N_("Import marks from this file if it exists")),
OPT_BOOL(0, "fake-missing-tagger", &fake_missing_tagger,
 N_("Fake a tagger when tags lack one")),
OPT_BOOL(0, "full-tree", &full_tree,
@@ -1187,8 +1198,12 @@ int cmd_fast_export(int argc, const char **argv, const char *prefix)
if (use_done_feature)
printf("feature done\n");
 
+   if (import_filename && import_filename_if_exists)
+   die(_("Cannot pass both --import-marks and --import-marks-if-exists"));
if (import_filename)
-   import_marks(import_filename);
+   import_marks(import_filename, 0);
+   else if (import_filename_if_exists)
+   import_marks(import_filename_if_exists, 1);
lastimportid = last_idnum;
 
if (import_filename && revs.prune_data.nr)
diff --git a/t/t9350-fast-export.sh b/t/t9350-fast-export.sh
index d32ff41859..ea84e2f173 100755
--- a/t/t9350-fast-export.sh
+++ b/t/t9350-fast-export.sh
@@ -580,17 +580,15 @@ test_expect_success 'fast-export quotes pathnames' '
 '
 
 test_expect_success 'test bidirectionality' '
-   >marks-cur &&
-   >marks-new &&
git init marks-test &&
-   git fast-export --export-marks=marks-cur --import-marks=marks-cur --branches | \
-   git --git-dir=marks-test/.git fast-import --export-marks=marks-new --import-marks=marks-new &&
+   git fast-export --export-marks=marks-cur --import-marks-if-exists=marks-cur --branches | \
+   git --git-dir=marks-test/.git fast-import --export-marks=marks-new --import-marks-if-exists=marks-new &&
(cd marks-test &&
git reset --hard &&
echo Wohlauf > file &&
git commit -a -m "back in time") &&
-   git --git-dir=marks-test/.git fast-export --export-marks=marks-new --import-marks=marks-new --branches | \
-   git fast-import --export-marks=marks-cur --import-marks=marks-cur
+   git --git-dir=marks-test/.git fast-export --export-marks=marks-new --import-marks-if-exists=marks-new --branches | \
+   git fast-import --export-marks=marks-cur --import-marks-if-exists=marks-cur
 '
 
 cat > expected << EOF
-- 
2.23.0.177.g8af0b3ca64

[PATCH 4/8] fast-import: add support for new 'alias' command

2019-09-24 Thread Elijah Newren

fast-export and fast-import have nice --import-marks flags which allow
for incremental migrations.  However, if there is a mark in
fast-export's file of marks without a corresponding mark in the one for
fast-import, then we run the risk that fast-export tries to send new
objects relative to the mark it knows which fast-import does not,
causing fast-import to fail.

This arises in practice when there is a filter of some sort running
between the fast-export and fast-import processes which prunes some
commits programmatically.  Provide such a filter with the ability to
alias pruned commits to their most recent non-pruned ancestor.
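
To make the idea concrete, a filter sitting between the two processes
could emit an 'alias' command for each commit it drops.  The sketch
below only formats such a command into a buffer; format_alias() and
its parameters are hypothetical names for illustration, and it assumes
the target is given as a mark reference (any commit-ish accepted by
'from' should work the same way).

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical helper: write an 'alias' command saying that the mark of
 * a pruned commit should refer to the object already known under
 * kept_mark.  Returns what snprintf() returns.
 */
static int format_alias(char *buf, size_t size,
			unsigned int pruned_mark, unsigned int kept_mark)
{
	assert(buf != NULL && size > 0);
	return snprintf(buf, size, "alias\nmark :%u\nto :%u\n\n",
			pruned_mark, kept_mark);
}
```

For example, format_alias(buf, sizeof(buf), 7, 5) produces the lines
"alias", "mark :7", "to :5" and a trailing blank line, so later
commands that mention :7 resolve to the commit behind :5.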

Signed-off-by: Elijah Newren 
---
 Documentation/git-fast-import.txt | 22 +++
 fast-import.c | 62 ++-
 t/t9300-fast-import.sh|  5 +++
 3 files changed, 79 insertions(+), 10 deletions(-)

diff --git a/Documentation/git-fast-import.txt b/Documentation/git-fast-import.txt
index 4977869465..a3f1e0c5e4 100644
--- a/Documentation/git-fast-import.txt
+++ b/Documentation/git-fast-import.txt
@@ -337,6 +337,13 @@ and control the current import process.  More detailed discussion
`commit` command.  This command is optional and is not
needed to perform an import.
 
+`alias`::
+   Record that a mark refers to a given object without first
+   creating any new object.  Using --import-marks and referring
+   to missing marks will cause fast-import to fail, so aliases
+   can provide a way to set otherwise pruned commits to a valid
+   value (e.g. the nearest non-pruned ancestor).
+
 `checkpoint`::
Forces fast-import to close the current packfile, generate its
unique SHA-1 checksum and index, and start a new packfile.
@@ -914,6 +921,21 @@ a data chunk which does not have an LF as its last byte.
 +
 The `LF` after `<delim> LF` is optional (it used to be required).
 
+`alias`
+~~~
+Record that a mark refers to a given object without first creating any
+new object.
+
+
+   'alias' LF
+   mark
+   'to' SP <commit-ish> LF
+   LF?
+
+
+For a detailed description of `<commit-ish>` see above under `from`.
+
+
 `checkpoint`
 
 Forces fast-import to close the current packfile, start a new one, and to
diff --git a/fast-import.c b/fast-import.c
index 0271d81d0d..8228cde759 100644
--- a/fast-import.c
+++ b/fast-import.c
@@ -2491,18 +2491,14 @@ static void parse_from_existing(struct branch *b)
}
 }
 
-static int parse_from(struct branch *b)
+static int parse_objectish(struct branch *b, const char *objectish)
 {
-   const char *from;
struct branch *s;
struct object_id oid;
 
-   if (!skip_prefix(command_buf.buf, "from ", &from))
-   return 0;
-
oidcpy(&oid, &b->branch_tree.versions[1].oid);
 
-   s = lookup_branch(from);
+   s = lookup_branch(objectish);
if (b == s)
die("Can't create a branch from itself: %s", b->name);
else if (s) {
@@ -2510,8 +2506,8 @@ static int parse_from(struct branch *b)
oidcpy(&b->oid, &s->oid);
oidcpy(&b->branch_tree.versions[0].oid, t);
oidcpy(&b->branch_tree.versions[1].oid, t);
-   } else if (*from == ':') {
-   uintmax_t idnum = parse_mark_ref_eol(from);
+   } else if (*objectish == ':') {
+   uintmax_t idnum = parse_mark_ref_eol(objectish);
struct object_entry *oe = find_mark(idnum);
if (oe->type != OBJ_COMMIT)
die("Mark :%" PRIuMAX " not a commit", idnum);
@@ -2525,13 +2521,13 @@ static int parse_from(struct branch *b)
} else
parse_from_existing(b);
}
-   } else if (!get_oid(from, &b->oid)) {
+   } else if (!get_oid(objectish, &b->oid)) {
parse_from_existing(b);
if (is_null_oid(&b->oid))
b->delete = 1;
}
else
-   die("Invalid ref name or SHA1 expression: %s", from);
+   die("Invalid ref name or SHA1 expression: %s", objectish);
 
if (b->branch_tree.tree && !oideq(&oid, &b->branch_tree.versions[1].oid)) {
release_tree_content_recursive(b->branch_tree.tree);
@@ -2542,6 +2538,26 @@ static int parse_from(struct branch *b)
return 1;
 }
 
+static int parse_from(struct branch *b)
+{
+   const char *from;
+
+   if (!skip_prefix(command_buf.buf, "from ", &from))
+   return 0;
+
+   return parse_objectish(b, from);
+}
+
+static int parse_objectish_with_prefix(struct branch *b, const char *prefix)
+{
+   const char *base;
+
+   if (!skip_prefix(command_buf.buf, prefix, &base))
+

[PATCH 3/8] fast-import: allow tags to be identified by mark labels

2019-09-24 Thread Elijah Newren

Mark identifiers are used in fast-export and fast-import to provide a
label to refer to earlier content.  Blobs are given labels because they
need to be referenced in the commits where they first appear with a
given filename, and commits are given labels because they can be the
parents of other commits.  Tags were never given labels, probably
because they were viewed as unnecessary, but that presents two problems:

   1. It leaves us without a way of referring to previous tags if we
  want to create a tag of a tag (or higher nestings).
   2. It leaves us with no way of recording that a tag has already been
  imported when using --export-marks and --import-marks.

Fix these problems by allowing an optional mark label for tags.
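
As a small sketch of what the optional label looks like in a stream,
the helper below formats the first lines of a 'tag' command with or
without a mark.  format_tag_header() is a hypothetical name, not a
function in git, and it assumes the tagged object is referenced by
mark.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical helper: write "tag <name>", an optional "mark :<n>"
 * line (omitted when mark is 0), and "from :<m>".  Returns what
 * snprintf() returns.
 */
static int format_tag_header(char *buf, size_t size, const char *name,
			     unsigned int mark, unsigned int from_mark)
{
	assert(buf != NULL && size > 0 && name != NULL && from_mark != 0);
	if (mark)
		return snprintf(buf, size, "tag %s\nmark :%u\nfrom :%u\n",
				name, mark, from_mark);
	return snprintf(buf, size, "tag %s\nfrom :%u\n", name, from_mark);
}
```

Calling format_tag_header(buf, sizeof(buf), "nested", 6, 4) yields
"tag nested", "mark :6", "from :4", which matches the stream added to
the t9300 test in this patch; a later tag can then say "from :6" to
tag that tag.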

Signed-off-by: Elijah Newren 
---
 Documentation/git-fast-import.txt |  1 +
 fast-import.c |  3 ++-
 t/t9300-fast-import.sh| 19 +++
 3 files changed, 22 insertions(+), 1 deletion(-)

diff --git a/Documentation/git-fast-import.txt b/Documentation/git-fast-import.txt
index 0bb276269e..4977869465 100644
--- a/Documentation/git-fast-import.txt
+++ b/Documentation/git-fast-import.txt
@@ -774,6 +774,7 @@ lightweight (non-annotated) tags see the `reset` command below.
 
 
'tag' SP <name> LF
+   mark?
'from' SP <commit-ish> LF
original-oid?
'tagger' (SP <name>)? SP LT <email> GT SP <when> LF
diff --git a/fast-import.c b/fast-import.c
index dab905d667..0271d81d0d 100644
--- a/fast-import.c
+++ b/fast-import.c
@@ -2713,6 +2713,7 @@ static void parse_new_tag(const char *arg)
first_tag = t;
last_tag = t;
read_next_command();
+   parse_mark();
 
/* from ... */
if (!skip_prefix(command_buf.buf, "from ", &from))
@@ -2769,7 +2770,7 @@ static void parse_new_tag(const char *arg)
strbuf_addbuf(&new_data, &msg);
free(tagger);
 
-   if (store_object(OBJ_TAG, &new_data, NULL, &t->oid, 0))
+   if (store_object(OBJ_TAG, &new_data, NULL, &t->oid, next_mark))
t->pack_id = MAX_PACK_ID;
else
t->pack_id = pack_id;
diff --git a/t/t9300-fast-import.sh b/t/t9300-fast-import.sh
index 74bc41333b..3ad2b2f1ba 100755
--- a/t/t9300-fast-import.sh
+++ b/t/t9300-fast-import.sh
@@ -94,6 +94,23 @@ test_expect_success 'A: create pack from stdin' '
reset refs/tags/to-be-deleted
from 
 
+   tag nested
+   mark :6
+   from :4
+   data <

[PATCH 1/8] fast-export: fix exporting a tag and nothing else

2019-09-24 Thread Elijah Newren

fast-export allows specifying revision ranges, which can be used to
export a tag without exporting the commit it tags.  fast-export handled
this rather poorly: it would emit a "from :0" directive.  Since marks
start at 1 and increase, this means it refers to an unknown commit and
fast-import will choke on the input.

When we are unable to look up a mark for the object being tagged, use a
"from $HASH" directive instead to fix this problem.
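
The choice between the two directives can be sketched in isolation as
follows.  format_from() is a hypothetical helper written for this
illustration; the real change is the printf() pair in handle_tag()
shown in the diff below.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Hypothetical helper: marks start at 1, so a mark of 0 means "no mark
 * known for the tagged object"; in that case fall back to the object's
 * hex id rather than emitting the invalid "from :0".
 */
static int format_from(char *buf, size_t size, unsigned int mark,
		       const char *hex_oid)
{
	assert(buf != NULL && size > 0 && hex_oid != NULL);
	if (mark)
		return snprintf(buf, size, "from :%u\n", mark);
	return snprintf(buf, size, "from %s\n", hex_oid);
}
```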

Note that this is quite similar to the behavior fast-export exhibits
with commits and parents when --reference-excluded-parents is passed
along with an excluded commit range.  For tags of excluded commits we do
not require the --reference-excluded-parents flag because we always have
to tag something.  By contrast, when dealing with commits, pruning a
parent is always a viable option, so we need the flag to specify that
parent pruning is not wanted.  (It is slightly weird that
--reference-excluded-parents isn't the default with a separate
--prune-excluded-parents flag, but backward compatibility concerns
resulted in the current defaults.)

Signed-off-by: Elijah Newren 
---
 builtin/fast-export.c  |  7 ++-
 t/t9350-fast-export.sh | 13 +
 2 files changed, 19 insertions(+), 1 deletion(-)

diff --git a/builtin/fast-export.c b/builtin/fast-export.c
index f541f55d33..5822271c6b 100644
--- a/builtin/fast-export.c
+++ b/builtin/fast-export.c
@@ -860,7 +860,12 @@ static void handle_tag(const char *name, struct tag *tag)
 
if (starts_with(name, "refs/tags/"))
name += 10;
-   printf("tag %s\nfrom :%d\n", name, tagged_mark);
+   printf("tag %s\n", name);
+   if (tagged_mark)
+   printf("from :%d\n", tagged_mark);
+   else
+   printf("from %s\n", oid_to_hex(&tagged->oid));
+
if (show_original_ids)
printf("original-oid %s\n", oid_to_hex(&tag->object.oid));
printf("%.*s%sdata %d\n%.*s\n",
diff --git a/t/t9350-fast-export.sh b/t/t9350-fast-export.sh
index b4004e05c2..d32ff41859 100755
--- a/t/t9350-fast-export.sh
+++ b/t/t9350-fast-export.sh
@@ -53,6 +53,19 @@ test_expect_success 'fast-export | fast-import' '
 
 '
 
+test_expect_success 'fast-export ^muss^{commit} muss' '
+   git fast-export --tag-of-filtered-object=rewrite ^muss^{commit} muss >actual &&
+   cat >expected <<-EOF &&
+   tag muss
+   from $(git rev-parse --verify muss^{commit})
+   $(git cat-file tag muss | grep tagger)
+   data 9
+   valentin
+
+   EOF
+   test_cmp expected actual
+'
+
 test_expect_success 'fast-export master~2..master' '
 
git fast-export master~2..master >actual &&
-- 
2.23.0.177.g8af0b3ca64

[PATCH 6/8] fast-export: allow user to request tags be marked with --mark-tags

2019-09-24 Thread Elijah Newren

Add a new option, --mark-tags, which will output mark identifiers with
each tag object.  This improves the incremental export story with
--export-marks since it will allow us to record that annotated tags have
been exported, and it is also needed as a step towards supporting nested
tags.

Signed-off-by: Elijah Newren 
---
 Documentation/git-fast-export.txt | 17 +
 builtin/fast-export.c |  7 +++
 t/t9350-fast-export.sh| 14 ++
 3 files changed, 34 insertions(+), 4 deletions(-)

diff --git a/Documentation/git-fast-export.txt b/Documentation/git-fast-export.txt
index cc940eb9ad..c522b34f7b 100644
--- a/Documentation/git-fast-export.txt
+++ b/Documentation/git-fast-export.txt
@@ -75,11 +75,20 @@ produced incorrect results if you gave these options.
Before processing any input, load the marks specified in
<file>.  The input file must exist, must be readable, and
must use the same format as produced by --export-marks.
+
+--mark-tags::
+   In addition to labelling blobs and commits with mark ids, also
+   label tags.  This is useful in conjunction with
+   `--export-marks` and `--import-marks`, and is also useful (and
+   necessary) for exporting of nested tags.  It does not hurt
+   other cases and would be the default, but many fast-import
+   frontends are not prepared to accept tags with mark
+   identifiers.
 +
-Any commits that have already been marked will not be exported again.
-If the backend uses a similar --import-marks file, this allows for
-incremental bidirectional exporting of the repository by keeping the
-marks the same across runs.
+Any commits (or tags) that have already been marked will not be
+exported again.  If the backend uses a similar --import-marks file,
+this allows for incremental bidirectional exporting of the repository
+by keeping the marks the same across runs.
 
 --fake-missing-tagger::
Some old repositories have tags without a tagger.  The
diff --git a/builtin/fast-export.c b/builtin/fast-export.c
index 575e47833b..d32e1e9327 100644
--- a/builtin/fast-export.c
+++ b/builtin/fast-export.c
@@ -40,6 +40,7 @@ static int no_data;
 static int full_tree;
 static int reference_excluded_commits;
 static int show_original_ids;
+static int mark_tags;
 static struct string_list extra_refs = STRING_LIST_INIT_NODUP;
 static struct string_list tag_refs = STRING_LIST_INIT_NODUP;
 static struct refspec refspecs = REFSPEC_INIT_FETCH;
@@ -861,6 +862,10 @@ static void handle_tag(const char *name, struct tag *tag)
if (starts_with(name, "refs/tags/"))
name += 10;
printf("tag %s\n", name);
+   if (mark_tags) {
+   mark_next_object(&tag->object);
+   printf("mark :%"PRIu32"\n", last_idnum);
+   }
if (tagged_mark)
printf("from :%d\n", tagged_mark);
else
@@ -1165,6 +1170,8 @@ int cmd_fast_export(int argc, const char **argv, const char *prefix)
 &reference_excluded_commits, N_("Reference parents which are not in fast-export stream by object id")),
OPT_BOOL(0, "show-original-ids", &show_original_ids,
N_("Show original object ids of blobs/commits")),
+   OPT_BOOL(0, "mark-tags", &mark_tags,
+   N_("Label tags with mark ids")),
 
OPT_END()
};
diff --git a/t/t9350-fast-export.sh b/t/t9350-fast-export.sh
index ea84e2f173..b3fca6ffba 100755
--- a/t/t9350-fast-export.sh
+++ b/t/t9350-fast-export.sh
@@ -66,6 +66,20 @@ test_expect_success 'fast-export ^muss^{commit} muss' '
test_cmp expected actual
 '
 
+test_expect_success 'fast-export --mark-tags ^muss^{commit} muss' '
+   git fast-export --mark-tags --tag-of-filtered-object=rewrite ^muss^{commit} muss >actual &&
+   cat >expected <<-EOF &&
+   tag muss
+   mark :1
+   from $(git rev-parse --verify muss^{commit})
+   $(git cat-file tag muss | grep tagger)
+   data 9
+   valentin
+
+   EOF
+   test_cmp expected actual
+'
+
 test_expect_success 'fast-export master~2..master' '
 
git fast-export master~2..master >actual &&
-- 
2.23.0.177.g8af0b3ca64

[PATCH 2/8] fast-import: fix handling of deleted tags

2019-09-24 Thread Elijah Newren

If our input stream includes a tag which is later deleted, we were not
properly deleting it.  We did have a step which would delete it, but we
left a tag in the tag list noting that it needed to be updated, and the
updating of annotated tags occurred AFTER ref deletion.  So, when we
record that a tag needs to be deleted, also remove it from the list of
annotated tags to update.

While this has likely been something that has not happened in practice,
it will come up more in order to support nested tags.  For nested tags,
we either need to give temporary names to the intermediate tags and then
delete them, or else we need to use the final name for the intermediate
tags.  If we use the final name for the intermediate tags, then in order
to keep the sanity check that someone doesn't try to update the same tag
twice, we need to delete the ref after creating the intermediate tag.
So, either way nested tags imply the need to delete temporary inner tag
references.
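
The list surgery involved is ordinary singly-linked-list removal with
both a head and a tail pointer to maintain.  Here is a self-contained
sketch using hypothetical tag_node/tag_list types rather than
fast-import's own struct tag and its first_tag/last_tag globals:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical types standing in for fast-import's tag list. */
struct tag_node {
	const char *name;
	struct tag_node *next;
};

struct tag_list {
	struct tag_node *first, *last;
};

static void tag_list_append(struct tag_list *l, struct tag_node *t)
{
	t->next = NULL;
	if (l->last)
		l->last->next = t;
	else
		l->first = t;
	l->last = t;
}

/* Unlink the first node called name; return 1 if found, 0 otherwise. */
static int tag_list_remove(struct tag_list *l, const char *name)
{
	struct tag_node *t, *prev = NULL;

	assert(l != NULL && name != NULL);
	for (t = l->first; t; prev = t, t = t->next)
		if (!strcmp(t->name, name))
			break;
	if (!t)
		return 0;
	if (prev)
		prev->next = t->next;	/* removing from the middle or end */
	else
		l->first = t->next;	/* removing the head */
	if (!t->next)
		l->last = prev;		/* removed node was the tail */
	return 1;
}
```

The tail update is the easy step to forget: if the deleted tag was the
last one parsed, the tail pointer must move back to the previous node
(or to NULL) so that later appends do not attach to the unlinked node.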

Signed-off-by: Elijah Newren 
---
 fast-import.c  | 29 +
 t/t9300-fast-import.sh | 13 +
 2 files changed, 42 insertions(+)

diff --git a/fast-import.c b/fast-import.c
index b44d6a467e..dab905d667 100644
--- a/fast-import.c
+++ b/fast-import.c
@@ -2793,6 +2793,35 @@ static void parse_reset_branch(const char *arg)
b = new_branch(arg);
read_next_command();
parse_from(b);
+   if (b->delete && !strncmp(arg, "refs/tags/", 10)) {
+   /*
+* Elsewhere, we call dump_branches() before dump_tags(),
+* and dump_branches() will handle ref deletions first, so
+* in order to make sure the deletion actually takes effect,
+* we need to remove the tag from our list of tags to update.
+*
+* NEEDSWORK: replace list of tags with hashmap for faster
+* deletion?
+*/
+   struct strbuf tag_name = STRBUF_INIT;
+   struct tag *t, *prev = NULL;
+   for (t = first_tag; t; t = t->next_tag) {
+   strbuf_reset(&tag_name);
+   strbuf_addf(&tag_name, "refs/tags/%s", t->name);
+   if (!strcmp(arg, tag_name.buf))
+   break;
+   prev = t;
+   }
+   if (t) {
+   if (prev)
+   prev->next_tag = t->next_tag;
+   else
+   first_tag = t->next_tag;
+   if (!t->next_tag)
+   last_tag = prev;
+   /* There is no mem_pool_free(t) function to call. */
+   }
+   }
if (command_buf.len > 0)
unread_command_buf = 1;
 }
diff --git a/t/t9300-fast-import.sh b/t/t9300-fast-import.sh
index 141b7fa35e..74bc41333b 100755
--- a/t/t9300-fast-import.sh
+++ b/t/t9300-fast-import.sh
@@ -85,6 +85,15 @@ test_expect_success 'A: create pack from stdin' '
An annotated tag that annotates a blob.
EOF
 
+   tag to-be-deleted
+   from :3
+   data <expect <<-EOF &&
:2 $(git rev-parse --verify master:file2)
-- 
2.23.0.177.g8af0b3ca64

Re: [PATCH] t4038: Remove non-portable '-a' option passed to test_cmp

2019-09-23 Thread Elijah Newren

On Fri, Sep 20, 2019 at 3:07 PM CB Bailey  wrote:
>
> From: CB Bailey 
>
> Signed-off-by: CB Bailey 
> ---
>  t/t4038-diff-combined.sh | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/t/t4038-diff-combined.sh b/t/t4038-diff-combined.sh
> index d4afe12554..b9d876efa2 100755
> --- a/t/t4038-diff-combined.sh
> +++ b/t/t4038-diff-combined.sh
> @@ -509,7 +509,7 @@ test_expect_success FUNNYNAMES '--combined-all-paths and --raw and funny names'
>  test_expect_success FUNNYNAMES '--combined-all-paths and --raw -and -z and funny names' '
> printf "aaf8087c3cbd4db8e185a2d074cf27c53cfb75d7\0::100644 100644 100644 f00c965d8307308469e537302baa73048488f162 088bd5d92c2a8e0203ca8e7e4c2a5c692f6ae3f7 333b9c62519f285e1854830ade0fe1ef1d40ee1b RR\0file\twith\ttabs\0i\tam\ttabbed\0fickle\tnaming\0" >expect &&
> git diff-tree -c -M --raw --combined-all-paths -z HEAD >actual &&
> -   test_cmp -a expect actual
> +   test_cmp expect actual
>  '

This will mean slightly less useful diagnostic output should the test
ever fail on a platform that does support diff -a, but that's a small
price to pay to make sure the test is portable.  If anyone does ever
see this test fail, they can go in and inspect further themselves.

Thanks for the fix.

Re: [DISCUSSION] Growing the Git community

2019-09-19 Thread Elijah Newren

On Thu, Sep 19, 2019 at 11:37 AM Derrick Stolee  wrote:
>
> During the Virtual Git Contributors' Summit, Dscho brought up the topic of
> "Inclusion & Diversity". We discussed ideas for how to make the community
> more welcoming to new contributors of all kinds. Let's discuss some of
> the ideas we talked about, and some that have been growing since.
>
> Feel free to pick apart all of the claims I make below. This is based
> on my own experience and opinions. It should be a good baseline
> for us to all arrive with valuable action items.
>
> I have CC'd some of the people who were part of that discussion. Sorry
> if I accidentally left someone out.

Thanks for working on this.  I like the overall thrust, and many of
the concrete proposals.  I've got lots of comments and feedback, and
if I focus too much on things that could be improved, just remember I
like the overall thrust.

> I. Goals and Perceived Problems
>
> As a community, our number one goal is for Git to continue to be the best
> distributed version control system. At minimum, it should continue to be
> the most widely-used DVCS.

I'd rather we stated our goal in terms of what problems we are trying
to address rather than accolades we want sent our way.  E.g. "Our goal
is to make developers more productive by providing them increasingly
useful version control software".

> Towards that goal, we need to make sure Git is
> the best solution for every kind of developer in every industry. The
> community cannot do this without including developers of all kinds. This

This sounds much too strongly worded to me.  I don't like the idea of
everything for everyone; it suggests that if someone comes up with a
one-off use case that affects 3 people in the world, we have to devote
resources to it (even at the risk of making ongoing maintenance
harder).  I would prefer a statement like "we want to solve more
use cases than we do today, and we want to bring in developers from
diverse backgrounds to help us do so".

> means having a diverse community, for all senses of the word: Diverse in
> physical location, gender, professional status, age, and others.

The combination of wording above ("need to...cannot do this...all
kinds...all senses of the word") suggests that more extreme measures
are in scope.  For example, what about programming language?  C is
going to restrict us to a small and possibly shrinking set of
developers.  I think that changing language is far-fetched and not
worth it, but the wording above would suggest it.

A different way to avoid such interpretations might be to find
a way to imbue the document with an "evolutionary not revolutionary"
feeling or wording.

> In addition, the community must continue to grow, but members leave the

"must"?  I agree that we want to grow, but "must" suggests a
prioritization level of effort that makes me uneasy.  If you said that
we find it really important and will invest resources in it, then I'm
all for it.

> community on a regular basis for multiple reasons. New contributors must
> join and mature within the community or the community will dwindle. Without
> dedicating effort and attention to this, natural forces may result in the
> community being represented only by contributors working at large tech
> companies focused on the engineering systems of very large groups.
>
> It is worth noting that this community growth must never be at the cost
> of code quality. We must continue to hold all contributors to a high
> standard so Git stays a stable product.
>
> Here are some problems that may exist within the Git community and may
> form a barrier to new contributors entering:
>
> 1. Discovering how to contribute to Git is non-obvious.
>
> 2. Submitting to a mailing list is a new experience for most developers.
>This includes the full review and discussion process.
>
> 3. The high standards for patch quality are intimidating to new contributors.
>
> 4. Some people do not feel comfortable engaging in a community without
>a clear Code of Conduct. This discomfort is significant and based on real
>experiences throughout society.
>
> 5. Since Git development happens in a different place than where users
> acquire the end product, some are not aware that they can contribute.
>
> II. Approach
>
> The action items below match the problems listed above.
>
> 1. Improve the documentation for contributing to Git.
>
> In preparation for this email, I talked to someone familiar with issues
> around new contributors, and they sat down to try and figure out how to
> contribute to Git. The first place they went was https://github.com/git/git
> and looked at the README. It takes deep reading of a paragraph to see a
> link to the SubmittingPatches docs.
>
> To improve this experience, we could rewrite the README to have clearer
> section markers, including one "Contributing to Git" section relatively
> high in the doc. We may want to update the README for multiple reasons.
> It should link to the new "My F

Re: [PATCH v2] merge-recursive: symlink's descendants not in way

2019-09-18 Thread Elijah Newren

On Wed, Sep 18, 2019 at 1:27 PM Jonathan Tan  wrote:
>
> When the working tree has:
>  - bar (directory)
>  - bar/file (file)
>  - foo (symlink to .)
>
> (note that lstat() for "foo/bar" would tell us that it is a directory)
>
> and the user merges a commit that deletes the foo symlink and instead
> contains:
>  - bar (directory, as above)
>  - bar/file (file, as above)
>  - foo (directory)
>  - foo/bar (file)
>
> the merge should happen without requiring user intervention. However,
> this does not happen.
>
> This is because dir_in_way(), when checking the working tree, thinks
> that "foo/bar" is a directory. But a symlink should be treated much the
> same as a file: since dir_in_way() is only checking to see if there is a
> directory in the way, we don't want symlinks in leading paths to
> sometimes cause dir_in_way() to return true.
>
> Teach dir_in_way() to also check for symlinks in leading paths before
> reporting whether a directory is in the way.
>
> Helped-by: Elijah Newren 
> Signed-off-by: Jonathan Tan 
> ---
> Changes from v1:
>
> - Used has_symlink_leading_path(). This drastically shortens the diff.
> - Updated commit message following suggestions from Junio, Szeder Gábor,
>   and Elijah Newren.
> - Updated test to add prereq and verification that the working tree
>   contains what we want.
> ---
>  merge-recursive.c  |  3 ++-
>  t/t3030-merge-recursive.sh | 28 
>  2 files changed, 30 insertions(+), 1 deletion(-)
>
> diff --git a/merge-recursive.c b/merge-recursive.c
> index 6b812d67e3..22a12cfeba 100644
> --- a/merge-recursive.c
> +++ b/merge-recursive.c
> @@ -764,7 +764,8 @@ static int dir_in_way(struct index_state *istate, const char *path,
>
> strbuf_release(&dirpath);
> return check_working_copy && !lstat(path, &st) && S_ISDIR(st.st_mode) &&
> -   !(empty_ok && is_empty_dir(path));
> +   !(empty_ok && is_empty_dir(path)) &&
> +   !has_symlink_leading_path(path, strlen(path));
>  }
>
>  /*
> diff --git a/t/t3030-merge-recursive.sh b/t/t3030-merge-recursive.sh
> index ff641b348a..faa8892741 100755
> --- a/t/t3030-merge-recursive.sh
> +++ b/t/t3030-merge-recursive.sh
> @@ -452,6 +452,34 @@ test_expect_success 'merge-recursive d/f conflict result' '
>
>  '
>
> +test_expect_success SYMLINKS 'dir in working tree with symlink ancestor does not produce d/f conflict' '
> +   git init sym &&
> +   (
> +   cd sym &&
> +   ln -s . foo &&
> +   mkdir bar &&
> +   >bar/file &&
> +   git add foo bar/file &&
> +   git commit -m "foo symlink" &&
> +
> +   git checkout -b branch1 &&
> +   git commit --allow-empty -m "empty commit" &&
> +
> +   git checkout master &&
> +   git rm foo &&
> +   mkdir foo &&
> +   >foo/bar &&
> +   git add foo/bar &&
> +   git commit -m "replace foo symlink with real foo dir and foo/bar file" &&
> +
> +   git checkout branch1 &&
> +
> +   git cherry-pick master &&
> +   test_path_is_dir foo &&
> +   test_path_is_file foo/bar
> +   )
> +'
> +
>  test_expect_success 'reset and 3-way merge' '
>
> git reset --hard "$c2" &&
> --

Looks good to me; nice how much it has simplified.  Thanks for working on this.

> 2.23.0.237.gc6a4ce50a0-goog

A total tangent, but what do you use the "-goog" suffix for?

Re: [PATCH 4/9] sparse-checkout: 'add' subcommand

2019-09-18 Thread Elijah Newren

On Wed, Sep 18, 2019 at 6:55 AM Derrick Stolee  wrote:
>
> On 8/23/2019 7:30 PM, Elijah Newren wrote:
> > On Tue, Aug 20, 2019 at 8:12 AM Derrick Stolee via GitGitGadget
> >  wrote:
> >>
...
> >> diff --git a/t/t1091-sparse-checkout-builtin.sh b/t/t1091-sparse-checkout-builtin.sh
> >> index b7d5f15830..499bd8d6d0 100755
> >> --- a/t/t1091-sparse-checkout-builtin.sh
> >> +++ b/t/t1091-sparse-checkout-builtin.sh
> >> @@ -100,4 +100,24 @@ test_expect_success 'clone --sparse' '
> >> test_cmp expect dir
> >>  '
> >>
> >> +test_expect_success 'add to existing sparse-checkout' '
> >> +   echo "/folder2/*" | git -C repo sparse-checkout add &&
> >
> > I've always been using '/folder2/' in sparse-checkout, without the
> > trailing asterisk.  That seems more friendly for cone mode too.  Are
> > there benefits to keeping the trailing asterisk?
>
> I think I've been seeing issues with pattern matching on Windows without
> the trailing asterisk. I'm currently double-checking to make sure this
> is important or not.

Can you try with the en/clean-nested-with-ignored topic in pu to see
if that fixes those issues?

Re: [RFC PATCH] merge-recursive: symlink's descendants not in way

2019-09-17 Thread Elijah Newren

Hi Jonathan,

On Tue, Sep 17, 2019 at 2:50 PM Jonathan Tan  wrote:
>
> When the working tree has:
>  - foo (symlink)
>  - foo/bar (directory)
>
> and the user merges a commit that deletes the foo symlink and instead
> contains:
>  - foo (directory)
>  - foo/bar (file)
>
> the merge should happen without requiring user intervention. However,
> this does not happen.
>
> In merge_trees(), process_entry() will be invoked first for "foo/bar",
> then "foo" (in reverse lexicographical order). process_entry() correctly
> reaches "Case B: Added in one", but dir_in_way() states that "bar" is
> already present as a directory, causing a directory/file conflict at the
> wrong point.

I don't think the notes about hitting the "Case B: Added in one"
codepath help; that's only one codepath that calls dir_in_way(), and
I'm pretty sure with a little work we could trigger the same bug with
the other ones.

> Instead, teach dir_in_way() that directories under symlinks are not "in
> the way", so that symlinks are treated as regular files instead of
> directories containing other directories and files. Thus, the "else"
> branch will be followed instead: "foo/bar" will be added to the working
> tree, make_room_for_path() being indirectly called to unlink the "foo"
> symlink (just like if "foo" were a regular file instead). When
> process_entry() is subsequently invoked for "foo", process_entry() will
> reach "Case A: Deleted in one", and will handle it as "Add/delete" or
> "Modify/delete" appropriately (including reinstatement of the previously
> unlinked symlink with a new unique filename if necessary, again, just
> like if "foo" were a regular file instead).

I was trying to think of a way to summarize it a bit, and then Junio
later in the thread came in and provided a different and compatible
way to view the issue that summarizes it quite nicely:

"In any case, if the working tree has 'foo' as a symlink, Git should
not look at or get affected by what 'foo' points at."

We can probably make the commit message pretty concise using that
wording or something similar.  Maybe adding something like "In
particular, the presence of a symlink should be treated much the same
as the presence of a file; since dir_in_way() is only checking to see
if there is a directory in the way, we don't want symlinks in leading
paths to sometimes cause dir_in_way() to return true."

>
> Helped-by: Elijah Newren 
> Signed-off-by: Jonathan Tan 
> ---
> Thanks to Elijah for his help. Some of the commit message is based on
> his explanation [1].
>
> I'm finding this relatively complicated, so I'm sending this as RFC. My
> main concern is whether all callers of dir_in_way() are OK with its
> behavior change, and if yes, how to explain it. I suspect that this is
> correct because dir_in_way() should behave consistently for all its
> callers, but I might be wrong.

Yes, we want all callers of dir_in_way() to get this change; if they
don't, I'm pretty sure with some work we could devise special
scenarios that exhibit the same bug.



Thanks for working on this,
Elijah

[PATCH v4 10/12] clean: avoid removing untracked files in a nested git repository

2019-09-17 Thread Elijah Newren

Users expect files in a nested git repository to be left alone unless
sufficiently forced (with two -f's).  Unfortunately, in certain
circumstances, git would delete both tracked (and possibly dirty) files
and untracked files within a nested repository.  To explain how this
happens, let's contrast a couple cases.  First, take the following
example setup (which assumes we are already within a git repo):

   git init nested
   cd nested
   >tracked
   git add tracked
   git commit -m init
   >untracked
   cd ..

In this setup, everything works as expected; running 'git clean -fd'
will result in fill_directory() returning the following paths:
   nested/
   nested/tracked
   nested/untracked
and then correct_untracked_entries() would notice this can be compressed
to
   nested/
and then since "nested/" is a directory, we would call
remove_dirs("nested/", ...), which would check
is_nonbare_repository_dir() and then decide to skip it.

However, if someone also creates an ignored file:
   >nested/ignored
then running 'git clean -fd' would result in fill_directory() returning
the same paths:
   nested/
   nested/tracked
   nested/untracked
but correct_untracked_entries() will notice that we had ignored entries
under nested/ and thus simplify this list to
   nested/tracked
   nested/untracked
Since these are not directories, we do not call remove_dirs() which was
the only place that had the is_nonbare_repository_dir() safety check --
resulting in us deleting both the untracked file and the tracked (and
possibly dirty) file.

One possible fix for this issue would be walking the parent directories
of each path and checking if they represent nonbare repositories, but
that would be wasteful.  Even if we added caching of some sort, it's
still a waste because we should have been able to check that "nested/"
represented a nonbare repository before even descending into it in the
first place.  Add a DIR_SKIP_NESTED_GIT flag to dir_struct.flags and use
it to prevent fill_directory() and friends from descending into nested
git repos.

With this change, we also modify two regression tests added in commit
91479b9c72f1 ("t7300: add tests to document behavior of clean and nested
git", 2015-06-15).  Neither that commit, nor its series, nor the six
previous iterations of that series on the mailing list discussed why
those tests coded the expectation they did.  In fact, it appears their
purpose was simply to test _existing_ behavior to make sure that the
performance changes didn't change the behavior.  However, these two
tests directly contradicted the manpage's claims that two -f's were
required to delete files/directories under a nested git repository.
While one could argue that the user gave an explicit path which matched
files/directories that were within a nested repository, there's a
slippery slope that becomes very difficult for users to understand once
you go down that route (e.g. what if they specified "git clean -f -d
'*.c'"?)  It would also be hard to explain what the exact behavior was;
avoid such problems by making it really simple.

Also, clean up some grammar errors describing this functionality in the
git-clean manpage.

Finally, there are still a couple bugs with -ffd not cleaning out enough
(e.g.  missing the nested .git) and with -ffdX possibly cleaning out the
wrong files (paying attention to outer .gitignore instead of inner).
This patch does not address these cases at all (and does not change the
behavior relative to those flags); it only fixes the handling when given
a single -f.  See
https://public-inbox.org/git/20190905212043.gc32...@szeder.dev/ for more
discussion of the -ffd[X?] bugs.

Signed-off-by: Elijah Newren 
---
 Documentation/git-clean.txt |  6 +++---
 builtin/clean.c |  2 ++
 dir.c   | 10 ++
 dir.h   |  3 ++-
 t/t7300-clean.sh| 10 +-
 5 files changed, 22 insertions(+), 9 deletions(-)

diff --git a/Documentation/git-clean.txt b/Documentation/git-clean.txt
index 3ab749b921..ba31d8d166 100644
--- a/Documentation/git-clean.txt
+++ b/Documentation/git-clean.txt
@@ -37,9 +37,9 @@ OPTIONS
 --force::
If the Git configuration variable clean.requireForce is not set
to false, 'git clean' will refuse to delete files or directories
-   unless given -f or -i. Git will refuse to delete directories
-   with .git sub directory or file unless a second -f
-   is given.
+   unless given -f or -i.  Git will refuse to modify untracked
+   nested git repositories (directories with a .git subdirectory)
+   unless a second -f is given.
 
 -i::
 --interactive::
diff --git a/builtin/clean.c b/builtin/clean.c
index 68d70e41c0..3a7a63ae71 100644
--- a/builtin/clean.c
+++ b/builtin/clean.c
@@ -946,6 +946,8 @@ int cmd_clean(int argc, const char **argv, const char *prefix)
