Re: Small issue with "add untracked" option of 'git add -i'

2017-06-20 Thread Kevin Daudt
On Wed, Jun 21, 2017 at 08:25:26AM +0530, Kaartic Sivaraam wrote:
> 
> I tried applying the patch and building it locally. For some reason I
> couldn't see the change in effect. What could I be missing?
> 

Did you make sure you used the git you built, and also the relevant
subcommands?

What does `which git` and git --exec-path return?

If you did not install git, but try to run it in the compiled dir, you
can better run bin-wrappers/git, which will make sure the correct sub
commands are run.


Re: [PATCHv2] submodules: overhaul documentation

2017-06-20 Thread Jonathan Tan
Thanks, this looks like a good explanation. Some more nits, but
overall I feel like I understand this and have learned something from
it.

On Tue, Jun 20, 2017 at 3:56 PM, Stefan Beller  wrote:
> +A submodule is another Git repository tracked inside a repository.
> +The tracked repository has its own history, which does not
> +interfere with the history of the current repository.
> +
> +It consists of a tracking subdirectory in the working directory,
> +a 'gitlink' in the working tree and an entry in the `.gitmodules`

Probably should be `gitlink` (the special quotes), and (optional)
s/`gitlink`/`gitlink` object/ because it might not be apparent that
gitlink is a type of object.

> +file (see linkgit:gitmodules[5]) at the root of the source tree.

After reading below, maybe we should mention the Git directory in
$GIT_DIR/modules/ as part of what a submodule consists
of too.

> +The tracking subdirectory appears in the main repositorys working

s/repositorys/repository's/ (apostrophe is also missing in some other
places below)

> +tree at the point where the submodules gitlink is tracked in the
> +tree.  It is empty when the submodule is not populated, otherwise
> +it contains the content of the submodule repository.
> +The main repository is often referred to as superproject.
> +
> +The gitlink contains the object name of a particular commit
> +of the submodule.
> +
> +The `.gitmodules` file establishes a relationship between the
> +path, which is where the gitlink is in the tree, and the logical
> +name, which is used for the location of the submodules git
> +directory. The `.gitmodules` file has the same syntax as the
> +$Git_DIR/config file and the mapping of path to name

Capitalization of $GIT_DIR

> +is done via setting `submodule..path = `.

(Optional) I would prefer  and  to be consistent with the
following paragraph.

> +The submodules git directory is found in in the main repositories
> +'$GIT_DIR/modules/' or inside the tracking subdirectory.
> +
> +Submodules can be used for at least two different use cases:
> +
> +1. Using another project while maintaining independent history.
> +  Submodules allow you to contain the working tree of another project
> +  within your own working tree while keeping the history of both
> +  projects separate. Also, since submodules are fixed to a an arbitrary
> +  version, the other project can be independently developed without
> +  affecting the superproject, allowing the superproject project to
> +  fix itself to new versions only whenever desired.
> +
> +2. Splitting a (logically single) project into multiple
> +   repositories and tying them back together. This can be used to
> +   overcome current limitations of Gits implementation to have
> +   finer grained access:
> +
> +* Size of the git repository
> +  In its current form Git scales up poorly for very large repositories that
> +  change a lot, as the history grows very large.
> +  However you can also use submodules to e.g. hold large binary assets
> +  and these repositories are then shallowly cloned such that you do not
> +  have a large history locally.
> +
> +* Transfer size
> +  In its current form Git requires the whole working tree present. It
> +  does not allow partial trees to be transferred in fetch or clone.
> +
> +* Access control
> +  By restricting user access to submodules, this can be used to implement
> +  read/write policies for different users.

The bullet points should probably be indented more.

[snip]

> +Deleting a submodule
> +
> +
> +Deleting a submodule can happen on different levels:
> +
> +1) Removing it from the local working tree without tampering with
> +   the history of the superproject.
> +
> +You may no longer need the submodule, but still want to keep it recorded
> +in the superproject history as others may have use for it. The command
> +`git submodule deinit ` will remove any configuration
> +entries from the config file, such that the submodule becomes

s=config=$GIT_DIR/config= (since there are multiple relevant config files)

> +uninitialized. The tracking directory in the superprojects working
> +tree that holds the submodules working directory is emptied.
> +This step can be undone via `git submodule init`.
> +
> +2) Remove it from history:
> +--
> +   git rm 
> +   git commit
> +--
> +This removes the submodules gitlink from the superprojects tree, as well
> +as removing the entries from the `.gitmodules` file, but keeps the
> +local configuration for the submodule. This can be undone using `git revert`.
> +
> +
> +3) Remove the submodules git directory:
> +
> +When you also want to free up the disk space that the submodules git
> +directory uses, you have to delete it manually as this
> +step cannot be undone using git tools. It is found in `$GIT_DIR/modules`.


[PATCH/FINALRFC] Documentation/git-submodule: cleanup

2017-06-20 Thread Kaartic Sivaraam
The "add" section for 'git-submodule' is redundant in
its description and the short synopsis line. Fix it.

Remove the redundant mentioning of the 'repository' argument
being mandatory.

The text is hard to read because of back-references, so remove
those.

Replace the word "humanish" by "canonical" as that conveys better
what we do to guess the path.

While at it, quote all occurrences of '.gitmodules' as that is an
important file in the submodule context, also link to it on its
first mention.

Helped-by: Stefan Beller 
Signed-off-by: Kaartic Sivaraam 
---
 In case no other changes are required then this is the final version
 of the patch.


 Documentation/git-submodule.txt | 49 ++---

 1 file changed, 21 insertions(+), 28 deletions(-)

diff --git a/Documentation/git-submodule.txt b/Documentation/git-
submodule.txt
index 74bc6200d..6e07bade3 100644
--- a/Documentation/git-submodule.txt
+++ b/Documentation/git-submodule.txt
@@ -63,14 +63,6 @@ add [-b ] [-f|--force] [--name ] [
--reference ] [--dep
    to the changeset to be committed next to the current
    project: the current project is termed the "superproject".
 +
-This requires at least one argument: . The optional
-argument  is the relative location for the cloned submodule
-to exist in the superproject. If  is not given, the
-"humanish" part of the source repository is used ("repo" for
-"/path/to/repo.git" and "foo" for "host.xz:foo/.git").
-The  is also used as the submodule's logical name in its
-configuration entries unless `--name` is used to specify a logical
name.
-+
  is the URL of the new submodule's origin repository.
 This may be either an absolute URL, or (if it begins with ./
 or ../), the location relative to the superproject's default remote
@@ -87,21 +79,22 @@ If the superproject doesn't have a default remote
configured
 the superproject is its own authoritative upstream and the current
 working directory is used instead.
 +
- is the relative location for the cloned submodule to
-exist in the superproject. If  does not exist, then the
-submodule is created by cloning from the named URL. If  does
-exist and is already a valid Git repository, then this is added
-to the changeset without cloning. This second form is provided
-to ease creating a new submodule from scratch, and presumes
-the user will later push the submodule to the given URL.
+The optional argument  is the relative location for the cloned
+submodule to exist in the superproject. If  is not given, the
+canonical part of the source repository is used ("repo" for
+"/path/to/repo.git" and "foo" for "host.xz:foo/.git"). If 
+exists and is already a valid Git repository, then it is staged
+for commit without cloning. The  is also used as the submodule's
+logical name in its configuration entries unless `--name` is used
+to specify a logical name.
 +
-In either case, the given URL is recorded into .gitmodules for
-use by subsequent users cloning the superproject. If the URL is
-given relative to the superproject's repository, the presumption
-is the superproject and submodule repositories will be kept
-together in the same relative location, and only the
-superproject's URL needs to be provided: git-submodule will correctly
-locate the submodule using the relative URL in .gitmodules.
+The given URL is recorded into `.gitmodules` for use by subsequent
users
+cloning the superproject. If the URL is given relative to the
+superproject's repository, the presumption is the superproject and
+submodule repositories will be kept together in the same relative
+location, and only the superproject's URL needs to be provided.
+git-submodule will correctly locate the submodule using the relative
+URL in `.gitmodules`.
 
 status [--cached] [--recursive] [--] [...]::
    Show the status of the submodules. This will print the SHA-1
of the
@@ -123,7 +116,7 @@ too (and can also report changes to a submodule's
work tree).
 init [--] [...]::
    Initialize the submodules recorded in the index (which were
    added and committed elsewhere) by setting
`submodule.$name.url`
-   in .git/config. It uses the same setting from .gitmodules as
+   in .git/config. It uses the same setting from `.gitmodules` as
    a template. If the URL is relative, it will be resolved using
    the default remote. If there is no default remote, the current
    repository will be assumed to be upstream.
@@ -197,7 +190,7 @@ configuration variable:
    none;; the submodule is not updated.
 
 If the submodule is not yet initialized, and you just want to use the
-setting as stored in .gitmodules, you can automatically initialize the
+setting as stored in `.gitmodules`, you can automatically initialize
the
 submodule with the `--init` option.
 
 If `--recursive` is specified, this command will recurse into the
@@ -220,7 +213,7 @@ foreach [--recursive] ::
    Evaluates an arbitrary shell command 

Re: Small issue with "add untracked" option of 'git add -i'

2017-06-20 Thread Kaartic Sivaraam
On Mon, 2017-06-12 at 10:59 -0700, Junio C Hamano wrote:
> Together with your other wishes, perhaps something like this is what
> you have in mind.  The original tried to throw in a blank line as a
> separator to help interactive users to more easily tell the boundary
> of blocks of text, but it wasn't consistently doing so (e.g. "update"
> when nothing is dirty was very silent, while "status" gave one blank
> line that is supposed to be shown after the list of changed ones even
> when the list is empty).
> 
>  git-add--interactive.perl | 8 ++--
>  1 file changed, 2 insertions(+), 6 deletions(-)
> 
> diff --git a/git-add--interactive.perl b/git-add--interactive.perl
> index 709a5f6ce6..0ec09361b4 100755
> --- a/git-add--interactive.perl
> +++ b/git-add--interactive.perl
> @@ -537,7 +537,7 @@ sub list_and_choose {
>   $last_lf = 1;
>   }
>   }
> - if (!$last_lf) {
> + if (@stuff && !$last_lf) {
>   print "\n";
>   }
>  
> @@ -634,7 +634,6 @@ sub prompt_help_cmd {
>  sub status_cmd {
>   list_and_choose({ LIST_ONLY => 1, HEADER => $status_head },
>   list_modified());
> - print "\n";
>  }
>  
>  sub say_n_paths {
> @@ -667,7 +666,6 @@ sub update_cmd {
>      map { $_->{VALUE} } @update);
>   say_n_paths('updated', @update);
>   }
> - print "\n";
>  }
>  
>  sub revert_cmd {
> @@ -701,7 +699,6 @@ sub revert_cmd {
>   refresh();
>   say_n_paths('reverted', @update);
>   }
> - print "\n";
>  }
>  
>  sub add_untracked_cmd {
> @@ -711,9 +708,8 @@ sub add_untracked_cmd {
>   system(qw(git update-index --add --), @add);
>   say_n_paths('added', @add);
>   } else {
> - print __("No untracked files.\n");
> + print __("No untracked file chosen.\n");
>   }
> - print "\n";
>  }
>  
>  sub run_git_apply {

I tried applying the patch and building it locally. For some reason I
couldn't see the change in effect. What could I be missing?

-- 
Regards,
Kaartic Sivaraam 


[PATCH/FINAL] status: contextually notify user about an initial commit

2017-06-20 Thread Kaartic Sivaraam
The existing message, "Initial commit", makes sense for the commit template
notifying users that it's their initial commit, but is confusing when
merely checking the status of a fresh repository (or orphan branch)
without having any commits yet.

Change the output of "status" to say "No commits yet" when "git
status" is run on a fresh repo (or orphan branch), while retaining the
current "Initial commit" message displayed in the template that's
displayed in the editor when the initial commit is being authored.

A few alternatives considered were,

 * Waiting for initial commit
 * Your current branch does not have any commits
 * Current branch waiting for initial commit

The most succint one among the alternatives was chosen.

Helped-by: Junio C Hamano 
Signed-off-by: Kaartic Sivaraam 
Signed-off-by: Ævar Arnfjörð Bjarmason 
---

 The 'FINAL' part in the subject is just my opinion about 
 this patch

 builtin/commit.c  |  1 +
 t/t7501-commit.sh |  2 +-
 t/t7508-status.sh | 30 ++
 wt-status.c   |  5 -
 wt-status.h   |  1 +
 5 files changed, 37 insertions(+), 2 deletions(-)

diff --git a/builtin/commit.c b/builtin/commit.c
index 8d1cac062..3d614a2ac 100644
--- a/builtin/commit.c
+++ b/builtin/commit.c
@@ -1648,6 +1648,7 @@ int cmd_commit(int argc, const char **argv, const char 
*prefix)
usage_with_options(builtin_commit_usage, 
builtin_commit_options);
 
status_init_config(, git_commit_config);
+   s.commit_template = 1;
status_format = STATUS_FORMAT_NONE; /* Ignore status.short */
s.colopts = 0;
 
diff --git a/t/t7501-commit.sh b/t/t7501-commit.sh
index 0b6da7ae1..fa61b1a4e 100755
--- a/t/t7501-commit.sh
+++ b/t/t7501-commit.sh
@@ -18,7 +18,7 @@ test_expect_success 'initial status' '
echo bongo bongo >file &&
git add file &&
git status >actual &&
-   test_i18ngrep "Initial commit" actual
+   test_i18ngrep "No commits yet" actual
 '
 
 test_expect_success 'fail initial amend' '
diff --git a/t/t7508-status.sh b/t/t7508-status.sh
index 5edcc6edf..0ffa585e2 100755
--- a/t/t7508-status.sh
+++ b/t/t7508-status.sh
@@ -1499,4 +1499,34 @@ test_expect_success 'git commit -m will commit a staged 
but ignored submodule' '
git config -f .gitmodules  --remove-section submodule.subname
 '
 
+test_expect_success '"No commits yet" should be noted in status output' '
+   git checkout --orphan empty-branch-1 &&
+   git status >output &&
+   test_i18ngrep "No commits yet" output
+'
+
+test_expect_success '"No commits yet" should not be noted in status output' '
+   git checkout --orphan empty-branch-2 &&
+   test_commit test-commit-1 &&
+   git status >output &&
+   test_must_fail test_i18ngrep "No commits yet" output
+'
+
+test_expect_success '"Initial commit" should be noted in commit template' '
+   git checkout --orphan empty-branch-3 &&
+   touch to_be_committed_1 &&
+   git add to_be_committed_1 &&
+   git commit --dry-run >output &&
+   test_i18ngrep "Initial commit" output
+'
+
+test_expect_success '"Initial commit" should not be noted in commit template' '
+   git checkout --orphan empty-branch-4 &&
+   test_commit test-commit-2 &&
+   touch to_be_committed_2 &&
+   git add to_be_committed_2 &&
+   git commit --dry-run >output &&
+   test_must_fail test_i18ngrep "Initial commit" output
+'
+
 test_done
diff --git a/wt-status.c b/wt-status.c
index 068de38b5..a2e294bb2 100644
--- a/wt-status.c
+++ b/wt-status.c
@@ -1578,7 +1578,10 @@ static void wt_longstatus_print(struct wt_status *s)
 
if (s->is_initial) {
status_printf_ln(s, color(WT_STATUS_HEADER, s), "%s", "");
-   status_printf_ln(s, color(WT_STATUS_HEADER, s), _("Initial 
commit"));
+   status_printf_ln(s, color(WT_STATUS_HEADER, s),
+s->commit_template
+? _("Initial commit")
+: _("No commits yet"));
status_printf_ln(s, color(WT_STATUS_HEADER, s), "%s", "");
}
 
diff --git a/wt-status.h b/wt-status.h
index 8a3864783..2389f0839 100644
--- a/wt-status.h
+++ b/wt-status.h
@@ -76,6 +76,7 @@ struct wt_status {
char color_palette[WT_STATUS_MAXSLOT][COLOR_MAXLEN];
unsigned colopts;
int null_termination;
+   int commit_template;
int show_branch;
int hints;
 
-- 
2.11.0



Re: [PATCH 1/3] Contextually notify user about an initial commit

2017-06-20 Thread Kaartic Sivaraam
On Tue, 2017-06-20 at 16:41 +0200, Ævar Arnfjörð Bjarmason wrote:
> Right now 1/3 breaks the test suite. That's a big no-no, any given
> commit should not break the test suite to not break bisecting.
> 
> But aside from that the general pattern we follow is that if code
> behavior changes, tests for that new behavior go in the same commit.
> 
I did think of squashing the first two patches, anyway. Now the three
of them (except for that spacing fix) must be squashhed. Not a big
issue, anyway. I tweaked the suggested commit message a bit for the
following reason,

Ensure it follows the "describe problem, justify solution, mention
alternatives" pattern as it sounds good and natural. I wanted to keep
the alternatives as I found it to the commit message to be a more
appropriate place than the mailing list archives. This could induce
others who see it to do the same ;)

> 
> Regarding the format: I was referring to the 'prefix the first line
> with
> "area: "' part of SubmittingPatches, sorry for the brevity. I.e. your
> --oneline output just yields "Contextually notify user about an
> initial
> commit" but should be "status: ".
> 
That's a good one. I initially thought it was optional.

-- 
Regards,
Kaartic Sivaraam 


Re: [PATCH v3 11/20] path: construct correct path to a worktree's index

2017-06-20 Thread Jonathan Nieder
Subject: worktree_git_path() should not use file relocation

git_path is a convenience function that usually produces a string
$GIT_DIR/.  Since v2.5.0-rc0~143^2~35 (git_path(): be aware of
file relocation in $GIT_DIR, 2014-11-30), as a side benefit callers
get support for path relocation variables like $GIT_OBJECT_DIRECTORY:

- git_path("index") is $GIT_INDEX_FILE when set
- git_path("info/grafts") is $GIT_GRAFTS_FILE when set
- git_path("objects/") is $GIT_OBJECT_DIRECTORY/ when set
- git_path("hooks/") is  under core.hookspath when set
- git_path("refs/") etc (see path.c::common_list) is relative
  to $GIT_COMMON_DIR instead of $GIT_DIR

worktree_git_path, by comparison, is designed to resolve files in a
specific worktree's git dir.  Unfortunately, it shares code with
git_path and performs the same relocation.  The result is that paths
that are meant to be relative to the specified worktree's git dir end
up replaced by paths from environment variables within the current git
dir.

Luckily, no current callers pass such arguments.  The relocation was
noticed when testing the result of merging two patches under review,
one of which introduces a caller:

* The first patch made git prune check the index file in each
  worktree's git dir (using worktree_git_path(wt, "index")) for
  objects not to prune.  This would trigger the unwanted relocation
  when GIT_INDEX_FILE is set, causing objects reachable from the
  index to be pruned.

* The second patch simplified the relocation logic for index,
  info/grafts, objects, and hooks to happen unconditionally instead of
  based on whether environment or configuration variables are set.
  This caused the relocation to trigger even when GIT_INDEX_FILE is
  not set.

[jn: rewrote commit message; skipping all relocation instead of just
 GIT_INDEX_FILE]

Signed-off-by: Brandon Williams 
Signed-off-by: Jonathan Nieder 
---
Jonathan Nieder wrote:

> How about the following?  I found it a little easier to understand.
>
> -- >8 --
> From: Brandon Williams 
> Subject: worktree_git_path: do not let GIT_INDEX_FILE override path to index
[...]
> Change-Id: I2ba0a48a48b7e9a9c2e3ef97648cf53cb913bdd9

Gah, sorry about the stray Change-Id line.  While we're fixing that,
here's a version with a slightly clearer commit message.

 path.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/path.c b/path.c
index c1cb1cf62..4c3a27a8e 100644
--- a/path.c
+++ b/path.c
@@ -397,7 +397,8 @@ static void do_git_path(const struct worktree *wt, struct 
strbuf *buf,
strbuf_addch(buf, '/');
gitdir_len = buf->len;
strbuf_vaddf(buf, fmt, args);
-   adjust_git_path(buf, gitdir_len);
+   if (!wt)
+   adjust_git_path(buf, gitdir_len);
strbuf_cleanup_path(buf);
 }
 
-- 
2.13.1.611.g7e3b11ae1-goog



Re: [PATCH v3 11/20] path: construct correct path to a worktree's index

2017-06-20 Thread Jonathan Nieder
Hi again,

Brandon Williams wrote:

> When working with worktrees the git directory is split into two part,
> the per-worktree gitdir and a commondir which contains things which are
> shared among all worktrees (like the object store).  With this notion of
> having a split git directory, 557bd833b (git_path(): be aware of file
> relocation in $GIT_DIR) and c7b3a3d2f ($GIT_COMMON_DIR: a new
> environment variable) changed the way that 'git_path()' functioned so
> that paths would be adjusted if they referred to files or directories
> that are stored in the commondir or have been changed via an environment
> variable.
>
> One interesting problem with this is the index file as it is not shared
> across worktrees yet when asking for a path to a specific worktree's
> index it will be replaced with a path to the current worktree's index.
> In order to prevent this, teach 'adjuct_git_path' to replace the
> path to the index with the path recorded in a repository (which would be
> the current, active worktree) only when not asked to construct a path
> for a specific worktree.
>
> Signed-off-by: Brandon Williams 
> ---
>  path.c | 11 +--
>  1 file changed, 9 insertions(+), 2 deletions(-)

Digging more into it with your help, I ran into v2.5.0-rc0~143^2~34
and v2.11.0-rc0~15^2 (git-svn: "git worktree" awareness, 2016-10-14),
which uses this function:

 -  my $index = $ENV{GIT_INDEX_FILE} || "$ENV{GIT_DIR}/index";
 +  my $index = command_oneline(qw(rev-parse --git-path index));

That rules out my hope of making git_path stop being aware of the index
at all.

That also made it a little clearer to me what is going on with this
function.  I had previously misread !wt as meaning that git_dir is
not under worktrees/ (or in other words as !wt->id).  Instead, it
means that the caller does not want to use some other work tree in
preference to git_dir.

With that piece in place, the patch starts making more sense to me.
When !wt, repo->index_file is going to have the same value as
getenv("GIT_INDEX_FILE") ?: buf->buf already, since repo->index_file
refers to the current work tree git dir's index file.  When wt != NULL,
we don't want to use that file and have requested to grab the index
from a specific work tree git dir instead.

And patch 05/20 (environment: place key repository state in
the_repository) had no effect since no one calls worktree_git_path
with argument "index", nor with a user-specified path.

How about the following?  I found it a little easier to understand.

-- >8 --
From: Brandon Williams 
Subject: worktree_git_path: do not let GIT_INDEX_FILE override path to index

git_path (and variants like strbuf_git_path) is designed to behave
like a convenience function that produces a string $GIT_DIR/.
In the process, callers automatically get support for path relocation
variables like $GIT_OBJECT_DIRECTORY:

- git_path("index") is $GIT_INDEX_FILE when set
- git_path("info/grafts") is $GIT_GRAFTS_FILE when set
- git_path("objects/") is $GIT_OBJECT_DIRECTORY/ when set
- git_path("hooks/") is  under core.hookspath when set
- git_path("refs/") etc (see path.c::common_list) is relative
  to $GIT_COMMON_DIR instead of $GIT_DIR

worktree_git_path, by comparison, is designed to resolve files in a
specific worktree's git dir and should not perform such relocation.
Do so by skipping the relocation step in worktree_git_path.

Fortunately no current callers of worktree_git_path pass such
arguments.  Noticed due to an interaction between two patches under
review, one of which introduced such a caller:

* One made "git prune" check the index file in each worktree's git dir
  (using worktree_git_path(wt, "index")) for objects not to prune,
  triggering the unwanted relocation code by mistake and allowing
  objects reachable from worktree indices to be pruned if
  GIT_INDEX_FILE is set.

* The other simplified the relocation logic for index, info/grafts,
  objects, and hooks to happen unconditionally instead of based on
  whether environment or configuration variables are set, causing the
  relocation to trigger even when GIT_INDEX_FILE is not set.

[jn: rewrote commit message; skipped all relocations instead of just
 the index]

Change-Id: I2ba0a48a48b7e9a9c2e3ef97648cf53cb913bdd9
Signed-off-by: Brandon Williams 
Signed-off-by: Jonathan Nieder 
---
 path.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/path.c b/path.c
index c1cb1cf62..4c3a27a8e 100644
--- a/path.c
+++ b/path.c
@@ -397,7 +397,8 @@ static void do_git_path(const struct worktree *wt, struct 
strbuf *buf,
strbuf_addch(buf, '/');
gitdir_len = buf->len;
strbuf_vaddf(buf, fmt, args);
-   adjust_git_path(buf, gitdir_len);
+   if (!wt)
+   adjust_git_path(buf, gitdir_len);
strbuf_cleanup_path(buf);
 }
 
-- 
2.13.1.611.g7e3b11ae1



Re: [PATCH v3 11/20] path: construct correct path to a worktree's index

2017-06-20 Thread Brandon Williams
On 06/20, Jonathan Nieder wrote:
> Hi,
> 
> Brandon Williams wrote:
> 
> > When working with worktrees the git directory is split into two part,
> > the per-worktree gitdir and a commondir which contains things which are
> > shared among all worktrees (like the object store).  With this notion of
> > having a split git directory, 557bd833b (git_path(): be aware of file
> > relocation in $GIT_DIR) and c7b3a3d2f ($GIT_COMMON_DIR: a new
> > environment variable) changed the way that 'git_path()' functioned so
> > that paths would be adjusted if they referred to files or directories
> > that are stored in the commondir or have been changed via an environment
> > variable.
> >
> > One interesting problem with this is the index file as it is not shared
> > across worktrees yet when asking for a path to a specific worktree's
> > index it will be replaced with a path to the current worktree's index.
> > In order to prevent this, teach 'adjuct_git_path' to replace the
> > path to the index with the path recorded in a repository (which would be
> > the current, active worktree) only when not asked to construct a path
> > for a specific worktree.
> >
> > Signed-off-by: Brandon Williams 
> > ---
> >  path.c | 11 +--
> >  1 file changed, 9 insertions(+), 2 deletions(-)
> 
> Thanks --- this is subtle.  I don't think that what this patch does is
> right.  Commenting below.
> 
> > --- a/path.c
> > +++ b/path.c
> > @@ -372,13 +372,20 @@ void report_linked_checkout_garbage(void)
> >  }
> >  
> >  static void adjust_git_path(const struct repository *repo,
> > +   const struct worktree *wt,
> > struct strbuf *buf, int git_dir_len)
> >  {
> > const char *base = buf->buf + git_dir_len;
> > if (is_dir_file(base, "info", "grafts"))
> > strbuf_splice(buf, 0, buf->len,
> >   repo->graft_file, strlen(repo->graft_file));
> > -   else if (!strcmp(base, "index"))
> > +   /*
> > +* Only try to replace the path '$gitdir/index' with the index file
> > +* recorded in the repository when not constructing a path for a
> > +* worktree.  This way we can retrieve the correct path to a particular
> > +* worktree's index file.
> > +*/
> > +   else if (!wt && !strcmp(base, "index"))
> > strbuf_splice(buf, 0, buf->len,
> >   repo->index_file, strlen(repo->index_file));
> 
> Some context that may have been missing: GIT_INDEX_FILE is a low-level
> tool to allow script authors to specify an alternate index file to use
> when running commands like "git read-tree" or "git checkout-index".
> 
> The above would make it not take effect for git_path() callers when
> 'wt != NULL'.  As a result, if any caller reaches this code path, then
> scripts specifying GIT_INDEX_FILE would stop working when run from a
> worktree that borrows refs and objects from a separate repository.
> I'm pretty sure that such a subtle flip in behavior based on whether
> they are in "git worktree" created worktree or the main working tree
> is not what the end user would intend, so this looks like a step in
> the wrong direction.
> 
> Fortunately this code path doesn't actually get called.
> 
> In fact, rewriting git_path("index") in this function feels to me like
> a layering violation.  Shouldn't callers be using get_index_file() to
> express their intent more clearly?  That's what all current callers
> do.
> 
> IIUC this came up when a patch from nd/prune-in-worktree (e7a6a3b15,
> revision.c: --indexed-objects add objects from all worktrees), which
> is currently not in pu, introduced a caller that does call
> git_path("index").  The old behavior of replacing git_path("index")
> with $GIT_INDEX_FILE when the latter is set was mostly harmless
> because typically GIT_INDEX_FILE is not set, especially when people
> are running "git prune".  Patch 05/20 (environment: place key
> repository state in the_repository) made the substitution harmful: now
> we would use repo->index_file unconditionally instead of allowing the
> ordinary worktree-relative resolution as a fallback when
> GIT_INDEX_FILE is unset.
> 
> Possible next steps:
> 
>  1. I think we should make git_path less magical and discourage
> callers from relying on it to handle the GIT_INDEX_FILE envvar.
> We can do that by removing the !strcmp(base, "index") case
> completely.

>From our conversation off-line it seems like we agree that this is
probably the best course of action.  So I'll plan to drop the case
completely since it isn't even used currently.  I'll also go ahead and
drop the graft_file one as well since it doesn't make much sense to keep
it around when it is only queried for once.

> 
>  2. Optionally, it is possible to be more cautious by keeping the
> !strcmp(base, "index") case and making it call BUG() to force
> people not to use it.  This would help steer callers toward
> get_index_file().  But given that the only 

Why 10k remotes? Was: Re: [RFC/PATCH 0/5] remote: eliminate remote->{fetch,push}_refspec and lazy parsing of refspecs

2017-06-20 Thread SZEDER Gábor
On Sat, Jun 17, 2017 at 2:33 PM, Jeff King  wrote:
> On Sat, Jun 17, 2017 at 02:25:39PM +0200, SZEDER Gábor wrote:
>> Indeed.  Even in my repos with close to 10k remotes the amount of
>> memory wasted by the duplicated refspecs is not an problem, there are
>> more pressing issues there.
>
> This is sort of a side note, but I'm curious why you need 10k remotes.

For fun! :o)

I have a couple of repos containing all branches from all forks (and
forks of forks, recursively) of a few interesting repos on GitHub,
where each fork is a remote.

  (Why on earth would I want to do that?
  Sometimes it's good to see what others are up to.  Apparently a lot
  of people fork projects, push their changes, but then never upstream
  them.
  Such an all-forks-in-one repo allows me to run e.g. 'git log --all
  -p master.. relevant.c' and then search its output for changes in
  interesting functions (thankfully function names are included in
  hunk headers; alas line log doesn't work with --all).  Occasionally
  this unearths some treasures: other people's commits and branches
  scratching the same itch that I was about to scratch, or at least
  solving part of my problem and I can build on top of them.)

> We used to do this at GitHub as part of our fork storage (the shared
> repository had each fork as a remote). We fixed the quadratic issues
> like d0da003d5 (use a hashmap to make remotes faster, 2014-07-29). But
> even the linear work to read the config is noticeable (and hits you on
> every command, even ones that don't care about remotes).

Indeed, that's one of those "more pressing issues".
I worked it around by storing the remotes in a separate config file
which is not included from .git/config, because they're not needed for
local operations.  And when I occasionally do want to fetch, then I
run 'git -c include.path=/.git/config.forks fetch ...'.

> Now instead we
> pass the refspecs directly to fetch whenever move objects between the
> storage repos. They were the same for every remote anyway (and I'd guess
> that is true, too, of your 10k remotes).

I do have different fetch refspecs for every remote, i.e. the repo
'github.com/user/repo' has '+refs/heads/*:refs/forks/user/repo/*'.

Incidentally, this was what triggered
sg/clone-refspec-from-command-line-config back then, because 'git
clone' didn't grab 'refs/forks/*', and the means advertised in the
docs to do so didn't work.


Gábor


Re: [PATCH v3 11/20] path: construct correct path to a worktree's index

2017-06-20 Thread Jonathan Nieder
Hi,

Brandon Williams wrote:

> When working with worktrees the git directory is split into two part,
> the per-worktree gitdir and a commondir which contains things which are
> shared among all worktrees (like the object store).  With this notion of
> having a split git directory, 557bd833b (git_path(): be aware of file
> relocation in $GIT_DIR) and c7b3a3d2f ($GIT_COMMON_DIR: a new
> environment variable) changed the way that 'git_path()' functioned so
> that paths would be adjusted if they referred to files or directories
> that are stored in the commondir or have been changed via an environment
> variable.
>
> One interesting problem with this is the index file as it is not shared
> across worktrees yet when asking for a path to a specific worktree's
> index it will be replaced with a path to the current worktree's index.
> In order to prevent this, teach 'adjuct_git_path' to replace the
> path to the index with the path recorded in a repository (which would be
> the current, active worktree) only when not asked to construct a path
> for a specific worktree.
>
> Signed-off-by: Brandon Williams 
> ---
>  path.c | 11 +--
>  1 file changed, 9 insertions(+), 2 deletions(-)

Thanks --- this is subtle.  I don't think that what this patch does is
right.  Commenting below.

> --- a/path.c
> +++ b/path.c
> @@ -372,13 +372,20 @@ void report_linked_checkout_garbage(void)
>  }
>  
>  static void adjust_git_path(const struct repository *repo,
> + const struct worktree *wt,
>   struct strbuf *buf, int git_dir_len)
>  {
>   const char *base = buf->buf + git_dir_len;
>   if (is_dir_file(base, "info", "grafts"))
>   strbuf_splice(buf, 0, buf->len,
> repo->graft_file, strlen(repo->graft_file));
> - else if (!strcmp(base, "index"))
> + /*
> +  * Only try to replace the path '$gitdir/index' with the index file
> +  * recorded in the repository when not constructing a path for a
> +  * worktree.  This way we can retrieve the correct path to a particular
> +  * worktree's index file.
> +  */
> + else if (!wt && !strcmp(base, "index"))
>   strbuf_splice(buf, 0, buf->len,
> repo->index_file, strlen(repo->index_file));

Some context that may have been missing: GIT_INDEX_FILE is a low-level
tool to allow script authors to specify an alternate index file to use
when running commands like "git read-tree" or "git checkout-index".

The above would make it not take effect for git_path() callers when
'wt != NULL'.  As a result, if any caller reaches this code path, then
scripts specifying GIT_INDEX_FILE would stop working when run from a
worktree that borrows refs and objects from a separate repository.
I'm pretty sure that such a subtle flip in behavior based on whether
they are in "git worktree" created worktree or the main working tree
is not what the end user would intend, so this looks like a step in
the wrong direction.

Fortunately this code path doesn't actually get called.

In fact, rewriting git_path("index") in this function feels to me like
a layering violation.  Shouldn't callers be using get_index_file() to
express their intent more clearly?  That's what all current callers
do.

IIUC this came up when a patch from nd/prune-in-worktree (e7a6a3b15,
revision.c: --indexed-objects add objects from all worktrees), which
is currently not in pu, introduced a caller that does call
git_path("index").  The old behavior of replacing git_path("index")
with $GIT_INDEX_FILE when the latter is set was mostly harmless
because typically GIT_INDEX_FILE is not set, especially when people
are running "git prune".  Patch 05/20 (environment: place key
repository state in the_repository) made the substitution harmful: now
we would use repo->index_file unconditionally instead of allowing the
ordinary worktree-relative resolution as a fallback when
GIT_INDEX_FILE is unset.

Possible next steps:

 1. I think we should make git_path less magical and discourage
callers from relying on it to handle the GIT_INDEX_FILE envvar.
We can do that by removing the !strcmp(base, "index") case
completely.

 2. Optionally, it is possible to be more cautious by keeping the
!strcmp(base, "index") case and making it call BUG() to force
people not to use it.  This would help steer callers toward
get_index_file().  But given that the only caller did not actually
want GIT_INDEX_FILE substitution, I don't think that that is
necessary or useful.

 3. A docstring for git_path should explain the substitutions it
currently makes and more straightforward alternatives that callers
can use.

 4. Specifying GIT_INDEX_FILE when running "git prune" is a
meaningless combination.  It would be nice for "git prune" to
error out to save the user from confusion.

 5. There are likely other commands that don't make sense with
GIT_INDEX_FILE. 

[PATCHv2] submodules: overhaul documentation

2017-06-20 Thread Stefan Beller
This patch aims to detangle (a) the usage of `git-submodule`
from (b) the concept of submodules and (c) how the actual
implementation looks like, such as where they are configured
and (d) what the best practices are.

To do so, move the conceptual parts of the 'git-submodule'
man page to a new man page gitsubmodules(7). This new page
is just like gitmodules(5), gitattributes(5), gitcredentials(7),
gitnamespaces(7), gittutorial(7), which introduce a concept
rather than explaining a specific command.

Signed-off-by: Stefan Beller 
---

I have considered most of the feedback, and stopped marking it RFC,
but I'd like to propose this as a serious patch.

Thanks,
Stefan

 Documentation/Makefile  |   1 +
 Documentation/git-rm.txt|   4 +-
 Documentation/git-submodule.txt |  44 +++---
 Documentation/gitsubmodules.txt | 189 
 4 files changed, 202 insertions(+), 36 deletions(-)
 create mode 100644 Documentation/gitsubmodules.txt

diff --git a/Documentation/Makefile b/Documentation/Makefile
index b5be2e2d3f..2415e0d657 100644
--- a/Documentation/Makefile
+++ b/Documentation/Makefile
@@ -31,6 +31,7 @@ MAN7_TXT += giteveryday.txt
 MAN7_TXT += gitglossary.txt
 MAN7_TXT += gitnamespaces.txt
 MAN7_TXT += gitrevisions.txt
+MAN7_TXT += gitsubmodules.txt
 MAN7_TXT += gittutorial-2.txt
 MAN7_TXT += gittutorial.txt
 MAN7_TXT += gitworkflows.txt
diff --git a/Documentation/git-rm.txt b/Documentation/git-rm.txt
index f1efc116eb..db444693dd 100644
--- a/Documentation/git-rm.txt
+++ b/Documentation/git-rm.txt
@@ -152,8 +152,8 @@ Ignored files are deemed expendable and won't stop a 
submodule's work
 tree from being removed.
 
 If you only want to remove the local checkout of a submodule from your
-work tree without committing the removal,
-use linkgit:git-submodule[1] `deinit` instead.
+work tree without committing the removal, use linkgit:git-submodule[1] `deinit`
+instead. Also see linkgit:gitsubmodules[7] for details on submodule removal.
 
 EXAMPLES
 
diff --git a/Documentation/git-submodule.txt b/Documentation/git-submodule.txt
index 74bc6200d5..9ffd129bbc 100644
--- a/Documentation/git-submodule.txt
+++ b/Documentation/git-submodule.txt
@@ -24,37 +24,7 @@ DESCRIPTION
 ---
 Inspects, updates and manages submodules.
 
-A submodule allows you to keep another Git repository in a subdirectory
-of your repository. The other repository has its own history, which does not
-interfere with the history of the current repository. This can be used to
-have external dependencies such as third party libraries for example.
-
-When cloning or pulling a repository containing submodules however,
-these will not be checked out by default; the 'init' and 'update'
-subcommands will maintain submodules checked out and at
-appropriate revision in your working tree.
-
-Submodules are composed from a so-called `gitlink` tree entry
-in the main repository that refers to a particular commit object
-within the inner repository that is completely separate.
-A record in the `.gitmodules` (see linkgit:gitmodules[5]) file at the
-root of the source tree assigns a logical name to the submodule and
-describes the default URL the submodule shall be cloned from.
-The logical name can be used for overriding this URL within your
-local repository configuration (see 'submodule init').
-
-Submodules are not to be confused with remotes, which are other
-repositories of the same project; submodules are meant for
-different projects you would like to make part of your source tree,
-while the history of the two projects still stays completely
-independent and you cannot modify the contents of the submodule
-from within the main project.
-If you want to merge the project histories and want to treat the
-aggregated whole as a single project from then on, you may want to
-add a remote for the other project and use the 'subtree' merge strategy,
-instead of treating the other project as a submodule. Directories
-that come from both projects can be cloned and checked out as a whole
-if you choose to go that route.
+For more information about submodules, see linkgit:gitsubmodules[7].
 
 COMMANDS
 
@@ -149,15 +119,17 @@ deinit [-f|--force] (--all|[--] ...)::
tree. Further calls to `git submodule update`, `git submodule foreach`
and `git submodule sync` will skip any unregistered submodules until
they are initialized again, so use this command if you don't want to
-   have a local checkout of the submodule in your working tree anymore. If
-   you really want to remove a submodule from the repository and commit
-   that use linkgit:git-rm[1] instead.
+   have a local checkout of the submodule in your working tree anymore.
 +
 When the command is run without pathspec, it errors out,
 instead of deinit-ing everything, to prevent mistakes.
 +
 If `--force` is specified, the submodule's working tree will
 be removed even if it contains local 

Re: [PATCH v3 10/20] path: convert do_git_path to take a 'struct repository'

2017-06-20 Thread Jonathan Tan
On Tue, 20 Jun 2017 12:19:41 -0700
Brandon Williams  wrote:

> +static void do_git_path(const struct repository *repo,
> + const struct worktree *wt, struct strbuf *buf,
>   const char *fmt, va_list args)
>  {
>   int gitdir_len;
> - strbuf_addstr(buf, get_worktree_git_dir(wt));

With this change, the get_worktree_git_dir() function no longer seems to
be used from outside - could it be marked static?

> + strbuf_worktree_gitdir(buf, repo, wt);
>   if (buf->len && !is_dir_sep(buf->buf[buf->len - 1]))
>   strbuf_addch(buf, '/');
>   gitdir_len = buf->len;
>   strbuf_vaddf(buf, fmt, args);
> - adjust_git_path(buf, gitdir_len);
> + adjust_git_path(repo, buf, gitdir_len);
>   strbuf_cleanup_path(buf);
>  }


Re: [PATCH v3 05/20] environment: place key repository state in the_repository

2017-06-20 Thread Jonathan Tan
On Tue, 20 Jun 2017 12:19:36 -0700
Brandon Williams  wrote:

> Migrate 'git_dir', 'git_common_dir', 'git_object_dir', 'git_index_file',
> 'git_graft_file', and 'namespace' to be stored in 'the_repository'.
> 
> Signed-off-by: Brandon Williams 
> ---
>  cache.h   |  1 -
>  environment.c | 58 +-
>  path.c| 11 ++-
>  setup.c   | 17 +++--
>  4 files changed, 34 insertions(+), 53 deletions(-)
> 
> diff --git a/cache.h b/cache.h
> index 7c81749a9..cd64cbc81 100644
> --- a/cache.h
> +++ b/cache.h
> @@ -771,7 +771,6 @@ extern int core_apply_sparse_checkout;
>  extern int precomposed_unicode;
>  extern int protect_hfs;
>  extern int protect_ntfs;
> -extern int git_db_env, git_index_env, git_graft_env, git_common_dir_env;

In the commit message, it is probably worth mentioning commit 557bd83
which added these fields to attempt rewriting a path in do_git_path()
only if the appropriate _env flag is set, and that this patch removes
this optimization.


Re: [PATCH v3 04/20] repository: introduce the repository object

2017-06-20 Thread Brandon Williams
On 06/20, Jonathan Tan wrote:
> On Tue, 20 Jun 2017 12:19:35 -0700
> Brandon Williams  wrote:
> 
> > Introduce the repository object 'struct repository' which can be used to
> > hold all state pertaining to a git repository.
> > 
> > Some of the benefits of object-ifying a repository are:
> > 
> >   1. Make the code base more readable and easier to reason about.
> > 
> >   2. Allow for working on multiple repositories, specifically
> >  submodules, within the same process.  Currently the process for
> >  working on a submodule involves setting up an argv_array of options
> >  for a particular command and then launching a child process to
> >  execute the command in the context of the submodule.  This is
> >  clunky and can require lots of little hacks in order to ensure
> >  correctness.  Ideally it would be nice to simply pass a repository
> >  and an options struct to a command.
> > 
> >   3. Eliminating reliance on global state will make it easier to
> >  enable the use of threading to improve performance.
> 
> These would indeed be nice to have.
> 
> > +/* called after setting gitdir */
> > +static void repo_setup_env(struct repository *repo)
> > +{
> > +   struct strbuf sb = STRBUF_INIT;
> > +
> > +   if (!repo->gitdir)
> > +   BUG("gitdir wasn't set before setting up the environment");
> > +
> > +   repo->different_commondir = find_common_dir(, repo->gitdir,
> > +   !repo->ignore_env);
> > +   repo->commondir = strbuf_detach(, NULL);
> > +   repo->objectdir = git_path_from_env(DB_ENVIRONMENT, repo->commondir,
> > +   "objects", !repo->ignore_env);
> > +   repo->graft_file = git_path_from_env(GRAFT_ENVIRONMENT, repo->commondir,
> > +"info/grafts", !repo->ignore_env);
> > +   repo->index_file = git_path_from_env(INDEX_ENVIRONMENT, repo->gitdir,
> > +"index", !repo->ignore_env);
> > +}
> 
> It seems that this function is only used once in repo_set_gitdir() -
> could this be inlined there instead? Then you wouldn't need the comment
> and the !repo->gitdir check.

I'd like to keep them separate as it makes things a little clearer in my
opinion, smaller functions and the like.

> 
> > +static int verify_repo_format(struct repository_format *format,
> > + const char *commondir)
> > +{
> > +   int ret = 0;
> > +   struct strbuf sb = STRBUF_INIT;
> > +
> > +   strbuf_addf(, "%s/config", commondir);
> > +   read_repository_format(format, sb.buf);
> > +   strbuf_reset();
> > +
> > +   if (verify_repository_format(format, ) < 0) {
> > +   warning("%s", sb.buf);
> > +   ret = -1;
> > +   }
> > +
> > +   strbuf_release();
> > +   return ret;
> > +}
> 
> This function is confusingly named - firstly, there is already a
> verify_repository_format(), and secondly, this function both reads and
> verifies it.

This one is static to the file, and i don't think its named confusingly
as it does just what it says it does. I'm trying to limit how much other
code I change so I had to make this function.

> 
> > +void repo_clear(struct repository *repo)
> > +{
> > +   free(repo->gitdir);
> > +   repo->gitdir = NULL;
> > +   free(repo->commondir);
> > +   repo->commondir = NULL;
> > +   free(repo->objectdir);
> > +   repo->objectdir = NULL;
> > +   free(repo->graft_file);
> > +   repo->graft_file = NULL;
> > +   free(repo->index_file);
> > +   repo->index_file = NULL;
> > +   free(repo->worktree);
> > +   repo->worktree = NULL;
> > +
> > +   memset(repo, 0, sizeof(*repo));
> > +}
> 
> If you're going to memset, you probably don't need to set everything to
> NULL.
> 
> > +   /* Configurations */
> > +   /*
> > +* Bit used during initialization to indicate if repository state (like
> > +* the location of the 'objectdir') should be read from the
> > +* environment.  By default this bit will be set at the begining of
> > +* 'repo_init()' so that all repositories will ignore the environment.
> > +* The exception to this is 'the_repository', which doesn't go through
> > +* the normal 'repo_init()' process.
> > +*/
> > +   unsigned ignore_env:1;
> 
> If this is only used during initialization, could this be passed as a
> separate parameter internally instead of having it here?

It would feel a little wonky to be a parameter as 'the_repository' is
initialized differently than a repository would normally be.  Theres
been a lot of cleanup to the setup logic but it would need to be cleaned
up even more to have this be a param.

-- 
Brandon Williams


Re: [RFC/PATCH] submodules: overhaul documentation

2017-06-20 Thread Stefan Beller
On Mon, Jun 19, 2017 at 11:10 AM, Brandon Williams  wrote:

> I would probably change the first sentence to:
>
>   A submodule is another Git repository tracked at an arbitrary place
>   inside the working tree.

The tracking doesn't happen at an arbitrary place, but in
the gitlink/.gitmodules file. The location of the submodule is
at an arbitrary place within the working tree.

In a resend, I'll reword it completely (again) to focus more on the structure.


Re: [PATCH v3 04/20] repository: introduce the repository object

2017-06-20 Thread Jonathan Tan
On Tue, 20 Jun 2017 12:19:35 -0700
Brandon Williams  wrote:

> Introduce the repository object 'struct repository' which can be used to
> hold all state pertaining to a git repository.
> 
> Some of the benefits of object-ifying a repository are:
> 
>   1. Make the code base more readable and easier to reason about.
> 
>   2. Allow for working on multiple repositories, specifically
>  submodules, within the same process.  Currently the process for
>  working on a submodule involves setting up an argv_array of options
>  for a particular command and then launching a child process to
>  execute the command in the context of the submodule.  This is
>  clunky and can require lots of little hacks in order to ensure
>  correctness.  Ideally it would be nice to simply pass a repository
>  and an options struct to a command.
> 
>   3. Eliminating reliance on global state will make it easier to
>  enable the use of threading to improve performance.

These would indeed be nice to have.

> +/* called after setting gitdir */
> +static void repo_setup_env(struct repository *repo)
> +{
> + struct strbuf sb = STRBUF_INIT;
> +
> + if (!repo->gitdir)
> + BUG("gitdir wasn't set before setting up the environment");
> +
> + repo->different_commondir = find_common_dir(, repo->gitdir,
> + !repo->ignore_env);
> + repo->commondir = strbuf_detach(, NULL);
> + repo->objectdir = git_path_from_env(DB_ENVIRONMENT, repo->commondir,
> + "objects", !repo->ignore_env);
> + repo->graft_file = git_path_from_env(GRAFT_ENVIRONMENT, repo->commondir,
> +  "info/grafts", !repo->ignore_env);
> + repo->index_file = git_path_from_env(INDEX_ENVIRONMENT, repo->gitdir,
> +  "index", !repo->ignore_env);
> +}

It seems that this function is only used once in repo_set_gitdir() -
could this be inlined there instead? Then you wouldn't need the comment
and the !repo->gitdir check.

> +static int verify_repo_format(struct repository_format *format,
> +   const char *commondir)
> +{
> + int ret = 0;
> + struct strbuf sb = STRBUF_INIT;
> +
> + strbuf_addf(, "%s/config", commondir);
> + read_repository_format(format, sb.buf);
> + strbuf_reset();
> +
> + if (verify_repository_format(format, ) < 0) {
> + warning("%s", sb.buf);
> + ret = -1;
> + }
> +
> + strbuf_release();
> + return ret;
> +}

This function is confusingly named - firstly, there is already a
verify_repository_format(), and secondly, this function both reads and
verifies it.

> +void repo_clear(struct repository *repo)
> +{
> + free(repo->gitdir);
> + repo->gitdir = NULL;
> + free(repo->commondir);
> + repo->commondir = NULL;
> + free(repo->objectdir);
> + repo->objectdir = NULL;
> + free(repo->graft_file);
> + repo->graft_file = NULL;
> + free(repo->index_file);
> + repo->index_file = NULL;
> + free(repo->worktree);
> + repo->worktree = NULL;
> +
> + memset(repo, 0, sizeof(*repo));
> +}

If you're going to memset, you probably don't need to set everything to
NULL.

> + /* Configurations */
> + /*
> +  * Bit used during initialization to indicate if repository state (like
> +  * the location of the 'objectdir') should be read from the
> +  * environment.  By default this bit will be set at the begining of
> +  * 'repo_init()' so that all repositories will ignore the environment.
> +  * The exception to this is 'the_repository', which doesn't go through
> +  * the normal 'repo_init()' process.
> +  */
> + unsigned ignore_env:1;

If this is only used during initialization, could this be passed as a
separate parameter internally instead of having it here?


Re: [PATCH v3 11/20] path: construct correct path to a worktree's index

2017-06-20 Thread Stefan Beller
On Tue, Jun 20, 2017 at 12:19 PM, Brandon Williams  wrote:
> When working with worktrees the git directory is split into two part,
> the per-worktree gitdir and a commondir which contains things which are
> shared among all worktrees (like the object store).  With this notion of
> having a split git directory, 557bd833b (git_path(): be aware of file
> relocation in $GIT_DIR) and c7b3a3d2f ($GIT_COMMON_DIR: a new
> environment variable) changed the way that 'git_path()' functioned so
> that paths would be adjusted if they referred to files or directories
> that are stored in the commondir or have been changed via an environment
> variable.
>
> One interesting problem with this is the index file as it is not shared
> across worktrees yet when asking for a path to a specific worktree's
> index it will be replaced with a path to the current worktree's index.
> In order to prevent this, teach 'adjuct_git_path' to replace the

adjust_git_path

> path to the index with the path recorded in a repository (which would be
> the current, active worktree) only when not asked to construct a path
> for a specific worktree.
>
> Signed-off-by: Brandon Williams 
> ---
>  path.c | 11 +--
>  1 file changed, 9 insertions(+), 2 deletions(-)
>
> diff --git a/path.c b/path.c
> index 76a872297..c6832a30e 100644
> --- a/path.c
> +++ b/path.c
> @@ -372,13 +372,20 @@ void report_linked_checkout_garbage(void)
>  }
>
>  static void adjust_git_path(const struct repository *repo,
> +   const struct worktree *wt,
> struct strbuf *buf, int git_dir_len)
>  {
> const char *base = buf->buf + git_dir_len;
> if (is_dir_file(base, "info", "grafts"))
> strbuf_splice(buf, 0, buf->len,
>   repo->graft_file, strlen(repo->graft_file));
> -   else if (!strcmp(base, "index"))
> +   /*
> +* Only try to replace the path '$gitdir/index' with the index file
> +* recorded in the repository when not constructing a path for a
> +* worktree.  This way we can retrieve the correct path to a 
> particular
> +* worktree's index file.
> +*/
> +   else if (!wt && !strcmp(base, "index"))
> strbuf_splice(buf, 0, buf->len,
>   repo->index_file, strlen(repo->index_file));
> else if (dir_prefix(base, "objects"))
> @@ -411,7 +418,7 @@ static void do_git_path(const struct repository *repo,
> strbuf_addch(buf, '/');
> gitdir_len = buf->len;
> strbuf_vaddf(buf, fmt, args);
> -   adjust_git_path(repo, buf, gitdir_len);
> +   adjust_git_path(repo, wt, buf, gitdir_len);
> strbuf_cleanup_path(buf);
>  }
>
> --
> 2.13.1.611.g7e3b11ae1-goog
>


Re: [PATCH 22/26] diff.c: color moved lines differently

2017-06-20 Thread Stefan Beller
On Tue, Jun 20, 2017 at 1:13 PM, Jonathan Tan  wrote:
> I just glanced through this file, because it seems similar to the
> versions I have previously reviewed.

Yes, this has not changed much.
The thing that took so long were patches 1-20 to do.
In these later patches (22) the UX was changed slightly.
The mode is zebra instead of alternating for example

> I'll skip patches 23 onwards in this round of review because (i) I would
> be happy if just patches 1-22 were included in the tree

patch 22 just introduced the zebra mode, which contains all
information necessary, the later "dimming" is purely
UI excitement, no further information added.

In former series, I had documentation at each
patch that added a new mode, now I rolled all of
documentation in patch 25. I will add the basic docs
to that patch in a reroll.

> and (ii) those
> patches might end up changing anyway because of review comments in the
> prior patches.

plain, dimming, docs, and this RFC for machine parsable output,
ok with me.

So I'll focus on the first 22 patches.

>
> On Mon, 19 Jun 2017 19:48:12 -0700
> Stefan Beller  wrote:
>
>> +/* Find blocks of moved code, delegate actual coloring decision to helper */
>> +static void mark_color_as_moved(struct diff_options *o,
>> + struct hashmap *add_lines,
>> + struct hashmap *del_lines)
>> +{
>
> [snip]
>
>> + if (flipped_block)
>> + l->flags |= DIFF_SYMBOL_MOVED_LINE_ZEBRA;
>
> This should probably be DIFF_SYMBOL_MOVED_LINE_ALT. "Zebra" refers to
> both the stripes, not just the alternate stripe.


ok


Re: [PATCH 22/26] diff.c: color moved lines differently

2017-06-20 Thread Jonathan Tan
I just glanced through this file, because it seems similar to the
versions I have previously reviewed.

I'll skip patches 23 onwards in this round of review because (i) I would
be happy if just patches 1-22 were included in the tree and (ii) those
patches might end up changing anyway because of review comments in the
prior patches.

On Mon, 19 Jun 2017 19:48:12 -0700
Stefan Beller  wrote:

> +/* Find blocks of moved code, delegate actual coloring decision to helper */
> +static void mark_color_as_moved(struct diff_options *o,
> + struct hashmap *add_lines,
> + struct hashmap *del_lines)
> +{

[snip]

> + if (flipped_block)
> + l->flags |= DIFF_SYMBOL_MOVED_LINE_ZEBRA;

This should probably be DIFF_SYMBOL_MOVED_LINE_ALT. "Zebra" refers to
both the stripes, not just the alternate stripe.


Re: [PATCH 15/26] submodule.c: migrate diff output to use emit_diff_symbol

2017-06-20 Thread Jonathan Tan
On Mon, 19 Jun 2017 19:48:05 -0700
Stefan Beller  wrote:

> As the submodule process is no longer attached to the same stdout as
> the superprojects process, we need to pass coloring explicitly.

I found this confusing - what difference does the stdout make? If they
were the same stdout, one process could set a color for the other
process to follow, but it wouldn't be able to affect color changes
inside the output, right?

> Remove the colors from the function signatures, as all the coloring
> decisions will be made either inside the child process or the final
> emit_diff_symbol.

This seems to be of the same pattern as the others. I would write this
as:

Create diff_emit_submodule_.* functions that make its own coloring
decisions and migrate submodule.c to use them. This centralizes
diff output, including coloring information, in one file.

or something like that.


Re: [PATCH 11/26] diff.c: emit_diff_symbol learns DIFF_SYMBOL_FILEPAIR

2017-06-20 Thread Jonathan Tan
On Mon, 19 Jun 2017 19:48:01 -0700
Stefan Beller  wrote:

> @@ -676,6 +677,14 @@ static void emit_diff_symbol(struct diff_options *o, 
> enum diff_symbol s,
>   }
>   emit_line(o, context, reset, line, len);
>   break;
> + case DIFF_SYMBOL_FILEPAIR:
> + meta = diff_get_color_opt(o, DIFF_METAINFO);
> + reset = diff_get_color_opt(o, DIFF_RESET);
> + fprintf(o->file, "%s%s%s%s%s%s\n", diff_line_prefix(o), meta,
> + flags ? "+++ " : "--- ",

I think that a constant should be defined for this flag, just like for
content lines. If not it is not immediately obvious that a flag should
be set for the +++ lines.

> + line, reset,
> + strchr(line, ' ') ? "\t" : "");
> + break;
>   default:
>   die("BUG: unknown diff symbol");
>   }


Re: [PATCH v3 04/20] repository: introduce the repository object

2017-06-20 Thread Stefan Beller
On Tue, Jun 20, 2017 at 12:19 PM, Brandon Williams  wrote:
> Introduce the repository object 'struct repository' which can be used to
> hold all state pertaining to a git repository.
>
> Some of the benefits of object-ifying a repository are:
>
>   1. Make the code base more readable and easier to reason about.
>
>   2. Allow for working on multiple repositories, specifically
>  submodules, within the same process.  Currently the process for
>  working on a submodule involves setting up an argv_array of options
>  for a particular command and then launching a child process to
>  execute the command in the context of the submodule.  This is
>  clunky and can require lots of little hacks in order to ensure
>  correctness.  Ideally it would be nice to simply pass a repository
>  and an options struct to a command.
>
>   3. Eliminating reliance on global state will make it easier to
>  enable the use of threading to improve performance.
>
> Signed-off-by: Brandon Williams 

> +/*
> + * Initialize 'repo' based on the provided 'gitdir'.
> + * Return 0 upon success and a non-zero value upon failure.

Non zero or negative? The point of the question is if we want to
ask users of this function to be cautious early on. So in the future,
do we want to rather see

if (repo_init(...))
die("you're doomed");

or rather

int x = repo_init(...);
if (x < 0)
die("you're doomed");
else if (x == 1)
warning("you're not doomed, but close."\
 "Not distimming the gostaks.")
else
; /* we're fine, carry on with life */

I guess we can still refactor later, it's just one
thing to thing about when introducing an API
that will likely be used a lot down the road.

> +struct repository {
> +   /* Environment */
> +   /* Path to the git directory */
> +   char *gitdir;
> +
> +   /* Path to the common git directory */
> +   char *commondir;
> +
> +   /* Path to the repository's object store */
> +   char *objectdir;
> +
> +   /* Path to the repository's graft file */
> +   char *graft_file;
> +
> +   /* Path to the current worktree's index file */
> +   char *index_file;
> +
> +   /* Path to the working directory */
> +   char *worktree;
> +
> +   /* Configurations */
> +   /*
> +* Bit used during initialization to indicate if repository state 
> (like
> +* the location of the 'objectdir') should be read from the
> +* environment.  By default this bit will be set at the begining of
> +* 'repo_init()' so that all repositories will ignore the environment.
> +* The exception to this is 'the_repository', which doesn't go through
> +* the normal 'repo_init()' process.
> +*/
> +   unsigned ignore_env:1;
> +
> +   /* Indicate if a repository has a different 'commondir' from 'gitdir' 
> */
> +   unsigned different_commondir:1;
> +};

I applaud the effort towards documenting what each variable is
supposed to contain. But some of them read like

/* increments i by one */
i++;

which is considered bad comment style (it doesn't add
more information, it just wastes a line), so specifically for
all the "Path to X" comments:
* Are they absolute path, or relative path?
  If relative, then relative to what?
* Can they be NULL? When?

(* Why do we need so many path?
Could one of them be constructed using
another and then hardcoding a string relative to it?
This question may rather be answered in the commit
message)


Re: [PATCH v3 00/20] repository object

2017-06-20 Thread Stefan Beller
On Tue, Jun 20, 2017 at 12:19 PM, Brandon Williams  wrote:
> I decided to split up the original series into three parts in order to make
> review a little bit easier.  As such this series is dependent on on
> 'bw/config-h' and 'bw/ls-files-sans-the-index' which should be moving into
> 'next' soon.  Due to this I was able to greatly shrink this series in terms of
> number of patches so hopefully it is a little easier to review.
>
> As before you can find this series at:
> https://github.com/bmwill/git/tree/repository-object

If you plan on building 'foreach' on top of Brandons series,
maybe start off the latest version here.

>
> Chagnes in v3:
...
>
>  * Added an additional initialization function to allow initializing a 'struct
>repository' as a submodule of another 'struct repository'.

This one seems relevant for 'foreach'


[PATCH v3 14/20] config: read config from a repository object

2017-06-20 Thread Brandon Williams
Teach the config machinery to read config information from a repository
object.  This involves storing a 'struct config_set' inside the
repository object and adding a number of functions (repo_config*) to be
able to query a repository's config.

The current config API enables lazy-loading of the config.  This means
that when 'git_config_get_int()' is called, if the_config_set hasn't
been populated yet, then it will be populated and properly initialized by
reading the necessary config files (system wide .gitconfig, user's home
.gitconfig, and the repository's config).  To maintain this paradigm,
the new API to read from a repository object's config will also perform
this lazy-initialization.

Since both APIs (git_config_get* and repo_config_get*) have the same
semantics we can migrate the default config to be stored within
'the_repository' and just have the 'git_config_get*' family of functions
redirect to the 'repo_config_get*' functions.

Signed-off-by: Brandon Williams 
---
 config.c | 216 +++
 config.h |  24 +++
 repository.c |   7 ++
 repository.h |  10 +++
 4 files changed, 183 insertions(+), 74 deletions(-)

diff --git a/config.c b/config.c
index 6f0f8b30f..be1c640a4 100644
--- a/config.c
+++ b/config.c
@@ -7,6 +7,7 @@
  */
 #include "cache.h"
 #include "config.h"
+#include "repository.h"
 #include "lockfile.h"
 #include "exec_cmd.h"
 #include "strbuf.h"
@@ -72,13 +73,6 @@ static int core_compression_seen;
 static int pack_compression_seen;
 static int zlib_compression_seen;
 
-/*
- * Default config_set that contains key-value pairs from the usual set of 
config
- * config files (i.e repo specific .git/config, user wide ~/.gitconfig, XDG
- * config file and the global /etc/gitconfig)
- */
-static struct config_set the_config_set;
-
 static int config_file_fgetc(struct config_source *conf)
 {
return getc_unlocked(conf->u.file);
@@ -1605,31 +1599,6 @@ int config_with_options(config_fn_t fn, void *data,
return do_git_config_sequence(opts, fn, data);
 }
 
-static void git_config_raw(config_fn_t fn, void *data)
-{
-   struct config_options opts = {0};
-
-   opts.respect_includes = 1;
-   if (have_git_dir()) {
-   opts.commondir = get_git_common_dir();
-   opts.git_dir = get_git_dir();
-   }
-
-   if (config_with_options(fn, data, NULL, ) < 0)
-   /*
-* config_with_options() normally returns only
-* zero, as most errors are fatal, and
-* non-fatal potential errors are guarded by "if"
-* statements that are entered only when no error is
-* possible.
-*
-* If we ever encounter a non-fatal error, it means
-* something went really wrong and we should stop
-* immediately.
-*/
-   die(_("unknown error occurred while reading the configuration 
files"));
-}
-
 static void configset_iter(struct config_set *cs, config_fn_t fn, void *data)
 {
int i, value_index;
@@ -1683,14 +1652,6 @@ void read_early_config(config_fn_t cb, void *data)
strbuf_release();
 }
 
-static void git_config_check_init(void);
-
-void git_config(config_fn_t fn, void *data)
-{
-   git_config_check_init();
-   configset_iter(_config_set, fn, data);
-}
-
 static struct config_set_element *configset_find_element(struct config_set 
*cs, const char *key)
 {
struct config_set_element k;
@@ -1900,87 +1861,194 @@ int git_configset_get_pathname(struct config_set *cs, 
const char *key, const cha
return 1;
 }
 
-static void git_config_check_init(void)
+/* Functions use to read configuration from a repository */
+static void repo_read_config(struct repository *repo)
 {
-   if (the_config_set.hash_initialized)
+   struct config_options opts;
+
+   opts.respect_includes = 1;
+   opts.commondir = repo->commondir;
+   opts.git_dir = repo->gitdir;
+
+   if (!repo->config)
+   repo->config = xcalloc(1, sizeof(struct config_set));
+   else
+   git_configset_clear(repo->config);
+
+   git_configset_init(repo->config);
+
+   if (config_with_options(config_set_callback, repo->config, NULL, ) 
< 0)
+   /*
+* config_with_options() normally returns only
+* zero, as most errors are fatal, and
+* non-fatal potential errors are guarded by "if"
+* statements that are entered only when no error is
+* possible.
+*
+* If we ever encounter a non-fatal error, it means
+* something went really wrong and we should stop
+* immediately.
+*/
+   die(_("unknown error occurred while reading the configuration 
files"));
+}
+
+static void git_config_check_init(struct 

[PATCH v3 16/20] submodule-config: store the_submodule_cache in the_repository

2017-06-20 Thread Brandon Williams
Refactor how 'the_submodule_cache' is handled so that it can be stored
inside of a repository object.  Also migrate 'the_submodule_cache' to be
stored in 'the_repository'.

Signed-off-by: Brandon Williams 
---
 repository.c   |  6 +
 repository.h   |  4 
 submodule-config.c | 70 --
 submodule-config.h | 10 
 4 files changed, 72 insertions(+), 18 deletions(-)

diff --git a/repository.c b/repository.c
index 883e6e9e9..317041a4a 100644
--- a/repository.c
+++ b/repository.c
@@ -1,6 +1,7 @@
 #include "cache.h"
 #include "repository.h"
 #include "config.h"
+#include "submodule-config.h"
 
 /* The main repository */
 static struct repository the_repo;
@@ -181,6 +182,11 @@ void repo_clear(struct repository *repo)
repo->index = NULL;
}
 
+   if (repo->submodule_cache) {
+   submodule_cache_free(repo->submodule_cache);
+   repo->submodule_cache = NULL;
+   }
+
memset(repo, 0, sizeof(*repo));
 }
 
diff --git a/repository.h b/repository.h
index 1fa65c42f..4bc70ebc5 100644
--- a/repository.h
+++ b/repository.h
@@ -3,6 +3,7 @@
 
 struct config_set;
 struct index_state;
+struct submodule_cache;
 
 struct repository {
/* Environment */
@@ -32,6 +33,9 @@ struct repository {
 */
struct config_set *config;
 
+   /* Repository's submodule config as defined by '.gitmodules' */
+   struct submodule_cache *submodule_cache;
+
/* Repository's in-memory index */
struct index_state *index;
 
diff --git a/submodule-config.c b/submodule-config.c
index d8f8d5ea3..37cfcceb9 100644
--- a/submodule-config.c
+++ b/submodule-config.c
@@ -1,4 +1,5 @@
 #include "cache.h"
+#include "repository.h"
 #include "config.h"
 #include "submodule-config.h"
 #include "submodule.h"
@@ -15,6 +16,7 @@
 struct submodule_cache {
struct hashmap for_path;
struct hashmap for_name;
+   unsigned initialized:1;
 };
 
 /*
@@ -31,9 +33,6 @@ enum lookup_type {
lookup_path
 };
 
-static struct submodule_cache the_submodule_cache;
-static int is_cache_init;
-
 static int config_path_cmp(const struct submodule_entry *a,
   const struct submodule_entry *b,
   const void *unused)
@@ -50,10 +49,16 @@ static int config_name_cmp(const struct submodule_entry *a,
   hashcmp(a->config->gitmodules_sha1, b->config->gitmodules_sha1);
 }
 
-static void cache_init(struct submodule_cache *cache)
+static struct submodule_cache *submodule_cache_alloc(void)
+{
+   return xcalloc(1, sizeof(struct submodule_cache));
+}
+
+static void submodule_cache_init(struct submodule_cache *cache)
 {
hashmap_init(>for_path, (hashmap_cmp_fn) config_path_cmp, 0);
hashmap_init(>for_name, (hashmap_cmp_fn) config_name_cmp, 0);
+   cache->initialized = 1;
 }
 
 static void free_one_config(struct submodule_entry *entry)
@@ -65,11 +70,14 @@ static void free_one_config(struct submodule_entry *entry)
free(entry->config);
 }
 
-static void cache_free(struct submodule_cache *cache)
+static void submodule_cache_clear(struct submodule_cache *cache)
 {
struct hashmap_iter iter;
struct submodule_entry *entry;
 
+   if (!cache->initialized)
+   return;
+
/*
 * We iterate over the name hash here to be symmetric with the
 * allocation of struct submodule entries. Each is allocated by
@@ -81,6 +89,13 @@ static void cache_free(struct submodule_cache *cache)
 
hashmap_free(>for_path, 1);
hashmap_free(>for_name, 1);
+   cache->initialized = 0;
+}
+
+void submodule_cache_free(struct submodule_cache *cache)
+{
+   submodule_cache_clear(cache);
+   free(cache);
 }
 
 static unsigned int hash_sha1_string(const unsigned char *sha1,
@@ -494,43 +509,62 @@ static const struct submodule *config_from(struct 
submodule_cache *cache,
return submodule;
 }
 
-static void ensure_cache_init(void)
+static void submodule_cache_check_init(struct repository *repo)
 {
-   if (is_cache_init)
+   if (repo->submodule_cache && repo->submodule_cache->initialized)
return;
 
-   cache_init(_submodule_cache);
-   is_cache_init = 1;
+   if (!repo->submodule_cache)
+   repo->submodule_cache = submodule_cache_alloc();
+
+   submodule_cache_init(repo->submodule_cache);
 }
 
-int parse_submodule_config_option(const char *var, const char *value)
+int submodule_config_option(struct repository *repo,
+   const char *var, const char *value)
 {
struct parse_config_parameter parameter;
-   parameter.cache = _submodule_cache;
+
+   submodule_cache_check_init(repo);
+
+   parameter.cache = repo->submodule_cache;
parameter.treeish_name = NULL;
parameter.gitmodules_sha1 = null_sha1;
parameter.overwrite = 1;
 
-   ensure_cache_init();

[PATCH v3 19/20] repository: enable initialization of submodules

2017-06-20 Thread Brandon Williams
Introduce 'repo_submodule_init()' which performs initialization of a
'struct repository' as a submodule of another 'struct repository'.

The resulting submodule can be in one of three states:

  1. The submodule is initialized and has a worktree.

  2. The submodule is initialized but does not have a worktree.  This
 would occur when the submodule's gitdir is present in the
 superproject's 'gitdir/modules/' directory yet the submodule has not
 been checked out in superproject's worktree.

  3. The submodule remains uninitialized due to an error in the
 initialization process or there is no matching submodule at the
 provided path in the superproject.

Signed-off-by: Brandon Williams 
---
 repository.c | 44 
 repository.h | 10 ++
 2 files changed, 54 insertions(+)

diff --git a/repository.c b/repository.c
index 317041a4a..673b35ca5 100644
--- a/repository.c
+++ b/repository.c
@@ -155,6 +155,48 @@ int repo_init(struct repository *repo, const char *gitdir, 
const char *worktree)
return -1;
 }
 
+/*
+ * Initialize 'submodule' as the submodule given by 'path' in parent repository
+ * 'superproject'.
+ * Return 0 upon success and a non-zero value upon failure.
+ */
+int repo_submodule_init(struct repository *submodule,
+   struct repository *superproject,
+   const char *path)
+{
+   const struct submodule *sub;
+   struct strbuf submodule_path = STRBUF_INIT;
+   int ret = 0;
+
+   sub = submodule_from_cache(superproject, null_sha1, path);
+   if (!sub) {
+   ret = -1;
+   goto out;
+   }
+
+   strbuf_repo_worktree_path(_path, superproject, "%s", path);
+
+   if (repo_init(submodule, submodule_path.buf, submodule_path.buf)) {
+   strbuf_reset(_path);
+   strbuf_repo_git_path(_path, superproject,
+"modules/%s", sub->name);
+
+   if (repo_init(submodule, submodule_path.buf, NULL)) {
+   ret = -1;
+   goto out;
+   }
+   }
+
+   submodule->submodule_prefix = xstrfmt("%s%s/",
+ superproject->submodule_prefix ?
+ superproject->submodule_prefix :
+ "", path);
+
+out:
+   strbuf_release(_path);
+   return ret;
+}
+
 void repo_clear(struct repository *repo)
 {
free(repo->gitdir);
@@ -169,6 +211,8 @@ void repo_clear(struct repository *repo)
repo->index_file = NULL;
free(repo->worktree);
repo->worktree = NULL;
+   free(repo->submodule_prefix);
+   repo->submodule_prefix = NULL;
 
if (repo->config) {
git_configset_clear(repo->config);
diff --git a/repository.h b/repository.h
index 4bc70ebc5..7b352accb 100644
--- a/repository.h
+++ b/repository.h
@@ -25,6 +25,13 @@ struct repository {
/* Path to the working directory */
char *worktree;
 
+   /*
+* Path from the root of the top-level superproject down to this
+* repository.  This is only non-NULL if the repository is initialized
+* as a submodule of another repository.
+*/
+   char *submodule_prefix;
+
/* Subsystems */
/*
 * Repository's config which contains key-value pairs from the usual
@@ -59,6 +66,9 @@ extern struct repository *the_repository;
 extern void repo_set_gitdir(struct repository *repo, const char *path);
 extern void repo_set_worktree(struct repository *repo, const char *path);
 extern int repo_init(struct repository *repo, const char *gitdir, const char 
*worktree);
+extern int repo_submodule_init(struct repository *submodule,
+  struct repository *superproject,
+  const char *path);
 extern void repo_clear(struct repository *repo);
 
 extern int repo_read_index(struct repository *repo);
-- 
2.13.1.611.g7e3b11ae1-goog



[PATCH v3 12/20] path: add repo_git_path and strbuf_repo_git_path

2017-06-20 Thread Brandon Williams
Introduce 'repo_git_path' and 'strbuf_repo_git_path' which take a
repository struct and constructs a path into the repository's git
directory.

Signed-off-by: Brandon Williams 
---
 path.c | 21 +
 path.h |  8 
 2 files changed, 29 insertions(+)

diff --git a/path.c b/path.c
index c6832a30e..3d223e0b2 100644
--- a/path.c
+++ b/path.c
@@ -422,6 +422,27 @@ static void do_git_path(const struct repository *repo,
strbuf_cleanup_path(buf);
 }
 
+char *repo_git_path(const struct repository *repo,
+   const char *fmt, ...)
+{
+   struct strbuf path = STRBUF_INIT;
+   va_list args;
+   va_start(args, fmt);
+   do_git_path(repo, NULL, , fmt, args);
+   va_end(args);
+   return strbuf_detach(, NULL);
+}
+
+void strbuf_repo_git_path(struct strbuf *sb,
+ const struct repository *repo,
+ const char *fmt, ...)
+{
+   va_list args;
+   va_start(args, fmt);
+   do_git_path(repo, NULL, sb, fmt, args);
+   va_end(args);
+}
+
 char *git_path_buf(struct strbuf *buf, const char *fmt, ...)
 {
va_list args;
diff --git a/path.h b/path.h
index 568d63be5..c779c4aa2 100644
--- a/path.h
+++ b/path.h
@@ -35,6 +35,14 @@ extern char *mkpathdup(const char *fmt, ...)
 extern char *git_pathdup_submodule(const char *path, const char *fmt, ...)
__attribute__((format (printf, 2, 3)));
 
+extern char *repo_git_path(const struct repository *repo,
+  const char *fmt, ...)
+   __attribute__((format (printf, 2, 3)));
+extern void strbuf_repo_git_path(struct strbuf *sb,
+const struct repository *repo,
+const char *fmt, ...)
+   __attribute__((format (printf, 3, 4)));
+
 extern void report_linked_checkout_garbage(void);
 
 /*
-- 
2.13.1.611.g7e3b11ae1-goog



[PATCH v3 13/20] path: add repo_worktree_path and strbuf_repo_worktree_path

2017-06-20 Thread Brandon Williams
Introduce 'repo_worktree_path' and 'strbuf_repo_worktree_path' which
take a repository struct and constructs a path relative to the
repository's worktree.

Signed-off-by: Brandon Williams 
---
 path.c | 41 +
 path.h |  8 
 2 files changed, 49 insertions(+)

diff --git a/path.c b/path.c
index 3d223e0b2..a32c19547 100644
--- a/path.c
+++ b/path.c
@@ -512,6 +512,47 @@ const char *worktree_git_path(const struct worktree *wt, 
const char *fmt, ...)
return pathname->buf;
 }
 
+static void do_worktree_path(const struct repository *repo,
+struct strbuf *buf,
+const char *fmt, va_list args)
+{
+   strbuf_addstr(buf, repo->worktree);
+   if(buf->len && !is_dir_sep(buf->buf[buf->len - 1]))
+   strbuf_addch(buf, '/');
+
+   strbuf_vaddf(buf, fmt, args);
+   strbuf_cleanup_path(buf);
+}
+
+char *repo_worktree_path(const struct repository *repo, const char *fmt, ...)
+{
+   struct strbuf path = STRBUF_INIT;
+   va_list args;
+
+   if (!repo->worktree)
+   return NULL;
+
+   va_start(args, fmt);
+   do_worktree_path(repo, , fmt, args);
+   va_end(args);
+
+   return strbuf_detach(, NULL);
+}
+
+void strbuf_repo_worktree_path(struct strbuf *sb,
+  const struct repository *repo,
+  const char *fmt, ...)
+{
+   va_list args;
+
+   if (!repo->worktree)
+   return;
+
+   va_start(args, fmt);
+   do_worktree_path(repo, sb, fmt, args);
+   va_end(args);
+}
+
 /* Returns 0 on success, negative on failure. */
 static int do_submodule_path(struct strbuf *buf, const char *path,
 const char *fmt, va_list args)
diff --git a/path.h b/path.h
index c779c4aa2..9541620c7 100644
--- a/path.h
+++ b/path.h
@@ -43,6 +43,14 @@ extern void strbuf_repo_git_path(struct strbuf *sb,
 const char *fmt, ...)
__attribute__((format (printf, 3, 4)));
 
+extern char *repo_worktree_path(const struct repository *repo,
+   const char *fmt, ...)
+   __attribute__((format (printf, 2, 3)));
+extern void strbuf_repo_worktree_path(struct strbuf *sb,
+ const struct repository *repo,
+ const char *fmt, ...)
+   __attribute__((format (printf, 3, 4)));
+
 extern void report_linked_checkout_garbage(void);
 
 /*
-- 
2.13.1.611.g7e3b11ae1-goog



[PATCH v3 15/20] repository: add index_state to struct repo

2017-06-20 Thread Brandon Williams
Signed-off-by: Brandon Williams 
---
 repository.c | 16 
 repository.h |  6 ++
 2 files changed, 22 insertions(+)

diff --git a/repository.c b/repository.c
index 3dcfec3b8..883e6e9e9 100644
--- a/repository.c
+++ b/repository.c
@@ -175,5 +175,21 @@ void repo_clear(struct repository *repo)
repo->config = NULL;
}
 
+   if (repo->index) {
+   discard_index(repo->index);
+   free(repo->index);
+   repo->index = NULL;
+   }
+
memset(repo, 0, sizeof(*repo));
 }
+
+int repo_read_index(struct repository *repo)
+{
+   if (!repo->index)
+   repo->index = xcalloc(1, sizeof(struct index_state));
+   else
+   discard_index(repo->index);
+
+   return read_index_from(repo->index, repo->index_file);
+}
diff --git a/repository.h b/repository.h
index 379b64170..1fa65c42f 100644
--- a/repository.h
+++ b/repository.h
@@ -2,6 +2,7 @@
 #define REPOSITORY_H
 
 struct config_set;
+struct index_state;
 
 struct repository {
/* Environment */
@@ -31,6 +32,9 @@ struct repository {
 */
struct config_set *config;
 
+   /* Repository's in-memory index */
+   struct index_state *index;
+
/* Configurations */
/*
 * Bit used during initialization to indicate if repository state (like
@@ -53,4 +57,6 @@ extern void repo_set_worktree(struct repository *repo, const 
char *path);
 extern int repo_init(struct repository *repo, const char *gitdir, const char 
*worktree);
 extern void repo_clear(struct repository *repo);
 
+extern int repo_read_index(struct repository *repo);
+
 #endif /* REPOSITORY_H */
-- 
2.13.1.611.g7e3b11ae1-goog



[PATCH v3 10/20] path: convert do_git_path to take a 'struct repository'

2017-06-20 Thread Brandon Williams
In preparation to adding 'git_path' like functions which operate on a
'struct repository' convert 'do_git_path' to take a 'struct repository'.

Signed-off-by: Brandon Williams 
---
 path.c | 42 --
 1 file changed, 28 insertions(+), 14 deletions(-)

diff --git a/path.c b/path.c
index 9be6804a9..76a872297 100644
--- a/path.c
+++ b/path.c
@@ -371,33 +371,47 @@ void report_linked_checkout_garbage(void)
strbuf_release();
 }
 
-static void adjust_git_path(struct strbuf *buf, int git_dir_len)
+static void adjust_git_path(const struct repository *repo,
+   struct strbuf *buf, int git_dir_len)
 {
const char *base = buf->buf + git_dir_len;
if (is_dir_file(base, "info", "grafts"))
strbuf_splice(buf, 0, buf->len,
- get_graft_file(), strlen(get_graft_file()));
+ repo->graft_file, strlen(repo->graft_file));
else if (!strcmp(base, "index"))
strbuf_splice(buf, 0, buf->len,
- get_index_file(), strlen(get_index_file()));
+ repo->index_file, strlen(repo->index_file));
else if (dir_prefix(base, "objects"))
-   replace_dir(buf, git_dir_len + 7, get_object_directory());
+   replace_dir(buf, git_dir_len + 7, repo->objectdir);
else if (git_hooks_path && dir_prefix(base, "hooks"))
replace_dir(buf, git_dir_len + 5, git_hooks_path);
-   else if (the_repository->different_commondir)
-   update_common_dir(buf, git_dir_len, get_git_common_dir());
+   else if (repo->different_commondir)
+   update_common_dir(buf, git_dir_len, repo->commondir);
 }
 
-static void do_git_path(const struct worktree *wt, struct strbuf *buf,
+static void strbuf_worktree_gitdir(struct strbuf *buf,
+  const struct repository *repo,
+  const struct worktree *wt)
+{
+   if (!wt)
+   strbuf_addstr(buf, repo->gitdir);
+   else if (!wt->id)
+   strbuf_addstr(buf, repo->commondir);
+   else
+   strbuf_git_common_path(buf, repo, "worktrees/%s", wt->id);
+}
+
+static void do_git_path(const struct repository *repo,
+   const struct worktree *wt, struct strbuf *buf,
const char *fmt, va_list args)
 {
int gitdir_len;
-   strbuf_addstr(buf, get_worktree_git_dir(wt));
+   strbuf_worktree_gitdir(buf, repo, wt);
if (buf->len && !is_dir_sep(buf->buf[buf->len - 1]))
strbuf_addch(buf, '/');
gitdir_len = buf->len;
strbuf_vaddf(buf, fmt, args);
-   adjust_git_path(buf, gitdir_len);
+   adjust_git_path(repo, buf, gitdir_len);
strbuf_cleanup_path(buf);
 }
 
@@ -406,7 +420,7 @@ char *git_path_buf(struct strbuf *buf, const char *fmt, ...)
va_list args;
strbuf_reset(buf);
va_start(args, fmt);
-   do_git_path(NULL, buf, fmt, args);
+   do_git_path(the_repository, NULL, buf, fmt, args);
va_end(args);
return buf->buf;
 }
@@ -415,7 +429,7 @@ void strbuf_git_path(struct strbuf *sb, const char *fmt, 
...)
 {
va_list args;
va_start(args, fmt);
-   do_git_path(NULL, sb, fmt, args);
+   do_git_path(the_repository, NULL, sb, fmt, args);
va_end(args);
 }
 
@@ -424,7 +438,7 @@ const char *git_path(const char *fmt, ...)
struct strbuf *pathname = get_pathname();
va_list args;
va_start(args, fmt);
-   do_git_path(NULL, pathname, fmt, args);
+   do_git_path(the_repository, NULL, pathname, fmt, args);
va_end(args);
return pathname->buf;
 }
@@ -434,7 +448,7 @@ char *git_pathdup(const char *fmt, ...)
struct strbuf path = STRBUF_INIT;
va_list args;
va_start(args, fmt);
-   do_git_path(NULL, , fmt, args);
+   do_git_path(the_repository, NULL, , fmt, args);
va_end(args);
return strbuf_detach(, NULL);
 }
@@ -465,7 +479,7 @@ const char *worktree_git_path(const struct worktree *wt, 
const char *fmt, ...)
struct strbuf *pathname = get_pathname();
va_list args;
va_start(args, fmt);
-   do_git_path(wt, pathname, fmt, args);
+   do_git_path(the_repository, wt, pathname, fmt, args);
va_end(args);
return pathname->buf;
 }
-- 
2.13.1.611.g7e3b11ae1-goog



[PATCH v3 07/20] path: create path.h

2017-06-20 Thread Brandon Williams
Move all path related declarations from cache.h to a new path.h header
file.  This makes cache.h smaller and makes it easier to add new path
related functions.

Signed-off-by: Brandon Williams 
---
 cache.h | 59 +--
 path.c  |  1 +
 path.h  | 62 ++
 3 files changed, 64 insertions(+), 58 deletions(-)
 create mode 100644 path.h

diff --git a/cache.h b/cache.h
index cd64cbc81..c958fc3ce 100644
--- a/cache.h
+++ b/cache.h
@@ -11,6 +11,7 @@
 #include "string-list.h"
 #include "pack-revindex.h"
 #include "hash.h"
+#include "path.h"
 
 #ifndef platform_SHA_CTX
 /*
@@ -892,64 +893,6 @@ extern void check_repository_format(void);
 #define DATA_CHANGED0x0020
 #define TYPE_CHANGED0x0040
 
-/*
- * Return a statically allocated filename, either generically (mkpath), in
- * the repository directory (git_path), or in a submodule's repository
- * directory (git_path_submodule). In all cases, note that the result
- * may be overwritten by another call to _any_ of the functions. Consider
- * using the safer "dup" or "strbuf" formats below (in some cases, the
- * unsafe versions have already been removed).
- */
-extern const char *mkpath(const char *fmt, ...) __attribute__((format (printf, 
1, 2)));
-extern const char *git_path(const char *fmt, ...) __attribute__((format 
(printf, 1, 2)));
-extern const char *git_common_path(const char *fmt, ...) __attribute__((format 
(printf, 1, 2)));
-
-extern char *mksnpath(char *buf, size_t n, const char *fmt, ...)
-   __attribute__((format (printf, 3, 4)));
-extern void strbuf_git_path(struct strbuf *sb, const char *fmt, ...)
-   __attribute__((format (printf, 2, 3)));
-extern void strbuf_git_common_path(struct strbuf *sb, const char *fmt, ...)
-   __attribute__((format (printf, 2, 3)));
-extern char *git_path_buf(struct strbuf *buf, const char *fmt, ...)
-   __attribute__((format (printf, 2, 3)));
-extern int strbuf_git_path_submodule(struct strbuf *sb, const char *path,
-const char *fmt, ...)
-   __attribute__((format (printf, 3, 4)));
-extern char *git_pathdup(const char *fmt, ...)
-   __attribute__((format (printf, 1, 2)));
-extern char *mkpathdup(const char *fmt, ...)
-   __attribute__((format (printf, 1, 2)));
-extern char *git_pathdup_submodule(const char *path, const char *fmt, ...)
-   __attribute__((format (printf, 2, 3)));
-
-extern void report_linked_checkout_garbage(void);
-
-/*
- * You can define a static memoized git path like:
- *
- *static GIT_PATH_FUNC(git_path_foo, "FOO");
- *
- * or use one of the global ones below.
- */
-#define GIT_PATH_FUNC(func, filename) \
-   const char *func(void) \
-   { \
-   static char *ret; \
-   if (!ret) \
-   ret = git_pathdup(filename); \
-   return ret; \
-   }
-
-const char *git_path_cherry_pick_head(void);
-const char *git_path_revert_head(void);
-const char *git_path_squash_msg(void);
-const char *git_path_merge_msg(void);
-const char *git_path_merge_rr(void);
-const char *git_path_merge_mode(void);
-const char *git_path_merge_head(void);
-const char *git_path_fetch_head(void);
-const char *git_path_shallow(void);
-
 /*
  * Return the name of the file in the local object database that would
  * be used to store a loose object with the specified sha1.  The
diff --git a/path.c b/path.c
index e4abea083..41c861c96 100644
--- a/path.c
+++ b/path.c
@@ -8,6 +8,7 @@
 #include "dir.h"
 #include "worktree.h"
 #include "submodule-config.h"
+#include "path.h"
 
 static int get_st_mode_bits(const char *path, int *mode)
 {
diff --git a/path.h b/path.h
new file mode 100644
index 0..522cd029b
--- /dev/null
+++ b/path.h
@@ -0,0 +1,62 @@
+#ifndef PATH_H
+#define PATH_H
+
+/*
+ * Return a statically allocated filename, either generically (mkpath), in
+ * the repository directory (git_path), or in a submodule's repository
+ * directory (git_path_submodule). In all cases, note that the result
+ * may be overwritten by another call to _any_ of the functions. Consider
+ * using the safer "dup" or "strbuf" formats below (in some cases, the
+ * unsafe versions have already been removed).
+ */
+extern const char *mkpath(const char *fmt, ...) __attribute__((format (printf, 
1, 2)));
+extern const char *git_path(const char *fmt, ...) __attribute__((format 
(printf, 1, 2)));
+extern const char *git_common_path(const char *fmt, ...) __attribute__((format 
(printf, 1, 2)));
+
+extern char *mksnpath(char *buf, size_t n, const char *fmt, ...)
+   __attribute__((format (printf, 3, 4)));
+extern void strbuf_git_path(struct strbuf *sb, const char *fmt, ...)
+   __attribute__((format (printf, 2, 3)));
+extern void strbuf_git_common_path(struct strbuf *sb, const char *fmt, ...)
+   __attribute__((format (printf, 2, 3)));
+extern char *git_path_buf(struct strbuf *buf, 

[PATCH v3 09/20] path: convert strbuf_git_common_path to take a 'struct repository'

2017-06-20 Thread Brandon Williams
Signed-off-by: Brandon Williams 
---
 path.c | 13 -
 path.h |  8 ++--
 worktree.c |  3 ++-
 3 files changed, 16 insertions(+), 8 deletions(-)

diff --git a/path.c b/path.c
index 2434921d8..9be6804a9 100644
--- a/path.c
+++ b/path.c
@@ -524,11 +524,12 @@ int strbuf_git_path_submodule(struct strbuf *buf, const 
char *path,
return err;
 }
 
-static void do_git_common_path(struct strbuf *buf,
+static void do_git_common_path(const struct repository *repo,
+  struct strbuf *buf,
   const char *fmt,
   va_list args)
 {
-   strbuf_addstr(buf, get_git_common_dir());
+   strbuf_addstr(buf, repo->commondir);
if (buf->len && !is_dir_sep(buf->buf[buf->len - 1]))
strbuf_addch(buf, '/');
strbuf_vaddf(buf, fmt, args);
@@ -540,16 +541,18 @@ const char *git_common_path(const char *fmt, ...)
struct strbuf *pathname = get_pathname();
va_list args;
va_start(args, fmt);
-   do_git_common_path(pathname, fmt, args);
+   do_git_common_path(the_repository, pathname, fmt, args);
va_end(args);
return pathname->buf;
 }
 
-void strbuf_git_common_path(struct strbuf *sb, const char *fmt, ...)
+void strbuf_git_common_path(struct strbuf *sb,
+   const struct repository *repo,
+   const char *fmt, ...)
 {
va_list args;
va_start(args, fmt);
-   do_git_common_path(sb, fmt, args);
+   do_git_common_path(repo, sb, fmt, args);
va_end(args);
 }
 
diff --git a/path.h b/path.h
index 522cd029b..568d63be5 100644
--- a/path.h
+++ b/path.h
@@ -1,6 +1,8 @@
 #ifndef PATH_H
 #define PATH_H
 
+struct repository;
+
 /*
  * Return a statically allocated filename, either generically (mkpath), in
  * the repository directory (git_path), or in a submodule's repository
@@ -17,8 +19,10 @@ extern char *mksnpath(char *buf, size_t n, const char *fmt, 
...)
__attribute__((format (printf, 3, 4)));
 extern void strbuf_git_path(struct strbuf *sb, const char *fmt, ...)
__attribute__((format (printf, 2, 3)));
-extern void strbuf_git_common_path(struct strbuf *sb, const char *fmt, ...)
-   __attribute__((format (printf, 2, 3)));
+extern void strbuf_git_common_path(struct strbuf *sb,
+  const struct repository *repo,
+  const char *fmt, ...)
+   __attribute__((format (printf, 3, 4)));
 extern char *git_path_buf(struct strbuf *buf, const char *fmt, ...)
__attribute__((format (printf, 2, 3)));
 extern int strbuf_git_path_submodule(struct strbuf *sb, const char *path,
diff --git a/worktree.c b/worktree.c
index 2801c6d52..e28ffbeb0 100644
--- a/worktree.c
+++ b/worktree.c
@@ -1,4 +1,5 @@
 #include "cache.h"
+#include "repository.h"
 #include "refs.h"
 #include "strbuf.h"
 #include "worktree.h"
@@ -76,7 +77,7 @@ static struct worktree *get_linked_worktree(const char *id)
if (!id)
die("Missing linked worktree name");
 
-   strbuf_git_common_path(, "worktrees/%s/gitdir", id);
+   strbuf_git_common_path(, the_repository, "worktrees/%s/gitdir", 
id);
if (strbuf_read_file(_path, path.buf, 0) <= 0)
/* invalid gitdir file */
goto done;
-- 
2.13.1.611.g7e3b11ae1-goog



[PATCH v3 08/20] path: always pass in commondir to update_common_dir

2017-06-20 Thread Brandon Williams
Instead of passing in 'NULL' and having 'update_common_dir()' query for
the commondir, have the callers of 'update_common_dir()' be responsible
for providing the commondir.

Signed-off-by: Brandon Williams 
---
 path.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/path.c b/path.c
index 41c861c96..2434921d8 100644
--- a/path.c
+++ b/path.c
@@ -345,8 +345,6 @@ static void update_common_dir(struct strbuf *buf, int 
git_dir_len,
 {
char *base = buf->buf + git_dir_len;
init_common_trie();
-   if (!common_dir)
-   common_dir = get_git_common_dir();
if (trie_find(_trie, base, check_common, NULL) > 0)
replace_dir(buf, git_dir_len, common_dir);
 }
@@ -387,7 +385,7 @@ static void adjust_git_path(struct strbuf *buf, int 
git_dir_len)
else if (git_hooks_path && dir_prefix(base, "hooks"))
replace_dir(buf, git_dir_len + 5, git_hooks_path);
else if (the_repository->different_commondir)
-   update_common_dir(buf, git_dir_len, NULL);
+   update_common_dir(buf, git_dir_len, get_git_common_dir());
 }
 
 static void do_git_path(const struct worktree *wt, struct strbuf *buf,
-- 
2.13.1.611.g7e3b11ae1-goog



[PATCH v3 18/20] submodule: convert is_submodule_initialized to work on a repository

2017-06-20 Thread Brandon Williams
Convert 'is_submodule_initialized()' to take a repository object and
while we're at it, lets rename the function to 'is_submodule_active()'
and remove the NEEDSWORK comment.

Signed-off-by: Brandon Williams 
---
 builtin/grep.c  |  3 ++-
 builtin/submodule--helper.c |  9 +
 submodule.c | 20 
 submodule.h |  2 +-
 4 files changed, 16 insertions(+), 18 deletions(-)

diff --git a/builtin/grep.c b/builtin/grep.c
index f61a9d938..e3ba1d98e 100644
--- a/builtin/grep.c
+++ b/builtin/grep.c
@@ -4,6 +4,7 @@
  * Copyright (c) 2006 Junio C Hamano
  */
 #include "cache.h"
+#include "repository.h"
 #include "config.h"
 #include "blob.h"
 #include "tree.h"
@@ -643,7 +644,7 @@ static int grep_submodule_launch(struct grep_opt *opt,
 static int grep_submodule(struct grep_opt *opt, const struct object_id *oid,
  const char *filename, const char *path)
 {
-   if (!is_submodule_initialized(path))
+   if (!is_submodule_active(the_repository, path))
return 0;
if (!is_submodule_populated_gently(path, NULL)) {
/*
diff --git a/builtin/submodule--helper.c b/builtin/submodule--helper.c
index 8517032b3..e1b06c41d 100644
--- a/builtin/submodule--helper.c
+++ b/builtin/submodule--helper.c
@@ -1,4 +1,5 @@
 #include "builtin.h"
+#include "repository.h"
 #include "cache.h"
 #include "config.h"
 #include "parse-options.h"
@@ -280,7 +281,7 @@ static void module_list_active(struct module_list *list)
for (i = 0; i < list->nr; i++) {
const struct cache_entry *ce = list->entries[i];
 
-   if (!is_submodule_initialized(ce->name))
+   if (!is_submodule_active(the_repository, ce->name))
continue;
 
ALLOC_GROW(active_modules.entries,
@@ -362,7 +363,7 @@ static void init_submodule(const char *path, const char 
*prefix, int quiet)
 *
 * Set active flag for the submodule being initialized
 */
-   if (!is_submodule_initialized(path)) {
+   if (!is_submodule_active(the_repository, path)) {
strbuf_reset();
strbuf_addf(, "submodule.%s.active", sub->name);
git_config_set_gently(sb.buf, "true");
@@ -817,7 +818,7 @@ static int prepare_to_clone_next_submodule(const struct 
cache_entry *ce,
}
 
/* Check if the submodule has been initialized. */
-   if (!is_submodule_initialized(ce->name)) {
+   if (!is_submodule_active(the_repository, ce->name)) {
next_submodule_warn_missing(suc, out, displaypath);
goto cleanup;
}
@@ -1193,7 +1194,7 @@ static int is_active(int argc, const char **argv, const 
char *prefix)
 
gitmodules_config();
 
-   return !is_submodule_initialized(argv[1]);
+   return !is_submodule_active(the_repository, argv[1]);
 }
 
 #define SUPPORT_SUPER_PREFIX (1<<0)
diff --git a/submodule.c b/submodule.c
index d0b894772..b23c25311 100644
--- a/submodule.c
+++ b/submodule.c
@@ -283,21 +283,17 @@ void gitmodules_config_sha1(const unsigned char 
*commit_sha1)
 }
 
 /*
- * NEEDSWORK: With the addition of different configuration options to determine
- * if a submodule is of interests, the validity of this function's name comes
- * into question.  Once the dust has settled and more concrete terminology is
- * decided upon, come up with a more proper name for this function.  One
- * potential candidate could be 'is_submodule_active()'.
- *
  * Determine if a submodule has been initialized at a given 'path'
  */
-int is_submodule_initialized(const char *path)
+int is_submodule_active(struct repository *repo, const char *path)
 {
int ret = 0;
char *key = NULL;
char *value = NULL;
const struct string_list *sl;
-   const struct submodule *module = submodule_from_path(null_sha1, path);
+   const struct submodule *module;
+
+   module = submodule_from_cache(repo, null_sha1, path);
 
/* early return if there isn't a path->module mapping */
if (!module)
@@ -305,14 +301,14 @@ int is_submodule_initialized(const char *path)
 
/* submodule..active is set */
key = xstrfmt("submodule.%s.active", module->name);
-   if (!git_config_get_bool(key, )) {
+   if (!repo_config_get_bool(repo, key, )) {
free(key);
return ret;
}
free(key);
 
/* submodule.active is set */
-   sl = git_config_get_value_multi("submodule.active");
+   sl = repo_config_get_value_multi(repo, "submodule.active");
if (sl) {
struct pathspec ps;
struct argv_array args = ARGV_ARRAY_INIT;
@@ -332,7 +328,7 @@ int is_submodule_initialized(const char *path)
 
/* fallback to checking if the URL is set */
key = xstrfmt("submodule.%s.url", module->name);
-   ret = !git_config_get_string(key, );
+   ret = 

[PATCH v3 20/20] ls-files: use repository object

2017-06-20 Thread Brandon Williams
Convert ls-files to use a repository struct and recurse submodules
inprocess.

Signed-off-by: Brandon Williams 
---
 builtin/ls-files.c | 195 ++---
 git.c  |   2 +-
 t/t3007-ls-files-recurse-submodules.sh |  39 +++
 3 files changed, 120 insertions(+), 116 deletions(-)

diff --git a/builtin/ls-files.c b/builtin/ls-files.c
index b12d0bb61..f5f36ddb4 100644
--- a/builtin/ls-files.c
+++ b/builtin/ls-files.c
@@ -5,7 +5,9 @@
  *
  * Copyright (C) Linus Torvalds, 2005
  */
+#define NO_THE_INDEX_COMPATIBILITY_MACROS
 #include "cache.h"
+#include "repository.h"
 #include "config.h"
 #include "quote.h"
 #include "dir.h"
@@ -32,10 +34,8 @@ static int line_terminator = '\n';
 static int debug_mode;
 static int show_eol;
 static int recurse_submodules;
-static struct argv_array submodule_options = ARGV_ARRAY_INIT;
 
 static const char *prefix;
-static const char *super_prefix;
 static int max_prefix_len;
 static int prefix_len;
 static struct pathspec pathspec;
@@ -73,25 +73,12 @@ static void write_eolinfo(const struct index_state *istate,
 
 static void write_name(const char *name)
 {
-   /*
-* Prepend the super_prefix to name to construct the full_name to be
-* written.
-*/
-   struct strbuf full_name = STRBUF_INIT;
-   if (super_prefix) {
-   strbuf_addstr(_name, super_prefix);
-   strbuf_addstr(_name, name);
-   name = full_name.buf;
-   }
-
/*
 * With "--full-name", prefix_len=0; this caller needs to pass
 * an empty string in that case (a NULL is good for "").
 */
write_name_quoted_relative(name, prefix_len ? prefix : NULL,
   stdout, line_terminator);
-
-   strbuf_release(_name);
 }
 
 static const char *get_tag(const struct cache_entry *ce, const char *tag)
@@ -210,83 +197,38 @@ static void show_killed_files(const struct index_state 
*istate,
}
 }
 
-/*
- * Compile an argv_array with all of the options supported by 
--recurse_submodules
- */
-static void compile_submodule_options(const char **argv,
- const struct dir_struct *dir,
- int show_tag)
-{
-   if (line_terminator == '\0')
-   argv_array_push(_options, "-z");
-   if (show_tag)
-   argv_array_push(_options, "-t");
-   if (show_valid_bit)
-   argv_array_push(_options, "-v");
-   if (show_cached)
-   argv_array_push(_options, "--cached");
-   if (show_eol)
-   argv_array_push(_options, "--eol");
-   if (debug_mode)
-   argv_array_push(_options, "--debug");
-
-   /* Add Pathspecs */
-   argv_array_push(_options, "--");
-   for (; *argv; argv++)
-   argv_array_push(_options, *argv);
-}
+static void show_files(struct repository *repo, struct dir_struct *dir);
 
-/**
- * Recursively call ls-files on a submodule
- */
-static void show_gitlink(const struct cache_entry *ce)
+static void show_submodule(struct repository *superproject,
+  struct dir_struct *dir, const char *path)
 {
-   struct child_process cp = CHILD_PROCESS_INIT;
-   int status;
-   char *dir;
-
-   prepare_submodule_repo_env(_array);
-   argv_array_push(_array, GIT_DIR_ENVIRONMENT);
-
-   if (prefix_len)
-   argv_array_pushf(_array, "%s=%s",
-GIT_TOPLEVEL_PREFIX_ENVIRONMENT,
-prefix);
-   argv_array_pushf(, "--super-prefix=%s%s/",
-super_prefix ? super_prefix : "",
-ce->name);
-   argv_array_push(, "ls-files");
-   argv_array_push(, "--recurse-submodules");
-
-   /* add supported options */
-   argv_array_pushv(, submodule_options.argv);
-
-   cp.git_cmd = 1;
-   dir = mkpathdup("%s/%s", get_git_work_tree(), ce->name);
-   cp.dir = dir;
-   status = run_command();
-   free(dir);
-   if (status)
-   exit(status);
+   struct repository submodule;
+
+   if (repo_submodule_init(, superproject, path))
+   return;
+
+   if (repo_read_index() < 0)
+   die("index file corrupt");
+
+   repo_read_gitmodules();
+
+   show_files(, dir);
+
+   repo_clear();
 }
 
-static void show_ce_entry(const struct index_state *istate,
- const char *tag, const struct cache_entry *ce)
+static void show_ce(struct repository *repo, struct dir_struct *dir,
+   const struct cache_entry *ce, const char *fullname,
+   const char *tag)
 {
-   struct strbuf name = STRBUF_INIT;
-   int len = max_prefix_len;
-   if (super_prefix)
-   strbuf_addstr(, super_prefix);
-   strbuf_addstr(, ce->name);
-
-   if (len > ce_namelen(ce))
+  

[PATCH v3 17/20] submodule: add repo_read_gitmodules

2017-06-20 Thread Brandon Williams
Teach the repo object to be able to populate the submodule_cache by
reading the repository's gitmodules file.

Signed-off-by: Brandon Williams 
---
 submodule.c | 15 +++
 submodule.h |  2 ++
 2 files changed, 17 insertions(+)

diff --git a/submodule.c b/submodule.c
index da0b80549..d0b894772 100644
--- a/submodule.c
+++ b/submodule.c
@@ -1,4 +1,5 @@
 #include "cache.h"
+#include "repository.h"
 #include "config.h"
 #include "submodule-config.h"
 #include "submodule.h"
@@ -255,6 +256,20 @@ void gitmodules_config(void)
}
 }
 
+static int gitmodules_cb(const char *var, const char *value, void *data)
+{
+   struct repository *repo = data;
+   return submodule_config_option(repo, var, value);
+}
+
+void repo_read_gitmodules(struct repository *repo)
+{
+   char *gitmodules_path = repo_worktree_path(repo, ".gitmodules");
+
+   git_config_from_file(gitmodules_cb, gitmodules_path, repo);
+   free(gitmodules_path);
+}
+
 void gitmodules_config_sha1(const unsigned char *commit_sha1)
 {
struct strbuf rev = STRBUF_INIT;
diff --git a/submodule.h b/submodule.h
index cbe5c1726..8a3771ec6 100644
--- a/submodule.h
+++ b/submodule.h
@@ -1,6 +1,7 @@
 #ifndef SUBMODULE_H
 #define SUBMODULE_H
 
+struct repository;
 struct diff_options;
 struct argv_array;
 struct oid_array;
@@ -46,6 +47,7 @@ int option_parse_recurse_submodules_worktree_updater(const 
struct option *opt,
 const char *arg, int 
unset);
 void load_submodule_cache(void);
 extern void gitmodules_config(void);
+extern void repo_read_gitmodules(struct repository *repo);
 extern void gitmodules_config_sha1(const unsigned char *commit_sha1);
 extern int is_submodule_initialized(const char *path);
 /*
-- 
2.13.1.611.g7e3b11ae1-goog



[PATCH v3 04/20] repository: introduce the repository object

2017-06-20 Thread Brandon Williams
Introduce the repository object 'struct repository' which can be used to
hold all state pertaining to a git repository.

Some of the benefits of object-ifying a repository are:

  1. Make the code base more readable and easier to reason about.

  2. Allow for working on multiple repositories, specifically
 submodules, within the same process.  Currently the process for
 working on a submodule involves setting up an argv_array of options
 for a particular command and then launching a child process to
 execute the command in the context of the submodule.  This is
 clunky and can require lots of little hacks in order to ensure
 correctness.  Ideally it would be nice to simply pass a repository
 and an options struct to a command.

  3. Eliminating reliance on global state will make it easier to
 enable the use of threading to improve performance.

Signed-off-by: Brandon Williams 
---
 Makefile |   1 +
 repository.c | 172 +++
 repository.h |  46 
 3 files changed, 219 insertions(+)
 create mode 100644 repository.c
 create mode 100644 repository.h

diff --git a/Makefile b/Makefile
index f48480163..32e4efc71 100644
--- a/Makefile
+++ b/Makefile
@@ -839,6 +839,7 @@ LIB_OBJS += refs/ref-cache.o
 LIB_OBJS += ref-filter.o
 LIB_OBJS += remote.o
 LIB_OBJS += replace_object.o
+LIB_OBJS += repository.o
 LIB_OBJS += rerere.o
 LIB_OBJS += resolve-undo.o
 LIB_OBJS += revision.o
diff --git a/repository.c b/repository.c
new file mode 100644
index 0..7c8a7728c
--- /dev/null
+++ b/repository.c
@@ -0,0 +1,172 @@
+#include "cache.h"
+#include "repository.h"
+
+/* The main repository */
+static struct repository the_repo;
+struct repository *the_repository = _repo;
+
+static char *git_path_from_env(const char *envvar, const char *git_dir,
+  const char *path, int fromenv)
+{
+   if (fromenv) {
+   const char *value = getenv(envvar);
+   if (value)
+   return xstrdup(value);
+   }
+
+   return xstrfmt("%s/%s", git_dir, path);
+}
+
+static int find_common_dir(struct strbuf *sb, const char *gitdir, int fromenv)
+{
+   if (fromenv) {
+   const char *value = getenv(GIT_COMMON_DIR_ENVIRONMENT);
+   if (value) {
+   strbuf_addstr(sb, value);
+   return 1;
+   }
+   }
+
+   return get_common_dir_noenv(sb, gitdir);
+}
+
+/* called after setting gitdir */
+static void repo_setup_env(struct repository *repo)
+{
+   struct strbuf sb = STRBUF_INIT;
+
+   if (!repo->gitdir)
+   BUG("gitdir wasn't set before setting up the environment");
+
+   repo->different_commondir = find_common_dir(, repo->gitdir,
+   !repo->ignore_env);
+   repo->commondir = strbuf_detach(, NULL);
+   repo->objectdir = git_path_from_env(DB_ENVIRONMENT, repo->commondir,
+   "objects", !repo->ignore_env);
+   repo->graft_file = git_path_from_env(GRAFT_ENVIRONMENT, repo->commondir,
+"info/grafts", !repo->ignore_env);
+   repo->index_file = git_path_from_env(INDEX_ENVIRONMENT, repo->gitdir,
+"index", !repo->ignore_env);
+}
+
+void repo_set_gitdir(struct repository *repo, const char *path)
+{
+   const char *gitfile = read_gitfile(path);
+
+   /*
+* NEEDSWORK: Eventually we want to be able to free gitdir and the rest
+* of the environment before reinitializing it again, but we have some
+* crazy code paths where we try to set gitdir with the current gitdir
+* and we don't want to free gitdir before copying the passed in value.
+*/
+   repo->gitdir = xstrdup(gitfile ? gitfile : path);
+
+   repo_setup_env(repo);
+}
+
+/*
+ * Attempt to resolve and set the provided 'gitdir' for repository 'repo'.
+ * Return 0 upon success and a non-zero value upon failure.
+ */
+static int repo_init_gitdir(struct repository *repo, const char *gitdir)
+{
+   int ret = 0;
+   int error = 0;
+   char *abspath = NULL;
+   char *suspect = NULL;
+   const char *resolved_gitdir;
+
+   abspath = real_pathdup(gitdir, 0);
+   if (!abspath) {
+   ret = -1;
+   goto out;
+   }
+
+   /* First assume 'gitdir' references the gitdir directly */
+   resolved_gitdir = resolve_gitdir_gently(abspath, );
+   /* otherwise; try 'gitdir'.git */
+   if (!resolved_gitdir) {
+   suspect = xstrfmt("%s/.git", abspath);
+   resolved_gitdir = resolve_gitdir_gently(suspect, );
+   if (!resolved_gitdir) {
+   ret = -1;
+   goto out;
+   }
+   }
+
+   repo_set_gitdir(repo, resolved_gitdir);
+

[PATCH v3 11/20] path: construct correct path to a worktree's index

2017-06-20 Thread Brandon Williams
When working with worktrees the git directory is split into two part,
the per-worktree gitdir and a commondir which contains things which are
shared among all worktrees (like the object store).  With this notion of
having a split git directory, 557bd833b (git_path(): be aware of file
relocation in $GIT_DIR) and c7b3a3d2f ($GIT_COMMON_DIR: a new
environment variable) changed the way that 'git_path()' functioned so
that paths would be adjusted if they referred to files or directories
that are stored in the commondir or have been changed via an environment
variable.

One interesting problem with this is the index file as it is not shared
across worktrees yet when asking for a path to a specific worktree's
index it will be replaced with a path to the current worktree's index.
In order to prevent this, teach 'adjuct_git_path' to replace the
path to the index with the path recorded in a repository (which would be
the current, active worktree) only when not asked to construct a path
for a specific worktree.

Signed-off-by: Brandon Williams 
---
 path.c | 11 +--
 1 file changed, 9 insertions(+), 2 deletions(-)

diff --git a/path.c b/path.c
index 76a872297..c6832a30e 100644
--- a/path.c
+++ b/path.c
@@ -372,13 +372,20 @@ void report_linked_checkout_garbage(void)
 }
 
 static void adjust_git_path(const struct repository *repo,
+   const struct worktree *wt,
struct strbuf *buf, int git_dir_len)
 {
const char *base = buf->buf + git_dir_len;
if (is_dir_file(base, "info", "grafts"))
strbuf_splice(buf, 0, buf->len,
  repo->graft_file, strlen(repo->graft_file));
-   else if (!strcmp(base, "index"))
+   /*
+* Only try to replace the path '$gitdir/index' with the index file
+* recorded in the repository when not constructing a path for a
+* worktree.  This way we can retrieve the correct path to a particular
+* worktree's index file.
+*/
+   else if (!wt && !strcmp(base, "index"))
strbuf_splice(buf, 0, buf->len,
  repo->index_file, strlen(repo->index_file));
else if (dir_prefix(base, "objects"))
@@ -411,7 +418,7 @@ static void do_git_path(const struct repository *repo,
strbuf_addch(buf, '/');
gitdir_len = buf->len;
strbuf_vaddf(buf, fmt, args);
-   adjust_git_path(repo, buf, gitdir_len);
+   adjust_git_path(repo, wt, buf, gitdir_len);
strbuf_cleanup_path(buf);
 }
 
-- 
2.13.1.611.g7e3b11ae1-goog



[PATCH v3 06/20] environment: store worktree in the_repository

2017-06-20 Thread Brandon Williams
Migrate 'work_tree' to be stored in 'the_repository'.

Signed-off-by: Brandon Williams 
---
 environment.c | 9 -
 1 file changed, 4 insertions(+), 5 deletions(-)

diff --git a/environment.c b/environment.c
index aa79ef83e..3fd4b1084 100644
--- a/environment.c
+++ b/environment.c
@@ -96,7 +96,6 @@ int ignore_untracked_cache_config;
 
 /* This is set by setup_git_dir_gently() and/or git_default_config() */
 char *git_work_tree_cfg;
-static char *work_tree;
 
 static const char *namespace;
 
@@ -223,19 +222,19 @@ void set_git_work_tree(const char *new_work_tree)
 {
if (git_work_tree_initialized) {
new_work_tree = real_path(new_work_tree);
-   if (strcmp(new_work_tree, work_tree))
+   if (strcmp(new_work_tree, the_repository->worktree))
die("internal error: work tree has already been set\n"
"Current worktree: %s\nNew worktree: %s",
-   work_tree, new_work_tree);
+   the_repository->worktree, new_work_tree);
return;
}
git_work_tree_initialized = 1;
-   work_tree = real_pathdup(new_work_tree, 1);
+   repo_set_worktree(the_repository, new_work_tree);
 }
 
 const char *get_git_work_tree(void)
 {
-   return work_tree;
+   return the_repository->worktree;
 }
 
 char *get_object_directory(void)
-- 
2.13.1.611.g7e3b11ae1-goog



[PATCH v3 05/20] environment: place key repository state in the_repository

2017-06-20 Thread Brandon Williams
Migrate 'git_dir', 'git_common_dir', 'git_object_dir', 'git_index_file',
'git_graft_file', and 'namespace' to be stored in 'the_repository'.

Signed-off-by: Brandon Williams 
---
 cache.h   |  1 -
 environment.c | 58 +-
 path.c| 11 ++-
 setup.c   | 17 +++--
 4 files changed, 34 insertions(+), 53 deletions(-)

diff --git a/cache.h b/cache.h
index 7c81749a9..cd64cbc81 100644
--- a/cache.h
+++ b/cache.h
@@ -771,7 +771,6 @@ extern int core_apply_sparse_checkout;
 extern int precomposed_unicode;
 extern int protect_hfs;
 extern int protect_ntfs;
-extern int git_db_env, git_index_env, git_graft_env, git_common_dir_env;
 
 /*
  * Include broken refs in all ref iterations, which will
diff --git a/environment.c b/environment.c
index e035f6372..aa79ef83e 100644
--- a/environment.c
+++ b/environment.c
@@ -8,6 +8,7 @@
  * are.
  */
 #include "cache.h"
+#include "repository.h"
 #include "config.h"
 #include "refs.h"
 #include "fmt-merge-msg.h"
@@ -101,10 +102,6 @@ static const char *namespace;
 
 static const char *super_prefix;
 
-static const char *git_dir, *git_common_dir;
-static char *git_object_dir, *git_index_file, *git_graft_file;
-int git_db_env, git_index_env, git_graft_env, git_common_dir_env;
-
 /*
  * Repository-local GIT_* environment variables; see cache.h for details.
  */
@@ -148,41 +145,11 @@ static char *expand_namespace(const char *raw_namespace)
return strbuf_detach(, NULL);
 }
 
-static char *git_path_from_env(const char *envvar, const char *git_dir,
-  const char *path, int *fromenv)
-{
-   const char *value = getenv(envvar);
-   if (!value)
-   return xstrfmt("%s/%s", git_dir, path);
-   if (fromenv)
-   *fromenv = 1;
-   return xstrdup(value);
-}
-
 void setup_git_env(void)
 {
-   struct strbuf sb = STRBUF_INIT;
-   const char *gitfile;
const char *shallow_file;
const char *replace_ref_base;
 
-   git_dir = getenv(GIT_DIR_ENVIRONMENT);
-   if (!git_dir) {
-   if (!startup_info->have_repository)
-   BUG("setup_git_env called without repository");
-   git_dir = DEFAULT_GIT_DIR_ENVIRONMENT;
-   }
-   gitfile = read_gitfile(git_dir);
-   git_dir = xstrdup(gitfile ? gitfile : git_dir);
-   if (get_common_dir(, git_dir))
-   git_common_dir_env = 1;
-   git_common_dir = strbuf_detach(, NULL);
-   git_object_dir = git_path_from_env(DB_ENVIRONMENT, git_common_dir,
-  "objects", _db_env);
-   git_index_file = git_path_from_env(INDEX_ENVIRONMENT, git_dir,
-  "index", _index_env);
-   git_graft_file = git_path_from_env(GRAFT_ENVIRONMENT, git_common_dir,
-  "info/grafts", _graft_env);
if (getenv(NO_REPLACE_OBJECTS_ENVIRONMENT))
check_replace_refs = 0;
replace_ref_base = getenv(GIT_REPLACE_REF_BASE_ENVIRONMENT);
@@ -203,21 +170,21 @@ int is_bare_repository(void)
 int have_git_dir(void)
 {
return startup_info->have_repository
-   || git_dir;
+   || the_repository->gitdir;
 }
 
 const char *get_git_dir(void)
 {
-   if (!git_dir)
+   if (!the_repository->gitdir)
BUG("git environment hasn't been setup");
-   return git_dir;
+   return the_repository->gitdir;
 }
 
 const char *get_git_common_dir(void)
 {
-   if (!git_dir)
+   if (!the_repository->commondir)
BUG("git environment hasn't been setup");
-   return git_common_dir;
+   return the_repository->commondir;
 }
 
 const char *get_git_namespace(void)
@@ -273,9 +240,9 @@ const char *get_git_work_tree(void)
 
 char *get_object_directory(void)
 {
-   if (!git_object_dir)
+   if (!the_repository->objectdir)
BUG("git environment hasn't been setup");
-   return git_object_dir;
+   return the_repository->objectdir;
 }
 
 int odb_mkstemp(struct strbuf *template, const char *pattern)
@@ -313,22 +280,23 @@ int odb_pack_keep(const char *name)
 
 char *get_index_file(void)
 {
-   if (!git_index_file)
+   if (!the_repository->index_file)
BUG("git environment hasn't been setup");
-   return git_index_file;
+   return the_repository->index_file;
 }
 
 char *get_graft_file(void)
 {
-   if (!git_graft_file)
+   if (!the_repository->graft_file)
BUG("git environment hasn't been setup");
-   return git_graft_file;
+   return the_repository->graft_file;
 }
 
 int set_git_dir(const char *path)
 {
if (setenv(GIT_DIR_ENVIRONMENT, path, 1))
return error("Could not set GIT_DIR to '%s'", path);
+   repo_set_gitdir(the_repository, path);
setup_git_env();
return 0;
 }
diff --git a/path.c b/path.c
index 

[PATCH v3 02/20] setup: add comment indicating a hack

2017-06-20 Thread Brandon Williams
'GIT_TOPLEVEL_PREFIX_ENVIRONMENT' was added in (b58a68c1c setup: allow
for prefix to be passed to git commands) to aid in fixing a bug where
'ls-files' and 'grep' were not able to properly recurse when called from
within a subdirectory.  Add a 'NEEDSWORK' comment indicating that this
envvar should be removed once 'ls-files' and 'grep' can recurse
in-process.

Signed-off-by: Brandon Williams 
---
 setup.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/setup.c b/setup.c
index 24a738b0d..b477faa44 100644
--- a/setup.c
+++ b/setup.c
@@ -1079,6 +1079,12 @@ const char *setup_git_directory_gently(int *nongit_ok)
die("BUG: unhandled setup_git_directory_1() result");
}
 
+   /*
+* NEEDSWORK: This was a hack in order to get ls-files and grep to have
+* properly formated output when recursing submodules.  Once ls-files
+* and grep have been changed to perform this recursing in-process this
+* needs to be removed.
+*/
env_prefix = getenv(GIT_TOPLEVEL_PREFIX_ENVIRONMENT);
if (env_prefix)
prefix = env_prefix;
-- 
2.13.1.611.g7e3b11ae1-goog



[PATCH v3 03/20] environment: remove namespace_len variable

2017-06-20 Thread Brandon Williams
Use 'skip_prefix' instead of 'starts_with' so that we can drop the need
to keep around 'namespace_len'.

Signed-off-by: Brandon Williams 
---
 environment.c | 9 -
 1 file changed, 4 insertions(+), 5 deletions(-)

diff --git a/environment.c b/environment.c
index a73b08f5d..e035f6372 100644
--- a/environment.c
+++ b/environment.c
@@ -98,7 +98,6 @@ char *git_work_tree_cfg;
 static char *work_tree;
 
 static const char *namespace;
-static size_t namespace_len;
 
 static const char *super_prefix;
 
@@ -190,7 +189,6 @@ void setup_git_env(void)
git_replace_ref_base = xstrdup(replace_ref_base ? replace_ref_base
  : "refs/replace/");
namespace = expand_namespace(getenv(GIT_NAMESPACE_ENVIRONMENT));
-   namespace_len = strlen(namespace);
shallow_file = getenv(GIT_SHALLOW_FILE_ENVIRONMENT);
if (shallow_file)
set_alternate_shallow_file(shallow_file, 0);
@@ -231,9 +229,10 @@ const char *get_git_namespace(void)
 
 const char *strip_namespace(const char *namespaced_ref)
 {
-   if (!starts_with(namespaced_ref, get_git_namespace()))
-   return NULL;
-   return namespaced_ref + namespace_len;
+   const char *out;
+   if (skip_prefix(namespaced_ref, get_git_namespace(), ))
+   return out;
+   return NULL;
 }
 
 const char *get_super_prefix(void)
-- 
2.13.1.611.g7e3b11ae1-goog



[PATCH v3 01/20] setup: don't perform lazy initialization of repository state

2017-06-20 Thread Brandon Williams
Under some circumstances (bogus GIT_DIR value or the discovered gitdir
is '.git') 'setup_git_directory()' won't initialize key repository
state.  This leads to inconsistent state after running the setup code.
To account for this inconsistent state, lazy initialization is done once
a caller asks for the repository's gitdir or some other piece of
repository state.  This is confusing and can be error prone.

Instead let's tighten the expected outcome of 'setup_git_directory()'
and ensure that it initializes repository state in all cases that would
have been handled by lazy initialization.

This also lets us drop the requirement to have 'have_git_dir()' check if
the environment variable GIT_DIR was set as that will be handled by the
end of the setup code.

Signed-off-by: Brandon Williams 
---
 cache.h   |  2 ++
 environment.c | 17 -
 setup.c   | 14 ++
 3 files changed, 24 insertions(+), 9 deletions(-)

diff --git a/cache.h b/cache.h
index 96055c222..7c81749a9 100644
--- a/cache.h
+++ b/cache.h
@@ -462,6 +462,8 @@ static inline enum object_type object_type(unsigned int 
mode)
  */
 extern const char * const local_repo_env[];
 
+extern void setup_git_env(void);
+
 /*
  * Returns true iff we have a configured git repository (either via
  * setup_git_directory, or in the environment via $GIT_DIR).
diff --git a/environment.c b/environment.c
index d40b21fb7..a73b08f5d 100644
--- a/environment.c
+++ b/environment.c
@@ -160,7 +160,7 @@ static char *git_path_from_env(const char *envvar, const 
char *git_dir,
return xstrdup(value);
 }
 
-static void setup_git_env(void)
+void setup_git_env(void)
 {
struct strbuf sb = STRBUF_INIT;
const char *gitfile;
@@ -205,28 +205,27 @@ int is_bare_repository(void)
 int have_git_dir(void)
 {
return startup_info->have_repository
-   || git_dir
-   || getenv(GIT_DIR_ENVIRONMENT);
+   || git_dir;
 }
 
 const char *get_git_dir(void)
 {
if (!git_dir)
-   setup_git_env();
+   BUG("git environment hasn't been setup");
return git_dir;
 }
 
 const char *get_git_common_dir(void)
 {
if (!git_dir)
-   setup_git_env();
+   BUG("git environment hasn't been setup");
return git_common_dir;
 }
 
 const char *get_git_namespace(void)
 {
if (!namespace)
-   setup_git_env();
+   BUG("git environment hasn't been setup");
return namespace;
 }
 
@@ -276,7 +275,7 @@ const char *get_git_work_tree(void)
 char *get_object_directory(void)
 {
if (!git_object_dir)
-   setup_git_env();
+   BUG("git environment hasn't been setup");
return git_object_dir;
 }
 
@@ -316,14 +315,14 @@ int odb_pack_keep(const char *name)
 char *get_index_file(void)
 {
if (!git_index_file)
-   setup_git_env();
+   BUG("git environment hasn't been setup");
return git_index_file;
 }
 
 char *get_graft_file(void)
 {
if (!git_graft_file)
-   setup_git_env();
+   BUG("git environment hasn't been setup");
return git_graft_file;
 }
 
diff --git a/setup.c b/setup.c
index 358fbc2e5..24a738b0d 100644
--- a/setup.c
+++ b/setup.c
@@ -1091,6 +1091,20 @@ const char *setup_git_directory_gently(int *nongit_ok)
startup_info->have_repository = !nongit_ok || !*nongit_ok;
startup_info->prefix = prefix;
 
+   /*
+* Not all paths through the setup code will call 'set_git_dir()' (which
+* directly sets up the environment) so in order to guarantee that the
+* environment is in a consistent state after setup, explicitly setup
+* the environment if we have a repository.
+*
+* NEEDSWORK: currently we allow bogus GIT_DIR values to be set in some
+* code paths so we also need to explicitly setup the environment if
+* the user has set GIT_DIR.  It may be beneficial to disallow bogus
+* GIT_DIR values at some point in the future.
+*/
+   if (startup_info->have_repository || getenv(GIT_DIR_ENVIRONMENT))
+   setup_git_env();
+
strbuf_release();
strbuf_release();
 
-- 
2.13.1.611.g7e3b11ae1-goog



[PATCH v3 00/20] repository object

2017-06-20 Thread Brandon Williams
I decided to split up the original series into three parts in order to make
review a little bit easier.  As such this series is dependent on on
'bw/config-h' and 'bw/ls-files-sans-the-index' which should be moving into
'next' soon.  Due to this I was able to greatly shrink this series in terms of
number of patches so hopefully it is a little easier to review.

As before you can find this series at:
https://github.com/bmwill/git/tree/repository-object

Chagnes in v3:

 * Once v2 was merged to 'pu' a bug was discovered in how the 'do_git_path'
   function handled worktree's.  v3 addresses this by ensuring to only replace
   'gitdir/index' with 'repository->index_file' when not operating specifically
   on a 'struct worktree'.

 * Added a bunch of path related functions to facilitate construction of paths
   into a repository's gitdir or in a repository's worktree.

 * Added an additional initialization function to allow initializing a 'struct
   repository' as a submodule of another 'struct repository'.

Brandon Williams (20):
  setup: don't perform lazy initialization of repository state
  setup: add comment indicating a hack
  environment: remove namespace_len variable
  repository: introduce the repository object
  environment: place key repository state in the_repository
  environment: store worktree in the_repository
  path: create path.h
  path: always pass in commondir to update_common_dir
  path: convert strbuf_git_common_path to take a 'struct repository'
  path: convert do_git_path to take a 'struct repository'
  path: construct correct path to a worktree's index
  path: add repo_git_path and strbuf_repo_git_path
  path: add repo_worktree_path and strbuf_repo_worktree_path
  config: read config from a repository object
  repository: add index_state to struct repo
  submodule-config: store the_submodule_cache in the_repository
  submodule: add repo_read_gitmodules
  submodule: convert is_submodule_initialized to work on a repository
  repository: enable initialization of submodules
  ls-files: use repository object

 Makefile   |   1 +
 builtin/grep.c |   3 +-
 builtin/ls-files.c | 195 +++---
 builtin/submodule--helper.c|   9 +-
 cache.h|  62 +
 config.c   | 216 +++--
 config.h   |  24 
 environment.c  |  91 
 git.c  |   2 +-
 path.c | 136 ++
 path.h |  82 +++
 repository.c   | 245 +
 repository.h   |  76 ++
 setup.c|  33 +
 submodule-config.c |  70 +++---
 submodule-config.h |  10 ++
 submodule.c|  35 +++--
 submodule.h|   4 +-
 t/t3007-ls-files-recurse-submodules.sh |  39 ++
 worktree.c |   3 +-
 20 files changed, 962 insertions(+), 374 deletions(-)
 create mode 100644 path.h
 create mode 100644 repository.c
 create mode 100644 repository.h

-- 
2.13.1.611.g7e3b11ae1-goog



Re: [RFC/PATCH] submodules: overhaul documentation

2017-06-20 Thread Stefan Beller
On Tue, Jun 20, 2017 at 11:18 AM, Jonathan Tan  wrote:
>> +DESCRIPTION
>> +---
>> +
>> +A submodule is another Git repository tracked in a subdirectory of your
>> +repository. The tracked repository has its own history, which does not
>> +interfere with the history of the current repository.
>> +
>> +Submodules are composed from a so-called `gitlink` tree entry
>> +in the main repository that refers to a particular commit object
>> +within the inner repository.
>> +
>> +Additionally to the gitlink entry the `.gitmodules` file (see
>> +linkgit:gitmodules[5]) at the root of the source tree contains
>> +information needed for submodules. The only required information
>> +is the path setting, which estabishes a logical name for the submodule.
>
> I know that this was copied over, but this is confusing to me, so it
> might be worthwhile to change it. In particular, `gitlink` is, as far as
> I know, the type of the child object,

correct

> not any sort of tree entry. I

Well a tree references (a) other trees (b) blobs or (c) gitlinks, so calling
a gitlink a tree-entry is correct, but maybe confusing.

> would rewrite this as:
>
> A submodule consists of a tracking subdirectory and an entry in the
> `.gitmodules` file (see linkgit:gitmodules[5]).

A submodule consists of a tracking subdirectory [in the working directory]
and an entry in the `.gitmodules` file (see linkgit:gitmodules[5]).

> The tracking subdirectory appears in the main repository as a
> `gitlink` object (instead of a `tree` object). The parent of the
> tracking subdirectory links to this `gitlink` object through its
> hash, just like linking to a `tree` or `blob`. This `gitlink` object
> contains the hash of a particular commit object of the submodule.
>

I think this is confusing, too. :)
That is because a subdirectory exists on the FS, whereas the gitlink
appears in gits representation of the world (in the tree). Maybe:

The tracking subdirectory appears in the main repository at
the point where it is tracked via the gitlink in the tree.
It is empty when the submodule is not populated, otherwise
it contains the content of the submodule repository.

The gitlink object contains the hash of a particular commit
object of the submodule.

The .gitmodules file establishes a relationship between the
path, which is where the gitlink is in the tree, and the logical
name, which is used for the location of the submodules git
directory. The .gitmodules file has the same syntax as the
$Git_DIR/config file. The relationship mapping of path to name
is done via setting submodule..path = .

The submodules git directory is found in in the main repositories
'$GIT_DIR/modules/' or inside the tracking subdirectory,
but this is deprecated.

> The entry in the `.gitmodules` file establishes the name of the
> submodule (to be used as a reference by other entries and commands)
> and references the tracking subdirectory as follows:
>
> submodule.my_submodule_name.path = path/to/my_submodule
>
> There might also be a need to mention that when the submodule is
> populated (or whatever the right term is), the tracking subdirectory in
> the working tree contains both the submodule's working tree and the
> submodule's Git directory.

See the alternative proposal above, the git directory is best kept outside
the tracking subdirectory, but rather contained inside the superprojects
git dir itself.

>
>> +The usual git configuration (see linkgit:git-config[1]) can be used to
>> +override settings given by the `.gitmodules` file.
>
> Not sure if this is relevant here.

This is relevant for overwriting e.g. the submodule.NAME.url setting.

>
>> +Submodules can be used for two different use cases:
>
> A creative person might come up with more, so this might be better as:
>
> Submodules can be used for at least these use cases:

ok.

>> +1. Using another project that stands on its own.
>> +  When you want to use a third party library, submodules allow you to
>> +  have a clean history for your own project as well as for the library.
>> +  This also allows for updating the third party library as needed.
>
> Probably better as:
>
> 1. Using another project while maintaining independent history.
> Submodules allow you to contain the working tree of another project
> within your own working tree while keeping the history of both
> projects separate. Also, since submodules are fixed to a hash, the

"fixed to an arbitrary version" instead?

> other project can be independently developed without affecting the
> parent project, allowing the parent project to fix itself to new
> versions only whenever desired.

>> +2. Artificially split a (logically single) project into multiple
>> +   repositories and tying them back together. This can be used to
>> +   overcome deficiences in the data model of Git, such as:
>

Re: [PATCH] die routine: change recursion limit from 1 to 1024

2017-06-20 Thread Jeff King
On Tue, Jun 20, 2017 at 08:49:59PM +0200, Ævar Arnfjörð Bjarmason wrote:

> As you can see the third most common case is that we needlessly print
> out the warning, i.e. we have only one error anyway, but we can't
> guarantee that, so it probably makes sense to emit it.

Right, my suggestion actually doesn't help much with the multi-threaded
case. The warning is pointless but it will still be racily shown,
because we can't tell the difference between races and recursion. So
it's better, but fundamentally still pretty ugly. And we'd want to
actually fix the real problem.

> To reply to your 20170620161514.ygbflanx4pldc...@sigill.intra.peff.net
> downthread here (where you talk about setting up a custom die handler
> for grep) yeah that would make sense, but as long as we're supplying
> this default behavior (and not outlawing using it with pthreads) it
> makes sense to get out of our own way with this recursion detection.

I actually was more or less proposing that we consider stock die (no
custom handler) with pthreads to be outlawed. It sometimes kind of does
the right thing, but in a racy way. Probably threads calling die()
should consider their error-handling a bit more carefully.

That said, in my other mail I arrived at "just put a mutex into die()"
which I think would remove the raciness, and give the default behavior
of "the first thread to call die will take down the whole process and
get its single error printed". That actually seems pretty reasonable.
And it makes the "recursion or race?" question go away.

I'm not quite sure how to implement it, though. I think you'd want to
take the lock in die() itself, because it would make the is-recursing
check all the way to the exit an atomic unit. But who is responsible for
unlocking then?  The point of die_routine() is that it never returns.
That's fine if it truly calls exit(), but the threaded die handler in
run-command.c does a pthread_exit(). So _somebody_ has to unlock that so
other threads don't block if they try to die().

What a mess.

> I think my patch (possibly with the fixup above, depending on what we
> think about dupe warnings) is just fine to fix this. This is already a
> super-rare edge case in grep, and to the extent that it would be a
> problem for anyone it's because our paranoid recursion detector totally
> hides the error, I don't think it's worth worrying about us printing a
> few dupe error messages under threading for something that almost never
> happens.
> 
> At least, that's starting to go beyond my bothering to hack on it :)

Yes, I think your patch with my suggestion above is a strict improvement
over what's there. You'd see the warning any time you might have seen
"die is recursing", but at least you _also_ get to see the real error.

-Peff


Re: [PATCH] die routine: change recursion limit from 1 to 1024

2017-06-20 Thread Ævar Arnfjörð Bjarmason

On Tue, Jun 20 2017, Jeff King jotted:

> On Mon, Jun 19, 2017 at 10:00:36PM +, Ævar Arnfjörð Bjarmason wrote:
>
>> Change the recursion limit for the default die routine from a *very*
>> low 1 to 1024. This ensures that infinite recursions are broken, but
>> doesn't lose error messages.
>>
>> The intent of the existing code, as explained in commit
>> cd163d4b4e ("usage.c: detect recursion in die routines and bail out
>> immediately", 2012-11-14), is to break infinite recursion in cases
>> where the die routine itself dies.
>
> I agree that was the original intent, but I think it also does something
> else. Anytime die() recurses, even a single level, we're going to cover
> up the original failure with the one that happened inside die(), which
> is almost certainly the less interesting of the two.
>
> E.g., if I
>
>   die_errno("unable to open %s", filename);
>
> and then the die handler calls malloc() and fails, you'd much rather see
> that first message than "out of memory".
>
> To be fair, "die handler is recursing" is _also_ not helpful, but at
> least it's clear that this is a bug (and IMHO it should be marked with
> BUG()). Saying "out of memory" tells you about the second error, but it
> doesn't tell you that we've masked the first error. So it may lead to
> more confusion in the long run.
>
> I wonder if we can get the best of both, though. Can we make the logic
> more like:
>
>   if (!dying) {
>   /* ok, normal */
>   return 0;
>   } else if (dying < 1024) {
>   /* only show the warning once */
>   if (dying == 1)
>   warning("I heard you liked errors, so I put a die() in your 
> die()");
>   return 0; /* don't bail yet */
>   } else {
>   BUG("recursion detected in die handler");
>   }

If I understand you correctly this on top:

diff --git a/usage.c b/usage.c
index 1c198d4882..f6d5af2bb4 100644
--- a/usage.c
+++ b/usage.c
@@ -46,7 +46,19 @@ static int die_is_recursing_builtin(void)
static int dying;
static int recursion_limit = 1024;

-   return dying++ > recursion_limit;
+   dying++;
+
+   if (!dying) {
+   /* ok, normal */
+   return 0;
+   } else if (dying < recursion_limit) {
+   /* only show the warning once */
+   if (dying == 1)
+   warning("die() called many times. Recursion error or 
racy threaded death!");
+   return 0; /* don't bail yet */
+   } else {
+   return 1;
+   }
 }

 /* If we are in a dlopen()ed .so write to a global variable would segfault

Will yield this over 1000 runs, i.e. mostly this works and we emit the
warning (although sometimes we miss it, and we might even emit it twice
or more due to an extra race condition we have now):

$ (for i in {1..1000}; do ~/g/git/git-grep -P --threads=10 
'(*LIMIT_RECURSION=1)(*LIMIT_MATCH=1)-?-?-?---$' 2>&1|perl -pe 's/^fatal: 
r.*/R/; s/^fatal: p.*/P/; s/^warning.*/W/' | tr '\n' ' '; echo; done)|sort|uniq 
-c|sort -nr|head -n 20
245 W P P
222 W P P P
212 W P
 47 P W P
 36 W P P P P
 35 P P P
 35 P P
 30 P W P P
 16 P P W
 14 W W P P
 12 P P W P
 11 W W P P P
 11 P
  8 W P W P P
  8 P P P P
  7 W P P P P P
  7 P W P P P
  6 W P W P
  4 P W P W
  3 W W P P W

I think it makes sense to apply that on top, even though we could print
more than 1 warning here it makes sense to alert the user that we're in
the middle of some racy death, it explains the multiple lines of output
they'll probably (but not always!) get.

As you can see the third most common case is that we needlessly print
out the warning, i.e. we have only one error anyway, but we can't
guarantee that, so it probably makes sense to emit it.

To reply to your 20170620161514.ygbflanx4pldc...@sigill.intra.peff.net
downthread here (where you talk about setting up a custom die handler
for grep) yeah that would make sense, but as long as we're supplying
this default behavior (and not outlawing using it with pthreads) it
makes sense to get out of our own way with this recursion detection.

I think my patch (possibly with the fixup above, depending on what we
think about dupe warnings) is just fine to fix this. This is already a
super-rare edge case in grep, and to the extent that it would be a
problem for anyone it's because our paranoid recursion detector totally
hides the error, I don't think it's worth worrying about us printing a
few dupe error messages under threading for something that almost never
happens.

At least, that's starting to go beyond my bothering to hack on it :)

>> Now, git-grep could make use of the pluggable error facility added in
>> commit c19a490e37 ("usage: allow pluggable die-recursion checks",
>> 2013-04-16).
>
> Yeah, I think this is a bug in git-grep and should be fixed, independent
> of this 

Re: Transform log message during migration svn -> git (using git-svn)

2017-06-20 Thread Andreas Heiduk

Am 20.06.2017 um 14:32 schrieb paul.mat...@s4m.com:
> Well this is a possibility, of course. Our problem is that our SVN
> repository contains about 220.000 revisions currently. As a colleague of
> mine said that the command you suggest might take about 4 seconds per
> revision, it would take about 10 days to do this for our whole repository.
> So of course it could save a lot of time generally if such operation could
> be done immediately during git-svn.

My data point is this: A "git filter branch" run with ~2000 revisions
took several hours on a Windows 7 box. That number seems to be roughly
the same as your number. A comparable run on a Linux box took only about
10 minutes.

So: If your benchmark was done on Windows you might do that also on Linux.


Re: [GSoC][PATCH 4/6] submodule: port submodule subcommand status

2017-06-20 Thread Brandon Williams
On 06/20, Prathamesh Chavan wrote:
> The mechanism used for porting submodule subcommand 'status'
> is similar to that used for subcommand 'foreach'.

nit: since 'foreach' is stalled atm it may be confusing to reference
that change when it hasn't been merged in yet.

> The function cmd_status from git-submodule is ported to three
> functions in the builtin submodule--helper namely: module_status,
> for_each_submodule_list and status_submodule.
> 
> print_status is also introduced for handling the output of
> the subcommand and also to reduce the code size.
> 
> Mentored-by: Christian Couder 
> Mentored-by: Stefan Beller 
> Signed-off-by: Prathamesh Chavan 

-- 
Brandon Williams


Re: [PATCHv2] Tweak help auto-correct phrasing.

2017-06-20 Thread Kaartic Sivaraam
On Tue, 2016-12-20 at 09:02 -0500, Marc Branchaud wrote:
> When auto-correct is enabled, an invalid git command prints a warning
> and
> a continuation message, which differs depending on whether or not
> help.autoCorrect is positive or negative.
> 
> With help.autoCorrect = 15:
> 
>    WARNING: You called a Git command named 'lgo', which does not
> exist.
>    Continuing under the assumption that you meant 'log'
>    in 1.5 seconds automatically...
> 
> With help.autoCorrect < 0:
> 
>    WARNING: You called a Git command named 'lgo', which does not
> exist.
>    Continuing under the assumption that you meant 'log'
> 
> The continuation message's phrasing is awkward.  This commit cleans
> it up.
> As a bonus, we now use full-sentence strings which make translation
> easier.
> 
> With help.autoCorrect = 15:
> 
>    WARNING: You called a Git command named 'lgo', which does not
> exist.
>    Continuing in 1.5 seconds, assuming that you meant 'log'.
> 
> With help.autoCorrect < 0:
> 
>    WARNING: You called a Git command named 'lgo', which does not
> exist.
>    Continuing under the assumption that you meant 'log'.
> 
> Signed-off-by: Marc Branchaud 
> ---
> 
> Writing the commit message was more work than the commit!  :)
> 
>   M.
> 
>  help.c | 18 --
>  1 file changed, 12 insertions(+), 6 deletions(-)
> 
> diff --git a/help.c b/help.c
> index 53e2a67e00..fc56aa2d76 100644
> --- a/help.c
> +++ b/help.c
> @@ -381,12 +381,18 @@ const char *help_unknown_cmd(const char *cmd)
>   clean_cmdnames(_cmds);
>   fprintf_ln(stderr,
>      _("WARNING: You called a Git command
> named '%s', "
> -  "which does not exist.\n"
> -  "Continuing under the assumption that
> you meant '%s'"),
> - cmd, assumed);
> - if (autocorrect > 0) {
> - fprintf_ln(stderr, _("in %0.1f seconds
> automatically..."),
> - (float)autocorrect/10.0);
> +  "which does not exist."),
> +    cmd);
> + if (autocorrect < 0)
> + fprintf_ln(stderr,
> +    _("Continuing under the
> assumption that "
> +  "you meant '%s'."),
> +    assumed);
> + else {
> + fprintf_ln(stderr,
> +    _("Continuing in %0.1f seconds, "
> +  "assuming that you meant
> '%s'."),
> +    (float)autocorrect/10.0,
> assumed);
>   sleep_millisec(autocorrect * 100);
>   }
>   return assumed;
Excuse me bringing this up after a long time. Was this patch applied?
What's it's status?

-- 
Regards,
Kaartic Sivaraam 



Re: [PATCH/RFC] Cleanup Documentation

2017-06-20 Thread Kaartic Sivaraam
On Tue, 2017-06-20 at 09:57 -0700, Stefan Beller wrote:
> canonical: "according to recognized rules or scientific laws."
> sounds about right. :)
> 
I actually used as I found it to have the meaning of "conforming to
orthodox or recognized rules" :)


> While this was just reflowed and not newly introduced, I am still
> left wondering what a changeset is in Git terms. Our
> Documentation/glossary says:
> 
>   [[def_changeset]]changeset::
>   BitKeeper/cvsps speak for "<>". Since Git does
> not store changes, but states, it really does not make sense to use
> the term "changesets" with Git.
> 
> Maybe we should say instead:
> 
> If exists and is already a valid Git repository,
> then this is staged for commit without cloning.
> 
Does seem to be a good change to make. Done.

On Tue, 2017-06-20 at 10:05 -0700, Stefan Beller wrote:
> > On Tue, Jun 20, 2017 at 9:57 AM, Stefan Beller 
> > wrote:
> > > 
> > > With or without this nit addressed, this patch looks good to me,
> > > 
> > 
> > Well actually not quite. The subject (and commit message) is very
> > vague,
> > maybe:
> 
> > Documentation/git-submodule: cleanup "add" section
> > 
> > The "add" section for 'git-submodule' is redundant in its
> > description and
> > the short synopsis line. Remove the redundant mentioning of the
> > 'repository' argument being mandatory.
> > 
> > The text is hard to read because of back-references, so remove
> > those.
> > 
> > Replace the word "humanish" by "canonical" as that conveys
> better
> > what
> > we do to guess the path.
> > 
> > While at it, quote all occurrences of '.gitmodules' as that is
> an
> > important
> > file in the submodule context, also link to it on its first
> > mention.
> > (This paragraph is not exactly what happens in the commit, but
> I
> > wrote it
> > as a way how I would write commit messages. It shows the reader
> > how
> > you addressed the given problem, with the quantifiers "all"
> "the
> > first" showing
> > what you think is important, and that you deliberately omitted
> > others)
> 
Made this change too.


-- 
Regards,
Kaartic Sivaraam 


Re: [GSoC][PATCH 2/6] submodule--helper: introduce get_submodule_displaypath and for_each_submodule_list

2017-06-20 Thread Brandon Williams
On 06/20, Prathamesh Chavan wrote:
> Functions get_submodule_displaypath and for_each_submodule_list
> for using them in the later patches, related to porting submodule
> subcommands from shell to C.
> These new functions are also used in ported submodule subcommand
> init

I didn't see anything wrong with these patches, but one small nit is
that this one patch is changing two different things, adding
'get_submodule_displaypath' and 'for_each_submodule_list'.  Logically
you could break this patch into two different parts, first introducing
one and then the other.

I'm not saying you need to re-do this patch though (I don't have a super
strong opinion though others might) just wanted to point it out for the
future.

> 
> Mentored-by: Christian Couder 
> Mentored-by: Stefan Beller 
> Signed-off-by: Prathamesh Chavan 
> ---
>  builtin/submodule--helper.c | 69 
> -
>  1 file changed, 50 insertions(+), 19 deletions(-)
> 
> diff --git a/builtin/submodule--helper.c b/builtin/submodule--helper.c
> index 8cc648d85..f7adca95b 100644
> --- a/builtin/submodule--helper.c
> +++ b/builtin/submodule--helper.c
> @@ -13,6 +13,9 @@
>  #include "refs.h"
>  #include "connect.h"
>  
> +typedef void (*submodule_list_func_t)(const struct cache_entry *list_item,
> +   void *cb_data);
> +
>  static char *get_default_remote(void)
>  {
>   char *dest = NULL, *ret;
> @@ -219,6 +222,27 @@ static int resolve_relative_url_test(int argc, const 
> char **argv, const char *pr
>   return 0;
>  }
>  
> +static char *get_submodule_displaypath(const char *path, const char *prefix)
> +{
> + const char *super_prefix = get_super_prefix();
> +
> + if (prefix && super_prefix) {
> + BUG("cannot have prefix '%s' and superprefix '%s'",
> + prefix, super_prefix);
> + } else if (prefix) {
> + struct strbuf sb = STRBUF_INIT;
> + char *displaypath = xstrdup(relative_path(path, prefix, ));
> + strbuf_release();
> + return displaypath;
> + } else if (super_prefix) {
> + int len = strlen(super_prefix);
> + const char *format = is_dir_sep(super_prefix[len-1]) ? "%s%s" : 
> "%s/%s";
> + return xstrfmt(format, super_prefix, path);
> + } else {
> + return xstrdup(path);
> + }
> +}
> +
>  struct module_list {
>   const struct cache_entry **entries;
>   int alloc, nr;
> @@ -330,26 +354,30 @@ static int module_list(int argc, const char **argv, 
> const char *prefix)
>   return 0;
>  }
>  
> -static void init_submodule(const char *path, const char *prefix, int quiet)
> +static void for_each_submodule_list(const struct module_list list,
> + submodule_list_func_t fn, void *cb_data)
>  {
> + int i;
> + for (i = 0; i < list.nr; i++)
> + fn(list.entries[i], cb_data);
> +}
> +
> +struct init_cb {
> + const char *prefix;
> + unsigned int quiet: 1;
> +};
> +#define INIT_CB_INIT { NULL, 0 }
> +
> +static void init_submodule(const struct cache_entry *list_item, void 
> *cb_data)
> +{
> + struct init_cb *info = cb_data;
>   const struct submodule *sub;
>   struct strbuf sb = STRBUF_INIT;
>   char *upd = NULL, *url = NULL, *displaypath;
>  
> - /* Only loads from .gitmodules, no overlay with .git/config */
> - gitmodules_config();
> -
> - if (prefix && get_super_prefix())
> - die("BUG: cannot have prefix and superprefix");
> - else if (prefix)
> - displaypath = xstrdup(relative_path(path, prefix, ));
> - else if (get_super_prefix()) {
> - strbuf_addf(, "%s%s", get_super_prefix(), path);
> - displaypath = strbuf_detach(, NULL);
> - } else
> - displaypath = xstrdup(path);
> + displaypath = get_submodule_displaypath(list_item->name, info->prefix);
>  
> - sub = submodule_from_path(null_sha1, path);
> + sub = submodule_from_path(null_sha1, list_item->name);
>  
>   if (!sub)
>   die(_("No url found for submodule path '%s' in .gitmodules"),
> @@ -361,7 +389,7 @@ static void init_submodule(const char *path, const char 
> *prefix, int quiet)
>*
>* Set active flag for the submodule being initialized
>*/
> - if (!is_submodule_initialized(path)) {
> + if (!is_submodule_initialized(list_item->name)) {
>   strbuf_reset();
>   strbuf_addf(, "submodule.%s.active", sub->name);
>   git_config_set_gently(sb.buf, "true");
> @@ -404,7 +432,7 @@ static void init_submodule(const char *path, const char 
> *prefix, int quiet)
>   if (git_config_set_gently(sb.buf, url))
>   die(_("Failed to register url for submodule path '%s'"),
>   displaypath);
> - if (!quiet)
> + 

Re: [PATCHv2] Tweak help auto-correct phrasing.

2017-06-20 Thread Marc Branchaud

On 2017-06-20 02:04 PM, Kaartic Sivaraam wrote:

On Tue, 2016-12-20 at 09:02 -0500, Marc Branchaud wrote:

When auto-correct is enabled, an invalid git command prints a warning
and
a continuation message, which differs depending on whether or not
help.autoCorrect is positive or negative.

With help.autoCorrect = 15:

WARNING: You called a Git command named 'lgo', which does not
exist.
Continuing under the assumption that you meant 'log'
in 1.5 seconds automatically...

With help.autoCorrect < 0:

WARNING: You called a Git command named 'lgo', which does not
exist.
Continuing under the assumption that you meant 'log'

The continuation message's phrasing is awkward.  This commit cleans
it up.
As a bonus, we now use full-sentence strings which make translation
easier.

With help.autoCorrect = 15:

WARNING: You called a Git command named 'lgo', which does not
exist.
Continuing in 1.5 seconds, assuming that you meant 'log'.

With help.autoCorrect < 0:

WARNING: You called a Git command named 'lgo', which does not
exist.
Continuing under the assumption that you meant 'log'.

Signed-off-by: Marc Branchaud 
---


Excuse me for bringing this up after a long time. What's the status of
this patch? Was it applied?


Looks like it got lost in the shuffle.

The topic thread starts at:
http://public-inbox.org/git/1482063500.10858.1.ca...@gmail.com/

There's no reply to my v2 patch, and I neglected to follow up on it -- 
sorry!


Shall I resend the patch?

M.



Re: [RFC/PATCH] submodules: overhaul documentation

2017-06-20 Thread Jonathan Tan
On Wed,  7 Jun 2017 11:53:54 -0700
Stefan Beller  wrote:

[snip]

> +DESCRIPTION
> +---
> +
> +A submodule is another Git repository tracked in a subdirectory of your
> +repository. The tracked repository has its own history, which does not
> +interfere with the history of the current repository.
> +
> +Submodules are composed from a so-called `gitlink` tree entry
> +in the main repository that refers to a particular commit object
> +within the inner repository.
> +
> +Additionally to the gitlink entry the `.gitmodules` file (see
> +linkgit:gitmodules[5]) at the root of the source tree contains
> +information needed for submodules. The only required information
> +is the path setting, which estabishes a logical name for the submodule.

I know that this was copied over, but this is confusing to me, so it
might be worthwhile to change it. In particular, `gitlink` is, as far as
I know, the type of the child object, not any sort of tree entry. I
would rewrite this as:

A submodule consists of a tracking subdirectory and an entry in the
`.gitmodules` file (see linkgit:gitmodules[5]).

The tracking subdirectory appears in the main repository as a
`gitlink` object (instead of a `tree` object). The parent of the
tracking subdirectory links to this `gitlink` object through its
hash, just like linking to a `tree` or `blob`. This `gitlink` object
contains the hash of a particular commit object of the submodule.

The entry in the `.gitmodules` file establishes the name of the
submodule (to be used as a reference by other entries and commands)
and references the tracking subdirectory as follows:

submodule.my_submodule_name.path = path/to/my_submodule

There might also be a need to mention that when the submodule is
populated (or whatever the right term is), the tracking subdirectory in
the working tree contains both the submodule's working tree and the
submodule's Git directory.

> +The usual git configuration (see linkgit:git-config[1]) can be used to
> +override settings given by the `.gitmodules` file.

Not sure if this is relevant here.

> +Submodules can be used for two different use cases:

A creative person might come up with more, so this might be better as:

Submodules can be used for at least these use cases:

> +1. Using another project that stands on its own.
> +  When you want to use a third party library, submodules allow you to
> +  have a clean history for your own project as well as for the library.
> +  This also allows for updating the third party library as needed.

Probably better as:

1. Using another project while maintaining independent history.
Submodules allow you to contain the working tree of another project
within your own working tree while keeping the history of both
projects separate. Also, since submodules are fixed to a hash, the
other project can be independently developed without affecting the
parent project, allowing the parent project to fix itself to new
versions only whenever desired.

> +2. Artificially split a (logically single) project into multiple
> +   repositories and tying them back together. This can be used to
> +   overcome deficiences in the data model of Git, such as:

This should match the gerund used in point 1:

2. Splitting a (logically single) project into multiple
repositories. This can be used to overcome deficiencies in the data
model of Git, such as:

> +* To have finer grained access control.
> +  The design principles of Git do not allow for partial repositories to be
> +  checked out or transferred. A repository is the smallest unit that a user
> +  can be given access to. Submodules are separate repositories, such that
> +  you can restrict access to parts of your project via the use of submodules.

Not sure about this point - if the project is logically single, you
would probably need to see the entire project. If this is about
different teams independently working on different subcomponents, this
seems more like point 1 (inclusion of other projects).

> +* In its current form Git scales up poorly for very large repositories that
> +  change a lot, as the history grows very large. For that you may want to 
> look
> +  at shallow clone, sparse checkout, or git-LFS.
> +  However you can also use submodules to e.g. hold large binary assets
> +  and these repositories are then shallowly cloned such that you do not
> +  have a large history locally.
> +
> +The data model
> +--
> +
> +A submodule can be considered its own autonomous repository, that has a
> +worktree and a git directory at a different place than the superproject.

Isn't the worktree inside the superproject's worktree? I would write
this as:

A submodule is its own repository, having its own working tree and
Git directory.

> +The superproject only records the commit object name in its tree, such that
> +any other information, e.g. where to obtain a copy 

Re: [PATCHv2] Tweak help auto-correct phrasing.

2017-06-20 Thread Kaartic Sivaraam
On Tue, 2016-12-20 at 09:02 -0500, Marc Branchaud wrote:
> When auto-correct is enabled, an invalid git command prints a warning
> and
> a continuation message, which differs depending on whether or not
> help.autoCorrect is positive or negative.
> 
> With help.autoCorrect = 15:
> 
>    WARNING: You called a Git command named 'lgo', which does not
> exist.
>    Continuing under the assumption that you meant 'log'
>    in 1.5 seconds automatically...
> 
> With help.autoCorrect < 0:
> 
>    WARNING: You called a Git command named 'lgo', which does not
> exist.
>    Continuing under the assumption that you meant 'log'
> 
> The continuation message's phrasing is awkward.  This commit cleans
> it up.
> As a bonus, we now use full-sentence strings which make translation
> easier.
> 
> With help.autoCorrect = 15:
> 
>    WARNING: You called a Git command named 'lgo', which does not
> exist.
>    Continuing in 1.5 seconds, assuming that you meant 'log'.
> 
> With help.autoCorrect < 0:
> 
>    WARNING: You called a Git command named 'lgo', which does not
> exist.
>    Continuing under the assumption that you meant 'log'.
> 
> Signed-off-by: Marc Branchaud 
> ---
> 
Excuse me for bringing this up after a long time. What's the status of
this patch? Was it applied?

-- 
Regards,
Kaartic Sivaraam 


Re: [GSoC][PATCH 5/6] submodule: port submodule subcommand sync from shell to C

2017-06-20 Thread Stefan Beller
On Mon, Jun 19, 2017 at 2:50 PM, Prathamesh Chavan  wrote:
> The mechanism used for porting the submodule subcommand 'sync' is
> similar to that of 'foreach', where we split the function cmd_sync
> from shell into three functions in C, module_sync,
> for_each_submodule_list and sync_submodule.
>
> print_default_remote is introduced as a submodule--helper
> subcommand for getting the default remote as stdout.
>
> Mentored-by: Christian Couder 
> Mentored-by: Stefan Beller 
> Signed-off-by: Prathamesh Chavan 
> ---

Up to this patch, all other patches look good to me,
here I stumbled upon a small nit.


>  builtin/submodule--helper.c | 180 
> 
>  git-submodule.sh|  56 +-
>  2 files changed, 181 insertions(+), 55 deletions(-)
>
> diff --git a/builtin/submodule--helper.c b/builtin/submodule--helper.c
> index 78b21ab22..e10cac462 100644
> --- a/builtin/submodule--helper.c
> +++ b/builtin/submodule--helper.c
> @@ -43,6 +43,20 @@ static char *get_default_remote(void)
> return ret;
>  }
>
> +static int print_default_remote(int argc, const char **argv, const char 
> *prefix)
> +{
> +   const char *remote;
> +
> +   if (argc != 1)
> +   die(_("submodule--helper print-default-remote takes no 
> arguments"));
> +
> +   remote = get_default_remote();
> +   if (remote)
> +   puts(remote);
> +
> +   return 0;
> +}
> +
>  static int starts_with_dot_slash(const char *str)
>  {
> return str[0] == '.' && is_dir_sep(str[1]);
> @@ -311,6 +325,25 @@ static int print_name_rev(int argc, const char **argv, 
> const char *prefix)
> return 0;
>  }
>
> +static char *get_up_path(const char *path)
> +{
> +   int i = count_slashes(path);
> +   struct strbuf sb = STRBUF_INIT;
> +
> +   while (i--)
> +   strbuf_addstr(, "../");
> +
> +   /*
> +*Check if 'path' ends with slash or not
> +*for having the same output for dir/sub_dir
> +*and dir/sub_dir/
> +*/
> +   if (!is_dir_sep(path[i - 1]))
> +   strbuf_addstr(, "../");
> +
> +   return strbuf_detach(, NULL);
> +}
> +
>  struct module_list {
> const struct cache_entry **entries;
> int alloc, nr;
> @@ -736,6 +769,151 @@ static int module_name(int argc, const char **argv, 
> const char *prefix)
> return 0;
>  }
>
> +struct sync_cb {
> +   const char *prefix;
> +   unsigned int quiet: 1;
> +   unsigned int recursive: 1;
> +};
> +#define SYNC_CB_INIT { NULL, 0, 0 }
> +
> +static void sync_submodule(const struct cache_entry *list_item, void 
> *cb_data)
> +{
> +   struct sync_cb *info = cb_data;
> +   const struct submodule *sub;
> +   char *sub_key, *remote_key;
> +   char *url, *sub_origin_url, *super_config_url, *displaypath;
> +   struct strbuf sb = STRBUF_INIT;
> +   struct child_process cp = CHILD_PROCESS_INIT;
> +
> +   if (!is_submodule_initialized(list_item->name))
> +   return;
> +
> +   sub = submodule_from_path(null_sha1, list_item->name);
> +
> +   if (!sub->url)

'sub' can be NULL as well, which when used to obtain the ->url
will crash. So we'd rather want to have (!sub || !sub->url).

I looked through other use cases, others only need (!sub), so this
thought did not hint at other bugs in the code base.


> +   die(_("no url found for submodule path '%s' in .gitmodules"),
> + list_item->name);
> +
> +   url = xstrdup(sub->url);

Why do we need to duplicate the url here? As we are not modifying it
(read: I did not spot the url modification), we could just use sub->url
instead, saving a variable.


Re: [GSoC][PATCH 1/6] dir: create function count_slashes

2017-06-20 Thread Stefan Beller
On Mon, Jun 19, 2017 at 2:50 PM, Prathamesh Chavan  wrote:
> Similar functions exist in apply.c and builtin/show-branch.c for
> counting the number of slashes in a string. Also in the later
> patches, we introduce a third caller for the same. Hence, we unify
> it now by cleaning the existing functions and declaring a common
> function count_slashes in dir.h and implementing it in dir.c to
> remove this code duplication.
>
> Mentored-by: Christian Couder 
> Mentored-by: Stefan Beller 
> Signed-off-by: Junio C Hamano 
> Signed-off-by: Prathamesh Chavan 
> Signed-off-by: Junio C Hamano 
> ---
> The complete build report of this is available at:
> https://travis-ci.org/pratham-pc/git/builds/
> Branch: All-patch-series
> Build #111

This patch was sent separately a couple days ago, see
the latest "What's cooking in git.git" (Jun 2017, #05; Mon, 19)
https://public-inbox.org/git/xmqqh8zbspm7@gitster.mtv.corp.google.com/

  * pc/dir-count-slashes (2017-06-12) 1 commit
(merged to 'next' on 2017-06-19 at 57351a2771)
   + dir: create function count_slashes()

   Three instances of the same helper function have been consolidated
   to one.

   Will merge to 'master'.

so if you pull Junios git and rebase on top of his master branch
this should be already included there. (In that case there is no need
for you to carry this patch. It is just cumbersome for you and might
confuse Junio, which patches exactly to apply)

Thanks,
Stefan


Re: [PATCH/RFC] Cleanup Documentation

2017-06-20 Thread Stefan Beller
On Tue, Jun 20, 2017 at 9:57 AM, Stefan Beller  wrote:
>
> With or without this nit addressed, this patch looks good to me,
>

Well actually not quite. The subject (and commit message) is very vague,
maybe:

Documentation/git-submodule: cleanup "add" section

The "add" section for 'git-submodule' is redundant in its description and
the short synopsis line. Remove the redundant mentioning of the
'repository' argument being mandatory.

The text is hard to read because of back-references, so remove those.

Replace the word "humanish" by "canonical" as that conveys better what
we do to guess the path.

While at it, quote all occurrences of '.gitmodules' as that is an important
file in the submodule context, also link to it on its first mention.
(This paragraph is not exactly what happens in the commit, but I wrote it
as a way how I would write commit messages. It shows the reader how
you addressed the given problem, with the quantifiers "all" "the
first" showing
what you think is important, and that you deliberately omitted others)


Re: [PATCH/RFC] Cleanup Documentation

2017-06-20 Thread Stefan Beller
On Mon, Jun 19, 2017 at 8:12 PM, Kaartic Sivaraam
 wrote:
> Make following changes to the git-submodule
> documentation:
>
> * Remove redundancy
> * Remove unclear back reference
> * Use more appropriate word
> * Quote important word
>
> Suggestions-by: Stefan Beller 
> Signed-off-by: Kaartic Sivaraam 
> ---
>  Currently used the word "canonical" instead of "humanish". If that word
>  sounds more suitable then this is a [PATCH] and not a [PATCH/RFC].

canonical: "according to recognized rules or scientific laws."
sounds about right. :)

>
>  Documentation/git-submodule.txt | 37 +++--
>  1 file changed, 15 insertions(+), 22 deletions(-)
>
> diff --git a/Documentation/git-submodule.txt b/Documentation/git-submodule.txt
> index 74bc6200d..045fef417 100644
> --- a/Documentation/git-submodule.txt
> +++ b/Documentation/git-submodule.txt
> @@ -63,14 +63,6 @@ add [-b ] [-f|--force] [--name ] 
> [--reference ] [--dep
> to the changeset to be committed next to the current
> project: the current project is termed the "superproject".
>  +
> -This requires at least one argument: . The optional
> -argument  is the relative location for the cloned submodule
> -to exist in the superproject. If  is not given, the
> -"humanish" part of the source repository is used ("repo" for
> -"/path/to/repo.git" and "foo" for "host.xz:foo/.git").
> -The  is also used as the submodule's logical name in its
> -configuration entries unless `--name` is used to specify a logical name.
> -+
>   is the URL of the new submodule's origin repository.
>  This may be either an absolute URL, or (if it begins with ./
>  or ../), the location relative to the superproject's default remote
> @@ -87,21 +79,22 @@ If the superproject doesn't have a default remote 
> configured
>  the superproject is its own authoritative upstream and the current
>  working directory is used instead.
>  +
> - is the relative location for the cloned submodule to
> -exist in the superproject. If  does not exist, then the
> -submodule is created by cloning from the named URL. If  does
> -exist and is already a valid Git repository, then this is added
> -to the changeset without cloning. This second form is provided
> -to ease creating a new submodule from scratch, and presumes
> -the user will later push the submodule to the given URL.
> +The optional argument  is the relative location for the cloned
> +submodule to exist in the superproject. If  is not given, the
> +canonical part of the source repository is used ("repo" for
> +"/path/to/repo.git" and "foo" for "host.xz:foo/.git"). If 
> +exists and is already a valid Git repository, then this is added
> +to the changeset without cloning.

While this was just reflowed and not newly introduced, I am still left
wondering what a changeset is in Git terms. Our Documentation/glossary
says:

  [[def_changeset]]changeset::
  BitKeeper/cvsps speak for "<>". Since Git does not
  store changes, but states, it really does not make sense to use the term
  "changesets" with Git.

Maybe we should say instead:

If exists and is already a valid Git repository,
then this is staged for commit without cloning.




> The  is also used as the
> +submodule's logical name in its configuration entries unless `--name`
> +is used to specify a logical name.
>  +
> -In either case, the given URL is recorded into .gitmodules for
> -use by subsequent users cloning the superproject. If the URL is
> -given relative to the superproject's repository, the presumption
> -is the superproject and submodule repositories will be kept
> -together in the same relative location, and only the
> -superproject's URL needs to be provided: git-submodule will correctly
> -locate the submodule using the relative URL in .gitmodules.
> +The given URL is recorded into `.gitmodules` for use by subsequent users
> +cloning the superproject. If the URL is given relative to the
> +superproject's repository, the presumption is the superproject and
> +submodule repositories will be kept together in the same relative
> +location, and only the superproject's URL needs to be provided.
> +git-submodule will correctly locate the submodule using the relative
> +URL in .gitmodules.
>

With or without this nit addressed, this patch looks good to me,

Thanks,
Stefan


Re: Git Strange behaviour with remote deleted branch

2017-06-20 Thread Jeff King
On Mon, Jun 19, 2017 at 05:46:22PM -0400, Paul Smith wrote:

> On Mon, 2017-06-19 at 23:33 +0200, Reda Lyazidi wrote:
> > with git I noticed when I removed a remote branch with git push origin 
> > --delete 
> > in my clone when I used git branch -a I don't the deleted branch
> > but my colleagues still see it.
> > 
> > I tried with two clones in my PC, with the first one delete branch and 
> > the other still sees it
> > despite git pull .
> 
> Normally Git will not remove a remote's branch refs on a fetch (or
> pull) operation.  This is a safety measure, since you might still have
> a use for that branch (for example to run diff against or similar).
> 
> In order to remove branches that have been deleted in your remote, you
> must use the --prune (or -p) option to git fetch or git pull:
> 
>   git pull -p
> 
> will get rid of them.

This is exactly correct. One further question, though, is "why does
git-push remove the branch?".

It makes sense that in general it tries to update the tracking branches
to match what we know we just sent to the other side. It's just
anticipating the next "git fetch".  But it is a little funny that it
treats deletion differently than fetch would.

I don't recall that behavior being consciously decided. I could see an
argument for the current behavior: the danger is less because the branch
didn't just "go away" upstream, it was something the user in the local
repository did. But I could also see an argument for the other: the
point is to anticipate the next fetch, and it should behave exactly like
fetch.

I don't have a strong opinion either way.

I do hope that one day we would turn on "fetch --prune" by default, but
I'd prefer to see better safety for recovering deleted refs first (e.g.,
if we retained the reflog and you could access origin/deleted@{1}).

-Peff


Re: [PATCH] die routine: change recursion limit from 1 to 1024

2017-06-20 Thread Jeff King
On Tue, Jun 20, 2017 at 11:54:59AM -0400, Jeff King wrote:

> > Now, git-grep could make use of the pluggable error facility added in
> > commit c19a490e37 ("usage: allow pluggable die-recursion checks",
> > 2013-04-16).
> 
> Yeah, I think this is a bug in git-grep and should be fixed, independent
> of this commit. You should be able to use as a template the callbacks
> added by the child of c19a490e37:
> 
>   1ece66bc9 (run-command: use thread-aware die_is_recursing routine,
>   2013-04-16)

To clarify, I think anytime we spawn worker threads that might run die()
we probably need to be installing not just a custom recursion handler
but a custom die function.

It's weird to see:

  $ git grep ...
  fatal: some error
  fatal: some error
  fatal: some error

Or even:

  $ git grep ...
  fatal: some error
  some actual results
  more actual results

I'm not sure what the _right_ thing is there, but it probably involves
recording the per-thread errors in individual buffers, waiting for the
master to pthread_join() them all, and then doing something like:

  /* this also covers the case of only having 1 error */
  int all_errors_identical = 1;
  for (i = 1; i < nr_errors; i++) {
if (strcmp(errors[i], errors[i-1])) {
all_errors_identical = 0;
break;
}
  }

  if (all_errors_identical) {
/* just show it */
die("%s", errors[0]);
  } else {
for (i = 0; i < nr_errors; i++)
error("%s", errors[i]);
die("multiple errors encountered");
  }

I don't know if we'd want to actually get into the details of what
"multiple errors" means. It isn't _all_ of the possible errors, because
each thread stopped running when it hit the die(). But it's also not
just one error.

Actually, I guess we could just pick one error and show only that one.
That would most closely match the non-threaded case. And it's way
simpler than what I wrote above.

Hrm. I guess you could even do that without buffering if you allow the
thread to take down the whole process. The only thing you'd need to do
is teach die() to take a mutex so that we don't racily show multiple
errors.

That seems like the best option (I almost just deleted my entire email,
but maybe the thought process in leading there is useful).

-Peff


Re: [PATCH] die routine: change recursion limit from 1 to 1024

2017-06-20 Thread Jeff King
On Mon, Jun 19, 2017 at 10:00:36PM +, Ævar Arnfjörð Bjarmason wrote:

> Change the recursion limit for the default die routine from a *very*
> low 1 to 1024. This ensures that infinite recursions are broken, but
> doesn't lose error messages.
> 
> The intent of the existing code, as explained in commit
> cd163d4b4e ("usage.c: detect recursion in die routines and bail out
> immediately", 2012-11-14), is to break infinite recursion in cases
> where the die routine itself dies.

I agree that was the original intent, but I think it also does something
else. Anytime die() recurses, even a single level, we're going to cover
up the original failure with the one that happened inside die(), which
is almost certainly the less interesting of the two.

E.g., if I

  die_errno("unable to open %s", filename);

and then the die handler calls malloc() and fails, you'd much rather see
that first message than "out of memory".

To be fair, "die handler is recursing" is _also_ not helpful, but at
least it's clear that this is a bug (and IMHO it should be marked with
BUG()). Saying "out of memory" tells you about the second error, but it
doesn't tell you that we've masked the first error. So it may lead to
more confusion in the long run.

I wonder if we can get the best of both, though. Can we make the logic
more like:

  if (!dying) {
/* ok, normal */
return 0;
  } else if (dying < 1024) {
/* only show the warning once */
if (dying == 1)
warning("I heard you liked errors, so I put a die() in your 
die()");
return 0; /* don't bail yet */
  } else {
BUG("recursion detected in die handler");
  }

> Now, git-grep could make use of the pluggable error facility added in
> commit c19a490e37 ("usage: allow pluggable die-recursion checks",
> 2013-04-16).

Yeah, I think this is a bug in git-grep and should be fixed, independent
of this commit. You should be able to use as a template the callbacks
added by the child of c19a490e37:

  1ece66bc9 (run-command: use thread-aware die_is_recursing routine,
  2013-04-16)

-Peff


Re: Transform log message during migration svn -> git (using git-svn)

2017-06-20 Thread Jeff King
On Tue, Jun 20, 2017 at 02:46:22PM +0200, Lars Schneider wrote:

> 
> > On 20 Jun 2017, at 14:32,   wrote:
> > 
> > Well this is a possibility, of course. Our problem is that our SVN
> > repository contains about 220.000 revisions currently. As a colleague of
> > mine said that the command you suggest might take about 4 seconds per
> > revision, it would take about 10 days to do this for our whole repository.
> > So of course it could save a lot of time generally if such operation could
> > be done immediately during git-svn.
> 
> You colleague is most likely correct. I suggested it as this is a one time
> operation and therefore still somewhat practical from my point of view.

I didn't follow this whole thread, but I happened to see this bit. I
think the command in question is:

  git filter-branch -f --msg-filter 'perl -lape "s/^T(\d+)/#\$1/"'

I know filter-branch is slow, but a msg-filter should be relatively
fast.  I'd be surprised at 4 seconds per revision (the main cost is
kicking off a new perl process per revision). It's more like 120/sec on
my machine.

However, I think the fastest way would be to do it with fast-export,
where you can just tweak the stream as it flows through:

  # set up a new repo to hold the results; we won't bother
  # copying the blobs, so just point at the current repo as an
  # alternate.
  git init fixed-repo
  echo "../../../.git/objects" >fixed-repo/.git/objects/info/alternates

  git fast-export --no-data --all |
  perl -ne '
# look for "data" chunks which contain the commit message
if (/^data (\d+)/) {
read STDIN, my $buf, $1;
$buf =~ s/^T(\d+)/#$1/;
print "data ", length($buf), "\n";
print $buf;
} else {
print;
}
  ' |
  git -C fixed-repo fast-import

That runs at about 3600 commits/sec on my machine.

Most of that time goes to doing a tree diff on each commit. Technically
that is not required for your use case, but I don't think there's a way
to get fast-export to skip that (and it's an inherent part of the
fast-import stream). It's probably fast enough, but it's possible that
a specialized tool like BFG repo cleaner[1] could do better (I don't
know offhand if it handles commit message rewrites or not).

-Peff

[1] https://rtyley.github.io/bfg-repo-cleaner/


Re: [PATCH 1/3] Contextually notify user about an initial commit

2017-06-20 Thread Ævar Arnfjörð Bjarmason

On Tue, Jun 20 2017, Kaartic Sivaraam jotted:

> Thanks for trying to help. A few comments regarding your suggestions.
>
>> * Let's do that spacing fix (unrelated fix) in its own commit.
>>
> Ok. That's a good point.
>
>> * You should add tests along with the code being changed, and
>> especially change tests that would fail with your new code,
>> otherwise you break bisection.
>>
> I'm not sure if the convention here is to keep tests along with change
> it tests. I thought it would be better to separate them as they are
> "separate" changes. They're separate because in case of any issues with
> the test it should be possible to identify them separately. This isn't
> possible when they are added along with the change.
>
> Further, adding them as a separate change change (commit) immediately
> after the change it tests would ease the task of finding the change it
> tests.
>
> If the convention here, really, is to add tests along with the
> change,I can't do anything but to follow it. I guess it should be
> mentioned somewhere in the Documentation/SubmittingPatches. AFAIK, I
> don't think it's mentioned there.

Right now 1/3 breaks the test suite. That's a big no-no, any given
commit should not break the test suite to not break bisecting.

But aside from that the general pattern we follow is that if code
behavior changes, tests for that new behavior go in the same commit.

>> * I think the commit message here could be shorter & clearer at the
>> same time.
> I don't think it's unclear. It's longer for the reason that follows.
> The commit message was crafted based on the following guideline found
> in Documentation/SubmittingPatches
>
>> The body should provide a meaningful commit message, which:
>>
>>  . explains the problem the change tries to solve, i.e. what is
>> wrong with the current code without the change.
>>
>>  . justifies the way the change solves the problem, i.e. why the
>> result with the change is better.
>>
>>  . alternate solutions considered but discarded, if any.
>>
> I don't think the new commit message follows this. Both the commit
> messages are found below for comparison.
>
> Old one
> ---
>> On Tue, Jun 20 2017, Kaartic Sivaraam jotted:
>>
>>  "git status" indicated "Initial commit" when HEAD points at
>>  an unborn branch.This message is shared with the commit
>>  log template "git commit" prepares for the user when
>>  creating a commit (i.e. "You are about to create the initial
>>  commit"), and is OK as long as the reader is aware of the
>>  nature of the message (i.e. it guides the user working
>>  toward the next commit), but was confusing to new users,
>>  especially the ones who do "git commit -m message" without
>>  having a chance to pay attention to the commit log template.
>>
>>  The "Initial commit" indication wasn't an issue in the commit
>>  template. Taking that into consideration, a good solution would
>>  be to contextually use different messages to indicate the user
>>  that there were no commits in this branch.
>>
>>  A few alternatives considered were,
>>
>>  * Waiting for initial commit
>>  * Your current branch does not have any commits
>>  * Current branch waiting for initial commit
>>
>>  The most succint one, "No commits yet", among the alternatives
>>  was chosen.
>>
>>  Helped-by: Junio C Hamano 
>>  Signed-off-by: Kaartic Sivaraam 
>>
>
> New one
> ---
>>  status: contextually notify user about an initial commit
>>
>> Change the output of "status" to say "No commits yet" when "git
>>  retaining the current "Initial commit" message displayed in the
>>  template that's displayed in the editor when the initial commit is
>>  being authored.
>>
>> The existing "Initial commit" message makes sense for the commit
>> template where we're making the initial commit, but is confusing
>>  when merely checking the status of a fresh repository without
>>  having any commits yet.
>>
>> Helped-by: Junio C Hamano 
>> Signed-off-by: Kaartic Sivaraam 
>> Signed-off-by: Ævar Arnfjörð Bjarmason 
>>
>
>> * The commit message doesn't follow our usual format.
> Could you be more specific about where it's not following the format?

Regarding the format: I was referring to the 'prefix the first line with
"area: "' part of SubmittingPatches, sorry for the brevity. I.e. your
--oneline output just yields "Contextually notify user about an initial
commit" but should be "status: ".

For the rest of the changes I made: I just found the current version
hard to read, most of the first paragraph is one big 8-line sentence,
after reading it over a few times and understanding what it was doing I
thought I'd rewrite it as a shorter version that was (hopefully) easier
to understand.

I dropped the "alternatives considered" because while it follows the
SubmittingPatches guidelines as-is, given other patches I've seen what
we eventually ended up picking for 

Re: "There are too many unreachable loose objects" - why don't we run 'git prune' automatically?

2017-06-20 Thread Jeff King
On Sun, Jun 18, 2017 at 03:22:29PM +0200, Lars Schneider wrote:

> > To be honest, the fact that we have to write this warning at all is a
> > sign that Git is not doing a very good job. The best place to spend
> > effort would be to teach git-gc to pack all of the unreachable objects
> > into a single "cruft" pack, so this problem doesn't happen at all (and
> > it's way more efficient, as well).
> > 
> > The big problem with that approach is that we lose individual-object
> > timestamps. Each object just gets the timestamp of its surrounding pack,
> > so as we continually ran auto-gc, the cruft-pack timestamp would get
> > updated and we'd never drop objects. So we'd need some auxiliary file
> > (e.g., pack-1234abcd.times) that stores the per-object timestamps. This
> > can be as small as a 32- or 64-bit int per object, since we can just
> > index it by the existing object list in the pack .idx.
> 
> Why can't we generate a new cruft-pack on every gc run that detects too
> many unreachable objects? That would not be as efficient as a single
> cruft-pack but it should be way more efficient than the individual
> objects, no?
> 
> Plus, chances are that the existing cruft-packs are purged with the next
> gc run anyways.

Interesting idea. Here are some thoughts in random order.

That loses some delta opportunities between the cruft packs, but that's
certainly no worse than the all-loose storage we have today.

One nice aspect is that it means cruft objects don't incur any I/O cost
during a repack.

It doesn't really solve the "too many loose objects after gc" problem.
It just punts it to "too many packs after gc". This is likely to be
better because the number of packs would scale with the number of gc
runs, rather than the number of crufty objects. But there would still be
corner cases if you run gc frequently. Maybe that would be acceptable.

I'm not sure how the pruning process would work, especially with respect
to objects reachable from other unreachable-but-recent objects. Right
now the repack-and-delete procedure is done by git-repack, and is
basically:

  1. Get a list of all of the current packs.

  2. Ask pack-objects to pack everything into a new pack. Normally this
 is reachable objects, but we also include recent objects and
 objects reachable from recent objects. And of course with "-k" all
 objects are kept.

  3. Delete everything in the list from (1), under the assumption that
 anything worth keeping was repacked in step (2), and anything else
 is OK to drop.

So if there are regular packs and cruft packs, we'd have to know in step
3 which are which. We'd delete the regular ones, whose objects have all
been migrated to the new pack (either a "real" one or a cruft one), but
keep the crufty ones whose timestamps are still fresh.

That's a small change, and works except for one thing: the reachable
from recent objects. You can't just delete a whole cruft pack. Some of
its objects may be reachable from objects in other cruft packs that
we're keeping. In other words, you have cruft packs where you want to
keep half of the objects they contain. How do you do that?

I think you'd have to make pack-objects aware of the concept of cruft
packs, and that it should include reachable-from-recent objects in the
new pack only if they're in a cruft pack that is going to be deleted. So
those objects would be "rescued" from the cruft pack before it goes away
and migrated to the new cruft pack. That would effectively refresh their
timestamp, but that's fine. They're reachable from objects with that
fresh timestamp already, so effectively they couldn't be deleted until
that timestamp is hit.

So I think it's do-able, but it is a little complicated.

-Peff


Re: [RFC/PATCH v4 00/49] Add initial experimental external ODB support

2017-06-20 Thread Christian Couder
On Tue, Jun 20, 2017 at 9:54 AM, Christian Couder
 wrote:
>
> Future work
> ~~~
>
> First sorry about the state of this patch series, it is not as clean
> as I would have liked, butI think it is interesting to get feedback
> from the mailing list at this point, because the previous RFC was sent
> a long time ago and a lot of things changed.
>
> So a big part of the future work will be about cleaning this patch series.
>
> Other things I think I am going to do:
>
>   -

Ooops, I had not save my emacs buffer where I wrote this when I sent
the patch series.

This should have been:

Other things I think I may work on:

  - Remove the "odb..scriptMode" and "odb..command"
options and instead have just "odb..scriptCommand" and
"odb..subprocessCommand".

  - Use capabilities instead of "odb..fetchKind" to decide
which kind of "get" will be used.

  - Better test all the combinations of the above modes with and
without "have" and "put" instructions.

  - Maybe also have different kinds of "put" so that Git could pass
either a git object a plain object or ask the helper to retreive
it directly from Git's object database.

  - Maybe add an "init" instruction as the script mode has something
like this called "get_cap" and it would help the sub-process mode
too, as it makes it possible for Git to know the capabilities
before trying to send any instruction (that might not be supported
by the helper). The "init" instruction would be the only required
instruction for any helper to implement.

  - Add more long running tests and improve tests in general.


Re: Better usability of stash refs

2017-06-20 Thread Robert Dailey
On Mon, Jun 19, 2017 at 3:56 PM, Jeff King  wrote:
> On Mon, Jun 19, 2017 at 03:32:54PM -0500, Robert Dailey wrote:
>
>> To drop a stash, I have to do this (example):
>>
>> $ git stash drop stash@{3}
>>
>> Using the full "stash@{N}" seems superfluous since the documentation
>> states it must be a stash in the first place. It would make more sense
>> (and be quicker to type) to do:
>>
>> $ git stash drop 3
>>
>> Is there a trick I can use to make this shorthand possible? I thought
>> about creating a "s" script for "stash" that intercepted the
>> parameters for only a couple of stash sub-commands and created the
>> ref, but that seems a lot of work.
>>
>> Any productivity tips here? Thanks in advance.
>
> Junio mentioned that this is already possible. I suspect the problem may
> be that your Git is not recent enough. It was added in a56c8f5aa (stash:
> allow stashes to be referenced by index only, 2016-10-24), which is in
> v2.11.0.
>
> -Peff

Thanks guys. Actually I'm running 2.13, I just haven't tried it since
way before 2.11. I always assumed it wasn't working like I expected
since last time I tried it. Yesterday I just happened to remember that
this would be nice to have, so I wrote the email but didn't bother
testing it on the newest version first. Sorry about that.


Re: [PATCH 1/3] Contextually notify user about an initial commit

2017-06-20 Thread Kaartic Sivaraam
Thanks for trying to help. A few comments regarding your suggestions.

>  * Let's do that spacing fix (unrelated fix) in its own commit.
> 
Ok. That's a good point.

>   * You should add tests along with the code being changed, and
> especially change tests that would fail with your new code,
> otherwise you break bisection.
> 
I'm not sure if the convention here is to keep tests along with change
it tests. I thought it would be better to separate them as they are
"separate" changes. They're separate because in case of any issues with
the test it should be possible to identify them separately. This isn't
possible when they are added along with the change. 

Further, adding them as a separate change change (commit) immediately
after the change it tests would ease the task of finding the change it
tests.

If the convention here, really, is to add tests along with the
change, I can't do anything but to follow it. I guess it should be
mentioned somewhere in the Documentation/SubmittingPatches. AFAIK, I
don't think it's mentioned there.

>  * I think the commit message here could be shorter & clearer at the
>    same time.
I don't think it's unclear. It's longer for the reason that follows.
The commit message was crafted based on the following guideline found
in Documentation/SubmittingPatches

> The body should provide a meaningful commit message, which:
> 
>   . explains the problem the change tries to solve, i.e. what is 
> wrong with the current code without the change.
> 
>   . justifies the way the change solves the problem, i.e. why the
> result with the change is better.
> 
>   . alternate solutions considered but discarded, if any.
> 
I don't think the new commit message follows this. Both the commit
messages are found below for comparison.

Old one
---
> On Tue, Jun 20 2017, Kaartic Sivaraam jotted:
> 
>  "git status" indicated "Initial commit" when HEAD points at
>  an unborn branch.  This message is shared with the commit
>  log template "git commit" prepares for the user when
>  creating a commit (i.e. "You are about to create the initial
>  commit"), and is OK as long as the reader is aware of the
>  nature of the message (i.e. it guides the user working
>  toward the next commit), but was confusing to new users,
>  especially the ones who do "git commit -m message" without
>  having a chance to pay attention to the commit log template.
>  
>  The "Initial commit" indication wasn't an issue in the commit
>  template. Taking that into consideration, a good solution would
>  be to contextually use different messages to indicate the user
>  that there were no commits in this branch.
> 
>  A few alternatives considered were,
>  
>  * Waiting for initial commit
>  * Your current branch does not have any commits
>  * Current branch waiting for initial commit
>  
>  The most succint one, "No commits yet", among the alternatives
>  was chosen.
>  
>  Helped-by: Junio C Hamano 
>  Signed-off-by: Kaartic Sivaraam 
> 

New one
---
>  status: contextually notify user about an initial commit
> 
>  Change the output of "status" to say "No commits yet" when "git
>  retaining the current "Initial commit" message displayed in the
>  template that's displayed in the editor when the initial commit is
>  being authored.
> 
>  The existing "Initial commit" message makes sense for the commit
>  template where we're making the initial commit, but is confusing 
>  when merely checking the status of a fresh repository without 
>  having any commits yet.
> 
>  Helped-by: Junio C Hamano 
>  Signed-off-by: Kaartic Sivaraam 
>  Signed-off-by: Ævar Arnfjörð Bjarmason 
>  

>  * The commit message doesn't follow our usual format.
Could you be more specific about where it's not following the format?

-- 
Regards,
Kaartic Sivaraam 


Re: Transform log message during migration svn -> git (using git-svn)

2017-06-20 Thread Lars Schneider

> On 20 Jun 2017, at 14:32,   wrote:
> 
> Well this is a possibility, of course. Our problem is that our SVN
> repository contains about 220.000 revisions currently. As a colleague of
> mine said that the command you suggest might take about 4 seconds per
> revision, it would take about 10 days to do this for our whole repository.
> So of course it could save a lot of time generally if such operation could
> be done immediately during git-svn.

You colleague is most likely correct. I suggested it as this is a one time
operation and therefore still somewhat practical from my point of view.

If you don't like the solution then you need to change the git-svn code.
Probably here somewhere (I am not familiar with this code):
https://github.com/git/git/blob/master/git-svn.perl#L1836

- Lars

PS: Please don't top post on this mailing list :-)
https://en.wikipedia.org/wiki/Posting_style#Top-posting



> 
> Paul Mattke
> Software Developer
> -
> Arvato Systems S4M GmbH
> Am Coloneum 3
> 50829 Köln
>  
> Phone: +49 221 28555-443
> Fax: +49 221 28555-210
> E-Mail: paul.mat...@s4m.com
> www.s4m.arvato-systems.com
> 
> 
> -Ursprüngliche Nachricht-
> Von: Lars Schneider [mailto:larsxschnei...@gmail.com] 
> Gesendet: Dienstag, 20. Juni 2017 11:32
> An: Mattke, Paul, NMM-BPDD 
> Cc: git@vger.kernel.org
> Betreff: Re: Transform log message during migration svn -> git (using
> git-svn)
> 
> 
>> On 20 Jun 2017, at 09:32, paul.mat...@s4m.com wrote:
>> 
>> Hi there,
>> 
>> this is actually not really a bug report, but much more a feature 
>> request (if I did not oversee an already existing feature like this):
>> 
>> We want to migrate our SVN repository to GIT and will be using git-svn 
>> for that of course. Currently in SVN, all our commit log messages 
>> start either
>> with:
>> 
>> 123456 (a number, representing the Bug Id in our old legacy bug 
>> tracker)
>> 
>> or
>> 
>> T123456 (a number, but prefixed with T, referring a TFS item in this 
>> case)
>> 
>> During conversion to GIT, we want to replace the T in such log 
>> messages with a #, so commits, referring a TFS item will start with
> #123456 in the future.
>> We don’t care about log messages which do not start with a T, only the 
>> TXX messages need to be transformed here.
>> 
>> I guess an operation like this is currently not possible with git-svn, 
>> isn’t it? So it would be nice, if a feature could be implemented that 
>> gives the user the possibility to specify some kind of script file for 
>> example, which transforms the log message in any way we want it.
> 
> You can migrate your repo from SVN to Git as is. Afterwards you can fix up
> the commit messages with the following command:
> 
> git filter-branch -f --msg-filter 'perl -lape "s/^T(\d+)/#\$1/"'
> 
> (this might take a while on a large repo)
> 
> - Lars



AW: Transform log message during migration svn -> git (using git-svn)

2017-06-20 Thread paul.mattke
Well this is a possibility, of course. Our problem is that our SVN
repository contains about 220.000 revisions currently. As a colleague of
mine said that the command you suggest might take about 4 seconds per
revision, it would take about 10 days to do this for our whole repository.
So of course it could save a lot of time generally if such operation could
be done immediately during git-svn.

Paul Mattke
Software Developer
-
Arvato Systems S4M GmbH
Am Coloneum 3
50829 Köln
 
Phone: +49 221 28555-443
Fax: +49 221 28555-210
E-Mail: paul.mat...@s4m.com
www.s4m.arvato-systems.com


-Ursprüngliche Nachricht-
Von: Lars Schneider [mailto:larsxschnei...@gmail.com] 
Gesendet: Dienstag, 20. Juni 2017 11:32
An: Mattke, Paul, NMM-BPDD 
Cc: git@vger.kernel.org
Betreff: Re: Transform log message during migration svn -> git (using
git-svn)


> On 20 Jun 2017, at 09:32, paul.mat...@s4m.com wrote:
> 
> Hi there,
> 
> this is actually not really a bug report, but much more a feature 
> request (if I did not oversee an already existing feature like this):
> 
> We want to migrate our SVN repository to GIT and will be using git-svn 
> for that of course. Currently in SVN, all our commit log messages 
> start either
> with:
> 
> 123456 (a number, representing the Bug Id in our old legacy bug 
> tracker)
> 
> or
> 
> T123456 (a number, but prefixed with T, referring a TFS item in this 
> case)
> 
> During conversion to GIT, we want to replace the T in such log 
> messages with a #, so commits, referring a TFS item will start with
#123456 in the future.
> We don’t care about log messages which do not start with a T, only the 
> TXX messages need to be transformed here.
> 
> I guess an operation like this is currently not possible with git-svn, 
> isn’t it? So it would be nice, if a feature could be implemented that 
> gives the user the possibility to specify some kind of script file for 
> example, which transforms the log message in any way we want it.

You can migrate your repo from SVN to Git as is. Afterwards you can fix up
the commit messages with the following command:

git filter-branch -f --msg-filter 'perl -lape "s/^T(\d+)/#\$1/"'

(this might take a while on a large repo)

- Lars


smime.p7s
Description: S/MIME cryptographic signature


[PATCH] git-p4: changelist template with p4 -G change -o

2017-06-20 Thread Miguel Torroja
The option -G of p4 (python marshal output) gives more context about the
data being output. That's useful when using the command "change -o" as
we can distinguish between warning/error line and real change description.

Some p4 plugin/hooks in the server side generates some warnings when
executed. Unfortunately those messages are mixed with the output of
"p4 change -o". Those extra warning lines are reported as {'code':'info'}
in python marshal output (-G). The real change output is reported as
{'code':'stat'}

Signed-off-by: Miguel Torroja 
---
 git-p4.py | 77 ++-
 1 file changed, 51 insertions(+), 26 deletions(-)

diff --git a/git-p4.py b/git-p4.py
index 8d151da..a300474 100755
--- a/git-p4.py
+++ b/git-p4.py
@@ -1526,37 +1526,62 @@ class P4Submit(Command, P4UserMap):
 
 [upstream, settings] = findUpstreamBranchPoint()
 
-template = ""
+template = """\
+# A Perforce Change Specification.
+#
+#  Change:  The change number. 'new' on a new changelist.
+#  Date:The date this specification was last modified.
+#  Client:  The client on which the changelist was created.  Read-only.
+#  User:The user who created the changelist.
+#  Status:  Either 'pending' or 'submitted'. Read-only.
+#  Type:Either 'public' or 'restricted'. Default is 'public'.
+#  Description: Comments about the changelist.  Required.
+#  Jobs:What opened jobs are to be closed by this changelist.
+#   You may delete jobs from this list.  (New changelists only.)
+#  Files:   What opened files from the default changelist are to be added
+#   to this changelist.  You may delete files from this list.
+#   (New changelists only.)
+"""
+files_list = []
 inFilesSection = False
+change_entry = None
 args = ['change', '-o']
 if changelist:
 args.append(str(changelist))
-
-for line in p4_read_pipe_lines(args):
-if line.endswith("\r\n"):
-line = line[:-2] + "\n"
-if inFilesSection:
-if line.startswith("\t"):
-# path starts and ends with a tab
-path = line[1:]
-lastTab = path.rfind("\t")
-if lastTab != -1:
-path = path[:lastTab]
-if settings.has_key('depot-paths'):
-if not [p for p in settings['depot-paths']
-if p4PathStartsWith(path, p)]:
-continue
-else:
-if not p4PathStartsWith(path, self.depotPath):
-continue
+for entry in p4CmdList(args):
+if not entry.has_key('code'):
+continue
+if entry['code'] == 'stat':
+change_entry = entry
+break
+if not change_entry:
+die('Failed to decode output of p4 change -o')
+for key, value in change_entry.iteritems():
+if key.startswith('File'):
+if settings.has_key('depot-paths'):
+if not [p for p in settings['depot-paths']
+if p4PathStartsWith(value, p)]:
+continue
 else:
-inFilesSection = False
-else:
-if line.startswith("Files:"):
-inFilesSection = True
-
-template += line
-
+if not p4PathStartsWith(value, self.depotPath):
+continue
+files_list.append(value)
+continue
+# Output in the order expected by prepareLogMessage
+for key in ['Change','Client','User','Status','Description','Jobs']:
+if not change_entry.has_key(key):
+continue
+template += '\n'
+template += key + ':'
+if key == 'Description':
+template += '\n'
+for field_line in change_entry[key].splitlines():
+template += '\t'+field_line+'\n'
+if len(files_list) > 0:
+template += '\n'
+template += 'Files:\n'
+for path in files_list:
+template += '\t'+path+'\n'
 return template
 
 def edit_template(self, template_file):
-- 
2.1.4



Re: Behavior of 'git fetch' for commit hashes

2017-06-20 Thread eero.aaltonen
On 20.06.2017 01:26, Jonathan Tan wrote:
> On Mon, 19 Jun 2017 10:49:36 -0700
> Jonathan Tan  wrote:
> 
>> On Mon, 19 Jun 2017 12:09:28 +
>>  wrote:
>>
>>> For version 2.13.3
> 
> Firstly, exactly which version of Git doesn't work? I'm assuming 2.13.1
> (as written elsewhere in your e-mail), since 2.13.3 doesn't exist.

Yes. 2.13.1. I should stick to copy-pasting.

> I tried to reproduce with this script, but it seems to pass even at
> 2.13.1:
> 
> #!/bin/bash
> rm -rf ~/tmp/x &&
> make --quiet &&
> ./git init ~/tmp/x &&
> ./git -C ~/tmp/x fetch --quiet ~/gitpristine/git master:foo &&
> ./git -C ~/tmp/x fetch ~/gitpristine/git "$(git -C ~/gitpristine/git 
> rev-parse master^)"
> exit $?
> 
> Commenting out the first fetch line produces, as expected:
> 
> error: Server does not allow request for unadvertised object 
> 
> And I have not seen the "fatal: Couldn't find remote ref" error you
> describe.

I am now getting the same "unadvertised object" error. The "remote ref"
error it seems was due to missing the last character of the SHA1 :/

Now with that resolved, the "fetch branch; fetch commit" also works just
as before.

> I'll take a look into the expected behavior, but if my assumptions are
> correct, you should be able to just checkout the commit you want after
> fetching the branch:
> 
>   git fetch  
>   git checkout 

This actually produces
fatal: reference is not a tree: 

But I can use the above dual fetch. Seems this problem was more between
the keyboard and chair. I am however glad to hear that literal SHA1s are
becoming fetchable.

Thanks for investigating,
-- 
Eero Aaltonen


Re: Transform log message during migration svn -> git (using git-svn)

2017-06-20 Thread Lars Schneider

> On 20 Jun 2017, at 09:32, paul.mat...@s4m.com wrote:
> 
> Hi there,
> 
> this is actually not really a bug report, but much more a feature request
> (if I did not oversee an already existing feature like this):
> 
> We want to migrate our SVN repository to GIT and will be using git-svn for
> that of course. Currently in SVN, all our commit log messages start either
> with:
> 
> 123456 (a number, representing the Bug Id in our old legacy bug tracker)
> 
> or
> 
> T123456 (a number, but prefixed with T, referring a TFS item in this case)
> 
> During conversion to GIT, we want to replace the T in such log messages with
> a #, so commits, referring a TFS item will start with #123456 in the future.
> We don’t care about log messages which do not start with a T, only the
> TXX messages need to be transformed here.
> 
> I guess an operation like this is currently not possible with git-svn, isn’t
> it? So it would be nice, if a feature could be implemented that gives the
> user the possibility to specify some kind of script file for example, which
> transforms the log message in any way we want it.

You can migrate your repo from SVN to Git as is. Afterwards you can
fix up the commit messages with the following command:

git filter-branch -f --msg-filter 'perl -lape "s/^T(\d+)/#\$1/"'

(this might take a while on a large repo)

- Lars

[RFC/PATCH v4 04/49] Add Git/Packet.pm from parts of t0021/rot13-filter.pl

2017-06-20 Thread Christian Couder
This will make it possible to reuse packet reading and writing
functions in other test scripts.

Signed-off-by: Christian Couder 
---
 perl/Git/Packet.pm | 71 ++
 1 file changed, 71 insertions(+)
 create mode 100644 perl/Git/Packet.pm

diff --git a/perl/Git/Packet.pm b/perl/Git/Packet.pm
new file mode 100644
index 00..aaffecbe2a
--- /dev/null
+++ b/perl/Git/Packet.pm
@@ -0,0 +1,71 @@
+package Git::Packet;
+use 5.008;
+use strict;
+use warnings;
+BEGIN {
+   require Exporter;
+   if ($] < 5.008003) {
+   *import = \::import;
+   } else {
+   # Exporter 5.57 which supports this invocation was
+   # released with perl 5.8.3
+   Exporter->import('import');
+   }
+}
+
+our @EXPORT = qw(
+   packet_bin_read
+   packet_txt_read
+   packet_bin_write
+   packet_txt_write
+   packet_flush
+   );
+our @EXPORT_OK = @EXPORT;
+
+sub packet_bin_read {
+   my $buffer;
+   my $bytes_read = read STDIN, $buffer, 4;
+   if ( $bytes_read == 0 ) {
+   # EOF - Git stopped talking to us!
+   return ( -1, "" );
+   } elsif ( $bytes_read != 4 ) {
+   die "invalid packet: '$buffer'";
+   }
+   my $pkt_size = hex($buffer);
+   if ( $pkt_size == 0 ) {
+   return ( 1, "" );
+   } elsif ( $pkt_size > 4 ) {
+   my $content_size = $pkt_size - 4;
+   $bytes_read = read STDIN, $buffer, $content_size;
+   if ( $bytes_read != $content_size ) {
+   die "invalid packet ($content_size bytes expected; 
$bytes_read bytes read)";
+   }
+   return ( 0, $buffer );
+   } else {
+   die "invalid packet size: $pkt_size";
+   }
+}
+
+sub packet_txt_read {
+   my ( $res, $buf ) = packet_bin_read();
+   unless ( $res == -1 || $buf =~ s/\n$// ) {
+   die "A non-binary line MUST be terminated by an LF.";
+   }
+   return ( $res, $buf );
+}
+
+sub packet_bin_write {
+   my $buf = shift;
+   print STDOUT sprintf( "%04x", length($buf) + 4 );
+   print STDOUT $buf;
+   STDOUT->flush();
+}
+
+sub packet_txt_write {
+   packet_bin_write( $_[0] . "\n" );
+}
+
+sub packet_flush {
+   print STDOUT sprintf( "%04x", 0 );
+   STDOUT->flush();
+}
-- 
2.13.1.565.gbfcd7a9048



[RFC/PATCH v4 07/49] Git/Packet.pm: add packet_initialize()

2017-06-20 Thread Christian Couder
Add a function to initialize the communication. And use this
function in 't/t0021/rot13-filter.pl'.

Signed-off-by: Christian Couder 
---
 perl/Git/Packet.pm  | 13 +
 t/t0021/rot13-filter.pl |  8 +---
 2 files changed, 14 insertions(+), 7 deletions(-)

diff --git a/perl/Git/Packet.pm b/perl/Git/Packet.pm
index 2ad6b00d6c..b0233caf37 100644
--- a/perl/Git/Packet.pm
+++ b/perl/Git/Packet.pm
@@ -19,6 +19,7 @@ our @EXPORT = qw(
packet_bin_write
packet_txt_write
packet_flush
+   packet_initialize
);
 our @EXPORT_OK = @EXPORT;
 
@@ -70,3 +71,15 @@ sub packet_flush {
print STDOUT sprintf( "%04x", 0 );
STDOUT->flush();
 }
+
+sub packet_initialize {
+   my ($name, $version) = @_;
+
+   ( packet_txt_read() eq ( 0, $name . "-client" ) )   || die "bad 
initialize";
+   ( packet_txt_read() eq ( 0, "version=" . $version ) )   || die "bad 
version";
+   ( packet_bin_read() eq ( 1, "" ) )  || die "bad 
version end";
+
+   packet_txt_write( $name . "-server" );
+   packet_txt_write( "version=" . $version );
+   packet_flush();
+}
diff --git a/t/t0021/rot13-filter.pl b/t/t0021/rot13-filter.pl
index 36a9eb3608..5b05518640 100644
--- a/t/t0021/rot13-filter.pl
+++ b/t/t0021/rot13-filter.pl
@@ -40,13 +40,7 @@ sub rot13 {
 print $debug "START\n";
 $debug->flush();
 
-( packet_txt_read() eq ( 0, "git-filter-client" ) ) || die "bad initialize";
-( packet_txt_read() eq ( 0, "version=2" ) ) || die "bad version";
-( packet_bin_read() eq ( 1, "" ) )  || die "bad version end";
-
-packet_txt_write("git-filter-server");
-packet_txt_write("version=2");
-packet_flush();
+packet_initialize("git-filter", 2);
 
 ( packet_txt_read() eq ( 0, "capability=clean" ) )  || die "bad capability";
 ( packet_txt_read() eq ( 0, "capability=smudge" ) ) || die "bad capability";
-- 
2.13.1.565.gbfcd7a9048



[RFC/PATCH v4 08/49] Git/Packet: add capability functions

2017-06-20 Thread Christian Couder
Add functions to help read and write capabilities.
Use these functions in 't/t0021/rot13-filter.pl'.

Signed-off-by: Christian Couder 
---
 perl/Git/Packet.pm  | 33 +
 t/t0021/rot13-filter.pl |  9 ++---
 2 files changed, 35 insertions(+), 7 deletions(-)

diff --git a/perl/Git/Packet.pm b/perl/Git/Packet.pm
index b0233caf37..4443b67724 100644
--- a/perl/Git/Packet.pm
+++ b/perl/Git/Packet.pm
@@ -20,6 +20,9 @@ our @EXPORT = qw(
packet_txt_write
packet_flush
packet_initialize
+   packet_read_capabilities
+   packet_write_capabilities
+   packet_read_and_check_capabilities
);
 our @EXPORT_OK = @EXPORT;
 
@@ -83,3 +86,33 @@ sub packet_initialize {
packet_txt_write( "version=" . $version );
packet_flush();
 }
+
+sub packet_read_capabilities {
+   my @cap;
+   while (1) {
+   my ( $res, $buf ) = packet_bin_read();
+   return ( $res, @cap ) if ( $res != 0 );
+   unless ( $buf =~ s/\n$// ) {
+   die "A non-binary line MUST be terminated by an LF.\n"
+   . "Received: '$buf'";
+   }
+   die "bad capability buf: '$buf'" unless ( $buf =~ 
s/capability=// );
+   push @cap, $buf;
+   }
+}
+
+sub packet_read_and_check_capabilities {
+   my @local_caps = @_;
+   my @remote_res_caps = packet_read_capabilities();
+   my $res = shift @remote_res_caps;
+   my %remote_caps = map { $_ => 1 } @remote_res_caps;
+   foreach (@local_caps) {
+   die "'$_' capability not available" unless 
(exists($remote_caps{$_}));
+   }
+   return $res;
+}
+
+sub packet_write_capabilities {
+   packet_txt_write( "capability=" . $_ ) foreach (@_);
+   packet_flush();
+}
diff --git a/t/t0021/rot13-filter.pl b/t/t0021/rot13-filter.pl
index 5b05518640..bbfd52619d 100644
--- a/t/t0021/rot13-filter.pl
+++ b/t/t0021/rot13-filter.pl
@@ -42,14 +42,9 @@ $debug->flush();
 
 packet_initialize("git-filter", 2);
 
-( packet_txt_read() eq ( 0, "capability=clean" ) )  || die "bad capability";
-( packet_txt_read() eq ( 0, "capability=smudge" ) ) || die "bad capability";
-( packet_bin_read() eq ( 1, "" ) )  || die "bad capability 
end";
+packet_read_and_check_capabilities("clean", "smudge");
+packet_write_capabilities(@capabilities);
 
-foreach (@capabilities) {
-   packet_txt_write( "capability=" . $_ );
-}
-packet_flush();
 print $debug "init handshake complete\n";
 $debug->flush();
 
-- 
2.13.1.565.gbfcd7a9048



[RFC/PATCH v4 10/49] external odb foreach

2017-06-20 Thread Christian Couder
From: Jeff King 

---
 external-odb.c | 14 ++
 external-odb.h |  6 ++
 odb-helper.c   | 15 +++
 odb-helper.h   |  4 
 4 files changed, 39 insertions(+)

diff --git a/external-odb.c b/external-odb.c
index 1ccfa99a01..42978a3298 100644
--- a/external-odb.c
+++ b/external-odb.c
@@ -113,3 +113,17 @@ int external_odb_fetch_object(const unsigned char *sha1)
 
return -1;
 }
+
+int external_odb_for_each_object(each_external_object_fn fn, void *data)
+{
+   struct odb_helper *o;
+
+   external_odb_init();
+
+   for (o = helpers; o; o = o->next) {
+   int r = odb_helper_for_each_object(o, fn, data);
+   if (r)
+   return r;
+   }
+   return 0;
+}
diff --git a/external-odb.h b/external-odb.h
index 2397477684..cea8570a49 100644
--- a/external-odb.h
+++ b/external-odb.h
@@ -5,4 +5,10 @@ const char *external_odb_root(void);
 int external_odb_has_object(const unsigned char *sha1);
 int external_odb_fetch_object(const unsigned char *sha1);
 
+typedef int (*each_external_object_fn)(const unsigned char *sha1,
+  enum object_type type,
+  unsigned long size,
+  void *data);
+int external_odb_for_each_object(each_external_object_fn, void *);
+
 #endif /* EXTERNAL_ODB_H */
diff --git a/odb-helper.c b/odb-helper.c
index de5562da9c..d8ef5cbf4b 100644
--- a/odb-helper.c
+++ b/odb-helper.c
@@ -243,3 +243,18 @@ int odb_helper_fetch_object(struct odb_helper *o, const 
unsigned char *sha1,
 
return 0;
 }
+
+int odb_helper_for_each_object(struct odb_helper *o,
+  each_external_object_fn fn,
+  void *data)
+{
+   int i;
+   for (i = 0; i < o->have_nr; i++) {
+   struct odb_helper_object *obj = >have[i];
+   int r = fn(obj->sha1, obj->type, obj->size, data);
+   if (r)
+   return r;
+   }
+
+   return 0;
+}
diff --git a/odb-helper.h b/odb-helper.h
index 0f704f9452..8c3916d215 100644
--- a/odb-helper.h
+++ b/odb-helper.h
@@ -1,6 +1,8 @@
 #ifndef ODB_HELPER_H
 #define ODB_HELPER_H
 
+#include "external-odb.h"
+
 struct odb_helper {
const char *name;
const char *cmd;
@@ -21,5 +23,7 @@ struct odb_helper *odb_helper_new(const char *name, int 
namelen);
 int odb_helper_has_object(struct odb_helper *o, const unsigned char *sha1);
 int odb_helper_fetch_object(struct odb_helper *o, const unsigned char *sha1,
int fd);
+int odb_helper_for_each_object(struct odb_helper *o,
+  each_external_object_fn, void *);
 
 #endif /* ODB_HELPER_H */
-- 
2.13.1.565.gbfcd7a9048



[RFC/PATCH v4 11/49] t0400: add 'put' command to odb-helper script

2017-06-20 Thread Christian Couder
Signed-off-by: Christian Couder 
---
 t/t0400-external-odb.sh | 23 +++
 1 file changed, 23 insertions(+)

diff --git a/t/t0400-external-odb.sh b/t/t0400-external-odb.sh
index fe85413725..6c6da5cf4f 100755
--- a/t/t0400-external-odb.sh
+++ b/t/t0400-external-odb.sh
@@ -7,6 +7,10 @@ test_description='basic tests for external object databases'
 ALT_SOURCE="$PWD/alt-repo/.git"
 export ALT_SOURCE
 write_script odb-helper <<\EOF
+die() {
+   printf >&2 "%s\n" "$@"
+   exit 1
+}
 GIT_DIR=$ALT_SOURCE; export GIT_DIR
 case "$1" in
 have)
@@ -16,6 +20,16 @@ have)
 get)
cat "$GIT_DIR"/objects/$(echo $2 | sed 's#..#&/#')
;;
+put)
+   sha1="$2"
+   size="$3"
+   kind="$4"
+   writen=$(git hash-object -w -t "$kind" --stdin)
+   test "$writen" = "$sha1" || die "bad sha1 passed '$sha1' vs writen 
'$writen'"
+   ;;
+*)
+   die "unknown command '$1'"
+   ;;
 esac
 EOF
 HELPER="\"$PWD\"/odb-helper"
@@ -43,4 +57,13 @@ test_expect_success 'helper can retrieve alt objects' '
test_cmp expect actual
 '
 
+test_expect_success 'helper can add objects to alt repo' '
+   hash=$(echo "Hello odb!" | git hash-object -w -t blob --stdin) &&
+   test -f .git/objects/$(echo $hash | sed "s#..#&/#") &&
+   size=$(git cat-file -s "$hash") &&
+   git cat-file blob "$hash" | ./odb-helper put "$hash" "$size" blob &&
+   alt_size=$(cd alt-repo && git cat-file -s "$hash") &&
+   test "$size" -eq "$alt_size"
+'
+
 test_done
-- 
2.13.1.565.gbfcd7a9048



[RFC/PATCH v4 17/49] lib-httpd: pass config file to start_httpd()

2017-06-20 Thread Christian Couder
This makes it possible to start an apache web server with different
config files.

This will be used in a later patch to pass a config file that makes
apache store external objects.

Signed-off-by: Christian Couder 
---
 t/lib-httpd.sh | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/t/lib-httpd.sh b/t/lib-httpd.sh
index 435a37465a..2e659a8ee2 100644
--- a/t/lib-httpd.sh
+++ b/t/lib-httpd.sh
@@ -171,12 +171,14 @@ prepare_httpd() {
 }
 
 start_httpd() {
+   APACHE_CONF_FILE=${1-apache.conf}
+
prepare_httpd >&3 2>&4
 
trap 'code=$?; stop_httpd; (exit $code); die' EXIT
 
"$LIB_HTTPD_PATH" -d "$HTTPD_ROOT_PATH" \
-   -f "$TEST_PATH/apache.conf" $HTTPD_PARA \
+   -f "$TEST_PATH/$APACHE_CONF_FILE" $HTTPD_PARA \
-c "Listen 127.0.0.1:$LIB_HTTPD_PORT" -k start \
>&3 2>&4
if test $? -ne 0
@@ -191,7 +193,7 @@ stop_httpd() {
trap 'die' EXIT
 
"$LIB_HTTPD_PATH" -d "$HTTPD_ROOT_PATH" \
-   -f "$TEST_PATH/apache.conf" $HTTPD_PARA -k stop
+   -f "$TEST_PATH/$APACHE_CONF_FILE" $HTTPD_PARA -k stop
 }
 
 test_http_push_nonff () {
-- 
2.13.1.565.gbfcd7a9048



[RFC/PATCH v4 44/49] odb-helper: add have_object_process()

2017-06-20 Thread Christian Couder
Signed-off-by: Christian Couder 
---
 odb-helper.c | 103 ---
 1 file changed, 91 insertions(+), 12 deletions(-)

diff --git a/odb-helper.c b/odb-helper.c
index 2e5d8af526..01cd6a713c 100644
--- a/odb-helper.c
+++ b/odb-helper.c
@@ -579,27 +579,106 @@ static int odb_helper_object_cmp(const void *va, const 
void *vb)
return hashcmp(a->sha1, b->sha1);
 }
 
+static int have_object_process(struct odb_helper *o)
+{
+   int err;
+   struct read_object_process *entry;
+   struct child_process *process;
+   struct strbuf status = STRBUF_INIT;
+   const char *cmd = o->cmd;
+   uint64_t start;
+   char *line;
+   int packet_len;
+   int total_got = 0;
+
+   start = getnanotime();
+
+   entry = launch_read_object_process(cmd);
+   process = >subprocess.process;
+   o->supported_capabilities = entry->supported_capabilities;
+
+   if (!(ODB_HELPER_CAP_HAVE & entry->supported_capabilities))
+   return -1;
+
+   sigchain_push(SIGPIPE, SIG_IGN);
+
+   err = packet_write_fmt_gently(process->in, "command=have\n");
+   if (err)
+   goto done;
+
+   err = packet_flush_gently(process->in);
+   if (err)
+   goto done;
+
+   for (;;) {
+   /* packet_read() writes a '\0' extra byte at the end */
+   char buf[LARGE_PACKET_DATA_MAX + 1];
+   char *p = buf;
+   int more;
+
+   packet_len = packet_read(process->out, NULL, NULL,
+   buf, LARGE_PACKET_DATA_MAX + 1,
+   PACKET_READ_GENTLE_ON_EOF);
+
+   if (packet_len <= 0)
+   break;
+
+   total_got += packet_len;
+
+   do {
+   char *eol = strchrnul(p, '\n');
+   more = (*eol == '\n');
+   *eol = '\0';
+   if (add_have_entry(o, p))
+   break;
+   p = eol + 1;
+   } while (more);
+   }
+
+   if (packet_len < 0) {
+   err = packet_len;
+   goto done;
+   }
+
+   subprocess_read_status(process->out, );
+   err = strcmp(status.buf, "success");
+
+done:
+   sigchain_pop(SIGPIPE);
+
+   err = check_object_process_error(err, status.buf, entry, cmd, 
ODB_HELPER_CAP_HAVE);
+
+   trace_performance_since(start, "have_object_process");
+
+   return err;
+}
+
 static void odb_helper_load_have(struct odb_helper *o)
 {
-   struct odb_helper_cmd cmd;
-   FILE *fh;
-   struct strbuf line = STRBUF_INIT;
 
if (o->have_valid)
return;
o->have_valid = 1;
 
-   if (odb_helper_start(o, , 0, "have") < 0)
-   return;
+   if (o->script_mode) {
+   struct odb_helper_cmd cmd;
+   FILE *fh;
+   struct strbuf line = STRBUF_INIT;
 
-   fh = xfdopen(cmd.child.out, "r");
-   while (strbuf_getline(, fh) != EOF)
-   if (add_have_entry(o, line.buf))
-   break;
+   if (odb_helper_start(o, , 0, "have") < 0)
+   return;
 
-   strbuf_release();
-   fclose(fh);
-   odb_helper_finish(o, );
+   fh = xfdopen(cmd.child.out, "r");
+   while (strbuf_getline(, fh) != EOF)
+   if (add_have_entry(o, line.buf))
+   break;
+
+   strbuf_release();
+   fclose(fh);
+   odb_helper_finish(o, );
+   } else {
+   have_object_process(o);
+   }
 
qsort(o->have, o->have_nr, sizeof(*o->have), odb_helper_object_cmp);
 }
-- 
2.13.1.565.gbfcd7a9048



[RFC/PATCH v4 23/49] t0420: add test with HTTP external odb

2017-06-20 Thread Christian Couder
This tests that an apache web server can be used as an
external object database and store files in their native
format instead of converting them to a Git object.

Signed-off-by: Christian Couder 
---
 t/t0420-transfer-http-e-odb.sh | 150 +
 1 file changed, 150 insertions(+)
 create mode 100755 t/t0420-transfer-http-e-odb.sh

diff --git a/t/t0420-transfer-http-e-odb.sh b/t/t0420-transfer-http-e-odb.sh
new file mode 100755
index 00..716d722e97
--- /dev/null
+++ b/t/t0420-transfer-http-e-odb.sh
@@ -0,0 +1,150 @@
+#!/bin/sh
+
+test_description='tests for transfering external objects to an HTTPD server'
+
+. ./test-lib.sh
+
+# If we don't specify a port, the current test number will be used
+# which will not work as it is less than 1024, so it can only be used by root.
+LIB_HTTPD_PORT=$(expr ${this_test#t} + 12000)
+
+. "$TEST_DIRECTORY"/lib-httpd.sh
+
+start_httpd apache-e-odb.conf
+
+# odb helper script must see this
+export HTTPD_URL
+
+write_script odb-http-helper <<\EOF
+die() {
+   printf >&2 "%s\n" "$@"
+   exit 1
+}
+echo >&2 "odb-http-helper args:" "$@"
+case "$1" in
+have)
+   list_url="$HTTPD_URL/list/"
+   curl "$list_url" ||
+   die "curl '$list_url' failed"
+   ;;
+get)
+   get_url="$HTTPD_URL/list/?sha1=$2"
+   curl "$get_url" ||
+   die "curl '$get_url' failed"
+   ;;
+put)
+   sha1="$2"
+   size="$3"
+   kind="$4"
+   upload_url="$HTTPD_URL/upload/?sha1=$sha1=$size=$kind"
+   curl --data-binary @- --include "$upload_url" >out ||
+   die "curl '$upload_url' failed"
+   ref_hash=$(echo "$sha1 $size $kind" | GIT_NO_EXTERNAL_ODB=1 git 
hash-object -w -t blob --stdin) || exit
+   git update-ref refs/odbs/magic/"$sha1" "$ref_hash"
+   ;;
+*)
+   die "unknown command '$1'"
+   ;;
+esac
+EOF
+HELPER="\"$PWD\"/odb-http-helper"
+
+
+test_expect_success 'setup repo with a root commit and the helper' '
+   test_commit zero &&
+   git config odb.magic.command "$HELPER" &&
+   git config odb.magic.plainObjects "true"
+'
+
+test_expect_success 'setup another repo from the first one' '
+   git init other-repo &&
+   (cd other-repo &&
+git remote add origin .. &&
+git pull origin master &&
+git checkout master &&
+git log)
+'
+
+UPLOADFILENAME="hello_apache_upload.txt"
+
+UPLOAD_URL="$HTTPD_URL/upload/?sha1=$UPLOADFILENAME=123=blob"
+
+test_expect_success 'can upload a file' '
+   echo "Hello Apache World!" >hello_to_send.txt &&
+   echo "How are you?" >>hello_to_send.txt &&
+   curl --data-binary @hello_to_send.txt --include "$UPLOAD_URL" 
>out_upload
+'
+
+LIST_URL="$HTTPD_URL/list/"
+
+test_expect_success 'can list uploaded files' '
+   curl --include "$LIST_URL" >out_list &&
+   grep "$UPLOADFILENAME" out_list
+'
+
+test_expect_success 'can delete uploaded files' '
+   curl --data "delete" --include "$UPLOAD_URL=1" >out_delete &&
+   curl --include "$LIST_URL" >out_list2 &&
+   ! grep "$UPLOADFILENAME" out_list2
+'
+
+FILES_DIR="httpd/www/files"
+
+test_expect_success 'new blobs are transfered to the http server' '
+   test_commit one &&
+   hash1=$(git ls-tree HEAD | grep one.t | cut -f1 | cut -d\  -f3) &&
+   echo "$hash1-4-blob" >expected &&
+   ls "$FILES_DIR" >actual &&
+   test_cmp expected actual
+'
+
+test_expect_success 'blobs can be retrieved from the http server' '
+   git cat-file blob "$hash1" &&
+   git log -p >expected
+'
+
+test_expect_success 'update other repo from the first one' '
+   (cd other-repo &&
+git fetch origin "refs/odbs/magic/*:refs/odbs/magic/*" &&
+test_must_fail git cat-file blob "$hash1" &&
+git config odb.magic.command "$HELPER" &&
+git config odb.magic.plainObjects "true" &&
+git cat-file blob "$hash1" &&
+git pull origin master)
+'
+
+test_expect_success 'local clone from the first repo' '
+   mkdir my-clone &&
+   (cd my-clone &&
+git clone .. . &&
+git cat-file blob "$hash1")
+'
+
+test_expect_success 'no-local clone from the first repo fails' '
+   mkdir my-other-clone &&
+   (cd my-other-clone &&
+test_must_fail git clone --no-local .. .) &&
+   rm -rf my-other-clone
+'
+
+test_expect_success 'no-local clone from the first repo with helper succeeds' '
+   mkdir my-other-clone &&
+   (cd my-other-clone &&
+git clone -c odb.magic.command="$HELPER" \
+   -c odb.magic.plainObjects="true" \
+   --no-local .. .) &&
+   rm -rf my-other-clone
+'
+
+test_expect_success 'no-local initial-refspec clone succeeds' '
+   mkdir my-other-clone &&
+   (cd my-other-clone &&
+git config odb.magic.command "$HELPER" &&
+git config odb.magic.plainObjects "true" &&
+git -c odb.magic.command="$HELPER" -c odb.magic.plainObjects="true" \
+   

[RFC/PATCH v4 36/49] odb-helper: add read_packetized_git_object_to_fd()

2017-06-20 Thread Christian Couder
Signed-off-by: Christian Couder 
---
 odb-helper.c | 84 +++-
 1 file changed, 78 insertions(+), 6 deletions(-)

diff --git a/odb-helper.c b/odb-helper.c
index 0017faa36e..a27208463c 100644
--- a/odb-helper.c
+++ b/odb-helper.c
@@ -142,6 +142,82 @@ static int check_object_process_error(int err,
return err;
 }
 
+static ssize_t read_packetized_git_object_to_fd(struct odb_helper *o,
+   const unsigned char *sha1,
+   int fd_in, int fd_out)
+{
+   ssize_t total_read = 0;
+   unsigned long total_got = 0;
+   int packet_len;
+   git_zstream stream;
+   int zret = Z_STREAM_END;
+   git_SHA_CTX hash;
+   unsigned char real_sha1[20];
+
+   memset(, 0, sizeof(stream));
+   git_inflate_init();
+   git_SHA1_Init();
+
+   for (;;) {
+   /* packet_read() writes a '\0' extra byte at the end */
+   char buf[LARGE_PACKET_DATA_MAX + 1];
+
+   packet_len = packet_read(fd_in, NULL, NULL,
+   buf, LARGE_PACKET_DATA_MAX + 1,
+   PACKET_READ_GENTLE_ON_EOF);
+
+   if (packet_len <= 0)
+   break;
+
+   write_or_die(fd_out, buf, packet_len);
+
+   stream.next_in = (unsigned char *)buf;
+   stream.avail_in = packet_len;
+   do {
+   unsigned char inflated[4096];
+   unsigned long got;
+
+   stream.next_out = inflated;
+   stream.avail_out = sizeof(inflated);
+   zret = git_inflate(, Z_SYNC_FLUSH);
+   got = sizeof(inflated) - stream.avail_out;
+
+   git_SHA1_Update(, inflated, got);
+   /* skip header when counting size */
+   if (!total_got) {
+   const unsigned char *p = memchr(inflated, '\0', 
got);
+   if (p)
+   got -= p - inflated + 1;
+   else
+   got = 0;
+   }
+   total_got += got;
+   } while (stream.avail_in && zret == Z_OK);
+
+   total_read += packet_len;
+   }
+
+   git_inflate_end();
+
+   if (packet_len < 0)
+   return packet_len;
+
+   git_SHA1_Final(real_sha1, );
+
+   if (zret != Z_STREAM_END) {
+   warning("bad zlib data from odb helper '%s' for %s",
+   o->name, sha1_to_hex(sha1));
+   return -1;
+   }
+   if (hashcmp(real_sha1, sha1)) {
+   warning("sha1 mismatch from odb helper '%s' for %s (got %s)",
+   o->name, sha1_to_hex(sha1), sha1_to_hex(real_sha1));
+   return -1;
+   }
+
+   return total_read;
+}
+
 static int read_object_process(struct odb_helper *o, const unsigned char 
*sha1, int fd)
 {
int err;
@@ -174,12 +250,8 @@ static int read_object_process(struct odb_helper *o, const 
unsigned char *sha1,
if (err)
goto done;
 
-   if (o->fetch_kind != ODB_FETCH_KIND_FAULT_IN) {
-   struct strbuf buf;
-   read_packetized_to_strbuf(process->out, );
-   if (err)
-   goto done;
-   }
+   if (o->fetch_kind != ODB_FETCH_KIND_FAULT_IN)
+   err = read_packetized_git_object_to_fd(o, sha1, process->out, 
fd) < 0;
 
subprocess_read_status(process->out, );
err = strcmp(status.buf, "success");
-- 
2.13.1.565.gbfcd7a9048



[RFC/PATCH v4 21/49] odb-helper: add 'store_plain_objects' to 'struct odb_helper'

2017-06-20 Thread Christian Couder
This adds a configuration option odb..plainObjects and the
corresponding boolean variable called 'store_plain_objects' in
'struct odb_helper' to make it possible for external object
databases to store object as plain objects instead of Git objects.

The existing odb_helper_fetch_object() is renamed
odb_helper_fetch_git_object() and a new odb_helper_fetch_plain_object()
is introduce to deal with external objects that are not in Git format.

Signed-off-by: Christian Couder 
---
 external-odb.c |   2 +
 odb-helper.c   | 113 -
 odb-helper.h   |   1 +
 3 files changed, 114 insertions(+), 2 deletions(-)

diff --git a/external-odb.c b/external-odb.c
index a88837feda..d11fc98719 100644
--- a/external-odb.c
+++ b/external-odb.c
@@ -36,6 +36,8 @@ static int external_odb_config(const char *var, const char 
*value, void *data)
 
if (!strcmp(key, "command"))
return git_config_string(>cmd, var, value);
+   if (!strcmp(key, "plainobjects"))
+   o->store_plain_objects = git_config_bool(var, value);
 
return 0;
 }
diff --git a/odb-helper.c b/odb-helper.c
index af7cc55ca2..b33ee81c97 100644
--- a/odb-helper.c
+++ b/odb-helper.c
@@ -159,8 +159,107 @@ int odb_helper_has_object(struct odb_helper *o, const 
unsigned char *sha1)
return !!odb_helper_lookup(o, sha1);
 }
 
-int odb_helper_fetch_object(struct odb_helper *o, const unsigned char *sha1,
-   int fd)
+static int odb_helper_fetch_plain_object(struct odb_helper *o,
+const unsigned char *sha1,
+int fd)
+{
+   struct odb_helper_object *obj;
+   struct odb_helper_cmd cmd;
+   unsigned long total_got = 0;
+
+   char hdr[32];
+   int hdrlen;
+
+   int ret = Z_STREAM_END;
+   unsigned char compressed[4096];
+   git_zstream stream;
+   git_SHA_CTX hash;
+   unsigned char real_sha1[20];
+
+   obj = odb_helper_lookup(o, sha1);
+   if (!obj)
+   return -1;
+
+   if (odb_helper_start(o, , 0, "get %s", sha1_to_hex(sha1)) < 0)
+   return -1;
+
+   /* Set it up */
+   git_deflate_init(, zlib_compression_level);
+   stream.next_out = compressed;
+   stream.avail_out = sizeof(compressed);
+   git_SHA1_Init();
+
+   /* First header.. */
+   hdrlen = xsnprintf(hdr, sizeof(hdr), "%s %lu", typename(obj->type), 
obj->size) + 1;
+   stream.next_in = (unsigned char *)hdr;
+   stream.avail_in = hdrlen;
+   while (git_deflate(, 0) == Z_OK)
+   ; /* nothing */
+   git_SHA1_Update(, hdr, hdrlen);
+
+   for (;;) {
+   unsigned char buf[4096];
+   int r;
+
+   r = xread(cmd.child.out, buf, sizeof(buf));
+   if (r < 0) {
+   error("unable to read from odb helper '%s': %s",
+ o->name, strerror(errno));
+   close(cmd.child.out);
+   odb_helper_finish(o, );
+   git_deflate_end();
+   return -1;
+   }
+   if (r == 0)
+   break;
+
+   total_got += r;
+
+   /* Then the data itself.. */
+   stream.next_in = (void *)buf;
+   stream.avail_in = r;
+   do {
+   unsigned char *in0 = stream.next_in;
+   ret = git_deflate(, Z_FINISH);
+   git_SHA1_Update(, in0, stream.next_in - in0);
+   write_or_die(fd, compressed, stream.next_out - 
compressed);
+   stream.next_out = compressed;
+   stream.avail_out = sizeof(compressed);
+   } while (ret == Z_OK);
+   }
+
+   close(cmd.child.out);
+   if (ret != Z_STREAM_END) {
+   warning("bad zlib data from odb helper '%s' for %s",
+   o->name, sha1_to_hex(sha1));
+   return -1;
+   }
+   ret = git_deflate_end_gently();
+   if (ret != Z_OK) {
+   warning("deflateEnd on object %s from odb helper '%s' failed 
(%d)",
+   sha1_to_hex(sha1), o->name, ret);
+   return -1;
+   }
+   git_SHA1_Final(real_sha1, );
+   if (hashcmp(sha1, real_sha1)) {
+   warning("sha1 mismatch from odb helper '%s' for %s (got %s)",
+   o->name, sha1_to_hex(sha1), sha1_to_hex(real_sha1));
+   return -1;
+   }
+   if (odb_helper_finish(o, ))
+   return -1;
+   if (total_got != obj->size) {
+   warning("size mismatch from odb helper '%s' for %s (%lu != 
%lu)",
+   o->name, sha1_to_hex(sha1), total_got, obj->size);
+   return -1;
+   }
+
+   return 0;
+}
+
+static int 

[RFC/PATCH v4 25/49] external-odb: add external_odb_fault_in_object()

2017-06-20 Thread Christian Couder
Signed-off-by: Christian Couder 
---
 external-odb.c | 21 -
 external-odb.h |  1 +
 odb-helper.c   |  7 +++
 odb-helper.h   |  1 +
 4 files changed, 25 insertions(+), 5 deletions(-)

diff --git a/external-odb.c b/external-odb.c
index 0b6e443372..502380cac2 100644
--- a/external-odb.c
+++ b/external-odb.c
@@ -113,7 +113,8 @@ int external_odb_fetch_object(const unsigned char *sha1)
int ret;
int fd;
 
-   if (!odb_helper_has_object(o, sha1))
+   if (o->fetch_kind != ODB_FETCH_KIND_PLAIN_OBJECT &&
+   o->fetch_kind != ODB_FETCH_KIND_GIT_OBJECT)
continue;
 
fd = create_object_tmpfile(, path);
@@ -139,6 +140,24 @@ int external_odb_fetch_object(const unsigned char *sha1)
return -1;
 }
 
+int external_odb_fault_in_object(const unsigned char *sha1)
+{
+   struct odb_helper *o;
+
+   if (!external_odb_has_object(sha1))
+   return -1;
+
+   for (o = helpers; o; o = o->next) {
+   if (o->fetch_kind != ODB_FETCH_KIND_FAULT_IN)
+   continue;
+   if (odb_helper_fault_in_object(o, sha1) < 0)
+   continue;
+   return 0;
+   }
+
+   return -1;
+}
+
 int external_odb_for_each_object(each_external_object_fn fn, void *data)
 {
struct odb_helper *o;
diff --git a/external-odb.h b/external-odb.h
index 53879e900d..1b46c49e25 100644
--- a/external-odb.h
+++ b/external-odb.h
@@ -4,6 +4,7 @@
 const char *external_odb_root(void);
 int external_odb_has_object(const unsigned char *sha1);
 int external_odb_fetch_object(const unsigned char *sha1);
+int external_odb_fault_in_object(const unsigned char *sha1);
 
 typedef int (*each_external_object_fn)(const unsigned char *sha1,
   enum object_type type,
diff --git a/odb-helper.c b/odb-helper.c
index 24dc5375cb..5fb56c6135 100644
--- a/odb-helper.c
+++ b/odb-helper.c
@@ -347,9 +347,8 @@ static int odb_helper_fetch_git_object(struct odb_helper *o,
return 0;
 }
 
-static int odb_helper_fetch_fault_in(struct odb_helper *o,
-const unsigned char *sha1,
-int fd)
+int odb_helper_fault_in_object(struct odb_helper *o,
+  const unsigned char *sha1)
 {
struct odb_helper_object *obj;
struct odb_helper_cmd cmd;
@@ -377,7 +376,7 @@ int odb_helper_fetch_object(struct odb_helper *o,
case ODB_FETCH_KIND_GIT_OBJECT:
return odb_helper_fetch_git_object(o, sha1, fd);
case ODB_FETCH_KIND_FAULT_IN:
-   return odb_helper_fetch_fault_in(o, sha1, fd);
+   return 0;
default:
BUG("invalid fetch kind '%d'", o->fetch_kind);
}
diff --git a/odb-helper.h b/odb-helper.h
index e3ad8e3316..2dc6d96c40 100644
--- a/odb-helper.h
+++ b/odb-helper.h
@@ -30,6 +30,7 @@ struct odb_helper *odb_helper_new(const char *name, int 
namelen);
 int odb_helper_has_object(struct odb_helper *o, const unsigned char *sha1);
 int odb_helper_fetch_object(struct odb_helper *o, const unsigned char *sha1,
int fd);
+int odb_helper_fault_in_object(struct odb_helper *o, const unsigned char 
*sha1);
 int odb_helper_for_each_object(struct odb_helper *o,
   each_external_object_fn, void *);
 int odb_helper_write_object(struct odb_helper *o,
-- 
2.13.1.565.gbfcd7a9048



[RFC/PATCH v4 09/49] Add initial external odb support

2017-06-20 Thread Christian Couder
From: Jeff King 

Signed-off-by: Christian Couder 
---
 Makefile|   2 +
 cache.h |   9 ++
 external-odb.c  | 115 +++
 external-odb.h  |   8 ++
 odb-helper.c| 245 
 odb-helper.h|  25 +
 sha1_file.c |  79 +++-
 t/t0400-external-odb.sh |  46 +
 8 files changed, 507 insertions(+), 22 deletions(-)
 create mode 100644 external-odb.c
 create mode 100644 external-odb.h
 create mode 100644 odb-helper.c
 create mode 100644 odb-helper.h
 create mode 100755 t/t0400-external-odb.sh

diff --git a/Makefile b/Makefile
index f484801638..b488874d60 100644
--- a/Makefile
+++ b/Makefile
@@ -776,6 +776,7 @@ LIB_OBJS += ewah/ewah_bitmap.o
 LIB_OBJS += ewah/ewah_io.o
 LIB_OBJS += ewah/ewah_rlw.o
 LIB_OBJS += exec_cmd.o
+LIB_OBJS += external-odb.o
 LIB_OBJS += fetch-pack.o
 LIB_OBJS += fsck.o
 LIB_OBJS += gettext.o
@@ -808,6 +809,7 @@ LIB_OBJS += notes-cache.o
 LIB_OBJS += notes-merge.o
 LIB_OBJS += notes-utils.o
 LIB_OBJS += object.o
+LIB_OBJS += odb-helper.o
 LIB_OBJS += oidset.o
 LIB_OBJS += pack-bitmap.o
 LIB_OBJS += pack-bitmap-write.o
diff --git a/cache.h b/cache.h
index d6ba8a2f11..391a69e9c5 100644
--- a/cache.h
+++ b/cache.h
@@ -954,6 +954,12 @@ const char *git_path_shallow(void);
  */
 extern const char *sha1_file_name(const unsigned char *sha1);
 
+/*
+ * Like sha1_file_name, but return the filename within a specific alternate
+ * object directory. Shares the same static buffer with sha1_file_name.
+ */
+extern const char *sha1_file_name_alt(const char *objdir, const unsigned char 
*sha1);
+
 /*
  * Return the name of the (local) packfile with the specified sha1 in
  * its name.  The return value is a pointer to memory that is
@@ -1265,6 +1271,8 @@ extern int do_check_packed_object_crc;
 
 extern int check_sha1_signature(const unsigned char *sha1, void *buf, unsigned 
long size, const char *type);
 
+extern int create_object_tmpfile(struct strbuf *tmp, const char *filename);
+extern void close_sha1_file(int fd);
 extern int finalize_object_file(const char *tmpfile, const char *filename);
 
 extern int has_sha1_pack(const unsigned char *sha1);
@@ -1600,6 +1608,7 @@ extern void read_info_alternates(const char * 
relative_base, int depth);
 extern char *compute_alternate_path(const char *path, struct strbuf *err);
 typedef int alt_odb_fn(struct alternate_object_database *, void *);
 extern int foreach_alt_odb(alt_odb_fn, void*);
+extern void prepare_external_alt_odb(void);
 
 /*
  * Allocate a "struct alternate_object_database" but do _not_ actually
diff --git a/external-odb.c b/external-odb.c
new file mode 100644
index 00..1ccfa99a01
--- /dev/null
+++ b/external-odb.c
@@ -0,0 +1,115 @@
+#include "cache.h"
+#include "external-odb.h"
+#include "odb-helper.h"
+
+static struct odb_helper *helpers;
+static struct odb_helper **helpers_tail = 
+
+static struct odb_helper *find_or_create_helper(const char *name, int len)
+{
+   struct odb_helper *o;
+
+   for (o = helpers; o; o = o->next)
+   if (!strncmp(o->name, name, len) && !o->name[len])
+   return o;
+
+   o = odb_helper_new(name, len);
+   *helpers_tail = o;
+   helpers_tail = >next;
+
+   return o;
+}
+
+static int external_odb_config(const char *var, const char *value, void *data)
+{
+   struct odb_helper *o;
+   const char *key, *dot;
+
+   if (!skip_prefix(var, "odb.", ))
+   return 0;
+   dot = strrchr(key, '.');
+   if (!dot)
+   return 0;
+
+   o = find_or_create_helper(key, dot - key);
+   key = dot + 1;
+
+   if (!strcmp(key, "command"))
+   return git_config_string(>cmd, var, value);
+
+   return 0;
+}
+
+static void external_odb_init(void)
+{
+   static int initialized;
+
+   if (initialized)
+   return;
+   initialized = 1;
+
+   git_config(external_odb_config, NULL);
+}
+
+const char *external_odb_root(void)
+{
+   static const char *root;
+   if (!root)
+   root = git_pathdup("objects/external");
+   return root;
+}
+
+int external_odb_has_object(const unsigned char *sha1)
+{
+   struct odb_helper *o;
+
+   external_odb_init();
+
+   for (o = helpers; o; o = o->next)
+   if (odb_helper_has_object(o, sha1))
+   return 1;
+   return 0;
+}
+
+int external_odb_fetch_object(const unsigned char *sha1)
+{
+   struct odb_helper *o;
+   const char *path;
+
+   if (!external_odb_has_object(sha1))
+   return -1;
+
+   path = sha1_file_name_alt(external_odb_root(), sha1);
+   safe_create_leading_directories_const(path);
+   prepare_external_alt_odb();
+
+   for (o = helpers; o; o = o->next) {
+   struct strbuf tmpfile = STRBUF_INIT;
+   int ret;
+   

[RFC/PATCH v4 22/49] pack-objects: don't pack objects in external odbs

2017-06-20 Thread Christian Couder
Objects managed by an external ODB should not be put into
pack files.

Signed-off-by: Christian Couder 
---
 builtin/pack-objects.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/builtin/pack-objects.c b/builtin/pack-objects.c
index f672225def..e423f685ff 100644
--- a/builtin/pack-objects.c
+++ b/builtin/pack-objects.c
@@ -24,6 +24,7 @@
 #include "sha1-array.h"
 #include "argv-array.h"
 #include "mru.h"
+#include "external-odb.h"
 
 static const char *pack_usage[] = {
N_("git pack-objects --stdout [...] [<  | < 
]"),
@@ -1011,6 +1012,9 @@ static int want_object_in_pack(const unsigned char *sha1,
return want;
}
 
+   if (external_odb_has_object(sha1))
+   return 0;
+
for (entry = packed_git_mru->head; entry; entry = entry->next) {
struct packed_git *p = entry->item;
off_t offset;
-- 
2.13.1.565.gbfcd7a9048



[RFC/PATCH v4 39/49] odb-helper: add write_object_process()

2017-06-20 Thread Christian Couder
Signed-off-by: Christian Couder 
---
 odb-helper.c | 76 +---
 1 file changed, 73 insertions(+), 3 deletions(-)

diff --git a/odb-helper.c b/odb-helper.c
index b2d86a7928..e21113c0b8 100644
--- a/odb-helper.c
+++ b/odb-helper.c
@@ -383,6 +383,65 @@ static int read_object_process(struct odb_helper *o, const 
unsigned char *sha1,
return err;
 }
 
+static int write_object_process(struct odb_helper *o,
+   const void *buf, size_t len,
+   const char *type, unsigned char *sha1)
+{
+   int err;
+   struct read_object_process *entry;
+   struct child_process *process;
+   struct strbuf status = STRBUF_INIT;
+   const char *cmd = o->cmd;
+   uint64_t start;
+
+   start = getnanotime();
+
+   entry = launch_read_object_process(cmd);
+   process = >subprocess.process;
+   o->supported_capabilities = entry->supported_capabilities;
+
+   if (!(ODB_HELPER_CAP_PUT & entry->supported_capabilities))
+   return -1;
+
+   sigchain_push(SIGPIPE, SIG_IGN);
+
+   err = packet_write_fmt_gently(process->in, "command=put\n");
+   if (err)
+   goto done;
+
+   err = packet_write_fmt_gently(process->in, "sha1=%s\n", 
sha1_to_hex(sha1));
+   if (err)
+   goto done;
+
+   err = packet_write_fmt_gently(process->in, "size=%"PRIuMAX"\n", len);
+   if (err)
+   goto done;
+
+   err = packet_write_fmt_gently(process->in, "kind=blob\n");
+   if (err)
+   goto done;
+
+   err = packet_flush_gently(process->in);
+   if (err)
+   goto done;
+
+   err = write_packetized_from_buf(buf, len, process->in);
+   if (err)
+   goto done;
+
+   subprocess_read_status(process->out, );
+   err = strcmp(status.buf, "success");
+
+done:
+   sigchain_pop(SIGPIPE);
+
+   err = check_object_process_error(err, status.buf, entry, cmd, 
ODB_HELPER_CAP_PUT);
+
+   trace_performance_since(start, "write_object_process");
+
+   return err;
+}
+
 struct odb_helper *odb_helper_new(const char *name, int namelen)
 {
struct odb_helper *o;
@@ -804,9 +863,9 @@ int odb_helper_for_each_object(struct odb_helper *o,
return 0;
 }
 
-int odb_helper_write_object(struct odb_helper *o,
-   const void *buf, size_t len,
-   const char *type, unsigned char *sha1)
+int odb_helper_write_plain_object(struct odb_helper *o,
+ const void *buf, size_t len,
+ const char *type, unsigned char *sha1)
 {
struct odb_helper_cmd cmd;
 
@@ -832,3 +891,14 @@ int odb_helper_write_object(struct odb_helper *o,
odb_helper_finish(o, );
return 0;
 }
+
+int odb_helper_write_object(struct odb_helper *o,
+   const void *buf, size_t len,
+   const char *type, unsigned char *sha1)
+{
+   if (o->script_mode) {
+   return odb_helper_write_plain_object(o, buf, len, type, sha1);
+   } else {
+   return write_object_process(o, buf, len, type, sha1);
+   }
+}
-- 
2.13.1.565.gbfcd7a9048



[RFC/PATCH v4 28/49] contrib: add long-running-read-object/example.pl

2017-06-20 Thread Christian Couder
From: Ben Peart 

Signed-off-by: Ben Peart 
Signed-off-by: Christian Couder 
---
 contrib/long-running-read-object/example.pl | 114 
 1 file changed, 114 insertions(+)
 create mode 100644 contrib/long-running-read-object/example.pl

diff --git a/contrib/long-running-read-object/example.pl 
b/contrib/long-running-read-object/example.pl
new file mode 100644
index 00..6587333b87
--- /dev/null
+++ b/contrib/long-running-read-object/example.pl
@@ -0,0 +1,114 @@
+#!/usr/bin/perl
+#
+# Example implementation for the Git read-object protocol version 1
+# See Documentation/technical/read-object-protocol.txt
+#
+# Allows you to test the ability for blobs to be pulled from a host git repo
+# "on demand."  Called when git needs a blob it couldn't find locally due to
+# a lazy clone that only cloned the commits and trees.
+#
+# A lazy clone can be simulated via the following commands from the host repo
+# you wish to create a lazy clone of:
+#
+# cd /host_repo
+# git rev-parse HEAD
+# git init /guest_repo
+# git cat-file --batch-check --batch-all-objects | grep -v 'blob' |
+#  cut -d' ' -f1 | git pack-objects /e/guest_repo/.git/objects/pack/noblobs
+# cd /guest_repo
+# git config core.virtualizeobjects true
+# git reset --hard 
+#
+# Please note, this sample is a minimal skeleton. No proper error handling
+# was implemented.
+#
+
+use strict;
+use warnings;
+
+#
+# Point $DIR to the folder where your host git repo is located so we can pull
+# missing objects from it
+#
+my $DIR = "/host_repo/.git/";
+
+sub packet_bin_read {
+   my $buffer;
+   my $bytes_read = read STDIN, $buffer, 4;
+   if ( $bytes_read == 0 ) {
+
+   # EOF - Git stopped talking to us!
+   exit();
+   }
+   elsif ( $bytes_read != 4 ) {
+   die "invalid packet: '$buffer'";
+   }
+   my $pkt_size = hex($buffer);
+   if ( $pkt_size == 0 ) {
+   return ( 1, "" );
+   }
+   elsif ( $pkt_size > 4 ) {
+   my $content_size = $pkt_size - 4;
+   $bytes_read = read STDIN, $buffer, $content_size;
+   if ( $bytes_read != $content_size ) {
+   die "invalid packet ($content_size bytes expected; 
$bytes_read bytes read)";
+   }
+   return ( 0, $buffer );
+   }
+   else {
+   die "invalid packet size: $pkt_size";
+   }
+}
+
+sub packet_txt_read {
+   my ( $res, $buf ) = packet_bin_read();
+   unless ( $buf =~ s/\n$// ) {
+   die "A non-binary line MUST be terminated by an LF.";
+   }
+   return ( $res, $buf );
+}
+
+sub packet_bin_write {
+   my $buf = shift;
+   print STDOUT sprintf( "%04x", length($buf) + 4 );
+   print STDOUT $buf;
+   STDOUT->flush();
+}
+
+sub packet_txt_write {
+   packet_bin_write( $_[0] . "\n" );
+}
+
+sub packet_flush {
+   print STDOUT sprintf( "%04x", 0 );
+   STDOUT->flush();
+}
+
+( packet_txt_read() eq ( 0, "git-read-object-client" ) ) || die "bad 
initialize";
+( packet_txt_read() eq ( 0, "version=1" ) ) || die 
"bad version";
+( packet_bin_read() eq ( 1, "" ) )   || die "bad version 
end";
+
+packet_txt_write("git-read-object-server");
+packet_txt_write("version=1");
+packet_flush();
+
+( packet_txt_read() eq ( 0, "capability=get" ) )|| die "bad capability";
+( packet_bin_read() eq ( 1, "" ) )  || die "bad capability 
end";
+
+packet_txt_write("capability=get");
+packet_flush();
+
+while (1) {
+   my ($command) = packet_txt_read() =~ /^command=([^=]+)$/;
+
+   if ( $command eq "get" ) {
+   my ($sha1) = packet_txt_read() =~ /^sha1=([0-9a-f]{40})$/;
+   packet_bin_read();
+
+   system ('git --git-dir="' . $DIR . '" cat-file blob ' . $sha1 . 
' | git -c core.virtualizeobjects=false hash-object -w --stdin >/dev/null 
2>&1');
+   packet_txt_write(($?) ? "status=error" : "status=success");
+   packet_flush();
+   } else {
+   die "bad command '$command'";
+   }
+}
-- 
2.13.1.565.gbfcd7a9048



[RFC/PATCH v4 20/49] lib-httpd: add apache-e-odb.conf

2017-06-20 Thread Christian Couder
This is an apache config file to test external object databases.
It uses the upload.sh and list.sh cgi that have been added
previously to make apache store external objects.

Signed-off-by: Christian Couder 
---
 t/lib-httpd/apache-e-odb.conf | 214 ++
 1 file changed, 214 insertions(+)
 create mode 100644 t/lib-httpd/apache-e-odb.conf

diff --git a/t/lib-httpd/apache-e-odb.conf b/t/lib-httpd/apache-e-odb.conf
new file mode 100644
index 00..19a1540c82
--- /dev/null
+++ b/t/lib-httpd/apache-e-odb.conf
@@ -0,0 +1,214 @@
+ServerName dummy
+PidFile httpd.pid
+DocumentRoot www
+LogFormat "%h %l %u %t \"%r\" %>s %b" common
+CustomLog access.log common
+ErrorLog error.log
+
+   LoadModule log_config_module modules/mod_log_config.so
+
+
+   LoadModule alias_module modules/mod_alias.so
+
+
+   LoadModule cgi_module modules/mod_cgi.so
+
+
+   LoadModule env_module modules/mod_env.so
+
+
+   LoadModule rewrite_module modules/mod_rewrite.so
+
+
+   LoadModule version_module modules/mod_version.so
+
+
+   LoadModule headers_module modules/mod_headers.so
+
+
+
+LockFile accept.lock
+
+
+
+
+   LoadModule auth_module modules/mod_auth.so
+
+
+
+= 2.1>
+
+   LoadModule auth_basic_module modules/mod_auth_basic.so
+
+
+   LoadModule authn_file_module modules/mod_authn_file.so
+
+
+   LoadModule authz_user_module modules/mod_authz_user.so
+
+
+   LoadModule authz_host_module modules/mod_authz_host.so
+
+
+
+= 2.4>
+
+   LoadModule authn_core_module modules/mod_authn_core.so
+
+
+   LoadModule authz_core_module modules/mod_authz_core.so
+
+
+   LoadModule access_compat_module modules/mod_access_compat.so
+
+
+   LoadModule mpm_prefork_module modules/mod_mpm_prefork.so
+
+
+   LoadModule unixd_module modules/mod_unixd.so
+
+
+
+PassEnv GIT_VALGRIND
+PassEnv GIT_VALGRIND_OPTIONS
+PassEnv GNUPGHOME
+PassEnv ASAN_OPTIONS
+PassEnv GIT_TRACE
+PassEnv GIT_CONFIG_NOSYSTEM
+
+Alias /dumb/ www/
+Alias /auth/dumb/ www/auth/dumb/
+
+
+   SetEnv GIT_EXEC_PATH ${GIT_EXEC_PATH}
+   SetEnv GIT_HTTP_EXPORT_ALL
+
+
+   SetEnv GIT_EXEC_PATH ${GIT_EXEC_PATH}
+
+
+   SetEnv GIT_EXEC_PATH ${GIT_EXEC_PATH}
+   SetEnv GIT_HTTP_EXPORT_ALL
+   SetEnv GIT_COMMITTER_NAME "Custom User"
+   SetEnv GIT_COMMITTER_EMAIL cus...@example.com
+
+
+   SetEnv GIT_EXEC_PATH ${GIT_EXEC_PATH}
+   SetEnv GIT_HTTP_EXPORT_ALL
+   SetEnv GIT_NAMESPACE ns
+
+
+   SetEnv GIT_EXEC_PATH ${GIT_EXEC_PATH}
+   SetEnv GIT_HTTP_EXPORT_ALL
+   Header set Set-Cookie name=value
+
+
+   SetEnv GIT_EXEC_PATH ${GIT_EXEC_PATH}
+   SetEnv GIT_HTTP_EXPORT_ALL
+
+ScriptAlias /upload/ upload.sh/
+ScriptAlias /list/ list.sh/
+
+   Options FollowSymlinks
+
+
+  Options ExecCGI
+
+
+  Options ExecCGI
+
+
+   Options ExecCGI
+
+
+RewriteEngine on
+RewriteRule ^/smart-redir-perm/(.*)$ /smart/$1 [R=301]
+RewriteRule ^/smart-redir-temp/(.*)$ /smart/$1 [R=302]
+RewriteRule ^/smart-redir-auth/(.*)$ /auth/smart/$1 [R=301]
+RewriteRule ^/smart-redir-limited/(.*)/info/refs$ /smart/$1/info/refs [R=301]
+RewriteRule ^/ftp-redir/(.*)$ ftp://localhost:1000/$1 [R=302]
+
+RewriteRule ^/loop-redir/x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-x-(.*) /$1 
[R=302]
+RewriteRule ^/loop-redir/(.*)$ /loop-redir/x-$1 [R=302]
+
+# Apache 2.2 does not understand , so we use RewriteCond.
+# And as RewriteCond does not allow testing for non-matches, we match
+# the desired case first (one has abra, two has cadabra), and let it
+# pass by marking the RewriteRule as [L], "last rule, do not process
+# any other matching RewriteRules after this"), and then have another
+# RewriteRule that matches all other cases and lets them fail via '[F]',
+# "fail the request".
+RewriteCond %{HTTP:x-magic-one} =abra
+RewriteCond %{HTTP:x-magic-two} =cadabra
+RewriteRule ^/smart_headers/.* - [L]
+RewriteRule ^/smart_headers/.* - [F]
+
+
+LoadModule ssl_module modules/mod_ssl.so
+
+SSLCertificateFile httpd.pem
+SSLCertificateKeyFile httpd.pem
+SSLRandomSeed startup file:/dev/urandom 512
+SSLRandomSeed connect file:/dev/urandom 512
+SSLSessionCache none
+SSLMutex file:ssl_mutex
+SSLEngine On
+
+
+
+   AuthType Basic
+   AuthName "git-auth"
+   AuthUserFile passwd
+   Require valid-user
+
+
+
+   AuthType Basic
+   AuthName "git-auth"
+   AuthUserFile passwd
+   Require valid-user
+
+
+
+   AuthType Basic
+   AuthName "git-auth"
+   AuthUserFile passwd
+   Require valid-user
+
+
+RewriteCond %{QUERY_STRING} service=git-receive-pack [OR]
+RewriteCond %{REQUEST_URI} /git-receive-pack$
+RewriteRule ^/half-auth-complete/ - [E=AUTHREQUIRED:yes]
+
+
+  Order Deny,Allow
+  Deny from env=AUTHREQUIRED
+
+  AuthType Basic
+  AuthName "Git Access"
+  AuthUserFile passwd
+  Require valid-user
+  Satisfy Any
+
+
+
+   LoadModule dav_module modules/mod_dav.so
+   LoadModule dav_fs_module 

[RFC/PATCH v4 35/49] Add t0460 to test passing git objects

2017-06-20 Thread Christian Couder
Signed-off-by: Christian Couder 
---
 t/t0460-read-object-git.sh | 29 
 t/t0460/read-object-git| 67 ++
 2 files changed, 96 insertions(+)
 create mode 100755 t/t0460-read-object-git.sh
 create mode 100755 t/t0460/read-object-git

diff --git a/t/t0460-read-object-git.sh b/t/t0460-read-object-git.sh
new file mode 100755
index 00..d08b44cdce
--- /dev/null
+++ b/t/t0460-read-object-git.sh
@@ -0,0 +1,29 @@
+#!/bin/sh
+
+test_description='tests for long running read-object process passing git 
objects'
+
+. ./test-lib.sh
+
+PATH="$PATH:$TEST_DIRECTORY/t0460"
+
+test_expect_success 'setup host repo with a root commit' '
+   test_commit zero &&
+   hash1=$(git ls-tree HEAD | grep zero.t | cut -f1 | cut -d\  -f3)
+'
+
+HELPER="read-object-git"
+
+test_expect_success 'blobs can be retrieved from the host repo' '
+   git init guest-repo &&
+   (cd guest-repo &&
+git config odb.magic.command "$HELPER" &&
+git config odb.magic.fetchKind "gitObject" &&
+git cat-file blob "$hash1")
+'
+
+test_expect_success 'invalid blobs generate errors' '
+   cd guest-repo &&
+   test_must_fail git cat-file blob "invalid"
+'
+
+test_done
diff --git a/t/t0460/read-object-git b/t/t0460/read-object-git
new file mode 100755
index 00..356a22cd4c
--- /dev/null
+++ b/t/t0460/read-object-git
@@ -0,0 +1,67 @@
+#!/usr/bin/perl
+#
+# Example implementation for the Git read-object protocol version 1
+# See Documentation/technical/read-object-protocol.txt
+#
+# Allows you to test the ability for blobs to be pulled from a host git repo
+# "on demand."  Called when git needs a blob it couldn't find locally due to
+# a lazy clone that only cloned the commits and trees.
+#
+# A lazy clone can be simulated via the following commands from the host repo
+# you wish to create a lazy clone of:
+#
+# cd /host_repo
+# git rev-parse HEAD
+# git init /guest_repo
+# git cat-file --batch-check --batch-all-objects | grep -v 'blob' |
+#  cut -d' ' -f1 | git pack-objects /e/guest_repo/.git/objects/pack/noblobs
+# cd /guest_repo
+# git config core.virtualizeobjects true
+# git reset --hard 
+#
+# Please note, this sample is a minimal skeleton. No proper error handling 
+# was implemented.
+#
+
+use 5.008;
+use lib (split(/:/, $ENV{GITPERLLIB}));
+use strict;
+use warnings;
+use Git::Packet;
+
+#
+# Point $DIR to the folder where your host git repo is located so we can pull
+# missing objects from it
+#
+my $DIR = "../.git/";
+
+packet_initialize("git-read-object", 1);
+
+packet_read_and_check_capabilities("get");
+packet_write_capabilities("get");
+
+while (1) {
+   my ($command) = packet_txt_read() =~ /^command=([^=]+)$/;
+
+   if ( $command eq "get" ) {
+   my ($sha1) = packet_txt_read() =~ /^sha1=([0-9a-f]{40})$/;
+   packet_bin_read();
+
+   my $path = $sha1;
+   $path =~ s{..}{$&/};
+   $path = $DIR . "/objects/" . $path;
+
+   my $contents = do {
+   local $/;
+   open my $fh, $path or die "Can't open '$path': $!";
+   <$fh>
+   };
+
+   packet_bin_write($contents);
+   packet_flush();
+   packet_txt_write("status=success");
+   packet_flush();
+   } else {
+   die "bad command '$command'";
+   }
+}
-- 
2.13.1.565.gbfcd7a9048



[RFC/PATCH v4 24/49] odb-helper: start fault in implementation

2017-06-20 Thread Christian Couder
Signed-off-by: Christian Couder 
---
 external-odb.c | 24 ++--
 odb-helper.c   | 30 --
 odb-helper.h   |  8 +++-
 t/t0400-external-odb.sh|  2 ++
 t/t0410-transfer-e-odb.sh  |  4 +++-
 t/t0420-transfer-http-e-odb.sh |  6 +++---
 6 files changed, 65 insertions(+), 9 deletions(-)

diff --git a/external-odb.c b/external-odb.c
index d11fc98719..0b6e443372 100644
--- a/external-odb.c
+++ b/external-odb.c
@@ -20,6 +20,19 @@ static struct odb_helper *find_or_create_helper(const char 
*name, int len)
return o;
 }
 
+static enum odb_helper_fetch_kind parse_fetch_kind(const char *key,
+  const char *value)
+{
+   if (!strcasecmp(value, "plainobject"))
+   return ODB_FETCH_KIND_PLAIN_OBJECT;
+   else if (!strcasecmp(value, "gitobject"))
+   return ODB_FETCH_KIND_GIT_OBJECT;
+   else if (!strcasecmp(value, "faultin"))
+   return ODB_FETCH_KIND_FAULT_IN;
+
+   die("unknown value for config '%s': %s", key, value);
+}
+
 static int external_odb_config(const char *var, const char *value, void *data)
 {
struct odb_helper *o;
@@ -36,8 +49,15 @@ static int external_odb_config(const char *var, const char 
*value, void *data)
 
if (!strcmp(key, "command"))
return git_config_string(>cmd, var, value);
-   if (!strcmp(key, "plainobjects"))
-   o->store_plain_objects = git_config_bool(var, value);
+   if (!strcmp(key, "fetchkind")) {
+   const char *fetch_kind;
+   int ret = git_config_string(_kind, var, value);
+   if (!ret) {
+   o->fetch_kind = parse_fetch_kind(var, fetch_kind);
+   free((char *)fetch_kind);
+   }
+   return ret;
+   }
 
return 0;
 }
diff --git a/odb-helper.c b/odb-helper.c
index b33ee81c97..24dc5375cb 100644
--- a/odb-helper.c
+++ b/odb-helper.c
@@ -347,14 +347,40 @@ static int odb_helper_fetch_git_object(struct odb_helper 
*o,
return 0;
 }
 
+static int odb_helper_fetch_fault_in(struct odb_helper *o,
+const unsigned char *sha1,
+int fd)
+{
+   struct odb_helper_object *obj;
+   struct odb_helper_cmd cmd;
+
+   obj = odb_helper_lookup(o, sha1);
+   if (!obj)
+   return -1;
+
+   if (odb_helper_start(o, , 0, "get %s", sha1_to_hex(sha1)) < 0)
+   return -1;
+
+   if (odb_helper_finish(o, ))
+   return -1;
+
+   return 0;
+}
+
 int odb_helper_fetch_object(struct odb_helper *o,
const unsigned char *sha1,
int fd)
 {
-   if (o->store_plain_objects)
+   switch(o->fetch_kind) {
+   case ODB_FETCH_KIND_PLAIN_OBJECT:
return odb_helper_fetch_plain_object(o, sha1, fd);
-   else
+   case ODB_FETCH_KIND_GIT_OBJECT:
return odb_helper_fetch_git_object(o, sha1, fd);
+   case ODB_FETCH_KIND_FAULT_IN:
+   return odb_helper_fetch_fault_in(o, sha1, fd);
+   default:
+   BUG("invalid fetch kind '%d'", o->fetch_kind);
+   }
 }
 
 int odb_helper_for_each_object(struct odb_helper *o,
diff --git a/odb-helper.h b/odb-helper.h
index 3953b9bbaf..e3ad8e3316 100644
--- a/odb-helper.h
+++ b/odb-helper.h
@@ -3,10 +3,16 @@
 
 #include "external-odb.h"
 
+enum odb_helper_fetch_kind {
+   ODB_FETCH_KIND_PLAIN_OBJECT = 0,
+   ODB_FETCH_KIND_GIT_OBJECT,
+   ODB_FETCH_KIND_FAULT_IN
+};
+
 struct odb_helper {
const char *name;
const char *cmd;
-   int store_plain_objects;
+   enum odb_helper_fetch_kind fetch_kind;
 
struct odb_helper_object {
unsigned char sha1[20];
diff --git a/t/t0400-external-odb.sh b/t/t0400-external-odb.sh
index 3c868cad4c..c3cb0fdc84 100755
--- a/t/t0400-external-odb.sh
+++ b/t/t0400-external-odb.sh
@@ -49,6 +49,7 @@ test_expect_success 'alt objects are missing' '
 
 test_expect_success 'helper can retrieve alt objects' '
test_config odb.magic.command "$HELPER" &&
+   test_config odb.magic.fetchKind "gitObject" &&
cat >expect <<-\EOF &&
two
one
@@ -68,6 +69,7 @@ test_expect_success 'helper can add objects to alt repo' '
 
 test_expect_success 'commit adds objects to alt repo' '
test_config odb.magic.command "$HELPER" &&
+   test_config odb.magic.fetchKind "gitObject" &&
test_commit three &&
hash3=$(git ls-tree HEAD | grep three.t | cut -f1 | cut -d\  -f3) &&
content=$(cd alt-repo && git show "$hash3") &&
diff --git a/t/t0410-transfer-e-odb.sh b/t/t0410-transfer-e-odb.sh
index 868b55db94..cba89866e2 100755
--- a/t/t0410-transfer-e-odb.sh
+++ b/t/t0410-transfer-e-odb.sh
@@ -89,7 +89,8 @@ 

[RFC/PATCH v4 45/49] clone: add initial param to write_remote_refs()

2017-06-20 Thread Christian Couder
Signed-off-by: Christian Couder 
---
 builtin/clone.c | 19 +--
 1 file changed, 13 insertions(+), 6 deletions(-)

diff --git a/builtin/clone.c b/builtin/clone.c
index 370a233d22..bd690576e6 100644
--- a/builtin/clone.c
+++ b/builtin/clone.c
@@ -572,7 +572,7 @@ static struct ref *wanted_peer_refs(const struct ref *refs,
return local_refs;
 }
 
-static void write_remote_refs(const struct ref *local_refs)
+static void write_remote_refs(const struct ref *local_refs, int initial)
 {
const struct ref *r;
 
@@ -591,8 +591,13 @@ static void write_remote_refs(const struct ref *local_refs)
die("%s", err.buf);
}
 
-   if (initial_ref_transaction_commit(t, ))
-   die("%s", err.buf);
+   if (initial) {
+   if (initial_ref_transaction_commit(t, ))
+   die("%s", err.buf);
+   } else {
+   if (ref_transaction_commit(t, ))
+   die("%s", err.buf);
+   }
 
strbuf_release();
ref_transaction_free(t);
@@ -639,7 +644,8 @@ static void update_remote_refs(const struct ref *refs,
   const char *branch_top,
   const char *msg,
   struct transport *transport,
-  int check_connectivity)
+  int check_connectivity,
+  int initial)
 {
const struct ref *rm = mapped_refs;
 
@@ -654,7 +660,7 @@ static void update_remote_refs(const struct ref *refs,
}
 
if (refs) {
-   write_remote_refs(mapped_refs);
+   write_remote_refs(mapped_refs, initial);
if (option_single_branch && !option_no_tags)
write_followtags(refs, msg);
}
@@ -1163,7 +1169,8 @@ int cmd_clone(int argc, const char **argv, const char 
*prefix)
transport_fetch_refs(transport, mapped_refs);
 
update_remote_refs(refs, mapped_refs, remote_head_points_at,
-  branch_top.buf, reflog_msg.buf, transport, 
!is_local);
+  branch_top.buf, reflog_msg.buf, transport,
+  !is_local, 0);
 
update_head(our_head_points_at, remote_head, reflog_msg.buf);
 
-- 
2.13.1.565.gbfcd7a9048



[RFC/PATCH v4 46/49] clone: add --initial-refspec option

2017-06-20 Thread Christian Couder
Signed-off-by: Christian Couder 
---
 builtin/clone.c | 55 ++-
 1 file changed, 54 insertions(+), 1 deletion(-)

diff --git a/builtin/clone.c b/builtin/clone.c
index bd690576e6..dda0ad360b 100644
--- a/builtin/clone.c
+++ b/builtin/clone.c
@@ -55,6 +55,7 @@ static enum transport_family family;
 static struct string_list option_config = STRING_LIST_INIT_NODUP;
 static struct string_list option_required_reference = STRING_LIST_INIT_NODUP;
 static struct string_list option_optional_reference = STRING_LIST_INIT_NODUP;
+static struct string_list option_initial_refspec = STRING_LIST_INIT_NODUP;
 static int option_dissociate;
 static int max_jobs = -1;
 static struct string_list option_recurse_submodules = STRING_LIST_INIT_NODUP;
@@ -105,6 +106,8 @@ static struct option builtin_clone_options[] = {
N_("reference repository")),
OPT_STRING_LIST(0, "reference-if-able", _optional_reference,
N_("repo"), N_("reference repository")),
+   OPT_STRING_LIST(0, "initial-refspec", _initial_refspec,
+   N_("refspec"), N_("fetch this refspec first")),
OPT_BOOL(0, "dissociate", _dissociate,
 N_("use --reference only while cloning")),
OPT_STRING('o', "origin", _origin, N_("name"),
@@ -864,6 +867,47 @@ static void dissociate_from_references(void)
free(alternates);
 }
 
+static struct refspec *parse_initial_refspecs(void)
+{
+   const char **refspecs;
+   struct refspec *initial_refspecs;
+   struct string_list_item *rs;
+   int i = 0;
+
+   if (!option_initial_refspec.nr)
+   return NULL;
+
+   refspecs = xcalloc(option_initial_refspec.nr, sizeof(const char *));
+
+   for_each_string_list_item(rs, _initial_refspec)
+   refspecs[i++] = rs->string;
+
+   initial_refspecs = parse_fetch_refspec(option_initial_refspec.nr, 
refspecs);
+
+   free(refspecs);
+
+   return initial_refspecs;
+}
+
+static void fetch_initial_refs(struct transport *transport,
+  const struct ref *refs,
+  struct refspec *initial_refspecs,
+  const char *branch_top,
+  const char *reflog_msg,
+  int is_local)
+{
+   int i;
+
+   for (i = 0; i < option_initial_refspec.nr; i++) {
+   struct ref *init_refs = NULL;
+   struct ref **tail = _refs;
+   get_fetch_map(refs, _refspecs[i], , 0);
+   transport_fetch_refs(transport, init_refs);
+   update_remote_refs(refs, init_refs, NULL, branch_top, 
reflog_msg,
+  transport, !is_local, 1);
+   }
+}
+
 int cmd_clone(int argc, const char **argv, const char *prefix)
 {
int is_bundle = 0, is_local;
@@ -887,6 +931,9 @@ int cmd_clone(int argc, const char **argv, const char 
*prefix)
struct refspec *refspec;
const char *fetch_pattern;
 
+   struct refspec *initial_refspecs;
+   int is_initial;
+
packet_trace_identity("clone");
argc = parse_options(argc, argv, prefix, builtin_clone_options,
 builtin_clone_usage, 0);
@@ -1054,6 +1101,8 @@ int cmd_clone(int argc, const char **argv, const char 
*prefix)
if (option_required_reference.nr || option_optional_reference.nr)
setup_reference();
 
+   initial_refspecs = parse_initial_refspecs();
+
fetch_pattern = xstrfmt("+%s*:%s*", src_ref_prefix, branch_top.buf);
refspec = parse_fetch_refspec(1, _pattern);
free((char *)fetch_pattern);
@@ -1109,6 +1158,9 @@ int cmd_clone(int argc, const char **argv, const char 
*prefix)
refs = transport_get_remote_refs(transport);
 
if (refs) {
+   fetch_initial_refs(transport, refs, initial_refspecs,
+  branch_top.buf, reflog_msg.buf, is_local);
+
mapped_refs = wanted_peer_refs(refs, refspec);
/*
 * transport_get_remote_refs() may return refs with null sha-1
@@ -1168,9 +1220,10 @@ int cmd_clone(int argc, const char **argv, const char 
*prefix)
else if (refs && complete_refs_before_fetch)
transport_fetch_refs(transport, mapped_refs);
 
+   is_initial = !refs || option_initial_refspec.nr == 0;
update_remote_refs(refs, mapped_refs, remote_head_points_at,
   branch_top.buf, reflog_msg.buf, transport,
-  !is_local, 0);
+  !is_local, is_initial);
 
update_head(our_head_points_at, remote_head, reflog_msg.buf);
 
-- 
2.13.1.565.gbfcd7a9048



[RFC/PATCH v4 43/49] odb-helper: advertise 'put' capability

2017-06-20 Thread Christian Couder
Signed-off-by: Christian Couder 
---
 odb-helper.c | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/odb-helper.c b/odb-helper.c
index 2cd1f25e83..2e5d8af526 100644
--- a/odb-helper.c
+++ b/odb-helper.c
@@ -67,7 +67,11 @@ static int start_read_object_fn(struct subprocess_entry 
*subprocess)
if (err)
goto done;
 
-   err = packet_writel(process->in, "capability=get", "capability=have", 
NULL);
+   err = packet_writel(process->in,
+   "capability=get",
+   "capability=put",
+   "capability=have",
+   NULL);
if (err)
goto done;
 
-- 
2.13.1.565.gbfcd7a9048



[RFC/PATCH v4 48/49] Add test for 'clone --initial-refspec'

2017-06-20 Thread Christian Couder
Signed-off-by: Christian Couder 
---
 t/t5616-clone-initial-refspec.sh | 48 
 1 file changed, 48 insertions(+)
 create mode 100755 t/t5616-clone-initial-refspec.sh

diff --git a/t/t5616-clone-initial-refspec.sh b/t/t5616-clone-initial-refspec.sh
new file mode 100755
index 00..ccbc27f83f
--- /dev/null
+++ b/t/t5616-clone-initial-refspec.sh
@@ -0,0 +1,48 @@
+#!/bin/sh
+
+test_description='test clone with --initial-refspec option'
+. ./test-lib.sh
+
+
+test_expect_success 'setup regular repo' '
+   # Make two branches, "master" and "side"
+   echo one >file &&
+   git add file &&
+   git commit -m one &&
+   echo two >file &&
+   git commit -a -m two &&
+   git tag two &&
+   echo three >file &&
+   git commit -a -m three &&
+   git checkout -b side &&
+   echo four >file &&
+   git commit -a -m four &&
+   git checkout master
+'
+
+test_expect_success 'add a special ref pointing to a blob' '
+   hash=$(echo "Hello world!" | git hash-object -w -t blob --stdin) &&
+   git update-ref refs/special/hello "$hash"
+'
+
+test_expect_success 'no-local clone from the first repo' '
+   mkdir my-clone &&
+   (cd my-clone &&
+git clone --no-local .. . &&
+test_must_fail git cat-file blob "$hash") &&
+   rm -rf my-clone
+'
+
+test_expect_success 'no-local clone with --initial-refspec' '
+   mkdir my-clone &&
+   (cd my-clone &&
+git clone --no-local --initial-refspec "refs/special/*:refs/special/*" 
.. . &&
+git cat-file blob "$hash" &&
+git rev-parse refs/special/hello >actual &&
+echo "$hash" >expected &&
+test_cmp expected actual) &&
+   rm -rf my-clone
+'
+
+test_done
+
-- 
2.13.1.565.gbfcd7a9048



[RFC/PATCH v4 41/49] external-odb: add external_odb_do_fetch_object()

2017-06-20 Thread Christian Couder
Signed-off-by: Christian Couder 
---
 external-odb.c | 52 ++--
 1 file changed, 30 insertions(+), 22 deletions(-)

diff --git a/external-odb.c b/external-odb.c
index 8c2570b2e7..c39f207dd3 100644
--- a/external-odb.c
+++ b/external-odb.c
@@ -95,32 +95,11 @@ const char *external_odb_root(void)
return root;
 }
 
-int external_odb_has_object(const unsigned char *sha1)
-{
-   struct odb_helper *o;
-
-   if (!use_external_odb)
-   return 0;
-
-   external_odb_init();
-
-   for (o = helpers; o; o = o->next) {
-   if (!(o->supported_capabilities & ODB_HELPER_CAP_HAVE))
-   return 1;
-   if (odb_helper_has_object(o, sha1))
-   return 1;
-   }
-   return 0;
-}
-
-int external_odb_fetch_object(const unsigned char *sha1)
+static int external_odb_do_fetch_object(const unsigned char *sha1)
 {
struct odb_helper *o;
const char *path;
 
-   if (!external_odb_has_object(sha1))
-   return -1;
-
path = sha1_file_name_alt(external_odb_root(), sha1);
safe_create_leading_directories_const(path);
prepare_external_alt_odb();
@@ -175,6 +154,35 @@ int external_odb_fault_in_object(const unsigned char *sha1)
return -1;
 }
 
+int external_odb_has_object(const unsigned char *sha1)
+{
+   struct odb_helper *o;
+
+   if (!use_external_odb)
+   return 0;
+
+   external_odb_init();
+
+   for (o = helpers; o; o = o->next) {
+   if (!(o->supported_capabilities & ODB_HELPER_CAP_HAVE)) {
+   if (o->fetch_kind == ODB_FETCH_KIND_FAULT_IN)
+   return 1;
+   return !external_odb_do_fetch_object(sha1);
+   }
+   if (odb_helper_has_object(o, sha1))
+   return 1;
+   }
+   return 0;
+}
+
+int external_odb_fetch_object(const unsigned char *sha1)
+{
+   if (!external_odb_has_object(sha1))
+   return -1;
+
+   return external_odb_do_fetch_object(sha1);
+}
+
 int external_odb_for_each_object(each_external_object_fn fn, void *data)
 {
struct odb_helper *o;
-- 
2.13.1.565.gbfcd7a9048



[RFC/PATCH v4 49/49] t: add t0430 to test cloning using bundles

2017-06-20 Thread Christian Couder
Signed-off-by: Christian Couder 
---
 t/t0430-clone-bundle-e-odb.sh | 91 +++
 1 file changed, 91 insertions(+)
 create mode 100755 t/t0430-clone-bundle-e-odb.sh

diff --git a/t/t0430-clone-bundle-e-odb.sh b/t/t0430-clone-bundle-e-odb.sh
new file mode 100755
index 00..8934bea006
--- /dev/null
+++ b/t/t0430-clone-bundle-e-odb.sh
@@ -0,0 +1,91 @@
+#!/bin/sh
+
+test_description='tests for cloning using a bundle through e-odb'
+
+. ./test-lib.sh
+
+# If we don't specify a port, the current test number will be used
+# which will not work as it is less than 1024, so it can only be used by root.
+LIB_HTTPD_PORT=$(expr ${this_test#t} + 12000)
+
+. "$TEST_DIRECTORY"/lib-httpd.sh
+
+start_httpd apache-e-odb.conf
+
+# odb helper script must see this
+export HTTPD_URL
+
+write_script odb-clone-bundle-helper <<\EOF
+die() {
+   printf >&2 "%s\n" "$@"
+   exit 1
+}
+echo >&2 "odb-clone-bundle-helper args:" "$@"
+case "$1" in
+get_cap)
+   echo "capability=get"
+   echo "capability=have"
+   ;;
+have)
+   ref_hash=$(git rev-parse refs/odbs/magic/bundle) ||
+   die "couldn't find refs/odbs/magic/bundle"
+   GIT_NO_EXTERNAL_ODB=1 git cat-file blob "$ref_hash" >bundle_info ||
+   die "couldn't get blob $ref_hash"
+   bundle_url=$(sed -e 's/bundle url: //' bundle_info)
+   echo >&2 "bundle_url: '$bundle_url'"
+   curl "$bundle_url" -o bundle_file ||
+   die "curl '$bundle_url' failed"
+   GIT_NO_EXTERNAL_ODB=1 git bundle unbundle bundle_file >unbundling_info 
||
+   die "unbundling 'bundle_file' failed"
+   ;;
+get)
+   die "odb-clone-bundle-helper 'get' called"
+   ;;
+put)
+   die "odb-clone-bundle-helper 'put' called"
+   ;;
+*)
+   die "unknown command '$1'"
+   ;;
+esac
+EOF
+HELPER="\"$PWD\"/odb-clone-bundle-helper"
+
+
+test_expect_success 'setup repo with a few commits' '
+   test_commit one &&
+   test_commit two &&
+   test_commit three &&
+   test_commit four
+'
+
+BUNDLE_FILE="file.bundle"
+FILES_DIR="httpd/www/files"
+GET_URL="$HTTPD_URL/files/$BUNDLE_FILE"
+
+test_expect_success 'create a bundle for this repo and check that it can be 
downloaded' '
+   git bundle create "$BUNDLE_FILE" master &&
+   mkdir "$FILES_DIR" &&
+   cp "$BUNDLE_FILE" "$FILES_DIR/" &&
+   curl "$GET_URL" --output actual &&
+   test_cmp "$BUNDLE_FILE" actual
+'
+
+test_expect_success 'create an e-odb ref for this bundle' '
+   ref_hash=$(echo "bundle url: $GET_URL" | GIT_NO_EXTERNAL_ODB=1 git 
hash-object -w -t blob --stdin) &&
+   git update-ref refs/odbs/magic/bundle "$ref_hash"
+'
+
+test_expect_success 'clone using the e-odb helper to download and install the 
bundle' '
+   mkdir my-clone &&
+   (cd my-clone &&
+git clone --no-local \
+   -c odb.magic.command="$HELPER" \
+   -c odb.magic.fetchKind="faultin" \
+   -c odb.magic.scriptMode="true" \
+   --initial-refspec "refs/odbs/magic/*:refs/odbs/magic/*" .. .)
+'
+
+stop_httpd
+
+test_done
-- 
2.13.1.565.gbfcd7a9048



[RFC/PATCH v4 42/49] odb-helper: advertise 'have' capability

2017-06-20 Thread Christian Couder
Signed-off-by: Christian Couder 
---
 odb-helper.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/odb-helper.c b/odb-helper.c
index e21113c0b8..2cd1f25e83 100644
--- a/odb-helper.c
+++ b/odb-helper.c
@@ -67,7 +67,7 @@ static int start_read_object_fn(struct subprocess_entry 
*subprocess)
if (err)
goto done;
 
-   err = packet_writel(process->in, "capability=get", NULL);
+   err = packet_writel(process->in, "capability=get", "capability=have", 
NULL);
if (err)
goto done;
 
-- 
2.13.1.565.gbfcd7a9048



[RFC/PATCH v4 47/49] clone: disable external odb before initial clone

2017-06-20 Thread Christian Couder
Signed-off-by: Christian Couder 
---
 builtin/clone.c | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/builtin/clone.c b/builtin/clone.c
index dda0ad360b..a0d7b2bd2f 100644
--- a/builtin/clone.c
+++ b/builtin/clone.c
@@ -933,6 +933,7 @@ int cmd_clone(int argc, const char **argv, const char 
*prefix)
 
struct refspec *initial_refspecs;
int is_initial;
+   int saved_use_external_odb;
 
packet_trace_identity("clone");
argc = parse_options(argc, argv, prefix, builtin_clone_options,
@@ -1078,6 +1079,10 @@ int cmd_clone(int argc, const char **argv, const char 
*prefix)
 
git_config(git_default_config, NULL);
 
+   /* Temporarily disable external ODB before initial clone */
+   saved_use_external_odb = use_external_odb;
+   use_external_odb = 0;
+
if (option_bare) {
if (option_mirror)
src_ref_prefix = "refs/";
@@ -1161,6 +1166,8 @@ int cmd_clone(int argc, const char **argv, const char 
*prefix)
fetch_initial_refs(transport, refs, initial_refspecs,
   branch_top.buf, reflog_msg.buf, is_local);
 
+   use_external_odb = saved_use_external_odb;
+
mapped_refs = wanted_peer_refs(refs, refspec);
/*
 * transport_get_remote_refs() may return refs with null sha-1
@@ -1202,6 +1209,9 @@ int cmd_clone(int argc, const char **argv, const char 
*prefix)
option_branch, option_origin);
 
warning(_("You appear to have cloned an empty repository."));
+
+   use_external_odb = saved_use_external_odb;
+
mapped_refs = NULL;
our_head_points_at = NULL;
remote_head_points_at = NULL;
-- 
2.13.1.565.gbfcd7a9048



  1   2   >