Re: [PATCH 1/2] submodule.c: add has_submodules to check if we have any submodules

2017-04-20 Thread Brandon Williams
On 04/20, Stefan Beller wrote:
> On Thu, Apr 20, 2017 at 3:07 PM, Brandon Williams  wrote:
> > On 04/11, Stefan Beller wrote:
> >> +int has_submodules(unsigned what_to_check)
> >> +{
> >> + if (what_to_check & SUBMODULE_CHECK_ANY_CONFIG) {
> >> + if (submodule_config_reading == SUBMODULE_CONFIG_NOT_READ)
> >> + load_submodule_config();
> >> + if (submodule_config_reading == SUBMODULE_CONFIG_EXISTS)
> >> + return 1;
> >> + }
> >> +
> >> + if ((what_to_check & SUBMODULE_CHECK_ABSORBED_GIT_DIRS) &&
> >> + file_exists(git_path("modules")))
> >> + return 1;
> >> +
> >> + if ((what_to_check & SUBMODULE_CHECK_GITMODULES_IN_WT) &&
> >> + (!is_bare_repository() && file_exists(".gitmodules")))
> >> + return 1;
> >> +
> >> + if (what_to_check & SUBMODULE_CHECK_GITLINKS_IN_TREE) {
> >> + int i;
> >> +
> >> + if (read_cache() < 0)
> >> + die(_("index file corrupt"));
> >> +
> >> + for (i = 0; i < active_nr; i++)
> >> + if (S_ISGITLINK(active_cache[i]->ce_mode))
> >> + return 1;
> >> + }
> >> +
> >> + return 0;
> >> +}
> >
> > It may be a good idea to rearrange these by order to correctness.
> 
> I arranged by estimated speed, i.e. from fastest to slowest:
> * The first check just returns a value from memory in the standard case
>   Though the first one may take a hit in performance for the very first time
>   as it may need to load the config.
> * The next two are an actual stat system call, each having a different
>   'defect'. (We may have non-absorbed submodules or non-bare repos)
>   -> We could have a check for in-tree as well, not sure if that is desired.

Fair enough, I'm fine with the ordering then.

> 
> > Correctness may not be the best way to describe it, but which is the
> > strongest indicator that there is a submodule or that a repo 'has a
> > submodule'.
> 
> These indicators differ in strength for different scenarios IMO.
> (Just after clone: check for a .gitmodules file instead of a config;
> later: rather check for a config as it is fastest and actually catches
> active submodules; maybe we do not care about inactive submodules)

This is why I went back on my thinking :)  I realized that it really
depends on the scenario you are in.

> 
> >  That way in the future we could have a #define that is
> > SUBMODULE_CHECK_ANY or ALL or somethingNow that I'm thinking harder
> > about that we may not want that, and just require explicitly stating
> > which check you want done.
> >
> > Anyways good looking patch, and I like the idea of consolidating the
> > checks into a single function.
> 
> Thanks,
> Stefan

-- 
Brandon Williams


Re: [PATCH 1/2] submodule.c: add has_submodules to check if we have any submodules

2017-04-20 Thread Stefan Beller
On Thu, Apr 20, 2017 at 3:07 PM, Brandon Williams  wrote:
> On 04/11, Stefan Beller wrote:
>> +int has_submodules(unsigned what_to_check)
>> +{
>> + if (what_to_check & SUBMODULE_CHECK_ANY_CONFIG) {
>> + if (submodule_config_reading == SUBMODULE_CONFIG_NOT_READ)
>> + load_submodule_config();
>> + if (submodule_config_reading == SUBMODULE_CONFIG_EXISTS)
>> + return 1;
>> + }
>> +
>> + if ((what_to_check & SUBMODULE_CHECK_ABSORBED_GIT_DIRS) &&
>> + file_exists(git_path("modules")))
>> + return 1;
>> +
>> + if ((what_to_check & SUBMODULE_CHECK_GITMODULES_IN_WT) &&
>> + (!is_bare_repository() && file_exists(".gitmodules")))
>> + return 1;
>> +
>> + if (what_to_check & SUBMODULE_CHECK_GITLINKS_IN_TREE) {
>> + int i;
>> +
>> + if (read_cache() < 0)
>> + die(_("index file corrupt"));
>> +
>> + for (i = 0; i < active_nr; i++)
>> + if (S_ISGITLINK(active_cache[i]->ce_mode))
>> + return 1;
>> + }
>> +
>> + return 0;
>> +}
>
> It may be a good idea to rearrange these by order to correctness.

I arranged by estimated speed, i.e. from fastest to slowest:
* The first check just returns a value from memory in the standard case
  Though the first one may take a hit in performance for the very first time
  as it may need to load the config.
* The next two are an actual stat system call, each having a different
  'defect'. (We may have non-absorbed submodules or non-bare repos)
  -> We could have a check for in-tree as well, not sure if that is desired.

> Correctness may not be the best way to describe it, but which is the
> strongest indicator that there is a submodule or that a repo 'has a
> submodule'.

These indicators differ in strength for different scenarios IMO.
(Just after clone: check for a .gitmodules file instead of a config;
later: rather check for a config as it is fastest and actually catches
active submodules; maybe we do not care about inactive submodules)

>  That way in the future we could have a #define that is
> SUBMODULE_CHECK_ANY or ALL or somethingNow that I'm thinking harder
> about that we may not want that, and just require explicitly stating
> which check you want done.
>
> Anyways good looking patch, and I like the idea of consolidating the
> checks into a single function.

Thanks,
Stefan


Re: [PATCH 1/2] submodule.c: add has_submodules to check if we have any submodules

2017-04-20 Thread Brandon Williams
On 04/11, Stefan Beller wrote:
> +int has_submodules(unsigned what_to_check)
> +{
> + if (what_to_check & SUBMODULE_CHECK_ANY_CONFIG) {
> + if (submodule_config_reading == SUBMODULE_CONFIG_NOT_READ)
> + load_submodule_config();
> + if (submodule_config_reading == SUBMODULE_CONFIG_EXISTS)
> + return 1;
> + }
> +
> + if ((what_to_check & SUBMODULE_CHECK_ABSORBED_GIT_DIRS) &&
> + file_exists(git_path("modules")))
> + return 1;
> +
> + if ((what_to_check & SUBMODULE_CHECK_GITMODULES_IN_WT) &&
> + (!is_bare_repository() && file_exists(".gitmodules")))
> + return 1;
> +
> + if (what_to_check & SUBMODULE_CHECK_GITLINKS_IN_TREE) {
> + int i;
> +
> + if (read_cache() < 0)
> + die(_("index file corrupt"));
> +
> + for (i = 0; i < active_nr; i++)
> + if (S_ISGITLINK(active_cache[i]->ce_mode))
> + return 1;
> + }
> +
> + return 0;
> +}

It may be a good idea to rearrange these by order to correctness.
Correctness may not be the best way to describe it, but which is the
strongest indicator that there is a submodule or that a repo 'has a
submodule'.  That way in the future we could have a #define that is
SUBMODULE_CHECK_ANY or ALL or somethingNow that I'm thinking harder
about that we may not want that, and just require explicitly stating
which check you want done.

Anyways good looking patch, and I like the idea of consolidating the
checks into a single function.

> diff --git a/submodule.h b/submodule.h
> index 8a8bc49dc9..5ec72fbb16 100644
> --- a/submodule.h
> +++ b/submodule.h
> @@ -1,6 +1,12 @@
>  #ifndef SUBMODULE_H
>  #define SUBMODULE_H
>  
> +#define SUBMODULE_CHECK_ANY_CONFIG   (1<<0)
> +#define SUBMODULE_CHECK_ABSORBED_GIT_DIRS(1<<1)
> +#define SUBMODULE_CHECK_GITMODULES_IN_WT (1<<2)
> +#define SUBMODULE_CHECK_GITLINKS_IN_TREE (1<<3)
> +int has_submodules(unsigned what_to_check);

-- 
Brandon Williams


[PATCH 1/2] submodule.c: add has_submodules to check if we have any submodules

2017-04-11 Thread Stefan Beller
When submodules are involved, it often slows down the process, as most
submodule related handling is either done via a child process or by
iterating over the index finding all gitlinks.

For most commands that may interact with submodules, we need have a
quick check if we do have any submodules at all, such that we can
be fast in the case when no submodules are in use.  For this quick
check introduce a function that checks with different heuristics if
we do have submodules around, checking if
* anything related to submodules is configured,
* absorbed git dirs for submodules are present,
* the '.gitmodules' file exists
* gitlinks are recorded in the index.

Each heuristic has advantages and disadvantages.
For example in a later patch, when we first use this function in
git-clone, we'll just check for the existence of the '.gitmodules'
file, because at the time of running the clone command there will
be no absorbed git dirs or submodule configuration around.

Checking for any configuration related to submodules would be useful
in a later stage (after cloning) to see if the submodules are actually
in use.

Checking for absorbed git directories is good to see if the user has
actually cloned submodules already (i.e. not just initialized them by
configuring them).

The heuristic for checking the configuration requires this patch
to have have a global state, whether the submodule config has already
been read, and if there were any submodule related keys. Make
'submodule_config' private to the submodule code, and introduce
'load_submodule_config' that will take care of this global state.

Signed-off-by: Stefan Beller 
---
 builtin/checkout.c  |  2 +-
 builtin/fetch.c |  2 +-
 builtin/read-tree.c |  2 +-
 builtin/submodule--helper.c |  6 ++--
 submodule.c | 78 +++--
 submodule.h |  8 -
 unpack-trees.c  |  2 +-
 7 files changed, 76 insertions(+), 24 deletions(-)

diff --git a/builtin/checkout.c b/builtin/checkout.c
index 3ecc47bec0..cd5a2fd75a 100644
--- a/builtin/checkout.c
+++ b/builtin/checkout.c
@@ -1216,7 +1216,7 @@ int cmd_checkout(int argc, const char **argv, const char 
*prefix)
}
 
if (recurse_submodules != RECURSE_SUBMODULES_OFF) {
-   git_config(submodule_config, NULL);
+   load_submodule_config();
if (recurse_submodules != RECURSE_SUBMODULES_DEFAULT)

set_config_update_recurse_submodules(recurse_submodules);
}
diff --git a/builtin/fetch.c b/builtin/fetch.c
index b5ad09d046..0bf3aa4458 100644
--- a/builtin/fetch.c
+++ b/builtin/fetch.c
@@ -1342,7 +1342,7 @@ int cmd_fetch(int argc, const char **argv, const char 
*prefix)
set_config_fetch_recurse_submodules(arg);
}
gitmodules_config();
-   git_config(submodule_config, NULL);
+   load_submodule_config();
}
 
if (all) {
diff --git a/builtin/read-tree.c b/builtin/read-tree.c
index 23e212ee8c..fe2ec60a51 100644
--- a/builtin/read-tree.c
+++ b/builtin/read-tree.c
@@ -177,7 +177,7 @@ int cmd_read_tree(int argc, const char **argv, const char 
*unused_prefix)
 
if (recurse_submodules != RECURSE_SUBMODULES_DEFAULT) {
gitmodules_config();
-   git_config(submodule_config, NULL);
+   load_submodule_config();
set_config_update_recurse_submodules(RECURSE_SUBMODULES_ON);
}
 
diff --git a/builtin/submodule--helper.c b/builtin/submodule--helper.c
index 85aafe46a4..770a95ca14 100644
--- a/builtin/submodule--helper.c
+++ b/builtin/submodule--helper.c
@@ -1015,7 +1015,7 @@ static int update_clone(int argc, const char **argv, 
const char *prefix)
 
/* Overlay the parsed .gitmodules file with .git/config */
gitmodules_config();
-   git_config(submodule_config, NULL);
+   load_submodule_config();
 
if (max_jobs < 0)
max_jobs = parallel_submodules();
@@ -1058,7 +1058,7 @@ static const char *remote_submodule_branch(const char 
*path)
 {
const struct submodule *sub;
gitmodules_config();
-   git_config(submodule_config, NULL);
+   load_submodule_config();
 
sub = submodule_from_path(null_sha1, path);
if (!sub)
@@ -1130,7 +1130,7 @@ static int absorb_git_dirs(int argc, const char **argv, 
const char *prefix)
 git_submodule_helper_usage, 0);
 
gitmodules_config();
-   git_config(submodule_config, NULL);
+   load_submodule_config();
 
if (module_list_compute(argc, argv, prefix, , ) < 0)
return 1;
diff --git a/submodule.c b/submodule.c
index c52d6634c5..da508dc257 100644
--- a/submodule.c
+++ b/submodule.c
@@ -24,6 +24,12 @@ static int initialized_fetch_ref_tips;
 static struct sha1_array ref_tips_before_fetch;
 static struct sha1_array