Re: Re: [RFC/WIP PATCH] implement reading of submodule .gitmodules configuration into cache

2013-12-12 Thread Heiko Voigt
On Mon, Dec 09, 2013 at 03:37:50PM -0800, Junio C Hamano wrote:
> > +void submodule_config_cache_free(struct submodule_config_cache *cache)
> > +{
> > +   /* NOTE: its important to iterate over the name hash here
> > +* since paths might have multiple entries */
> 
> Style (multi-line comments).

Will fix.

> This is interesting.  I wonder what the practical consequence is to
> have a single submodule bound to the top-level tree more than once.
> Updating from one of the working tree will make the other working
> tree out of sync because the ultimate location of the submodule
> directory pointed at by the two .git gitdirs can only have a single
> HEAD, be it detached or on a branch, and a single index.

To clarify, when writing this comment I was not thinking about the same
submodule with multiple paths in the same tree but rather with the same
name under different paths in different commits.

> Not that the decision to enforce that names are unique in the
> top-level .gitmodules, and follow that decision in this part of the
> code to be defensive (not rely on the "one submodule can be bound
> only once to a top-level tree"), but shouldn't such a configuration
> to have a single submodule bound to more than one place in the
> top-level tree be forbidden?

Yes IMO, that should be forbidden currently. I do not think we actually
prevent the user from doing so but it can not happen by accident since
we derive the initial name from the local path. Maybe we should be more
strict about that and put more guards in place to avoid such
configurations from entering the database.

> > +   for_each_hash(&cache->for_name, free_one_submodule_config, NULL);
> > +   free_hash(&cache->for_path);
> > +   free_hash(&cache->for_name);
> > +}
> > +
> > +static unsigned int hash_sha1_string(const unsigned char *sha1, const char 
> > *string)
> > +{
> > +   int c;
> > +   unsigned int hash, string_hash = 5381;
> > +   memcpy(&hash, sha1, sizeof(hash));
> > +
> > +   /* djb2 hash */
> > +   while ((c = *string++))
> > +   string_hash = ((string_hash << 5) + hash) + c; /* hash * 33 + c 
> > */
> 
> Hmm, the comment and the code does not seem to match in math here...

Yeah sorry that was a leftover from the code I started with. In the
beginning it was a pure string hash. Will remove both comments (since
its also not a pure djb2 hash anymore).

Cheers Heiko
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC/WIP PATCH] implement reading of submodule .gitmodules configuration into cache

2013-12-09 Thread Junio C Hamano
Heiko Voigt  writes:

> This submodule configuration cache allows us to lazily read .gitmodules
> configurations by commit into a runtime cache which can then be used to
> easily lookup values from it. Currently only the values for path or name
> are stored but it can be extended for any value needed.
> ...

Thanks.

> diff --git a/submodule-config-cache.c b/submodule-config-cache.c
> new file mode 100644
> index 000..7253fad
> --- /dev/null
> +++ b/submodule-config-cache.c
> @@ -0,0 +1,96 @@
> +#include "cache.h"
> +#include "submodule-config-cache.h"
> +#include "strbuf.h"
> +#include "hash.h"
> +
> +void submodule_config_cache_init(struct submodule_config_cache *cache)
> +{
> + init_hash(&cache->for_name);
> + init_hash(&cache->for_path);
> +}
> +
> +static int free_one_submodule_config(void *ptr, void *data)
> +{
> + struct submodule_config *entry = ptr;
> +
> + strbuf_release(&entry->path);
> + strbuf_release(&entry->name);
> + free(entry);
> +
> + return 0;
> +}
> +
> +void submodule_config_cache_free(struct submodule_config_cache *cache)
> +{
> + /* NOTE: its important to iterate over the name hash here
> +  * since paths might have multiple entries */

Style (multi-line comments).

This is interesting.  I wonder what the practical consequence is to
have a single submodule bound to the top-level tree more than once.
Updating from one of the working tree will make the other working
tree out of sync because the ultimate location of the submodule
directory pointed at by the two .git gitdirs can only have a single
HEAD, be it detached or on a branch, and a single index.

Not that the decision to enforce that names are unique in the
top-level .gitmodules, and follow that decision in this part of the
code to be defensive (not rely on the "one submodule can be bound
only once to a top-level tree"), but shouldn't such a configuration
to have a single submodule bound to more than one place in the
top-level tree be forbidden?

> + for_each_hash(&cache->for_name, free_one_submodule_config, NULL);
> + free_hash(&cache->for_path);
> + free_hash(&cache->for_name);
> +}
> +
> +static unsigned int hash_sha1_string(const unsigned char *sha1, const char 
> *string)
> +{
> + int c;
> + unsigned int hash, string_hash = 5381;
> + memcpy(&hash, sha1, sizeof(hash));
> +
> + /* djb2 hash */
> + while ((c = *string++))
> + string_hash = ((string_hash << 5) + hash) + c; /* hash * 33 + c 
> */

Hmm, the comment and the code does not seem to match in math here...
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[RFC/WIP PATCH] implement reading of submodule .gitmodules configuration into cache

2013-12-09 Thread Heiko Voigt
This submodule configuration cache allows us to lazily read .gitmodules
configurations by commit into a runtime cache which can then be used to
easily lookup values from it. Currently only the values for path or name
are stored but it can be extended for any value needed.

This cache can be used for all purposes which need knowledge about
submodule configurations.

It is expected that .gitmodules files do not change often between
commits. Thats why we lookup the .gitmodules sha1 from a commit and then
either lookup an already parsed configuration or parse and cache an
unknown one for each sha1. The cache is lazily build on demand for each
requested commit.

Signed-off-by: Heiko Voigt 
---
On Tue, Dec 03, 2013 at 07:33:01PM +0100, Heiko Voigt wrote:
> On Mon, Dec 02, 2013 at 03:55:36PM -0800, Nick Townsend wrote:
> > It seems to me that the question that you are trying to solve is
> > more complex than the problem I faced in git-archive, where we have a
> > single commit of the top-level repository that we are chasing. 
> > Perhaps we should split the work into two pieces:
> > 
> > a. Identifying the complete submodule configuration for a single commit, and
> > b. the complexity of behaviour when fetching and cloning recursively (which 
> > of course requires a.)
> 
> You are right the latter (b) is a separate topic. So how about I extract the
> submodule config parsing part from the mentioned patch and you can then
> use that patch as a basis for your work? As far as I understand you only
> need to parse the .gitmodules file for one commit and then lookup the
> submodule names from paths right? That would simplify matters and we can
> postpone the caching of multiple commits for the time when I continue on b.

Ok and here is a patch that you can use as basis for your work. I looked
into it and found that its actually easier to use the cache. Storing
everything in a hashmap seems like overkill for your application but it
is prepared for my usecase which actually needs to parse configurations
for multiple commits.

Since this has not been discussed yet some details of my implementation
might change.

The test I implemented is only for demonstration of usage and my quick
manual testing. If we agree on going ahead with my patch I will extend
that to a proper test. Or if anyone has an idea where we can plug that
in to make a useful user interface we can also implement a test based on
that.

Cheers Heiko

 .gitignore|   1 +
 Makefile  |   2 +
 submodule-config-cache.c  |  96 
 submodule-config-cache.h  |  33 +++
 submodule.c   | 125 ++
 submodule.h   |   4 ++
 test-submodule-config-cache.c |  45 +++
 7 files changed, 306 insertions(+)
 create mode 100644 submodule-config-cache.c
 create mode 100644 submodule-config-cache.h
 create mode 100644 test-submodule-config-cache.c

diff --git a/.gitignore b/.gitignore
index 66199ed..6c91e98 100644
--- a/.gitignore
+++ b/.gitignore
@@ -200,6 +200,7 @@
 /test-sha1
 /test-sigchain
 /test-string-list
+/test-submodule-config-cache
 /test-subprocess
 /test-svn-fe
 /test-urlmatch-normalization
diff --git a/Makefile b/Makefile
index af847f8..e3d869b 100644
--- a/Makefile
+++ b/Makefile
@@ -572,6 +572,7 @@ TEST_PROGRAMS_NEED_X += test-scrap-cache-tree
 TEST_PROGRAMS_NEED_X += test-sha1
 TEST_PROGRAMS_NEED_X += test-sigchain
 TEST_PROGRAMS_NEED_X += test-string-list
+TEST_PROGRAMS_NEED_X += test-submodule-config-cache
 TEST_PROGRAMS_NEED_X += test-subprocess
 TEST_PROGRAMS_NEED_X += test-svn-fe
 TEST_PROGRAMS_NEED_X += test-urlmatch-normalization
@@ -872,6 +873,7 @@ LIB_OBJS += strbuf.o
 LIB_OBJS += streaming.o
 LIB_OBJS += string-list.o
 LIB_OBJS += submodule.o
+LIB_OBJS += submodule-config-cache.o
 LIB_OBJS += symlinks.o
 LIB_OBJS += tag.o
 LIB_OBJS += trace.o
diff --git a/submodule-config-cache.c b/submodule-config-cache.c
new file mode 100644
index 000..7253fad
--- /dev/null
+++ b/submodule-config-cache.c
@@ -0,0 +1,96 @@
+#include "cache.h"
+#include "submodule-config-cache.h"
+#include "strbuf.h"
+#include "hash.h"
+
+void submodule_config_cache_init(struct submodule_config_cache *cache)
+{
+   init_hash(&cache->for_name);
+   init_hash(&cache->for_path);
+}
+
+static int free_one_submodule_config(void *ptr, void *data)
+{
+   struct submodule_config *entry = ptr;
+
+   strbuf_release(&entry->path);
+   strbuf_release(&entry->name);
+   free(entry);
+
+   return 0;
+}
+
+void submodule_config_cache_free(struct submodule_config_cache *cache)
+{
+   /* NOTE: its important to iterate over the name hash here
+* since paths might have multiple entries */
+   for_each_hash(&cache->for_name, free_one_submodule_config, NULL);
+   free_hash(&cache->for_path);
+   free_hash(&cache->for_name);
+}
+
+static unsigned int hash_sha1_string(const unsigned char *s