Re: [RFC PATCH 1/4] environment, fsck: introduce lazyobject extension

2017-07-28 Thread Junio C Hamano
Jonathan Tan  writes:

>> >  extern int repository_format_precious_objects;
>> > +extern char *repository_format_lazy_object;
>> 
>> This is not a new problem, but I think these two should be
>> called repository_extension_$NAME not repository_format_$NAME.
>
> Looking at the original commit 067fbd4 ("introduce "preciousObjects"
> repository extension", 2015-06-24), it seems that this was so named to
> be analogous to the existing "struct repository_format { int version;
> ...}" => "int repository_format_version;". The existing
> repository_format_$NAME thus seems reasonable to me.

OK.  They smell like "repository extension" to me, but probably the
fully spelled name of the concept is "repository format extension",
so using the word "format" out of that phrase sounds OK to me.

Thanks.


Re: [RFC PATCH 1/4] environment, fsck: introduce lazyobject extension

2017-07-28 Thread Jonathan Tan
On Thu, 27 Jul 2017 11:55:46 -0700
Junio C Hamano  wrote:

> My reading hiccupped after the first sentence, as the problem
> description made it sound like this was a boolean ("are we using
> lazy object feature?"), after reading "data type string".  And then
> "the command in that option" made me hiccup one more time, as it did
> not "click" that "in that option" was trying to say that the string
> is used as the command name (or is it a whole command line?  The
> leading part of the command line to which some arguments are
> appended before it gets invoked as a command? or what?).
> 
> Logically, I think it is more like
> 
>  - extensions.lazyobject can be set to tell Git to consider missing
>objects in certain cases are not errors;
> 
>  - the value of extensions.lazyobject variable must be a string,
>which is used to name the command to lazily make the object
>"appear" in the repository on demand.

OK, I'll update the commit message in the next reroll.

> >  extern int repository_format_precious_objects;
> > +extern char *repository_format_lazy_object;
> 
> This is not a new problem, but I think these two should be
> called repository_extension_$NAME not repository_format_$NAME.

Looking at the original commit 067fbd4 ("introduce "preciousObjects"
repository extension", 2015-06-24), it seems that this was so named to
be analogous to the existing "struct repository_format { int version;
...}" => "int repository_format_version;". The existing
repository_format_$NAME thus seems reasonable to me.


Re: [RFC PATCH 1/4] environment, fsck: introduce lazyobject extension

2017-07-28 Thread Ben Peart



On 7/27/2017 2:55 PM, Junio C Hamano wrote:

Jonathan Tan  writes:


Currently, Git does not support repos with very large numbers of objects
or repos that wish to minimize manipulation of certain blobs (for
example, because they are very large) very well, even if the user
operates mostly on part of the repo, because Git is designed on the
assumption that every referenced object is available somewhere in the
repo storage.



I very much like the direction this is taking.  Handling missing objects 
gracefully is an important part of the overall partial clone support.



Introduce a new repository extension option "extensions.lazyobject", of
data type string. If this is set in a repository, Git will tolerate
objects being missing in that repository.  When Git needs those objects,
it will invoke the command in that option.




I'm very glad you are doing this.  Having never used precious objects I 
was unaware of the extensions concept but it looks like a great way to 
deal with this repo specific option.



My reading hiccupped after the first sentence, as the problem
description made it sound like this was a boolean ("are we using
lazy object feature?"), after reading "data type string".  And then
"the command in that option" made me hiccup one more time, as it did
not "click" that "in that option" was trying to say that the string
is used as the command name (or is it a whole command line?  The
leading part of the command line to which some arguments are
appended before it gets invoked as a command? or what?).

Logically, I think it is more like

  - extensions.lazyobject can be set to tell Git to consider missing
objects in certain cases are not errors;

  - the value of extensions.lazyobject variable must be a string,
which is used to name the command to lazily make the object
"appear" in the repository on demand.


Teach fsck about the new state of affairs. In this commit, teach fsck
that missing objects referenced from the reflog are not an error case;
in future commits, fsck will be taught about other cases.


In any case, sounds like a small and good first step.



I agree completely with the feedback on the description.  That is quite 
the run on sentence. :)



+
+`lazyObject`
+~
+
+When the config key `extensions.lazyObject` is set to a string, Git
+tolerates objects being missing in the repository. This string contains
+the command to be run whenever a missing object is needed.


This has the same issue but to a lessor degree as there is "string
contains".  What the command will do (e.g. "makes the object
magically appear in the object store" or "emits the contents of the
object to its standard output, so that Git can hash it to make it
appear in the object store"), and how it is used (e.g. "this is a
single command name and nothing else", "this is a leading part of
command line and arguments are appended at the end, before the whole
thing is passed to the shell to be executed", etc.) will need to be
clarified in the final version of the series (not necessarily at
this step---the series can elaborate in the later patches).


diff --git a/builtin/fsck.c b/builtin/fsck.c
index fb0947009..1cfb8d98c 100644
--- a/builtin/fsck.c
+++ b/builtin/fsck.c
@@ -402,7 +402,7 @@ static void fsck_handle_reflog_oid(const char *refname, 
struct object_id *oid,
xstrfmt("%s@{%"PRItime"}", refname, 
timestamp));
obj->flags |= USED;
mark_object_reachable(obj);
-   } else {
+   } else if (!repository_format_lazy_object) {
error("%s: invalid reflog entry %s", refname, 
oid_to_hex(oid));
errors_found |= ERROR_REACHABLE;
}
diff --git a/cache.h b/cache.h
index 6c8242340..9e6bc0a21 100644
--- a/cache.h
+++ b/cache.h
@@ -853,10 +853,12 @@ extern int grafts_replace_parents;
  #define GIT_REPO_VERSION 0
  #define GIT_REPO_VERSION_READ 1
  extern int repository_format_precious_objects;
+extern char *repository_format_lazy_object;


This is not a new problem, but I think these two should be
called repository_extension_$NAME not repository_format_$NAME.


diff --git a/t/t0410-lazy-object.sh b/t/t0410-lazy-object.sh
new file mode 100755
index 0..36442531f
--- /dev/null
+++ b/t/t0410-lazy-object.sh
@@ -0,0 +1,32 @@
+#!/bin/sh
+
+test_description='lazy object'
+
+. ./test-lib.sh
+
+delete_object () {
+   rm $1/.git/objects/$(echo $2 | cut -c1-2)/$(echo $2 | cut -c3-40)
+}
+
+test_expect_success 'fsck fails on lazy object in reflog' '
+   test_create_repo repo &&
+   test_commit -C repo 1 &&
+
+   A=$(git -C repo commit-tree -m a HEAD^{tree}) &&
+   B=$(git -C repo commit-tree -m b HEAD^{tree}) &&
+
+   # Reference $A only from reflog, and delete it
+   git -C repo branch mybranch "$A" &&
+   git -C repo branch -f mybranch "$B" &&
+   delete_object repo "$A" &&

Re: [RFC PATCH 1/4] environment, fsck: introduce lazyobject extension

2017-07-27 Thread Junio C Hamano
Jonathan Tan  writes:

> Currently, Git does not support repos with very large numbers of objects
> or repos that wish to minimize manipulation of certain blobs (for
> example, because they are very large) very well, even if the user
> operates mostly on part of the repo, because Git is designed on the
> assumption that every referenced object is available somewhere in the
> repo storage.
>
> Introduce a new repository extension option "extensions.lazyobject", of
> data type string. If this is set in a repository, Git will tolerate
> objects being missing in that repository.  When Git needs those objects,
> it will invoke the command in that option.

My reading hiccupped after the first sentence, as the problem
description made it sound like this was a boolean ("are we using
lazy object feature?"), after reading "data type string".  And then
"the command in that option" made me hiccup one more time, as it did
not "click" that "in that option" was trying to say that the string
is used as the command name (or is it a whole command line?  The
leading part of the command line to which some arguments are
appended before it gets invoked as a command? or what?).

Logically, I think it is more like

 - extensions.lazyobject can be set to tell Git to consider missing
   objects in certain cases are not errors;

 - the value of extensions.lazyobject variable must be a string,
   which is used to name the command to lazily make the object
   "appear" in the repository on demand.

> Teach fsck about the new state of affairs. In this commit, teach fsck
> that missing objects referenced from the reflog are not an error case;
> in future commits, fsck will be taught about other cases.

In any case, sounds like a small and good first step.

> +
> +`lazyObject`
> +~
> +
> +When the config key `extensions.lazyObject` is set to a string, Git
> +tolerates objects being missing in the repository. This string contains
> +the command to be run whenever a missing object is needed.

This has the same issue but to a lessor degree as there is "string
contains".  What the command will do (e.g. "makes the object
magically appear in the object store" or "emits the contents of the
object to its standard output, so that Git can hash it to make it
appear in the object store"), and how it is used (e.g. "this is a
single command name and nothing else", "this is a leading part of
command line and arguments are appended at the end, before the whole
thing is passed to the shell to be executed", etc.) will need to be
clarified in the final version of the series (not necessarily at
this step---the series can elaborate in the later patches).

> diff --git a/builtin/fsck.c b/builtin/fsck.c
> index fb0947009..1cfb8d98c 100644
> --- a/builtin/fsck.c
> +++ b/builtin/fsck.c
> @@ -402,7 +402,7 @@ static void fsck_handle_reflog_oid(const char *refname, 
> struct object_id *oid,
>   xstrfmt("%s@{%"PRItime"}", refname, 
> timestamp));
>   obj->flags |= USED;
>   mark_object_reachable(obj);
> - } else {
> + } else if (!repository_format_lazy_object) {
>   error("%s: invalid reflog entry %s", refname, 
> oid_to_hex(oid));
>   errors_found |= ERROR_REACHABLE;
>   }
> diff --git a/cache.h b/cache.h
> index 6c8242340..9e6bc0a21 100644
> --- a/cache.h
> +++ b/cache.h
> @@ -853,10 +853,12 @@ extern int grafts_replace_parents;
>  #define GIT_REPO_VERSION 0
>  #define GIT_REPO_VERSION_READ 1
>  extern int repository_format_precious_objects;
> +extern char *repository_format_lazy_object;

This is not a new problem, but I think these two should be
called repository_extension_$NAME not repository_format_$NAME.

> diff --git a/t/t0410-lazy-object.sh b/t/t0410-lazy-object.sh
> new file mode 100755
> index 0..36442531f
> --- /dev/null
> +++ b/t/t0410-lazy-object.sh
> @@ -0,0 +1,32 @@
> +#!/bin/sh
> +
> +test_description='lazy object'
> +
> +. ./test-lib.sh
> +
> +delete_object () {
> + rm $1/.git/objects/$(echo $2 | cut -c1-2)/$(echo $2 | cut -c3-40)
> +}
> +
> +test_expect_success 'fsck fails on lazy object in reflog' '
> + test_create_repo repo &&
> + test_commit -C repo 1 &&
> +
> + A=$(git -C repo commit-tree -m a HEAD^{tree}) &&
> + B=$(git -C repo commit-tree -m b HEAD^{tree}) &&
> +
> + # Reference $A only from reflog, and delete it
> + git -C repo branch mybranch "$A" &&
> + git -C repo branch -f mybranch "$B" &&
> + delete_object repo "$A" &&
> +
> + test_must_fail git -C repo fsck
> +'
> +test_expect_success '...but succeeds if lazyobject is set' '
> + git -C repo config core.repositoryformatversion 1 &&
> + git -C repo config extensions.lazyobject "arbitrary string" &&
> + git -C repo fsck
> +'
> +
> +test_done


[RFC PATCH 1/4] environment, fsck: introduce lazyobject extension

2017-07-26 Thread Jonathan Tan
Currently, Git does not support repos with very large numbers of objects
or repos that wish to minimize manipulation of certain blobs (for
example, because they are very large) very well, even if the user
operates mostly on part of the repo, because Git is designed on the
assumption that every referenced object is available somewhere in the
repo storage.

Introduce a new repository extension option "extensions.lazyobject", of
data type string. If this is set in a repository, Git will tolerate
objects being missing in that repository.  When Git needs those objects,
it will invoke the command in that option.

Teach fsck about the new state of affairs. In this commit, teach fsck
that missing objects referenced from the reflog are not an error case;
in future commits, fsck will be taught about other cases.

Signed-off-by: Jonathan Tan 
---
 Documentation/technical/repository-version.txt |  7 ++
 builtin/fsck.c |  2 +-
 cache.h|  2 ++
 environment.c  |  1 +
 setup.c|  7 +-
 t/t0410-lazy-object.sh | 32 ++
 6 files changed, 49 insertions(+), 2 deletions(-)
 create mode 100755 t/t0410-lazy-object.sh

diff --git a/Documentation/technical/repository-version.txt 
b/Documentation/technical/repository-version.txt
index 00ad37986..39e362543 100644
--- a/Documentation/technical/repository-version.txt
+++ b/Documentation/technical/repository-version.txt
@@ -86,3 +86,10 @@ for testing format-1 compatibility.
 When the config key `extensions.preciousObjects` is set to `true`,
 objects in the repository MUST NOT be deleted (e.g., by `git-prune` or
 `git repack -d`).
+
+`lazyObject`
+~
+
+When the config key `extensions.lazyObject` is set to a string, Git
+tolerates objects being missing in the repository. This string contains
+the command to be run whenever a missing object is needed.
diff --git a/builtin/fsck.c b/builtin/fsck.c
index fb0947009..1cfb8d98c 100644
--- a/builtin/fsck.c
+++ b/builtin/fsck.c
@@ -402,7 +402,7 @@ static void fsck_handle_reflog_oid(const char *refname, 
struct object_id *oid,
xstrfmt("%s@{%"PRItime"}", refname, 
timestamp));
obj->flags |= USED;
mark_object_reachable(obj);
-   } else {
+   } else if (!repository_format_lazy_object) {
error("%s: invalid reflog entry %s", refname, 
oid_to_hex(oid));
errors_found |= ERROR_REACHABLE;
}
diff --git a/cache.h b/cache.h
index 6c8242340..9e6bc0a21 100644
--- a/cache.h
+++ b/cache.h
@@ -853,10 +853,12 @@ extern int grafts_replace_parents;
 #define GIT_REPO_VERSION 0
 #define GIT_REPO_VERSION_READ 1
 extern int repository_format_precious_objects;
+extern char *repository_format_lazy_object;
 
 struct repository_format {
int version;
int precious_objects;
+   char *lazy_object;
int is_bare;
char *work_tree;
struct string_list unknown_extensions;
diff --git a/environment.c b/environment.c
index 3fd4b1084..cd8ef2897 100644
--- a/environment.c
+++ b/environment.c
@@ -27,6 +27,7 @@ int warn_ambiguous_refs = 1;
 int warn_on_object_refname_ambiguity = 1;
 int ref_paranoia = -1;
 int repository_format_precious_objects;
+char *repository_format_lazy_object;
 const char *git_commit_encoding;
 const char *git_log_output_encoding;
 const char *apply_default_whitespace;
diff --git a/setup.c b/setup.c
index 860507e1f..94cfde3cc 100644
--- a/setup.c
+++ b/setup.c
@@ -425,7 +425,11 @@ static int check_repo_format(const char *var, const char 
*value, void *vdata)
;
else if (!strcmp(ext, "preciousobjects"))
data->precious_objects = git_config_bool(var, value);
-   else
+   else if (!strcmp(ext, "lazyobject")) {
+   if (!value)
+   return config_error_nonbool(var);
+   data->lazy_object = xstrdup(value);
+   } else
string_list_append(>unknown_extensions, ext);
} else if (strcmp(var, "core.bare") == 0) {
data->is_bare = git_config_bool(var, value);
@@ -468,6 +472,7 @@ static int check_repository_format_gently(const char 
*gitdir, int *nongit_ok)
}
 
repository_format_precious_objects = candidate.precious_objects;
+   repository_format_lazy_object = candidate.lazy_object;
string_list_clear(_extensions, 0);
if (!has_common) {
if (candidate.is_bare != -1) {
diff --git a/t/t0410-lazy-object.sh b/t/t0410-lazy-object.sh
new file mode 100755
index 0..36442531f
--- /dev/null
+++ b/t/t0410-lazy-object.sh
@@ -0,0 +1,32 @@
+#!/bin/sh
+
+test_description='lazy object'