date:20170706

Re: [PATCH 2/6] t1414: document some reflog-walk oddities

2017-07-06 Thread Jeff King

On Thu, Jul 06, 2017 at 08:42:45AM -0700, Junio C Hamano wrote:

> >> I somehow feel that the "showing all entries from HEAD and then
> >> showing all from side" is simply a buggy behaviour.  I do not think
> >> our users would terribly mind if we changed it later.  But I may be
> >> missing the reason why (sometimes?) the sequential behaviour may be
> >> useful.
> >
> > If we think it's buggy, we can fix it now. But I'm not convinced that
> > sequential iteration is that bad. It's at least _simple_ and easy to
> > explain.
> 
> Yes, I agree that sequential is easy to explain, but only when I
> consider use of "log" family without "-n 30" or "--since 3.days".
> It still is easy to explain---we show from one and then from the
> other, but because we stop after showing 30 of them, and the first
> one has more than that, you do not see any from the latter.

Ah, right, I didn't think of limiting like that. I agree that makes a
strong argument in favor of the date-ordered queue.

I suspect that "--since 3.days" is still quite buggy (even with a single
reflog) because it checks commit timestamps and stops traversing when we
go too bar back. But in a reflog, the commits may be totally out of
order. I'm not sure what it should do. Either:

  1. During a reflog walk, --since and --until should respect reflog
 timestamps instead of commit timestamps. You can already do
 "--until" there by simply starting the traversal later, but there's
 no way to cut it off with --since.

  2. Limit commits shown by --since/--until as usual, but skip the "stop
 traversing" optimization when we see too many "old" commits. I.e.,
 omit a 4.days.ago commit, but keep walking to find other recent
 commits.

-Peff

Honorable Barrister Paul Williams.

2017-07-06 Thread BARRISTER PAUL WILLIAMS

Attn: Sir/Madam

I am Honorable Barrister Paul Williams the personal resident Attorney
here in Burkina Faso to Late Mr. Muammar Muhammad Abu Minyar
AL-Gaddafi of Libya c. 1942 – 20 October 2011.

Late Mr. Muammar Muhammad Abu Minyar al-Gaddafi c. 1942 – 20 October
2011, commonly known as Colonel Gaddafi, was a Libyan former head of
state, revolutionary and a politician, who died on 20 October 2011,
was my client here in Burkina Faso Africa.

My client Late Mr. Muammar Muhammad Abu Minyar AL-Gaddafi c. 1942 – 20
October 2011, was having a deposit sum of {thirty million four Hundred
thousand united state dollars} only ($30.4M USD) with a security
finance firm affiliated with African development bank here in Burkina
Faso.

With the above explanation’s I want to move this money from Burkina
Faso to your country, affidavit on your name, but note that this is a
deal between me and you and should not be related to anybody until the
deal is over for security reasons, please if interested reply as soon
as possible.

Thanks,
Honorable Barrister Paul Williams.

Re: [PATCH 6/6] reflog-walk: stop using fake parents

2017-07-06 Thread Jeff King

On Thu, Jul 06, 2017 at 11:02:24PM -0400, Jeff King wrote:

> On Fri, Jul 07, 2017 at 12:32:39AM +, Eric Wong wrote:
> 
> > I'm not sure why, but this is causing t1414.8 failures on 32-bit
> > x86 with the latest pu with Debian jessie (oldstable).
> > 
> > Reverting this (beafb2c62947a6d4a97b9c3baf99fe62ec8e830f) in pu
> > seems to fix the test for me.
> > 
> > +Cc: Ramsay since he also had a 32-bit environment.
> 
> Thanks, I was able to reproduce with CFLAGS=-m32.
> 
> > --- expect  2017-07-07 00:30:57.796325232 +
> > +++ actual  2017-07-07 00:30:57.796325232 +
> > @@ -3,6 +3,8 @@
> >  HEAD@{2} checkout: moving from master to side
> >  HEAD@{3} commit: two
> >  HEAD@{4} commit (initial): one
> > -side@{0} commit (merge): Merge branch 'master' into side
> > -side@{1} commit: three
> > -side@{2} branch: Created from HEAD^
> > +HEAD@{0} commit (merge): Merge branch 'master' into side
> > +HEAD@{1} commit: three
> > +HEAD@{2} checkout: moving from master to side
> > +HEAD@{3} commit: two
> > +HEAD@{4} commit (initial): one
> 
> That's quite an unexpected error (to show the HEAD reflog twice). Given
> that it triggers with 32-bit builds, it's like there's some funny memory
> error. And indeed, running it under valgrind shows a problem in
> add_reflog_for_walk. I'll try to dig into it. Thanks for reporting.

Ah, it's a bug in the existing code. It inserts an entry into a string
list and then frees the memory, but the string list wasn't asked to make
a copy of the string. You can see the bug by running t1414 with
--valgrind even before any of the code changes (though note that you'll
have to look at the "-v" output, since it's already marked as
expect-failure).

This fixes it:

diff --git a/reflog-walk.c b/reflog-walk.c
index b7e489ad32..c1f61c524a 100644
--- a/reflog-walk.c
+++ b/reflog-walk.c
@@ -136,6 +136,7 @@ struct reflog_walk_info {
 void init_reflog_walk(struct reflog_walk_info **info)
 {
*info = xcalloc(1, sizeof(struct reflog_walk_info));
+   (*info)->complete_reflogs.strdup_strings = 1;
 }
 
 int add_reflog_for_walk(struct reflog_walk_info *info,

I'll add that to the 'maint' portion of the series.

-Peff

Re: [PATCH 6/6] reflog-walk: stop using fake parents

2017-07-06 Thread Jeff King

On Fri, Jul 07, 2017 at 12:32:39AM +, Eric Wong wrote:

> I'm not sure why, but this is causing t1414.8 failures on 32-bit
> x86 with the latest pu with Debian jessie (oldstable).
> 
> Reverting this (beafb2c62947a6d4a97b9c3baf99fe62ec8e830f) in pu
> seems to fix the test for me.
> 
> +Cc: Ramsay since he also had a 32-bit environment.

Thanks, I was able to reproduce with CFLAGS=-m32.

> --- expect2017-07-07 00:30:57.796325232 +
> +++ actual2017-07-07 00:30:57.796325232 +
> @@ -3,6 +3,8 @@
>  HEAD@{2} checkout: moving from master to side
>  HEAD@{3} commit: two
>  HEAD@{4} commit (initial): one
> -side@{0} commit (merge): Merge branch 'master' into side
> -side@{1} commit: three
> -side@{2} branch: Created from HEAD^
> +HEAD@{0} commit (merge): Merge branch 'master' into side
> +HEAD@{1} commit: three
> +HEAD@{2} checkout: moving from master to side
> +HEAD@{3} commit: two
> +HEAD@{4} commit (initial): one

That's quite an unexpected error (to show the HEAD reflog twice). Given
that it triggers with 32-bit builds, it's like there's some funny memory
error. And indeed, running it under valgrind shows a problem in
add_reflog_for_walk. I'll try to dig into it. Thanks for reporting.

-Peff

Re: [RFC/WIP PATCH] object store classification

2017-07-06 Thread Junio C Hamano

Stefan Beller  writes:

> Subject: Re: [RFC/WIP PATCH] object store classification

I thought you are writing different object-store backends and
classifying them into many categories (e.g. local, networked,
telepathic, etc.)

It is a logical next step to put per-repository things into
the_repository.  I skimmed the patch and the changes smelled sane,
but I didn't read it really carefully with fine toothed comb.

Thanks.  The remainder kept for reference, but no additional
comments in there.

> This continues the efforts of bw/repo-object, making our code base
> more object oriented by adding the object store to the the repository struct.
>
> Long term goal of the series that would evolve from this discussion:
> * Easier to implement submodule features as it can be in the same process.
>
> Short term goal:
> * get rid of 'add_submodule_odb' in submodule.c
>   When fetching or pushing we need to determine if a submodule needs 
> processing
>   by finding out if certain commits are present.  To do this we add the 
> submodule
>   objects as an alternate object database and do the processing in the same
>   process. In case of multiple submodules, this would pollute the object store
>   hence being slower and increasing memory usage, too.
>
> This patch only changes the object store code to be object oriented based
> on the repository struct.
>
> To go for the short term goal we'd need to convert a few more places, mostly
> all the construction (blob.c, tree.c, commit.c)
>
> This is marked RFC as I'd want to gather feedback if the approach, presented
> in the header files is sound.
>
> Thanks!
>
> Signed-off-by: Stefan Beller 
> ---
>
>   Peff, this area of the code seems performance sensitive for a variety of
>   use cases, specifically yours. So I cc'd you here. :)
>
>  object.c | 77 
> +++-
>  object.h | 51 
>  repository.h |  6 +
>  3 files changed, 92 insertions(+), 42 deletions(-)
>
> diff --git a/object.c b/object.c
> index 06ba3a11d8..b5ec0bb2f9 100644
> --- a/object.c
> +++ b/object.c
> @@ -5,17 +5,15 @@
>  #include "commit.h"
>  #include "tag.h"
>  
> -static struct object **obj_hash;
> -static int nr_objs, obj_hash_size;
> -
> -unsigned int get_max_object_index(void)
> +unsigned int object_store_get_max_index(struct repository *r)
>  {
> - return obj_hash_size;
> + return r->objects.obj_hash_size;
>  }
>  
> -struct object *get_indexed_object(unsigned int idx)
> +struct object *object_store_get_indexed(struct repository *r,
> + unsigned int idx)
>  {
> - return obj_hash[idx];
> + return r->objects.obj_hash[idx];
>  }
>  
>  static const char *object_type_strings[] = {
> @@ -82,11 +80,15 @@ static void insert_obj_hash(struct object *obj, struct 
> object **hash, unsigned i
>   * Look up the record for the given sha1 in the hash map stored in
>   * obj_hash.  Return NULL if it was not found.
>   */
> -struct object *lookup_object(const unsigned char *sha1)
> +struct object *object_store_lookup(struct repository *r,
> +const unsigned char *sha1)
>  {
>   unsigned int i, first;
>   struct object *obj;
>  
> + struct object **obj_hash = r->objects.obj_hash;
> + int obj_hash_size = r->objects.obj_hash_size;
> +
>   if (!obj_hash)
>   return NULL;
>  
> @@ -114,29 +116,31 @@ struct object *lookup_object(const unsigned char *sha1)
>   * power of 2 (but at least 32).  Copy the existing values to the new
>   * hash map.
>   */
> -static void grow_object_hash(void)
> +static void grow_object_hash(struct repository *r)
>  {
>   int i;
>   /*
>* Note that this size must always be power-of-2 to match hash_obj
>* above.
>*/
> - int new_hash_size = obj_hash_size < 32 ? 32 : 2 * obj_hash_size;
> + int new_hash_size = r->objects.obj_hash_size < 32 ?
> + 32 : 2 * r->objects.obj_hash_size;
>   struct object **new_hash;
>  
>   new_hash = xcalloc(new_hash_size, sizeof(struct object *));
> - for (i = 0; i < obj_hash_size; i++) {
> - struct object *obj = obj_hash[i];
> + for (i = 0; i < r->objects.obj_hash_size; i++) {
> + struct object *obj = r->objects.obj_hash[i];
>   if (!obj)
>   continue;
>   insert_obj_hash(obj, new_hash, new_hash_size);
>   }
> - free(obj_hash);
> - obj_hash = new_hash;
> - obj_hash_size = new_hash_size;
> + free(r->objects.obj_hash);
> + r->objects.obj_hash = new_hash;
> + r->objects.obj_hash_size = new_hash_size;
>  }
>  
> -void *create_object(const unsigned char *sha1, void *o)
> +void *object_store_create(struct repository *r,
> +   const unsigned char *sha1, void *o)
>  {
>   struct object *obj = o;
>  
> @@ -145,15 +149,17 @@ void

Re: What's cooking in git.git (Jul 2017, #01; Wed, 5)

2017-07-06 Thread Junio C Hamano

Stefan Beller  writes:

> Speaking of submodules, It's not just features, but I also send bug fixes. ;)
> https://public-inbox.org/git/20170630003851.17288-1-sbel...@google.com/
> (That patch is not related to this series, except for working in the submodule
> area, but I consider that patch more important than e.g. this series.)

I did not see the patch as fixing a bug, though.  

I do agree that overwriting the branch tips in the submodule
repositories, possibly rewinding and discarding user's work done on
the local branches, is indeed a problem.  It however is unclear why
detaching HEAD is a good solution to solve that problem.

After all, there must have been a reason why the user had checked
out a branch and had pointed it at a specific commit (presumably,
so that further work would be done while on the branch, to make it
easier and safer to eventually push the result back to the upstream
of the submodule's project).  So another solution that seems equally
viable, if not even more so, could be to fail the recursive checkout
saying why the checkout cannot be done, just like we fail a checkout
when a local change interferes with updating the contents in the
working tree and the index with an error message explaining which
paths are problematic.

I am *not* saying which one among the above two is better; I am not
even saying that there could be only these two possible solutions.
I just found the posted patch unsatisfactory because it did not make
it clear why the chosen solution is a good one.

Perhaps I misread the description; but that would mean the
description was prone to be misread and has room for improvement ;-)

[RFPR] please ping me with materials for upcoming 2.14

2017-07-06 Thread Junio C Hamano

As I want to start the pre-release stabilization period for the
upcoming Git 2.14 sometime next week, please throw me a pull request
soonish if you have updates meant for it.

Thanks.

Re: [PATCH 6/6] reflog-walk: stop using fake parents

2017-07-06 Thread Eric Wong

I'm not sure why, but this is causing t1414.8 failures on 32-bit
x86 with the latest pu with Debian jessie (oldstable).

Reverting this (beafb2c62947a6d4a97b9c3baf99fe62ec8e830f) in pu
seems to fix the test for me.

+Cc: Ramsay since he also had a 32-bit environment.

--8<--
ok 7 - --parents shows true parents

expecting success: 
{
do_walk HEAD &&
do_walk side
} >expect &&
do_walk HEAD side >actual &&
test_cmp expect actual

--- expect  2017-07-07 00:30:57.796325232 +
+++ actual  2017-07-07 00:30:57.796325232 +
@@ -3,6 +3,8 @@
 HEAD@{2} checkout: moving from master to side
 HEAD@{3} commit: two
 HEAD@{4} commit (initial): one
-side@{0} commit (merge): Merge branch 'master' into side
-side@{1} commit: three
-side@{2} branch: Created from HEAD^
+HEAD@{0} commit (merge): Merge branch 'master' into side
+HEAD@{1} commit: three
+HEAD@{2} checkout: moving from master to side
+HEAD@{3} commit: two
+HEAD@{4} commit (initial): one
not ok 8 - walking multiple reflogs shows both
#   
#   {
#   do_walk HEAD &&
#   do_walk side
#   } >expect &&
#   do_walk HEAD side >actual &&
#   test_cmp expect actual
#   

expecting success: 
head=$(git rev-parse HEAD) &&
one=$(git rev-parse one) &&
ident="$GIT_COMMITTER_NAME <$GIT_COMMITTER_EMAIL> $GIT_COMMITTER_DATE" 
&&
echo "$head $one $ident broken reflog entry" >>.git/logs/HEAD &&

echo $one >expect &&
git log -g --format=%H -1 >actual &&
test_cmp expect actual

ok 9 - walk prefers reflog to ref tip

Re: [PATCH] push: disable lazy --force-with-lease by default

2017-07-06 Thread Stefan Beller

On Thu, Jul 6, 2017 at 3:39 PM, Junio C Hamano  wrote:
>>> @@ -91,6 +91,7 @@ test_expect_success 'push to update (allowed)' '
>>> (
>>> cd dst &&
>>> test_commit D &&
>>> +   git config push.allowLazyForceWithLease false &&
>>
>> Here I thought
>>
>> test_config -C dst ...
>>
>> at the beginning might be useful, though ..
>
> I did not think test_config would work inside a subshell.

It does not, it would need to be done outside the ( ),
which is a bit subtle.

Re: [PATCH] push: disable lazy --force-with-lease by default

2017-07-06 Thread Junio C Hamano

Stefan Beller  writes:

>> diff --git a/Documentation/git-push.txt b/Documentation/git-push.txt
>> index 0a639664fd..1fa01210a2 100644
>> --- a/Documentation/git-push.txt
>> +++ b/Documentation/git-push.txt
>> @@ -212,8 +212,9 @@ must not already exist.
>>  +
>>  Note that all forms other than `--force-with-lease=:`
>>  that specifies the expected current value of the ref explicitly are
>> -still experimental and their semantics may change as we gain experience
>
> This indicates that this feature is not 'experimental' any more, but disabled
> (for safety reasons as described below). This implies we will not change the
> heuristic for push.allowLazyForceWithLease easily.

I actually wanted to say it was a failed experiment, but I see your
point.  Let's leave the "still experimental" label.

>>  test_expect_success 'push to update (protected)' '
>> @@ -78,7 +79,6 @@ test_expect_success 'push to update (protected, tracking, 
>> forced)' '
>> (
>> cd dst &&
>> test_commit E &&
>> -   git ls-remote . refs/remotes/origin/master >expect &&
>
> This seems unrelated, so ideally it is a separate commit?

Yes, I looked for other things that may be good to fix while at it,
and planned to make a separate clean-up patch if I found enough, but
this was the only one.

>> @@ -91,6 +91,7 @@ test_expect_success 'push to update (allowed)' '
>> (
>> cd dst &&
>> test_commit D &&
>> +   git config push.allowLazyForceWithLease false &&
>
> Here I thought
>
> test_config -C dst ...
>
> at the beginning might be useful, though ..

I did not think test_config would work inside a subshell.

Re: Should "head" also work for "HEAD" on case-insensitive FS?

2017-07-06 Thread Junio C Hamano

Junio C Hamano  writes:

> Once Michael's packed-refs backend stabilizes, we may have a nice
> calm period in the refs subsystem and I expect that this will become
> a good medium-sized project for a contributor who does not have to 
> be so experienced (but not a complete newbie).
>
> It needs to:
>
>  - add icase-files-backend, preferrably sharing as much code as the
>existing files-backend, in refs/.
>
>  - design a mechanism to configure which refs backend to use at
>runtime; as this has to be done fairly early in the control flow,
>this will likely to use early configuration mechanism and will
>probably need to be done in the set-up code, but doing it lazy
>may even be nicer, as not all subcommands need access to refs.
>
> Thanks for a pointer to the archive.

So here is an early WIP/illustration I did to see how involved such
a change would be, which should apply cleanly on top of 'pu'.

I was pleasantly surprised how cleanly refs/files-backend.c
separates the notion of "path" and "refname".  Only two functions,
files_reflog_path() and files_ref_path(), are responsible for taking
the refname and turning it to the pathname of an filesystem entity.
One one function, loose_fill_ref_dir(), is responsible for running
readdir() to find pathname, and turning the result into a refname.
So in theory, these three are the only things that need to know
about the "encoding".

The exact detail of the encoding used here is immaterial, but I just
used "encode uppercase letters and % as % followed by two hex",
which was simple enough.  Usual refs/heads/master and friends will
not have to be touched when encoded this way.  Perhaps the decoding
side should be tweaked so that uppercase letters it sees needs to be
downcased to avoid "refs/heads/Foo" getting returned as "Foo" branch,
as a "Foo" branch should have been encoded as "refs/heads/%46oo".

Having said that, this patch alone does not quite work yet.

 * In the repository discovery code, we have some logic that
   hard-codes the path in the directory (which is a candidate for
   being a repository) to check, like "refs/" and "HEAD".  In the
   attached illustration patch, files_path_encode() special cases
   "HEAD" so that it is not munged, which is a bit of ugly
   workaround for this.

 * I haven't figured out why, but what refs.c calls "pseudo refs"
   bypasses the files backend layer for some but not all operations,
   which causes t1405-main-ref-store to fail.  The test creates a
   "pseudo ref" FOO and then tries to remove it.  Creation seems to
   follow the files-backend.c and thusly goes through the escaping;
   refs.::refs_delete_ref() however does not consult files-backend.c
   and fails to find and delete .git/FOO, because the creation side
   encoded it as ".git/%46%4F%4F".

Michael is CC'ed as I thought it would be simpler to just ask about
the latter bullet point than digging it further myself ;-)

Thanks.

 refs/files-backend.c | 62 +++-
 1 file changed, 56 insertions(+), 6 deletions(-)

diff --git a/refs/files-backend.c b/refs/files-backend.c
index 923e481e06..5bde77cbf8 100644
--- a/refs/files-backend.c
+++ b/refs/files-backend.c
@@ -23,6 +23,7 @@ struct ref_lock {
 struct files_ref_store {
struct ref_store base;
unsigned int store_flags;
+   int encode_names;

char *gitdir;
char *gitcommondir;
@@ -54,6 +55,9 @@ static struct ref_store *files_ref_store_create(const char 
*gitdir,
base_ref_store_init(ref_store, _be_files);
refs->store_flags = flags;

+   /* git_config_get_bool("core.escapeLooseRefNames", 
>encode_names); */
+   refs->encode_names = 1;
+
refs->gitdir = xstrdup(gitdir);
get_common_dir_noenv(, gitdir);
refs->gitcommondir = strbuf_detach(, NULL);
@@ -102,6 +106,49 @@ static struct files_ref_store *files_downcast(struct 
ref_store *ref_store,
return refs;
 }

+static void files_path_encode(struct files_ref_store *refs,
+ struct strbuf *sb, const char *refname)
+{
+   if (!refs->encode_names || !strcmp(refname, "HEAD")) {
+   strbuf_addstr(sb, refname);
+   } else {
+   const char *cp;
+
+   for (cp = refname; *cp; cp++) {
+   int ch = *cp;
+   if (('A' <= ch && ch <= 'Z') || (ch == '%'))
+   strbuf_addf(sb, "%%%02x", ch);
+   else
+   strbuf_addch(sb, ch);
+   }
+   }
+}
+
+static int files_path_decode(struct files_ref_store *refs,
+struct strbuf *sb, const char *name, int namelen)
+{
+   if (!refs->encode_names) {
+   strbuf_add(sb, name, namelen);
+   } else {
+   size_t origlen = sb->len;
+
+   while (namelen--) {
+   int ch = *name++;
+
+

[RFC/WIP PATCH] object store classification

2017-07-06 Thread Stefan Beller

This continues the efforts of bw/repo-object, making our code base
more object oriented by adding the object store to the the repository struct.

Long term goal of the series that would evolve from this discussion:
* Easier to implement submodule features as it can be in the same process.

Short term goal:
* get rid of 'add_submodule_odb' in submodule.c
  When fetching or pushing we need to determine if a submodule needs processing
  by finding out if certain commits are present.  To do this we add the 
submodule
  objects as an alternate object database and do the processing in the same
  process. In case of multiple submodules, this would pollute the object store
  hence being slower and increasing memory usage, too.

This patch only changes the object store code to be object oriented based
on the repository struct.

To go for the short term goal we'd need to convert a few more places, mostly
all the construction (blob.c, tree.c, commit.c)

This is marked RFC as I'd want to gather feedback if the approach, presented
in the header files is sound.

Thanks!

Signed-off-by: Stefan Beller 
---

  Peff, this area of the code seems performance sensitive for a variety of
  use cases, specifically yours. So I cc'd you here. :)

 object.c | 77 +++-
 object.h | 51 
 repository.h |  6 +
 3 files changed, 92 insertions(+), 42 deletions(-)

diff --git a/object.c b/object.c
index 06ba3a11d8..b5ec0bb2f9 100644
--- a/object.c
+++ b/object.c
@@ -5,17 +5,15 @@
 #include "commit.h"
 #include "tag.h"
 
-static struct object **obj_hash;
-static int nr_objs, obj_hash_size;
-
-unsigned int get_max_object_index(void)
+unsigned int object_store_get_max_index(struct repository *r)
 {
-   return obj_hash_size;
+   return r->objects.obj_hash_size;
 }
 
-struct object *get_indexed_object(unsigned int idx)
+struct object *object_store_get_indexed(struct repository *r,
+   unsigned int idx)
 {
-   return obj_hash[idx];
+   return r->objects.obj_hash[idx];
 }
 
 static const char *object_type_strings[] = {
@@ -82,11 +80,15 @@ static void insert_obj_hash(struct object *obj, struct 
object **hash, unsigned i
  * Look up the record for the given sha1 in the hash map stored in
  * obj_hash.  Return NULL if it was not found.
  */
-struct object *lookup_object(const unsigned char *sha1)
+struct object *object_store_lookup(struct repository *r,
+  const unsigned char *sha1)
 {
unsigned int i, first;
struct object *obj;
 
+   struct object **obj_hash = r->objects.obj_hash;
+   int obj_hash_size = r->objects.obj_hash_size;
+
if (!obj_hash)
return NULL;
 
@@ -114,29 +116,31 @@ struct object *lookup_object(const unsigned char *sha1)
  * power of 2 (but at least 32).  Copy the existing values to the new
  * hash map.
  */
-static void grow_object_hash(void)
+static void grow_object_hash(struct repository *r)
 {
int i;
/*
 * Note that this size must always be power-of-2 to match hash_obj
 * above.
 */
-   int new_hash_size = obj_hash_size < 32 ? 32 : 2 * obj_hash_size;
+   int new_hash_size = r->objects.obj_hash_size < 32 ?
+   32 : 2 * r->objects.obj_hash_size;
struct object **new_hash;
 
new_hash = xcalloc(new_hash_size, sizeof(struct object *));
-   for (i = 0; i < obj_hash_size; i++) {
-   struct object *obj = obj_hash[i];
+   for (i = 0; i < r->objects.obj_hash_size; i++) {
+   struct object *obj = r->objects.obj_hash[i];
if (!obj)
continue;
insert_obj_hash(obj, new_hash, new_hash_size);
}
-   free(obj_hash);
-   obj_hash = new_hash;
-   obj_hash_size = new_hash_size;
+   free(r->objects.obj_hash);
+   r->objects.obj_hash = new_hash;
+   r->objects.obj_hash_size = new_hash_size;
 }
 
-void *create_object(const unsigned char *sha1, void *o)
+void *object_store_create(struct repository *r,
+ const unsigned char *sha1, void *o)
 {
struct object *obj = o;
 
@@ -145,15 +149,17 @@ void *create_object(const unsigned char *sha1, void *o)
obj->flags = 0;
hashcpy(obj->oid.hash, sha1);
 
-   if (obj_hash_size - 1 <= nr_objs * 2)
-   grow_object_hash();
+   if (r->objects.obj_hash_size - 1 <= r->objects.nr_objs * 2)
+   grow_object_hash(r);
 
-   insert_obj_hash(obj, obj_hash, obj_hash_size);
-   nr_objs++;
+   insert_obj_hash(obj, r->objects.obj_hash, r->objects.obj_hash_size);
+   r->objects.nr_objs++;
return obj;
 }
 
-void *object_as_type(struct object *obj, enum object_type type, int quiet)
+void *object_as_type(struct object *obj,
+enum object_type type,
+

Re: [PATCH] push: disable lazy --force-with-lease by default

2017-07-06 Thread Stefan Beller

On Thu, Jul 6, 2017 at 11:56 AM, Junio C Hamano  wrote:
> "git push --force-with-lease=:" makes sure that
> there is no unexpected changes to the branch at the remote while you
> prepare a rewrite based on the old state of the branch.  This
> feature came with an experimental option that allows : part
> to be omitted by using the tip of remote-tracking branch that
> corresponds to the .
>
> It turns out that some people use third-party tools that fetch from
> remote and update the remote-tracking branches behind users' back,
> defeating the safety relying on the stability of the remote-tracking
> branches.  We have some warning text that was meant to be scary
> sounding in our documentation, but nevertheless people seem to be
> bitten.  cf. 
> https://public-inbox.org/git/1491617750.2149.10.ca...@mattmccutchen.net/
> for a recent example.
>
> Let's disable the form that relies on the stability of remote-tracking
> branches by default, and allow users who _know_ their remote-tracking
> branches are stable to enable it with a configuration variable.
>
> This problem was predicted from the very beginning; see 28f5d176
> (remote.c: add command line option parser for "--force-with-lease",
> 2013-07-08).
>
> Signed-off-by: Junio C Hamano 
> ---
>
>  * This is a bit overdue safety fix that we should have done long
>time ago.  If we had this, I do not think it makes it riskier to
>forbid --force and tell people to use --force-with-lease.
>
>  Documentation/config.txt   |  5 +
>  Documentation/git-push.txt |  5 +++--
>  builtin/send-pack.c|  5 +
>  remote.c   | 16 
>  remote.h   |  2 ++
>  send-pack.c|  1 +
>  t/t5533-push-cas.sh| 19 +--
>  transport-helper.c |  5 +
>  transport.c|  5 +
>  9 files changed, 55 insertions(+), 8 deletions(-)
>
> diff --git a/Documentation/config.txt b/Documentation/config.txt
> index 06898a7498..2f929315a2 100644
> --- a/Documentation/config.txt
> +++ b/Documentation/config.txt
> @@ -2588,6 +2588,11 @@ new default).
>
>  --
>
> +push.allowLazyForceWithLease::
> +   If set to true, allow the `--force-with-lease` option
> +   without the expected object name (i.e. expecting the objects
> +   at the tip of corresponding remote-tracking branches).
> +
>  push.followTags::
> If set to true enable `--follow-tags` option by default.  You
> may override this configuration at time of push by specifying
> diff --git a/Documentation/git-push.txt b/Documentation/git-push.txt
> index 0a639664fd..1fa01210a2 100644
> --- a/Documentation/git-push.txt
> +++ b/Documentation/git-push.txt
> @@ -212,8 +212,9 @@ must not already exist.
>  +
>  Note that all forms other than `--force-with-lease=:`
>  that specifies the expected current value of the ref explicitly are
> -still experimental and their semantics may change as we gain experience

This indicates that this feature is not 'experimental' any more, but disabled
(for safety reasons as described below). This implies we will not change the
heuristic for push.allowLazyForceWithLease easily.

Upon reading this documentation and the safety issue, I thought
one could have used the reflog to make it safer as well:
 "I have (manually) inspected the remote tracking branch
  just before lunch, so now I can safely push with lease after lunch"

would translate to e.g. "--force-with-lease-safe--inspection-time=1h",
which would make sure that no reflog entries for the given branches
exist in the last hour.

Just as food for thought.

>  test_expect_success 'push to update (protected)' '
> @@ -78,7 +79,6 @@ test_expect_success 'push to update (protected, tracking, 
> forced)' '
> (
> cd dst &&
> test_commit E &&
> -   git ls-remote . refs/remotes/origin/master >expect &&

This seems unrelated, so ideally it is a separate commit?
Fine with me though.

> @@ -91,6 +91,7 @@ test_expect_success 'push to update (allowed)' '
> (
> cd dst &&
> test_commit D &&
> +   git config push.allowLazyForceWithLease false &&

Here I thought

test_config -C dst ...

at the beginning might be useful, though ..

> @@ -103,6 +104,10 @@ test_expect_success 'push to update (allowed, tracking)' 
> '
> (
> cd dst &&
> test_commit D &&
> +   git config push.allowLazyForceWithLease false &&
> +   test_must_fail git push --force-with-lease=master origin 
> master 2>err &&
> +   grep "lazy force-with-lease" err &&
> +   git config --unset push.allowLazyForceWithLease &&

.. here the -C is not useful, so just using it once above may
not be a good idea. For more dense and readable tests
(also faster?), have you considered using passing the option via
-c instead of setting and unsetting it?

Thanks,

Re: [PATCH] push: add config option to --force-with-lease by default.

2017-07-06 Thread Junio C Hamano

Francesco Mazzoli  writes:

> Moreover, it seems to me that the problem `--force-with-lease` is
> just one of marketing. `--force-with-lease` is strictly more "safe"
> than `--force` in the sense that it'll reject some pushes that `--force`
> will let through.

By that logic, a hypothetical update to `--force` that makes 1/3 of
the attempted forced push randomly would make it safer than the
current `--force`, wouldn't it?

When third-party tools fetch and update remote-tracking branches
behind the users' back, the safety based on the stability of
remote-tracking branches are defeated.  And the biggest problem
is that the way `--force-with-lease` misbehaves---it is not like
it randomly and mistakenly stops the push that could go through;
it lets through what shouldn't.

See the other patch I sent just now---with something like that patch
that lets those like you, who know their remote-tracking branches
are reliable, use the lazy form, while disabling it by default for
others (until they examine their situation and perhaps disable the
problematic auto-fetching) in place, I do not think it is a bad idea
to advertise --force-with-lease a safer option than --force (because
those for whom it is not safer will not be able to use it).

[PATCH] push: disable lazy --force-with-lease by default

2017-07-06 Thread Junio C Hamano

"git push --force-with-lease=:" makes sure that
there is no unexpected changes to the branch at the remote while you
prepare a rewrite based on the old state of the branch.  This
feature came with an experimental option that allows : part
to be omitted by using the tip of remote-tracking branch that
corresponds to the .

It turns out that some people use third-party tools that fetch from
remote and update the remote-tracking branches behind users' back,
defeating the safety relying on the stability of the remote-tracking
branches.  We have some warning text that was meant to be scary
sounding in our documentation, but nevertheless people seem to be
bitten.  cf. 
https://public-inbox.org/git/1491617750.2149.10.ca...@mattmccutchen.net/
for a recent example.

Let's disable the form that relies on the stability of remote-tracking
branches by default, and allow users who _know_ their remote-tracking
branches are stable to enable it with a configuration variable.

This problem was predicted from the very beginning; see 28f5d176
(remote.c: add command line option parser for "--force-with-lease",
2013-07-08).

Signed-off-by: Junio C Hamano 
---

 * This is a bit overdue safety fix that we should have done long
   time ago.  If we had this, I do not think it makes it riskier to
   forbid --force and tell people to use --force-with-lease.

 Documentation/config.txt   |  5 +
 Documentation/git-push.txt |  5 +++--
 builtin/send-pack.c|  5 +
 remote.c   | 16 
 remote.h   |  2 ++
 send-pack.c|  1 +
 t/t5533-push-cas.sh| 19 +--
 transport-helper.c |  5 +
 transport.c|  5 +
 9 files changed, 55 insertions(+), 8 deletions(-)

diff --git a/Documentation/config.txt b/Documentation/config.txt
index 06898a7498..2f929315a2 100644
--- a/Documentation/config.txt
+++ b/Documentation/config.txt
@@ -2588,6 +2588,11 @@ new default).
 
 --
 
+push.allowLazyForceWithLease::
+   If set to true, allow the `--force-with-lease` option
+   without the expected object name (i.e. expecting the objects
+   at the tip of corresponding remote-tracking branches).
+
 push.followTags::
If set to true enable `--follow-tags` option by default.  You
may override this configuration at time of push by specifying
diff --git a/Documentation/git-push.txt b/Documentation/git-push.txt
index 0a639664fd..1fa01210a2 100644
--- a/Documentation/git-push.txt
+++ b/Documentation/git-push.txt
@@ -212,8 +212,9 @@ must not already exist.
 +
 Note that all forms other than `--force-with-lease=:`
 that specifies the expected current value of the ref explicitly are
-still experimental and their semantics may change as we gain experience
-with this feature.
+disabled by default.  Read the note on safety below, and if you think
+you are not affected, enable it with the `push.allowLazyForceWithLease`
+configuration option (cf. linkgit:git-config[1]).
 +
 "--no-force-with-lease" will cancel all the previous --force-with-lease on the
 command line.
diff --git a/builtin/send-pack.c b/builtin/send-pack.c
index 633e0c3cdd..c008f5b60f 100644
--- a/builtin/send-pack.c
+++ b/builtin/send-pack.c
@@ -68,6 +68,11 @@ static void print_helper_status(struct ref *ref)
msg = "stale info";
break;
 
+   case REF_STATUS_REJECT_LAZY_CAS:
+   res = "error";
+   msg = "lazy force-with-error";
+   break;
+
case REF_STATUS_REJECT_ALREADY_EXISTS:
res = "error";
msg = "already exists";
diff --git a/remote.c b/remote.c
index d87482573d..2d3ee6020f 100644
--- a/remote.c
+++ b/remote.c
@@ -1517,7 +1517,9 @@ void set_ref_status_for_push(struct ref *remote_refs, int 
send_mirror,
 int force_update)
 {
struct ref *ref;
+   int allow_lazy_cas = 0;
 
+   git_config_get_bool("push.allowLazyForceWithLease", _lazy_cas);
for (ref = remote_refs; ref; ref = ref->next) {
int force_ref_update = ref->force || force_update;
int reject_reason = 0;
@@ -1544,7 +1546,9 @@ void set_ref_status_for_push(struct ref *remote_refs, int 
send_mirror,
 * branch.
 */
if (ref->expect_old_sha1) {
-   if (oidcmp(>old_oid, >old_oid_expect))
+   if (!allow_lazy_cas && ref->lazy_cas)
+   reject_reason = REF_STATUS_REJECT_LAZY_CAS;
+   else if (oidcmp(>old_oid, >old_oid_expect))
reject_reason = REF_STATUS_REJECT_STALE;
else
/* If the ref isn't stale then force the 
update. */
@@ -2341,10 +2345,13 @@ static void apply_cas(struct push_cas_option *cas,
if

Re: name-rev: anchor pattern without --exclude?

2017-07-06 Thread Kyle Meyer

Bryan Turner  writes:

[...]

> If you don't need the ancestor traversals "git name-rev" provides,
> "git for-each-ref --count 1 --format "%(refname:short)" --points-at
>  refs/heads/" might work. That only goes back to Git 2.7.0,
> though; still quite a ways off your 1.9 target. ("git branch
> --points-at" does the same thing, I should add, and only for branches,
> but you can't directly limit its output like you can with
> "for-each-ref".. Perhaps that doesn't matter for your use case.) If
> you want names like "master~2", from your example, though,
> "--points-at" won't do what you need.

Thanks for the suggestion.  Unfortunately, I do want to consider names
like "master~2".  I was just hoping that I had overlooked another way to
achieve the

git name-rev --refs="refs/heads/*" --exclude="*/refs/heads/*" 

example I gave in my initial post.

-- 
Kyle

Re: bug report: dates on diff

2017-07-06 Thread Junio C Hamano

Junio C Hamano  writes:

> Imagine this scenario:
>
>  - Contributor A writes a change on 2017-07-01 and send it in to me
>  - Contributor B writes a change on 2017-07-03 and send it in to me
>  - I apply change from B on 2017-07-04 on 'master'
>  - I apply change from A on 2017-07-05 on 'master'
>  - You clone the resulting repository from me on 2017-07-06
>
> Now, you have at the tip of 'master' in your repository the commit
> that records the change by contributor A.
>
> And there are three times that are relevant to your tip of 'master'.
>
>  - When was the commit that sits at the tip of 'master' made?
>  - When was the change recorded in that commit made?
>  - When was the commit made at the tip of _your_ 'master'?
>
> and the answers are 2017-07-01, 2017-07-05 and 2017-07-06, respectively.
> They are called "committer", "author" and "reflog" timestamps.

Oops, obviously the dates have to be 2017-07-05, 2017-07-01 and
2017-07-06.  I made the commit on the 5th (i.e. "committer"
timestamp) that records a change written on the 1st (i.e. "author"
timestamp).

Re: bug report: dates on diff

2017-07-06 Thread Junio C Hamano

Todd Lewis  writes:

> Trying not to sound snide, but, what's the point of "--date=" on commits if 
> you
> can't use it later? Granted, things always seem harder until you understand 
> how
> the work. Thanks again.

You do not sound snide at all, at least to me ;-)

Imagine this scenario:

 - Contributor A writes a change on 2017-07-01 and send it in to me
 - Contributor B writes a change on 2017-07-03 and send it in to me
 - I apply change from B on 2017-07-04 on 'master'
 - I apply change from A on 2017-07-05 on 'master'
 - You clone the resulting repository from me on 2017-07-06

Now, you have at the tip of 'master' in your repository the commit
that records the change by contributor A.

And there are three times that are relevant to your tip of 'master'.

 - When was the commit that sits at the tip of 'master' made?
 - When was the change recorded in that commit made?
 - When was the commit made at the tip of _your_ 'master'?

and the answers are 2017-07-01, 2017-07-05 and 2017-07-06, respectively.
They are called "committer", "author" and "reflog" timestamps.

The 'master@{}' syntax is about the reflog timestamp.  It
never looks at the former two.  In general whenever you see @{...},
that is talking about the information that is stored in "reflog",
the record of when and in what order the  in _your_ repository
pointed at various objects.

The "commit --date=" is about tweaking the "author" timestamp
the commit records.  It does not affect the "committer" timestamp.
By definition (of what "reflog" is), it will not affect the reflog
timestamp, because the reflog timestamp for a particular commit
would be different across repositories.  I had A's commit on _my_
master on 2017-07-05, but the time you had it on _your_ master was
not until 2017-07-06.

I think the best way to do this properly would be to extend the
"^{...}" syntax so that we can say e.g.

git show "master^{#author-time > 2017-01-01}"

to mean "traverse from the tip of 'master' and find the first commit
that satisfies the given expression, "author-time > 2017-01-01", i.e.
has author timestamp that is later than the specified date.  Which
would be in line with the existing

git show "master^{/my commit message}"

that means "traverse from the tip of 'master' and find the first
commit that has 'my commit message' in the log message".

Note that "git log --since= --until= master" would be
the thing that is closest to what you would want that already
exists, but that limits by the "committer" timestamp.  You _could_
lie about both author and committer timestamp when building the
backdated history and use this mechanism, but we have author and
committer timestamps that are distinct for a reason, so it is a
rather poor workaround.

Re: name-rev: anchor pattern without --exclude?

2017-07-06 Thread Bryan Turner

On Thu, Jul 6, 2017 at 10:23 AM, Kyle Meyer  wrote:
> Junio C Hamano  writes:
>> Kyle Meyer  writes:
>
>> What is the answer desired by your application when two or more
>> branches point at the same commit you are interested in?  Pick one at
>> random?  An error saying it cannot decide where to place the snapshot?
>
> Our use of name-rev is just to get a friendly name for a hash.  If two
> branches point to the same commit, we're fine with whatever decision
> "git name-rev" makes; we just want to limit it to refs in the
> "refs/heads/" namespace.

If you don't need the ancestor traversals "git name-rev" provides,
"git for-each-ref --count 1 --format "%(refname:short)" --points-at
 refs/heads/" might work. That only goes back to Git 2.7.0,
though; still quite a ways off your 1.9 target. ("git branch
--points-at" does the same thing, I should add, and only for branches,
but you can't directly limit its output like you can with
"for-each-ref".. Perhaps that doesn't matter for your use case.) If
you want names like "master~2", from your example, though,
"--points-at" won't do what you need.

Bryan

>
> --
> Kyle

Re: [RFC/PATCH v4 00/49] Add initial experimental external ODB support

2017-07-06 Thread Ben Peart

On 7/1/2017 3:41 PM, Christian Couder wrote:

On Fri, Jun 23, 2017 at 8:24 PM, Ben Peart  wrote:

On 6/20/2017 3:54 AM, Christian Couder wrote:

To be able to better handle some kind of objects, for example big
blobs, it would be nice if Git could store its objects in other object
databases (ODB).

To do that, this patch series makes it possible to register commands,
also called "helpers", using "odb..command" config variables,
to access external ODBs where objects can be stored and retrieved.

External ODBs should be able to tranfer information about the blobs
they store. This patch series shows how this is possible using kind of
replace refs.

Great to see this making progress!

My thoughts and questions are mostly about the overall design tradeoffs.

Is your intention to enable the ODB to completely replace the regular object
store or just to supplement it?

It is to supplement it, as I think the regular object store works very
well most of the time.

I certainly understand the desire to restrict the scope of the patch 
series.  I know full replacement is a much larger problem as it would 
touch much more of the codebase.

I'd still like to see an object store that was thread safe, more robust 
(ie transactional) and hopefully faster so I am hoping we can design the 
ODB interface to eventually enable that.

For example: it seems the ODB helpers need to be able to be called 
before the regular object store in the "put" case (so they can intercept 
large objects for example) and after in in the "get" case to enable 
"fault-in."  Something like this:

have/get

git object store
large object ODB helper

put
===
large object ODB helper
git object store

It would be nice if that order wasn't hard coded but that the order or 
level of the "git object store" could be specified using the same 
mechanism as used for the ODB helpers so that some day you could do 
something like this:

have/get

"LMDB" ODB helper
git object store

put
===
"LMDB" ODB helper
git object store

(and even further out, drop the current git object store completely :)).

I think it would be good to ensure the
interface is robust and performant enough to actually replace the current
object store interface (even if we don't actually do that just yet).

I agree that it should be robust and performant, but I don't think it
needs to be as performant in all cases as the current object store
right now.

Another way of asking this is: do the 3 verbs (have, get, put) and the 3
types of "get" enable you to wrap the current loose object and pack file
code as ODBs and run completely via the external ODB interface?  If not,
what is missing and can it be added?

One example of what I think is missing is a way to stream objects (ie 
get_stream, put_stream).  This isn't used often in git but it did exist 
last I checked.  I'm not saying this needs to be supported in the first 
version - more if we want to support total replacement.

I also wonder if we'd need an "optimize" verb (for "git gc") or a 
"validate" verb (for "git fsck").  Again, only if/when we are looking at 
total replacement.

Right now the "put" verb only send plain blobs, so the most logical
way to run completely via the external ODB interface would be to use
it to send and receive plain blobs. There are tests scripts (t0420,
t0470 and t0480) that use an http server as the external ODB and all
the blobs are stored in it.

And yeah for now it works only for blobs. There is a temporary patch
in the series that limits it to blobs. For the non RFC patch series, I
think it should either use the attribute system to tell which objects
should be run via the external ODB interface, or perhaps there should
be a way to ask each external ODB helper which kind of objects and
blobs it can handle. I should add that in the future work part.

Sounds good.  For GVFS we handle all object types (including commits and 
trees) so would need this to be enabled so that we can switch to using it.

_Eventually_ it would be great to see the current object store(s) moved
behind the new ODB interface.

This is not one of my goals and I think it could be a problem if we
want to keep the "fault in" mode.

> In this mode the helper writes or reads directly to or from the
> current object store, so it needs the current object store to be
> available.
>

I think implementing "fault in" should be an option that the ODB handler 
can implement but should not be required by the design/interface.  As 
you state above, this could be as simple as having the ODB handler write 
the object to the git object store on "get."

Also I think compatibility with other git implementations is important
and it is a good thing that they can all work on a common repository
format.

I agree this should be an option but I don't want to say we'll _never_ 
move to a better object store.

When there are multiple ODB providers, what is the order they are called?

The

Re: bug report: dates on diff

2017-07-06 Thread Todd Lewis

On 07/06/2017 12:22 PM, Junio C Hamano wrote:
> If you didn't create this repository back in 2012, then the syntax
> "master@{01-01-2012}" that asks "Back at the beginning of 2012, what
> object did the master branch point at?" does not have a sensible
> answer.  That can be seen in the warning you got from Git.
> 
> Hope this clarifies.

Thanks; it does explain what I saw, and even makes some sense.

However, it does leave me scratching my head about how to accomplish what I set
out to do, or just how to reference commits by their dates as displayed by git
log w/o either changing the system time/date when committing historic data
(which seems like a total no-no), or making multiple passes over "get log" to
determine how to ref. Perhaps there's a way to patch the log to match dates
recorded in relevant objects?

Trying not to sound snide, but, what's the point of "--date=" on commits if you
can't use it later? Granted, things always seem harder until you understand how
the work. Thanks again.

Re: name-rev: anchor pattern without --exclude?

2017-07-06 Thread Kyle Meyer

Junio C Hamano  writes:

> Kyle Meyer  writes:
>
>> [*] A bit more information on why I'm trying to do this: In Magit, we
>> have a work-in-progress feature that takes "snapshots" of changes
>> before they are committed.  These snapshots are stored as
>> "refs/wip/{wtree,index}/".
>>
>> We want to use name-rev to map a commit to a name in "refs/heads/",
>> ignoring these snapshot refs.
>
> What is the  in the above supposed to represent?

It's the current branch, as returned by "git symbolic-ref HEAD".

> When a user sees two refs "refs/wip/{wtree,index}/",
> does it mean: "These two represent a snapshot for changes while the
> user was working on this branch"?

Yes.

> Isn't name-rev a wrong tool to find that information?

Sorry for the confusion.  We're using "symbolic-ref HEAD" to get the
above information.  I was just trying to explain why we're dealing with
ref names that contain a "refs/heads" substring (like
refs/wip/wtree/refs/heads/master in the example I gave).

> What is the answer desired by your application when two or more
> branches point at the same commit you are interested in?  Pick one at
> random?  An error saying it cannot decide where to place the snapshot?

Our use of name-rev is just to get a friendly name for a hash.  If two
branches point to the same commit, we're fine with whatever decision
"git name-rev" makes; we just want to limit it to refs in the
"refs/heads/" namespace.

-- 
Kyle

Re: name-rev: anchor pattern without --exclude?

2017-07-06 Thread Junio C Hamano

Kyle Meyer  writes:

> [*] A bit more information on why I'm trying to do this: In Magit, we
> have a work-in-progress feature that takes "snapshots" of changes
> before they are committed.  These snapshots are stored as
> "refs/wip/{wtree,index}/".
>
> We want to use name-rev to map a commit to a name in "refs/heads/",
> ignoring these snapshot refs.

What is the  in the above supposed to represent?  When
a user sees two refs "refs/wip/{wtree,index}/", does
it mean: "These two represent a snapshot for changes while the user
was working on this branch"?

Isn't name-rev a wrong tool to find that information?  What is the
answer desired by your application when two or more branches point
at the same commit you are interested in?  Pick one at random?  An
error saying it cannot decide where to place the snapshot?

I am wondering if you are looking for "symbolic-ref HEAD".

Re: Flurries of 'git reflog expire'

2017-07-06 Thread Bryan Turner

I'm one of the Bitbucket Server developers. My apologies; I just
noticed this thread or I would have jumped in sooner!

On Thu, Jul 6, 2017 at 6:31 AM, Andreas Krey  wrote:
> On Wed, 05 Jul 2017 04:20:27 +, Jeff King wrote:
>> On Tue, Jul 04, 2017 at 09:57:58AM +0200, Andreas Krey wrote:
> ...
>> And what does the process tree look like?
>
> Lots (~ 10) of
>
>   \_ /usr/bin/git receive-pack 
> /opt/apps/atlassian/bitbucket-data/shared/data/repositories/68
>   |   \_ git gc --auto --quiet
>   |   \_ git reflog expire --all
>
> plus another dozen gc/expire pairs where the parent is already gone.
> All with the same arguments - auto GC.

Do you know what version of Bitbucket Server is in use? Based on the
fact that it's "git gc --auto" triggered from a "git receive-pack",
that implies two things:
- You're on a 4.x version of Bitbucket Server
- The repository (68) has never been forked

Depending on your Bitbucket Server version (this being the reason I
asked), there are a couple different fixes available:

- Fork the repository. You don't need to _use_ the fork, but having a
fork existing will trigger Bitbucket Server to disable auto GC and
fully manage that itself. That includes managing both _concurrency_
and _frequency_ of GC. This works on all versions of Bitbucket Server.

- Run "git config gc.auto 0" in
/opt/apps/atlassian/bitbucket-data/shared/data/repositories/68 to
disable auto GC yourself. This may be preferable to forking the
repository, which, in addition to disabling auto GC, also disables
object pruning. However, you must be running at least Bitbucket Server
4.6.0 for this approach to work. Otherwise auto GC will simply be
reenabled the first time Bitbucket Server goes to trigger GC, when it
detects that the repository has no forks.

Assuming you're on 4.6.0 or newer, either approach should fix the
issue. If you're on 4.5 or older, forking is the only viable approach
unless you upgrade Bitbucket Server first.

I also want to add that Bitbucket Server 5.x includes totally
rewritten GC handling. 5.0.x automatically disables auto GC in all
repositories and manages it explicitly, and 5.1.x fully removes use of
"git gc" in favor of running relevant plumbing commands directly. We
moved away from "git gc" specifically to avoid the "git reflog expire
--all", because there's no config setting that _fully disables_
forking that process. By default our bare clones only have reflogs for
pull request refs, and we've explicitly configured those to never
expire, so all "git reflog expire --all" can do is use up I/O and,
quite frequently, fail because refs are updated. Since we stopped
running "git gc", we've not yet seen any GC failures on our internal
Bitbucket Server clusters.

Bitbucket Server 5.1.x also includes a new "gc.log" (not to be
confused with the one Git itself writes) which retains a record of
every GC-related process we run in each repository, and how long that
process took to complete. That can be useful for getting clearer
insight into both how often GC work is being done, and how long it's
taking.

Upgrading to 5.x can be a bit of an undertaking, since the major
version brings API changes, so it's totally understandable that many
organizations haven't upgraded yet. I'm just noting that these
improvements are there for when such an upgrade becomes viable.

Hope this helps!
Bryan

>
> I'd wager that each push sees that a GC is in order,
> and doesn't notice that there is one already running.
>
> - Andreas
>
> --
> "Totally trivial. Famous last words."
> From: Linus Torvalds 
> Date: Fri, 22 Jan 2010 07:29:21 -0800

name-rev: anchor pattern without --exclude?

2017-07-06 Thread Kyle Meyer

Hello,

I'm trying to restrict name-rev's output to a ref whose name begins with
the supplied pattern [*].  Looking at "man git-name-rev",
builtin/name-rev.c, and wildmatch.c, I don't see a way to accomplish
this with "--refs=".

Say, for example, that I want to limit name-rev's output to a ref in the
"refs/heads/" namespace.

* cc8f7dc (master) c
* ad6d6f0 b
| * 49a2156 (refs/wip/wtree/refs/heads/master) d
|/  
* b91f97a a

--

$ git name-rev b91f97a
b91f97a wip/wtree/refs/heads/master~1

$ git name-rev --refs="refs/heads/*" b91f97a
b91f97a wip/wtree/refs/heads/master~1

$ git name-rev --refs="^refs/heads/*" b91f97a
b91f97a undefined

Starting with v2.13.0, I can get my desired output using the --exclude
option.

$ git name-rev --refs="refs/heads/*" --exclude="*/refs/heads/*" b91f97a
b91f97a master~2

But, unfortunately, I'm trying to find a solution that works with
earlier Git versions (back to v1.9.4).

Am I overlooking a way to do this without the --exclude option?


[*] A bit more information on why I'm trying to do this: In Magit, we
have a work-in-progress feature that takes "snapshots" of changes
before they are committed.  These snapshots are stored as
"refs/wip/{wtree,index}/".

We want to use name-rev to map a commit to a name in "refs/heads/",
ignoring these snapshot refs.

-- 
Kyle

Re: [PATCH] t5534: fix misleading grep invocation

2017-07-06 Thread Junio C Hamano

Michael J Gruber  writes:

> Junio C Hamano venit, vidit, dixit 05.07.2017 18:26:
>
>> The invocation this fixes is not just misleading but simply wrong.
>> Nicely spotted.
>
> In addition, the patch makes sure to catch any rev-parse failures which
> the original invocation shove under the rug.

Yeah, good thing that this got fixed ;-)

Re: bug report: dates on diff

2017-07-06 Thread Junio C Hamano

Todd Lewis  writes:

> [utoddl@tarna gitbug]$ git diff master@{01-01-2012} charter.txt
> warning: Log for 'master' only goes back to Thu, 6 Jul 2017 08:19:45 -0400.

What you observed is how @{} syntax is designed to
work, and is not limited to "git diff".  Any Git command e.g. "git
rev-parse master@{01-01-2012}", would and should behave the same
way.

The thing to note is that the syntax does not pay any attention to
author or committer dates recorded in the commit objects.  In fact,
if you have a ref that points at an object that is not a commit-ish,
you can still use the syntax.  If you did this, for example

$ git config core.logallrefupdates always

$ one=$(echo one | git hash-object --stdin -w)
$ git update-ref refs/my/blob $one

... time passes ...

$ two=$(echo two | git hash-object --stdin -w)
$ git update-ref refs/my/blob $two

then "git show my/blob@{2.minutes.ago}" will show the blob object
your refs/my/blob ref was pointing at 2 minutes ago.

And as you may know, blobs do not record any timestamp.  So how does
this work?  

The reason why this works is because the time in @{$time}
syntax is about asking what was pointed by the  back in $time
in your repository.  It does not matter what timestamp the object
that was pointed by the  has (or does not have).

If you didn't create this repository back in 2012, then the syntax
"master@{01-01-2012}" that asks "Back at the beginning of 2012, what
object did the master branch point at?" does not have a sensible
answer.  That can be seen in the warning you got from Git.

Hope this clarifies.

Re: [PATCH] comment: fix a typo in the comment

2017-07-06 Thread Junio C Hamano

SZEDER Gábor  writes:

> Speaking of sharp eyes...  That Subject: line really needs a
> s/comment:/commit:/, doesn't it? :)

Surely it does.

[PATCH] reflog-walk: mark a file-local symbol as static

2017-07-06 Thread Ramsay Jones


Signed-off-by: Ramsay Jones 
---

Hi Jeff,

When you re-roll your 'jk/reflog-walk' branch, could you please squash
this into the relevant patch (commit beafb2c629, "reflog-walk: stop using
fake parents", 05-07-2017).

Thanks!

ATB,
Ramsay Jones

 reflog-walk.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/reflog-walk.c b/reflog-walk.c
index a7644d944..8effe722b 100644
--- a/reflog-walk.c
+++ b/reflog-walk.c
@@ -284,7 +284,7 @@ int reflog_walk_empty(struct reflog_walk_info *info)
return !info || !info->nr;
 }
 
-struct commit *next_entry_in_log(struct commit_reflog *log)
+static struct commit *next_entry_in_log(struct commit_reflog *log)
 {
while (log->recno >= 0) {
struct reflog_info *entry = >reflogs->items[log->recno--];
-- 
2.13.0

Re: [PATCH 2/6] t1414: document some reflog-walk oddities

2017-07-06 Thread Junio C Hamano

Jeff King  writes:

> On Wed, Jul 05, 2017 at 03:45:03PM -0700, Junio C Hamano wrote:
>
>> > I did make the ordering intentional. We process each reflog sequentially
>> > in turn. I don't think it would be wrong to interleave them by date, but
>> > I actually think it's useful for somebody who switches that behavior to
>> > consciously update the test. Maybe it's worth calling out in the commit
>> > message.
>> >
>> > I stopped short of actually documenting and guaranteeing that behavior
>> > to the user. I don't know how comfortable we would be changing it later.
>> 
>> I somehow feel that the "showing all entries from HEAD and then
>> showing all from side" is simply a buggy behaviour.  I do not think
>> our users would terribly mind if we changed it later.  But I may be
>> missing the reason why (sometimes?) the sequential behaviour may be
>> useful.
>
> If we think it's buggy, we can fix it now. But I'm not convinced that
> sequential iteration is that bad. It's at least _simple_ and easy to
> explain.

Yes, I agree that sequential is easy to explain, but only when I
consider use of "log" family without "-n 30" or "--since 3.days".
It still is easy to explain---we show from one and then from the
other, but because we stop after showing 30 of them, and the first
one has more than that, you do not see any from the latter.  

It's just the easy-to-explian behaviour is not very useful and very
hard to defend.

> The only other thing that would make sense to me is sorting the
> entries by date (reflog date, not commit date) and showing them in that
> order. But that gives rise to some funny corner cases.
>
> What do we do if there's clock skew within the reflog (e.g., somebody
> fixed their skewed clock such that HEAD@{5} is less recent than
> HEAD@{6}). Do we show them out of order then (even if only a single
> reflog is being shown?).

I would think the easiest-to-explain thing is handle clock skew just
like how clock skew affects the traversal with commit date.  At
least inside a single reflog, there is no need to sort---its
append-only nature gives a reliable "newer to older" order.

> But maybe we could do a pseudo-list-merge, where we treat the reflogs as
> queues and always pick from the queue with the most recent timestamp.

Yes, that was what I had in mind.  An entry with an artificially old
timestamp due to clock skew would clog the output and prevent the
entries behind it on the same reflog from getting shown before
entries from other reflog, but that is the same as what happens in
the normal traversal to sane parents that can only be reached from a
child commit that has an artificially old timestamp.

Thanks.

Charity

2017-07-06 Thread Peggy Altman

Hi,
  My name is Peggy Altman the personal assistant of Ms. Doris Buffett, She is a 
philanthropist and the founder of one of the largest private foundation in the 
world. She is on a mission to give it all away as she strongly believes in 
‘giving while living.’ She always had the idea that never changed in her mind — 
that wealth should be used to help each other which has made her decide to 
donate to you. She has instructed I get in touch with you, kindly acknowledge 
this message and I will get back to you with more details.

Visit the web page to know more about her: 
http://abcnews.go.com/GMA/Books/giving-dorris-buffett-story-michael-zitz/story?id=10827641

Regards,
Peggy Altman.

Re: [PATCH] comment: fix a typo in the comment

2017-07-06 Thread SZEDER Gábor

> On Wed, 2017-07-05 at 11:18 -0700, Junio C Hamano wrote:
> > Kaartic Sivaraam  writes:
> > 
> > > ---
> > > �Though very trivial, I wanted to correct this as I didn't
> > > �want to ignore it after seeing it.
> > 
> > Thanks for sharp eyes.��Sign-off?��(or Sign-of? ;-))
> > 
> I should also thank you for your sharp eyes!

Speaking of sharp eyes...  That Subject: line really needs a
s/comment:/commit:/, doesn't it? :)

Re: [PATCH] hooks: add signature using "interpret-trailers"

2017-07-06 Thread Kaartic Sivaraam

On Wed, 2017-07-05 at 21:14 +0100, Ramsay Jones wrote:
> 
> On 05/07/17 18:35, Kaartic Sivaraam wrote:
> > The sample hook to prepare the commit message before
> > a commit allows users to opt-in to add the signature
> > to the commit message. The signature is added at a place
> > that isn't consistent with the "-s" option of "git commit".
> > Further, it could go out of view in certain cases.
> > 
> > Add the signature in a way similar to "-s" option of
> > "git commit" using git's interpret-trailers command.
> > 
> > It works well in all cases except when the user invokes
> > "git commit" without any arguments. In that case manually
> > add a new line after the first line to ensure it's consistent
> > with the output of "-s" option.
> > 
> > While at it, name the input parameters to improve readability
> > of script.
> 
> I assume each occurrence of 'signature' in the commit message,
> including the subject, should be 'sign-off' (or Signed-off-by)
> instead. Yes?
> 
Yes. Thanks for pointing out a possible way in which the message could
be misinterpreted.

-- 
Kaartic

Re: Flurries of 'git reflog expire'

2017-07-06 Thread Andreas Krey

On Wed, 05 Jul 2017 04:20:27 +, Jeff King wrote:
> On Tue, Jul 04, 2017 at 09:57:58AM +0200, Andreas Krey wrote:
...
> I seem to recall that using --stale-fix is also extremely expensive,
> too. What do the command line arguments for the slow commands look like?

The problem isn't that the expire is slow, it is that there are
many of them, waiting for disk writes.

> And what does the process tree look like?

Lots (~ 10) of

  \_ /usr/bin/git receive-pack 
/opt/apps/atlassian/bitbucket-data/shared/data/repositories/68
  |   \_ git gc --auto --quiet
  |   \_ git reflog expire --all

plus another dozen gc/expire pairs where the parent is already gone.
All with the same arguments - auto GC.

I'd wager that each push sees that a GC is in order,
and doesn't notice that there is one already running.

- Andreas

-- 
"Totally trivial. Famous last words."
From: Linus Torvalds 
Date: Fri, 22 Jan 2010 07:29:21 -0800

Re: Flurries of 'git reflog expire'

2017-07-06 Thread Andreas Krey

On Tue, 04 Jul 2017 11:43:33 +, Ævar Arnfjörð Bjarmason wrote:
...
> You can set gc.auto=0 in the repo to disable auto-gc, and play with
> e.g. the reflog expire values, see the git-gc manpage.
> 
> But then you need to run your own gc, which is not a bad idea anyway
> with a dedicated git server.

Actually, bitbucket should be doing this. Although I can't quite
rule out the possibility that we reenabled GC in this repo some
time ago.

> But it would be good to get to the bottom of this, we shouldn't be
> running these concurrently.

Indeed. Unfortunately this isn't easily reproduced in the test instance,
so I will need to get a newer git under the production bitbucket.

There are quite some of

  \_ /usr/bin/git receive-pack 
/opt/apps/atlassian/bitbucket-data/shared/data/repositories/68
  |   \_ git gc --auto --quiet
  |   \_ git reflog expire --all

in the process tree, apparently a new one gets started even though previous
ones are still running. The number of running expires grew slowly, in the
order of many minutes.

Andreas

-- 
"Totally trivial. Famous last words."
From: Linus Torvalds 
Date: Fri, 22 Jan 2010 07:29:21 -0800

bug report: dates on diff

2017-07-06 Thread Todd Lewis

I've run into what I think is a bug wrt date handling in "git diff". I have some
historical data with which I'm attempting to populate a new git repo with
back-dated commits. That appears to work. But referencing those commits by date
with "git diff" does not. (I have no idea if the problem is limited to "git
diff"; that's just where I've been banging my head.)

I'm using git version 2.9.4.

Below is a recreation of the problem from scratch using made up data. Note that
the "git log" output shows the commits with the correct dates, but referencing
"master@{any-date-before-git-init}" throws a "Warning: Log for 'master' only
goes back to ..." and it erroneously selects the first commit (for 2010 in the
example below, instead of the requested 2012 commit).

Here's the terminal session demonstrating the problem:

[utoddl@tarna tmp]$ mkdir gitbug

[utoddl@tarna tmp]$ cd gitbug

[utoddl@tarna gitbug]$ git init
Initialized empty Git repository in /tmp/gitbug/.git/

[utoddl@tarna gitbug]$ for year in 2010 2011 2012 2013 2014 ; do
> echo "Charter for year $year." > charter_01-01-$year.txt
> touch -d"$year-01-01" charter_01-01-$year.txt
> done

[utoddl@tarna gitbug]$ ls -al
total 20
drwxrwxr-x.  3 utoddl utoddl 160 Jul  6 08:14 .
drwxrwxrwt. 19 root   root   420 Jul  6 08:12 ..
-rw-rw-r--.  1 utoddl utoddl  23 Jan  1  2010 charter_01-01-2010.txt
-rw-rw-r--.  1 utoddl utoddl  23 Jan  1  2011 charter_01-01-2011.txt
-rw-rw-r--.  1 utoddl utoddl  23 Jan  1  2012 charter_01-01-2012.txt
-rw-rw-r--.  1 utoddl utoddl  23 Jan  1  2013 charter_01-01-2013.txt
-rw-rw-r--.  1 utoddl utoddl  23 Jan  1  2014 charter_01-01-2014.txt
drwxrwxr-x.  7 utoddl utoddl 200 Jul  6 08:12 .git

[utoddl@tarna gitbug]$ for year in 2010 2011 2012 2013 2014 ; do
> cp -p charter_01-01-$year.txt charter.txt
> git add charter.txt
> git commit --date="01-01-$year" -m "Committing charter for $year." charter.txt
> done
[master (root-commit) f5dc22c] Committing charter for 2010.
 Date: Fri Jan 1 07:19:45 2010 -0500
 1 file changed, 1 insertion(+)
 create mode 100644 charter.txt
cp: overwrite 'charter.txt'? y
[master dec8944] Committing charter for 2011.
 Date: Sat Jan 1 07:19:47 2011 -0500
 1 file changed, 1 insertion(+), 1 deletion(-)
cp: overwrite 'charter.txt'? y
[master cab72e0] Committing charter for 2012.
 Date: Sun Jan 1 07:19:48 2012 -0500
 1 file changed, 1 insertion(+), 1 deletion(-)
cp: overwrite 'charter.txt'? y
[master a0cbf74] Committing charter for 2013.
 Date: Tue Jan 1 07:19:49 2013 -0500
 1 file changed, 1 insertion(+), 1 deletion(-)
cp: overwrite 'charter.txt'? y
[master a75164c] Committing charter for 2014.
 Date: Wed Jan 1 07:19:53 2014 -0500
 1 file changed, 1 insertion(+), 1 deletion(-)

[utoddl@tarna gitbug]$ git log
commit a75164cdf56b5f7b10e7575ee4aa4f653656e456
Author: Todd Lewis 
Date:   Wed Jan 1 07:19:53 2014 -0500

Committing charter for 2014.

commit a0cbf74320f5b2309f697a48a8026328a998d787
Author: Todd Lewis 
Date:   Tue Jan 1 07:19:49 2013 -0500

Committing charter for 2013.

commit cab72e0902cb4130de49c5d898ef4cc2daf1e2d1
Author: Todd Lewis 
Date:   Sun Jan 1 07:19:48 2012 -0500

Committing charter for 2012.

commit dec8944f3c411f5f667cdfefee0609d38298489f
Author: Todd Lewis 
Date:   Sat Jan 1 07:19:47 2011 -0500

Committing charter for 2011.

commit f5dc22c636dd66a51e74dc01935a4879bd4946d6
Author: Todd Lewis 
Date:   Fri Jan 1 07:19:45 2010 -0500

Committing charter for 2010.

[utoddl@tarna gitbug]$ git diff master@{01-01-2012} charter.txt
warning: Log for 'master' only goes back to Thu, 6 Jul 2017 08:19:45 -0400.
diff --git a/charter.txt b/charter.txt
index a18dbc9..29fedf9 100644
--- a/charter.txt
+++ b/charter.txt
@@ -1 +1 @@
-Charter for year 2010.
+Charter for year 2014.
[utoddl@tarna gitbug]$

-- 
   +--+
  / todd_le...@unc.edu  919-445-0091  http://www.unc.edu/~utoddl /
 /  A man's home is his castle, in a manor of speaking. /
+--+

Re: [PATCH] merge-message: change meaning of "empty merge message"

2017-07-06 Thread Kaartic Sivaraam

On Thu, 2017-07-06 at 06:46 +0200, Kevin Daudt wrote:
> 
> The function is called "rest_is_empty".
> 
Thanks for correcting that!

> But isn't it better that commit and merge use the same code, instead
> of
> duplicating it again? Otherwise one may be updated, and the other
> forgotten, getting differences in behaviur, which is what you want to
> solve.
> 
Yes, I did think of that. It *seems* that neither "message_is_empty" or
the "rest_is_empty" are exposed to other files. Have to work on that.

-- 
Kaartic

Re: [PATCH] t5534: fix misleading grep invocation

2017-07-06 Thread Michael J Gruber

Junio C Hamano venit, vidit, dixit 05.07.2017 18:26:
> Johannes Schindelin  writes:
> 
>> It seems to be a little-known feature of `grep` (and it certainly came
>> as a surprise to this here developer who believed to know the Unix tools
>> pretty well) that multiple patterns can be passed in the same
>> command-line argument simply by separating them by newlines. Watch, and
>> learn:
>>
>>  $ printf '1\n2\n3\n' | grep "$(printf '1\n3\n')"
>>  1
>>  3
>>
>> That behavior also extends to patterns passed via `-e`, and it is not
>> modified by passing the option `-E` (but trying this with -P issues the
>> error "grep: the -P option only supports a single pattern").
>>
>> It seems that there are more old Unix hands who are surprised by this
>> behavior, as grep invocations of the form
>>
>>  grep "$(git rev-parse A B) C" file
>>
>> were introduced in a85b377d041 (push: the beginning of "git push
>> --signed", 2014-09-12), and later faithfully copy-edited in b9459019bbb
>> (push: heed user.signingkey for signed pushes, 2014-10-22).
>>
>> Please note that the output of `git rev-parse A B` separates the object
>> IDs via *newlines*, not via spaces, and those newlines are preserved
>> because the interpolation is enclosed in double quotes.
>>
>> As a consequence, these tests try to validate that the file contains
>> either A's object ID, or B's object ID followed by C, or both. Clearly,
>> however, what the test wanted to see is that there is a line that
>> contains all of them.
>>
>> This is clearly unintended, and the grep invocations in question really
>> match too many lines.
>>
>> Fix the test by avoiding the newlines in the patterns.
>>
>> Signed-off-by: Johannes Schindelin 
>> ---
> 
> The invocation this fixes is not just misleading but simply wrong.
> Nicely spotted.

In addition, the patch makes sure to catch any rev-parse failures which
the original invocation shove under the rug.

> Thanks, will queue.

Thanks from the faithful copy-editor ;)

How did you spot this? Are there grep versions that behave differently?

>>  t/t5534-push-signed.sh | 14 ++
>>  1 file changed, 10 insertions(+), 4 deletions(-)
>>
>> diff --git a/t/t5534-push-signed.sh b/t/t5534-push-signed.sh
>> index 5bcb288f5c4..464ffdd147a 100755
>> --- a/t/t5534-push-signed.sh
>> +++ b/t/t5534-push-signed.sh
>> @@ -119,8 +119,11 @@ test_expect_success GPG 'signed push sends push 
>> certificate' '
>>  sed -n -e "s/^nonce /NONCE=/p" -e "/^$/q" dst/push-cert
>>  ) >expect &&
>>  
>> -grep "$(git rev-parse noop ff) refs/heads/ff" dst/push-cert &&
>> -grep "$(git rev-parse noop noff) refs/heads/noff" dst/push-cert &&
>> +noop=$(git rev-parse noop) &&
>> +ff=$(git rev-parse ff) &&
>> +noff=$(git rev-parse noff) &&
>> +grep "$noop $ff refs/heads/ff" dst/push-cert &&
>> +grep "$noop $noff refs/heads/noff" dst/push-cert &&
>>  test_cmp expect dst/push-cert-status
>>  '
>>  
>> @@ -200,8 +203,11 @@ test_expect_success GPG 'fail without key and heed 
>> user.signingkey' '
>>  sed -n -e "s/^nonce /NONCE=/p" -e "/^$/q" dst/push-cert
>>  ) >expect &&
>>  
>> -grep "$(git rev-parse noop ff) refs/heads/ff" dst/push-cert &&
>> -grep "$(git rev-parse noop noff) refs/heads/noff" dst/push-cert &&
>> +noop=$(git rev-parse noop) &&
>> +ff=$(git rev-parse ff) &&
>> +noff=$(git rev-parse noff) &&
>> +grep "$noop $ff refs/heads/ff" dst/push-cert &&
>> +grep "$noop $noff refs/heads/noff" dst/push-cert &&
>>  test_cmp expect dst/push-cert-status
>>  '
>>  
>>
>> base-commit: 5116f791c12dda6b6c22fa85b600a8e30dfa168a

Re: [PATCH 2/6] t1414: document some reflog-walk oddities

2017-07-06 Thread Jeff King

On Thu, Jul 06, 2017 at 03:16:06AM -0400, Jeff King wrote:

> If we think it's buggy, we can fix it now. But I'm not convinced that
> sequential iteration is that bad. It's at least _simple_ and easy to
> explain. The only other thing that would make sense to me is sorting the
> entries by date (reflog date, not commit date) and showing them in that
> order. But that gives rise to some funny corner cases.
> 
> What do we do if there's clock skew within the reflog (e.g., somebody
> fixed their skewed clock such that HEAD@{5} is less recent than
> HEAD@{6}). Do we show them out of order then (even if only a single
> reflog is being shown?).
> 
> Or do we try some complicated sort where entries compare sequentially
> within a given reflog, but use date comparisons between different logs?
> I don't think that can work with a normal comparison sort, as it makes
> the comparison non-transitive.
> 
> But maybe we could do a pseudo-list-merge, where we treat the reflogs as
> queues and always pick from the queue with the most recent timestamp.
> 
> That would probably be pretty easy to retrofit on the iteration from the
> current series.

Something like the patch below. Looking over the output, the
interleaving makes it look confusing (especially if you don't use
--date, since then the interleaved numbers have no relation to each
other).

It would help if we actually had a use case where somebody really did
want to show two reflogs from a single command. Then we'd know what they
were expecting and which output would be most useful, at least for that
case. But I can't really think of why you'd want to do so (which is
probably why nobody noticed for years that the current behavior is so
broken).

---
diff --git a/reflog-walk.c b/reflog-walk.c
index a7644d944e..6cdf7bfb0f 100644
--- a/reflog-walk.c
+++ b/reflog-walk.c
@@ -90,7 +90,7 @@ struct commit_reflog {
 
 struct reflog_walk_info {
struct commit_reflog **logs;
-   size_t nr, alloc, cur;
+   size_t nr, alloc;
struct string_list complete_reflogs;
struct commit_reflog *last_commit_reflog;
 };
@@ -284,26 +284,43 @@ int reflog_walk_empty(struct reflog_walk_info *info)
return !info || !info->nr;
 }
 
-struct commit *next_entry_in_log(struct commit_reflog *log)
+static void skip_unusable_entries(struct commit_reflog *log)
 {
while (log->recno >= 0) {
-   struct reflog_info *entry = >reflogs->items[log->recno--];
+   struct reflog_info *entry = >reflogs->items[log->recno];
struct object *obj = parse_object(>noid);
 
if (obj && obj->type == OBJ_COMMIT)
-   return (struct commit *)obj;
+   return;
+   log->recno--;
}
-   return NULL;
+}
+
+static timestamp_t next_timestamp(struct commit_reflog *log)
+{
+   return log->reflogs->items[log->recno].timestamp;
 }
 
 struct commit *next_reflog_entry(struct reflog_walk_info *walk)
 {
-   for (; walk->cur < walk->nr; walk->cur++) {
-   struct commit *ret = next_entry_in_log(walk->logs[walk->cur]);
-   if (ret) {
-   walk->last_commit_reflog = walk->logs[walk->cur];
-   return ret;
-   }
+   struct commit_reflog *best = NULL;
+   size_t i;
+
+   for (i = 0; i < walk->nr; i++) {
+   struct commit_reflog *log = walk->logs[i];
+
+   skip_unusable_entries(log);
+   if (log->recno < 0)
+   continue;
+
+   if (!best || next_timestamp(log) > next_timestamp(best))
+   best = log;
}
+
+   if (best) {
+   walk->last_commit_reflog = best;
+   return lookup_commit(>reflogs->items[best->recno--].noid);
+   }
+
return NULL;
 }

The implementation is fairly naive. It would be O(nr_reflogs *
nr_entries) in the worst case. I wouldn't expect nr_reflogs to get
large. But if we cared, we could fix it by keeping the reflogs in a
heap.

-Peff

Re: [PATCH 2/6] t1414: document some reflog-walk oddities

2017-07-06 Thread Jeff King

On Wed, Jul 05, 2017 at 03:45:03PM -0700, Junio C Hamano wrote:

> > I did make the ordering intentional. We process each reflog sequentially
> > in turn. I don't think it would be wrong to interleave them by date, but
> > I actually think it's useful for somebody who switches that behavior to
> > consciously update the test. Maybe it's worth calling out in the commit
> > message.
> >
> > I stopped short of actually documenting and guaranteeing that behavior
> > to the user. I don't know how comfortable we would be changing it later.
> 
> I somehow feel that the "showing all entries from HEAD and then
> showing all from side" is simply a buggy behaviour.  I do not think
> our users would terribly mind if we changed it later.  But I may be
> missing the reason why (sometimes?) the sequential behaviour may be
> useful.

If we think it's buggy, we can fix it now. But I'm not convinced that
sequential iteration is that bad. It's at least _simple_ and easy to
explain. The only other thing that would make sense to me is sorting the
entries by date (reflog date, not commit date) and showing them in that
order. But that gives rise to some funny corner cases.

What do we do if there's clock skew within the reflog (e.g., somebody
fixed their skewed clock such that HEAD@{5} is less recent than
HEAD@{6}). Do we show them out of order then (even if only a single
reflog is being shown?).

Or do we try some complicated sort where entries compare sequentially
within a given reflog, but use date comparisons between different logs?
I don't think that can work with a normal comparison sort, as it makes
the comparison non-transitive.

But maybe we could do a pseudo-list-merge, where we treat the reflogs as
queues and always pick from the queue with the most recent timestamp.

That would probably be pretty easy to retrofit on the iteration from the
current series.

-Peff

40 matches

Mail list logo