Re: [PATCH 09/17] gc_boundary(): move the check alloc = nr to caller

2013-05-23 Thread Michael Haggerty
On 05/21/2013 07:49 PM, Junio C Hamano wrote:
 Michael Haggerty mhag...@alum.mit.edu writes:
 
 There is no logical reason for this test to be here.  At the caller we
 might be able to figure out its meaning.

 Signed-off-by: Michael Haggerty mhag...@alum.mit.edu
 ---
 
 I do not think this change is justified, *unless* the caller later
 in the series gains a better heuristics than what can be done with
 information in the object_array (namely, alloc and nr) to decide
 when to trigger gc.
 
 And I was hoping to see such a cleverness added to the caller, but I
 do not think I saw any.

Correct.

 I would have to say gc_boundary() knows better when it needs to gc
 with the code at this point in the series, and that is true also in
 the final code after all the patches in this series.
 
 If we keep the when to gc logic inside gc, in 11/17 this caller
 can no longer call directly to object_array_filter().  It should
 call gc_boundary(), but I see it as a merit, not a downside.  The
 gc function can later be taught the high/low watermark logic you
 alluded to in 10/17, and the growth/shrinkage characteristic you
 would take advantage of while doing gc is specific to this
 codepath.  And the logic still does not have to have access to
 anything only the caller has access to; gc can work on what can be
 read from the object_array-{alloc,nr} that is given to it.

I don't feel strongly about this patch and if you prefer to have it
dropped I will do so.  But let me explain my reasoning:

1. The function name gc_boundary() suggests that it will do a garbage
collection unconditionally.  In fact, it might or might not depending on
this test.  At the caller, this made it look like a gc was happening
each time through the loop, which made me nervous about the performance.
 The new version makes it clear at the caller that the gc is only
happening occasionally.

2. Even assuming that gc_boundary() were renamed to maybe_gc_boundary(),
the function has hopelessly little information on which to base its
decision whether or not to gc, and the choice of policy can only be
justified based on some implicit knowledge about how the array is grown
and shrunk.  But the growing/shrinking is done at the layer of the
caller, and therefore the choice of gc policy also belongs at the layer
of the caller.

3. As you point out, if the gc policy is ever to be made more
intelligent, then gc_boundary() is unlikely to have enough information
to implement the new policy (e.g., it would have no place to record
low/high water marks).  Separating concerns at the correct level would
make a change like that easier.

At the moment I am not interested in improving the gc policy of this
code.  The only reason that I am mucking about here is to change it to
use object_array_filter(), which is needed to centralize where
object_array_entries are created and destroyed so that the name memory
can be copied and freed consistently.  That can be done with or without
patches 09 and 10.  Please let me know what you prefer.

Michael

   revision.c | 27 ---
 
  1 file changed, 12 insertions(+), 15 deletions(-)

 diff --git a/revision.c b/revision.c
 index 8ac88d6..2e0992b 100644
 --- a/revision.c
 +++ b/revision.c
 @@ -2437,23 +2437,19 @@ static struct commit *get_revision_1(struct rev_info 
 *revs)
  
  static void gc_boundary(struct object_array *array)
  {
 -unsigned nr = array-nr;
 -unsigned alloc = array-alloc;
 +unsigned nr = array-nr, i, j;
  struct object_array_entry *objects = array-objects;
  
 -if (alloc = nr) {
 -unsigned i, j;
 -for (i = j = 0; i  nr; i++) {
 -if (objects[i].item-flags  SHOWN)
 -continue;
 -if (i != j)
 -objects[j] = objects[i];
 -j++;
 -}
 -for (i = j; i  nr; i++)
 -objects[i].item = NULL;
 -array-nr = j;
 +for (i = j = 0; i  nr; i++) {
 +if (objects[i].item-flags  SHOWN)
 +continue;
 +if (i != j)
 +objects[j] = objects[i];
 +j++;
  }
 +for (i = j; i  nr; i++)
 +objects[i].item = NULL;
 +array-nr = j;
  }
  
  static void create_boundary_commit_list(struct rev_info *revs)
 @@ -2577,7 +2573,8 @@ static struct commit *get_revision_internal(struct 
 rev_info *revs)
  if (p-flags  (CHILD_SHOWN | SHOWN))
  continue;
  p-flags |= CHILD_SHOWN;
 -gc_boundary(revs-boundary_commits);
 +if (revs-boundary_commits.alloc = revs-boundary_commits.nr)
 +gc_boundary(revs-boundary_commits);
  add_object_array(p, NULL, revs-boundary_commits);
  }

-- 
Michael Haggerty
mhag...@alum.mit.edu
http://softwareswirl.blogspot.com/
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to 

Re: [PATCH 09/17] gc_boundary(): move the check alloc = nr to caller

2013-05-23 Thread Junio C Hamano
Michael Haggerty mhag...@alum.mit.edu writes:

 1. The function name gc_boundary() suggests that it will do a garbage
 collection unconditionally.  In fact, it might or might not depending on
 this test.  At the caller, this made it look like a gc was happening
 each time through the loop, which made me nervous about the performance.
  The new version makes it clear at the caller that the gc is only
 happening occasionally.

Perhaps.

 2. Even assuming that gc_boundary() were renamed to maybe_gc_boundary(),
 the function has hopelessly little information on which to base its
 decision whether or not to gc, and the choice of policy can only be
 justified based on some implicit knowledge about how the array is grown
 and shrunk.  But the growing/shrinking is done at the layer of the
 caller, and therefore the choice of gc policy also belongs at the layer
 of the caller.

 3. As you point out, if the gc policy is ever to be made more
 intelligent, then gc_boundary() is unlikely to have enough information
 to implement the new policy (e.g., it would have no place to record
 low/high water marks).  Separating concerns at the correct level would
 make a change like that easier.

These two depend on how you look at the API hierarchy.  You seem to
think that the ideal end result is

get_revision_internal()
  have an open coded when to gc logic in this function
  call object_array_filter()

My suggestion was based on a different view, which is:

get_revision_internal()
  call gc_boundary()

gc_boundary()
  make decision on when and how to gc
  if decided to do so
call object_array_fitler()

You can obviously rename gc_boundary() to auto_gc_boundary() if that
makes it easier to understand, but these two belong to the same
codepath that deals with the object array used specifically for
keeping track of boundary commits. I view who has what information
as secondary---if the decision to gc is not the primary thing
get_revision_internal() should be worrying about (and I do not think
it is), it would be a better code structure to have a helper specific
for doing so, i.e. gc_boundary(), and delegate that part of job to it.

Obviously, the caller needs to supply sufficient information to that
helper if the helper needs more than what it currently gets by
adding parameters to it, but that goes without saying.

 At the moment I am not interested in improving the gc policy of this
 code.

I am not either; the only effect we get from removing gc_boudnary()
and calling directly to object_array_filter() is to lose the above
abstraction and make it easy to let get_revision_internal() become
more cluttered when somebody else later decides to improve the gc
policy, it seems.

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 09/17] gc_boundary(): move the check alloc = nr to caller

2013-05-21 Thread Junio C Hamano
Michael Haggerty mhag...@alum.mit.edu writes:

 There is no logical reason for this test to be here.  At the caller we
 might be able to figure out its meaning.

 Signed-off-by: Michael Haggerty mhag...@alum.mit.edu
 ---

I do not think this change is justified, *unless* the caller later
in the series gains a better heuristics than what can be done with
information in the object_array (namely, alloc and nr) to decide
when to trigger gc.

And I was hoping to see such a cleverness added to the caller, but I
do not think I saw any.

I would have to say gc_boundary() knows better when it needs to gc
with the code at this point in the series, and that is true also in
the final code after all the patches in this series.

If we keep the when to gc logic inside gc, in 11/17 this caller
can no longer call directly to object_array_filter().  It should
call gc_boundary(), but I see it as a merit, not a downside.  The
gc function can later be taught the high/low watermark logic you
alluded to in 10/17, and the growth/shrinkage characteristic you
would take advantage of while doing gc is specific to this
codepath.  And the logic still does not have to have access to
anything only the caller has access to; gc can work on what can be
read from the object_array-{alloc,nr} that is given to it.

  revision.c | 27 ---

  1 file changed, 12 insertions(+), 15 deletions(-)

 diff --git a/revision.c b/revision.c
 index 8ac88d6..2e0992b 100644
 --- a/revision.c
 +++ b/revision.c
 @@ -2437,23 +2437,19 @@ static struct commit *get_revision_1(struct rev_info 
 *revs)
  
  static void gc_boundary(struct object_array *array)
  {
 - unsigned nr = array-nr;
 - unsigned alloc = array-alloc;
 + unsigned nr = array-nr, i, j;
   struct object_array_entry *objects = array-objects;
  
 - if (alloc = nr) {
 - unsigned i, j;
 - for (i = j = 0; i  nr; i++) {
 - if (objects[i].item-flags  SHOWN)
 - continue;
 - if (i != j)
 - objects[j] = objects[i];
 - j++;
 - }
 - for (i = j; i  nr; i++)
 - objects[i].item = NULL;
 - array-nr = j;
 + for (i = j = 0; i  nr; i++) {
 + if (objects[i].item-flags  SHOWN)
 + continue;
 + if (i != j)
 + objects[j] = objects[i];
 + j++;
   }
 + for (i = j; i  nr; i++)
 + objects[i].item = NULL;
 + array-nr = j;
  }
  
  static void create_boundary_commit_list(struct rev_info *revs)
 @@ -2577,7 +2573,8 @@ static struct commit *get_revision_internal(struct 
 rev_info *revs)
   if (p-flags  (CHILD_SHOWN | SHOWN))
   continue;
   p-flags |= CHILD_SHOWN;
 - gc_boundary(revs-boundary_commits);
 + if (revs-boundary_commits.alloc = revs-boundary_commits.nr)
 + gc_boundary(revs-boundary_commits);
   add_object_array(p, NULL, revs-boundary_commits);
   }
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html