Re: [PATCH 0/3] lazily load commit->buffer

2013-01-26 Thread Junio C Hamano
Jeff King  writes:

> My HEAD has about 400/3000 non-unique commits, which matches your
> numbers percentage-wise. Dropping the lines above (and always freeing)
> takes my best-of-five for "git log -g" from 0.085s to 0.080s. Which is
> well within the noise.  Doing "git log -g Makefile" ended up at 0.183s
> both before and after.
>
> ... I'd be in favor of
> dropping the lines just to decrease complexity of the code.

I think we are in agreement, then.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/3] lazily load commit->buffer

2013-01-26 Thread Jeff King
On Sat, Jan 26, 2013 at 01:26:53PM -0800, Junio C Hamano wrote:

> This looks very good.
> 
> I wonder if this lets us get rid of the hack in cmd_log_walk() that
> does this:
> 
> while ((commit = get_revision(rev)) != NULL) {
> if (!log_tree_commit(rev, commit) &&
> rev->max_count >= 0)
> rev->max_count++;
> !   if (!rev->reflog_info) {
> !   /* we allow cycles in reflog ancestry */
> free(commit->buffer);
> commit->buffer = NULL;
> !   }
> free_commit_list(commit->parents);
> commit->parents = NULL;
> 
> After log_tree_commit() handles the commit, using the buffer, we
> discard the memory associated to it because we know we no longer
> will use it in normal cases.
> [...]
> But that is a performance thing, not a correctness issue, so "we
> allow cycles" implying "therefore if we discard the buffer, we will
> show wrong output" becomes an incorrect justification.

Right. I think the correctness issue goes away with my patches, and it
is just a question of estimating the workload for performance. I doubt
it makes a big difference either way, especially when compared to
actually showing the commit (even a single pathspec limiter, or doing
"-p", would likely dwarf a few extra commit decompressions).

My HEAD has about 400/3000 non-unique commits, which matches your
numbers percentage-wise. Dropping the lines above (and always freeing)
takes my best-of-five for "git log -g" from 0.085s to 0.080s. Which is
well within the noise.  Doing "git log -g Makefile" ended up at 0.183s
both before and after.

So I suspect it does not matter at all in normal cases, and the time is
indeed dwarfed by adding even a rudimentary pathspec. I'd be in favor of
dropping the lines just to decrease complexity of the code.

-Peff
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 0/3] lazily load commit->buffer

2013-01-26 Thread Junio C Hamano
Jeff King  writes:

> Yeah, agreed. I started to fix this up with a use/unuse pattern and
> realized something: all of the call sites are calling logmsg_reencode
> anyway, because that is the next logical step in doing anything with the
> buffer that is not just parsing out the parent/timestamp/tree info. And
> since that function already might allocate (for the re-encoded copy),
> callers have to handle the maybe-borrowed-maybe-free situation already.
>
> So I came up with this patch series, which I think should fix the
> problem, and actually makes the call-sites easier to read, rather than
> harder.
>
>   [1/3]: commit: drop useless xstrdup of commit message
>   [2/3]: logmsg_reencode: never return NULL
>   [3/3]: logmsg_reencode: lazily load missing commit buffers
>
> Here's the diffstat:
>
>  builtin/blame.c  | 22 ++---
>  builtin/commit.c | 14 +-
>  commit.h |  1 +
>  pretty.c | 93 ++-
>  t/t4042-diff-textconv-caching.sh |  8 +++
>  5 files changed, 85 insertions(+), 53 deletions(-)
>
> Not too bad, and 27 of the lines added in pretty.c are new comments
> explaining the flow of logmsg_reencode. So even if this doesn't get
> every case, I think it's a nice cleanup.

This looks very good.

I wonder if this lets us get rid of the hack in cmd_log_walk() that
does this:

while ((commit = get_revision(rev)) != NULL) {
if (!log_tree_commit(rev, commit) &&
rev->max_count >= 0)
rev->max_count++;
!   if (!rev->reflog_info) {
!   /* we allow cycles in reflog ancestry */
free(commit->buffer);
commit->buffer = NULL;
!   }
free_commit_list(commit->parents);
commit->parents = NULL;

After log_tree_commit() handles the commit, using the buffer, we
discard the memory associated to it because we know we no longer
will use it in normal cases.

The "do not do that if rev->reflog_info is true" was added in
a6c7306 (--walk-reflogs: do not crash with cyclic reflog ancestry,
2007-01-20) because the second and subsequent display of "commit"
(which happens to occur only when walking reflogs) needs to look at
commit->buffer again, and this hack forces us to retain the buffer
for _all_ commit objects.

But your patches could be seen as a different (and more correct) way
to fix the same issue.  Once the display side learns how to re-read
the log text of the commit object, the above becomes unnecessary, no?

We may still be helped if majority of commit objects that appear in
the reflog appear more than once, in which case retaining the buffer
for _all_ commits could be an overall win.  Not having to read the
buffer for the same commit each time it is shown for majority of
multiply-appearing commits, in exchange for having to keep the
buffer for commits that appears only once that are minority and
suffering increasted page cache pressure.  That could be seen as an
optimization.

But that is a performance thing, not a correctness issue, so "we
allow cycles" implying "therefore if we discard the buffer, we will
show wrong output" becomes an incorrect justification.

I happen to have HEAD reflog that is 30k entries long; more than 26k
represent a checkout of unique commit.  So I suspect that the above
hack to excessive retain commit->buffer for already used commits will
not help us performance-wise, either.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 0/3] lazily load commit->buffer

2013-01-26 Thread Jeff King
On Fri, Jan 25, 2013 at 07:36:18AM -0800, Junio C Hamano wrote:

> Jonathon Mah  writes:
> 
> > Just to note, the proposals so far don't prevent a "smart-ass"
> > function from freeing the buffer when it's called underneath the
> > use/release scope, as in:
> >
> > with_commit_buffer(commit); {
> > fn1_needing_buffer(commit);
> > walk_rev_tree_or_something();
> > fn2_needing_buffer(commit);
> > } done_with_commit_buffer(commit);
> 
> I think the goal of everybody discussing these ideas is to make sure
> that all code follows the simple ownership policy proposed at the
> beginning of this subthread: commit->buffer belongs to the revision
> traversal machinery, and other users could borrow it when available.

Yeah, agreed. I started to fix this up with a use/unuse pattern and
realized something: all of the call sites are calling logmsg_reencode
anyway, because that is the next logical step in doing anything with the
buffer that is not just parsing out the parent/timestamp/tree info. And
since that function already might allocate (for the re-encoded copy),
callers have to handle the maybe-borrowed-maybe-free situation already.

So I came up with this patch series, which I think should fix the
problem, and actually makes the call-sites easier to read, rather than
harder.

  [1/3]: commit: drop useless xstrdup of commit message
  [2/3]: logmsg_reencode: never return NULL
  [3/3]: logmsg_reencode: lazily load missing commit buffers

Here's the diffstat:

 builtin/blame.c  | 22 ++---
 builtin/commit.c | 14 +-
 commit.h |  1 +
 pretty.c | 93 ++-
 t/t4042-diff-textconv-caching.sh |  8 +++
 5 files changed, 85 insertions(+), 53 deletions(-)

Not too bad, and 27 of the lines added in pretty.c are new comments
explaining the flow of logmsg_reencode. So even if this doesn't get
every case, I think it's a nice cleanup.

-Peff
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html