git blame vs git log --follow performance

2014-01-26 Thread Joe Perches
Hi.

Is there something that can be done about improving
git log --follow -- file performance to be nearly
equivalent speed to git blame -- file ?

The overall cpu time taken for these 2 commands that
track individual file history can be quite different.

git log --follow -- file
and
git blame -- file

It seems that there can be a couple orders of magnitude
delta in the overall time taken.

For instance (using the Linus' linux kernel git):

$ time git log --follow -- drivers/firmware/google/Kconfig > /dev/null

real    0m42.329s
user    0m40.984s
sys     0m0.792s

$ time git blame -- drivers/firmware/google/Kconfig > /dev/null

real    0m0.963s
user    0m0.860s
sys     0m0.096s

This particular file has never been renamed.

Looking at the output on screen, there does seem to
be 25+ seconds of cpu time consumed after the initial
(last shown) commit that introduces this file.

Perhaps adding a whole-file rename option to the
git log history simplification mechanism could
help?

Thoughts?

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: git blame vs git log --follow performance

2014-01-26 Thread Duy Nguyen
On Mon, Jan 27, 2014 at 4:10 AM, Joe Perches j...@perches.com wrote:
> For instance (using the Linus' linux kernel git):
>
> $ time git log --follow -- drivers/firmware/google/Kconfig > /dev/null
>
> real    0m42.329s
> user    0m40.984s
> sys     0m0.792s
>
> $ time git blame -- drivers/firmware/google/Kconfig > /dev/null
>
> real    0m0.963s
> user    0m0.860s
> sys     0m0.096s


It's not fair to compare blame and log. If you compare, compare it to
non follow version

$ time git log --follow -- drivers/firmware/google/Kconfig > /dev/null

real    0m35.552s
user    0m35.120s
sys     0m0.383s

$ time git log -- drivers/firmware/google/Kconfig > /dev/null

real    0m4.366s
user    0m4.215s
sys     0m0.144s

That said, because we need to detect renames, we can't really filter to
one path, so the baseline is more like

$ time git log > /dev/null

real    0m29.338s
user    0m28.485s
sys     0m0.813s

with rename detection taking some more time on top of that.

> Perhaps adding a whole-file rename option to the
> git log history simplification mechanism could
> help?
>
> Thoughts?

I tested a version with rename detection logic removed. It did not
change the timing significantly. To improve --follow I think we need
to do something about path filtering.
-- 
Duy


Re: git blame vs git log --follow performance

2014-01-26 Thread Joe Perches
On Mon, 2014-01-27 at 08:33 +0700, Duy Nguyen wrote:
> On Mon, Jan 27, 2014 at 4:10 AM, Joe Perches j...@perches.com wrote:
> > For instance (using the Linus' linux kernel git):
> >
> > $ time git log --follow -- drivers/firmware/google/Kconfig > /dev/null
> >
> > real    0m42.329s
> > user    0m40.984s
> > sys     0m0.792s
> >
> > $ time git blame -- drivers/firmware/google/Kconfig > /dev/null
> >
> > real    0m0.963s
> > user    0m0.860s
> > sys     0m0.096s
>
> It's not fair to compare blame and log. If you compare, compare it to
> non follow version

Perhaps not, but git blame does follow renames.

$ git blame --help
[...]
The origin of lines is automatically followed across
whole-file renames (currently there is no option to
turn the rename-following off). To follow lines moved
from one file to another, or to follow lines that were
copied and pasted from another file, etc., see the -C
and -M options.

> I tested a version with rename detection logic removed. It did not
> change the timing significantly. To improve --follow I think we need
> to do something about path filtering.

Perhaps the log history could stop being read when
a commit is found that creates the file without
another file being deleted in the same commit.
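
The early-stop idea above can be sketched on a toy history. This is
only an illustration under stated assumptions, not how git log is
implemented: the Commit class, the follow() function and the rule of
taking the first deleted path as the rename source are all
hypothetical (real rename detection scores content similarity), and
merges and copies are ignored.

```python
from dataclasses import dataclass, field


@dataclass
class Commit:
    """Toy commit: paths added, deleted and modified relative to its parent."""
    name: str
    added: set = field(default_factory=set)
    deleted: set = field(default_factory=set)
    modified: set = field(default_factory=set)


def follow(history, path):
    """Walk `history` newest-first and collect commits touching `path`.

    Returns (names of touching commits, number of commits examined).
    The walk stops at a commit that adds `path` while deleting nothing,
    because such a commit cannot be a whole-file rename.
    """
    touched = []
    examined = 0
    for commit in history:
        examined += 1
        if path in commit.modified:
            touched.append(commit.name)
        elif path in commit.added:
            touched.append(commit.name)
            if not commit.deleted:
                break  # pure creation: no rename source, stop reading history
            # Toy assumption: the first deleted path is the rename source.
            path = sorted(commit.deleted)[0]
    return touched, examined


history = [
    Commit("c4", modified={"b.txt"}),
    Commit("c3", added={"b.txt"}, deleted={"a.txt"}),  # rename a.txt -> b.txt
    Commit("c2", modified={"a.txt"}),
    Commit("c1", added={"a.txt"}),                     # pure creation
    Commit("c0", added={"other.txt"}),                 # never examined
]
result = follow(history, "b.txt")
```

In this toy history the walk ends at c1 and never looks at c0, which
is where the time saving would come from on a long history.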




Re: git blame vs git log --follow performance

2014-01-26 Thread Duy Nguyen
On Mon, Jan 27, 2014 at 4:10 AM, Joe Perches j...@perches.com wrote:
> Is there something that can be done about improving
> git log --follow -- file performance to be nearly
> equivalent speed to git blame -- file ?

Not strictly about --follow, but there is room for improvement in how
log does its diffs in general. Right now we do diff HEAD HEAD~1, diff
HEAD~1 HEAD~2 and so on (--follow needs the diff to detect renames). At
each step we load new tree objects and reparse them. Notice that after
diff HEAD HEAD~1 we may already have HEAD~1 and its subtrees read and
parsed (not entirely). We could reuse that for diff HEAD~1 HEAD~2.

On git.git, git log --raw takes 10s and tree object reading seems to
account for about 2s. In the ideal case we might be able to cut that
to 1s. The tree parsing code (update_tree_entry) takes about 5s. We
might be able to cut that in half, I'm not entirely sure. But there
could be a lot of work in caching HEAD~1, and the overhead may turn out
to be too high for any gain.
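
A minimal sketch of the reuse idea, on a toy model rather than git's
tree-walking code: TreeStore, diff_trees() and log_raw() are
hypothetical names, trees are flat {path: blob id} dicts with no
subtrees, and parse_count only counts how often a tree is parsed so
the two strategies can be compared.

```python
class TreeStore:
    """Toy object store mapping a commit name to its tree, {path: blob_id}."""

    def __init__(self, raw_trees):
        self.raw_trees = raw_trees
        self.parse_count = 0  # how many times a tree was read and parsed

    def parse(self, commit):
        self.parse_count += 1
        return dict(self.raw_trees[commit])


def diff_trees(old, new):
    """Return the sorted paths whose blob differs between two parsed trees."""
    return sorted(p for p in old.keys() | new.keys() if old.get(p) != new.get(p))


def log_raw(store, commits, reuse=True):
    """Diff each commit against its parent, newest first.

    With reuse=True the parent tree parsed in one step is carried over
    as the child tree of the next step instead of being parsed again.
    """
    results = []
    carried = None  # parsed parent tree from the previous step
    for child, parent in zip(commits, commits[1:]):
        if reuse and carried is not None:
            child_tree = carried
        else:
            child_tree = store.parse(child)
        parent_tree = store.parse(parent)
        results.append((child, diff_trees(parent_tree, child_tree)))
        carried = parent_tree
    return results


raw = {
    "HEAD": {"a": 1, "b": 2},
    "HEAD~1": {"a": 1, "b": 3},
    "HEAD~2": {"a": 0, "b": 3},
}
commits = ["HEAD", "HEAD~1", "HEAD~2"]

cached = TreeStore(raw)
cached_result = log_raw(cached, commits, reuse=True)

uncached = TreeStore(raw)
uncached_result = log_raw(uncached, commits, reuse=False)
```

On N commits the toy parses N trees with reuse and 2(N-1) without,
which matches the hope of roughly halving the parsing cost. It says
nothing about the caching overhead, which is the open question.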
-- 
Duy