Re: [RFC] Add posibility to preload stat information.

2013-03-30 Thread Phil Hord
On Thu, Mar 21, 2013 at 10:44 AM, Junio C Hamano gits...@pobox.com wrote:
 Thomas Rast tr...@student.ethz.ch writes:

 I think it would actually be a somewhat interesting feature if it
 interacted with GIT_PS1_SHOW*.  If you use these settings (I personally
 use SHOWDIRTYSTATE but not SHOWUNTRACKEDFILES), the prompt hangs while
 __git_ps1 runs git-status.  It should be possible to run a git-status
 process in the background when entering a repository, and displaying
 some marker ('??' maybe) in the prompt instead of the dirty-state info
 until git-status has finished.

 This is somewhat interesting.

 Perhaps we can introduce a helper binary that does what __git_ps1()
 does, with a --timeout=500ms option to say "I dunno (yet)", and keep
 priming the well in the background when it takes more than the
 specified amount of time?

That would be nice.  My fork-fu is weak, so I cheated and relied on
kill/timeout instead.

I have had this code below in my zsh git prompt (based on oh-my-zsh)
for more than a year.  It uses $(timeout) to kill the status command
if it does not complete in 1 second.  It's dumb in several ways, but
it does show me four different flags fairly reliably indicating
whether I have changed files, untracked files, clean workdir, or I
timed out trying to find out.

git_dirty_timeout () {
  #-- Modified files (tracked only; -uno hides untracked files)
  xx=$(timeout 1s git status -s -uno "$@" 2> /dev/null)
  test $? -eq 124 && return 124
  test -n "${xx}" && return 50

  #-- Untracked files (only): tracked files are known to be clean here
  xx=$(timeout 1s git status -s "$@" 2> /dev/null)
  test $? -eq 124 && return 124
  test -n "${xx}" && return 51
  return 0
}

parse_git_dirty () {
  git_dirty_timeout
  case $? in
    '50')  echo "$ZSH_THEME_GIT_PROMPT_DIRTY"     ;;
    '51')  echo "$ZSH_THEME_GIT_PROMPT_UNTRACKED" ;;
    '124') echo "$ZSH_THEME_GIT_PROMPT_TIMEOUT"   ;;
    *)     echo "[$?]$ZSH_THEME_GIT_PROMPT_CLEAN" ;;
  esac
}
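
A possible refinement, sketched under assumptions: the two git status
calls could probably be folded into one by reading `--porcelain` output,
where untracked entries start with `??`. The names `classify_porcelain`
and `git_dirty_once` are hypothetical; the return codes 50, 51 and 124
mirror the ones used above, and `timeout` is assumed to be the GNU
coreutils command, which exits with 124 when the limit is hit.

```shell
# Hypothetical helper: read `git status --porcelain` output on stdin.
# Returns 50 if any tracked file changed, 51 if only untracked files
# exist, and 0 if the input lists nothing.
classify_porcelain () {
  __cp_tracked=0
  __cp_untracked=0
  while IFS= read -r __cp_line; do
    case $__cp_line in
      '??'*) __cp_untracked=1 ;;   # untracked entry
      ?*)    __cp_tracked=1 ;;     # any other non-empty line
    esac
  done
  [ "$__cp_tracked" -eq 1 ] && return 50
  [ "$__cp_untracked" -eq 1 ] && return 51
  return 0
}

# Hypothetical wrapper: one status call, killed after one second.
git_dirty_once () {
  __gd_out=$(timeout 1s git status --porcelain "$@" 2> /dev/null)
  [ $? -eq 124 ] && return 124
  printf '%s\n' "$__gd_out" | classify_porcelain
}
```

parse_git_dirty could then call git_dirty_once in place of
git_dirty_timeout; the case statement would not need to change.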
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [RFC] Add posibility to preload stat information.

2013-03-21 Thread Thomas Rast
Jeff King p...@peff.net writes:

 On Wed, Mar 20, 2013 at 10:15:39AM -0700, Junio C Hamano wrote:

 Jeff King p...@peff.net writes:
 
  So maybe just run "git status >/dev/null"?
 
 In the background?  How often would it run?  I do not think a single
 lockfile solves anything.  It may prevent simultaneous runs of two
 such "prime the well" processes, but the same user may be working in
 two separate repositories.

 Yes, in the background (he invokes __git_recursive_stat already in the
 background). I'd think you would want to run it whenever you enter a
 repository.

 I do not see anything that prevents it from running in the same
 repository over and over again, either.  prompt is a bad place to
 do this kind of thing.

 Yeah, I did not look closely at that. The commit message claims "When
 entering a git working dir", but the implementation runs it on each
 prompt invocation, which is awful. I think you'd want to use
 rev-parse to check whether you have changed into a new git repo, and
 only run it once then.

I think it would actually be a somewhat interesting feature if it
interacted with GIT_PS1_SHOW*.  If you use these settings (I personally
use SHOWDIRTYSTATE but not SHOWUNTRACKEDFILES), the prompt hangs while
__git_ps1 runs git-status.  It should be possible to run a git-status
process in the background when entering a repository, and displaying
some marker ('??' maybe) in the prompt instead of the dirty-state info
until git-status has finished.  That way the user doesn't have his shell
blocked by cding to a big repo.

-- 
Thomas Rast
trast@{inf,student}.ethz.ch


Re: [RFC] Add posibility to preload stat information.

2013-03-21 Thread Junio C Hamano
Thomas Rast tr...@student.ethz.ch writes:

 I think it would actually be a somewhat interesting feature if it
 interacted with GIT_PS1_SHOW*.  If you use these settings (I personally
 use SHOWDIRTYSTATE but not SHOWUNTRACKEDFILES), the prompt hangs while
 __git_ps1 runs git-status.  It should be possible to run a git-status
 process in the background when entering a repository, and displaying
 some marker ('??' maybe) in the prompt instead of the dirty-state info
 until git-status has finished.

This is somewhat interesting.

Perhaps we can introduce a helper binary that does what __git_ps1()
does, with a --timeout=500ms option to say "I dunno (yet)", and keep
priming the well in the background when it takes more than the
specified amount of time?


[RFC] Add posibility to preload stat information.

2013-03-20 Thread Fredrik Gustafsson
When entering a git working dir, optionally run a forked process that
stats all files in the whole workdir and thereby loads stat information
into RAM, which will speed up things like git status and so on.

The feature is optional and by default it's off.

Signed-off-by: Fredrik Gustafsson iv...@iveqy.com
---
 contrib/completion/git-prompt.sh | 23 +++
 1 file changed, 23 insertions(+)

diff --git a/contrib/completion/git-prompt.sh b/contrib/completion/git-prompt.sh
index 341422a..e67bc97 100644
--- a/contrib/completion/git-prompt.sh
+++ b/contrib/completion/git-prompt.sh
@@ -78,6 +78,12 @@
 # If you would like a colored hint about the current dirty state, set
 # GIT_PS1_SHOWCOLORHINTS to a nonempty value. The colors are based on
 # the colored output of git status -sb.
+#
+# When entering a git work dir you can have a forked process run
+# stat on all files from the top-level directory of the repo down.
+# This will speed up later calls to git status and the like because
+# stat info will already be loaded into RAM. Set GIT_PRE_STAT to
+# a nonempty value to enable this.
 
 # __gitdir accepts 0 or 1 arguments (i.e., location)
 # returns location of .git repo
@@ -222,6 +228,19 @@ __git_ps1_show_upstream ()
 
 }
 
+# Forks and recursively runs stat from the top-level git dir.
+# This will load inodes into RAM for faster access when running
+# a git command, like git show.
+__git_recursive_stat ()
+{
+	if test ! -e /tmp/gitbash.lock
+	then
+		touch /tmp/gitbash.lock
+		cd "$(git rev-parse --show-toplevel)"
+		find . | xargs stat 2> /dev/null
+		rm /tmp/gitbash.lock
+	fi
+}
 
 # __git_ps1 accepts 0 or 1 arguments (i.e., format string)
 # when called from PS1 using command substitution
@@ -320,6 +339,10 @@ __git_ps1 ()
 				b="GIT_DIR!"
 			fi
 		elif [ "true" = "$(git rev-parse --is-inside-work-tree 2>/dev/null)" ]; then
+			if [ -n "${GIT_PRE_STAT-}" ];
+			then
+				(__git_recursive_stat 2> /dev/null &)
+			fi
 			if [ -n "${GIT_PS1_SHOWDIRTYSTATE-}" ] &&
 			   [ "$(git config --bool bash.showDirtyState)" != "false" ]
 			then
-- 
1.8.1.5



Re: [RFC] Add posibility to preload stat information.

2013-03-20 Thread Jeff King
On Wed, Mar 20, 2013 at 01:15:32PM +0100, Fredrik Gustafsson wrote:

 When entering a git working dir, optionally run a forked process that
 stats all files in the whole workdir and thereby loads stat information
 into RAM, which will speed up things like git status and so on.
 
 The feature is optional and by default it's off.

Kind of gross, but I guess it is useful to some people.

 +__git_recursive_stat ()
 +{
 + if test ! -e /tmp/gitbash.lock
 + then
 + touch /tmp/gitbash.lock

This is a tmp-race security hole. E.g., do:

  ln -s /etc/nologin /tmp/gitbash.lock

as a user; when root runs __git_recursive_stat, it will create
/etc/nologin. It's not quite as bad as some other holes, because we only
touch the file, not overwrite its contents, but you can see that it's
possible to do some mischief.

Should this maybe just be ~/.gitbash.lock or something?
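
As a minimal sketch of that idea, assuming a lock directory under $HOME
rather than a file in /tmp: mkdir either creates the directory or fails,
so it does not follow a symlink planted at the path and only one caller
can take the lock. The helper name with_user_lock, the GIT_PRIME_LOCKDIR
override, the default path and the exit code 75 are all hypothetical and
not part of the patch.

```shell
# Hypothetical helper: run a command while holding a per-user lock.
# usage: with_user_lock COMMAND [ARGS...]
with_user_lock () {
  __wl_dir=${GIT_PRIME_LOCKDIR:-$HOME/.gitbash.lock.d}
  # mkdir fails if anything (including a symlink) already exists at
  # the path, so the check and the creation happen in a single step.
  if mkdir "$__wl_dir" 2> /dev/null; then
    "$@"
    __wl_rc=$?
    rmdir "$__wl_dir"
    return "$__wl_rc"
  fi
  return 75   # lock already held (arbitrary code chosen for this sketch)
}
```

A lock left behind by a killed shell would have to be removed by hand;
the sketch does not try to detect stale locks.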

 + cd "$(git rev-parse --show-toplevel)"
 + find . | xargs stat 2> /dev/null

The stat utility is not portable. But why not use git to do the
reading? Then you can get the benefit of core.preloadindex, and you will
not recurse into untracked directories that are ignored (i.e.,
ones that git would not go into anyway).

So maybe just run "git status >/dev/null"?

-Peff


Re: [RFC] Add posibility to preload stat information.

2013-03-20 Thread Junio C Hamano
Jeff King p...@peff.net writes:

 So maybe just run "git status >/dev/null"?

In the background?  How often would it run?  I do not think a single
lockfile solves anything.  It may prevent simultaneous runs of two
such "prime the well" processes, but the same user may be working in
two separate repositories.

I do not see anything that prevents it from running in the same
repository over and over again, either.  prompt is a bad place to
do this kind of thing.




Re: [RFC] Add posibility to preload stat information.

2013-03-20 Thread Fredrik Gustafsson
On Wed, Mar 20, 2013 at 10:45:22PM +0530, Ramkumar Ramachandra wrote:
 Fredrik Gustafsson wrote:
  When entering a git working dir, optionally run a forked process that
  stats all files in the whole workdir and thereby loads stat information
  into RAM, which will speed up things like git status and so on.
 
 This is misleading.  You just execute the equivalent of `git status`
 every time I request a prompt inside a git working directory.  And this
 is only if I'm using __git_ps1() to augment my prompt, which I'm not; I
 use ZSH's vcs_info, which is arguably better.  Also, you forgot to say
 how to turn on the feature.

The place of invocation is questionable (Junio also had some thoughts about
that). I can't find vcs_info in contrib/completion/. Do you have
any suggestion about where the best place is to invoke this kind of thing?

I added documentation about how to turn the feature on, in the same way
the other features are documented. (Is there another/better way I
should do this?)

 
 That said, this feature is extremely gross; it thrashes my filesystem
 and hard drive.  Modern software is written to minimize IO, not
 maximize it!  I'm completely against the inclusion of this patch.

It's extremely gross. I don't like this, _but_ it does speed up my work.
I'm unsure if it should be included in git though (hence the RFC-tag).

 
 However, I would not mind a feature that runs `git status` the very
 first time I enter a git working directory: when I enter my clone of
 linux.git, it takes my first `git status` invocation a good ten
 seconds to complete, and we can fix this pretty easily.

That's the problem I am trying to solve. However, the first time is
irrelevant. We will run git status a bit before we need it. If we enter
linux.git, do other work (in another project) for an hour and go back
to linux.git, our cache will probably be empty. We will need to run this
more often than just the first time. But still, we don't want it to run
too often (which it does now).

-- 
Med vänliga hälsningar
Fredrik Gustafsson

tel: 0733-608274
e-post: iv...@iveqy.com


Re: [RFC] Add posibility to preload stat information.

2013-03-20 Thread Jeff King
On Wed, Mar 20, 2013 at 10:15:39AM -0700, Junio C Hamano wrote:

 Jeff King p...@peff.net writes:
 
  So maybe just run "git status >/dev/null"?
 
 In the background?  How often would it run?  I do not think a single
 lockfile solves anything.  It may prevent simultaneous runs of two
 such "prime the well" processes, but the same user may be working in
 two separate repositories.

Yes, in the background (he invokes __git_recursive_stat already in the
background). I'd think you would want to run it whenever you enter a
repository.

 I do not see anything that prevents it from running in the same
 repository over and over again, either.  prompt is a bad place to
 do this kind of thing.

Yeah, I did not look closely at that. The commit message claims "When
entering a git working dir", but the implementation runs it on each
prompt invocation, which is awful. I think you'd want to use
rev-parse to check whether you have changed into a new git repo, and
only run it once then.

Which is still gross, and I have no interest whatsoever in this feature.
I was just trying to be positive and constructive to the original
submission. :)

-Peff


Re: [RFC] Add posibility to preload stat information.

2013-03-20 Thread Ramkumar Ramachandra
Fredrik Gustafsson wrote:
 On Wed, Mar 20, 2013 at 10:45:22PM +0530, Ramkumar Ramachandra wrote:
 Fredrik Gustafsson wrote:
  When entering a git working dir, optionally run a forked process that
  stats all files in the whole workdir and thereby loads stat information
  into RAM, which will speed up things like git status and so on.

 This is misleading.  You just execute the equivalent of `git status`
 every time I request a prompt inside a git working directory.  And this
 is only if I'm using __git_ps1() to augment my prompt, which I'm not; I
 use ZSH's vcs_info, which is arguably better.  Also, you forgot to say
 how to turn on the feature.

 The place of invocation is questionable (Junio also had some thoughts about
 that). I can't find vcs_info in contrib/completion/. Do you have
 any suggestion about where the best place is to invoke this kind of thing?

I think it should be a separate script in contrib/ that people can
just `eval` in their shell configs; zsh has a chpwd() function for
example, which seems like the right place to put such a thing.

 I added documentation about how to turn the feature on, in the same way
 the other features are documented. (Is there another/better way I
 should do this?)

No, I meant in the commit message.

 That said, this feature is extremely gross; it thrashes my filesystem
 and hard drive.  Modern software is written to minimize IO, not
 maximize it!  I'm completely against the inclusion of this patch.

 It's extremely gross. I don't like this, _but_ it does speed up my work.
 I'm unsure if it should be included in git though (hence the RFC-tag).

Yes, I would certainly like my git startup time to be improved.  But I
don't want to trade my hard drive's life for it.

 However, I would not mind a feature that runs `git status` the very
 first time I enter a git working directory: when I enter my clone of
 linux.git, it takes my first `git status` invocation a good ten
 seconds to complete, and we can fix this pretty easily.

 That's the problem I am trying to solve. However, the first time is
 irrelevant. We will run git status a bit before we need it. If we enter
 linux.git, do other work (in another project) for an hour and go back
 to linux.git, our cache will probably be empty. We will need to run this
 more often than just the first time. But still, we don't want it to run
 too often (which it does now).

What I meant by "first time" is chpwd() into the git repository, not
further chpwd()s when already inside the git repository.


Re: [RFC] Add posibility to preload stat information.

2013-03-20 Thread Fredrik Gustafsson
On Wed, Mar 20, 2013 at 11:19:38PM +0530, Ramkumar Ramachandra wrote:
 I think it should be a separate script in contrib/ that people can
 just `eval` in their shell configs; zsh has a chpwd() function for
 example, which seems like the right place to put such a thing.

I was trying to reduce the number of calls to git rev-parse
--is-inside-work-tree. But maybe that is too fast to care about.

 No, I meant in the commit message.

Okay, thanks.

  That said, this feature is extremely gross; it thrashes my filesystem
  and hard drive.  Modern software is written to minimize IO, not
  maximize it!  I'm completely against the inclusion of this patch.
 
  It's extremely gross. I don't like this, _but_ it does speed up my work.
  I'm unsure if it should be included in git though (hence the RFC-tag).
 
 Yes, I would certainly like my git startup time to be improved.  But I
 don't want to trade my hard drive's life for it.

Does this really increase disk reads? The fs-cache should make sure that
the number of disk reads is almost the same; we only do them earlier
than we usually would.

 What I meant by "first time" is chpwd() into the git repository, not
 further chpwd()s when already inside the git repository.

That's a good point. I'm not sure how to solve that though, because you
do not always enter a repository at its root directory first.

The only way I see to do this is with a lock file that is kept around, so
that we only run git status every 5 minutes when doing something inside a
work dir. That would add a lot of metadata (the lock files) to store. (I
hope I explained that successfully.)

-- 
Med vänliga hälsningar
Fredrik Gustafsson

tel: 0733-608274
e-post: iv...@iveqy.com


Re: [RFC] Add posibility to preload stat information.

2013-03-20 Thread Fredrik Gustafsson
On Wed, Mar 20, 2013 at 12:48:06PM -0400, Jeff King wrote:
 Kind of gross, but I guess it is useful to some people.

Yes it is. The question is whether it's gross enough to never
leave my computer, or whether someone else can find this useful.

 
  +__git_recursive_stat ()
  +{
  +   if test ! -e /tmp/gitbash.lock
  +   then
  +   touch /tmp/gitbash.lock
 
 This is a tmp-race security hole. E.g., do:
 
   ln -s /etc/nologin /tmp/gitbash.lock
 
 as a user; when root runs __git_recursive_stat, it will create
 /etc/nologin. It's not quite as bad as some other holes, because we only
 touch the file, not overwrite its contents, but you can see that it's
 possible to do some mischief.
 
 Should this maybe just be ~/.gitbash.lock or something?

Thank you! I totally missed that.

I guess a new solution would be to keep an access time-stamp in each
repository and run git status on that repository at a certain interval.
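
A rough sketch of such a time-stamp check, under assumptions: the stamp
lives in a file inside the repository's .git directory, and `date +%s`
(supported by GNU and BSD date) gives seconds since the epoch. The
function name prime_if_stale and the file name prime-stamp are
hypothetical.

```shell
# Hypothetical sketch: run a command for a repository at most once per
# interval.
# usage: prime_if_stale GIT_DIR INTERVAL_SECONDS COMMAND [ARGS...]
prime_if_stale () {
  __ps_stamp=$1/prime-stamp   # assumed stamp file name
  __ps_interval=$2
  shift 2
  __ps_now=$(date +%s)
  __ps_last=0
  [ -r "$__ps_stamp" ] && __ps_last=$(cat "$__ps_stamp")
  # Treat an empty or non-numeric stamp as "never primed".
  case $__ps_last in
    ''|*[!0-9]*) __ps_last=0 ;;
  esac
  if [ $((__ps_now - __ps_last)) -ge "$__ps_interval" ]; then
    printf '%s\n' "$__ps_now" > "$__ps_stamp"
    "$@" > /dev/null 2>&1
    return 0                  # the command was run
  fi
  return 1                    # primed recently; nothing was run
}
```

From a prompt hook this might be called as
`prime_if_stale "$(git rev-parse --git-dir)" 300 git status &`, so that
the repository is primed at most every five minutes without blocking
the shell.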

-- 
Med vänliga hälsningar
Fredrik Gustafsson

tel: 0733-608274
e-post: iv...@iveqy.com


Re: [RFC] Add posibility to preload stat information.

2013-03-20 Thread Jeff King
On Wed, Mar 20, 2013 at 07:36:41PM +0100, Fredrik Gustafsson wrote:

  Yes, I would certainly like my git startup time to be improved.  But I
  don't want to trade my hard drive's life for it.
 
 Does this really increase disk reads? The fs-cache should make sure that
 the number of disk reads is almost the same; we only do them earlier
 than we usually would.

It shouldn't. But if you are running stat on every file in the repo
for each prompt, that is going to take measurable CPU time for large
repos (e.g., WebKit).

  What I meant by "first time" is chpwd() into the git repository, not
  further chpwd()s when already inside the git repository.
 
 That's a good point. I'm not sure how to solve that though, because you
 do not always enter a repository at its root directory first.
 
 The only way I see to do this is with a lock file that is kept around, so
 that we only run git status every 5 minutes when doing something inside a
 work dir. That would add a lot of metadata (the lock files) to store. (I
 hope I explained that successfully.)

How about something like:

  __git_primed_toplevel=
  __git_prime_dir() {
          # top-level directory of the current repo, empty outside one
          local toplevel=`git rev-parse --show-toplevel 2>/dev/null`
          if test -n "$toplevel" &&
             test "$toplevel" != "$__git_primed_toplevel"; then
                  git status >/dev/null 2>&1
                  __git_primed_toplevel=$toplevel
          fi
  }

that would prime the whole repo the first time you enter it, but otherwise do
nothing (and you could run it from each prompt). If you switched back and forth
between two repos a lot, you would end up priming them both a lot, but that is
not that common a mode of operation (and you could keep a list of recently
primed repos instead of a single one, if you really wanted to deal with that).
You would also prime twice if you used two different terminals, but that's OK.
The subsequent ones are much faster due to disk cache, so this is really just
about not paying the extra stat penalty on _every_ prompt.
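
For the list variant mentioned above, a minimal sketch might look like
the following. It assumes the list is kept as a newline-separated shell
variable, so paths containing newlines are not handled. The names
__git_primed_list, __git_primed_max, __git_primed_seen,
__git_primed_remember and __git_prime_dir_recent are hypothetical.

```shell
# Hypothetical sketch of the "list of recently primed repos" variant.
__git_primed_list=      # newline-separated top-level paths, newest first
__git_primed_max=5      # assumed cap on how many repos to remember

# Succeed if path $1 is already in the list (exact whole-line match).
__git_primed_seen () {
  printf '%s\n' "$__git_primed_list" | grep -Fxq -e "$1"
}

# Put path $1 at the front of the list and trim the list to the cap.
__git_primed_remember () {
  __git_primed_list=$(printf '%s\n%s\n' "$1" "$__git_primed_list" |
    head -n "$__git_primed_max")
}

# Prime the current repository unless it was primed recently.
__git_prime_dir_recent () {
  __gp_top=$(git rev-parse --show-toplevel 2>/dev/null)
  if test -n "$__gp_top" && ! __git_primed_seen "$__gp_top"; then
    git status >/dev/null 2>&1
    __git_primed_remember "$__gp_top"
  fi
}
```

The function has to run in the interactive shell itself (for example
from PROMPT_COMMAND in bash or precmd in zsh) rather than inside a
command substitution, otherwise the list is lost when the subshell
exits.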

-Peff