Re: [PATCH v2] dir.c: ignore paths containing .git when invalidating untracked cache

2018-02-14 Thread Junio C Hamano
On Tue, Feb 13, 2018 at 5:24 PM, Duy Nguyen  wrote:
> I am worried that always doing the right thing may carry performance
> penalty (this is based purely on reading verify_path() code, no actual
> benchmarking). For safety, you can always set safe_path to zero. But
> if you do a lot of invalidation and something starts to slow down,
> then you can consider setting safe_path to 1 (if it's actually safe to
> do so).

Fair enough. Thanks for articulating the reasoning.


Re: [PATCH v2] dir.c: ignore paths containing .git when invalidating untracked cache

2018-02-13 Thread Duy Nguyen
On Wed, Feb 14, 2018 at 12:57 AM, Junio C Hamano  wrote:
> Duy Nguyen  writes:
>
>> It's very tempting considering that the amount of changes is much
>> smaller. But I think we should go with my version. The hope is when a
>> _new_ call site appears, the author would think twice before passing
>> zero or one to the safe_path argument.
>
> Wouldn't it be a better API if the author of new callsite does not
> have to think twice and can instead rely on the called function
> untracked_cache_invalidate_path() to always do the right thing?

I am worried that always doing the right thing may carry performance
penalty (this is based purely on reading verify_path() code, no actual
benchmarking). For safety, you can always set safe_path to zero. But
if you do a lot of invalidation and something starts to slow down,
then you can consider setting safe_path to 1 (if it's actually safe to
do so). I think we do mass invalidation in some case, so I will try to
actually benchmark that and see if this safe_path argument is
justified or if we can always call verify_path().
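
To make the trade-off concrete, here is a minimal sketch of the safe_path knob described above. It is not the code from the patch: has_dot_git_component(), invalidate_path() and invalidation_count are hypothetical stand-ins, and the check is a simplified substitute for verify_path(), which does more than this.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the expensive check; not git's verify_path(). */
static int has_dot_git_component(const char *path)
{
	while (*path) {
		size_t len = strcspn(path, "/");
		if (len == 4 && !memcmp(path, ".git", 4))
			return 1;
		/* step over the component and one separator, if present */
		path += len + (path[len] == '/');
	}
	return 0;
}

/* Hypothetical counter standing in for the real cache invalidation. */
static int invalidation_count;

/*
 * safe_path == 0: the caller cannot vouch for the path, so pay for the check.
 * safe_path == 1: the caller promises the path has no ".git" component.
 */
static void invalidate_path(const char *path, int safe_path)
{
	assert(path != NULL);
	if (!safe_path && has_dot_git_component(path))
		return; /* invisible to directory scanning, nothing to do */
	invalidation_count++;
}
```

With this shape, a caller that passes 0 is always correct but pays for the scan on every call; passing 1 skips the scan and is only correct if the path really cannot contain ".git".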
-- 
Duy


Re: [PATCH v2] dir.c: ignore paths containing .git when invalidating untracked cache

2018-02-13 Thread Junio C Hamano
Duy Nguyen  writes:

> It's very tempting considering that the amount of changes is much
> smaller. But I think we should go with my version. The hope is when a
> _new_ call site appears, the author would think twice before passing
> zero or one to the safe_path argument.

Wouldn't it be a better API if the author of new callsite does not
have to think twice and can instead rely on the called function
untracked_cache_invalidate_path() to always do the right thing?




Re: [PATCH v2] dir.c: ignore paths containing .git when invalidating untracked cache

2018-02-13 Thread Duy Nguyen
On Wed, Feb 7, 2018 at 11:59 PM, Ben Peart  wrote:
> diff --git a/dir.c b/dir.c
> index 7c4b45e30e..d431da46f5 100644
> --- a/dir.c
> +++ b/dir.c
> @@ -1773,7 +1773,7 @@ static enum path_treatment treat_path(struct dir_struct *dir,
>  	if (!de)
>  		return treat_path_fast(dir, untracked, cdir, istate, path,
>  				       baselen, pathspec);
> -	if (is_dot_or_dotdot(de->d_name) || !strcmp(de->d_name, ".git"))
> +	if (is_dot_or_dotdot(de->d_name) || !fspathcmp(de->d_name, ".git"))
>  		return path_none;
>  	strbuf_setlen(path, baselen);
>  	strbuf_addstr(path, de->d_name);
> diff --git a/fsmonitor.c b/fsmonitor.c
> index 0af7c4edba..019576f306 100644
> --- a/fsmonitor.c
> +++ b/fsmonitor.c
> @@ -118,8 +118,12 @@ static int query_fsmonitor(int version, uint64_t last_update, struct strbuf *que
>
>  static void fsmonitor_refresh_callback(struct index_state *istate, const char *name)
>  {
> -	int pos = index_name_pos(istate, name, strlen(name));
> +	int pos;
>
> +	if (!verify_path(name))
> +		return;
> +
> +	pos = index_name_pos(istate, name, strlen(name));
>  	if (pos >= 0) {
>  		struct cache_entry *ce = istate->cache[pos];
>  		ce->ce_flags &= ~CE_FSMONITOR_VALID;
>

It's very tempting considering that the amount of changes is much
smaller. But I think we should go with my version. The hope is when a
_new_ call site appears, the author would think twice before passing
zero or one to the safe_path argument.

> diff --git a/t/t7519-status-fsmonitor.sh b/t/t7519-status-fsmonitor.sh
> index eb2d13bbcf..756beb0d8e 100755
> --- a/t/t7519-status-fsmonitor.sh
> +++ b/t/t7519-status-fsmonitor.sh
> @@ -314,4 +314,43 @@ test_expect_success 'splitting the index results in the same state' '
>  	test_cmp expect actual
>  '
>
> +test_expect_success UNTRACKED_CACHE 'ignore .git changes when invalidating UNTR' '
> +	test_create_repo dot-git &&
> +	(
> +		cd dot-git &&
> +		mkdir -p .git/hooks &&
> +		: >tracked &&
> +		: >modified &&
> +		mkdir dir1 &&
> +		: >dir1/tracked &&
> +		: >dir1/modified &&
> +		mkdir dir2 &&
> +		: >dir2/tracked &&
> +		: >dir2/modified &&
> +		write_integration_script &&
> +		git config core.fsmonitor .git/hooks/fsmonitor-test &&
> +		git update-index --untracked-cache &&
> +		git update-index --fsmonitor &&
> +		GIT_TRACE_UNTRACKED_STATS="$TRASH_DIRECTORY/trace-before" \
> +		git status &&
> +		test-dump-untracked-cache >../before
> +	) &&
> +	cat >>dot-git/.git/hooks/fsmonitor-test <<-\EOF &&
> +	printf ".git\0"
> +	printf ".git/index\0"
> +	printf "dir1/.git\0"
> +	printf "dir1/.git/index\0"
> +	EOF
> +	(
> +		cd dot-git &&
> +		GIT_TRACE_UNTRACKED_STATS="$TRASH_DIRECTORY/trace-after" \
> +		git status &&
> +		test-dump-untracked-cache >../after
> +	) &&
> +	grep "directory invalidation" trace-before >>before &&
> +	grep "directory invalidation" trace-after >>after &&
> +	# UNTR extension unchanged, dir invalidation count unchanged
> +	test_cmp before after
> +'
> +
>  test_done
>
> base-commit: 5be1f00a9a701532232f57958efab4be8c959a29
> --
> 2.15.0.windows.1
>



-- 
Duy


Re: [PATCH v2] dir.c: ignore paths containing .git when invalidating untracked cache

2018-02-07 Thread Ben Peart



On 2/7/2018 4:21 AM, Nguyễn Thái Ngọc Duy wrote:

read_directory() code ignores all paths named ".git" even if it's not
a valid git repository. See treat_path() for details. Since ".git" is
basically invisible to read_directory(), when we are asked to
invalidate a path that contains ".git", we can safely ignore it
because the slow path would not consider it anyway.

This helps when fsmonitor is used and we have a real ".git" repo at
worktree top. Occasionally .git/index will be updated and if the
fsmonitor hook does not filter it, untracked cache is asked to
invalidate the path ".git/index".

Without this patch, we invalidate the root directory unnecessarily,
which:

- makes read_directory() fall back to slow path for root directory
   (slower)

- makes the index dirty (because UNTR extension is updated). Depending
   on the index size, writing it down could also be slow.
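
As an illustration of the idea in the commit message above, the following is a minimal sketch of a "does this path contain a .git component" test. path_has_dot_git_component() is a hypothetical helper, not a function in git. It compares case-sensitively and only looks for ".git"; the real verify_path() performs additional checks that are not modeled here.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper, not git's verify_path(): returns 1 if any
 * '/'-separated component of path is exactly ".git", 0 otherwise.
 */
static int path_has_dot_git_component(const char *path)
{
	assert(path != NULL);
	while (*path) {
		/* length of the current component, up to '/' or end of string */
		size_t len = strcspn(path, "/");
		if (len == 4 && !memcmp(path, ".git", 4))
			return 1;
		path += len;
		/* skip one or more separators before the next component */
		while (*path == '/')
			path++;
	}
	return 0;
}
```

Under these assumptions ".git/index" and "dir1/.git/index" are reported as containing ".git", while "dir1/.gitignore" is not, because the comparison is done per component and not as a substring search.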



Thank you again, this patch makes much more sense to me.


A note about the new "safe_path" knob. Since this new check could be
relatively expensive, avoid it when we know it's not needed. If the
path comes from the index, it can't contain ".git". If it does
contain one, we may be screwed up at many more levels, not just this one.



I do have a simplifying suggestion to make.  I noticed that other uses 
of verify_path() check when the potentially erroneous path is passed in 
and then all the underlying code can assume it is valid.  I think that 
makes sense here as well and it makes for a smaller patch.
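
A minimal sketch of that "check once at the boundary" shape follows, for comparison. The names on_monitor_path(), invalidate_path_unchecked(), path_is_acceptable() and invalidation_count are hypothetical, and the check is a simplified stand-in for verify_path(), not its actual logic.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for verify_path(): rejects any ".git" component. */
static int path_is_acceptable(const char *path)
{
	while (*path) {
		size_t len = strcspn(path, "/");
		if (len == 4 && !memcmp(path, ".git", 4))
			return 0;
		path += len + (path[len] == '/');
	}
	return 1;
}

/* Hypothetical counter standing in for the real cache invalidation. */
static int invalidation_count;

/* Inner layer: takes no flag and assumes the caller already vetted the path. */
static void invalidate_path_unchecked(const char *path)
{
	assert(path != NULL);
	invalidation_count++;
}

/* Boundary: paths reported from outside are checked exactly once, here. */
static void on_monitor_path(const char *name)
{
	if (!path_is_acceptable(name))
		return;
	invalidate_path_unchecked(name);
}
```

In this arrangement the inner function keeps its original signature, and callers that take names straight from cache entries can call it directly without any extra check.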




diff --git a/fsmonitor.h b/fsmonitor.h
index cd3cc0ccf2..65f3743636 100644
--- a/fsmonitor.h
+++ b/fsmonitor.h
@@ -65,7 +65,7 @@ static inline void mark_fsmonitor_invalid(struct index_state *istate, struct cac
 {
 	if (core_fsmonitor) {
 		ce->ce_flags &= ~CE_FSMONITOR_VALID;
-		untracked_cache_invalidate_path(istate, ce->name);
+		untracked_cache_invalidate_path(istate, ce->name, 1);


This test isn't needed because we're pulling the name right out of the 
cache entry so it doesn't need to be verified.


Here is a modified version of your patch for consideration:



read_directory() code ignores all paths named ".git" even if it's not
a valid git repository. See treat_path() for details. Since ".git" is
basically invisible to read_directory(), when we are asked to
invalidate a path that contains ".git", we can safely ignore it
because the slow path would not consider it anyway.

This helps when fsmonitor is used and we have a real ".git" repo at
worktree top. Occasionally .git/index will be updated and if the
fsmonitor hook does not filter it, untracked cache is asked to
invalidate the path ".git/index".

Without this patch, we invalidate the root directory unnecessarily,
which:

- makes read_directory() fall back to slow path for root directory
  (slower)

- makes the index dirty (because UNTR extension is updated). Depending
  on the index size, writing it down could also be slow.

Noticed-by: Ævar Arnfjörð Bjarmason 
Signed-off-by: Nguyễn Thái Ngọc Duy 
Signed-off-by: Ben Peart 
---

Notes:
Base Ref: master
Web-Diff: https://github.com/benpeart/git/commit/218a577618
Checkout: git fetch https://github.com/benpeart/git verify_path-v1 && git checkout 218a577618


 dir.c   |  2 +-
 fsmonitor.c |  6 +-
 t/t7519-status-fsmonitor.sh | 39 +++
 3 files changed, 45 insertions(+), 2 deletions(-)

diff --git a/dir.c b/dir.c
index 7c4b45e30e..d431da46f5 100644
--- a/dir.c
+++ b/dir.c
@@ -1773,7 +1773,7 @@ static enum path_treatment treat_path(struct dir_struct *dir,
 	if (!de)
 		return treat_path_fast(dir, untracked, cdir, istate, path,
 				       baselen, pathspec);
-	if (is_dot_or_dotdot(de->d_name) || !strcmp(de->d_name, ".git"))
+	if (is_dot_or_dotdot(de->d_name) || !fspathcmp(de->d_name, ".git"))
 		return path_none;
 	strbuf_setlen(path, baselen);
 	strbuf_addstr(path, de->d_name);
diff --git a/fsmonitor.c b/fsmonitor.c
index 0af7c4edba..019576f306 100644
--- a/fsmonitor.c
+++ b/fsmonitor.c
@@ -118,8 +118,12 @@ static int query_fsmonitor(int version, uint64_t last_update, struct strbuf *que

 static void fsmonitor_refresh_callback(struct index_state *istate, const char *name)
 {
-	int pos = index_name_pos(istate, name, strlen(name));
+	int pos;

+	if (!verify_path(name))
+		return;
+
+	pos = index_name_pos(istate, name, strlen(name));
 	if (pos >= 0) {
 		struct cache_entry *ce = istate->cache[pos];
 		ce->ce_flags &= ~CE_FSMONITOR_VALID;
diff --git a/t/t7519-status-fsmonitor.sh b/t/t7519-status-fsmonitor.sh
index eb2d13bbcf..756beb0d8e 100755
--- a/t/t7519-status-fsmonitor.sh
+++ b/t/t7519-status-fsmonitor.sh
@@ -314,4 +314,43 @@ test_expect_success 'splitting the