Re: Bug report about symlinks
René Scharfe writes: > Am 03.08.2014 um 19:19 schrieb Junio C Hamano: > >>> And do we need to use the threaded_ variant of the function here? >> >> Hmmm, this is a tangent, but you comment made me wonder if we also >> need to adjust preload_thread() in preload-index.c somehow, but we >> do not touch CE_UPTODATE there, so it probably is not necessary. > > The function calls ce_mark_uptodate(), which does set CE_UPTODATE. It > calls threaded_has_symlink_leading_path() before lstat() already, > however. (Since f62ce3de: Make index preloading check the whole path > to the file.) Yeah, by "we do not touch", I meant "for paths that is beyond a symlink, we do not touch" (i.e. we have that "continue" before lstat-match-then-mark sequence). >> The caller of refresh_cache_ent() is walking an array of sorted >> pathnames aka istate->cache[] in a single-threaded fashion, possibly >> with a pathspec to limit the scan. > > There are two direct callers (refresh_index(), refresh_cache_entry()) > and several indirect ones. Do we have a way to detect unsynchronized > parallel access to the has_symlink_leading_path()-cache? Checking the > full callers-of-callers tree manually looks a bit scary to me. The threaded variant is not used anybody outside preload-index, so currently we should be OK, I would think. -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Bug report about symlinks
Duy Nguyen writes: > Just checking, you meant CE_UPTODATE, not CE_VALID, right? CE_VALID is > only used with --assume-unchanged Yup. Thanks. -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Bug report about symlinks
On Mon, Aug 4, 2014 at 12:19 AM, Junio C Hamano wrote: > I think you, who dug to find out where to add the check, already > know this, and I am writing this mainly for myself and for the list > archive, but when the knee-jerk "has-syjmlink-leading-path missing?" > reaction came to me, two obvious optimizations also came to my mind: > > - When checking a cache entry "a/b/c/d/e" and we find out "a/b/c" >is a symbolic link, we mark it as ~CE_VALID, but at the same >time, we learn "a/b/c/any/thing" are also ~CE_VALID with that >check, so we _could_, after the has_symlink_leading_path once >returns true, so there may be an opportunity to fast-forward the >scan, marking all paths under "a/b/c" as ~CE_VALID. > > - After finding out "a/b/c" is a symbolic link to cause anything >underneath marked as ~CE_VALID, we also know "a/" and "a/b/" >exists as real directories, so a later check for "a/b/some/thing" >can start checking at "a/b/some/" without checking "a/" and "a/b/". Just checking, you meant CE_UPTODATE, not CE_VALID, right? CE_VALID is only used with --assume-unchanged > The latter is more important as it is a much more common case that > the shape of the working tree not to change. > > But I think the implementation of has_symlink_leading_path() already > has these optimizations built inside around the (unfortunately named) > "struct cache_def", so it probably would not give us much boost to > implement such a fast-forwarding of the scan. Yeah my first thought is another flood of lstat(). But it looks like has_symlink_leading_path() tries hard to reduce lstat() already. We could disable this call in filesytems that do not support symlinks (e.g. vfat) but I guess that's just a minority. -- Duy -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Bug report about symlinks
Am 03.08.2014 um 19:19 schrieb Junio C Hamano: René Scharfe writes: How about the patch below? Before it checks if an index entry exists in the work tree, it checks if its path includes a symlink. Honestly, I didn't expect the fix to be in the refresh-index code path, but doing this there sort of makes sense. I found it through observation, not understanding. Just looked for stat/lstat calls executed by git status for b/different and b/equal using strace; debugging printfs told me where they came from. And do we need to use the threaded_ variant of the function here? Hmmm, this is a tangent, but you comment made me wonder if we also need to adjust preload_thread() in preload-index.c somehow, but we do not touch CE_UPTODATE there, so it probably is not necessary. The function calls ce_mark_uptodate(), which does set CE_UPTODATE. It calls threaded_has_symlink_leading_path() before lstat() already, however. (Since f62ce3de: Make index preloading check the whole path to the file.) The caller of refresh_cache_ent() is walking an array of sorted pathnames aka istate->cache[] in a single-threaded fashion, possibly with a pathspec to limit the scan. There are two direct callers (refresh_index(), refresh_cache_entry()) and several indirect ones. Do we have a way to detect unsynchronized parallel access to the has_symlink_leading_path()-cache? Checking the full callers-of-callers tree manually looks a bit scary to me. Do you mean that we may want to make istate->cache[] scanned by multiple threads? I am not sure. No, I didn't want to suggest any performance improvements. I'm only interested in correctness for now. René -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Bug report about symlinks
René Scharfe writes: > How about the patch below? Before it checks if an index entry exists > in the work tree, it checks if its path includes a symlink. Honestly, I didn't expect the fix to be in the refresh-index code path, but doing this there sort of makes sense. I think you, who dug to find out where to add the check, already know this, and I am writing this mainly for myself and for the list archive, but when the knee-jerk "has-syjmlink-leading-path missing?" reaction came to me, two obvious optimizations also came to my mind: - When checking a cache entry "a/b/c/d/e" and we find out "a/b/c" is a symbolic link, we mark it as ~CE_VALID, but at the same time, we learn "a/b/c/any/thing" are also ~CE_VALID with that check, so we _could_, after the has_symlink_leading_path once returns true, so there may be an opportunity to fast-forward the scan, marking all paths under "a/b/c" as ~CE_VALID. - After finding out "a/b/c" is a symbolic link to cause anything underneath marked as ~CE_VALID, we also know "a/" and "a/b/" exists as real directories, so a later check for "a/b/some/thing" can start checking at "a/b/some/" without checking "a/" and "a/b/". The latter is more important as it is a much more common case that the shape of the working tree not to change. But I think the implementation of has_symlink_leading_path() already has these optimizations built inside around the (unfortunately named) "struct cache_def", so it probably would not give us much boost to implement such a fast-forwarding of the scan. > And do we need to use the threaded_ variant of the function here? Hmmm, this is a tangent, but you comment made me wonder if we also need to adjust preload_thread() in preload-index.c somehow, but we do not touch CE_UPTODATE there, so it probably is not necessary. The caller of refresh_cache_ent() is walking an array of sorted pathnames aka istate->cache[] in a single-threaded fashion, possibly with a pathspec to limit the scan. Do you mean that we may want to make istate->cache[] scanned by multiple threads? I am not sure. > diff --git a/read-cache.c b/read-cache.c > index 5d3c8bd..6f0057f 100644 > --- a/read-cache.c > +++ b/read-cache.c > @@ -1064,6 +1064,14 @@ static struct cache_entry *refresh_cache_ent(struct > index_state *istate, > return ce; > } > > + if (has_symlink_leading_path(ce->name, ce_namelen(ce))) { > + if (ignore_missing) > + return ce; > + if (err) > + *err = ENOENT; > + return NULL; > + } > + > if (lstat(ce->name, &st) < 0) { > if (ignore_missing && errno == ENOENT) > return ce; -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Bug report about symlinks
Am 01.08.2014 um 18:23 schrieb Junio C Hamano: > René Scharfe writes: > >> # Create test repo with two directories with two files each. >> $ git init >> Initialized empty Git repository in /tmp/.git/ >> $ mkdir a b >> $ echo x >a/equal >> $ echo x >b/equal >> $ echo y >a/different >> $ echo z >b/different >> $ git add a b >> $ git commit -minitial >> [master (root-commit) 6d36895] initial >> 4 files changed, 4 insertions(+) >> create mode 100644 a/different >> create mode 100644 a/equal >> create mode 100644 b/different >> create mode 100644 b/equal >> >> # Replace one directory with a symlink to the other. >> $ rm -rf b >> $ ln -s a b >> >> # First git status call. >> $ git status >> On branch master >> Changes not staged for commit: >>(use "git add/rm ..." to update what will be committed) >>(use "git checkout -- ..." to discard changes in working directory) >> >> deleted:b/different >> >> Untracked files: >>(use "git add ..." to include in what will be committed) >> >> b >> >> no changes added to commit (use "git add" and/or "git commit -a") >> ... >> >> It could be debated if the first git status call should also report >> b/equal as deleted. > > Hmph. > > I wonder if "could be" is an understatement. The difference of > reporting between b/equal and b/different may indicate that somebody > is mistakenly comparing the index with these paths, without first > checking each path with has_symlink_leading_path(), or something, > no? How about the patch below? Before it checks if an index entry exists in the work tree, it checks if its path includes a symlink. After the patch, git status reports b/equal as deleted, too. The test suite still passes. And do we need to use the threaded_ variant of the function here? diff --git a/read-cache.c b/read-cache.c index 5d3c8bd..6f0057f 100644 --- a/read-cache.c +++ b/read-cache.c @@ -1064,6 +1064,14 @@ static struct cache_entry *refresh_cache_ent(struct index_state *istate, return ce; } + if (has_symlink_leading_path(ce->name, ce_namelen(ce))) { + if (ignore_missing) + return ce; + if (err) + *err = ENOENT; + return NULL; + } + if (lstat(ce->name, &st) < 0) { if (ignore_missing && errno == ENOENT) return ce; -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Bug report about symlinks
On wo, 2014-07-30 at 15:30 +0400, NickKolok wrote: > Greetings from Russia, comrads! > > I've noticed something strange with git status when replacing a folder with > symlink to another folder. > There is a git repo with script with demo in the attachment. I think there is a bug here: + mkdir bug + cd bug + git init Initialized empty Git repository in /tmp/bug/.git/ + mkdir dir1 dir2 + echo 1 + echo 1 + echo 2a + echo 2b + git add dir1/1.txt dir1/2.txt dir2/1.txt dir2/2.txt + git commit -m first [master (root-commit) b60ecc8] first 4 files changed, 4 insertions(+) create mode 100644 dir1/1.txt create mode 100644 dir1/2.txt create mode 100644 dir2/1.txt create mode 100644 dir2/2.txt + rm -r dir2 + ln -s dir1 dir2 + git status On branch master Changes not staged for commit: (use "git add/rm ..." to update what will be committed) (use "git checkout -- ..." to discard changes in working directory) deleted:dir2/2.txt Untracked files: (use "git add ..." to include in what will be committed) dir2 no changes added to commit (use "git add" and/or "git commit -a") It looks like git status is thinking dir2/1.txt still exists with the same content, even though dir2 is gone, and now an untracked symlink. Moreover, git diff and git status disagree with each other: dennis@spirit:/tmp/bug$ git status On branch master Changes not staged for commit: (use "git add/rm ..." to update what will be committed) (use "git checkout -- ..." to discard changes in working directory) deleted:dir2/2.txt Untracked files: (use "git add ..." to include in what will be committed) dir2 no changes added to commit (use "git add" and/or "git commit -a") dennis@spirit:/tmp/bug$ git --no-pager diff diff --git a/dir2/1.txt b/dir2/1.txt deleted file mode 100644 index d00491f..000 --- a/dir2/1.txt +++ /dev/null @@ -1 +0,0 @@ -1 diff --git a/dir2/2.txt b/dir2/2.txt deleted file mode 100644 index b8a4cf4..000 --- a/dir2/2.txt +++ /dev/null @@ -1 +0,0 @@ -2b -- Dennis Kaarsemaker www.kaarsemaker.net -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Bug report about symlinks
René Scharfe writes: > # Create test repo with two directories with two files each. > $ git init > Initialized empty Git repository in /tmp/.git/ > $ mkdir a b > $ echo x >a/equal > $ echo x >b/equal > $ echo y >a/different > $ echo z >b/different > $ git add a b > $ git commit -minitial > [master (root-commit) 6d36895] initial > 4 files changed, 4 insertions(+) > create mode 100644 a/different > create mode 100644 a/equal > create mode 100644 b/different > create mode 100644 b/equal > > # Replace one directory with a symlink to the other. > $ rm -rf b > $ ln -s a b > > # First git status call. > $ git status > On branch master > Changes not staged for commit: > (use "git add/rm ..." to update what will be committed) > (use "git checkout -- ..." to discard changes in working directory) > > deleted:b/different > > Untracked files: > (use "git add ..." to include in what will be committed) > > b > > no changes added to commit (use "git add" and/or "git commit -a") > ... > > It could be debated if the first git status call should also report > b/equal as deleted. Hmph. I wonder if "could be" is an understatement. The difference of reporting between b/equal and b/different may indicate that somebody is mistakenly comparing the index with these paths, without first checking each path with has_symlink_leading_path(), or something, no? Or am I misreading the report and codepath? Puzzled... -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Bug report about symlinks
Am 31.07.2014 um 21:50 schrieb Nikolay Avdeev: I've noticed something strange with git status when replacing a folder with symlink to another folder. There is a git repo with script with demo in the attachment. Let's try and make this a bit easier for folks to follow along. # Create test repo with two directories with two files each. $ git init Initialized empty Git repository in /tmp/.git/ $ mkdir a b $ echo x >a/equal $ echo x >b/equal $ echo y >a/different $ echo z >b/different $ git add a b $ git commit -minitial [master (root-commit) 6d36895] initial 4 files changed, 4 insertions(+) create mode 100644 a/different create mode 100644 a/equal create mode 100644 b/different create mode 100644 b/equal # Replace one directory with a symlink to the other. $ rm -rf b $ ln -s a b # First git status call. $ git status On branch master Changes not staged for commit: (use "git add/rm ..." to update what will be committed) (use "git checkout -- ..." to discard changes in working directory) deleted:b/different Untracked files: (use "git add ..." to include in what will be committed) b no changes added to commit (use "git add" and/or "git commit -a") # Stage the changes. $ git add b # Second git status call. $ git status On branch master Changes to be committed: (use "git reset HEAD ..." to unstage) new file: b deleted:b/different deleted:b/equal # Commit the staged changes. $ git commit -msymlinked [master 4498f28] symlinked 3 files changed, 1 insertion(+), 2 deletions(-) create mode 12 b delete mode 100644 b/different delete mode 100644 b/equal That the first and second status call report different results is a feature; staging changes using git add alters the status. A commit after the first status call would be empty. It could be debated if the first git status call should also report b/equal as deleted. René -- To unsubscribe from this list: send the line "unsubscribe git" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html