Re: Bug report about symlinks
On Mon, Aug 4, 2014 at 12:19 AM, Junio C Hamano gits...@pobox.com wrote: I think you, who dug to find out where to add the check, already know this, and I am writing this mainly for myself and for the list archive, but when the knee-jerk has-syjmlink-leading-path missing? reaction came to me, two obvious optimizations also came to my mind: - When checking a cache entry a/b/c/d/e and we find out a/b/c is a symbolic link, we mark it as ~CE_VALID, but at the same time, we learn a/b/c/any/thing are also ~CE_VALID with that check, so we _could_, after the has_symlink_leading_path once returns true, so there may be an opportunity to fast-forward the scan, marking all paths under a/b/c as ~CE_VALID. - After finding out a/b/c is a symbolic link to cause anything underneath marked as ~CE_VALID, we also know a/ and a/b/ exists as real directories, so a later check for a/b/some/thing can start checking at a/b/some/ without checking a/ and a/b/. Just checking, you meant CE_UPTODATE, not CE_VALID, right? CE_VALID is only used with --assume-unchanged The latter is more important as it is a much more common case that the shape of the working tree not to change. But I think the implementation of has_symlink_leading_path() already has these optimizations built inside around the (unfortunately named) struct cache_def, so it probably would not give us much boost to implement such a fast-forwarding of the scan. Yeah my first thought is another flood of lstat(). But it looks like has_symlink_leading_path() tries hard to reduce lstat() already. We could disable this call in filesytems that do not support symlinks (e.g. vfat) but I guess that's just a minority. -- Duy -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Bug report about symlinks
Duy Nguyen pclo...@gmail.com writes: Just checking, you meant CE_UPTODATE, not CE_VALID, right? CE_VALID is only used with --assume-unchanged Yup. Thanks. -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Bug report about symlinks
René Scharfe l@web.de writes: Am 03.08.2014 um 19:19 schrieb Junio C Hamano: And do we need to use the threaded_ variant of the function here? Hmmm, this is a tangent, but you comment made me wonder if we also need to adjust preload_thread() in preload-index.c somehow, but we do not touch CE_UPTODATE there, so it probably is not necessary. The function calls ce_mark_uptodate(), which does set CE_UPTODATE. It calls threaded_has_symlink_leading_path() before lstat() already, however. (Since f62ce3de: Make index preloading check the whole path to the file.) Yeah, by we do not touch, I meant for paths that is beyond a symlink, we do not touch (i.e. we have that continue before lstat-match-then-mark sequence). The caller of refresh_cache_ent() is walking an array of sorted pathnames aka istate-cache[] in a single-threaded fashion, possibly with a pathspec to limit the scan. There are two direct callers (refresh_index(), refresh_cache_entry()) and several indirect ones. Do we have a way to detect unsynchronized parallel access to the has_symlink_leading_path()-cache? Checking the full callers-of-callers tree manually looks a bit scary to me. The threaded variant is not used anybody outside preload-index, so currently we should be OK, I would think. -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Bug report about symlinks
René Scharfe l@web.de writes: How about the patch below? Before it checks if an index entry exists in the work tree, it checks if its path includes a symlink. Honestly, I didn't expect the fix to be in the refresh-index code path, but doing this there sort of makes sense. I think you, who dug to find out where to add the check, already know this, and I am writing this mainly for myself and for the list archive, but when the knee-jerk has-syjmlink-leading-path missing? reaction came to me, two obvious optimizations also came to my mind: - When checking a cache entry a/b/c/d/e and we find out a/b/c is a symbolic link, we mark it as ~CE_VALID, but at the same time, we learn a/b/c/any/thing are also ~CE_VALID with that check, so we _could_, after the has_symlink_leading_path once returns true, so there may be an opportunity to fast-forward the scan, marking all paths under a/b/c as ~CE_VALID. - After finding out a/b/c is a symbolic link to cause anything underneath marked as ~CE_VALID, we also know a/ and a/b/ exists as real directories, so a later check for a/b/some/thing can start checking at a/b/some/ without checking a/ and a/b/. The latter is more important as it is a much more common case that the shape of the working tree not to change. But I think the implementation of has_symlink_leading_path() already has these optimizations built inside around the (unfortunately named) struct cache_def, so it probably would not give us much boost to implement such a fast-forwarding of the scan. And do we need to use the threaded_ variant of the function here? Hmmm, this is a tangent, but you comment made me wonder if we also need to adjust preload_thread() in preload-index.c somehow, but we do not touch CE_UPTODATE there, so it probably is not necessary. The caller of refresh_cache_ent() is walking an array of sorted pathnames aka istate-cache[] in a single-threaded fashion, possibly with a pathspec to limit the scan. Do you mean that we may want to make istate-cache[] scanned by multiple threads? I am not sure. diff --git a/read-cache.c b/read-cache.c index 5d3c8bd..6f0057f 100644 --- a/read-cache.c +++ b/read-cache.c @@ -1064,6 +1064,14 @@ static struct cache_entry *refresh_cache_ent(struct index_state *istate, return ce; } + if (has_symlink_leading_path(ce-name, ce_namelen(ce))) { + if (ignore_missing) + return ce; + if (err) + *err = ENOENT; + return NULL; + } + if (lstat(ce-name, st) 0) { if (ignore_missing errno == ENOENT) return ce; -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Bug report about symlinks
Am 03.08.2014 um 19:19 schrieb Junio C Hamano: René Scharfe l@web.de writes: How about the patch below? Before it checks if an index entry exists in the work tree, it checks if its path includes a symlink. Honestly, I didn't expect the fix to be in the refresh-index code path, but doing this there sort of makes sense. I found it through observation, not understanding. Just looked for stat/lstat calls executed by git status for b/different and b/equal using strace; debugging printfs told me where they came from. And do we need to use the threaded_ variant of the function here? Hmmm, this is a tangent, but you comment made me wonder if we also need to adjust preload_thread() in preload-index.c somehow, but we do not touch CE_UPTODATE there, so it probably is not necessary. The function calls ce_mark_uptodate(), which does set CE_UPTODATE. It calls threaded_has_symlink_leading_path() before lstat() already, however. (Since f62ce3de: Make index preloading check the whole path to the file.) The caller of refresh_cache_ent() is walking an array of sorted pathnames aka istate-cache[] in a single-threaded fashion, possibly with a pathspec to limit the scan. There are two direct callers (refresh_index(), refresh_cache_entry()) and several indirect ones. Do we have a way to detect unsynchronized parallel access to the has_symlink_leading_path()-cache? Checking the full callers-of-callers tree manually looks a bit scary to me. Do you mean that we may want to make istate-cache[] scanned by multiple threads? I am not sure. No, I didn't want to suggest any performance improvements. I'm only interested in correctness for now. René -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Bug report about symlinks
Am 01.08.2014 um 18:23 schrieb Junio C Hamano: René Scharfe l@web.de writes: # Create test repo with two directories with two files each. $ git init Initialized empty Git repository in /tmp/.git/ $ mkdir a b $ echo x a/equal $ echo x b/equal $ echo y a/different $ echo z b/different $ git add a b $ git commit -minitial [master (root-commit) 6d36895] initial 4 files changed, 4 insertions(+) create mode 100644 a/different create mode 100644 a/equal create mode 100644 b/different create mode 100644 b/equal # Replace one directory with a symlink to the other. $ rm -rf b $ ln -s a b # First git status call. $ git status On branch master Changes not staged for commit: (use git add/rm file... to update what will be committed) (use git checkout -- file... to discard changes in working directory) deleted:b/different Untracked files: (use git add file... to include in what will be committed) b no changes added to commit (use git add and/or git commit -a) ... It could be debated if the first git status call should also report b/equal as deleted. Hmph. I wonder if could be is an understatement. The difference of reporting between b/equal and b/different may indicate that somebody is mistakenly comparing the index with these paths, without first checking each path with has_symlink_leading_path(), or something, no? How about the patch below? Before it checks if an index entry exists in the work tree, it checks if its path includes a symlink. After the patch, git status reports b/equal as deleted, too. The test suite still passes. And do we need to use the threaded_ variant of the function here? diff --git a/read-cache.c b/read-cache.c index 5d3c8bd..6f0057f 100644 --- a/read-cache.c +++ b/read-cache.c @@ -1064,6 +1064,14 @@ static struct cache_entry *refresh_cache_ent(struct index_state *istate, return ce; } + if (has_symlink_leading_path(ce-name, ce_namelen(ce))) { + if (ignore_missing) + return ce; + if (err) + *err = ENOENT; + return NULL; + } + if (lstat(ce-name, st) 0) { if (ignore_missing errno == ENOENT) return ce; -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Bug report about symlinks
René Scharfe l@web.de writes: # Create test repo with two directories with two files each. $ git init Initialized empty Git repository in /tmp/.git/ $ mkdir a b $ echo x a/equal $ echo x b/equal $ echo y a/different $ echo z b/different $ git add a b $ git commit -minitial [master (root-commit) 6d36895] initial 4 files changed, 4 insertions(+) create mode 100644 a/different create mode 100644 a/equal create mode 100644 b/different create mode 100644 b/equal # Replace one directory with a symlink to the other. $ rm -rf b $ ln -s a b # First git status call. $ git status On branch master Changes not staged for commit: (use git add/rm file... to update what will be committed) (use git checkout -- file... to discard changes in working directory) deleted:b/different Untracked files: (use git add file... to include in what will be committed) b no changes added to commit (use git add and/or git commit -a) ... It could be debated if the first git status call should also report b/equal as deleted. Hmph. I wonder if could be is an understatement. The difference of reporting between b/equal and b/different may indicate that somebody is mistakenly comparing the index with these paths, without first checking each path with has_symlink_leading_path(), or something, no? Or am I misreading the report and codepath? Puzzled... -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: Bug report about symlinks
On wo, 2014-07-30 at 15:30 +0400, NickKolok wrote: Greetings from Russia, comrads! I've noticed something strange with git status when replacing a folder with symlink to another folder. There is a git repo with script with demo in the attachment. I think there is a bug here: + mkdir bug + cd bug + git init Initialized empty Git repository in /tmp/bug/.git/ + mkdir dir1 dir2 + echo 1 + echo 1 + echo 2a + echo 2b + git add dir1/1.txt dir1/2.txt dir2/1.txt dir2/2.txt + git commit -m first [master (root-commit) b60ecc8] first 4 files changed, 4 insertions(+) create mode 100644 dir1/1.txt create mode 100644 dir1/2.txt create mode 100644 dir2/1.txt create mode 100644 dir2/2.txt + rm -r dir2 + ln -s dir1 dir2 + git status On branch master Changes not staged for commit: (use git add/rm file... to update what will be committed) (use git checkout -- file... to discard changes in working directory) deleted:dir2/2.txt Untracked files: (use git add file... to include in what will be committed) dir2 no changes added to commit (use git add and/or git commit -a) It looks like git status is thinking dir2/1.txt still exists with the same content, even though dir2 is gone, and now an untracked symlink. Moreover, git diff and git status disagree with each other: dennis@spirit:/tmp/bug$ git status On branch master Changes not staged for commit: (use git add/rm file... to update what will be committed) (use git checkout -- file... to discard changes in working directory) deleted:dir2/2.txt Untracked files: (use git add file... to include in what will be committed) dir2 no changes added to commit (use git add and/or git commit -a) dennis@spirit:/tmp/bug$ git --no-pager diff diff --git a/dir2/1.txt b/dir2/1.txt deleted file mode 100644 index d00491f..000 --- a/dir2/1.txt +++ /dev/null @@ -1 +0,0 @@ -1 diff --git a/dir2/2.txt b/dir2/2.txt deleted file mode 100644 index b8a4cf4..000 --- a/dir2/2.txt +++ /dev/null @@ -1 +0,0 @@ -2b -- Dennis Kaarsemaker www.kaarsemaker.net -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Bug report about symlinks
Greetings from Russia, comrads! I've noticed something strange with git status when replacing a folder with symlink to another folder. There is a git repo with script with demo in the attachment. Yours sincerely, NickKolok aka Nikolay Avdeev. Доброго времени суток, товарищи! При замене папки на симлинк на другую папку с похожими, но не одинаковыми файлами git status ведёт себя странно. Прилагаю архив с репозиторием. В скрипте - пример и пояснения. С уважением, Николай Авдеев aka NickKolok. git-bug-demo.tar.bz2 Description: application/bzip
Re: Bug report about symlinks
Am 31.07.2014 um 21:50 schrieb Nikolay Avdeev: I've noticed something strange with git status when replacing a folder with symlink to another folder. There is a git repo with script with demo in the attachment. Let's try and make this a bit easier for folks to follow along. # Create test repo with two directories with two files each. $ git init Initialized empty Git repository in /tmp/.git/ $ mkdir a b $ echo x a/equal $ echo x b/equal $ echo y a/different $ echo z b/different $ git add a b $ git commit -minitial [master (root-commit) 6d36895] initial 4 files changed, 4 insertions(+) create mode 100644 a/different create mode 100644 a/equal create mode 100644 b/different create mode 100644 b/equal # Replace one directory with a symlink to the other. $ rm -rf b $ ln -s a b # First git status call. $ git status On branch master Changes not staged for commit: (use git add/rm file... to update what will be committed) (use git checkout -- file... to discard changes in working directory) deleted:b/different Untracked files: (use git add file... to include in what will be committed) b no changes added to commit (use git add and/or git commit -a) # Stage the changes. $ git add b # Second git status call. $ git status On branch master Changes to be committed: (use git reset HEAD file... to unstage) new file: b deleted:b/different deleted:b/equal # Commit the staged changes. $ git commit -msymlinked [master 4498f28] symlinked 3 files changed, 1 insertion(+), 2 deletions(-) create mode 12 b delete mode 100644 b/different delete mode 100644 b/equal That the first and second status call report different results is a feature; staging changes using git add alters the status. A commit after the first status call would be empty. It could be debated if the first git status call should also report b/equal as deleted. René -- To unsubscribe from this list: send the line unsubscribe git in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html
Bug report about symlinks
Greetings from Russia, comrads! I've noticed something strange with git status when replacing a folder with symlink to another folder. There is a git repo with script with demo in the attachment. Yours sincerely, NickKolok aka Nikolay Avdeev. Доброго времени суток, товарищи! При замене папки на симлинк на другую папку с похожими, но не одинаковыми файлами git status ведёт себя странно. Прилагаю архив с репозиторием. В скрипте - пример и пояснения. С уважением, Николай Авдеев aka NickKolok. git-bug-demo.tar.bz2 Description: Binary data