Re: Bug report about symlinks

2014-08-04 Thread Duy Nguyen
On Mon, Aug 4, 2014 at 12:19 AM, Junio C Hamano gits...@pobox.com wrote:
 I think you, who dug to find out where to add the check, already
 know this, and I am writing this mainly for myself and for the list
 archive, but when the knee-jerk has-syjmlink-leading-path missing?
 reaction came to me, two obvious optimizations also came to my mind:

  - When checking a cache entry a/b/c/d/e and we find out a/b/c
is a symbolic link, we mark it as ~CE_VALID, but at the same
time, we learn a/b/c/any/thing are also ~CE_VALID with that
check, so we _could_, after the has_symlink_leading_path once
returns true, so there may be an opportunity to fast-forward the
scan, marking all paths under a/b/c as ~CE_VALID.

  - After finding out a/b/c is a symbolic link to cause anything
underneath marked as ~CE_VALID, we also know a/ and a/b/
exists as real directories, so a later check for a/b/some/thing
can start checking at a/b/some/ without checking a/ and a/b/.

Just checking, you meant CE_UPTODATE, not CE_VALID, right? CE_VALID is
only used with --assume-unchanged

 The latter is more important as it is a much more common case that
 the shape of the working tree not to change.

 But I think the implementation of has_symlink_leading_path() already
 has these optimizations built inside around the (unfortunately named)
 struct cache_def, so it probably would not give us much boost to
 implement such a fast-forwarding of the scan.

Yeah my first thought is another flood of lstat(). But it looks like
has_symlink_leading_path() tries hard to reduce lstat() already. We
could disable this call in filesytems that do not support symlinks
(e.g. vfat) but I guess that's just a minority.
-- 
Duy
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Bug report about symlinks

2014-08-04 Thread Junio C Hamano
Duy Nguyen pclo...@gmail.com writes:

 Just checking, you meant CE_UPTODATE, not CE_VALID, right? CE_VALID is
 only used with --assume-unchanged

Yup.  Thanks.
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Bug report about symlinks

2014-08-04 Thread Junio C Hamano
René Scharfe l@web.de writes:

 Am 03.08.2014 um 19:19 schrieb Junio C Hamano:

 And do we need to use the threaded_ variant of the function here?

 Hmmm, this is a tangent, but you comment made me wonder if we also
 need to adjust preload_thread() in preload-index.c somehow, but we
 do not touch CE_UPTODATE there, so it probably is not necessary.

 The function calls ce_mark_uptodate(), which does set CE_UPTODATE.  It
 calls threaded_has_symlink_leading_path() before lstat() already,
 however.  (Since f62ce3de: Make index preloading check the whole path
 to the file.)

Yeah, by we do not touch, I meant for paths that is beyond a
symlink, we do not touch (i.e. we have that continue before
lstat-match-then-mark sequence).

 The caller of refresh_cache_ent() is walking an array of sorted
 pathnames aka istate-cache[] in a single-threaded fashion, possibly
 with a pathspec to limit the scan.

 There are two direct callers (refresh_index(), refresh_cache_entry())
 and several indirect ones.  Do we have a way to detect unsynchronized
 parallel access to the has_symlink_leading_path()-cache?  Checking the
 full callers-of-callers tree manually looks a bit scary to me.

The threaded variant is not used anybody outside preload-index, so
currently we should be OK, I would think.
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Bug report about symlinks

2014-08-03 Thread Junio C Hamano
René Scharfe l@web.de writes:

 How about the patch below?  Before it checks if an index entry exists
 in the work tree, it checks if its path includes a symlink.

Honestly, I didn't expect the fix to be in the refresh-index code
path, but doing this there sort of makes sense.

I think you, who dug to find out where to add the check, already
know this, and I am writing this mainly for myself and for the list
archive, but when the knee-jerk has-syjmlink-leading-path missing?
reaction came to me, two obvious optimizations also came to my mind:

 - When checking a cache entry a/b/c/d/e and we find out a/b/c
   is a symbolic link, we mark it as ~CE_VALID, but at the same
   time, we learn a/b/c/any/thing are also ~CE_VALID with that
   check, so we _could_, after the has_symlink_leading_path once
   returns true, so there may be an opportunity to fast-forward the
   scan, marking all paths under a/b/c as ~CE_VALID.

 - After finding out a/b/c is a symbolic link to cause anything
   underneath marked as ~CE_VALID, we also know a/ and a/b/
   exists as real directories, so a later check for a/b/some/thing
   can start checking at a/b/some/ without checking a/ and a/b/.

The latter is more important as it is a much more common case that
the shape of the working tree not to change.

But I think the implementation of has_symlink_leading_path() already
has these optimizations built inside around the (unfortunately named)
struct cache_def, so it probably would not give us much boost to
implement such a fast-forwarding of the scan.

 And do we need to use the threaded_ variant of the function here?

Hmmm, this is a tangent, but you comment made me wonder if we also
need to adjust preload_thread() in preload-index.c somehow, but we
do not touch CE_UPTODATE there, so it probably is not necessary.

The caller of refresh_cache_ent() is walking an array of sorted
pathnames aka istate-cache[] in a single-threaded fashion, possibly
with a pathspec to limit the scan.  Do you mean that we may want to
make istate-cache[] scanned by multiple threads?  I am not sure.

 diff --git a/read-cache.c b/read-cache.c
 index 5d3c8bd..6f0057f 100644
 --- a/read-cache.c
 +++ b/read-cache.c
 @@ -1064,6 +1064,14 @@ static struct cache_entry *refresh_cache_ent(struct 
 index_state *istate,
   return ce;
   }
  
 + if (has_symlink_leading_path(ce-name, ce_namelen(ce))) {
 + if (ignore_missing)
 + return ce;
 + if (err)
 + *err = ENOENT;
 + return NULL;
 + }
 +
   if (lstat(ce-name, st)  0) {
   if (ignore_missing  errno == ENOENT)
   return ce;
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Bug report about symlinks

2014-08-03 Thread René Scharfe

Am 03.08.2014 um 19:19 schrieb Junio C Hamano:

René Scharfe l@web.de writes:


How about the patch below?  Before it checks if an index entry exists
in the work tree, it checks if its path includes a symlink.


Honestly, I didn't expect the fix to be in the refresh-index code
path, but doing this there sort of makes sense.


I found it through observation, not understanding.  Just looked for 
stat/lstat calls executed by git status for b/different and b/equal 
using strace; debugging printfs told me where they came from.



And do we need to use the threaded_ variant of the function here?


Hmmm, this is a tangent, but you comment made me wonder if we also
need to adjust preload_thread() in preload-index.c somehow, but we
do not touch CE_UPTODATE there, so it probably is not necessary.


The function calls ce_mark_uptodate(), which does set CE_UPTODATE.  It 
calls threaded_has_symlink_leading_path() before lstat() already, 
however.  (Since f62ce3de: Make index preloading check the whole path to 
the file.)



The caller of refresh_cache_ent() is walking an array of sorted
pathnames aka istate-cache[] in a single-threaded fashion, possibly
with a pathspec to limit the scan.


There are two direct callers (refresh_index(), refresh_cache_entry()) 
and several indirect ones.  Do we have a way to detect unsynchronized 
parallel access to the has_symlink_leading_path()-cache?  Checking the 
full callers-of-callers tree manually looks a bit scary to me.



Do you mean that we may want to
make istate-cache[] scanned by multiple threads?  I am not sure.


No, I didn't want to suggest any performance improvements.  I'm only 
interested in correctness for now.


René
--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Bug report about symlinks

2014-08-02 Thread René Scharfe
Am 01.08.2014 um 18:23 schrieb Junio C Hamano:
 René Scharfe l@web.de writes:
 
 # Create test repo with two directories with two files each.
 $ git init
 Initialized empty Git repository in /tmp/.git/
 $ mkdir a b
 $ echo x a/equal
 $ echo x b/equal
 $ echo y a/different
 $ echo z b/different
 $ git add a b
 $ git commit -minitial
 [master (root-commit) 6d36895] initial
   4 files changed, 4 insertions(+)
   create mode 100644 a/different
   create mode 100644 a/equal
   create mode 100644 b/different
   create mode 100644 b/equal

 # Replace one directory with a symlink to the other.
 $ rm -rf b
 $ ln -s a b

 # First git status call.
 $ git status
 On branch master
 Changes not staged for commit:
(use git add/rm file... to update what will be committed)
(use git checkout -- file... to discard changes in working directory)

  deleted:b/different

 Untracked files:
(use git add file... to include in what will be committed)

  b

 no changes added to commit (use git add and/or git commit -a)
 ...

 It could be debated if the first git status call should also report
 b/equal as deleted.
 
 Hmph.
 
 I wonder if could be is an understatement.  The difference of
 reporting between b/equal and b/different may indicate that somebody
 is mistakenly comparing the index with these paths, without first
 checking each path with has_symlink_leading_path(), or something,
 no?

How about the patch below?  Before it checks if an index entry exists
in the work tree, it checks if its path includes a symlink.  After
the patch, git status reports b/equal as deleted, too.  The test
suite still passes.

And do we need to use the threaded_ variant of the function here?


diff --git a/read-cache.c b/read-cache.c
index 5d3c8bd..6f0057f 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -1064,6 +1064,14 @@ static struct cache_entry *refresh_cache_ent(struct 
index_state *istate,
return ce;
}
 
+   if (has_symlink_leading_path(ce-name, ce_namelen(ce))) {
+   if (ignore_missing)
+   return ce;
+   if (err)
+   *err = ENOENT;
+   return NULL;
+   }
+
if (lstat(ce-name, st)  0) {
if (ignore_missing  errno == ENOENT)
return ce;

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Bug report about symlinks

2014-08-01 Thread Junio C Hamano
René Scharfe l@web.de writes:

 # Create test repo with two directories with two files each.
 $ git init
 Initialized empty Git repository in /tmp/.git/
 $ mkdir a b
 $ echo x a/equal
 $ echo x b/equal
 $ echo y a/different
 $ echo z b/different
 $ git add a b
 $ git commit -minitial
 [master (root-commit) 6d36895] initial
  4 files changed, 4 insertions(+)
  create mode 100644 a/different
  create mode 100644 a/equal
  create mode 100644 b/different
  create mode 100644 b/equal

 # Replace one directory with a symlink to the other.
 $ rm -rf b
 $ ln -s a b

 # First git status call.
 $ git status
 On branch master
 Changes not staged for commit:
   (use git add/rm file... to update what will be committed)
   (use git checkout -- file... to discard changes in working directory)

 deleted:b/different

 Untracked files:
   (use git add file... to include in what will be committed)

 b

 no changes added to commit (use git add and/or git commit -a)
 ...

 It could be debated if the first git status call should also report
 b/equal as deleted.

Hmph.

I wonder if could be is an understatement.  The difference of
reporting between b/equal and b/different may indicate that somebody
is mistakenly comparing the index with these paths, without first
checking each path with has_symlink_leading_path(), or something,
no?

Or am I misreading the report and codepath?

Puzzled...

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Bug report about symlinks

2014-08-01 Thread Dennis Kaarsemaker
On wo, 2014-07-30 at 15:30 +0400, NickKolok wrote:
 Greetings from Russia, comrads!
 
 I've noticed something strange with git status when replacing a folder with 
 symlink to another folder.
 There is a git repo with script with demo in the attachment.

I think there is a bug here:

+ mkdir bug
+ cd bug
+ git init
Initialized empty Git repository in /tmp/bug/.git/
+ mkdir dir1 dir2
+ echo 1
+ echo 1
+ echo 2a
+ echo 2b
+ git add dir1/1.txt dir1/2.txt dir2/1.txt dir2/2.txt
+ git commit -m first
[master (root-commit) b60ecc8] first
 4 files changed, 4 insertions(+)
 create mode 100644 dir1/1.txt
 create mode 100644 dir1/2.txt
 create mode 100644 dir2/1.txt
 create mode 100644 dir2/2.txt
+ rm -r dir2
+ ln -s dir1 dir2
+ git status
On branch master
Changes not staged for commit:
  (use git add/rm file... to update what will be committed)
  (use git checkout -- file... to discard changes in working
directory)

deleted:dir2/2.txt

Untracked files:
  (use git add file... to include in what will be committed)

dir2

no changes added to commit (use git add and/or git commit -a)

It looks like git status is thinking dir2/1.txt still exists with the
same content, even though dir2 is gone, and now an untracked symlink. 

Moreover, git diff and git status disagree with each other:

dennis@spirit:/tmp/bug$ git status
On branch master
Changes not staged for commit:
  (use git add/rm file... to update what will be committed)
  (use git checkout -- file... to discard changes in working
directory)

deleted:dir2/2.txt

Untracked files:
  (use git add file... to include in what will be committed)

dir2

no changes added to commit (use git add and/or git commit -a)
dennis@spirit:/tmp/bug$ git --no-pager diff
diff --git a/dir2/1.txt b/dir2/1.txt
deleted file mode 100644
index d00491f..000
--- a/dir2/1.txt
+++ /dev/null
@@ -1 +0,0 @@
-1
diff --git a/dir2/2.txt b/dir2/2.txt
deleted file mode 100644
index b8a4cf4..000
--- a/dir2/2.txt
+++ /dev/null
@@ -1 +0,0 @@
-2b

-- 
Dennis Kaarsemaker
www.kaarsemaker.net

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Bug report about symlinks

2014-07-31 Thread Nikolay Avdeev
Greetings from Russia, comrads!

I've noticed something strange with git status when replacing a folder with
symlink to another folder.
There is a git repo with script with demo in the attachment.

Yours sincerely,
NickKolok aka Nikolay Avdeev.

Доброго времени суток, товарищи!

При замене папки на симлинк на другую папку с похожими, но не одинаковыми
файлами git status ведёт себя странно.
Прилагаю архив с репозиторием. В скрипте - пример и пояснения.

С уважением,
Николай Авдеев aka NickKolok.


git-bug-demo.tar.bz2
Description: application/bzip


Re: Bug report about symlinks

2014-07-31 Thread René Scharfe

Am 31.07.2014 um 21:50 schrieb Nikolay Avdeev:

I've noticed something strange with git status when replacing a folder with
symlink to another folder.
There is a git repo with script with demo in the attachment.


Let's try and make this a bit easier for folks to follow along.

# Create test repo with two directories with two files each.
$ git init
Initialized empty Git repository in /tmp/.git/
$ mkdir a b
$ echo x a/equal
$ echo x b/equal
$ echo y a/different
$ echo z b/different
$ git add a b
$ git commit -minitial
[master (root-commit) 6d36895] initial
 4 files changed, 4 insertions(+)
 create mode 100644 a/different
 create mode 100644 a/equal
 create mode 100644 b/different
 create mode 100644 b/equal

# Replace one directory with a symlink to the other.
$ rm -rf b
$ ln -s a b

# First git status call.
$ git status
On branch master
Changes not staged for commit:
  (use git add/rm file... to update what will be committed)
  (use git checkout -- file... to discard changes in working directory)

deleted:b/different

Untracked files:
  (use git add file... to include in what will be committed)

b

no changes added to commit (use git add and/or git commit -a)

# Stage the changes.
$ git add b

# Second git status call.
$ git status
On branch master
Changes to be committed:
  (use git reset HEAD file... to unstage)

new file:   b
deleted:b/different
deleted:b/equal

# Commit the staged changes.
$ git commit -msymlinked
[master 4498f28] symlinked
 3 files changed, 1 insertion(+), 2 deletions(-)
 create mode 12 b
 delete mode 100644 b/different
 delete mode 100644 b/equal


That the first and second status call report different results is a 
feature; staging changes using git add alters the status.  A commit 
after the first status call would be empty.


It could be debated if the first git status call should also report 
b/equal as deleted.


René

--
To unsubscribe from this list: send the line unsubscribe git in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Bug report about symlinks

2014-07-30 Thread NickKolok
 Greetings from Russia, comrads!

I've noticed something strange with git status when replacing a folder with 
symlink to another folder.
There is a git repo with script with demo in the attachment.

Yours sincerely,
NickKolok aka Nikolay Avdeev.

Доброго времени суток, товарищи!

При замене папки на симлинк на другую папку с похожими, но не одинаковыми 
файлами git status ведёт себя странно.
Прилагаю архив с репозиторием. В скрипте - пример и пояснения.

С уважением,
Николай Авдеев aka NickKolok.

git-bug-demo.tar.bz2
Description: Binary data