Re: Bug report about symlinks

2014-08-04 Thread Junio C Hamano
René Scharfe  writes:

> Am 03.08.2014 um 19:19 schrieb Junio C Hamano:
>
>>> And do we need to use the threaded_ variant of the function here?
>>
>> Hmmm, this is a tangent, but you comment made me wonder if we also
>> need to adjust preload_thread() in preload-index.c somehow, but we
>> do not touch CE_UPTODATE there, so it probably is not necessary.
>
> The function calls ce_mark_uptodate(), which does set CE_UPTODATE.  It
> calls threaded_has_symlink_leading_path() before lstat() already,
> however.  (Since f62ce3de: Make index preloading check the whole path
> to the file.)

Yeah, by "we do not touch", I meant "for paths that is beyond a
symlink, we do not touch" (i.e. we have that "continue" before
lstat-match-then-mark sequence).

>> The caller of refresh_cache_ent() is walking an array of sorted
>> pathnames aka istate->cache[] in a single-threaded fashion, possibly
>> with a pathspec to limit the scan.
>
> There are two direct callers (refresh_index(), refresh_cache_entry())
> and several indirect ones.  Do we have a way to detect unsynchronized
> parallel access to the has_symlink_leading_path()-cache?  Checking the
> full callers-of-callers tree manually looks a bit scary to me.

The threaded variant is not used anybody outside preload-index, so
currently we should be OK, I would think.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Bug report about symlinks

2014-08-04 Thread Junio C Hamano
Duy Nguyen  writes:

> Just checking, you meant CE_UPTODATE, not CE_VALID, right? CE_VALID is
> only used with --assume-unchanged

Yup.  Thanks.
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Bug report about symlinks

2014-08-04 Thread Duy Nguyen
On Mon, Aug 4, 2014 at 12:19 AM, Junio C Hamano  wrote:
> I think you, who dug to find out where to add the check, already
> know this, and I am writing this mainly for myself and for the list
> archive, but when the knee-jerk "has-syjmlink-leading-path missing?"
> reaction came to me, two obvious optimizations also came to my mind:
>
>  - When checking a cache entry "a/b/c/d/e" and we find out "a/b/c"
>is a symbolic link, we mark it as ~CE_VALID, but at the same
>time, we learn "a/b/c/any/thing" are also ~CE_VALID with that
>check, so we _could_, after the has_symlink_leading_path once
>returns true, so there may be an opportunity to fast-forward the
>scan, marking all paths under "a/b/c" as ~CE_VALID.
>
>  - After finding out "a/b/c" is a symbolic link to cause anything
>underneath marked as ~CE_VALID, we also know "a/" and "a/b/"
>exists as real directories, so a later check for "a/b/some/thing"
>can start checking at "a/b/some/" without checking "a/" and "a/b/".

Just checking, you meant CE_UPTODATE, not CE_VALID, right? CE_VALID is
only used with --assume-unchanged

> The latter is more important as it is a much more common case that
> the shape of the working tree not to change.
>
> But I think the implementation of has_symlink_leading_path() already
> has these optimizations built inside around the (unfortunately named)
> "struct cache_def", so it probably would not give us much boost to
> implement such a fast-forwarding of the scan.

Yeah my first thought is another flood of lstat(). But it looks like
has_symlink_leading_path() tries hard to reduce lstat() already. We
could disable this call in filesytems that do not support symlinks
(e.g. vfat) but I guess that's just a minority.
-- 
Duy
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Bug report about symlinks

2014-08-03 Thread René Scharfe

Am 03.08.2014 um 19:19 schrieb Junio C Hamano:

René Scharfe  writes:


How about the patch below?  Before it checks if an index entry exists
in the work tree, it checks if its path includes a symlink.


Honestly, I didn't expect the fix to be in the refresh-index code
path, but doing this there sort of makes sense.


I found it through observation, not understanding.  Just looked for 
stat/lstat calls executed by git status for b/different and b/equal 
using strace; debugging printfs told me where they came from.



And do we need to use the threaded_ variant of the function here?


Hmmm, this is a tangent, but you comment made me wonder if we also
need to adjust preload_thread() in preload-index.c somehow, but we
do not touch CE_UPTODATE there, so it probably is not necessary.


The function calls ce_mark_uptodate(), which does set CE_UPTODATE.  It 
calls threaded_has_symlink_leading_path() before lstat() already, 
however.  (Since f62ce3de: Make index preloading check the whole path to 
the file.)



The caller of refresh_cache_ent() is walking an array of sorted
pathnames aka istate->cache[] in a single-threaded fashion, possibly
with a pathspec to limit the scan.


There are two direct callers (refresh_index(), refresh_cache_entry()) 
and several indirect ones.  Do we have a way to detect unsynchronized 
parallel access to the has_symlink_leading_path()-cache?  Checking the 
full callers-of-callers tree manually looks a bit scary to me.



Do you mean that we may want to
make istate->cache[] scanned by multiple threads?  I am not sure.


No, I didn't want to suggest any performance improvements.  I'm only 
interested in correctness for now.


René
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Bug report about symlinks

2014-08-03 Thread Junio C Hamano
René Scharfe  writes:

> How about the patch below?  Before it checks if an index entry exists
> in the work tree, it checks if its path includes a symlink.

Honestly, I didn't expect the fix to be in the refresh-index code
path, but doing this there sort of makes sense.

I think you, who dug to find out where to add the check, already
know this, and I am writing this mainly for myself and for the list
archive, but when the knee-jerk "has-syjmlink-leading-path missing?"
reaction came to me, two obvious optimizations also came to my mind:

 - When checking a cache entry "a/b/c/d/e" and we find out "a/b/c"
   is a symbolic link, we mark it as ~CE_VALID, but at the same
   time, we learn "a/b/c/any/thing" are also ~CE_VALID with that
   check, so we _could_, after the has_symlink_leading_path once
   returns true, so there may be an opportunity to fast-forward the
   scan, marking all paths under "a/b/c" as ~CE_VALID.

 - After finding out "a/b/c" is a symbolic link to cause anything
   underneath marked as ~CE_VALID, we also know "a/" and "a/b/"
   exists as real directories, so a later check for "a/b/some/thing"
   can start checking at "a/b/some/" without checking "a/" and "a/b/".

The latter is more important as it is a much more common case that
the shape of the working tree not to change.

But I think the implementation of has_symlink_leading_path() already
has these optimizations built inside around the (unfortunately named)
"struct cache_def", so it probably would not give us much boost to
implement such a fast-forwarding of the scan.

> And do we need to use the threaded_ variant of the function here?

Hmmm, this is a tangent, but you comment made me wonder if we also
need to adjust preload_thread() in preload-index.c somehow, but we
do not touch CE_UPTODATE there, so it probably is not necessary.

The caller of refresh_cache_ent() is walking an array of sorted
pathnames aka istate->cache[] in a single-threaded fashion, possibly
with a pathspec to limit the scan.  Do you mean that we may want to
make istate->cache[] scanned by multiple threads?  I am not sure.

> diff --git a/read-cache.c b/read-cache.c
> index 5d3c8bd..6f0057f 100644
> --- a/read-cache.c
> +++ b/read-cache.c
> @@ -1064,6 +1064,14 @@ static struct cache_entry *refresh_cache_ent(struct 
> index_state *istate,
>   return ce;
>   }
>  
> + if (has_symlink_leading_path(ce->name, ce_namelen(ce))) {
> + if (ignore_missing)
> + return ce;
> + if (err)
> + *err = ENOENT;
> + return NULL;
> + }
> +
>   if (lstat(ce->name, &st) < 0) {
>   if (ignore_missing && errno == ENOENT)
>   return ce;
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Bug report about symlinks

2014-08-02 Thread René Scharfe
Am 01.08.2014 um 18:23 schrieb Junio C Hamano:
> René Scharfe  writes:
> 
>> # Create test repo with two directories with two files each.
>> $ git init
>> Initialized empty Git repository in /tmp/.git/
>> $ mkdir a b
>> $ echo x >a/equal
>> $ echo x >b/equal
>> $ echo y >a/different
>> $ echo z >b/different
>> $ git add a b
>> $ git commit -minitial
>> [master (root-commit) 6d36895] initial
>>   4 files changed, 4 insertions(+)
>>   create mode 100644 a/different
>>   create mode 100644 a/equal
>>   create mode 100644 b/different
>>   create mode 100644 b/equal
>>
>> # Replace one directory with a symlink to the other.
>> $ rm -rf b
>> $ ln -s a b
>>
>> # First git status call.
>> $ git status
>> On branch master
>> Changes not staged for commit:
>>(use "git add/rm ..." to update what will be committed)
>>(use "git checkout -- ..." to discard changes in working directory)
>>
>>  deleted:b/different
>>
>> Untracked files:
>>(use "git add ..." to include in what will be committed)
>>
>>  b
>>
>> no changes added to commit (use "git add" and/or "git commit -a")
>> ...
>>
>> It could be debated if the first git status call should also report
>> b/equal as deleted.
> 
> Hmph.
> 
> I wonder if "could be" is an understatement.  The difference of
> reporting between b/equal and b/different may indicate that somebody
> is mistakenly comparing the index with these paths, without first
> checking each path with has_symlink_leading_path(), or something,
> no?

How about the patch below?  Before it checks if an index entry exists
in the work tree, it checks if its path includes a symlink.  After
the patch, git status reports b/equal as deleted, too.  The test
suite still passes.

And do we need to use the threaded_ variant of the function here?


diff --git a/read-cache.c b/read-cache.c
index 5d3c8bd..6f0057f 100644
--- a/read-cache.c
+++ b/read-cache.c
@@ -1064,6 +1064,14 @@ static struct cache_entry *refresh_cache_ent(struct 
index_state *istate,
return ce;
}
 
+   if (has_symlink_leading_path(ce->name, ce_namelen(ce))) {
+   if (ignore_missing)
+   return ce;
+   if (err)
+   *err = ENOENT;
+   return NULL;
+   }
+
if (lstat(ce->name, &st) < 0) {
if (ignore_missing && errno == ENOENT)
return ce;

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Bug report about symlinks

2014-08-01 Thread Dennis Kaarsemaker
On wo, 2014-07-30 at 15:30 +0400, NickKolok wrote:
> Greetings from Russia, comrads!
> 
> I've noticed something strange with git status when replacing a folder with 
> symlink to another folder.
> There is a git repo with script with demo in the attachment.

I think there is a bug here:

+ mkdir bug
+ cd bug
+ git init
Initialized empty Git repository in /tmp/bug/.git/
+ mkdir dir1 dir2
+ echo 1
+ echo 1
+ echo 2a
+ echo 2b
+ git add dir1/1.txt dir1/2.txt dir2/1.txt dir2/2.txt
+ git commit -m first
[master (root-commit) b60ecc8] first
 4 files changed, 4 insertions(+)
 create mode 100644 dir1/1.txt
 create mode 100644 dir1/2.txt
 create mode 100644 dir2/1.txt
 create mode 100644 dir2/2.txt
+ rm -r dir2
+ ln -s dir1 dir2
+ git status
On branch master
Changes not staged for commit:
  (use "git add/rm ..." to update what will be committed)
  (use "git checkout -- ..." to discard changes in working
directory)

deleted:dir2/2.txt

Untracked files:
  (use "git add ..." to include in what will be committed)

dir2

no changes added to commit (use "git add" and/or "git commit -a")

It looks like git status is thinking dir2/1.txt still exists with the
same content, even though dir2 is gone, and now an untracked symlink. 

Moreover, git diff and git status disagree with each other:

dennis@spirit:/tmp/bug$ git status
On branch master
Changes not staged for commit:
  (use "git add/rm ..." to update what will be committed)
  (use "git checkout -- ..." to discard changes in working
directory)

deleted:dir2/2.txt

Untracked files:
  (use "git add ..." to include in what will be committed)

dir2

no changes added to commit (use "git add" and/or "git commit -a")
dennis@spirit:/tmp/bug$ git --no-pager diff
diff --git a/dir2/1.txt b/dir2/1.txt
deleted file mode 100644
index d00491f..000
--- a/dir2/1.txt
+++ /dev/null
@@ -1 +0,0 @@
-1
diff --git a/dir2/2.txt b/dir2/2.txt
deleted file mode 100644
index b8a4cf4..000
--- a/dir2/2.txt
+++ /dev/null
@@ -1 +0,0 @@
-2b

-- 
Dennis Kaarsemaker
www.kaarsemaker.net

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Bug report about symlinks

2014-08-01 Thread Junio C Hamano
René Scharfe  writes:

> # Create test repo with two directories with two files each.
> $ git init
> Initialized empty Git repository in /tmp/.git/
> $ mkdir a b
> $ echo x >a/equal
> $ echo x >b/equal
> $ echo y >a/different
> $ echo z >b/different
> $ git add a b
> $ git commit -minitial
> [master (root-commit) 6d36895] initial
>  4 files changed, 4 insertions(+)
>  create mode 100644 a/different
>  create mode 100644 a/equal
>  create mode 100644 b/different
>  create mode 100644 b/equal
>
> # Replace one directory with a symlink to the other.
> $ rm -rf b
> $ ln -s a b
>
> # First git status call.
> $ git status
> On branch master
> Changes not staged for commit:
>   (use "git add/rm ..." to update what will be committed)
>   (use "git checkout -- ..." to discard changes in working directory)
>
> deleted:b/different
>
> Untracked files:
>   (use "git add ..." to include in what will be committed)
>
> b
>
> no changes added to commit (use "git add" and/or "git commit -a")
> ...
>
> It could be debated if the first git status call should also report
> b/equal as deleted.

Hmph.

I wonder if "could be" is an understatement.  The difference of
reporting between b/equal and b/different may indicate that somebody
is mistakenly comparing the index with these paths, without first
checking each path with has_symlink_leading_path(), or something,
no?

Or am I misreading the report and codepath?

Puzzled...

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: Bug report about symlinks

2014-07-31 Thread René Scharfe

Am 31.07.2014 um 21:50 schrieb Nikolay Avdeev:

I've noticed something strange with git status when replacing a folder with
symlink to another folder.
There is a git repo with script with demo in the attachment.


Let's try and make this a bit easier for folks to follow along.

# Create test repo with two directories with two files each.
$ git init
Initialized empty Git repository in /tmp/.git/
$ mkdir a b
$ echo x >a/equal
$ echo x >b/equal
$ echo y >a/different
$ echo z >b/different
$ git add a b
$ git commit -minitial
[master (root-commit) 6d36895] initial
 4 files changed, 4 insertions(+)
 create mode 100644 a/different
 create mode 100644 a/equal
 create mode 100644 b/different
 create mode 100644 b/equal

# Replace one directory with a symlink to the other.
$ rm -rf b
$ ln -s a b

# First git status call.
$ git status
On branch master
Changes not staged for commit:
  (use "git add/rm ..." to update what will be committed)
  (use "git checkout -- ..." to discard changes in working directory)

deleted:b/different

Untracked files:
  (use "git add ..." to include in what will be committed)

b

no changes added to commit (use "git add" and/or "git commit -a")

# Stage the changes.
$ git add b

# Second git status call.
$ git status
On branch master
Changes to be committed:
  (use "git reset HEAD ..." to unstage)

new file:   b
deleted:b/different
deleted:b/equal

# Commit the staged changes.
$ git commit -msymlinked
[master 4498f28] symlinked
 3 files changed, 1 insertion(+), 2 deletions(-)
 create mode 12 b
 delete mode 100644 b/different
 delete mode 100644 b/equal


That the first and second status call report different results is a 
feature; staging changes using git add alters the status.  A commit 
after the first status call would be empty.


It could be debated if the first git status call should also report 
b/equal as deleted.


René

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html