Re: [msysGit] Re: [PATCH 7/7] t0000: verify that real_path() removes extra slashes

2012-09-06 Thread Nguyen Thai Ngoc Duy
On Fri, Sep 7, 2012 at 12:34 AM, Junio C Hamano  wrote:
> Modulo that I suspect you could get rid of offset_1st_component()
> altogether and has_dos_drive_prefix() return the length of the "d:"
> or "//d" part (which needs to be copied literally regardless of the
> "normalization"), what you suggest feels like the right approach.
> Why do you need the "keep_root" parameter and do things differently
> depending on the setting by the way?

That's how offset_1st_component() originally works, root slash if
present is counted.

> Wouldn't "skip the root level
> when computing the offset of the first path component" something the
> caller can easily decide to do or not to do, and wouldn't it make
> the semantics of the function cleaner and simpler by making it do
> only one thing and one thing well?

Yeah. I'll have a closer look later and see if we can simplify the function.
-- 
Duy
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 7/7] t0000: verify that real_path() removes extra slashes

2012-09-06 Thread Michael Haggerty
On 09/05/2012 10:40 AM, Johannes Sixt wrote:
> Am 9/4/2012 10:14, schrieb mhag...@alum.mit.edu:
>> From: Michael Haggerty 
>>
>> These tests already pass, but make sure they don't break in the
>> future.
>>
>> Signed-off-by: Michael Haggerty 
>> ---
>>
>> It would be great if somebody would check whether these tests pass on
>> Windows, and if not, give me a tip about how to fix them.
>>
>>  t/t-basic.sh | 11 +++
>>  1 file changed, 11 insertions(+)
>>
>> diff --git a/t/t-basic.sh b/t/t-basic.sh
>> index d929578..3c75e97 100755
>> --- a/t/t-basic.sh
>> +++ b/t/t-basic.sh
>> @@ -468,6 +468,17 @@ test_expect_success 'real path works on absolute paths' 
>> '
>>  test "$d/$nopath" = "$(test-path-utils real_path "$d/$nopath")"
>>  '
>>  
>> +test_expect_success 'real path removes extra slashes' '
>> +nopath="hopefully-absent-path" &&
>> +test "/" = "$(test-path-utils real_path "///")" &&
>> +test "/$nopath" = "$(test-path-utils real_path "///$nopath")" &&
> 
> Same here: test-path-utils operates on a DOS-style absolute path.

ACK.

> Furthermore, as Junio pointed out elsewhere, it is desirable that slashes
> (dir-separators) at the beginning are not collapsed if there are exactly
> two of them. Three or more can be collapsed to a single slash.

The empirical behavior of real_path() on Linux is (assuming "/tmp"
exists and "/foo" does not)

 Before my changes   After my changes
real_path("/tmp")/tmp/tmp
real_path("//tmp")   /tmp/tmp
real_path("/foo")$(pwd)/foo  /foo
real_path("//foo")   /foo/foo

real_path("/tmp/bar")/tmp/bar/tmp/bar
real_path("//tmp/bar")   /tmp/bar/tmp/bar
real_path("/foo/bar")--dies----dies--
real_path("//foo/bar")   --dies----dies--

So I think that my changes makes things strictly better (albeit probably
not perfect).

real_path() relies on chdir() / getcwd() to canonicalize the path,
except for one case:

If the specified path is not itself a directory, then it strips off the
last component of the name, tries chdir() / getcwd() on the shortened
path, then pastes the last component on again.  The stripping off is
done by find_last_dir_sep(), which literally just looks for the last '/'
(or for Windows also '\') in the string.  Please note that the pasting
back is done using '/' as a separator.

So I think that the only problematic case is a path that only includes a
single group of slashes, like "//foo" or maybe "c:\\foo", and also is
not is_directory() [1].  What should be the algorithm for such cases,
and how should it depend on the platform?  (And does it really matter?
Top-level Linux paths are usually directories.  Are Windows-style
network shares also considered directories in the sense of is_directory()?)

I will make an easy improvement: not to remove *any* slashes when
stripping the last path component from the end of the path (letting
chdir() deal with them).  This results in the same results for Linux and
perhaps more hope under Windows:

On Linux: "//foo" -> (chdir("//"), "foo") -> ("/", "foo") -> "/foo"

On Windows: "//foo" -> (chdir("//"), "foo") -> (?, "foo") -> ?
(what is the result of chdir("//") then getcwd()?)

On Windows: "c:\foo" -> (chdir("c:\"), "foo") -> (?, "foo") -> ?
(what is the result of chdir("c:\") then getcwd()?)

>> +# We need an existing top-level directory to use in the
>> +# remaining tests.  Use the top-level ancestor of $(pwd):
>> +d=$(pwd -P | sed -e "s|^\(/[^/]*\)/.*|\1|") &&
> 
> Same here: Account for the drive letter-colon.

ACK

>> +test "$d" = "$(test-path-utils real_path "//$d///")" &&
>> +test "$d/$nopath" = "$(test-path-utils real_path "//$d///$nopath")"
> 
> Since $d is a DOS-style path on Windows, //$d means something entirely
> different than $d. You should split the test in two: One test checks that
> slashes at the end or before $nopath are collapsed (this must work on
> Windows as well), and another test protected by POSIX prerequisite to
> check that slashes at the beginning are collapsed.

Done.

Thanks again for your help.

Michael

[1] If there is more than one group of slashes in the name, like
"//foo//bar" or "c:\\foo\\bar" then:

* All but the last groups of slashes are handled by
  chdir("//foo/")/getcwd() and presumably de-duplicated or not as
  appropriate

* The extras from the last group of slashes are trailing slashes in
  the string passed to chdir() and are therefore removed.

so everything should be OK.

-- 
Michael Haggerty
mhag...@alum.mit.edu
http://softwareswirl.blogspot.com/
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 7/7] t0000: verify that real_path() removes extra slashes

2012-09-06 Thread Junio C Hamano
Nguyen Thai Ngoc Duy  writes:

> Just an idea. We could unify "[a-z]:" and "//host" into "dos root"
> concept. That would teach other code paths about UNC paths too.
> ...
> diff --git a/path.c b/path.c
> index 66acd24..0e4e2d7 100644
> --- a/path.c
> +++ b/path.c
> @@ -498,11 +498,12 @@ const char *relative_path(const char *abs, const char 
> *base)
>  int normalize_path_copy(char *dst, const char *src)
>  {
>   char *dst0;
> + int i, len;
>  
> - if (has_dos_drive_prefix(src)) {
> + len = offset_1st_component(src, 1);
> + for (i = 0; i < len; i++)
>   *dst++ = *src++;
> - *dst++ = *src++;
> - }
> +
>   dst0 = dst;

Modulo that I suspect you could get rid of offset_1st_component()
altogether and has_dos_drive_prefix() return the length of the "d:"
or "//d" part (which needs to be copied literally regardless of the
"normalization"), what you suggest feels like the right approach.
Why do you need the "keep_root" parameter and do things differently
depending on the setting by the way?  Wouldn't "skip the root level
when computing the offset of the first path component" something the
caller can easily decide to do or not to do, and wouldn't it make
the semantics of the function cleaner and simpler by making it do
only one thing and one thing well?


--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 7/7] t0000: verify that real_path() removes extra slashes

2012-09-05 Thread Nguyen Thai Ngoc Duy
On Wed, Sep 05, 2012 at 08:23:58PM -0700, Junio C Hamano wrote:
> Nguyen Thai Ngoc Duy  writes:
> 
> > diff --git a/path.c b/path.c
> > index 66acd24..ad2881c 100644
> > --- a/path.c
> > +++ b/path.c
> > @@ -503,6 +503,10 @@ int normalize_path_copy(char *dst, const char *src)
> > *dst++ = *src++;
> > *dst++ = *src++;
> > }
> > +#ifdef WIN32
> > +   else if (src[0] == '/' && src[1] == '/')
> > +   *dst++ = *src++;
> > +#endif
> 
> The two-byte copy we see above the context is conditional on a nice
> abstraction "has_dos_drive_prefix()" so that we do not have to
> suffer from these ugly ifdefs.  Could we do something similar?

Just an idea. We could unify "[a-z]:" and "//host" into "dos root"
concept. That would teach other code paths about UNC paths too.

Replace has_dos_drive_prefix() with has_dos_root_prefix(). Because the
root component's length is not fixed, offset_1st_component() should be
used instead of hard coding "2".

Something like this. Totally untested.

-- 8< --
diff --git a/cache.h b/cache.h
index 67f28b4..7946489 100644
--- a/cache.h
+++ b/cache.h
@@ -711,7 +711,7 @@ extern char *expand_user_path(const char *path);
 const char *enter_repo(const char *path, int strict);
 static inline int is_absolute_path(const char *path)
 {
-   return is_dir_sep(path[0]) || has_dos_drive_prefix(path);
+   return is_dir_sep(path[0]) || has_dos_root_prefix(path);
 }
 int is_directory(const char *);
 const char *real_path(const char *path);
@@ -721,7 +721,7 @@ int normalize_path_copy(char *dst, const char *src);
 int longest_ancestor_length(const char *path, const char *prefix_list);
 char *strip_path_suffix(const char *path, const char *suffix);
 int daemon_avoid_alias(const char *path);
-int offset_1st_component(const char *path);
+int offset_1st_component(const char *path, int keep_root);
 
 /* object replacement */
 #define READ_SHA1_FILE_REPLACE 1
diff --git a/compat/basename.c b/compat/basename.c
index d8f8a3c..178b60d 100644
--- a/compat/basename.c
+++ b/compat/basename.c
@@ -5,8 +5,7 @@ char *gitbasename (char *path)
 {
const char *base;
/* Skip over the disk name in MSDOS pathnames. */
-   if (has_dos_drive_prefix(path))
-   path += 2;
+   path += offset_1st_component(path, 0);
for (base = path; *path; path++) {
if (is_dir_sep(*path))
base = path + 1;
diff --git a/compat/mingw.h b/compat/mingw.h
index 61a6521..1ca3e19 100644
--- a/compat/mingw.h
+++ b/compat/mingw.h
@@ -302,7 +302,9 @@ int winansi_fprintf(FILE *stream, const char *format, ...) 
__attribute__((format
  * git specific compatibility
  */
 
-#define has_dos_drive_prefix(path) (isalpha(*(path)) && (path)[1] == ':')
+#define has_dos_root_prefix(path) \
+   ((is_dir_sep[(path)[0]] && is_dir_sep[(path)[1]]) \
+|| (isalpha(*(path)) && (path)[1] == ':'))
 #define is_dir_sep(c) ((c) == '/' || (c) == '\\')
 static inline char *mingw_find_last_dir_sep(const char *path)
 {
diff --git a/connect.c b/connect.c
index 55a85ad..3d9f7fe 100644
--- a/connect.c
+++ b/connect.c
@@ -521,7 +521,7 @@ struct child_process *git_connect(int fd[2], const char 
*url_orig,
end = host;
 
path = strchr(end, c);
-   if (path && !has_dos_drive_prefix(end)) {
+   if (path && (!isalpha(*end) || end[1] != ':')) {
if (c == ':') {
protocol = PROTO_SSH;
*path++ = '\0';
diff --git a/git-compat-util.h b/git-compat-util.h
index 35b095e..c6656e2 100644
--- a/git-compat-util.h
+++ b/git-compat-util.h
@@ -216,8 +216,8 @@ extern char *gitbasename(char *);
 #define STRIP_EXTENSION ""
 #endif
 
-#ifndef has_dos_drive_prefix
-#define has_dos_drive_prefix(path) 0
+#ifndef has_dos_root_prefix
+#define has_dos_root_prefix(path) 0
 #endif
 
 #ifndef is_dir_sep
diff --git a/path.c b/path.c
index 66acd24..0e4e2d7 100644
--- a/path.c
+++ b/path.c
@@ -498,11 +498,12 @@ const char *relative_path(const char *abs, const char 
*base)
 int normalize_path_copy(char *dst, const char *src)
 {
char *dst0;
+   int i, len;
 
-   if (has_dos_drive_prefix(src)) {
+   len = offset_1st_component(src, 1);
+   for (i = 0; i < len; i++)
*dst++ = *src++;
-   *dst++ = *src++;
-   }
+
dst0 = dst;
 
if (is_dir_sep(*src)) {
@@ -702,9 +703,18 @@ int daemon_avoid_alias(const char *p)
}
 }
 
-int offset_1st_component(const char *path)
+int offset_1st_component(const char *path, int keep_root)
 {
-   if (has_dos_drive_prefix(path))
-   return 2 + is_dir_sep(path[2]);
-   return is_dir_sep(path[0]);
+   if (has_dos_root_prefix(path)) {
+   if (path[1] == ':')
+   return 2 + (keep_root ? 0 : is_dir_sep(path[2]));
+   else {
+   const char *s = strchr(path + 2, '/');
+   /* //host is cons

Re: [PATCH 7/7] t0000: verify that real_path() removes extra slashes

2012-09-05 Thread Junio C Hamano
Nguyen Thai Ngoc Duy  writes:

> diff --git a/path.c b/path.c
> index 66acd24..ad2881c 100644
> --- a/path.c
> +++ b/path.c
> @@ -503,6 +503,10 @@ int normalize_path_copy(char *dst, const char *src)
> *dst++ = *src++;
> *dst++ = *src++;
> }
> +#ifdef WIN32
> +   else if (src[0] == '/' && src[1] == '/')
> +   *dst++ = *src++;
> +#endif

The two-byte copy we see above the context is conditional on a nice
abstraction "has_dos_drive_prefix()" so that we do not have to
suffer from these ugly ifdefs.  Could we do something similar?
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 7/7] t0000: verify that real_path() removes extra slashes

2012-09-05 Thread Nguyen Thai Ngoc Duy
On Wed, Sep 5, 2012 at 6:05 PM, Orgad and Raizel Shaneh
 wrote:
> This seems to fix it. Just change src[x] == '/' to is_dir_sep(src[x]).

So you are able to test the changes. Would you mind submitting a patch
so other users benefit it as well?
-- 
Duy
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 7/7] t0000: verify that real_path() removes extra slashes

2012-09-05 Thread Nguyen Thai Ngoc Duy
On Wed, Sep 5, 2012 at 1:19 AM, Junio C Hamano  wrote:
> mhag...@alum.mit.edu writes:
>
>> From: Michael Haggerty 
>>
>> These tests already pass, but make sure they don't break in the
>> future.
>>
>> Signed-off-by: Michael Haggerty 
>> ---
>>
>> It would be great if somebody would check whether these tests pass on
>> Windows, and if not, give me a tip about how to fix them.
>
> I think this (perhaps unwarranted) removal of the double leading
> slashes is the root cause of
>
> http://thread.gmane.org/gmane.comp.version-control.git/204469
>
> (people involved in that thread Cc'ed).

I gave up on that thread because I did not have a proper environment
to further troubleshoot. Thanks Mike for giving me the clue about
real_path(). I traced link_alt_odb_entry() and it does this:

if (!is_absolute_path(entry) && relative_base) {
strbuf_addstr(&pathbuf, real_path(relative_base));
strbuf_addch(&pathbuf, '/');
}
strbuf_add(&pathbuf, entry, len);
normalize_path_copy(pathbuf.buf, pathbuf.buf);

The culprit might be normalize_path_copy (I don't have Windows to test
whether real_path() strips the leading slash in //server/somewhere). I
tested normalize_path_copy() separately and it does strip leading
slashes, so it needs to be fixed. Something like maybe:

diff --git a/path.c b/path.c
index 66acd24..ad2881c 100644
--- a/path.c
+++ b/path.c
@@ -503,6 +503,10 @@ int normalize_path_copy(char *dst, const char *src)
*dst++ = *src++;
*dst++ = *src++;
}
+#ifdef WIN32
+   else if (src[0] == '/' && src[1] == '/')
+   *dst++ = *src++;
+#endif
dst0 = dst;

if (is_dir_sep(*src)) {

I'm not suitable for following this up as I don't have environment to
test it. Maybe some msysgit guy want to step in?
-- 
Duy
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 7/7] t0000: verify that real_path() removes extra slashes

2012-09-05 Thread Johannes Sixt
Am 9/4/2012 10:14, schrieb mhag...@alum.mit.edu:
> From: Michael Haggerty 
> 
> These tests already pass, but make sure they don't break in the
> future.
> 
> Signed-off-by: Michael Haggerty 
> ---
> 
> It would be great if somebody would check whether these tests pass on
> Windows, and if not, give me a tip about how to fix them.
> 
>  t/t-basic.sh | 11 +++
>  1 file changed, 11 insertions(+)
> 
> diff --git a/t/t-basic.sh b/t/t-basic.sh
> index d929578..3c75e97 100755
> --- a/t/t-basic.sh
> +++ b/t/t-basic.sh
> @@ -468,6 +468,17 @@ test_expect_success 'real path works on absolute paths' '
>   test "$d/$nopath" = "$(test-path-utils real_path "$d/$nopath")"
>  '
>  
> +test_expect_success 'real path removes extra slashes' '
> + nopath="hopefully-absent-path" &&
> + test "/" = "$(test-path-utils real_path "///")" &&
> + test "/$nopath" = "$(test-path-utils real_path "///$nopath")" &&

Same here: test-path-utils operates on a DOS-style absolute path.
Furthermore, as Junio pointed out elsewhere, it is desirable that slashes
(dir-separators) at the beginning are not collapsed if there are exactly
two of them. Three or more can be collapsed to a single slash.

> + # We need an existing top-level directory to use in the
> + # remaining tests.  Use the top-level ancestor of $(pwd):
> + d=$(pwd -P | sed -e "s|^\(/[^/]*\)/.*|\1|") &&

Same here: Account for the drive letter-colon.

> + test "$d" = "$(test-path-utils real_path "//$d///")" &&
> + test "$d/$nopath" = "$(test-path-utils real_path "//$d///$nopath")"

Since $d is a DOS-style path on Windows, //$d means something entirely
different than $d. You should split the test in two: One test checks that
slashes at the end or before $nopath are collapsed (this must work on
Windows as well), and another test protected by POSIX prerequisite to
check that slashes at the beginning are collapsed.

-- Hannes
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


Re: [PATCH 7/7] t0000: verify that real_path() removes extra slashes

2012-09-04 Thread Junio C Hamano
mhag...@alum.mit.edu writes:

> From: Michael Haggerty 
>
> These tests already pass, but make sure they don't break in the
> future.
>
> Signed-off-by: Michael Haggerty 
> ---
>
> It would be great if somebody would check whether these tests pass on
> Windows, and if not, give me a tip about how to fix them.

I think this (perhaps unwarranted) removal of the double leading
slashes is the root cause of

http://thread.gmane.org/gmane.comp.version-control.git/204469

(people involved in that thread Cc'ed).

>  t/t-basic.sh | 11 +++
>  1 file changed, 11 insertions(+)
>
> diff --git a/t/t-basic.sh b/t/t-basic.sh
> index d929578..3c75e97 100755
> --- a/t/t-basic.sh
> +++ b/t/t-basic.sh
> @@ -468,6 +468,17 @@ test_expect_success 'real path works on absolute paths' '
>   test "$d/$nopath" = "$(test-path-utils real_path "$d/$nopath")"
>  '
>  
> +test_expect_success 'real path removes extra slashes' '
> + nopath="hopefully-absent-path" &&
> + test "/" = "$(test-path-utils real_path "///")" &&
> + test "/$nopath" = "$(test-path-utils real_path "///$nopath")" &&
> + # We need an existing top-level directory to use in the
> + # remaining tests.  Use the top-level ancestor of $(pwd):
> + d=$(pwd -P | sed -e "s|^\(/[^/]*\)/.*|\1|") &&
> + test "$d" = "$(test-path-utils real_path "//$d///")" &&
> + test "$d/$nopath" = "$(test-path-utils real_path "//$d///$nopath")"
> +'
> +
>  test_expect_success SYMLINKS 'real path works on symlinks' '
>   mkdir first &&
>   ln -s ../.git first/.git &&
--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[PATCH 7/7] t0000: verify that real_path() removes extra slashes

2012-09-04 Thread mhagger
From: Michael Haggerty 

These tests already pass, but make sure they don't break in the
future.

Signed-off-by: Michael Haggerty 
---

It would be great if somebody would check whether these tests pass on
Windows, and if not, give me a tip about how to fix them.

 t/t-basic.sh | 11 +++
 1 file changed, 11 insertions(+)

diff --git a/t/t-basic.sh b/t/t-basic.sh
index d929578..3c75e97 100755
--- a/t/t-basic.sh
+++ b/t/t-basic.sh
@@ -468,6 +468,17 @@ test_expect_success 'real path works on absolute paths' '
test "$d/$nopath" = "$(test-path-utils real_path "$d/$nopath")"
 '
 
+test_expect_success 'real path removes extra slashes' '
+   nopath="hopefully-absent-path" &&
+   test "/" = "$(test-path-utils real_path "///")" &&
+   test "/$nopath" = "$(test-path-utils real_path "///$nopath")" &&
+   # We need an existing top-level directory to use in the
+   # remaining tests.  Use the top-level ancestor of $(pwd):
+   d=$(pwd -P | sed -e "s|^\(/[^/]*\)/.*|\1|") &&
+   test "$d" = "$(test-path-utils real_path "//$d///")" &&
+   test "$d/$nopath" = "$(test-path-utils real_path "//$d///$nopath")"
+'
+
 test_expect_success SYMLINKS 'real path works on symlinks' '
mkdir first &&
ln -s ../.git first/.git &&
-- 
1.7.11.3

--
To unsubscribe from this list: send the line "unsubscribe git" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html