Re: [PATCH v4 7/7] wildmatch test: create & test files on disk in addition to in-memory

2018-01-05 Thread Johannes Schindelin
Hi Ævar,

On Fri, 5 Jan 2018, Ævar Arnfjörð Bjarmason wrote:

> On Fri, Jan 05 2018, Johannes Schindelin jotted:
> 
> > [...]
> >
> > In short: the Unix shell script t3070 manages to write what it thinks is a
> > file called 'foo*', but Git only sees 'foo'.
> >
> > I tried to address this problem with this patch:
> 
> ...I don't see any particular value in trying to do these full roundtrip
> tests on platforms like Windows. Perhaps we should just do these on a
> whitelist of POSIX systems for now, and leave expanding that list to
> some future step.

I don't think so. Windows is already handled as a second-class citizen, as
if nobody developed on it. As a consequence, only very few of the
gazillions of Windows developers... develop Git. We could worsify the
situation, of course, but why? Shouldn't we at least pretend to try the
opposite?

> > -- snip --
> > diff --git a/t/t3070-wildmatch.sh b/t/t3070-wildmatch.sh
> > index f606f91acbb..51dcb675e7b 100755
> > --- a/t/t3070-wildmatch.sh
> > +++ b/t/t3070-wildmatch.sh
> > @@ -4,6 +4,14 @@ test_description='wildmatch tests'
> >
> >  . ./test-lib.sh
> >
> > +if test_have_prereq !MINGW && touch -- 'a*b' 2>/dev/null
> > +then
> > +   test_set_prereq FILENAMESWITHSTARS
> > +else
> > +   say 'Your filesystem does not allow stars in filenames.'
> > +fi
> > +rm -f 'a*b'
> > +
> >  create_test_file() {
> > file=$1
> >
> > @@ -28,6 +36,17 @@ create_test_file() {
> > */)
> > return 1
> > ;;
> > +   # On Windows, stars are not allowed in filenames. Git for Windows'
> > +   # Bash, however, is based on Cygwin which plays funny names with a
> > +   # private Unicode page to emulate stars in filenames. Meaning that
> > +   # the shell script will "succeed" to write the file, but Git will
> > +   # not see it. Nor any other, regular Windows process.
> > +   *\**|*\[*)
> > +   if ! test_have_prereq FILENAMESWITHSTARS
> > +   then
> > +   return 1
> > +   fi
> > +   ;;
> > # On Windows, \ in paths is silently converted to /, which
> > # would result in the "touch" below working, but the test
> > # itself failing. See 6fd1106aa4 ("t3700: Skip a test with
> > -- snap --
> >
> > This gets us further. But not by much:
> 
> Okey, that's very weird. So you can:
> 
> touch "./*"; echo $?
> 
> And it'll return 0 but then the file won't exist?

Almost. The file exists, but it won't have the name '*'. It will have as
name a Unicode character that is in a private page, not standardized by
the Unicode specification.

> How about this:
> 
> touch "./*" && test -e "./*"; echo $?

Would return 0. Why? Because *you are still in a Unix shell script, so the
Cygwin cuteness still applies*.

> The reason this latest version stopped creating files with "\" in them
> unless under BSLASHPSPEC is because Cygwin would silently translate it,
> so it would create the file but it would actually mean something the
> tests didn't expect.

I understand that. And I would wish that the test would be designed in a
more cross-platform-aware mindset.

> For anything else, such as stars not being allowed in filenames I was
> expecting that "touch" command to return an error, but if that's not the
> case maybe we need the "test -e" as well, unless I'm missing something
> here.

This is one of the many bad consequences of Git relying so much on Unix
shell scripting. Despite what many, many, many Git developers think: shell
scripting is not portable.

Cygwin does a good job of pretending that it is, and MSYS2 exacerbates
that notion, but it comes back to haunt you right here and right now. The
`touch` invocation will *report success*, but it will have done something
different than you wanted. It's like the Thinking: Fast and Slow.

> > fatal: Invalid path '\[ab]': No such file or directory
> >
> > You see, any path starting with a backslash *is* an absolute path on
> > Windows. It is relative to the current drive.
> 
> Right, which I was trying to avoid by not actually creating \[ab], but
> "./\[ab]", is that the same filename on Windows?

Whoops. I managed to copy-paste the *wrong* command's error message. Sorry
about that. The correct one:

$ git --glob-pathspecs ls-files -z -- '\[ab]'
fatal: \[ab]: '\[ab]' is outside repository

Note how it is Git reporting that you asked for a path that is outside?
That's because it thinks you are referring to C:\[ab] (if your current
directory is on the C: drive).

And it would be correct to complain on Windows: the `\[ab]` parameter
refers to an absolute path.

> > This affects *quite* a few of the test cases you added.
> >
> > And even if I just comment all of those out, I run into the next problem
> > where you try to create a file whose name consists of a single space,
> > which is also illegal on Windows.
> 
> Okey, but ditto above about touch not catching it, does:
> 
> touch "./ "; echo $?
> 
> Not return an error? Then how about:
> 
> touch "./ " && test -e "./ 

Re: [PATCH v4 7/7] wildmatch test: create & test files on disk in addition to in-memory

2018-01-05 Thread Ævar Arnfjörð Bjarmason

On Fri, Jan 05 2018, Johannes Schindelin jotted:

> I spent the last 3.5 hours chasing down a bug in your patch series that
> seems to be Windows specific, so please forgive me if the following might
> leak a bit of my frustration into the open...

Thanks for looking into this.

> On Thu, 4 Jan 2018, Ævar Arnfjörð Bjarmason wrote:
>
>>  t/t3070-wildmatch.sh | 301 
>> +--
>
> Wow, this file is ugly. Sorry, that was an outburst, you did not deserve
> that, but I have to say that it is *really* hard to wrap your head around
> lines like this:
>
>   wildtest 1 1 1 1 'foo*' 'foo\*'
>
> What are those 1s supposed to mean? Granted, for you, the author of this
> line, it is easy.

Just for some context, this line has looked very similar to this since
feabcc173b ("Integrate wildmatch to git", 2012-10-15):

match 1 1 'foo' 'foo'

It could be made less verbose at the expense of some complexity in
wildmatch(), i.e.:

match 1 'foo' 'foo'

Which would implicitly mean:

match 1 1 1 1 'foo' 'foo'

But I thought it was more readable to maintain vertical alignment of the
patterns, but I'm open to ideas on how to make this less shitty.

> The point, though, of regression tests is not only to catch regressions
> but also to make it easy to fix them. Not only for the test script's
> author, but for others, too (and your script is not even the best example
> we have for a script that needs hours to study before one can even hope to
> begin to see what is going wrong... I am looking at you, t0027, yes, you.
> You know why).
>
> So then I go and see that the `wildtest` function has magic to handle both
> 6 and 10 parameters. And those parameters get assigned to variable names
> as intuitive as `match_f_w_pathmatchi`...
>
> The next thing about the test script is this: calling it with -x is pretty
> much useless, because *so much* is happening outside of
> test_expect_success clauses (and is therefore *not* traced).

Good point. I guess create_test_file() could create a file only if we
can create the test file, and then we run the actual test as a condition
of that. I.e. to move its logic into a test_expect_success() which would
always succeed.

> Of course I can debug this, but can't this be easier? And this is not even
> a regression after a couple of years, this is fresh from the start...
>
> So here is the first bummer about your rather extensive test (which I
> think tests the same behavior over and over and over again, just with
> slight variations which however do not matter all that much... for
> example, it should be enough to verify *without* filenames that the
> globbing of wildmatch() works, and then a single test *with* a filename
> should suffice to verify that the connection between globbing and files
> works): it requires filenames that are illegal on Windows. Stars, question
> marks: forbidden.

That's really not the case. The path match code is not simply taking a
full path and a pattern and calling wildmatch(), if it was I'd agree
that roundtrip testing like this on every single pattern would be
completely pointless.

As noted in v1
(https://public-inbox.org/git/20171223213012.1962-1-ava...@gmail.com/)
this series came out of my attempts to replace the underlying wildmatch
engine, which after trying for a bit I found I was blocked by rather bad
wildmatch test coverage.

I'd make code changes and some random other test would fail, but not
t3070-wildmatch.sh, which later turned out to be a blindspot that this
series rectifies.

A pattern like "t/**.sh" isn't just matched against a path like
"t/test-lib.sh" internally, instead we peel off "t/" since we know it's
a dir, and then try to match "**.sh" against "test-lib.sh".

As 7/7 shows there's several cases where the behavior is different, and
without roundtrip testing like this there's no telling what other case
might inadvertently be added in the future.

However, read on...

> Worse, since we have to use Unix shell scripts to perform our
> "platform-independent" tests, we have to rely on a Unix shell interpreter,
> and Git for Windows uses MSYS2's Bash, which means that we inherit most of
> Cygwin's behavior.
>
> Now, Cygwin wants to be cute and allow those illegal filenames because (as
> is demonstrated here quite nicely) Unix programmers don't give one bit of
> a flying fish about portable filesystem requirements.
>
> So Cygwin maps those illegal characters into a private Unicode page. Which
> means that Cygwin can read them alright, but no non-Cygwin program
> recognizes the stars and question marks and tabs and newlines. Including
> Git.
>
> In short: the Unix shell script t3070 manages to write what it thinks is a
> file called 'foo*', but Git only sees 'foo'.
>
> I tried to address this problem with this patch:

...I don't see any particular value in trying to do these full roundtrip
tests on platforms like Windows. Perhaps we should just do these on a
whitelist of POSIX systems for now, and 

Re: [PATCH v4 7/7] wildmatch test: create & test files on disk in addition to in-memory

2018-01-05 Thread Johannes Schindelin
Hi Ævar,

I spent the last 3.5 hours chasing down a bug in your patch series that
seems to be Windows specific, so please forgive me if the following might
leak a bit of my frustration into the open...

On Thu, 4 Jan 2018, Ævar Arnfjörð Bjarmason wrote:

>  t/t3070-wildmatch.sh | 301 
> +--

Wow, this file is ugly. Sorry, that was an outburst, you did not deserve
that, but I have to say that it is *really* hard to wrap your head around
lines like this:

wildtest 1 1 1 1 'foo*' 'foo\*'

What are those 1s supposed to mean? Granted, for you, the author of this
line, it is easy.

The point, though, of regression tests is not only to catch regressions
but also to make it easy to fix them. Not only for the test script's
author, but for others, too (and your script is not even the best example
we have for a script that needs hours to study before one can even hope to
begin to see what is going wrong... I am looking at you, t0027, yes, you.
You know why).

So then I go and see that the `wildtest` function has magic to handle both
6 and 10 parameters. And those parameters get assigned to variable names
as intuitive as `match_f_w_pathmatchi`...

The next thing about the test script is this: calling it with -x is pretty
much useless, because *so much* is happening outside of
test_expect_success clauses (and is therefore *not* traced).

Of course I can debug this, but can't this be easier? And this is not even
a regression after a couple of years, this is fresh from the start...

So here is the first bummer about your rather extensive test (which I
think tests the same behavior over and over and over again, just with
slight variations which however do not matter all that much... for
example, it should be enough to verify *without* filenames that the
globbing of wildmatch() works, and then a single test *with* a filename
should suffice to verify that the connection between globbing and files
works): it requires filenames that are illegal on Windows. Stars, question
marks: forbidden.

Worse, since we have to use Unix shell scripts to perform our
"platform-independent" tests, we have to rely on a Unix shell interpreter,
and Git for Windows uses MSYS2's Bash, which means that we inherit most of
Cygwin's behavior.

Now, Cygwin wants to be cute and allow those illegal filenames because (as
is demonstrated here quite nicely) Unix programmers don't give one bit of
a flying fish about portable filesystem requirements.

So Cygwin maps those illegal characters into a private Unicode page. Which
means that Cygwin can read them alright, but no non-Cygwin program
recognizes the stars and question marks and tabs and newlines. Including
Git.

In short: the Unix shell script t3070 manages to write what it thinks is a
file called 'foo*', but Git only sees 'foo'.

I tried to address this problem with this patch:

-- snip --
diff --git a/t/t3070-wildmatch.sh b/t/t3070-wildmatch.sh
index f606f91acbb..51dcb675e7b 100755
--- a/t/t3070-wildmatch.sh
+++ b/t/t3070-wildmatch.sh
@@ -4,6 +4,14 @@ test_description='wildmatch tests'
 
 . ./test-lib.sh
 
+if test_have_prereq !MINGW && touch -- 'a*b' 2>/dev/null
+then
+   test_set_prereq FILENAMESWITHSTARS
+else
+   say 'Your filesystem does not allow stars in filenames.'
+fi
+rm -f 'a*b'
+
 create_test_file() {
file=$1
 
@@ -28,6 +36,17 @@ create_test_file() {
*/)
return 1
;;
+   # On Windows, stars are not allowed in filenames. Git for Windows'
+   # Bash, however, is based on Cygwin which plays funny names with a
+   # private Unicode page to emulate stars in filenames. Meaning that
+   # the shell script will "succeed" to write the file, but Git will
+   # not see it. Nor any other, regular Windows process.
+   *\**|*\[*)
+   if ! test_have_prereq FILENAMESWITHSTARS
+   then
+   return 1
+   fi
+   ;;
# On Windows, \ in paths is silently converted to /, which
# would result in the "touch" below working, but the test
# itself failing. See 6fd1106aa4 ("t3700: Skip a test with
-- snap --

This gets us further. But not by much:

fatal: Invalid path '\[ab]': No such file or directory

You see, any path starting with a backslash *is* an absolute path on
Windows. It is relative to the current drive.

This affects *quite* a few of the test cases you added.

And even if I just comment all of those out, I run into the next problem
where you try to create a file whose name consists of a single space,
which is also illegal on Windows.

These woes demonstrate one problem with the approach of overzealously
testing *everything*. You are kind of performing an integration test of
the wildmatch() functionality, the integration into ls-files, *and* of a
filesystem that behaves like the filesystems you are used to.

All you *should* want to test, though, is the wildmatch() functionality.
And 

[PATCH v4 7/7] wildmatch test: create & test files on disk in addition to in-memory

2018-01-04 Thread Ævar Arnfjörð Bjarmason
There has never been any full roundtrip testing of what git-ls-files
and other commands that use wildmatch() actually do, rather we've been
satisfied with just testing the underlying C function.

Due to git-ls-files and friends having their own codepaths before they
call wildmatch() there's sometimes differences in the behavior between
the two. Even when we test for those (as with [1]), there was no one
place where you can review how these two modes differ.

Now there is. We now attempt to create a file called $haystack and
match $needle against it for each pair of $needle and $haystack that
we were passing to test-wildmatch.

If we can't create the file we skip the test. This ensures that we can
run this on all platforms and not maintain some infinitely growing
whitelist of e.g. platforms that don't support certain characters in
filenames.

As a result of doing this we can now see the cases where these two
ways of testing wildmatch differ:

 * Creating a file called 'a[]b' and running ls-files 'a[]b' will show
   that file, but wildmatch("a[]b", "a[]b") will not match

 * wildmatch() won't match a file called \ against \, but ls-files
   will.

 * `git --glob-pathspecs ls-files 'foo**'` will match a file
   'foo/bba/arr', but wildmatch won't, however pathmatch will.

   This seems like a bug to me, the two are otherwise equivalent as
   these tests show.

This also reveals the case discussed in [1] above, where '' is now an
error as far as ls-files is concerned, but wildmatch() itself happily
accepts it.

1. 9e4e8a64c2 ("pathspec: die on empty strings as pathspec",
   2017-06-06)

Signed-off-by: Ævar Arnfjörð Bjarmason 
---
 t/t3070-wildmatch.sh | 301 +--
 1 file changed, 290 insertions(+), 11 deletions(-)

diff --git a/t/t3070-wildmatch.sh b/t/t3070-wildmatch.sh
index 06db053ae2..f606f91acb 100755
--- a/t/t3070-wildmatch.sh
+++ b/t/t3070-wildmatch.sh
@@ -4,14 +4,113 @@ test_description='wildmatch tests'
 
 . ./test-lib.sh
 
+create_test_file() {
+   file=$1
+
+   case $file in
+   # `touch .` will succeed but obviously not do what we intend
+   # here.
+   ".")
+   return 1
+   ;;
+   # We cannot create a file with an empty filename.
+   "")
+   return 1
+   ;;
+   # The tests that are testing that e.g. foo//bar is matched by
+   # foo/*/bar can't be tested on filesystems since there's no
+   # way we're getting a double slash.
+   *//*)
+   return 1
+   ;;
+   # When testing the difference between foo/bar and foo/bar/ we
+   # can't test the latter.
+   */)
+   return 1
+   ;;
+   # On Windows, \ in paths is silently converted to /, which
+   # would result in the "touch" below working, but the test
+   # itself failing. See 6fd1106aa4 ("t3700: Skip a test with
+   # backslashes in pathspec", 2009-03-13) for prior art and
+   # details.
+   *\\*)
+   if ! test_have_prereq BSLASHPSPEC
+   then
+   return 1
+   fi
+   # NOTE: The ;;& bash extension is not portable, so
+   # this test needs to be at the end of the pattern
+   # list.
+   #
+   # If we want to add more conditional returns we either
+   # need a new case statement, or turn this whole thing
+   # into a series of "if" tests.
+   ;;
+   esac
+
+   # Turn foo/bar/baz into foo/bar to create foo/bar as a
+   # directory structure.
+   dirs=${file%/*}
+
+   # We touch "./$file" instead of "$file" because even an
+   # escaped "touch -- -" means get arguments from stdin.
+   if test "$file" != "$dirs"
+   then
+   mkdir -p -- "$dirs" &&
+   touch -- "./$file" &&
+   return 0
+   else
+   touch -- "./$file" &&
+   return 0
+   fi
+   return 1
+}
+
+wildtest_file_setup() {
+   test_when_finished "
+   rm -rf -- * &&
+   git reset
+   " &&
+   git add -A &&
+   >expect.err
+}
+
+wildtest_stdout_stderr_cmp() {
+   tr -d '\0' actual &&
+   test_cmp expect.err actual.err &&
+   test_cmp expect actual
+}
+
 wildtest() {
-   match_w_glob=$1
-   match_w_globi=$2
-   match_w_pathmatch=$3
-   match_w_pathmatchi=$4
-   text=$5
-   pattern=$6
+   if test "$#" = 6
+   then
+   # When test-wildmatch and git ls-files produce the same
+   # result.
+   match_w_glob=$1
+   match_f_w_glob=$match_w_glob
+   match_w_globi=$2
+   match_f_w_globi=$match_w_globi
+   match_w_pathmatch=$3
+   match_f_w_pathmatch=$match_w_pathmatch
+   match_w_pathmatchi=$4
+