Re: [PATCH v4 7/7] wildmatch test: create & test files on disk in addition to in-memory
Hi Ævar, On Fri, 5 Jan 2018, Ævar Arnfjörð Bjarmason wrote: > On Fri, Jan 05 2018, Johannes Schindelin jotted: > > > [...] > > > > In short: the Unix shell script t3070 manages to write what it thinks is a > > file called 'foo*', but Git only sees 'foo'. > > > > I tried to address this problem with this patch: > > ...I don't see any particular value in trying to do these full roundtrip > tests on platforms like Windows. Perhaps we should just do these on a > whitelist of POSIX systems for now, and leave expanding that list to > some future step. I don't think so. Windows is already handled as a second-class citizen, as if nobody developed on it. As a consequence, only very few of the gazillions of Windows developers... develop Git. We could worsify the situation, of course, but why? Shouldn't we at least pretend to try the opposite? > > -- snip -- > > diff --git a/t/t3070-wildmatch.sh b/t/t3070-wildmatch.sh > > index f606f91acbb..51dcb675e7b 100755 > > --- a/t/t3070-wildmatch.sh > > +++ b/t/t3070-wildmatch.sh > > @@ -4,6 +4,14 @@ test_description='wildmatch tests' > > > > . ./test-lib.sh > > > > +if test_have_prereq !MINGW && touch -- 'a*b' 2>/dev/null > > +then > > + test_set_prereq FILENAMESWITHSTARS > > +else > > + say 'Your filesystem does not allow stars in filenames.' > > +fi > > +rm -f 'a*b' > > + > > create_test_file() { > > file=$1 > > > > @@ -28,6 +36,17 @@ create_test_file() { > > */) > > return 1 > > ;; > > + # On Windows, stars are not allowed in filenames. Git for Windows' > > + # Bash, however, is based on Cygwin which plays funny names with a > > + # private Unicode page to emulate stars in filenames. Meaning that > > + # the shell script will "succeed" to write the file, but Git will > > + # not see it. Nor any other, regular Windows process. > > + *\**|*\[*) > > + if ! test_have_prereq FILENAMESWITHSTARS > > + then > > + return 1 > > + fi > > + ;; > > # On Windows, \ in paths is silently converted to /, which > > # would result in the "touch" below working, but the test > > # itself failing. See 6fd1106aa4 ("t3700: Skip a test with > > -- snap -- > > > > This gets us further. But not by much: > > Okey, that's very weird. So you can: > > touch "./*"; echo $? > > And it'll return 0 but then the file won't exist? Almost. The file exists, but it won't have the name '*'. It will have as name a Unicode character that is in a private page, not standardized by the Unicode specification. > How about this: > > touch "./*" && test -e "./*"; echo $? Would return 0. Why? Because *you are still in a Unix shell script, so the Cygwin cuteness still applies*. > The reason this latest version stopped creating files with "\" in them > unless under BSLASHPSPEC is because Cygwin would silently translate it, > so it would create the file but it would actually mean something the > tests didn't expect. I understand that. And I would wish that the test would be designed in a more cross-platform-aware mindset. > For anything else, such as stars not being allowed in filenames I was > expecting that "touch" command to return an error, but if that's not the > case maybe we need the "test -e" as well, unless I'm missing something > here. This is one of the many bad consequences of Git relying so much on Unix shell scripting. Despite what many, many, many Git developers think: shell scripting is not portable. Cygwin does a good job of pretending that it is, and MSYS2 exacerbates that notion, but it comes back to haunt you right here and right now. The `touch` invocation will *report success*, but it will have done something different than you wanted. It's like the Thinking: Fast and Slow. > > fatal: Invalid path '\[ab]': No such file or directory > > > > You see, any path starting with a backslash *is* an absolute path on > > Windows. It is relative to the current drive. > > Right, which I was trying to avoid by not actually creating \[ab], but > "./\[ab]", is that the same filename on Windows? Whoops. I managed to copy-paste the *wrong* command's error message. Sorry about that. The correct one: $ git --glob-pathspecs ls-files -z -- '\[ab]' fatal: \[ab]: '\[ab]' is outside repository Note how it is Git reporting that you asked for a path that is outside? That's because it thinks you are referring to C:\[ab] (if your current directory is on the C: drive). And it would be correct to complain on Windows: the `\[ab]` parameter refers to an absolute path. > > This affects *quite* a few of the test cases you added. > > > > And even if I just comment all of those out, I run into the next problem > > where you try to create a file whose name consists of a single space, > > which is also illegal on Windows. > > Okey, but ditto above about touch not catching it, does: > > touch "./ "; echo $? > > Not return an error? Then how about: > > touch "./ " && test -e "./
Re: [PATCH v4 7/7] wildmatch test: create & test files on disk in addition to in-memory
On Fri, Jan 05 2018, Johannes Schindelin jotted: > I spent the last 3.5 hours chasing down a bug in your patch series that > seems to be Windows specific, so please forgive me if the following might > leak a bit of my frustration into the open... Thanks for looking into this. > On Thu, 4 Jan 2018, Ævar Arnfjörð Bjarmason wrote: > >> t/t3070-wildmatch.sh | 301 >> +-- > > Wow, this file is ugly. Sorry, that was an outburst, you did not deserve > that, but I have to say that it is *really* hard to wrap your head around > lines like this: > > wildtest 1 1 1 1 'foo*' 'foo\*' > > What are those 1s supposed to mean? Granted, for you, the author of this > line, it is easy. Just for some context, this line has looked very similar to this since feabcc173b ("Integrate wildmatch to git", 2012-10-15): match 1 1 'foo' 'foo' It could be made less verbose at the expense of some complexity in wildmatch(), i.e.: match 1 'foo' 'foo' Which would implicitly mean: match 1 1 1 1 'foo' 'foo' But I thought it was more readable to maintain vertical alignment of the patterns, but I'm open to ideas on how to make this less shitty. > The point, though, of regression tests is not only to catch regressions > but also to make it easy to fix them. Not only for the test script's > author, but for others, too (and your script is not even the best example > we have for a script that needs hours to study before one can even hope to > begin to see what is going wrong... I am looking at you, t0027, yes, you. > You know why). > > So then I go and see that the `wildtest` function has magic to handle both > 6 and 10 parameters. And those parameters get assigned to variable names > as intuitive as `match_f_w_pathmatchi`... > > The next thing about the test script is this: calling it with -x is pretty > much useless, because *so much* is happening outside of > test_expect_success clauses (and is therefore *not* traced). Good point. I guess create_test_file() could create a file only if we can create the test file, and then we run the actual test as a condition of that. I.e. to move its logic into a test_expect_success() which would always succeed. > Of course I can debug this, but can't this be easier? And this is not even > a regression after a couple of years, this is fresh from the start... > > So here is the first bummer about your rather extensive test (which I > think tests the same behavior over and over and over again, just with > slight variations which however do not matter all that much... for > example, it should be enough to verify *without* filenames that the > globbing of wildmatch() works, and then a single test *with* a filename > should suffice to verify that the connection between globbing and files > works): it requires filenames that are illegal on Windows. Stars, question > marks: forbidden. That's really not the case. The path match code is not simply taking a full path and a pattern and calling wildmatch(), if it was I'd agree that roundtrip testing like this on every single pattern would be completely pointless. As noted in v1 (https://public-inbox.org/git/20171223213012.1962-1-ava...@gmail.com/) this series came out of my attempts to replace the underlying wildmatch engine, which after trying for a bit I found I was blocked by rather bad wildmatch test coverage. I'd make code changes and some random other test would fail, but not t3070-wildmatch.sh, which later turned out to be a blindspot that this series rectifies. A pattern like "t/**.sh" isn't just matched against a path like "t/test-lib.sh" internally, instead we peel off "t/" since we know it's a dir, and then try to match "**.sh" against "test-lib.sh". As 7/7 shows there's several cases where the behavior is different, and without roundtrip testing like this there's no telling what other case might inadvertently be added in the future. However, read on... > Worse, since we have to use Unix shell scripts to perform our > "platform-independent" tests, we have to rely on a Unix shell interpreter, > and Git for Windows uses MSYS2's Bash, which means that we inherit most of > Cygwin's behavior. > > Now, Cygwin wants to be cute and allow those illegal filenames because (as > is demonstrated here quite nicely) Unix programmers don't give one bit of > a flying fish about portable filesystem requirements. > > So Cygwin maps those illegal characters into a private Unicode page. Which > means that Cygwin can read them alright, but no non-Cygwin program > recognizes the stars and question marks and tabs and newlines. Including > Git. > > In short: the Unix shell script t3070 manages to write what it thinks is a > file called 'foo*', but Git only sees 'foo'. > > I tried to address this problem with this patch: ...I don't see any particular value in trying to do these full roundtrip tests on platforms like Windows. Perhaps we should just do these on a whitelist of POSIX systems for now, and
Re: [PATCH v4 7/7] wildmatch test: create & test files on disk in addition to in-memory
Hi Ævar, I spent the last 3.5 hours chasing down a bug in your patch series that seems to be Windows specific, so please forgive me if the following might leak a bit of my frustration into the open... On Thu, 4 Jan 2018, Ævar Arnfjörð Bjarmason wrote: > t/t3070-wildmatch.sh | 301 > +-- Wow, this file is ugly. Sorry, that was an outburst, you did not deserve that, but I have to say that it is *really* hard to wrap your head around lines like this: wildtest 1 1 1 1 'foo*' 'foo\*' What are those 1s supposed to mean? Granted, for you, the author of this line, it is easy. The point, though, of regression tests is not only to catch regressions but also to make it easy to fix them. Not only for the test script's author, but for others, too (and your script is not even the best example we have for a script that needs hours to study before one can even hope to begin to see what is going wrong... I am looking at you, t0027, yes, you. You know why). So then I go and see that the `wildtest` function has magic to handle both 6 and 10 parameters. And those parameters get assigned to variable names as intuitive as `match_f_w_pathmatchi`... The next thing about the test script is this: calling it with -x is pretty much useless, because *so much* is happening outside of test_expect_success clauses (and is therefore *not* traced). Of course I can debug this, but can't this be easier? And this is not even a regression after a couple of years, this is fresh from the start... So here is the first bummer about your rather extensive test (which I think tests the same behavior over and over and over again, just with slight variations which however do not matter all that much... for example, it should be enough to verify *without* filenames that the globbing of wildmatch() works, and then a single test *with* a filename should suffice to verify that the connection between globbing and files works): it requires filenames that are illegal on Windows. Stars, question marks: forbidden. Worse, since we have to use Unix shell scripts to perform our "platform-independent" tests, we have to rely on a Unix shell interpreter, and Git for Windows uses MSYS2's Bash, which means that we inherit most of Cygwin's behavior. Now, Cygwin wants to be cute and allow those illegal filenames because (as is demonstrated here quite nicely) Unix programmers don't give one bit of a flying fish about portable filesystem requirements. So Cygwin maps those illegal characters into a private Unicode page. Which means that Cygwin can read them alright, but no non-Cygwin program recognizes the stars and question marks and tabs and newlines. Including Git. In short: the Unix shell script t3070 manages to write what it thinks is a file called 'foo*', but Git only sees 'foo'. I tried to address this problem with this patch: -- snip -- diff --git a/t/t3070-wildmatch.sh b/t/t3070-wildmatch.sh index f606f91acbb..51dcb675e7b 100755 --- a/t/t3070-wildmatch.sh +++ b/t/t3070-wildmatch.sh @@ -4,6 +4,14 @@ test_description='wildmatch tests' . ./test-lib.sh +if test_have_prereq !MINGW && touch -- 'a*b' 2>/dev/null +then + test_set_prereq FILENAMESWITHSTARS +else + say 'Your filesystem does not allow stars in filenames.' +fi +rm -f 'a*b' + create_test_file() { file=$1 @@ -28,6 +36,17 @@ create_test_file() { */) return 1 ;; + # On Windows, stars are not allowed in filenames. Git for Windows' + # Bash, however, is based on Cygwin which plays funny names with a + # private Unicode page to emulate stars in filenames. Meaning that + # the shell script will "succeed" to write the file, but Git will + # not see it. Nor any other, regular Windows process. + *\**|*\[*) + if ! test_have_prereq FILENAMESWITHSTARS + then + return 1 + fi + ;; # On Windows, \ in paths is silently converted to /, which # would result in the "touch" below working, but the test # itself failing. See 6fd1106aa4 ("t3700: Skip a test with -- snap -- This gets us further. But not by much: fatal: Invalid path '\[ab]': No such file or directory You see, any path starting with a backslash *is* an absolute path on Windows. It is relative to the current drive. This affects *quite* a few of the test cases you added. And even if I just comment all of those out, I run into the next problem where you try to create a file whose name consists of a single space, which is also illegal on Windows. These woes demonstrate one problem with the approach of overzealously testing *everything*. You are kind of performing an integration test of the wildmatch() functionality, the integration into ls-files, *and* of a filesystem that behaves like the filesystems you are used to. All you *should* want to test, though, is the wildmatch() functionality. And
[PATCH v4 7/7] wildmatch test: create & test files on disk in addition to in-memory
There has never been any full roundtrip testing of what git-ls-files and other commands that use wildmatch() actually do, rather we've been satisfied with just testing the underlying C function. Due to git-ls-files and friends having their own codepaths before they call wildmatch() there's sometimes differences in the behavior between the two. Even when we test for those (as with [1]), there was no one place where you can review how these two modes differ. Now there is. We now attempt to create a file called $haystack and match $needle against it for each pair of $needle and $haystack that we were passing to test-wildmatch. If we can't create the file we skip the test. This ensures that we can run this on all platforms and not maintain some infinitely growing whitelist of e.g. platforms that don't support certain characters in filenames. As a result of doing this we can now see the cases where these two ways of testing wildmatch differ: * Creating a file called 'a[]b' and running ls-files 'a[]b' will show that file, but wildmatch("a[]b", "a[]b") will not match * wildmatch() won't match a file called \ against \, but ls-files will. * `git --glob-pathspecs ls-files 'foo**'` will match a file 'foo/bba/arr', but wildmatch won't, however pathmatch will. This seems like a bug to me, the two are otherwise equivalent as these tests show. This also reveals the case discussed in [1] above, where '' is now an error as far as ls-files is concerned, but wildmatch() itself happily accepts it. 1. 9e4e8a64c2 ("pathspec: die on empty strings as pathspec", 2017-06-06) Signed-off-by: Ævar Arnfjörð Bjarmason--- t/t3070-wildmatch.sh | 301 +-- 1 file changed, 290 insertions(+), 11 deletions(-) diff --git a/t/t3070-wildmatch.sh b/t/t3070-wildmatch.sh index 06db053ae2..f606f91acb 100755 --- a/t/t3070-wildmatch.sh +++ b/t/t3070-wildmatch.sh @@ -4,14 +4,113 @@ test_description='wildmatch tests' . ./test-lib.sh +create_test_file() { + file=$1 + + case $file in + # `touch .` will succeed but obviously not do what we intend + # here. + ".") + return 1 + ;; + # We cannot create a file with an empty filename. + "") + return 1 + ;; + # The tests that are testing that e.g. foo//bar is matched by + # foo/*/bar can't be tested on filesystems since there's no + # way we're getting a double slash. + *//*) + return 1 + ;; + # When testing the difference between foo/bar and foo/bar/ we + # can't test the latter. + */) + return 1 + ;; + # On Windows, \ in paths is silently converted to /, which + # would result in the "touch" below working, but the test + # itself failing. See 6fd1106aa4 ("t3700: Skip a test with + # backslashes in pathspec", 2009-03-13) for prior art and + # details. + *\\*) + if ! test_have_prereq BSLASHPSPEC + then + return 1 + fi + # NOTE: The ;;& bash extension is not portable, so + # this test needs to be at the end of the pattern + # list. + # + # If we want to add more conditional returns we either + # need a new case statement, or turn this whole thing + # into a series of "if" tests. + ;; + esac + + # Turn foo/bar/baz into foo/bar to create foo/bar as a + # directory structure. + dirs=${file%/*} + + # We touch "./$file" instead of "$file" because even an + # escaped "touch -- -" means get arguments from stdin. + if test "$file" != "$dirs" + then + mkdir -p -- "$dirs" && + touch -- "./$file" && + return 0 + else + touch -- "./$file" && + return 0 + fi + return 1 +} + +wildtest_file_setup() { + test_when_finished " + rm -rf -- * && + git reset + " && + git add -A && + >expect.err +} + +wildtest_stdout_stderr_cmp() { + tr -d '\0' actual && + test_cmp expect.err actual.err && + test_cmp expect actual +} + wildtest() { - match_w_glob=$1 - match_w_globi=$2 - match_w_pathmatch=$3 - match_w_pathmatchi=$4 - text=$5 - pattern=$6 + if test "$#" = 6 + then + # When test-wildmatch and git ls-files produce the same + # result. + match_w_glob=$1 + match_f_w_glob=$match_w_glob + match_w_globi=$2 + match_f_w_globi=$match_w_globi + match_w_pathmatch=$3 + match_f_w_pathmatch=$match_w_pathmatch + match_w_pathmatchi=$4 +