Hi Philip, Apologies for the delay; many fires to put out last week.
On Sun, 2018 Apr 29 16:20+0100, p...@hermes.cam.ac.uk wrote: > On Sat, 28 Apr 2018, I wrote: > > > So the problem is why is the match limit not biting in your > > environment? Did you change the default when you build PCRE2? > > I now think you must have done so, because when I build with --with-match- > limit=4000000000 I get the same errors as you did. About the only significant configuration tweak I'm making is --with-link-size=4, needed to handle some gargantuan regexes. > I have now patched the JIT test program (pcre2_jit_test) to > ensure that it sets a match limit no greater than the 10 000 000 > default (r934). > > I think all the issues you raised are now dealt with. Thanks for > bringing them up. Everything is now testing out cleanly for me on Windows, once pcre2test is built with /STACK:3000000. I'll see if I can get that in there so that this does not have to be put in manually. For now, I have a few further changes to contribute. The attached patch is against r934: * In both the Autoconf and CMake configuration, set variables to 0 or 1 indicating the presence or absence of both inttypes.h and stdint.h. * These variables are then baked into the generated pcre.h, so that it #includes one or none of these headers depending on whether the system has them or not. (Note that inttypes.h handily provides the integer-related definitions needed to build PCRE2 on older Unix systems that do not have stdint.h.) If the system has neither header (as on Windows and even older Unix systems), then for now, the consuming application would be responsible for defining the integer-related symbols needed by the PCRE2 API. * RunGrepTest: Changed all the double-quoted printf(1) strings to single-quoted, to avoid any potential escaping issues. * Older versions of printf(1) do not interpret "\x00", but can handle "\0" or "\000" just fine. * Older C compilers choke on C++ comments. * Some older C compilers, like MSVC, do not support "%td". I added a check that uses "%lu" for any non-C99 build environment, as a catch-all. * Similarly, older C compilers do not support "%zu". I changed this to "%lu" and a cast, as I presume pcre2test is not likely to ever handle a >4 GB pattern. With this, there are still a few issues remaining: * The /STACK:3000000 thing on Windows noted earlier. * Possibly some way of defining the integer types needed for systems that lack both inttypes.h and stdint.h, although I'm not sure how to safely define e.g. uint32_t for the benefit of prototypes in pcre2.h since the consuming application will necessarily see those definitions too. * Currently, the generic pcre2.h header (under src/) in the source tree is used instead of the generated pcre2.h in the build tree. This is due to the use of #include"pcre2.h" instead of #include<pcre2.h>. Would it be all right to change all instances of the former to the latter in the PCRE2 source? * I am seeing some errors from the French-locale test on an older Solaris system, which has fr_CA. I'm not quite sure how to diagnose the problem. Would you be able to make heads or tails of it if I posted the output? (It's conceivable that this system is using a different encoding for this locale, although unfortunately the system iconv facilities have not been helpful.) --Daniel -- Daniel Richard G. || sk...@iskunk.org My ASCII-art .sig got a bad case of Times New Roman.
Index: CMakeLists.txt =================================================================== --- CMakeLists.txt (revision 934) +++ CMakeLists.txt (working copy) @@ -113,6 +113,15 @@ CHECK_INCLUDE_FILE(unistd.h HAVE_UNISTD_H) CHECK_INCLUDE_FILE(windows.h HAVE_WINDOWS_H) +SET(PCRE2_HAVE_INTTYPES_H 0) +SET(PCRE2_HAVE_STDINT_H 0) +IF(HAVE_INTTYPES_H) + SET(PCRE2_HAVE_INTTYPES_H 1) +ENDIF(HAVE_INTTYPES_H) +IF(HAVE_STDINT_H) + SET(PCRE2_HAVE_STDINT_H 1) +ENDIF(HAVE_STDINT_H) + CHECK_FUNCTION_EXISTS(bcopy HAVE_BCOPY) CHECK_FUNCTION_EXISTS(memmove HAVE_MEMMOVE) CHECK_FUNCTION_EXISTS(strerror HAVE_STRERROR) Index: RunGrepTest =================================================================== --- RunGrepTest (revision 934) +++ RunGrepTest (working copy) @@ -600,7 +600,7 @@ echo "RC=$?" >>testtrygrep echo "---------------------------- Test 119 -----------------------------" >>testtrygrep -printf "123\n456\n789\n---abc\ndef\nxyz\n---\n" >testNinputgrep +printf '123\n456\n789\n---abc\ndef\nxyz\n---\n' >testNinputgrep $valgrind $vjs $pcre2grep -Mo '(\n|[^-])*---' testNinputgrep >>testtrygrep echo "RC=$?" >>testtrygrep @@ -631,7 +631,7 @@ echo "RC=$?" >>testtrygrep echo "---------------------------- Test 125 -----------------------------" >>testtrygrep -printf "abcd\n" >testNinputgrep +printf 'abcd\n' >testNinputgrep $valgrind $vjs $pcre2grep --colour=always '(?<=\K.)' testNinputgrep >>testtrygrep echo "RC=$?" >>testtrygrep $valgrind $vjs $pcre2grep --colour=always '(?=.\K)' testNinputgrep >>testtrygrep @@ -642,8 +642,8 @@ echo "RC=$?" >>testtrygrep echo "---------------------------- Test 126 -----------------------------" >>testtrygrep -printf "Next line pattern has binary zero\nABC\x00XYZ\n" >testtemp1grep -printf "ABC\x00XYZ\nABCDEF\nDEFABC\n" >testtemp2grep +printf 'Next line pattern has binary zero\nABC\0XYZ\n' >testtemp1grep +printf 'ABC\0XYZ\nABCDEF\nDEFABC\n' >testtemp2grep $valgrind $vjs $pcre2grep -a -f testtemp1grep testtemp2grep >>testtrygrep echo "RC=$?" >>testtrygrep @@ -687,25 +687,25 @@ # starts with a hyphen. These tests are run in the build directory. echo "Testing pcre2grep newline settings" -printf "abc\rdef\r\nghi\njkl" >testNinputgrep +printf 'abc\rdef\r\nghi\njkl' >testNinputgrep -printf "%c--------------------------- Test N1 ------------------------------\r\n" - >testtrygrep +printf '%c--------------------------- Test N1 ------------------------------\r\n' - >testtrygrep $valgrind $vjs $pcre2grep -n -N CR "^(abc|def|ghi|jkl)" testNinputgrep >>testtrygrep -printf "%c--------------------------- Test N2 ------------------------------\r\n" - >>testtrygrep +printf '%c--------------------------- Test N2 ------------------------------\r\n' - >>testtrygrep $valgrind $vjs $pcre2grep -n --newline=crlf "^(abc|def|ghi|jkl)" testNinputgrep >>testtrygrep -printf "%c--------------------------- Test N3 ------------------------------\r\n" - >>testtrygrep +printf '%c--------------------------- Test N3 ------------------------------\r\n' - >>testtrygrep pattern=`printf 'def\rjkl'` $valgrind $vjs $pcre2grep -n --newline=cr -F "$pattern" testNinputgrep >>testtrygrep -printf "%c--------------------------- Test N4 ------------------------------\r\n" - >>testtrygrep +printf '%c--------------------------- Test N4 ------------------------------\r\n' - >>testtrygrep $valgrind $vjs $pcre2grep -n --newline=crlf -F -f $srcdir/testdata/greppatN4 testNinputgrep >>testtrygrep -printf "%c--------------------------- Test N5 ------------------------------\r\n" - >>testtrygrep +printf '%c--------------------------- Test N5 ------------------------------\r\n' - >>testtrygrep $valgrind $vjs $pcre2grep -n --newline=any "^(abc|def|ghi|jkl)" testNinputgrep >>testtrygrep -printf "%c--------------------------- Test N6 ------------------------------\r\n" - >>testtrygrep +printf '%c--------------------------- Test N6 ------------------------------\r\n' - >>testtrygrep $valgrind $vjs $pcre2grep -n --newline=anycrlf "^(abc|def|ghi|jkl)" testNinputgrep >>testtrygrep # It seems inpossible to handle NUL characters easily in Solaris (aka SunOS). @@ -713,10 +713,10 @@ # don't run this test under SunOS. Fudge the output so that the comparison # works. A similar problem has also been reported for MacOS (Darwin). -printf "%c--------------------------- Test N7 ------------------------------\r\n" - >>testtrygrep +printf '%c--------------------------- Test N7 ------------------------------\r\n' - >>testtrygrep uname=`uname` if [ "$uname" != "SunOS" -a "$uname" != "Darwin" ] ; then - printf "abc\0def" >testNinputgrep + printf 'abc\0def' >testNinputgrep $valgrind $vjs $pcre2grep -na --newline=nul "^(abc|def)" testNinputgrep | sed 's/\x00/ZERO/' >>testtrygrep echo "" >>testtrygrep else Index: configure.ac =================================================================== --- configure.ac (revision 934) +++ configure.ac (working copy) @@ -451,6 +451,11 @@ AC_CHECK_HEADERS([windows.h], [HAVE_WINDOWS_H=1]) AC_CHECK_HEADERS([sys/wait.h], [HAVE_SYS_WAIT_H=1]) +PCRE2_HAVE_INTTYPES_H=`test "x$ac_cv_header_inttypes_h" = "xyes" && echo 1 || echo 0` +PCRE2_HAVE_STDINT_H=`test "x$ac_cv_header_stdint_h" = "xyes" && echo 1 || echo 0` +AC_SUBST([PCRE2_HAVE_INTTYPES_H]) +AC_SUBST([PCRE2_HAVE_STDINT_H]) + # Conditional compilation AM_CONDITIONAL(WITH_PCRE2_8, test "x$enable_pcre2_8" = "xyes") AM_CONDITIONAL(WITH_PCRE2_16, test "x$enable_pcre2_16" = "xyes") Index: src/pcre2.h.in =================================================================== --- src/pcre2.h.in (revision 934) +++ src/pcre2.h.in (working copy) @@ -81,12 +81,27 @@ #define PCRE2_CALL_CONVENTION #endif +/* Does this system have inttypes.h and/or stdint.h? */ + +#if @PCRE2_HAVE_INTTYPES_H@ +#define PCRE2_HAVE_INTTYPES_H +#endif + +#if @PCRE2_HAVE_STDINT_H@ +#define PCRE2_HAVE_STDINT_H +#endif + /* Have to include limits.h, stdlib.h and stdint.h to ensure that size_t and uint8_t, UCHAR_MAX, etc are defined. */ #include <limits.h> #include <stdlib.h> + +#if defined(PCRE2_HAVE_STDINT_H) #include <stdint.h> +#elif defined(PCRE2_HAVE_INTTYPES_H) +#include <inttypes.h> +#endif /* Allow for C++ users compiling this directly. */ Index: src/pcre2_compile.c =================================================================== --- src/pcre2_compile.c (revision 934) +++ src/pcre2_compile.c (working copy) @@ -63,8 +63,8 @@ /* Other debugging code can be enabled by these defines. */ -// #define DEBUG_SHOW_CAPTURES -// #define DEBUG_SHOW_PARSED +/* #define DEBUG_SHOW_CAPTURES */ +/* #define DEBUG_SHOW_PARSED */ /* There are a few things that vary with different code unit sizes. Handle them by defining macros in order to minimize #if usage. */ Index: src/pcre2_match.c =================================================================== --- src/pcre2_match.c (revision 934) +++ src/pcre2_match.c (working copy) @@ -45,9 +45,9 @@ /* These defines enables debugging code */ -//#define DEBUG_FRAMES_DISPLAY -//#define DEBUG_SHOW_OPS -//#define DEBUG_SHOW_RMATCH +/* #define DEBUG_FRAMES_DISPLAY */ +/* #define DEBUG_SHOW_OPS */ +/* #define DEBUG_SHOW_RMATCH */ #ifdef DEBUG_FRAME_DISPLAY #include <stdarg.h> Index: src/pcre2test.c =================================================================== --- src/pcre2test.c (revision 934) +++ src/pcre2test.c (working copy) @@ -80,7 +80,7 @@ /* Debugging code enabler */ -// #define DEBUG_SHOW_MALLOC_ADDRESSES +/* #define DEBUG_SHOW_MALLOC_ADDRESSES */ /* Both libreadline and libedit are optionally supported. The user-supplied original patch uses readline/readline.h for libedit, but in at least one system @@ -163,7 +163,7 @@ #endif /* VC doesn't support "%td". */ -#ifdef _MSC_VER +#if defined(_MSC_VER) || __STDC_VERSION__ < 199901L #define PTR_SPEC "%lu" #else #define PTR_SPEC "%td" @@ -5470,8 +5470,8 @@ if (rc != 0) { - fprintf(outfile, "** Pattern conversion error at offset %zu: ", - converted_length); + fprintf(outfile, "** Pattern conversion error at offset %lu: ", + (unsigned long)converted_length); convert_return = print_error_message(rc, "", "\n")? PR_SKIP:PR_ABEND; }
-- ## List details at https://lists.exim.org/mailman/listinfo/pcre-dev