On Mon, 25 Nov 2013, Ralf Junker wrote:

> The problem surfaced when I tested the PCRE 8.34-RC1 without updating
> pcre_chartables.c. The old file does not define VT as white space and with
> this, PCRE gives different results for [\s] and \s which I believed it should
> not.

It should certainly be the same for [\s] and \s. However, without remaking
pcre_chartables.c things will go wrong. I had forgotten that pcre_maketables.c
had been updated when removing the special nature of VT from the code.

> On further investigation, it turned out that updating
> pcre_chartables.c solved the issue.

Good! I presume that means that I do not have to do anything more.

> It seems that for [\s], white space is determined by cbits in pcre_compile.c
> line 5030.

Yes, cbits are tables of "class bits" that can quickly be "or-ed" into the
table that is being built for [] classes. The "space" class always did contain
VT because it was/is used for POSIX [:space:], which always recognized VT as a
space character. There was a fiddle in the code for [\s] to exclude VT. This
fiddle has been removed.

> For \s, the white space is determined by md->ctypes in pcre_exec.c line 4815.

This table is used for "is this character a (Perl) space?" while compiling and
matching. It did not previously contain VT, but now it does.

> I do not fully understand how these variables are filled, but they contain
> different values for VT.

The tables are created by pcre_maketables using functions like isspace(),
isupper(), etc, but previously there was a fiddle for VT. Before 8.34, the
tables would be different in their treatment of VT; for 8.34 they should be
the same.

> Btw., PCRE_EXTENDED seems to be affected by the problem as well: Unless VT is
> defined as white space in pcre_chartables.c, VT is not removed from the
> pattern in extended mode.

Yes, indeed. The tables are used for that purpose as well.

Philip

-- 
Philip Hazel

-- 
## List details at https://lists.exim.org/mailman/listinfo/pcre-dev
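For anyone who wants to see the idea in miniature, here is a rough sketch of
two space tables built from isspace(), one holding per-character flags (in the
spirit of ctypes) and one holding a bitmap (in the spirit of the cbits space
class). This is not PCRE's code: every name below is hypothetical, the layout
is simplified, and it assumes the default "C" locale, in which isspace()
reports VT as a space. With no special case for VT, both tables agree.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical flag value for this sketch; not PCRE's own constant. */
#define SKETCH_SPACE_FLAG 0x01

/* One flag byte per character, loosely analogous to the ctypes table. */
static unsigned char sketch_ctypes[256];

/* One bit per character (256 bits), loosely analogous to a cbits class. */
static unsigned char sketch_space_bits[32];

/* Fill both tables from isspace(), with no special handling of VT. */
static void sketch_make_tables(void)
{
    int c;
    memset(sketch_ctypes, 0, sizeof sketch_ctypes);
    memset(sketch_space_bits, 0, sizeof sketch_space_bits);
    for (c = 0; c < 256; c++) {
        if (isspace(c)) {
            sketch_ctypes[c] |= SKETCH_SPACE_FLAG;
            sketch_space_bits[c / 8] |= (unsigned char)(1u << (c % 8));
        }
    }
}

/* Lookup via the flag table (the kind of test a \s match would use). */
static int sketch_is_space_flag(int c)
{
    return (sketch_ctypes[c & 0xff] & SKETCH_SPACE_FLAG) != 0;
}

/* Lookup via the bitmap (the kind of test a [\s] class would use). */
static int sketch_is_space_bit(int c)
{
    c &= 0xff;
    return (sketch_space_bits[c / 8] >> (c % 8)) & 1;
}

/* Return 1 if both tables give the same answer for every character. */
static int sketch_tables_agree(void)
{
    int c;
    for (c = 0; c < 256; c++) {
        if (sketch_is_space_flag(c) != sketch_is_space_bit(c))
            return 0;
    }
    return 1;
}
```

After calling sketch_make_tables(), both sketch_is_space_flag('\v') and
sketch_is_space_bit('\v') are non-zero and sketch_tables_agree() returns 1.
A stale table generated with a special case for VT would break that
agreement, which is the kind of mismatch reported above.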
