On 05/10/16 19:29, Ingo Schwarze wrote:
> Hi Martijn,
>
> Martijn van Duren wrote on Tue, May 10, 2016 at 02:43:54PM +0200:
>
>> Index: ./lib/libc/regex/engine.c
>> ===================================================================
>> RCS file: /cvs/src/lib/libc/regex/engine.c,v
>> retrieving revision 1.19
>> diff -u -p -r1.19 engine.c
>> --- ./lib/libc/regex/engine.c 28 Dec 2015 23:01:22 -0000 1.19
>> +++ ./lib/libc/regex/engine.c 2 May 2016 08:50:20 -0000
>> @@ -674,7 +674,7 @@ fast(struct match *m, char *start, char
>> states fresh = m->fresh;
>> states tmp = m->tmp;
>> char *p = start;
>> - int c = (start == m->beginp) ? OUT : *(start-1);
>> + int c = (start == m->offp) ? OUT : *(start-1);
>> int lastc; /* previous c */
>> int flagch;
>> int i;
>> @@ -758,7 +758,7 @@ slow(struct match *m, char *start, char
>> states empty = m->empty;
>> states tmp = m->tmp;
>> char *p = start;
>> - int c = (start == m->beginp) ? OUT : *(start-1);
>> + int c = (start == m->offp) ? OUT : *(start-1);
>> int lastc; /* previous c */
>> int flagch;
>> int i;
>
> i hate to say that this change appears to cause a regression.
>
> The regexec(3) manual explicitly says:
>
> REG_STARTEND The string is considered to start at [...]
> Note that a non-zero rm_so does not imply REG_NOTBOL;
> REG_STARTEND affects only the location of the string,
> not how it is matched.
>
> Right now, the library actually implements that. The test program
> appended below produces the following output, as documented:
>
> rt: regcomp: OK
> rt: mismatch: regexec() failed to match
> rt: BOL match: OK
> rt: ST match: OK
>
> With your change, the library now fails to match:
>
> rt: regcomp: OK
> rt: mismatch: regexec() failed to match
> rt: BOL match: OK
> rt: ST match: regexec() failed to match
>
> I don't think that change is intentional, or is it?
This change is intentional. You try to match y on the start of the
string. This falsely succeeds in the current library, but is fixed in my
change.
This needs to be fixed for my sed change to work correctly.
>
> I'll have a look whether it is possible to conditionally pass
> REG_NOTBOL from sed(1) to solve your original issue. I didn't
> look into the sed(1) code yet because i wanted to report this
> regression as soon as i found it.
>
> Yours,
> Ingo
>
>
> #include <sys/types.h>
> #include <err.h>
> #include <regex.h>
>
> static regex_t re;
>
> static int
> report(int errcode, const char *msg)
> {
> const size_t errbuf_size = 2048;
> char errbuf[errbuf_size];
> size_t sz;
>
> if (errcode) {
> sz = regerror(errcode, &re, errbuf, errbuf_size);
> warnx("%s: %s%s", msg, errbuf,
> sz > errbuf_size ? "[...]" : "");
> } else
> warnx("%s: OK", msg);
> return errcode;
> }
>
> int
> main(void)
> {
> regmatch_t pmatch;
>
> if (report(regcomp(&re, "^y", REG_EXTENDED), "regcomp"))
> return 1;
>
> report(regexec(&re, "xy", 0, NULL, 0), "mismatch");
> report(regexec(&re, "yz", 0, NULL, 0), "BOL match");
>
> pmatch.rm_so = 1;
> pmatch.rm_eo = 2;
>
> report(regexec(&re, "xyz", 0, &pmatch, REG_STARTEND), "ST match");
> return 0;
> }
>