[issue16688] Backreferences make case-insensitive regex fail on non-ASCII strings.

Matthew Barnett Fri, 14 Dec 2012 16:41:32 -0800

Matthew Barnett added the comment:

In function SRE_MATCH, the code for SRE_OP_GROUPREF (line 1290) contains this:


    while (p < e) {
        if (ctx->ptr >= end ||
            SRE_CHARGET(state, ctx->ptr, 0) != SRE_CHARGET(state, p, 0))
            RETURN_FAILURE;
        p += state->charsize;
        ctx->ptr += state->charsize;
    }

However, the code for SRE_OP_GROUPREF_IGNORE (line 1316) contains this:

    while (p < e) {
        if (ctx->ptr >= end ||
            state->lower(SRE_CHARGET(state, ctx->ptr, 0)) != state->lower(*p))
            RETURN_FAILURE;
        p++;
        ctx->ptr += state->charsize;
    }

(In both cases 'p' is of type 'char*'.)

The problem appears to be that the latter still uses '*p' and 'p++', so it
always works with single chars: it reads and advances 1 byte at a time
instead of 1, 2 or 4 bytes, depending on the character width of the string.
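
To illustrate the difference outside of _sre.c, here is a minimal standalone
sketch. It is not the actual patch: charget() and lower_ascii() are
hypothetical stand-ins for SRE_CHARGET and state->lower, and demo_match() is
an invented helper that matches a one-character group against an identical
character in a string with 2-byte characters. The "fixed" loop simply mirrors
what SRE_OP_GROUPREF already does, which is one plausible correction.

```c
#include <assert.h>
#include <ctype.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for SRE_CHARGET: read one character of the
   given width (1, 2 or 4 bytes) starting at p. */
static uint32_t charget(const char *p, int charsize)
{
    uint16_t v2;
    uint32_t v4;

    assert(charsize == 1 || charsize == 2 || charsize == 4);
    switch (charsize) {
    case 1:
        return (unsigned char)*p;
    case 2:
        memcpy(&v2, p, sizeof v2);
        return v2;
    default:
        memcpy(&v4, p, sizeof v4);
        return v4;
    }
}

/* Hypothetical stand-in for state->lower: lowercases ASCII only. */
static uint32_t lower_ascii(uint32_t ch)
{
    return ch < 128 ? (uint32_t)tolower((int)ch) : ch;
}

/* Same shape as the SRE_OP_GROUPREF_IGNORE loop quoted above:
   reads the group with '*p' and advances it with 'p++'. */
static int groupref_ignore_buggy(const char *p, const char *e,
                                 const char *ptr, const char *end,
                                 int charsize)
{
    while (p < e) {
        if (ptr >= end ||
            lower_ascii(charget(ptr, charsize)) !=
                lower_ascii((unsigned char)*p))
            return 0;
        p++;
        ptr += charsize;
    }
    return 1;
}

/* Possible correction: read and advance the group by whole characters,
   as the SRE_OP_GROUPREF loop does. */
static int groupref_ignore_fixed(const char *p, const char *e,
                                 const char *ptr, const char *end,
                                 int charsize)
{
    while (p < e) {
        if (ptr >= end ||
            lower_ascii(charget(ptr, charsize)) !=
                lower_ascii(charget(p, charsize)))
            return 0;
        p += charsize;
        ptr += charsize;
    }
    return 1;
}

/* Invented helper: match a one-character group (U+0411) against the
   same character, both stored with 2 bytes per character. */
int demo_match(int use_fixed)
{
    const uint16_t group[] = { 0x0411 };
    const uint16_t text[] = { 0x0411 };
    const char *p = (const char *)group;
    const char *e = p + sizeof group;
    const char *ptr = (const char *)text;
    const char *end = ptr + sizeof text;

    return use_fixed ? groupref_ignore_fixed(p, e, ptr, end, 2)
                     : groupref_ignore_buggy(p, e, ptr, end, 2);
}
```

With these definitions, demo_match(0) reports a failed match even though the
group and the text are identical, because a single byte of the group is
compared against a whole 2-byte character; demo_match(1) reports a match.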

----------

_______________________________________
Python tracker <[email protected]>
<http://bugs.python.org/issue16688>
_______________________________________
