Hi,

On Friday 19 May 2006 07:00, Marvin Humphrey wrote:
> I have a tokenizing algorithm which uses regexes, and it would
> presumably be faster if it were implemented in XS.  The algorithm
...
> It looks like the relevant functions are pregcomp() and pregexec().
> There isn't anything about these in perlapi, so accessing them might
> be a little naughty.  However, I have found some prior art: Tk uses
> them, in the file tkGlue.c.
For an XS perspective on Perl regexes, I found the Rx paper at
http://perl.plover.com/Rx/paper/ useful - Rx does very fancy regex matching.

> regexp *
> Perl_pregcomp(pTHX_ char *exp, char *xend, PMOP *pm)
>
> I gather that the first two arguments to pregcomp are the start and
> the limit (a la SvEND) of the pattern.  The returned regexp*, it
> looks like I would immediately supply to pregexec().  I'm not too
> sure how to supply a PMOP*, but I saw in a Nick Ing-Simmons post to
> p5p that you have to "fake an op" in order to make this work.  Looks
I lifted this out of Rx:

    PMOP *pm;
    char *end;
    regexp *rx;

    Newz(1, pm, 1, PMOP);             /* zeroed fake op, no flags set */
    end = strchr(rx_string, '\0');    /* rx_string is NUL-terminated; end points at the NUL */
    rx = pregcomp(rx_string, end, pm);

It seems to be working. I'm not setting any flags, although Rx does implement
that. One unsettling thing is that Rx also does some magic around the call to
pregcomp, which changes the regex slightly (i.e. my module gets a different
regex signature from the one documented in the Rx paper), but I haven't
noticed any negative effects of my simplification so far. Also, Rx doesn't
bother with memory management; AFAIK pm should be freed with Safefree and rx
with pregfree.

> I32
> Perl_pregexec(pTHX_ register regexp *prog, char *stringarg, register
> char *strend,
>       char *strbeg, I32 minend, SV *screamer, U32 nosave)
Didn't get to use that - I'll pass.

        Bye
                Vasek
