On 10 August 2016 at 18:15, Oliver Keyes wrote:
| I'm trying to incorporate PCRE-compliant regular expressions into C
| code in an R package.
| 
| From digging around in R's source code, it appears that R (pretty
| much?) guarantees the presence of either a system-level PCRE library,
| or an R-internal one.[0] Is this exposed (or grabbable) via the R C
| API in any way?

The key thing to realize here is that R does indeed provide a build
environment.  And at least where I like to work, I get this right off the bat:

    edd@max:/tmp$ grep lpcre /etc/R/*
    /etc/R/Makeconf:LIBS =  -lpcre -llzma -lbz2 -lz -lrt -ldl -lm
    edd@max:/tmp$ 

So pcre plus a bunch of compression libraries (lzma, bz2, z) and more are
essentially "there for the taking". If built as a shared library.
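
The grep above hard-codes the Debian location of Makeconf. A minimal sketch
of a more portable check follows, assuming Makeconf sits in the etc/
directory under whatever `R RHOME` reports. The helper name makeconf_has_lib
is made up for illustration and is not part of R.

```shell
#!/bin/sh
# makeconf_has_lib FILE LIB
# Succeeds if the LIBS line(s) of a Makeconf-style FILE contain -lLIB.
# Hypothetical helper for illustration; not something R ships.
makeconf_has_lib() {
    makeconf=$1
    lib=$2
    # Strip the "LIBS =" prefix and flatten to one space-separated line.
    libs=$(sed -n 's/^LIBS *= *//p' "$makeconf" 2>/dev/null | tr '\n' ' ')
    # Pad with spaces so that -lpcre does not match -lpcre2-8 and the like.
    case " $libs " in
        *" -l$lib "*) return 0 ;;
        *)            return 1 ;;
    esac
}

# Only query R if it is actually on the PATH.
if command -v R >/dev/null 2>&1; then
    if makeconf_has_lib "$(R RHOME)/etc/Makeconf" pcre; then
        echo "pcre is listed in LIBS"
    else
        echo "pcre is not listed in LIBS"
    fi
fi
```

This only tells you what R itself was linked against; whether the header
and library are usable from your own code is a separate question.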

An existence proof is below; it is based on the 2nd Google hit I got for
'libpcre example' and has the advantage of being shorter than the first hit.

I first created a baseline. The example, as given and then repaired, gets us:

    edd@max:/tmp$ ./ex_pcre 
     0: From:regular.expressi...@example.com
     1: regular.expressions
     2: example.com
     0: From:ex...@43434.com
     1: exddd
     2: 43434.com
     0: From:7853...@exgem.com
     1: 7853456
     2: exgem.com
    edd@max:/tmp$ 

Turning that into something callable from R took about another minute. It
looks like this:

-----------------------------------------------------------------------------
// modified (and repaired) example from http://stackoverflow.com/a/1421923/143305
#include <cstring>   // strlen
#include "pcre.h"
#include <Rcpp.h>

// [[Rcpp::export()]]
void foo() {
    const char *error;
    int   erroffset;
    pcre *re;
    int   rc;
    int   ovector[100];

    const char *regex = "From:([^@]+)@([^\r]+)";
    char str[]  = "From:regular.expressi...@example.com\r\n"\
                  "From:ex...@43434.com\r\n"\
                  "From:7853...@exgem.com\r\n";

    re = pcre_compile (regex,          /* the pattern */
                       PCRE_MULTILINE,
                       &error,         /* for error message */
                       &erroffset,     /* for error offset */
                       0);             /* use default character tables */
    if (!re) Rcpp::stop("pcre_compile failed (offset: %d), %s\n", erroffset, error);

    unsigned int offset = 0;
    unsigned int len    = strlen(str);
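    // pcre_exec wants the number of ints in ovector, not its size in bytes
    const int ovecsize = sizeof(ovector) / sizeof(ovector[0]);
    while (offset < len &&
           (rc = pcre_exec(re, 0, str, len, offset, 0, ovector, ovecsize)) >= 0) {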
        for(int i = 0; i < rc; ++i) {
            Rprintf("%2d: %.*s\n", i, ovector[2*i+1] - ovector[2*i], str + ovector[2*i]);
        }
        offset = ovector[1];
    }
    pcre_free(re);                     // release the compiled pattern
}

/*** R
foo()
*/
-----------------------------------------------------------------------------

and, lo and behold, produces the same output demonstrating that, yes,
Veronica, we do get pcre for free:

    R> library(Rcpp)
    R> sourceCpp("/tmp/oliver.cpp")
    
    R> foo()
     0: From:regular.expressi...@example.com
     1: regular.expressions
     2: example.com
     0: From:ex...@43434.com
     1: exddd
     2: 43434.com
     0: From:7853...@exgem.com
     1: 7853456
     2: exgem.com
    R> 

Your package will probably want to run a litmus test in configure to see if
this really holds on the platform it is currently being built on.
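
Such a litmus test could look roughly like the following. This is a sketch
only: the helper name check_pcre is made up, and the compiler simply
defaults to cc, whereas a real configure script would more likely ask
`R CMD config CC` for the compiler R itself was built with.

```shell
#!/bin/sh
# check_pcre: succeed if a trivial program compiles and links against libpcre.
# Hypothetical helper for illustration; CC falls back to plain cc.
check_pcre() {
    dir=$(mktemp -d) || return 1
    cat > "$dir/conftest.c" <<'EOF'
#include <pcre.h>
int main(void) {
    /* pcre_version() returns a version string; calling it forces linking. */
    return pcre_version() == 0;
}
EOF
    ${CC:-cc} "$dir/conftest.c" -o "$dir/conftest" -lpcre >/dev/null 2>&1
    status=$?
    rm -rf "$dir"
    return $status
}

if check_pcre; then
    echo "pcre: yes"
else
    echo "pcre: no"
fi
```

What to do on "no" (bail out with an error, or fall back to some other
regex engine) is up to the package.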

Dirk

-- 
http://dirk.eddelbuettel.com | @eddelbuettel | e...@debian.org

______________________________________________
R-package-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-package-devel
