On 10 August 2016 at 18:15, Oliver Keyes wrote:
| I'm trying to incorporate PCRE-compliant regular expressions into C
| code in an R package.
|
| From digging around in R's source code, it appears that R (pretty
| much?) guarantees the presence of either a system-level PCRE library,
| or an R-internal one.[0] Is this exposed (or grabbable) via the R C
| API in any way?

The key thing to realize here is that R does indeed provide an environment.
And at least where I like to work, I get this right off the bat:

edd@max:/tmp$ grep lpcre /etc/R/*
/etc/R/Makeconf:LIBS = -lpcre -llzma -lbz2 -lz -lrt -ldl -lm
edd@max:/tmp$

So pcre plus a bunch of compression libraries (lzma, bz2, z) and more are
essentially "there for the taking", if built as a shared library.

An existence proof is below; it is based on the 2nd Google hit I got for
'libpcre example' and has the advantage of being shorter than the first hit.

I first created a baseline. The example, as given and then repaired, gets us:

edd@max:/tmp$ ./ex_pcre
 0: From:regular.expressi...@example.com
 1: regular.expressions
 2: example.com
 0: From:ex...@43434.com
 1: exddd
 2: 43434.com
 0: From:7853...@exgem.com
 1: 7853456
 2: exgem.com
edd@max:/tmp$

Turning that into something callable from R took about another minute. It
looks like this:

-----------------------------------------------------------------------------
// modified (and repaired) example from http://stackoverflow.com/a/1421923/143305

#include <cstring>              // strlen
#include "pcre.h"
#include <Rcpp.h>

// [[Rcpp::export()]]
void foo() {
    const char *error;
    int   erroffset;
    pcre *re;
    int   rc;
    int   ovector[100];

    const char *regex = "From:([^@]+)@([^\r]+)";
    char str[] = "From:regular.expressi...@example.com\r\n"\
                 "From:ex...@43434.com\r\n"\
                 "From:7853...@exgem.com\r\n";

    re = pcre_compile (regex,          /* the pattern */
                       PCRE_MULTILINE,
                       &error,         /* for error message */
                       &erroffset,     /* for error offset */
                       0);             /* use default character tables */
    if (!re)
        Rcpp::stop("pcre_compile failed (offset: %d), %s\n", erroffset, error);

    unsigned int offset = 0;
    unsigned int len    = strlen(str);
    // pcre_exec wants the number of ints in ovector, not its size in bytes
    const int ovecsize  = sizeof(ovector) / sizeof(ovector[0]);
    while (offset < len &&
           (rc = pcre_exec(re, 0, str, len, offset, 0, ovector, ovecsize)) >= 0) {
        // ovector[2*i] and ovector[2*i+1] delimit the i-th matched substring
        for (int i = 0; i < rc; ++i) {
            Rprintf("%2d: %.*s\n", i, ovector[2*i+1] - ovector[2*i], str + ovector[2*i]);
        }
        offset = ovector[1];    // resume right after the previous match
    }
    pcre_free(re);              // release the compiled pattern
}

/*** R
foo()
*/
-----------------------------------------------------------------------------

and, lo and behold, produces the same output demonstrating that, yes,
Veronica, we do get pcre for free:

R> library(Rcpp)
R> sourceCpp("/tmp/oliver.cpp")

R> foo()
 0: From:regular.expressi...@example.com
 1: regular.expressions
 2: example.com
 0: From:ex...@43434.com
 1: exddd
 2: 43434.com
 0: From:7853...@exgem.com
 1: 7853456
 2: exgem.com
R>

Your package will probably want a litmus test in configure to see if this
really holds on the platform it is currently being built on.

Dirk

-- 
http://dirk.eddelbuettel.com | @eddelbuettel | e...@debian.org

______________________________________________
R-package-devel@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-package-devel
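
PS  In case the indexing in the printing loop of foo() looks opaque:
pcre_exec() reports a match as pairs of offsets into the subject string, with
ovector[2*i] the start of group i and ovector[2*i+1] one past its end, and its
return value is the number of pairs it filled in. Below is a minimal standalone
sketch of that decoding. It deliberately does not call PCRE at all, and the
helper name captures() and its signature are made up for illustration; they
are not part of PCRE or Rcpp.

```cpp
// Standalone sketch (no PCRE needed): decode the offset pairs that
// pcre_exec() leaves in ovector. captures() is a hypothetical helper.
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Return the substrings described by the first `rc` offset pairs.
// ovector[2*i] is the start of group i, ovector[2*i+1] is one past its end;
// a group that did not take part in the match is reported as -1, -1.
std::vector<std::string> captures(const std::string& subject,
                                  const std::vector<int>& ovector,
                                  int rc) {
    std::vector<std::string> out;
    if (rc <= 0) return out;    // < 0: no match or error, 0: ovector too small
    assert(static_cast<std::size_t>(2 * rc) <= ovector.size());
    for (int i = 0; i < rc; ++i) {
        const int start = ovector[2 * i];
        const int end   = ovector[2 * i + 1];
        if (start < 0 || end < start) {
            out.emplace_back();                 // unset group: empty string
        } else {
            out.push_back(subject.substr(start, end - start));
        }
    }
    return out;
}
```

For the subject "From:abc@example.com\r\n" with offsets {0, 20, 5, 8, 9, 20}
and rc = 3, this yields the whole match followed by "abc" and "example.com",
which is the same triple-per-match layout as the output shown above.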