On Thu, Jun 07, 2001, Bill Nalen/Towers Perrin wrote:
> Okay, let me try again.  I need to end up with a string that C++ can
> swallow. The way I have it coded now it will find
> match(0):  href="http://someurl/page.html";
> match(1): href
> match(2): ="http://someurl/page.html";

Hi Bill,

I believe you should get one more regex match: match(3), which holds just
the attribute value. I hacked together a quick test program to try it out:

#include <stdio.h>
#include <stdlib.h>   /* exit, malloc, free */
#include <string.h>
#include <regex.h>

#define REGEX   "[ \r\t\n]*([a-zA-Z_][-.a-zA-Z_0-9]*)" \
                "([ \r\t\n]*=[ \r\t\n]*" \
                "('[^']*'|\"[^\"]*\"|[-a-zA-Z0-9./:+*%?!()_#=~]*))?"
#define NMATCH  4

int main(int argc, char* argv[])
{
    regex_t     compiled_pattern;
    regmatch_t  match[NMATCH];
    char*       string;
    int         flag;
    int         i;

    if (argc != 2)
        exit(1);

    string = (char*) malloc(strlen(argv[1]) + 1);
    if (string == NULL)
        exit(1);

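    /* Compile with POSIX extended syntax; bail out if the pattern is bad. */
    flag = REG_EXTENDED;
    if (regcomp(&compiled_pattern, REGEX, flag) != 0) {
        fprintf(stderr, "regcomp failed\n");
        free(string);
        exit(1);
    }

    /* On failure the contents of match[] are unspecified, so stop here. */
    flag = 0;
    if (regexec(&compiled_pattern, argv[1], NMATCH, match, flag) != 0) {
        fprintf(stderr, "No match\n");
        regfree(&compiled_pattern);
        free(string);
        exit(1);
    }

    for (i = 0; i < NMATCH; i++) {
        /* rm_so is -1 when a group took no part in the match. */
        if (match[i].rm_so == -1)
            continue;
        strncpy(string, &argv[1][match[i].rm_so], match[i].rm_eo - match[i].rm_so);
        string[match[i].rm_eo - match[i].rm_so] = '\0';
        fprintf(stdout, "Match(%d): %s\n", i, string);
    }

    regfree(&compiled_pattern);
    free(string);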

    return 0;
}

Running it on 'href="http://someurl/page.html";' gives the following output
(the trailing ';' is not part of any match, since nothing in the pattern
can consume it):

Match(0): href="http://someurl/page.html"
Match(1): href
Match(2): ="http://someurl/page.html"
Match(3): "http://someurl/page.html"
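
If you want to reuse this from your C++ code, the copying done in the loop
could be pulled into a small helper that also copes with groups that took no
part in the match. This is only a sketch: copy_submatch and its parameters
are names I made up for illustration; they are not part of <regex.h>.

```c
#include <regex.h>
#include <string.h>

/* Hypothetical helper (not part of <regex.h>): copy the text of one
 * submatch into buf and NUL-terminate it.
 * Returns the number of characters copied, or -1 if the group did not
 * take part in the match or buf is too small. */
int copy_submatch(const char *subject, const regmatch_t *m,
                  char *buf, size_t buflen)
{
    size_t len;

    if (m->rm_so == -1)              /* group was not used */
        return -1;

    len = (size_t)(m->rm_eo - m->rm_so);
    if (len + 1 > buflen)            /* no room for text plus '\0' */
        return -1;

    memcpy(buf, subject + m->rm_so, len);
    buf[len] = '\0';
    return (int)len;
}
```

In the test program, copy_submatch(argv[1], &match[3], string,
strlen(argv[1]) + 1) would fill string with the quoted URL.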

> Can someone tell me what the r does?

The 'r' prefix marks a raw string. It turns off the interpretation of
backslash escapes, so a sequence like \n stays as a backslash followed by
an n instead of becoming a newline.
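
To make that concrete from the C side: I am assuming the 'r' you saw is a
Python-style raw-string prefix (the original pattern is not shown here, so
that is a guess). C has no such prefix, so any backslash that should
survive into the string has to be doubled by hand. A minimal sketch, with
names made up for the example:

```c
#include <string.h>

/* The compiler interprets \n here: the array holds one newline character. */
static const char interpreted[] = "\n";

/* Doubling the backslash keeps it: the array holds a backslash followed
 * by 'n', which is what a raw string would have kept automatically. */
static const char kept[] = "\\n";

/* Length helper (name made up for this sketch) to make the difference
 * between the two literals visible. */
size_t literal_length(const char *s)
{
    return strlen(s);
}
```

literal_length(interpreted) is 1 and literal_length(kept) is 2. In the REGEX
macro above, the compiler turns \r, \t and \n into the real control
characters before regcomp ever sees them, which is fine inside a bracket
expression; the \" escapes are only there so the quote does not end the C
string literal.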

/Mike
