On Mon, 29 Jan 2018 23:23:18 -0600, Edgar Pettijohn <ed...@pettijohn-web.com> 
wrote:

> I'm trying to use patterns.c for some pattern matching. The manual 
> mentions captures using "()" around what you want to capture.  I don't 
> see how to get at the data though.  Here is a sample program.
> 
> #include <stdio.h>
> #include "patterns.h"
> 
> int
> main(int argc, char *argv[])
> {
>      const char        *errstr = NULL;
>      const char        *string = "the quick the brown the fox";
>      const char        *pattern = "the";
>      int            ret;
>      struct str_match     match;
> 
>      ret = str_match(string, pattern, &match, &errstr);
> 
>      if (errstr != NULL)
>          printf("%s\n", errstr);
>      else
>          printf("number of matches %d\n", match.sm_nmatch);
> 
>      return 0;
> }
> 
> It prints 2 which I was expecting 3. I've tried multiple other patterns 
> and it seems the answer is always 2. Which leads me to believe I'm doing 
> something wrong.  Any assistance appreciated.
> 
> 
> Thanks,
> 
> 
> Edgar

The code looks for the first match of the pattern in the string, not for all
matches of the pattern in the string. It also makes the (IMO, surprising)
decision that a pattern without any capture groups implicitly captures the
whole pattern. The whole string goes into the first slot, sm_match[0].

So, in your case, you're matching:

        "the quick the brown the fox";
         ^^^

Accordingly, sm_nmatch is 2 (the whole string plus one capture):

        match.sm_match[0] = "the quick the brown the fox"
        match.sm_match[1] = "the"

If the pattern were 'quick', you'd get similar behavior:

        "the quick the brown the fox";
             ^^^^^

Equivalently, putting the whole pattern in '()' will match the same thing:

        pattern = "(quick)"

But multiple pairs of parens each capture their own substring:

        pattern = "(qu)ick (the)"

        "the quick the brown the fox";
             ^^    ^^^
        match.sm_match[0] = "the quick the brown the fox"
        match.sm_match[1] = "qu"
        match.sm_match[2] = "the"
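
As for how to get at the data: a minimal sketch of walking the slots,
assuming only the two fields used above (sm_match as an array of strings,
sm_nmatch as the slot count). The struct match_view type and the
dump_matches() helper are hypothetical stand-ins so the snippet compiles on
its own; they are not part of patterns.h.

```c
#include <stdio.h>

/* Stand-in for struct str_match (assumed layout, field names as above). */
struct match_view {
	char		**sm_match;	/* slot 0, then one slot per capture */
	unsigned int	  sm_nmatch;	/* number of slots, not of matches */
};

/*
 * Print every slot to fp, one per line. Returns the number of captures,
 * i.e. the slots after slot 0.
 */
static unsigned int
dump_matches(FILE *fp, const struct match_view *m)
{
	unsigned int	 i;

	for (i = 0; i < m->sm_nmatch; i++)
		fprintf(fp, "sm_match[%u] = \"%s\"\n", i,
		    m->sm_match[i] != NULL ? m->sm_match[i] : "");

	return m->sm_nmatch > 0 ? m->sm_nmatch - 1 : 0;
}
```

With the real header you would drop the stand-in struct and run the same
loop over the struct str_match that str_match() filled in, then free the
result (I believe patterns.h has str_match_free() for that, but check the
header).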

The choice to capture implicitly, I think, is confusing, but the behavior
seems to me to be correct.
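
If what you actually want is the 3, i.e. the number of times a literal
string like "the" occurs, here is a rough sketch that sidesteps patterns.c
entirely and uses plain strstr(). The count_literal() name is made up for
this example, and it only handles literal text, not patterns.

```c
#include <stddef.h>
#include <string.h>

/*
 * Count non-overlapping occurrences of the literal string needle in
 * haystack. This is plain strstr(), not a patterns.c call.
 */
static size_t
count_literal(const char *haystack, const char *needle)
{
	size_t		 n = 0, len = strlen(needle);
	const char	*p = haystack;

	if (len == 0)
		return 0;	/* an empty needle would never advance */

	while ((p = strstr(p, needle)) != NULL) {
		n++;
		p += len;	/* resume after this occurrence */
	}

	return n;
}
```

Calling count_literal("the quick the brown the fox", "the") gives 3. Doing
the same for a real pattern would mean re-running the matcher on the rest of
the string after each hit, which is not sketched here.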

-- 
    Ori Bernstein
