Re: patterns.c question or possible bug

2018-01-30 Thread edgar

On Jan 30, 2018 12:05 AM, Ori Bernstein wrote:
>
> On Mon, 29 Jan 2018 23:23:18 -0600, Edgar Pettijohn wrote:
>
> > I'm trying to use patterns.c for some pattern matching. The manual 
> > mentions captures using "()" around what you want to capture.  I don't 
> > see how to get at the data though.  Here is a sample program.
> > 
> > #include <stdio.h>
> > #include "patterns.h"
> > 
> > int
> > main(int argc, char *argv[])
> > {
> >     const char *errstr = NULL;
> >     const char *string = "the quick the brown the fox";
> >     const char *pattern = "the";
> >     int ret;
> >     struct str_match match;
> > 
> >     ret = str_match(string, pattern, &match, &errstr);
> > 
> >     if (errstr != NULL)
> >         printf("%s\n", errstr);
> >     else
> >         printf("number of matches %d\n", match.sm_nmatch);
> > 
> >     return 0;
> > }
> > 
> > It prints 2 which I was expecting 3. I've tried multiple other patterns 
> > and it seems the answer is always 2. Which leads me to believe I'm doing 
> > something wrong.  Any assistance appreciated.
> > 
> > 
> > Thanks,
> > 
> > 
> > Edgar
>
> The code is looking for a match of the pattern in the string, not all matches
> of the pattern in the string. It also makes the (IMO, surprising) decision
> that not having any capture groups in the pattern implies capturing the whole
> pattern. The whole string goes into the first match.
>
> So, in your case, you're matching:
>
> "the quick the brown the fox";
>  ^^^
>
> Accordingly:
>
> matches.sm_match[0] = "the quick the brown the fox"
> matches.sm_match[1] = "the"
>
> If you had 'quick', you'd get similar behavior:
>
> "the quick the brown the fox";
>      ^^^^^
>
> Equivalently, putting the whole pattern in '()' will match the same thing:
>
> pattern = "(quick)"
>
> But multiple parens will match their substrings:
>
> pattern = "(qu)ick (the)"
>
> "the quick the brown the fox";
>      ^^    ^^^
> matches.sm_match[0] = "the quick the brown the fox"
> matches.sm_match[1] = "qu"
> matches.sm_match[2] = "the"
>
> The choice to capture implicitly, I think, is confusing, but the behavior
> seems to me to be correct.
>
> -- 
>     Ori Bernstein

Thanks. Makes sense now. Probably would have figured it out for myself if I'd 
have printed out matches.sm_match[0], etc. Live and learn.

Edgar


Re: patterns.c question or possible bug

2018-01-30 Thread Hiltjo Posthuma
On Tue, Jan 30, 2018 at 07:48:17AM +0100, Otto Moerbeek wrote:
> On Mon, Jan 29, 2018 at 11:23:18PM -0600, Edgar Pettijohn wrote:
> 
> > I'm trying to use patterns.c for some pattern matching. The manual mentions
> > captures using "()" around what you want to capture.  I don't see how to get
> > at the data though.  Here is a sample program.
> > 
> > #include <stdio.h>
> > #include "patterns.h"
> > 
> > int
> > main(int argc, char *argv[])
> > {
> >     const char *errstr = NULL;
> >     const char *string = "the quick the brown the fox";
> >     const char *pattern = "the";
> >     int ret;
> >     struct str_match match;
> > 
> >     ret = str_match(string, pattern, &match, &errstr);
> > 
> >     if (errstr != NULL)
> >         printf("%s\n", errstr);
> >     else
> >         printf("number of matches %d\n", match.sm_nmatch);
> > 
> >     return 0;
> > }
> > 
> > It prints 2 which I was expecting 3. I've tried multiple other patterns and
> > it seems the answer is always 2. Which leads me to believe I'm doing
> > something wrong.  Any assistance appreciated.
> > 
> > 
> > Thanks,
> > 
> > 
> > Edgar
> 
> Hmm, str_match() isn't a function in any OpenBSD API. So I have no
> idea what function you are talking about.
> 
>   -Otto
> 

It is in httpd's patterns.c, which is based on the Lua pattern matching code.

-- 
Kind regards,
Hiltjo



Re: patterns.c question or possible bug

2018-01-29 Thread Ori Bernstein
On Mon, 29 Jan 2018 23:23:18 -0600, Edgar Pettijohn wrote:

> I'm trying to use patterns.c for some pattern matching. The manual 
> mentions captures using "()" around what you want to capture.  I don't 
> see how to get at the data though.  Here is a sample program.
> 
> #include <stdio.h>
> #include "patterns.h"
> 
> int
> main(int argc, char *argv[])
> {
>     const char *errstr = NULL;
>     const char *string = "the quick the brown the fox";
>     const char *pattern = "the";
>     int ret;
>     struct str_match match;
> 
>     ret = str_match(string, pattern, &match, &errstr);
> 
>     if (errstr != NULL)
>         printf("%s\n", errstr);
>     else
>         printf("number of matches %d\n", match.sm_nmatch);
> 
>     return 0;
> }
> 
> It prints 2 which I was expecting 3. I've tried multiple other patterns 
> and it seems the answer is always 2. Which leads me to believe I'm doing 
> something wrong.  Any assistance appreciated.
> 
> 
> Thanks,
> 
> 
> Edgar

The code is looking for a match of the pattern in the string, not all matches
of the pattern in the string. It also makes the (IMO, surprising) decision
that not having any capture groups in the pattern implies capturing the whole
pattern. The whole string goes into the first match.

So, in your case, you're matching:

"the quick the brown the fox";
 ^^^

Accordingly:

matches.sm_match[0] = "the quick the brown the fox"
matches.sm_match[1] = "the"

If you had 'quick', you'd get similar behavior:

"the quick the brown the fox";
     ^^^^^

Equivalently, putting the whole pattern in '()' will match the same thing:

pattern = "(quick)"

But multiple parens will match their substrings:

pattern = "(qu)ick (the)"

"the quick the brown the fox";
     ^^    ^^^
matches.sm_match[0] = "the quick the brown the fox"
matches.sm_match[1] = "qu"
matches.sm_match[2] = "the"

The choice to capture implicitly, I think, is confusing, but the behavior
seems to me to be correct.

-- 
Ori Bernstein



Re: patterns.c question or possible bug

2018-01-29 Thread Otto Moerbeek
On Mon, Jan 29, 2018 at 11:23:18PM -0600, Edgar Pettijohn wrote:

> I'm trying to use patterns.c for some pattern matching. The manual mentions
> captures using "()" around what you want to capture.  I don't see how to get
> at the data though.  Here is a sample program.
> 
> #include <stdio.h>
> #include "patterns.h"
> 
> int
> main(int argc, char *argv[])
> {
>     const char *errstr = NULL;
>     const char *string = "the quick the brown the fox";
>     const char *pattern = "the";
>     int ret;
>     struct str_match match;
> 
>     ret = str_match(string, pattern, &match, &errstr);
> 
>     if (errstr != NULL)
>         printf("%s\n", errstr);
>     else
>         printf("number of matches %d\n", match.sm_nmatch);
> 
>     return 0;
> }
> 
> It prints 2 which I was expecting 3. I've tried multiple other patterns and
> it seems the answer is always 2. Which leads me to believe I'm doing
> something wrong.  Any assistance appreciated.
> 
> 
> Thanks,
> 
> 
> Edgar

Hmm, str_match() isn't a function in any OpenBSD API. So I have no
idea what function you are talking about.

-Otto



patterns.c question or possible bug

2018-01-29 Thread Edgar Pettijohn
I'm trying to use patterns.c for some pattern matching. The manual 
mentions captures using "()" around what you want to capture.  I don't 
see how to get at the data though.  Here is a sample program.


#include <stdio.h>
#include "patterns.h"

int
main(int argc, char *argv[])
{
    const char *errstr = NULL;
    const char *string = "the quick the brown the fox";
    const char *pattern = "the";
    int ret;
    struct str_match match;

    ret = str_match(string, pattern, &match, &errstr);

    if (errstr != NULL)
        printf("%s\n", errstr);
    else
        printf("number of matches %d\n", match.sm_nmatch);

    return 0;
}

It prints 2 which I was expecting 3. I've tried multiple other patterns 
and it seems the answer is always 2. Which leads me to believe I'm doing 
something wrong.  Any assistance appreciated.



Thanks,


Edgar