subject:"\[NTG\-context\] Off\-topic\: Struggles with LPEG grammar"

Re: [NTG-context] Off-topic: Struggles with LPEG grammar

2020-12-21 Thread Hans Hagen


On 12/21/2020 3:10 PM, Mojca Miklavec wrote:


 lpeg.P('b') + lpeg.P('bb') + lpeg.P('bbb')

always put the longest (for overlapping) first, in this case you could do:

P('b')^-3

at most 3

-
  Hans Hagen | PRAGMA ADE
  Ridderstraat 27 | 8061 GH Hasselt | The Netherlands
   tel: 038 477 53 69 | www.pragma-ade.nl | www.pragma-pod.nl
-
___
If your question is of interest to others as well, please add an entry to the 
Wiki!

maillist : ntg-context@ntg.nl / http://www.ntg.nl/mailman/listinfo/ntg-context
webpage  : http://www.pragma-ade.nl / http://context.aanhet.net
archive  : https://bitbucket.org/phg/context-mirror/commits/
wiki : http://contextgarden.net
___

Re: [NTG-context] Off-topic: Struggles with LPEG grammar

2020-12-21 Thread Taco Hoekwater

Hi,

Also good reading are the first sections in Roberto’s paper:

  http://www.inf.puc-rio.br/~roberto/docs/peg.pdf

> 
> In reality it just means that I was trying to add a new rule to solve
> the second part of the puzzle (hidden on the website until you solve
> the first part), which read something like
>6: 3 | 3 6
> which would in theory be translated into something like
>r6 = lpeg.V"r3" + lpeg.V"r3" * lpeg.V"r6",
> if PEGs worked the way I imagined they worked, that is. Apparently they don't 
> :)

Try this, blind guess: 

r6 = lpeg.V"r3" * lpeg.V”r6” + lpeg.V"r3",

Best wishes,
Taco


Taco Hoekwater
Elvenkind BV




___
If your question is of interest to others as well, please add an entry to the 
Wiki!

maillist : ntg-context@ntg.nl / http://www.ntg.nl/mailman/listinfo/ntg-context
webpage  : http://www.pragma-ade.nl / http://context.aanhet.net
archive  : https://bitbucket.org/phg/context-mirror/commits/
wiki : http://contextgarden.net
___

Re: [NTG-context] Off-topic: Struggles with LPEG grammar

2020-12-21 Thread Mojca Miklavec

Hi,

Thanks to both Taco and Arthur for clarifying this and pointing out
the difference between PEG and PCRE.

I'll put some links like
https://en.wikipedia.org/wiki/Parsing_expression_grammar
https://en.wikipedia.org/wiki/Perl_Compatible_Regular_Expressions
on my reading list and try to understand it more thoroughly than I do
at the moment.
After that I'll probably need to re-read Taco's answer a couple more
times, and probably write some parsing grammar as an exercise, but
I'll eventually get there :)

Just a correction of my previous statement. The following doesn't
really work either:
lpeg.P('b') + lpeg.P('bb') + lpeg.P('bbb')
it just happened to work in one particular case on my minimal example,
but it doesn't help in general.
That's probably expected based on what Arthur and Taco explained to me.

> In your example, the fact that you even considered ^1 means that you were 
> still thinking too much in terms of regular expressions,

In reality it just means that I was trying to add a new rule to solve
the second part of the puzzle (hidden on the website until you solve
the first part), which read something like
6: 3 | 3 6
which would in theory be translated into something like
r6 = lpeg.V"r3" + lpeg.V"r3" * lpeg.V"r6",
if PEGs worked the way I imagined they worked, that is. Apparently they don't :)

It seemed so obvious, but it would be too easy if it was true ;),
so it will force me to solve it "ab initio" (as it was initially meant) anyway.

Thanks a lot to everyone for the extra patience to explain me those
basic principles & differences,
Mojca
___
If your question is of interest to others as well, please add an entry to the 
Wiki!

maillist : ntg-context@ntg.nl / http://www.ntg.nl/mailman/listinfo/ntg-context
webpage  : http://www.pragma-ade.nl / http://context.aanhet.net
archive  : https://bitbucket.org/phg/context-mirror/commits/
wiki : http://contextgarden.net
___

Re: [NTG-context] Off-topic: Struggles with LPEG grammar

2020-12-21 Thread Hans Hagen


On 12/21/2020 1:16 PM, Mojca Miklavec wrote:


I'm sorry for being slightly off-topic here, but this list might still
be the best place to resolve lpeg-related questions :)
spoiler attached ... doesn't deserve a beauty contest price not one for 
performence (not much different from your btw) but maybe easier for 
testing all kind of variants


(i cannot use context helpers of course)

Hans


-
  Hans Hagen | PRAGMA ADE
  Ridderstraat 27 | 8061 GH Hasselt | The Netherlands
   tel: 038 477 53 69 | www.pragma-ade.nl | www.pragma-pod.nl
-
___
If your question is of interest to others as well, please add an entry to the 
Wiki!

maillist : ntg-context@ntg.nl / http://www.ntg.nl/mailman/listinfo/ntg-context
webpage  : http://www.pragma-ade.nl / http://context.aanhet.net
archive  : https://bitbucket.org/phg/context-mirror/commits/
wiki : http://contextgarden.net
___

Re: [NTG-context] Off-topic: Struggles with LPEG grammar

2020-12-21 Thread luigi scarso

On Mon, Dec 21, 2020 at 2:36 PM Taco Hoekwater  wrote:

>
>
> > On 21 Dec 2020, at 14:08, Mojca Miklavec 
> wrote:
> >
> > Dear Taco,
> >
> > On Mon, 21 Dec 2020 at 13:46, Taco Hoekwater wrote:
> >>> On 21 Dec 2020, at 13:16, Mojca Miklavec wrote:
> >>>
> >>> My only explanation would be that perhaps "^1" is so greedy that the
> >>> rest of the pattern doesn't get found. But I don't want to believe
> >>> that explanation.
> >>
> >> Which (of course) means that that is exactly what happens ;)
> >>
> >> The ones that match are
> >>
> >> ababbb (a (ba+bb) b) => r4 r1(r3(r5 r4) r2(r5 r5)) r5
> >> abbbab (a (bb+ba) b) => r4 r1(r2(r5 r5) r3(r5 r4)) r5
> >>
> >> With the ^1, in the “bb” cases the first “b” eats all three “b”s:
> >>
> >> ababbb fails the r5 at the end
> >>
> >> abbbab fails the first r2 already (since the second r5 therein never
> happens)
> >
> > Is this a deliberate choice, a limitation of the grammar
> > expressiveness, some misuse on my side that could/should/needs to be
> > implemented in a different way, or does it count as a "bug" on the
> > lpeg side?
> >
> > For example, I wouldn't expect a regexp "b+b" to fail on "bbb" just
> > because "b+" would eat all three "b"s at once (the regexp "b+b" in
> > fact finds "bbb", and I would expect a less-than-totally-greedy hit
> > with lpeg as well). Or is my reasoning wrong here?
>
> PEGs are greedy by design, which is a consequence of the fact that PEGS do
> not backtrack, which goes back to the underlying assumptive rule of PEGs
> that there is one (and only one!) ‘correct’ way to parse the input.
> Allowing backtracking destroys that assumption and by doing so would
> complicate the system to a level that would make it comparable to PCRE
> (with all the associated penalties on processing speed and a much greater
> codebase).
>

greedy vs non-greedy is one of the things that I always keep in mind when I
start with lpeg, and regularly I fail to apply -- because I think in the
"perl regex way".
Anyway,
http://www.gammon.com.au/lpeg
has some good lines:
e.g. this one (from the lpeg site) find the pattern anywhere in the line:

function anywhere (p)
  return lpeg.P { p + 1 * lpeg.V(1) }
end
print (lpeg.match (anywhere ("dog"), target))

-- 
luigi
___
If your question is of interest to others as well, please add an entry to the 
Wiki!

maillist : ntg-context@ntg.nl / http://www.ntg.nl/mailman/listinfo/ntg-context
webpage  : http://www.pragma-ade.nl / http://context.aanhet.net
archive  : https://bitbucket.org/phg/context-mirror/commits/
wiki : http://contextgarden.net
___

Re: [NTG-context] Off-topic: Struggles with LPEG grammar

2020-12-21 Thread Arthur Reutenauer

On Mon, Dec 21, 2020 at 02:08:11PM +0100, Mojca Miklavec wrote:
> Is this a deliberate choice, a limitation of the grammar
> expressiveness, some misuse on my side that could/should/needs to be
> implemented in a different way, or does it count as a "bug" on the
> lpeg side?

  That’s definitely the way parsing expression grammars work.  Or rather:
that’s the way LPeg (the only parsing expression grammar) works :-)

> For example, I wouldn't expect a regexp "b+b" to fail on "bbb" just
> because "b+" would eat all three "b"s at once (the regexp "b+b" in
> fact finds "bbb", and I would expect a less-than-totally-greedy hit
> with lpeg as well). Or is my reasoning wrong here?

  Your reasoning is correct, but applies to regular expressions, not
parsing-expression grammars.

Arthur
___
If your question is of interest to others as well, please add an entry to the 
Wiki!

maillist : ntg-context@ntg.nl / http://www.ntg.nl/mailman/listinfo/ntg-context
webpage  : http://www.pragma-ade.nl / http://context.aanhet.net
archive  : https://bitbucket.org/phg/context-mirror/commits/
wiki : http://contextgarden.net
___

Re: [NTG-context] Off-topic: Struggles with LPEG grammar

2020-12-21 Thread Taco Hoekwater

> On 21 Dec 2020, at 14:08, Mojca Miklavec  
> wrote:
> 
> Dear Taco,
> 
> On Mon, 21 Dec 2020 at 13:46, Taco Hoekwater wrote:
>>> On 21 Dec 2020, at 13:16, Mojca Miklavec wrote:
>>> 
>>> My only explanation would be that perhaps "^1" is so greedy that the
>>> rest of the pattern doesn't get found. But I don't want to believe
>>> that explanation.
>> 
>> Which (of course) means that that is exactly what happens ;)
>> 
>> The ones that match are
>> 
>> ababbb (a (ba+bb) b) => r4 r1(r3(r5 r4) r2(r5 r5)) r5
>> abbbab (a (bb+ba) b) => r4 r1(r2(r5 r5) r3(r5 r4)) r5
>> 
>> With the ^1, in the “bb” cases the first “b” eats all three “b”s:
>> 
>> ababbb fails the r5 at the end
>> 
>> abbbab fails the first r2 already (since the second r5 therein never happens)
> 
> Is this a deliberate choice, a limitation of the grammar
> expressiveness, some misuse on my side that could/should/needs to be
> implemented in a different way, or does it count as a "bug" on the
> lpeg side?
> 
> For example, I wouldn't expect a regexp "b+b" to fail on "bbb" just
> because "b+" would eat all three "b"s at once (the regexp "b+b" in
> fact finds "bbb", and I would expect a less-than-totally-greedy hit
> with lpeg as well). Or is my reasoning wrong here?

PEGs are greedy by design, which is a consequence of the fact that PEGS do not 
backtrack, which goes back to the underlying assumptive rule of PEGs that there 
is one (and only one!) ‘correct’ way to parse the input. Allowing backtracking 
destroys that assumption and by doing so would complicate the system to a level 
that would make it comparable to PCRE (with all the associated penalties on 
processing speed and a much greater codebase).

Another way of thinking of it (perhaps that helps): PEGs are _declarative_ on 
the input, whereas REs are _interpretive_.

Yet another way of thinking about it: PEGs rigorously define the (programming) 
language of the input.

It takes a bit of rewiring your brain to get comfortable with PEGs when you are 
used to REs, and unpredictable input is much harder to tackle in PEGs. OTOH, 
using PEGs are much better if there is an explicit grammar.

In your example, the fact that you even considered ^1 means that you were still 
thinking too much in terms of regular expressions, but I know it is very hard 
to learn how to do something that is _only a little_ different from something 
you know well already.

Best wishes,
Taco

___
If your question is of interest to others as well, please add an entry to the 
Wiki!

maillist : ntg-context@ntg.nl / http://www.ntg.nl/mailman/listinfo/ntg-context
webpage  : http://www.pragma-ade.nl / http://context.aanhet.net
archive  : https://bitbucket.org/phg/context-mirror/commits/
wiki : http://contextgarden.net
___

Re: [NTG-context] Off-topic: Struggles with LPEG grammar

2020-12-21 Thread Mojca Miklavec

Dear Taco,

On Mon, 21 Dec 2020 at 13:46, Taco Hoekwater wrote:
> > On 21 Dec 2020, at 13:16, Mojca Miklavec wrote:
> >
> > My only explanation would be that perhaps "^1" is so greedy that the
> > rest of the pattern doesn't get found. But I don't want to believe
> > that explanation.
>
> Which (of course) means that that is exactly what happens ;)
>
> The ones that match are
>
> ababbb (a (ba+bb) b) => r4 r1(r3(r5 r4) r2(r5 r5)) r5
> abbbab (a (bb+ba) b) => r4 r1(r2(r5 r5) r3(r5 r4)) r5
>
> With the ^1, in the “bb” cases the first “b” eats all three “b”s:
>
> ababbb fails the r5 at the end
>
> abbbab fails the first r2 already (since the second r5 therein never happens)

Is this a deliberate choice, a limitation of the grammar
expressiveness, some misuse on my side that could/should/needs to be
implemented in a different way, or does it count as a "bug" on the
lpeg side?

For example, I wouldn't expect a regexp "b+b" to fail on "bbb" just
because "b+" would eat all three "b"s at once (the regexp "b+b" in
fact finds "bbb", and I would expect a less-than-totally-greedy hit
with lpeg as well). Or is my reasoning wrong here?

It certainly works if I use
lpeg.P('b') + lpeg.P('bb') + lpeg.P('bbb') -- and a couple more
(as long as I can predict the maximum length)
but that's not really a viable workaround in general.

Thank you,
Mojca

PS: sorry, a tiny bug also crippled into my sample code. The line
after matching the 'parser1' should have used 'total1' rather than
'total':
if lpeg.match(parser1, s) then
total1 = total1 + 1
end
___
If your question is of interest to others as well, please add an entry to the 
Wiki!

maillist : ntg-context@ntg.nl / http://www.ntg.nl/mailman/listinfo/ntg-context
webpage  : http://www.pragma-ade.nl / http://context.aanhet.net
archive  : https://bitbucket.org/phg/context-mirror/commits/
wiki : http://contextgarden.net
___

Re: [NTG-context] Off-topic: Struggles with LPEG grammar

2020-12-21 Thread Taco Hoekwater



> On 21 Dec 2020, at 13:46, Taco Hoekwater  wrote:
> 
> 
> 
>> On 21 Dec 2020, at 13:16, Mojca Miklavec  
>> wrote:
>> 
>> My only explanation would be that perhaps "^1" is so greedy that the
>> rest of the pattern doesn't get found. But I don't want to believe
>> that explanation.
> 
> Which (of course) means that that is exactly what happens ;)
> 
> The ones that match are
> 
> ababbb (a (ba+bb) b) => r4 r1(r3(r5 r4) r2(r5 r5)) r5
> abbbab (a (bb+ba) b) => r4 r1(r2(r5 r5) r3(r5 r4)) r5
> 
> With the ^1, in the “bb” cases the first “b” eats all three “b”s: 
> 
> ababbb fails the r5 at the end

Sorry, that was wrong, it fails at the second r5 in the r2 as well, for the same
reason as below.

> 
> abbbab fails the first r2 already (since the second r5 therein never happens)
> 
> Best wishes,
> Taco

___
If your question is of interest to others as well, please add an entry to the 
Wiki!

maillist : ntg-context@ntg.nl / http://www.ntg.nl/mailman/listinfo/ntg-context
webpage  : http://www.pragma-ade.nl / http://context.aanhet.net
archive  : https://bitbucket.org/phg/context-mirror/commits/
wiki : http://contextgarden.net
___

Re: [NTG-context] Off-topic: Struggles with LPEG grammar

2020-12-21 Thread Taco Hoekwater



> On 21 Dec 2020, at 13:16, Mojca Miklavec  
> wrote:
> 
> My only explanation would be that perhaps "^1" is so greedy that the
> rest of the pattern doesn't get found. But I don't want to believe
> that explanation.

Which (of course) means that that is exactly what happens ;)

The ones that match are

ababbb (a (ba+bb) b) => r4 r1(r3(r5 r4) r2(r5 r5)) r5
abbbab (a (bb+ba) b) => r4 r1(r2(r5 r5) r3(r5 r4)) r5

With the ^1, in the “bb” cases the first “b” eats all three “b”s: 

ababbb fails the r5 at the end

abbbab fails the first r2 already (since the second r5 therein never happens)

Best wishes,
Taco

___
If your question is of interest to others as well, please add an entry to the 
Wiki!

maillist : ntg-context@ntg.nl / http://www.ntg.nl/mailman/listinfo/ntg-context
webpage  : http://www.pragma-ade.nl / http://context.aanhet.net
archive  : https://bitbucket.org/phg/context-mirror/commits/
wiki : http://contextgarden.net
___

[NTG-context] Off-topic: Struggles with LPEG grammar

2020-12-21 Thread Mojca Miklavec

Hi,

I'm sorry for being slightly off-topic here, but this list might still
be the best place to resolve lpeg-related questions :)

0.) Disclaimer: the challenge that triggered this curiosity came from
Advent of Code 2020. In case you are taking part and you wan't to
avoid spoilers, please stop reading here! (You have been warned.)
https://adventofcode.com/2020/day/19

1.) My question: I don't understand why I cannot get ^1 to work "as
advertised". Isn't this supposed to mean "one or more occurences of
the pattern"? If I change "lpeg.P('b')" into "lpeg.P('b')^1" in the
example below, the strings that match the initial grammar no longer
match the modified grammar. (I would naively imagine that the secord
pattern would get more rather than less matches.)

2.) Background: Most definitely the task on that page is supposed to
be solved in a different way, but many people use Advent of Code as an
opportunity to learn a new programming language, and when I read the
task description, I wanted to figure out if I could solve it using the
cute little lpeg. My initial attempt worked correctly (at least to
solve the first puzzle), but then I realized that I cannot easily
change the pattern from "matches a letter b" into "matches any number
of b-s", and I fail to figure out why. Any hints would be greatly
appreciated.

Below is a not-so-minimal example. I can certainly try to reduce it
further, but I would first like to ask whether I'm doing something
obviously wrong by trying to replace
r5 = lpeg.P('b')
by
r5 = lpeg.P('b')^1
in order to allow more than one occurrences of the letter b?
My only explanation would be that perhaps "^1" is so greedy that the
rest of the pattern doesn't get found. But I don't want to believe
that explanation.


local lpeg = require "lpeg"

--[[
0: 4 1 5
1: 2 3 | 3 2
2: 4 4 | 5 5
3: 4 5 | 5 4
4: "a"
5: "b"
]]--

local parser = lpeg.P{
"r0";
r0 = lpeg.V"r4" * lpeg.V"r1" * lpeg.V"r5",
r1 = lpeg.V"r2" * lpeg.V"r3" + lpeg.V"r3" * lpeg.V"r2",
r2 = lpeg.V"r4" * lpeg.V"r4" + lpeg.V"r5" * lpeg.V"r5",
r3 = lpeg.V"r4" * lpeg.V"r5" + lpeg.V"r5" * lpeg.V"r4",
r4 = lpeg.P('a'),
r5 = lpeg.P('b'),
} * -1

local parser1 = lpeg.P{
"r0";
r0 = lpeg.V"r4" * lpeg.V"r1" * lpeg.V"r5",
r1 = lpeg.V"r2" * lpeg.V"r3" + lpeg.V"r3" * lpeg.V"r2",
r2 = lpeg.V"r4" * lpeg.V"r4" + lpeg.V"r5" * lpeg.V"r5",
r3 = lpeg.V"r4" * lpeg.V"r5" + lpeg.V"r5" * lpeg.V"r4",
r4 = lpeg.P('a'),
r5 = lpeg.P('b')^1, -- modified part that doesn't seem to work
  } * -1

strings = {
  "ababbb",
  "bababa",
  "abbbab",
  "aaabbb",
  "bbb",
};

local total = 0
local total1 = 0
for _, s in ipairs(strings) do
if lpeg.match(parser, s) then
total = total + 1
end
if lpeg.match(parser1, s) then
total = total + 1
end
end
print('total:', total, total1)


In this example, total=2, total1=0.
What I don't understand is why total1 is zero.

Thank you,
Mojca
___
If your question is of interest to others as well, please add an entry to the 
Wiki!

maillist : ntg-context@ntg.nl / http://www.ntg.nl/mailman/listinfo/ntg-context
webpage  : http://www.pragma-ade.nl / http://context.aanhet.net
archive  : https://bitbucket.org/phg/context-mirror/commits/
wiki : http://contextgarden.net
___

Re: [NTG-context] Off-topic: Struggles with LPEG grammar

Re: [NTG-context] Off-topic: Struggles with LPEG grammar

Re: [NTG-context] Off-topic: Struggles with LPEG grammar

Re: [NTG-context] Off-topic: Struggles with LPEG grammar

Re: [NTG-context] Off-topic: Struggles with LPEG grammar

Re: [NTG-context] Off-topic: Struggles with LPEG grammar

Re: [NTG-context] Off-topic: Struggles with LPEG grammar

Re: [NTG-context] Off-topic: Struggles with LPEG grammar

Re: [NTG-context] Off-topic: Struggles with LPEG grammar

Re: [NTG-context] Off-topic: Struggles with LPEG grammar

[NTG-context] Off-topic: Struggles with LPEG grammar

11 matches

Site Navigation

Mail list logo

Footer information