Re: Perl6 grammars -- Parsing english

Lard Farnwell Wed, 04 Jul 2012 06:10:12 -0700

Hi Moritz,

Thanks, that was interesting. My investigation into grammars took a while, but 
here are the results thus far:


> Grammar rules and regexes are just methods…

I hadn't thought about what grammars and rules actually are before. This 
inspired me to try:

---------------------------
grammar Gram{
    has $.x;
    
    rule TOP{
        {say $.x}
    }
    
    method test{
        say $.x
    }
}
my Gram $test .= new(:x("hello"));
$test.parse("ignore this");
$test.test;
say $test.TOP;
---------------------------
which outputs:
Any()               #output of TOP in parse
hello               #output of test.test
hello               #outputted on direct call to rule
Gram.new(x => Any)  #the return value of $test.TOP


So rules can't interpolate their grammar's attributes when called by 
'parse' but can when called directly as a method. Also, a rule called directly 
as a method returns a Gram object (apparently a fresh one, since its x is Any 
rather than "hello"). I'm not sure whether either of these things is intended…
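
One workaround that might sidestep the attribute problem is a dynamic variable 
instead of an attribute. This is only a sketch: the name $*greeting is made up 
for illustration, and I'm assuming a dynamic variable declared before the call 
to parse is visible inside the rule's code block.

---------------------------
grammar Gram {
    token TOP {
        { say $*greeting }   # code block: looks up the caller's dynamic variable
        .*                   # then match the rest of the input
    }
}

my $*greeting = "hello";     # hypothetical name, declared in the calling scope
Gram.parse("ignore this");
---------------------------

That keeps the state outside the grammar object, so it shouldn't matter which 
Gram instance parse actually runs the rule on.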

=============================

I also tried rules with arguments. That worked via parse but not when calling 
the rule directly as a method.

---------------------------
grammar Gram{
    
    rule TOP{
        <test_rule('hello')>
    }

    rule test_rule($a){
      $a
    }
}

my Gram $test .= new();
$test.parse("hello");      #returns true
$test.test_rule("hello");  #error
---------------------------

The error is:

Invalid operation on null string
  in any !LITERAL at src/stage2/QRegex.nqp:653
  in method INTERPOLATE at src/gen/CORE.setting:9731
 (at the line where test_rule starts)
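
My guess is that a rule called directly on a fresh object has no string to 
match against, hence the null string. If parse passes arguments through to the 
starting rule, something like the following might work. It assumes parse 
accepts an :args named argument next to :rule; I haven't checked that this 
Rakudo build supports it.

---------------------------
grammar Gram {
    token test_rule($a) { $a }
}

# :rule picks the starting rule, :args supplies its positional arguments
say Gram.parse("hello", :rule<test_rule>, :args(\("hello")));
---------------------------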

=============================

Ok now to try the things you mentioned:

First I tried using a parcel instead of an array as the role parameter (an 
array resulted in an error):
---------------------------
role roley [$foo]{
    token tokeny { $foo }
}

grammar gram {
    token TOP { <tokeny> }
}
my gram $gram .= new does roley[('this','or', 'that')];
$gram.parse('this or that');  #returns true
---------------------------

So parcels get joined with spaces into one token.

=============================

Now to try the roundabout way:

---------------------------
role roley [$foo]{
    token tokeny:sym<dynamic> { $foo }
}

grammar gram {
    token TOP { <tokeny>[\ <tokeny>]* }
    proto token tokeny {*}
}

my gram $gram .= new;
$gram does roley[$_] for <that this>;
$gram.parse('this'); #matches
$gram.parse('that'); #nope
---------------------------

Each iteration overwrites what 'tokeny' resolves to rather than adding another 
alternative (symmetrically? Is that what sym is short for?).
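
For comparison, here is a minimal static sketch (the names Demo and word are 
made up) in which each candidate has its own :sym name. As far as I understand 
it, the part after :sym is part of the method's name, so two candidates only 
coexist if those names differ. Both of my role instantiations define 
tokeny:sym<dynamic>, which would explain why the second mixin replaces the 
first.

---------------------------
grammar Demo {
    token TOP { <word> [ ' ' <word> ]* }
    proto token word {*}
    token word:sym<this> { <sym> }   # <sym> matches the literal 'this'
    token word:sym<that> { <sym> }   # a second candidate under a different name
}

say so Demo.parse('this that');      # I'd expect both words to match here
---------------------------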

============================

One more thing I found which seems to be a bug. I defined my nouns/pronouns 
like:

---------------------------
token PN:sym<John> { <.sym> } #The dot should mean it doesn't get captured
token N:sym<ball> { <.sym> }
---------------------------

when my grammar parses this it ends up with a tree like this:
---------------------------
 sentence => q[John hit the ball]
  statement => q[John hit the ball]
   NP => q[John]
    PN => q[John]
      => q[John]
   VP => q[hit the ball]
    verb => q[hit]
      => q[hit]
    NP => q[the ball]
     D => q[the]
       => q[the]
     N => q[ball]
       => q[ball]
---------------------------

Notice the empty slots on the left. Rather than not capturing the <sym>, the 
<.sym> just means it doesn't capture its name :S

============================

So after all this I have a much better understanding of what grammars really 
are but I'm still confused about a few things:

Grammars are like classes. They are special because they have a method called 
'parse' which applies a rule/token definition (regex) called TOP (or whatever 
is set by the :rule argument to parse).
Q: Are grammars meant to be able to have attributes like classes, and are they 
meant to be able to interpolate them into their rules/tokens?

Rules and tokens are just special kinds of methods whose body is a regex rather 
than Perl 6 code.
Q: What is the meaning of the return values of tokens/rules when called as 
methods?
Q: Is it possible to write a normal method that conforms to the same interface 
as rules/tokens (whatever that is), i.e. one we can use as <normal_method> in 
rules/tokens, which is passed arguments and somehow matches, sets the position, 
etc.?
Q: Are rules/tokens meant to be able to have arguments like methods, and if so, 
how do they fit in?

Grammars don't check whether the things in their tokens/rules, like <foo>, are 
actually defined until it comes time to call them.
Q: Is this the way it's meant to be?

I saw your post on doc.perl6.org docs. If I can get my head around all this I 
would be happy to help document grammars!

Cheers,

Lard


On 27/06/2012, at 12:49 AM, Moritz Lenz wrote:

> 
> 
> On 06/26/2012 02:04 PM, Lard Farnwell wrote:
>> Hi guys,
>> 
>> To understand and play around with perl6 grammars I was trying to do a 
>> simple NLP parts of speech parser in perl6 grammars. This is sort of what I 
>> did: 
>> 
>> ---------------------------
>> grammar Sentence{
>>        proto rule VP {*}
>>        proto rule NP {*}
>>      
>>      rule sentence {
>>              <imperative>|<statement>
>>      }
>>       rule imperative {<VP>}
>>       rule statement {<NP> <VP>}
>> }
>> 
>> grammar VerbPhrase is Sentence{
>>  rule VP:sym<hit>  {<sym>  <NP>}
>>  rule VP:sym<kill> {<sym>  <NP>}
>> }
>> 
>> grammar NounPhrase is Sentence{
>>      #define NP:sym etc
>> }
>> 
>> 
>> grammar English is NounPhrase is VerbPhrase {
>>      rule TOP {
>>              <Sentence>[\. <Sentence>]*
>>        }
>> }
>> --------------------------------
>> 
>> So in case you don't get it, A sentence is made up of phrases which in turn 
>> can be made up on other phrases. And English is made up of Sentences.
>> This sort of thing works but doesn't make much sense.
>> 
>> The obvious problem is that to get the correct definitions of the proto 
>> rules in Sentence I have to say "verbPhrase is Sentence" and then "English 
>> is NounPhrase is VerbPhrase etc" .  This makes me feel like I'm doing it 
>> wrong.
> 
> Indeed. The intended mechanism for code reuse in object oriented Perl 6
> code is role composition.
> 
> Grammar rules and regexes are just methods, so defining them in a role
> and applying it to a class sounds like a good idea to me.
> 
> role VerbPhrase {
>    rule VP { <verb> <NP> }
>    proto token verb  {*}
>    token verb:sym<hit>  { <sym> }
>    token verb:sym<kill> { <sym> }
> }
> 
> Define NounPhrase in a similar way, leave out the definition of NP and
> VP from Sentence, and then write
> 
> grammar English does NounPhrase does VerbPhrase is Sentence {
>    token TOP { ... }
> }
> 
> Role composition has much more transparent error modes than inheritance,
> and probably works better for you.
> 
> 
>> How do I build a flexible dynamic grammar in a OO sort of way. For example 
>> how could I do this so:
>> 
>> 1) I define all my phrase structures (NP,VP,PP etc) in their own file while 
>> still being able to use each other. VPs can be made of NPs and NPs 
>> can be made up of VPs. 
> 
> See above
> 
>> 2) Add to these definitions dynamically. For example, here I have defined 
>> "hit and kill" VPs. What if I wanted to add "dance" VP definition at run 
>> time?
> 
> In theory you can write
> 
> role VerbPhrases[@verbs] {
>     token verb:sym<dynamic> { @verbs }
>     # note that 'dynamic' has no special meaning here, but since
>     # we don't use <sym> in the regex body, it doesn't matter what
>     # we write
> }
> 
> And then instantiate your grammar as
> 
> my $g = English.new does VerbPhrases[<dance listen juggle ...>];
> my $match = $g.parse($yourstring);
> 
> But Rakudo doesn't yet properly handle array variables in regexes, so
> you have to write something like
> 
> role AdditionalVerbPhrase[$verb] {
>    token verb:sym<dynamic> { $verb };
> }
> 
> my $g = English.new;
> $g does AdditionalVerbPhrase[$_] for <dance listen juggle ...>;
> my $match = $g.parse(...);
> 
> I haven't tested it though.
> If you experiment with it, please report your findings here, I'm curious
> about what works right now. If it doesn't work, we can surely find some
> way to make it work by going through the meta object to add methods to
> the grammar.
> 
> Cheers,
> Moritz
