Hi Moritz,
Thanks, that was interesting. My investigation into grammars took a while, but
here are the results thus far:
> Grammar rules and regexes are just methods…
I hadn't thought about what a grammar and a rule actually were before. This
inspired me to try:
---------------------------
grammar Gram {
    has $.x;
    rule TOP {
        {say $.x}
    }
    method test {
        say $.x
    }
}
my Gram $test .= new(:x("hello"));
$test.parse("ignore this");
$test.test;
say $test.TOP;
---------------------------
which outputs:
Any() #output of TOP in parse
hello #output of test.test
hello #outputted on direct call to rule
Gram.new(x => Any) #the return value of $test.TOP
So rules can't interpolate their grammar's attributes when being called by
'parse' but can when called as a method. Also rules being called directly as
methods return the parent grammar. I'm not sure whether either of these things
is intended…
=============================
Also I tried rules with arguments and it worked from grammar->parse but not
from calling directly as a method.
---------------------------
grammar Gram {
    rule TOP {
        <test_rule('hello')>
    }
    rule test_rule($a) {
        $a
    }
}
my Gram $test .= new();
$test.parse("hello");     #returns true
$test.test_rule("hello"); #error
---------------------------
The error is:
Invalid operation on null string
in any !LITERAL at src/stage2/QRegex.nqp:653
in method INTERPOLATE at src/gen/CORE.setting:9731
(at the line where test_rule starts)
=============================
OK, now to try the things you mentioned:
First I tried using a parcel instead of an array as the role parameter (an array
resulted in an error):
---------------------------
role roley [$foo] {
    token tokeny { $foo }
}
grammar gram {
    token TOP { <tokeny> }
}
my gram $gram .= new does roley[('this','or', 'that')];
$gram.parse('this or that'); #returns true
---------------------------
So parcels get joined with spaces into one token
=============================
Now to try the roundabout way:
---------------------------
role roley [$foo] {
    token tokeny:sym<dynamic> { $foo }
}
grammar gram {
    token TOP { <tokeny>[\ <tokeny>]* }
    proto token tokeny {*}
}
my gram $gram .= new;
$gram does roley[$_] for <that this>;
$gram.parse('this'); #matches
$gram.parse('that'); #nope
---------------------------
Each iteration overwrites the previous one in terms of what 'tokeny' resolves
to rather than adding it (symmetrically? is that what sym is short for?)
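For comparison, here is a minimal sketch of the static case, where both candidates are declared inside the grammar itself (G and word are made-up names, not part of my parser). As far as I understand it, each :sym<...> variant is then a separate alternative of the proto, and <sym> inside the body matches the literal text between the angle brackets:

```raku
# Hypothetical grammar, for illustration only.
grammar G {
    token TOP { <word> [ ' ' <word> ]* }

    # The proto declares the family; each :sym<...> candidate adds one alternative.
    proto token word {*}
    token word:sym<this> { <sym> }   # <sym> matches the literal 'this'
    token word:sym<that> { <sym> }   # <sym> matches the literal 'that'
}

say so G.parse('this that');   # True: both candidates are available
say so G.parse('that');        # True
```

So with static declarations the candidates accumulate, which is the behaviour I was hoping to get from the repeated 'does' as well.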
============================
One more thing I found which seems to be a bug. I defined my nouns/pronouns
like:
---------------------------
token PN:sym<John> { <.sym> } #The dot should mean it doesn't get captured
token N:sym<ball> { <.sym> }
---------------------------
When my grammar parses a sentence, it ends up with a tree like this:
---------------------------
sentence => q[John hit the ball]
  statement => q[John hit the ball]
    NP => q[John]
      PN => q[John]
         => q[John]
    VP => q[hit the ball]
      verb => q[hit]
         => q[hit]
      NP => q[the ball]
        D => q[the]
         => q[the]
        N => q[ball]
         => q[ball]
---------------------------
Notice the empty slots on the left. Rather than not capturing the <sym>, the
<.sym> just means it doesn't capture its name :S
============================
So after all this I have a much better understanding of what grammars really
are but I'm still confused about a few things:
Grammars are like classes. They are special because they have a method called
'parse' which applies a rule/token definition (regex) called TOP (or whatever
is set by the :rule argument to parse).
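A minimal sketch of that, assuming a hypothetical grammar Greeting (not part of my parser): parse starts from TOP unless :rule names a different starting point.

```raku
# Hypothetical grammar, for illustration only.
grammar Greeting {
    token TOP  { <word> ' ' <word> }
    token word { \w+ }
}

say so Greeting.parse('hello world');         # True:  TOP is the default start
say so Greeting.parse('hello');               # False: TOP needs two words
say so Greeting.parse('hello', :rule<word>);  # True:  start from 'word' instead
```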
Q: Are grammars meant to be able to have attributes like classes and are they
meant to be able to interpolate them into their rules/token?
Rules and tokens are just special types of methods whose body is a regex rather
than Perl 6 code.
Q: What is the meaning of the return values of tokens/rules when called as
methods?
Q: Is it possible to write a normal method that conforms to the same interface
as rules/tokens (whatever that is), i.e. one that we can use as <normal_method> in
rules/tokens, which is passed arguments and somehow matches and sets the position
etc.?
Q: Are rules/tokens meant to be able to have arguments like methods, and if so,
how do they fit in?
Grammars don't check whether the things in their tokens/rules like <foo> are
actually defined until it comes time to call them.
Q: Is this the way it's meant to be?
I saw your post on doc.perl6.org docs. If I can get my head around all this I
would be happy to help document grammars!
Cheers,
Lard
On 27/06/2012, at 12:49 AM, Moritz Lenz wrote:
>
>
> On 06/26/2012 02:04 PM, Lard Farnwell wrote:
>> Hi guys,
>>
>> To understand and play around with perl6 grammars I was trying to do a
>> simple NLP parts of speech parser in perl6 grammars. This is sort of what I
>> did:
>>
>> ---------------------------
>> grammar Sentence{
>> proto rule VP {*}
>> proto rule NP {*}
>>
>> rule sentence {
>> <imperative>|<statement>
>> }
>> rule imperative {<VP>}
>> rule statement {<NP> <VP>}
>> }
>>
>> grammar VerbPhrase is Sentence{
>> rule VP:sym<hit> {<sym> <NP>}
>> rule VP:sym<kill> {<sym> <NP>}
>> }
>>
>> grammar NounPhrase is Sentence{
>> #define NP:sym etc
>> }
>>
>>
>> grammar English is NounPhrase is VerbPhrase {
>> rule TOP {
>> <sentence>[\. <sentence>]*
>> }
>> }
>> --------------------------------
>>
>> So in case you don't get it, A sentence is made up of phrases which in turn
>> can be made up of other phrases. And English is made up of Sentences.
>> This sort of thing works but doesn't make much sense.
>>
>> The obvious problem is that to get the correct definitions of the proto
>> rules in Sentence I have to say "VerbPhrase is Sentence" and then "English
>> is NounPhrase is VerbPhrase etc" . This makes me feel like I'm doing it
>> wrong.
>
> Indeed. The intended mechanism for code reuse in object oriented Perl 6
> code is role composition.
>
> Grammar rules and regexes are just methods, so defining them in a role
> and applying it to a class sounds like a good idea to me.
>
> role VerbPhrase {
> rule VP { <verb> <NP> }
> proto token verb {*}
> token verb:sym<hit> { <sym> }
> token verb:sym<kill> { <sym> }
> }
>
> Define NounPhrase in a similar way, leave out the definition of NP and
> VP from Sentence, and then write
>
> grammar English does NounPhrase does VerbPhrase is Sentence {
> token TOP { ... }
> }
>
> Role composition has much more transparent error modes than inheritance,
> and probably works better for you.
>
>
>> How do I build a flexible dynamic grammar in an OO sort of way? For example,
>> how could I do this so:
>>
>> 1) I define all my phrase structures (NP,VP,PP etc) in their own file while
>> still being able to use each other. VPs can be made of NPs and NPs
>> can be made up of VPs.
>
> See above
>
>> 2) Add to these definitions dynamically. For example, here I have defined
>> "hit and kill" VPs. What if I wanted to add "dance" VP definition at run
>> time?
>
> In theory you can write
>
> role VerbPhrases[@verbs] {
> token verb:sym<dynamic> { @verbs }
> # note that 'dynamic' has no special meaning here, but since
> # we don't use <sym> in the regex body, it doesn't matter what
> # we write
> }
>
> And then instantiate your grammar as
>
> my $g = English.new does VerbPhrases[<dance listen juggle ...>];
> my $match = $g.parse($yourstring);
>
> But Rakudo doesn't yet properly handle array variables in regexes, so
> you have to write something like
>
> role AdditionalVerbPhrase[$verb] {
> token verb:sym<dynamic> { $verb };
> }
>
> my $g = English.new;
> $g does AdditionalVerbPhrase[$_] for <dance listen juggle ...>;
> my $match = $g.parse(...);
>
> I haven't tested it though.
> If you experiment with it, please report your findings here, I'm curious
> about what works right now. If it doesn't work, we can surely find some
> way to make it work by going through the meta object to add methods to
> the grammar.
>
> Cheers,
> Moritz