Re: Perl6 grammars -- Parsing english

2012-07-04 Thread Lard Farnwell
Hi Moritz,

Thanks that was interesting. My investigation into grammars took a while but 
here are the results thus far:

> Grammar rules and regexes are just methods…

I hadn't thought about what a grammar and a rule actually were before. This 
inspired me to try:

---
grammar Gram {
    has $.x;

    rule TOP {
        {say $.x}
    }

    method test {
        say $.x
    }
}
my Gram $test .= new(:x('hello'));
$test.parse('ignore this');
$test.test;
say $test.TOP;
---
which outputs:
Any()               # output of TOP in parse
hello               # output of $test.test
hello               # output on direct call to the rule
Gram.new(x => Any)  # the return value of $test.TOP


So rules can't interpolate their grammar's attributes when being called by 
'parse' but can when called as a method. Also, rules called directly as 
methods return the parent grammar. I'm not sure whether either of these things 
is intended…

=

Also I tried rules with arguments. It worked when going through the grammar's 
'parse' but not when calling the rule directly as a method.

---
grammar Gram {

    rule TOP {
        <test_rule('hello')>
    }

    rule test_rule($a) {
        $a
    }
}

my Gram $test .= new();
$test.parse('hello');      # returns true
$test.test_rule('hello');  # error
---

The error is:

Invalid operation on null string
  in any !LITERAL at src/stage2/QRegex.nqp:653
  in method INTERPOLATE at src/gen/CORE.setting:9731
 (at the line where test_rule starts)
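
A minimal sketch of the argument-passing case, assuming a current Rakudo (behaviour on the 2012 release discussed here may differ). It relies on the `:args` named parameter of `parse`, which hands arguments to the starting rule. The grammar name `ArgGram` and the parameter `$word` are made up for illustration:

```raku
grammar ArgGram {
    # TOP receives its argument from parse(:args(...)) and passes it on
    token TOP($word)    { <test_rule($word)> }
    # $a is interpolated as a literal string
    token test_rule($a) { $a }
}

my $match = ArgGram.parse('hello', :args(\('hello')));
say ~$match<test_rule>;   # hello
```

Going through `parse` sets up the match state that the rule needs, which is probably why the direct method call above fails while the call from inside TOP succeeds.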

=

Ok now to try the things you mentioned:

First I tried using a parcel instead of an array as the role prototype (an 
array resulted in an error):
---
role roley [$foo] {
    token tokeny { $foo }
}

grammar gram {
    token TOP { <tokeny> }
}

my gram $gram .= new  does roley[('this','or', 'that')];
$gram.parse('this or that');  # returns true
---

So parcels get joined with spaces into one token.

=

Now to try the roundabout way:

---
role roley [$foo] {
    token tokeny:sym<dynamic> { $foo }
}

grammar gram {
    token TOP { <tokeny>[\ <tokeny>]* }
    proto token tokeny {*}
}

my gram $gram .= new;
$gram does roley[$_] for <that this>;
$gram.parse('this'); # matches
$gram.parse('that'); # nope
---

Each iteration overwrites the previous one in terms of what 'tokeny' resolves 
to rather than adding it (symmetrically? is that what sym is short for?)
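
For comparison, here is a small sketch of how proto candidates normally coexist, assuming a current Rakudo. Each candidate carries its own `:sym<...>` name ("sym" is short for "symbol"), and `<sym>` inside the body matches that name literally. In the experiment above both compositions define a method with the identical name `tokeny:sym<dynamic>`, which is presumably why the second replaces the first. The grammar name `Words` is hypothetical:

```raku
grammar Words {
    token TOP { <word> [ ' ' <word> ]* }
    proto token word {*}
    # Distinct :sym names, so both candidates stay available
    token word:sym<this> { <sym> }
    token word:sym<that> { <sym> }
}

my $m = Words.parse('this that');
say $m<word>.map(*.Str).join(',');   # this,that
```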



One more thing I found which seems to be a bug. I defined my nouns/pronouns 
like:

---
token PN:sym<John> { <.sym> } # The dot should mean it doesn't get captured
token N:sym<ball> { <.sym> }
---

When my grammar parses a sentence it ends up with a tree like this:
---
sentence => q[John hit the ball]
 statement => q[John hit the ball]
  NP => q[John]
   PN => q[John]
     => q[John]
  VP => q[hit the ball]
   verb => q[hit]
     => q[hit]
   NP => q[the ball]
    D => q[the]
      => q[the]
    N => q[ball]
      => q[ball]
---

Notice the empty slots on the left. Rather than not capturing the sym, the 
<.sym> just means it doesn't capture its name :S



So after all this I have a much better understanding of what grammars really 
are but I'm still confused about a few things:

Grammars are like classes. They are special because they have a method called 
'parse' which applies a rule/token definition (regex) called TOP (or whatever 
is set by the :rule argument to parse).
Q: Are grammars meant to be able to have attributes like classes, and are they 
meant to be able to interpolate them into their rules/tokens?

Rules and tokens are just special types of methods whose body is a regex rather 
than Perl 6 code.
Q: What is the meaning of the return values of tokens/rules when called as 
methods?
Q: Is it possible to write a normal method that conforms to the same interface 
as rules/tokens (whatever that is)? i.e. one where we can use <normal_method> 
in rules/tokens, which is passed arguments and somehow matches, sets the 
position, etc.
Q: Are rules/tokens meant to be able to have arguments like methods, and if so, 
how do they fit in?

Grammars don't check whether the things in their tokens/rules like <foo> are 
actually defined until it comes time to call them.
Q: Is this the way it's meant to be?

I saw your post on doc.perl6.org docs. If I can get my head around all this I 
would be happy to help document grammars!

Cheers,

Lard


On 27/06/2012, at 12:49 AM, Moritz Lenz wrote:

> On 06/26/2012 02:04 PM, Lard Farnwell wrote:
>> Hi guys,
>>
>> To understand and play around with perl6 grammars I was trying to do a 
>> simple NLP parts of speech parser in perl6 grammars. This is sort of what I 
>> did: 
>>
>> ---
>> grammar 

Perl6 grammars -- Parsing english

2012-06-26 Thread Lard Farnwell
Hi guys,

To understand and play around with perl6 grammars I was trying to do a simple 
NLP parts of speech parser in perl6 grammars. This is sort of what I did: 

---
grammar Sentence {
    proto rule VP {*}
    proto rule NP {*}

    rule sentence {
        <imperative>|<statement>
    }
    rule imperative { <VP> }
    rule statement { <NP> <VP> }
}

grammar VerbPhrase is Sentence {
    rule VP:sym<hit>  { <sym> <NP> }
    rule VP:sym<kill> { <sym> <NP> }
}

grammar NounPhrase is Sentence {
    #define NP:sym etc
}


grammar English is NounPhrase is VerbPhrase {
    rule TOP {
        <Sentence>[\. <Sentence>]*
    }
}
---


So in case you don't get it, a sentence is made up of phrases which in turn can 
be made up of other phrases. And English is made up of Sentences.
This sort of thing works but doesn't make much sense.

The obvious problem is that to get the correct definitions of the proto rules 
in Sentence I have to say VerbPhrase is Sentence and then English is 
NounPhrase is VerbPhrase etc. This makes me feel like I'm doing it wrong.

How do I build a flexible dynamic grammar in an OO sort of way? For example, 
how could I do this so that:

1) I define all my phrase structures (NP, VP, PP etc) in their own files while 
still being able to use each other. VPs can be made of NPs and NPs can be made 
up of VPs.

2) I can add to these definitions dynamically. For example, here I have defined 
hit and kill VPs. What if I wanted to add a dance VP definition at run time?


Thanks guys!

Lard




Re: Perl6 grammars -- Parsing english

2012-06-26 Thread Moritz Lenz


On 06/26/2012 02:04 PM, Lard Farnwell wrote:
> Hi guys,
>
> To understand and play around with perl6 grammars I was trying to do a simple 
> NLP parts of speech parser in perl6 grammars. This is sort of what I did: 
>
> ---
> grammar Sentence {
>     proto rule VP {*}
>     proto rule NP {*}
>
>     rule sentence {
>         <imperative>|<statement>
>     }
>     rule imperative { <VP> }
>     rule statement { <NP> <VP> }
> }
>
> grammar VerbPhrase is Sentence {
>     rule VP:sym<hit>  { <sym> <NP> }
>     rule VP:sym<kill> { <sym> <NP> }
> }
>
> grammar NounPhrase is Sentence {
>     #define NP:sym etc
> }
>
>
> grammar English is NounPhrase is VerbPhrase {
>     rule TOP {
>         <Sentence>[\. <Sentence>]*
>     }
> }
>
>
> So in case you don't get it, a sentence is made up of phrases which in turn 
> can be made up of other phrases. And English is made up of Sentences.
> This sort of thing works but doesn't make much sense.
>
> The obvious problem is that to get the correct definitions of the proto rules 
> in Sentence I have to say VerbPhrase is Sentence and then English is 
> NounPhrase is VerbPhrase etc. This makes me feel like I'm doing it wrong.

Indeed. The intended mechanism for code reuse in object oriented Perl 6
code is role composition.

Grammar rules and regexes are just methods, so defining them in a role
and applying it to a class sounds like a good idea to me.

role VerbPhrase {
    rule VP { <verb> <NP> }
    proto token verb  {*}
    token verb:sym<hit>  { <sym> }
    token verb:sym<kill> { <sym> }
}

Define NounPhrase in a similar way, leave out the definition of NP and
VP from Sentence, and then write

grammar English does NounPhrase does VerbPhrase is Sentence {
token TOP { ... }
}

Role composition has much more transparent error modes than inheritance,
and probably works better for you.
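
A self-contained sketch of this role-based layout, assuming a current Rakudo. Only the VerbPhrase role is taken from the message; the NounPhrase role, its tiny lexicon (`PN`, `D`, `N`) and the TOP rule are hypothetical fill-ins so that the example can run:

```raku
role VerbPhrase {
    rule VP { <verb> <NP> }
    proto token verb {*}
    token verb:sym<hit>  { <sym> }
    token verb:sym<kill> { <sym> }
}

# Hypothetical noun-phrase role with a three-word lexicon
role NounPhrase {
    rule NP { <PN> | <D> <N> }
    token PN { John }
    token D  { the }
    token N  { ball }
}

# Sentence no longer declares NP or VP itself
grammar Sentence {
    rule sentence   { <statement> | <imperative> }
    rule statement  { <NP> <VP> }
    rule imperative { <VP> }
}

grammar English is Sentence does NounPhrase does VerbPhrase {
    rule TOP { <sentence> }
}

my $m = English.parse('John hit the ball');
say ~$m<sentence><statement><VP><verb>;   # hit
```

Because NP and VP are resolved as ordinary method calls when the rules run, the two roles can refer to each other without either one inheriting from Sentence.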


> How do I build a flexible dynamic grammar in an OO sort of way? For example, 
> how could I do this so that:
>
> 1) I define all my phrase structures (NP, VP, PP etc) in their own files while 
> still being able to use each other. VPs can be made of NPs and NPs can be made 
> up of VPs.

See above

> 2) I can add to these definitions dynamically. For example, here I have defined 
> hit and kill VPs. What if I wanted to add a dance VP definition at run time?

In theory you can write

role VerbPhrases[@verbs] {
    token verb:sym<dynamic> { @verbs }
    # note that 'dynamic' has no special meaning here, but since
    # we don't use <sym> in the regex body, it doesn't matter what
    # we write
}

And then instantiate your grammar as

my $g = English.new does VerbPhrases[<dance listen juggle ...>];
my $match = $g.parse($yourstring);

But Rakudo doesn't yet properly handle array variables in regexes, so
you have to write something like

role AdditionalVerbPhrase[$verb] {
    token verb:sym<dynamic> { $verb };
}

my $g = English.new;
$g does AdditionalVerbPhrase[$_] for <dance listen juggle ...>;
my $match = $g.parse(...);

I haven't tested it though.
If you experiment with it, please report your findings here, I'm curious
about what works right now. If it doesn't work, we can surely find some
way to make it work by going through the meta object to add methods to
the grammar.
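
As a simpler alternative to the role approach, here is a sketch that assumes a current Rakudo in which array interpolation in regexes works (the message above notes that it did not yet in 2012). An array interpolated into a regex behaves like an alternation of its elements, so the verb list can be built at run time before parsing. The names `@verbs` and `Verbs` are made up for illustration:

```raku
# Verb list filled at run time, before any parsing happens
my @verbs = <hit kill>;
@verbs.push: 'dance';

grammar Verbs {
    token TOP  { <verb> }
    # Matches any one element of the closed-over lexical array
    token verb { @verbs }
}

say so Verbs.parse('dance');   # True
say so Verbs.parse('sing');    # False
```

This sidesteps the meta-object route entirely, at the cost of tying the grammar to a variable in the enclosing scope.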

Cheers,
Moritz