Perl6 grammars -- Parsing english

2012-06-26, Lard Farnwell
Hi guys,

To understand and play around with Perl 6 grammars, I tried to write a simple 
NLP part-of-speech parser with them. This is roughly what I did: 

---
grammar Sentence {
    proto rule VP {*}
    proto rule NP {*}

    rule sentence {
        <imperative> | <statement>
    }
    rule imperative { <VP> }
    rule statement  { <NP> <VP> }
}

grammar VerbPhrase is Sentence {
    rule VP:sym<hit>  { <sym> <NP> }
    rule VP:sym<kill> { <sym> <NP> }
}

grammar NounPhrase is Sentence {
    # define the NP:sym candidates here, etc.
}


grammar English is NounPhrase is VerbPhrase {
    rule TOP {
        <sentence> [ \. <sentence> ]*
    }
}


So, in case it isn't clear: a sentence is made up of phrases, which in turn can 
be made up of other phrases, and English is made up of sentences.
This sort of thing works, but it doesn't make much sense.

The obvious problem is that, to get the correct definitions of the proto rules 
in Sentence, I have to say VerbPhrase is Sentence, then English is 
NounPhrase is VerbPhrase, and so on. This makes me feel like I'm doing it wrong.

How do I build a flexible, dynamic grammar in an OO sort of way? For example, how 
could I do this so that:

1) I can define each of my phrase structures (NP, VP, PP, etc.) in its own file 
while still letting them use each other. VPs can be made of NPs, and NPs 
can be made up of VPs.

2) I can add to these definitions dynamically. For example, here I have defined 
hit and kill VPs. What if I wanted to add a dance VP definition at run time?


Thanks guys!

Lard




Re: Perl6 grammars -- Parsing english

2012-06-26, Moritz Lenz


On 06/26/2012 02:04 PM, Lard Farnwell wrote:
> The obvious problem is that, to get the correct definitions of the proto rules
> in Sentence, I have to say VerbPhrase is Sentence, then English is
> NounPhrase is VerbPhrase, and so on. This makes me feel like I'm doing it wrong.

Indeed. The intended mechanism for code reuse in object-oriented Perl 6
code is role composition.

Grammar rules and regexes are just methods (and a grammar is just a class),
so defining them in a role and composing that role into the grammar sounds
like a good idea to me.
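You can check this quickly in a current Rakudo. The sketch below is illustrative only; the `Words` grammar and the `Greeting` role are made-up names. It shows that a token is a `Regex`, which is a kind of `Method`. It also shows that a role can carry such a rule into a grammar:

```raku
# A token declared in a grammar is a Regex object, and Regex is a subclass
# of Method, so it is looked up and called like any other method.
grammar Words {
    token word { \w+ }
}

my $word = Words.^find_method('word');
say $word ~~ Regex;                          # True
say $word ~~ Method;                         # True
say Words.parse('hello', :rule<word>).Str;   # hello

# Because rules are methods, a role can carry them into a grammar.
role Greeting {
    token greeting { hello | hi }
}

grammar Greeter does Greeting {
    token TOP { <greeting> ', world' }
}

say so Greeter.parse('hi, world');           # True
```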

role VerbPhrase {
    rule VP { <verb> <NP> }
    proto token verb {*}
    token verb:sym<hit>  { <sym> }
    token verb:sym<kill> { <sym> }
}

Define NounPhrase in a similar way, leave out the definition of NP and
VP from Sentence, and then write

grammar English does NounPhrase does VerbPhrase is Sentence {
    token TOP { ... }
}
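Put together, a minimal runnable sketch of this layout might look like the one below. It assumes a current Rakudo. The NounPhrase role (its determiners and nouns) and the sentence-separating TOP rule are illustrative guesses added for the example; they are not part of the original grammar. Sentence keeps only the sentence-level rules. NP and VP reach English through role composition instead of an inheritance chain.

```raku
# Sentence-level rules only; NP and VP are supplied by the roles composed below.
grammar Sentence {
    rule sentence   { <imperative> | <statement> }
    rule imperative { <VP> }
    rule statement  { <NP> <VP> }
}

role VerbPhrase {
    rule VP { <verb> <NP> }
    proto token verb {*}
    token verb:sym<hit>  { <sym> }
    token verb:sym<kill> { <sym> }
}

# Hypothetical noun phrases: an optional determiner followed by a noun.
role NounPhrase {
    rule NP { <det>? <noun> }
    token det { the | a }
    proto token noun {*}
    token noun:sym<dog>  { <sym> }
    token noun:sym<ball> { <sym> }
}

grammar English does NounPhrase does VerbPhrase is Sentence {
    # One or more sentences separated by full stops, with an optional final one.
    rule TOP { <sentence> [ '.' <sentence> ]* '.'? }
}

say English.parse('the dog hit the ball. kill a dog.');
```

NounPhrase and VerbPhrase could each live in their own module, and they can refer to each other's rules by name. Method lookup happens on the final English grammar, not on the role.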

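Role composition fails much more transparently than inheritance (if two
roles supply a method with the same name, composition reports the conflict
instead of silently picking one), so it probably works better for you.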
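Here is a small sketch of what that clearer failure looks like, assuming a current Rakudo. The roles `VerbPhraseA` and `VerbPhraseB` and the grammar names are hypothetical. The conflicting grammar is compiled with `EVAL` so that the error can be caught and printed. The exact wording of the message may differ between Rakudo versions.

```raku
use MONKEY-SEE-NO-EVAL;   # allow EVAL, used below to compile a deliberately conflicting grammar

# Two hypothetical roles that both supply a VP rule.
role VerbPhraseA { rule VP { hit } }
role VerbPhraseB { rule VP { kill } }

# Composing both into one grammar fails when the grammar is composed, and
# the error names the clashing method instead of silently picking one.
try EVAL 'grammar Clash does VerbPhraseA does VerbPhraseB { }';
my $clash-error = $!;
say $clash-error.message;

# A rule defined in the grammar itself takes precedence over both roles,
# which resolves the conflict explicitly.
grammar Resolved does VerbPhraseA does VerbPhraseB {
    rule VP { hit | kill }
}

say so Resolved.parse('kill', :rule<VP>);   # True
```

With inheritance, the grammar would quietly use whichever VP the method resolution order finds first. Composition makes you decide.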


> How do I build a flexible, dynamic grammar in an OO sort of way? For example,
> how could I do this so that:
>
> 1) I can define each of my phrase structures (NP, VP, PP, etc.) in its own file
> while still letting them use each other. VPs can be made of NPs, and NPs
> can be made up of VPs.

See above

> 2) I can add to these definitions dynamically. For example, here I have defined
> hit and kill VPs. What if I wanted to add a dance VP definition at run time?

In theory you can write

role VerbPhrases[@verbs] {
    token verb:sym<dynamic> { @verbs }
    # note that 'dynamic' has no special meaning here, but since
    # we don't use <sym> in the regex body, it doesn't matter what
    # we write
}

And then instantiate your grammar as

my $g = English.new does VerbPhrases[<dance listen juggle ...>];
my $match = $g.parse($yourstring);

But Rakudo doesn't yet properly handle array variables in regexes, so
you have to write something like

role AdditionalVerbPhrase[$verb] {
    token verb:sym<dynamic> { $verb }
}

my $g = English.new;
$g does AdditionalVerbPhrase[$_] for <dance listen juggle ...>;
my $match = $g.parse(...);

I haven't tested it, though.
If you experiment with it, please report your findings here; I'm curious
about what works right now. If it doesn't work, we can surely find some
way to make it work by going through the meta-object to add methods to
the grammar.
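Later Rakudo releases do interpolate an array into a regex as an alternation of its elements, so the parametric-role form can be tried directly. Here is a minimal sketch, assuming a current Rakudo. `TinyEnglish`, its vocabulary and `@new-verbs` are hypothetical. The role is mixed into the grammar type with `but`, rather than into an instance with `does`.

```raku
# Hypothetical mini-grammar standing in for English: TOP is "<verb> <noun>".
grammar TinyEnglish {
    rule TOP { <verb> <noun> }
    proto token verb {*}
    token verb:sym<hit> { <sym> }
    token noun { dog | ball }
}

role VerbPhrases[@verbs] {
    # An array interpolated into a regex matches any one of its elements,
    # with longest-token semantics. 'dynamic' is an arbitrary name,
    # since the body does not use <sym>.
    token verb:sym<dynamic> { @verbs }
}

# Verbs chosen at run time, e.g. read from a file or user input.
my @new-verbs = <dance listen juggle>;

# `but` on a type object yields a new grammar type with the role mixed in;
# TinyEnglish itself is left unchanged.
my $g = TinyEnglish but VerbPhrases[@new-verbs];

say so $g.parse('juggle ball');            # True
say so $g.parse('hit dog');                # True
say so TinyEnglish.parse('juggle ball');   # False
```

Because one role carries the whole verb list, this sidesteps mixing in a separate role per verb.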

Cheers,
Moritz