Re: Perl6 grammars -- Parsing english

2012-07-11 Thread Moritz Lenz

Hi Lard,

sorry for the late and incomplete answer.

On 04.07.2012 15:09, Lard Farnwell wrote:

> Hi Moritz,
>
> Thanks, that was interesting. My investigation into grammars took a while but
> here are the results thus far:
>
>> Grammar rules and regexes are just methods…
>
> I hadn't thought about what a grammar and a rule actually were before. This
> inspired me to try:
>
> ---
> grammar Gram{
>     has $.x;
>
>     rule TOP{
>         {say $.x}
>     }
>
>     method test{
>         say $.x
>     }
> }
> my Gram $test .= new(:x("hello"));
> $test.parse("ignore this");
> $test.test;
> say $test.TOP;
> ---
> which outputs:
> Any()  #output of TOP in parse
> hello  #output of test.test
> hello  #output on direct call to rule
> Gram.new(x => Any)  #the return value of $test.TOP
>
> So rules can't interpolate their grammar's attributes when being called by
> 'parse' but can when called as a method. Also rules being called directly as
> methods return the parent grammar. I'm not sure whether either of these things
> is intended…


I'm not sure how it's intended to work either.

Notionally, grammar rules (and other components of regexes) communicate 
by passing "cursors" around. A cursor is an immutable object that points 
to a location in the string, and additionally keeps track of other 
information like captures.


So when you write 'grammar gram { ... }', you are actually inheriting 
from class Grammar, which in turn inherits from class Cursor.


When you call the .parse method, a cursor is instantiated automatically, 
which explains why its attribute is empty -- it's not the same object as 
the one you created in your code.


I'm not sure if there is a mechanism for passing attributes around -- so 
far I always just assumed it would work, but it doesn't.
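
One way to get a value into the top rule without relying on attributes is to pass it as an argument. The following is only a minimal sketch, assuming a present-day Raku compiler (the 2012 Rakudo discussed in this thread may behave differently). The grammar name Greeting and the token names are hypothetical; the :args parameter of .parse is the documented way to hand arguments to the start rule.

```raku
# Hypothetical grammar, for illustration only.
grammar Greeting {
    # TOP takes a parameter; inside the regex its value is
    # interpolated and matched as a literal string.
    token TOP($word) { $word \s+ <name> }
    token name { \w+ }
}

# The Capture given to :args is passed on to TOP.
my $m = Greeting.parse('hello world', :args(\('hello')));
say $m ?? ~$m<name> !! 'no match';
```

Because the value travels with the call rather than with the object, it does not matter that .parse creates a fresh cursor internally.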



> =
>
> Also I tried rules with arguments and it worked from grammar->parse but not
> from calling directly as a method.
>
> ---
> grammar Gram{
>
>     rule TOP{
>
>     }
>
>     rule test_rule($a){
>         $a
>     }
> }
>
> my Gram $test .= new();
> $test.parse("hello") #returns true
> $test.test_rule("hello") #error
> ---
>
> The error is:
>
> Invalid operation on null string
>    in any !LITERAL at src/stage2/QRegex.nqp:653
>    in method INTERPOLATE at src/gen/CORE.setting:9731
>   (at the line where test_rule starts)
>
> =
>
> Ok now to try the things you mentioned:
>
> First I tried using a parcel instead of an array as the role prototype (array
> resulted in error):
> ---
> role roley [$foo]{
>     token tokeny { $foo }
> }
>
> grammar gram {
>     token TOP {  }
> }
> ---
> my gram $gram .= new  does roley[('this','or', 'that')];
> $gram.parse('this or that');  #returns true
>
> So parcels get joined with spaces into one token


That's a known not-yet-implemented part of Rakudo :(


> =
>
> Now to try the roundabout way:
>
> ---
> role roley [$foo]{
>     token tokeny:sym { $foo }
> }
>
> grammar gram {
>     token TOP { [\ ]* }
>     proto token tokeny {*}
> }
>
> my gram $gram .= new;
> $gram does roley[$_] for ;
> $gram.parse('this'); #matches
> $gram.parse('that'); #nope
> ---
>
> Each iteration overwrites the previous one in terms of what 'tokeny' resolves
> to rather than adding it (symmetrically? is that what sym is short for?)


"sym" stands for "symbol", the thing that appears in the name of the 
token inside <...>.
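
To make the role of "sym" concrete, here is a small sketch, assuming a present-day Raku compiler. The grammar name Verbs and the two verbs are hypothetical. Inside each candidate, <sym> matches the text written between the angle brackets of the token's name.

```raku
# Hypothetical grammar, for illustration only.
grammar Verbs {
    token TOP { <verb> }
    proto token verb {*}
    # <sym> in the body matches the symbol from the name:
    # 'hit' for the first candidate, 'kill' for the second.
    token verb:sym<hit>  { <sym> }
    token verb:sym<kill> { <sym> }
}

my $m = Verbs.parse('hit');
say ~$m<verb>;        # the text matched by the proto token
say ~$m<verb><sym>;   # the symbol captured inside the candidate
```

Adding another verb then only means adding another candidate with a different symbol; TOP and the proto stay untouched.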





> One more thing I found which seems to be a bug. I defined my nouns/pronouns
> like:
>
> ---
> token PN:sym { <.sym> } #The dot should mean it doesn't get captured
> token N:sym { <.sym> }
> ---
>
> when my grammar parses this it ends up with a tree like this:
> ---
> sentence => q[John hit the ball]
>  statement => q[John hit the ball]
>   NP => q[John]
>    PN => q[John]
>      => q[John]
>   VP => q[hit the ball]
>    verb => q[hit]
>      => q[hit]
>    NP => q[the ball]
>     D => q[the]
>       => q[the]
>     N => q[ball]
>       => q[ball]
> ---
>
> Notice the empty slots on the left. Rather than not capturing the <sym>, the
> <.sym> just means it doesn't capture its name :S


I've recently discovered the same bug (and tried to fix it instead of 
submitting a bug report; I failed to fix it though :/). Basically 
<sym> is special-cased in the compiler, and the . modifier at the start 
simply doesn't harmonize with that special case.





> So after all this I have a much better understanding of what grammars really
> are but I'm still confused about a few things:
>
> grammars are like classes. They are special because they have a method called
> 'parse' which applies a rule/token definition (regex) called TOP (or whatever
> is set by the  :rule argument to parse).
> Q: Are grammars meant to be able to have at

Re: Perl6 grammars -- Parsing english

2012-07-04 Thread Lard Farnwell
Hi Moritz,

Thanks that was interesting. My investigation into grammars took a while but 
here are the results thus far:

> Grammar rules and regexes are just methods…

I hadn't thought about what a grammar and rule actually was before. This 
inspired me to try:

---
grammar Gram{
    has $.x;

    rule TOP{
        {say $.x}
    }

    method test{
        say $.x
    }
}
my Gram $test .= new(:x("hello"));
$test.parse("ignore this");
$test.test;
say $test.TOP;
---
which outputs:
Any()  #output of TOP in parse
hello  #output of test.test
hello  #output on direct call to rule
Gram.new(x => Any)  #the return value of $test.TOP


So rules can't interpolate their grammar's attributes when being called by 
'parse' but can when called as a method. Also rules being called directly as 
methods return the parent grammar. I'm not sure whether either of these things 
is intended…

=

Also I tried rules with arguments and it worked from grammar->parse but not 
from calling directly as a method. 

---
grammar Gram{

    rule TOP{

    }

    rule test_rule($a){
        $a
    }
}

my Gram $test .= new();
$test.parse("hello") #returns true
$test.test_rule("hello") #error
---

The error is:

Invalid operation on null string
  in any !LITERAL at src/stage2/QRegex.nqp:653
  in method INTERPOLATE at src/gen/CORE.setting:9731
 (at the line where test_rule starts)

=

Ok now to try the things you mentioned:

First I tried using a parcel instead of an array as the role prototype (array 
resulted in error):
---
role roley [$foo]{
    token tokeny { $foo }
}

grammar gram {
    token TOP {  }
}
---
my gram $gram .= new  does roley[('this','or', 'that')];
$gram.parse('this or that');  #returns true

So parcels get joined with spaces into one token

=

Now to try the roundabout way:

---
role roley [$foo]{
    token tokeny:sym { $foo }
}

grammar gram {
    token TOP { [\ ]* }
    proto token tokeny {*}
}

my gram $gram .= new;
$gram does roley[$_] for ;
$gram.parse('this'); #matches
$gram.parse('that'); #nope
---

Each iteration overwrites the previous one in terms of what 'tokeny' resolves 
to rather than adding it (symmetrically? is that what sym is short for?)



One more thing I found which seems to be a bug. I defined my nouns/pronouns 
like:

---
token PN:sym { <.sym> } #The dot should mean it doesn't get captured
token N:sym { <.sym> }
---

when my grammar parses this it ends up with a tree like this:
---
sentence => q[John hit the ball]
 statement => q[John hit the ball]
  NP => q[John]
   PN => q[John]
     => q[John]
  VP => q[hit the ball]
   verb => q[hit]
     => q[hit]
   NP => q[the ball]
    D => q[the]
      => q[the]
    N => q[ball]
      => q[ball]
---

Notice the empty slots on the left. Rather than not capturing the <sym>, the 
<.sym> just means it doesn't capture its name :S



So after all this I have a much better understanding of what grammars really 
are but I'm still confused about a few things:

grammars are like classes. They are special because they have a method called 
'parse' which applies a rule/token definition (regex) called TOP (or whatever 
is set by the  :rule argument to parse).
Q: Are grammars meant to be able to have attributes like classes and are they 
meant to be able to interpolate them into their rules/token?
rules and tokens are just special types of methods whose body is a regex rather 
than perl6 code.
Q: What is the meaning of the return values of tokens/rules when called as 
methods?
Q: Is it possible to write a normal method that conforms to the same interface 
as rules/tokens (whatever that is). i.e. where we can use  in 
rules/tokens which is passed arguments and somehow matches and sets position 
etc.
Q: Are rules/tokens meant to be able to have arguments like methods and if so 
how do they fit in.
grammars don't check whether the things in their tokens/rules like  are 
actually defined until it comes time to call them
Q: Is this the way it's meant to be?

I saw your post on doc.perl6.org docs. If I can get my head around all this I 
would be happy to help document grammars!

Cheers,

Lard


On 27/06/2012, at 12:49 AM, Moritz Lenz wrote:

> 
> 
> On 06/26/2012 02:04 PM, Lard Farnwell wrote:
>> Hi guys,
>> 
>> To understand and play around with perl6 grammars I was trying to do a 
>> simple NLP parts of speech parser in perl6 grammars. This is sort of what I 
>> did: 
>> 
>> ---
>> grammar Sentence{
>>proto rule VP {

Re: Perl6 grammars -- Parsing english

2012-06-26 Thread Moritz Lenz


On 06/26/2012 02:04 PM, Lard Farnwell wrote:
> Hi guys,
> 
> To understand and play around with perl6 grammars I was trying to do a simple 
> NLP parts of speech parser in perl6 grammars. This is sort of what I did: 
> 
> ---
> grammar Sentence{
> proto rule VP {*}
> proto rule NP {*}
>   
>   rule sentence {
>   |
>   }
>   rule imperative {}
>   rule statement { }
> }
> 
> grammar VerbPhrase is Sentence{
>   rule VP:sym  {  }
>   rule VP:sym {  }
> }
> 
> grammar NounPhrase is Sentence{
>   #define NP:sym etc
> }
> 
> 
> grammar English is NounPhrase is VerbPhrase {
>   rule TOP {
>   [\.  }
> }
> 
> 
> So in case you don't get it, a sentence is made up of phrases which in turn 
> can be made up of other phrases. And English is made up of Sentences.
> This sort of thing works but doesn't make much sense.
> 
> The obvious problem is that to get the correct definitions of the proto rules 
> in Sentence I have to say "verbPhrase is Sentence" and then "English is 
> NounPhrase is VerbPhrase etc" .  This makes me feel like I'm doing it wrong.

Indeed. The intended mechanism for code reuse in object oriented Perl 6
code is role composition.

Grammar rules and regexes are just methods, so defining them in a role
and applying it to a class sounds like a good idea to me.

role VerbPhrase {
    rule VP {   }
    proto token verb  {*}
    token verb:sym  {  }
    token verb:sym {  }
}

Define NounPhrase in a similar way, leave out the definition of NP and
VP from Sentence, and then write

grammar English does NounPhrase does VerbPhrase is Sentence {
token TOP { ... }
}

Role composition has much more transparent error modes than inheritance,
and probably works better for you.
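
As a rough illustration of that layout, here is a self-contained sketch, assuming a present-day Raku compiler. The rule bodies (what NP, D and N match, and which verbs exist) are my own assumptions for the example and are not taken from the original grammar.

```raku
# Hypothetical roles; the rule bodies are assumptions for illustration.
role VerbPhrase {
    rule VP { <verb> <NP> }
    proto token verb {*}
    token verb:sym<hit>  { <sym> }
    token verb:sym<kill> { <sym> }
}

role NounPhrase {
    rule NP { <D>? <N> }
    token D { the }
    token N { ball | John }
}

# Both roles are composed into one grammar, so VP can refer to NP
# even though the two are defined in separate roles.
grammar English does NounPhrase does VerbPhrase {
    rule TOP { <NP> <VP> }
}

my $m = English.parse('John hit the ball');
say $m ?? ~$m<VP><verb> !! 'no match';
```

Each role could live in its own file; the cross-references between NP and VP are only resolved once both are composed into the grammar.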


> How do I build a flexible dynamic grammar in an OO sort of way? For example 
> how could I do this so:
> 
> 1) I define all my phrase structures (NP,VP,PP etc) in their own file while 
> still being able to use each other. VPs can be made of NPs and NPs 
> can be made up of VPs. 

See above

> 2) Add to these definitions dynamically. For example, here I have defined 
> "hit and kill" VPs. What if I wanted to add "dance" VP definition at run time?

In theory you can write

role VerbPhrases[@verbs] {
 token verb:sym<dynamic> { @verbs }
 # note that 'dynamic' has no special meaning here, but since
 # we don't use <sym> in the regex body, it doesn't matter what
 # we write
}

And then instantiate your grammar as

my $g = English.new does VerbPhrases[];
my $match = $g.parse($yourstring);

But Rakudo doesn't yet properly handle array variables in regexes, so
you have to write something like

role AdditionalVerbPhrase[$verb] {
token verb:sym { $verb };
}

my $g = English.new;
$g does AdditionalVerbPhrase[$_] for ;
my $match = $g.parse(...);

I haven't tested it though.
If you experiment with it, please report your findings here, I'm curious
about what works right now. If it doesn't work, we can surely find some
way to make it work by going through the meta object to add methods to
the grammar.
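
For comparison, a present-day Raku compiler is documented to interpolate an array variable in a regex as an alternation of its elements, so a simpler variant is possible there. This is only a sketch under that assumption, and it sidesteps roles entirely; the names @verbs and Verbs are hypothetical.

```raku
# The verb list is built at run time, before parsing.
my @verbs = <hit kill>;
@verbs.push('dance');

# Hypothetical grammar: the token closes over the lexical array and
# matches any one of its elements literally.
grammar Verbs {
    token TOP { @verbs }
}

say so Verbs.parse('dance');
say so Verbs.parse('sing');
```

Whether this fits depends on where the verb list comes from; the parametric-role approach above keeps the list attached to the grammar object instead of to an outer variable.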

Cheers,
Moritz