Re: [fonc] Earley Parsing Explained (incomplete first draft)

2014-09-21 Thread Loup Vaillant-David
On Sun, Sep 21, 2014 at 04:51:32PM -0400, Josh Grams wrote:
> The left back pointers move back through the rule. So a pre-order
> traversal will go like this (I'm using tildes because it was
> annoying to copy/paste the bullet):
> 
> A -> B C D ~  -- at this step you apply A => B C D.
> A -> B C ~ D
> A -> B ~ C D
> A -> ~ B C D
> No more left pointers, so return back up to A -> B ~ C D.
> Follow its right pointer to B ... ambiguous! So choose between:

Not quite. Assume this correspondence:

  A -> B C D ~  Item1in S(start)
  A -> B C ~ D  Item2 and 2_bis  in S(i) and S(j)
  A -> B ~ C D  Item3 and 3_bis  in S(k) and S(l)
  A -> ~ B C D  Item4in S(finish)

Looking at the left back pointers, ambiguities look like this:

+--- Item2 <--- item3 <---+
| |
   Item1 <--+ +--- item4
| |
+--- Item2_bis <--- item3_bis <---+

(Note that ambiguities always merge back.)

The only way to detect the ambiguity from Item1 to Item2, is to
construct all the relevant right pointers first.

+---> Item2 ---> item3 ---+
| |
   Item1 ---+ +--> item4
| |
+---> Item2_bis ---> item3_bis ---+

That way, you can detect your ambiguity from the start.  So far, so
good.  But there's a catch.  Sometimes, Sometimes, you'll get
something like this:

+--- Item2 <--- item3
|
   Item1 <--+
|
+--- Item2_bis <--- item3_bis

item3 and item3_bis are not in the same state set (if they were, they
would be the same Earley item).  So, no real ambiguity there.  But if
you naively reverse the pointers:

+---> Item2 ---> item3
|
   Item1 ---+
|
+---> Item2_bis ---> item3_bis

You will see an ambiguity that is not there.  Actually you need *two*
Item1.  One that ends where item3 is, S(k) and one that ends where
item3_bis ends, S(l).  Like this:

   Item1 > Item2 ---> item3

   Item1_bis ---> Item2_bis ---> item3_bis

Reversing pointers is not enough.  You have to reorganise everything.
Doing this reorganisation properly is not trivial when you have to
deal with all the Earley items, and all the back pointers.  Limiting
yourself to completed states, is such a breeze that it's probably
worth the harder search that will follow.

Loup.
___
fonc mailing list
fonc@vpri.org
http://vpri.org/mailman/listinfo/fonc


Re: [fonc] Earley Parsing Explained (incomplete first draft)

2014-09-21 Thread Josh Grams
On 2014-09-20 07:59PM, Loup Vaillant-David wrote:
>By default, Earley recognisers store their Earley items in such a way
>that reconstruction (or back pointer following) happens from right to
>left.  Which means the first ambiguities will be detected at the *end*
>of the parse.

But a depth-first traversal turns that around.

>In parsing expression grammars, it's very intuitive.  Imagine you
>parse this rule:
>
>  A -> B C D
>
>So you parse B, then C, then D (they are written in that order, after
>all).  If you detect an ambiguity in B (or anything below it),
>prioritised choice will take care of it.  Now imagine there are two
>ways to parse the following input:
>
>   _B_ _C_ _D_-- first  possibility
>  /   \   /   \   /   \
>  a   b   c   d   e   f   -- input
>  \___/   |   \___/
>  B   C D -- second possibility

The left back pointers move back through the rule. So a pre-order
traversal will go like this (I'm using tildes because it was
annoying to copy/paste the bullet):

A -> B C D ~  -- at this step you apply A => B C D.
A -> B C ~ D
A -> B ~ C D
A -> ~ B C D
No more left pointers, so return back up to A -> B ~ C D.
Follow its right pointer to B ... ambiguous! So choose between:

B -> a b ~
B -> a b c ~

Then return back up to A -> B C ~ D and follow its right pointer
to deal with C, etc.

--Josh
___
fonc mailing list
fonc@vpri.org
http://vpri.org/mailman/listinfo/fonc