So this is a bigger email than I realized and so I decided to make a
new thread of it.

----------------------

The semantics of explicit tail-calls in a production are somewhat more
complicated than I originally thought with respect to backtracking and
cascading. For the time being I have settled on doing an optimized
raise children '^' operation by not giving tail-called production
rules new parse trees but instead giving them their caller's parse
tree pointer to add their children to, thus eliminating any cascaded
and inefficient raise tree operators.

Here are some interesting things that I have been thinking of as I
continue to work on my TPDL parser with the extended AST-building
operators -, ^, and the automatic raise children rule.

Suppose the following notation is allowed: in a production rule
phrase, a '[' ... ']' can appear which signifies an inline definition
of an anonymous production rule. An anonymous production rule can
reference itself with $self. The effect of using an anonymous
production rule is to make a specially named production rule in the
grammar and call it in place of the anonymous rule.

Also imagine a simpler macro system to C's preprocessor being used to
add more semantic meaning to the TPDL. The effect of calling a macro
is a direct text insertion into the grammar. Macros would be
implemented by means of a pre-processor.

Note: the two above additions to the grammar language are syntactic
sugar and provide no more power as they simply represent text
insertion into a grammar.

Finally, I would add two things from Prolog: (fail)
meta-production-rule to the grammar language that always fails and the
cut '!' operator.

----------------------

Syntax and their meaning:

juxtaposition/sequencing of symbols represents a conjunction of rules,
in this context: concatenation of terminals and non-terminals.

! cut meta-production rule, after a cut, the production will commit to
matching the current phrase and if it fails then the production rule
fails and a cascading backtrack ensues. That is, no local backtracking
within the application of the production rule occurs once a cut
operator has been seen in the current phrase.

(fail), fail meta-production rule, automatically fails to match.

:, ordered choice, i.e. disjunction.

-, non-excludable prefix operator, require a symbol (terminal or
non-terminal) to appear as a node in the parse tree.

^, raise children prefix operator, raise branches of production rule
into place of production rule in the current phrase in the parse tree.
Raise children and non-excludable operator are mutually exclusive.

<>, epsilon, match the empty string. This has an equivalent meaning to
'succeed'.

<...>, terminal symbol, forgive me for not adding in string matching
support into my parser just yet, I'm bootstrapping the parser from its
simplest form at the moment.

Foo, non-terminal symbol, represents the application of a production
rule to the current position in the token stream

[ ... ], anonymous production rule

[ ... $self ... ], self-referencing anonymous production rule

Foo(bar) = ..., macro definition

Foo(...), macro call/expansion point

Finally, terminals that do not have the non-excludable operator
applied to them are not inserted into the parse tree. Non-terminals
without the non-excludable and raise branch operator and with only one
branch (child) are automatically replaced by their branch.

----------------------

We can now reason about PEG grammars in terms of our extended
Prolog-like TPDL. The following macros can be defined:

KleeneClosure(x) = ^[x ^$self : <>]
PositiveClosure(x) = x KleeneClosure(x)
Optional(x) = [x : <>]
NotFollowedBy(x) = [x ! (fail) : <>]
FollowedBy(x) = NotFollowedBy(NotFollowedBy(x))

This is cool! What can we say about this?

First, ignoring the convenient macro and anonymous production rule
syntax, the only additions to TPDL that need to be made to make it as
powerful as PEG is the cut '!' operator and the (fail) meta-production
rule. These operators are specifically useful to implement the
NotFollowedBy and FollowedBy operators for PEGs as they are special
insofar as they don't consume anything, or put another way: they
backtrack over what they consumed.

Second, in order to properly construct parse trees as PEGs do, the -,
^, operators and auto-raise rules are required in order to make lists
of things. I would argue that these two operators and the rule
actually give one more power than basic PEGs provide in terms of basic
parse tree building. This argument, however, follows from the fact
that at the moment I do not provide any rules to map the parse tree to
the AST and instead work with a reduced parse tree as if it were an
AST.

Finally, the macros and anonymous production rules provide the
syntactic sugar to make everything come together in a nice way without
actually adding any complexity to the core of the parser (although
their complexity is seen in the form of a pre-processor for a
grammar). Again, this is given that my parser is an independent
function+data structure that operators on a grammar data structure and
so what actually performs the parsing is self-contained and distinct
from the grammar and scanner.

Thoughts? I really like the idea of adding the Prolog-like operators
to the parser.

Best Regards,

Peter Goodman,
http://ioreader.com
70 Winston Circle,
Montreal, Quebec
H9S 4X6

_______________________________________________
PEG mailing list
PEG@lists.csail.mit.edu
https://lists.csail.mit.edu/mailman/listinfo/peg

Reply via email to