Hi Folks,
Lately I have been learning to create parsers using the parser tools Flex &
Bison. I want to see how Flex & Bison parsers compare to DFDL parsers.
I learned that Flex & Bison parsers are built on solid theory:
The earliest parsers, back in the 1950s, used utterly ad hoc techniques to
analyze the syntax of the source code of the programs they were parsing. During
the 1960s the field got a lot of academic attention, and by the early 1970s
parsing was no longer an arcane art: Aho, Ullman, Knuth, and many others had
put parsing techniques solidly on their theoretical feet.
The book that I am reading said that one of the first techniques they (Aho,
Ullman, Knuth, and others) espoused was to separate lexing (aka scanning,
tokenizing) from parsing. Lexing built upon regular expressions, which in turn
rest on Finite Automata (FA) theory, both deterministic and nondeterministic
(NFA). Regular expressions and finite automata were brilliantly melded together
by the famous Kleene Theorem, which shows they describe exactly the same class
of languages. Parsing built on top of a rich theory of grammars - Context-Free
Grammars, Context-Sensitive Grammars, etc. - that Chomsky formulated. Here's a
graphic I created depicting the foundation upon which Flex & Bison parsers are
built:
[cid:image001.png@01D79F15.E48107F0]
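To make the lexing/parsing separation concrete, here is a minimal sketch in Python. It is not what Flex & Bison actually generate (Flex emits a DFA-based scanner in C, and Bison a table-driven LALR(1) parser); the token names and the little arithmetic grammar are my own invented example. But it shows the same two-stage architecture: a regular-expression lexer turning characters into tokens, and a separate parser recognizing a context-free grammar over those tokens.

```python
import re

# Token specification in the spirit of a Flex rule set:
# each token kind is named and defined by a regular expression.
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("PLUS",   r"\+"),
    ("TIMES",  r"\*"),
    ("LPAREN", r"\("),
    ("RPAREN", r"\)"),
    ("SKIP",   r"\s+"),   # whitespace: matched but not emitted
]
MASTER_RE = re.compile("|".join(f"(?P<{name}>{rx})" for name, rx in TOKEN_SPEC))

def lex(text):
    """Lexing stage: raw characters -> stream of (kind, value) tokens."""
    for m in MASTER_RE.finditer(text):
        if m.lastgroup != "SKIP":
            yield (m.lastgroup, m.group())
    yield ("EOF", "")

# Parsing stage: recursive descent for the context-free grammar
#   expr   : term (PLUS term)*
#   term   : factor (TIMES factor)*
#   factor : NUMBER | LPAREN expr RPAREN
def parse(tokens):
    toks = list(tokens)
    pos = 0

    def peek():
        return toks[pos][0]

    def eat(kind):
        nonlocal pos
        tok = toks[pos]
        if tok[0] != kind:
            raise SyntaxError(f"expected {kind}, got {tok[0]}")
        pos += 1
        return tok[1]

    def factor():
        if peek() == "LPAREN":
            eat("LPAREN")
            v = expr()
            eat("RPAREN")
            return v
        return int(eat("NUMBER"))

    def term():
        v = factor()
        while peek() == "TIMES":
            eat("TIMES")
            v *= factor()
        return v

    def expr():
        v = term()
        while peek() == "PLUS":
            eat("PLUS")
            v += term()
        return v

    result = expr()
    eat("EOF")
    return result

print(parse(lex("2 + 3 * (4 + 1)")))  # 17
```

Note how neither stage knows anything about the other's internals: the parser sees only token kinds, never raw characters, which is exactly the separation the book is describing.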
If we leave aside XML Schema which hosts DFDL, what theory underpins the set of
DFDL properties - separator, initiator, terminator, separatorPosition,
ignoreCase, lengthKind, etc.?
/Roger