> >Hey, Hey!  That would be great!  I'm translating the XML EBNF into the
> >TGen format (a pretty easy mapping).  From there, I'll specify additional
> >TGen stuff to create a more workable AST.  The XML spec doesn't describe
> >XML using a DTD.
>
> Right. But XML is the meta-language. Does the EBNF describe DTD or
> "well-formed" XML syntax? (I'm searching I'm searching I'm searching...)
>
> Whoops! Yes it does. Whoa. Cool!

Ok, a little while after writing that email, I realized DTDs are part of the
XML spec.  I've successfully translated about 75% of the productions into
TGen; however, I now have a "non-deterministic parser."  Can anyone comment
on the implications of a non-deterministic parser?  My compilers class is
turning into a fuzzy memory.

If I understand DTDs correctly, you can read an XML file, part of which
might contain a DTD (declared internally or referenced externally).  If so,
the DTD can be parsed, and from it a description of the valid syntax for the
remainder of the XML file can be determined.  It's kind of like a MIME type,
but with support for describing the permitted structure of the contents.
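
To make that concrete, here is a minimal sketch of reading a document whose
DTD is declared internally.  It is written in Python with the standard
library's xml.parsers.expat module purely for illustration (Python is an
assumption here, not something TGen or Squeak provides), and the sample
document and the function name read_internal_dtd are made up.  Note that
expat does not validate; it only reports the declarations, so the check
against the DTD at the end is done by hand.

```python
import xml.parsers.expat

# Hypothetical sample: the DTD lives inside the DOCTYPE (the internal subset).
DOCUMENT = """<?xml version="1.0"?>
<!DOCTYPE note [
  <!ELEMENT note (to, body)>
  <!ELEMENT to (#PCDATA)>
  <!ELEMENT body (#PCDATA)>
]>
<note><to>Squeak list</to><body>So much to do.</body></note>
"""


def read_internal_dtd(text):
    """Return (declared element names, element names actually used)."""
    declared = []
    used = []
    parser = xml.parsers.expat.ParserCreate()
    # Called once per <!ELEMENT ...> declaration found in the DTD.
    parser.ElementDeclHandler = lambda name, model: declared.append(name)
    # Called once per start tag in the document body.
    parser.StartElementHandler = lambda name, attrs: used.append(name)
    parser.Parse(text, True)
    return declared, used


declared, used = read_internal_dtd(DOCUMENT)
# A hand-rolled (and very partial) validity check: every element that
# appears must have been declared in the DTD.
undeclared = [name for name in used if name not in declared]
print("declared:", declared)
print("undeclared:", undeclared)
```

A real validator would also check the content models (for example that
"note" contains "to" followed by "body"), which this sketch ignores.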

Thus, XHTML was devised to allow HTML-like markup within the XML syntax.

I did some more digging and discovered a C library for XML parsing called
"expat."  It's used by Apache, Netscape, and some others.  Perhaps it might
be worthwhile to simply translate that parser into Squeak.  Or maybe just
call it (as a module).
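
For a feel of what "calling it as a module" looks like, here is a small
sketch using Python's standard binding to expat (xml.parsers.expat).  This
is only an illustration of the callback style expat uses, not a proposal
for the Squeak interface; the function name collect_events and the sample
markup are hypothetical.

```python
import xml.parsers.expat


def collect_events(text):
    """Parse text with expat and return the parse events as a list."""
    events = []
    parser = xml.parsers.expat.ParserCreate()
    # Ask expat to deliver contiguous character data in one piece.
    parser.buffer_text = True

    def on_start(name, attrs):
        events.append(("start", name, dict(attrs)))

    def on_end(name):
        events.append(("end", name))

    def on_text(data):
        # Skip whitespace-only runs to keep the event list short.
        if data.strip():
            events.append(("text", data))

    parser.StartElementHandler = on_start
    parser.EndElementHandler = on_end
    parser.CharacterDataHandler = on_text
    parser.Parse(text, True)
    return events


events = collect_events('<p class="x">Hello <b>Squeak</b></p>')
for event in events:
    print(event)
```

The parser itself builds no tree; the caller decides what to construct from
the events, which is where an AST builder would hook in.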

So much to do.

- Stephen
