On Mon, 15 Jan 2024 at 18:56, Greg Ewing via Python-list
<python-list@python.org> wrote:
> On 15/01/24 1:28 am, Left Right wrote:
> > Python isn't a context-free language, so the grammar that is used to
> > describe it doesn't actually describe the language
> Very few languages have a formal grammar that *fully* describes
> the set of strings that constitute valid programs, including all
> the rules about things having to be declared, types matching up,
> etc. The only one I know of which attempted that is Algol 68,
> and it seems to be regarded as a technical success but a practical
> failure.
> > ... so, it's a "pretend grammar" that ignores indentation.
> Indentation isn't ignored, it appears in the grammar by means of
> INDENT and DEDENT lexical tokens.
> It's true that the meaning of these tokens is described informally
> elsewhere, but that's true of all the lexical features.

I've recently been doing a bit of work with grammar parsers, and to be
quite honest, the grammar is only about one third of the overall
parser.  There are three sections with roughly equal importance:

1. Tokenizer
2. Grammar
3. What to DO with that grammar (actions)

INDENT and DEDENT are handled at the tokenizer stage, and so are
a lot of other rules like backslashes in quoted strings. On the flip
side, string prefixes (like b"...") seem to be handled in the third
phase, and the grammar doesn't actually concern itself with those.
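
A minimal sketch of that split, assuming the standard library's
tokenize module is a fair stand-in for the tokenizer stage (it is not
necessarily identical to what the compiler itself does). The sample
source string is made up for illustration:

```python
import ast
import io
import tokenize

# Hypothetical two-line program: an indented block containing a
# prefixed string literal with a backslash escape in it.
source = 'if x:\n    y = b"a\\n"\n'

tokens = list(tokenize.generate_tokens(io.StringIO(source).readline))
names = [tokenize.tok_name[tok.type] for tok in tokens]

# The tokenizer has already turned leading whitespace into tokens.
print(names)

# The whole literal, prefix and backslash included, is one STRING token.
string_tok = next(tok for tok in tokens if tok.type == tokenize.STRING)
print(string_tok.string)

# Working out what the prefix and the escape *mean* happens later;
# ast.literal_eval stands in for that step here.
value = ast.literal_eval(string_tok.string)
print(type(value).__name__, len(value))
```

The grammar only ever sees INDENT, DEDENT and STRING as opaque
units; it never looks at the whitespace or inside the quotes.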

The grammar *can't* specify everything. If it did, it would have to
have rules for combining individual letters into a NAME and individual
characters into a string literal. The grammar would be completely
unreadable. (I tried, and even just building up a decimal literal in
that style was quite a pain.)
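
To give a feel for the difference, here is a rough sketch (not the
code I actually wrote) that recognises a decimal integer literal two
ways. It assumes the decinteger rule from the language reference,
nonzerodigit (["_"] digit)* | "0"+ (["_"] "0")*, and the helper names
are invented for the example:

```python
import re

# Tokenizer style: one regular expression covers the whole literal.
DECINTEGER = re.compile(r"[1-9](?:_?[0-9])*|0+(?:_?0)*")

def is_decinteger_regex(s):
    return DECINTEGER.fullmatch(s) is not None

# Grammar style: one function per rule, one character at a time.
# Each rule returns the index after the match, or None on failure.
def digit(s, i):
    return i + 1 if i < len(s) and s[i] in "0123456789" else None

def nonzerodigit(s, i):
    return i + 1 if i < len(s) and s[i] in "123456789" else None

def zero(s, i):
    return i + 1 if i < len(s) and s[i] == "0" else None

def repeat_with_underscore(rule, s, i):
    """Match (["_"] rule)* from index i and return the end index."""
    while True:
        j = i + 1 if i < len(s) and s[i] == "_" else i
        k = rule(s, j)
        if k is None:
            return i  # a dangling "_" is not consumed
        i = k

def is_decinteger_rules(s):
    i = nonzerodigit(s, 0)
    if i is not None:
        return repeat_with_underscore(digit, s, i) == len(s)
    i = zero(s, 0)
    if i is None:
        return False
    # "0"+ (["_"] "0")* collapses to "0" (["_"] "0")*
    return repeat_with_underscore(zero, s, i) == len(s)

for text in ["42", "1_000", "0", "00_0", "01", "1__0", "1_", "_1", ""]:
    print(repr(text), is_decinteger_regex(text), is_decinteger_rules(text))
```

Both versions should agree on every input, but the second one is
already several times longer, and that is for one of the simplest
tokens in the language.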

