Hmm… he also mentions NL and Newline tokens and, if I recall correctly, those tokens only appear in the Python tokenizer, which emits them differently from the C one (and therefore they are not used in the grammar).

Pablo Galindo Salgado

On 26 Oct 2022, at 21:57, Guido van Rossum <gu...@python.org> wrote:


I wonder if David may be struggling with the rule that a newline is significant in the grammar unless it appears inside matching brackets/parentheses/braces? I think that's in the lexer. Similarly, multiple newlines are collapsed.
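
To illustrate the rule described above, here is a minimal sketch of a lexer pass in which a newline is only significant outside brackets and runs of blank lines collapse. The function name `logical_lines` is hypothetical and this is not CPython's tokenizer code; it also assumes the input has no string literals or comments containing brackets.

```python
OPENERS = "([{"
CLOSERS = ")]}"


def logical_lines(source: str) -> list[str]:
    """Split source into logical lines, one per significant newline.

    Toy illustration only: brackets inside strings or comments are
    not handled, which a real lexer would have to do.
    """
    lines = []
    current = []
    depth = 0  # how many brackets/parentheses/braces are currently open
    for ch in source:
        if ch in OPENERS:
            depth += 1
        elif ch in CLOSERS:
            depth = max(depth - 1, 0)
        if ch == "\n":
            if depth > 0:
                # Inside brackets the newline is not significant.
                current.append(" ")
                continue
            text = "".join(current).strip()
            if text:
                # A real lexer would emit NEWLINE here.
                lines.append(text)
            # Blank lines add nothing, so repeated newlines collapse.
            current = []
        else:
            current.append(ch)
    tail = "".join(current).strip()
    if tail:
        lines.append(tail)
    return lines


SAMPLE = "x = (1,\n     2)\n\n\ny = 3\n"
for line in logical_lines(SAMPLE):
    print(repr(line))
```

With this sample, the parenthesized assignment stays on one logical line and the two blank lines produce nothing, so a parser fed from such a pass never sees two NEWLINE tokens in a row.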

On Wed, Oct 26, 2022 at 1:19 PM Pablo Galindo Salgado <pablog...@gmail.com> wrote:
Hi,

As I mentioned, NEWLINE is a token. All uppercase words in the grammar are tokens and therefore are produced by the lexer, not the parser. It is not a built-in rule. In particular, that token is produced here:



On Wed, 26 Oct 2022 at 20:59, David J W <ward.dav...@gmail.com> wrote:
Pablo,
    Nl and Newline are tokens but I am interested in NEWLINE's behavior in the Python grammar, note the casing.


Is that NEWLINE some sort of built-in rule in the grammar? In my project I am running into problems where the parser crashes any time there is some double like NL & N or Newline & NL, so I want to nail down NEWLINE's behavior in CPython's PEG grammar.

On Wed, Oct 26, 2022 at 12:51 PM Pablo Galindo Salgado <pablog...@gmail.com> wrote:
Hi,

I am not sure I understand exactly what you are asking, but NEWLINE is a token, not a parser rule. What decides when NEWLINE is emitted is the lexer, which has nothing to do with PEG. Normally PEG parsers also act as tokenizers, but the one in CPython does not.

Also notice that CPython’s parser uses a version of the tokenizer written in C that doesn’t share code with the exposed version. You will find that the `tokenize` module in the standard library actually behaves differently regarding which tokens are emitted for newlines and indentation.
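
One way to see what the exposed version emits is to run the standard library's `tokenize` module over a small sample. The sketch below is illustrative only: the helper name `newline_kinds` and the sample text are assumptions, and it shows the Python-level tokenizer, not the C tokenizer that feeds the parser.

```python
import io
import tokenize


def newline_kinds(source: str) -> list[str]:
    """Return the names of the NEWLINE and NL tokens tokenize emits, in order."""
    readline = io.StringIO(source).readline
    return [
        tokenize.tok_name[tok.type]
        for tok in tokenize.generate_tokens(readline)
        if tok.type in (tokenize.NEWLINE, tokenize.NL)
    ]


# A line break inside parentheses, then a blank line between statements.
SAMPLE = "x = (1,\n     2)\n\ny = 3\n"
print(newline_kinds(SAMPLE))
```

In the `tokenize` output, NEWLINE ends a logical line, while NL marks the line break inside the parentheses and the blank line. Only NEWLINE is referenced by the grammar, so a parser built on `tokenize`-style output would need to drop the NL tokens before parsing.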

The only way to be sure is to check the code, unfortunately.

Hope this helps.

Regards from rainy London,
Pablo Galindo Salgado

> On 26 Oct 2022, at 19:12, David J W <ward.dav...@gmail.com> wrote:
>
> 
> I am writing a Rust version of Python for fun and I am at the parser stage of development.
>
> I copied and modified a PEG grammar ruleset from another open source project and I've already noticed some problems (e.g. Newline vs NL) with how they transcribed things.
>
> I am suspecting that CPython's grammar NEWLINE is a builtin rule for the parser that is something like `(Newline+ | NL+ ) {NOP}` but wanted to sanity check if that is right before I figure out how to hack in a NEWLINE rule and update my grammar ruleset.
> _______________________________________________
> Python-Dev mailing list -- python-dev@python.org
> To unsubscribe send an email to python-dev-le...@python.org
> https://mail.python.org/mailman3/lists/python-dev.python.org/
> Message archived at https://mail.python.org/archives/list/python-dev@python.org/message/NMCMEDMEBKATYKRNZLX2NDGFOB5UHQ5A/
> Code of Conduct: http://python.org/psf/codeofconduct/


--
--Guido van Rossum (python.org/~guido)
