[Python-Dev] Re: problem with Distributed File System Replication and Namespacing and different versions of Python 3

2022-10-26 Thread Steve Holden
I don't remember it being mentioned, but much of the traffic recently
migrated from this list to https://discuss.python.org/c/core-dev/23, which
you may wish to keep in touch with.

Kind regards,
Steve


On Tue, Oct 25, 2022 at 7:53 AM Juan Cristóbal Quesada wrote:

> Hi Steve,
> thanks! Will definitely have a look at it as soon as I can.
>
> Many thanks to all of you who replied. It was my first post to one of the
> Python mailing lists and I wasn't sure how accurate a response I would
> get. You never know how active the mailing lists/forums are.
>
> Best Regards,
> JC
> ___
> Python-Dev mailing list -- [email protected]
> To unsubscribe send an email to [email protected]
> https://mail.python.org/mailman3/lists/python-dev.python.org/
> Message archived at
> https://mail.python.org/archives/list/[email protected]/message/EJK5AFATVF6D44TVQRGSIXFHWTUDE6IS/
> Code of Conduct: http://python.org/psf/codeofconduct/
>
___
Python-Dev mailing list -- [email protected]
To unsubscribe send an email to [email protected]
https://mail.python.org/mailman3/lists/python-dev.python.org/
Message archived at 
https://mail.python.org/archives/list/[email protected]/message/HMEDAUC6O7NFVVMLEE3OLAVPPH6H67FV/
Code of Conduct: http://python.org/psf/codeofconduct/


[Python-Dev] Possible bug in `re` module in Python 3.11?

2022-10-26 Thread Piotr Waszkiewicz
Hi, 
I would like to ask for your guidance, as I'm not entirely sure whether the
problem I'm experiencing should be posted in CPython's repo as a bug issue.
I've tried using the newly released Python 3.11 interpreter in some of my
projects and one of them failed to start with a "RuntimeError: invalid SRE code"
error.

Looking into the implementation of one of the dependencies, I found that the
issue had already been noticed by a maintainer and fixed
(https://github.com/pydicom/pydicom/issues/1658).
It looks like this particular regexp caused `re.compile()` to raise:

```
import re

re.compile(
    r"(?P<h>^([01][0-9]|2[0-3]))"
    r"((?P<m>([0-5][0-9]))?"
    r"(?(5)(?P<s>([0-5][0-9]|60))?)"
    r"(?(7)(\.(?P<ms>([0-9]{1,6})?))?))$"
)
```

I've checked and this hasn't been an issue in any previous Python interpreter
version, starting from 3.6 (the oldest I've checked).
What's more, the regex is correctly recognized and does not cause any issues in
other regexp implementations, e.g. the online tool https://regex101.com/

Could somebody help me decide whether this is indeed a bug?

Best regards,
Piotr Waszkiewicz


[Python-Dev] Re: Possible bug in `re` module in Python 3.11?

2022-10-26 Thread MRAB

On 2022-10-26 09:17, Piotr Waszkiewicz wrote:

Could somebody help me decide whether this is indeed a bug?


It's definitely a bug.

If it were complaining about the pattern then it might or might not be a
bug; you'd need to check more closely. But "invalid SRE code" is
definitely a bug.



[Python-Dev] Re: Possible bug in `re` module in Python 3.11?

2022-10-26 Thread Serhiy Storchaka

26.10.22 11:17, Piotr Waszkiewicz wrote:

Could somebody help me decide whether this is indeed a bug?


Yes, it is a bug, and I have found its cause. Please open an issue on 
the CPython bug tracker.






[Python-Dev] NEWLINE sentinel behavior in CPython's PEG grammar

2022-10-26 Thread David J W
I am writing a Rust version of Python for fun and I am at the parser stage
of development.

I copied and modified a PEG grammar ruleset from another open source
project and I've already noticed some problems (e.g. Newline vs NL) with how
they transcribed things.

I suspect that NEWLINE in CPython's grammar is a built-in rule for the
parser that is something like `(Newline+ | NL+ ) {NOP}`, but I wanted to
sanity check that before I figure out how to hack in a NEWLINE
rule and update my grammar ruleset.


[Python-Dev] Re: NEWLINE sentinel behavior in CPython's PEG grammar

2022-10-26 Thread Pablo Galindo Salgado
Hi,

I am not sure I understand exactly what you are asking, but NEWLINE is a token,
not a parser rule. When NEWLINE is emitted is decided by the lexer, which has
nothing to do with PEG. Normally PEG parsers also act as tokenizers, but the
one in CPython does not.

Also notice that CPython's parser uses a version of the tokenizer written in C
that doesn't share code with the exposed version. You will find that the
`tokenize` module in the standard library actually behaves differently regarding
which tokens are emitted for new lines and indentation.
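
To see what the exposed (pure Python) tokenizer emits, a small sketch like the one below can help. It only uses the standard library `tokenize` module; the helper name `token_names` and the sample source are made up for illustration, and the output says nothing definitive about the C tokenizer that CPython's parser actually uses.

```python
import io
import tokenize


def token_names(source: str) -> list[str]:
    """Return the token type names the stdlib tokenize module emits for source."""
    readline = io.StringIO(source).readline
    return [tokenize.tok_name[tok.type] for tok in tokenize.generate_tokens(readline)]


# Hypothetical sample: a bracketed continuation, a blank line and a comment line.
source = "x = (1 +\n     2)\n\n# comment\ny = 3\n"
names = token_names(source)
print(names)
```

In this sample the line break inside the parentheses, the blank line and the comment-only line should each show up as NL, while only the two line breaks that end a logical line show up as NEWLINE.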

Unfortunately, the only way to be sure is to check the code.

Hope this helps.

Regards from rainy London,
Pablo Galindo Salgado



[Python-Dev] Re: NEWLINE sentinel behavior in CPython's PEG grammar

2022-10-26 Thread Pablo Galindo Salgado
Hi,

As I mentioned, NEWLINE is a token. All uppercase words in the grammar are
tokens and are therefore produced by the lexer, not the parser. It is not a
built-in rule. In particular, that token is produced here:

https://github.com/python/cpython/blob/6777e09166fc384ea0a4b50202c7b0bd7a23330c/Parser/tokenizer.c#L1773
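
As a rough cross-check from the Python side, the standard library `token` module lists the token type names, so one can test whether a name used in the grammar is a token rather than a rule. This is only a sketch: `is_token_name` is a hypothetical helper, and the `token` module mirrors the token names without being the C lexer itself.

```python
import token


def is_token_name(name: str) -> bool:
    """True if name is one of the token type names listed by the stdlib token module."""
    return name in token.tok_name.values()


# NEWLINE, NAME and INDENT are token types; simple_stmts is a grammar rule.
for name in ("NEWLINE", "NAME", "INDENT", "simple_stmts"):
    print(f"{name}: {is_token_name(name)}")
```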


On Wed, 26 Oct 2022 at 20:59, David J W  wrote:

> Pablo,
> Nl and Newline are tokens but I am interested in NEWLINE's behavior in
> the Python grammar, note the casing.
>
> For example in simple_stmts @
> https://github.com/python/cpython/blob/main/Grammar/python.gram#L107
>
> Is that NEWLINE some sort of built in rule to the grammar?   In my project
> I am running into problems where the parser crashes any time there is some
> double like NL & N or Newline & NL but I want to nail down NEWLINE's
> behavior in CPython's PEG grammar.


[Python-Dev] Thank you for your contributions to Python 3.11!

2022-10-26 Thread Pablo Galindo Salgado
Hi everyone,

Now that the 3.11.0 release is finally done and I can relax a bit, I just
wanted to thank you all for your fantastic work that has made Python 3.11
such a fantastic release. It doesn't matter whether you committed code to
3.11, opened a bug, helped with the documentation, reviewed pull requests,
participated in discussions, wrote a PEP or helped write one, fixed one bug
or a hundred, made optimizations to the interpreter, or contributed in any
other of the many possible ways: your work makes a huge difference and
Python 3.11 is much better because of it :)

Also, I want to especially thank all core devs and contributors that have
helped me and the release team take care of release blockers, buildbot
failures, CVE patches, and any other form of release crisis. Thank you!

Finally, a huge thanks to my colleagues in the release team that make these
releases possible and help to make sure that my mistakes are not too obvious
to end users :P

Being your release manager for 3.11 and 3.10 has been a privilege and an
honor (and it will continue for a couple of years of bugfixes and security
releases, I'm not going anywhere).

Regards from rainy London,
Pablo Galindo Salgado


[Python-Dev] Re: NEWLINE sentinel behavior in CPython's PEG grammar

2022-10-26 Thread David J W
Pablo,
Nl and Newline are tokens but I am interested in NEWLINE's behavior in
the Python grammar, note the casing.

For example in simple_stmts @
https://github.com/python/cpython/blob/main/Grammar/python.gram#L107

Is that NEWLINE some sort of built-in rule in the grammar? In my project
I am running into problems where the parser crashes any time there is some
double like NL & N or Newline & NL, so I want to nail down NEWLINE's
behavior in CPython's PEG grammar.



[Python-Dev] Re: NEWLINE sentinel behavior in CPython's PEG grammar

2022-10-26 Thread Guido van Rossum
I wonder if David may be struggling with the rule that a newline is
significant in the grammar unless it appears inside matching
brackets/parentheses/braces? I think that's in the lexer. Similarly,
multiple newlines are collapsed.
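
The two rules above can be illustrated with a toy model. This is only a sketch under loud assumptions: it ignores strings, comments and backslash continuations, the function name `significant_newlines` is made up, and it is not how CPython's tokenizer is written.

```python
def significant_newlines(source: str) -> list[int]:
    """Return 1-based numbers of the lines whose line break ends a logical line.

    Toy model only: strings, comments and backslash continuations are ignored.
    """
    depth = 0            # current nesting level of (), [] and {}
    has_content = False  # whether the current logical line has any non-blank text
    result = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for ch in line:
            if ch in "([{":
                depth += 1
            elif ch in ")]}":
                depth = max(depth - 1, 0)
            if not ch.isspace():
                has_content = True
        # A line break counts only outside brackets and after real content,
        # so blank lines are collapsed and bracketed continuations are joined.
        if depth == 0 and has_content:
            result.append(lineno)
            has_content = False
    return result


sample = "x = [1,\n     2]\n\n\ny = 3\n"
print(significant_newlines(sample))
```

For the sample, only the line breaks after lines 2 and 5 are reported: the one after line 1 is inside brackets, and the two blank lines are skipped.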



-- 
--Guido van Rossum (python.org/~guido)
*Pronouns: he/him **(why is my pronoun here?)*



[Python-Dev] Re: Possible bug in `re` module in Python 3.11?

2022-10-26 Thread Piotr Waszkiewicz
Thank you very much for your input. I've filed the bug report:
https://github.com/python/cpython/issues/98740

Best regards,
Piotr



[Python-Dev] Re: NEWLINE sentinel behavior in CPython's PEG grammar

2022-10-26 Thread Pablo Galindo Salgado
Hummm… he is also mentioning NL and Newline tokens and, if I recall correctly,
those are tokens that only appear in the Python tokenizer and are emitted
differently from the C one (and therefore they are not used in the grammar).

Pablo Galindo Salgado


[Python-Dev] Re: NEWLINE sentinel behavior in CPython's PEG grammar

2022-10-26 Thread Matthias Görgens
Hi David,

Could you share what you have so far, perhaps on GitHub or similar? That way
it's easier to diagnose your problems. I'm reasonably familiar with Rust.

Perhaps also add a minimal crashing example?

Cheers,
Matthias.



[Python-Dev] Re: NEWLINE sentinel behavior in CPython's PEG grammar

2022-10-26 Thread Matthieu Dartiailh
If you look at pegen, which uses the stdlib tokenizer as input, you will see
that the object used to implement memoization on top of the token stream
simply swallows NL (
https://github.com/we-like-parsers/pegen/blob/main/src/pegen/tokenizer.py#L49).
This is safe since NL has no syntactic meaning; only NEWLINE does.
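
The idea can be sketched with the standard library alone. The generator below is written in the spirit of that approach but is not pegen's actual code; the name `parser_view` is hypothetical, and dropping COMMENT tokens as well as NL is an assumption of this sketch.

```python
import io
import tokenize
from typing import Iterator


def parser_view(source: str) -> Iterator[tokenize.TokenInfo]:
    """Yield stdlib tokens, skipping the ones that carry no syntactic meaning."""
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        # NL marks non-logical line breaks; COMMENT is skipped by assumption.
        if tok.type in (tokenize.NL, tokenize.COMMENT):
            continue
        yield tok


# Hypothetical sample with a blank line, a comment and a bracketed continuation.
source = "a = 1\n\n# note\nb = [2,\n     3]\n"
kinds = [tokenize.tok_name[tok.type] for tok in parser_view(source)]
print(kinds)
```

A parser fed from `parser_view` then only ever sees NEWLINE at the end of a logical line, which is the situation the grammar is written for.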

Best

Matthieu
