[issue47152] Reorganize the re module sources

2022-04-05 Thread Serhiy Storchaka


Serhiy Storchaka  added the comment:

See issue47211 for removing re.TEMPLATE.

--

___
Python tracker 

___
___
Python-bugs-list mailing list
Unsubscribe: 
https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com



[issue47152] Reorganize the re module sources

2022-04-04 Thread Ma Lin


Ma Lin  added the comment:

> cryptic name

In very early versions, "mark" was called register/region.
https://github.com/python/cpython/blob/v1.0.1/Modules/regexpr.h#L48-L52

If the spans are accessed repeatedly, reading this attribute is faster than 
calling Match.span().
Maybe consider renaming it and making it a public attribute.

--




[issue47152] Reorganize the re module sources

2022-04-04 Thread Serhiy Storchaka


Serhiy Storchaka  added the comment:

> Match.regs is an undocumented attribute, it seems it has existed since 1991. 
Can it be removed?

It was kept for compatibility with the pre-SRE implementation of the re module. 
It was an implementation detail in the original Python code, but I am sure that 
some code still uses it. If we are going to remove it, it needs to be 
deprecated first.

--




[issue47152] Reorganize the re module sources

2022-04-04 Thread Matthew Barnett


Matthew Barnett  added the comment:

For reference, I also implemented .regs in the regex module for compatibility, 
but I've never used it myself. I had to do some investigating to find out what 
it did!

It returns a tuple of the spans of the groups.

Perhaps I might have used it if it didn't have such a cryptic name and/or was 
documented.
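
For anyone else who has to investigate, a minimal sketch of what .regs holds, compared with the documented Match.span(). The pattern and sample string are made up for illustration only:

```python
import re

# Hypothetical pattern and input, chosen only to show the attribute.
m = re.match(r"(\w+)@(\w+)", "user@example")

# The undocumented attribute: one (start, end) pair for the whole
# match (group 0) followed by one pair per capturing group.
print(m.regs)

# The same information through the documented API.
spans = tuple(m.span(i) for i in range(m.re.groups + 1))
print(spans)
```

Since .regs is undocumented, code that needs to keep working should probably stick to Match.span().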

--




[issue47152] Reorganize the re module sources

2022-04-04 Thread Ma Lin


Ma Lin  added the comment:

Match.regs is an undocumented attribute, it seems it has existed since 1991. 
Can it be removed?

https://github.com/python/cpython/blob/ff2cf1d7d5fb25224f3ff2e0c678d36f78e1f3cb/Modules/_sre/sre.c#L2871

--




[issue47152] Reorganize the re module sources

2022-04-04 Thread Serhiy Storchaka


Serhiy Storchaka  added the comment:


New changeset ff2cf1d7d5fb25224f3ff2e0c678d36f78e1f3cb by Serhiy Storchaka in 
branch 'main':
bpo-47152: Remove unused import in re (GH-32298)
https://github.com/python/cpython/commit/ff2cf1d7d5fb25224f3ff2e0c678d36f78e1f3cb


--




[issue47152] Reorganize the re module sources

2022-04-04 Thread Serhiy Storchaka


Serhiy Storchaka  added the comment:


New changeset 1578f06c1c69fbbb942b90bfbacd512784b599fa by Serhiy Storchaka in 
branch 'main':
bpo-47152: Move sources of the _sre module into a subdirectory (GH-32290)
https://github.com/python/cpython/commit/1578f06c1c69fbbb942b90bfbacd512784b599fa


--




[issue47152] Reorganize the re module sources

2022-04-04 Thread Serhiy Storchaka


Change by Serhiy Storchaka :


--
pull_requests: +30357
pull_request: https://github.com/python/cpython/pull/32298




[issue47152] Reorganize the re module sources

2022-04-03 Thread Serhiy Storchaka


Change by Serhiy Storchaka :


--
pull_requests: +30351
pull_request: https://github.com/python/cpython/pull/32290




[issue47152] Reorganize the re module sources

2022-04-03 Thread Serhiy Storchaka


Serhiy Storchaka  added the comment:

There are two very different classes with similar names: _sre.SRE_Scanner and 
re.Scanner. The former is used to implement the Pattern.finditer() method, but 
it could be used in other cases. The latter is an experimental implementation 
of a generalized lexer built on the former class. Both are undocumented. It is 
difficult to document Pattern.scanner() and _sre.SRE_Scanner because the class 
name contains an implementation-specific prefix, and without it the name would 
conflict with re.Scanner.
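
A rough sketch of how the two differ in use. Both APIs are undocumented, so this reflects current behavior rather than any guarantee, and the lexicon and input strings are invented for the example:

```python
import re

# Pattern.scanner() returns the low-level scanner object; each call
# to search() yields the next match, or None when there are no more.
low_level = re.compile(r"\d+").scanner("1 22 333")
numbers = [m.group() for m in iter(low_level.search, None)]

# re.Scanner is the experimental lexer: a list of (pattern, action)
# pairs. An action receives (scanner, matched_text); None skips it.
lexer = re.Scanner([
    (r"\d+", lambda s, tok: ("INT", int(tok))),
    (r"[a-z]+", lambda s, tok: ("NAME", tok)),
    (r"\s+", None),
])
tokens, rest = lexer.scan("abc 42 x")
print(numbers, tokens, repr(rest))
```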

But let's leave all of that to a separate issue.

The original discussion about TEMPLATE was lost. Initially it only affected 
repetition operators, but now using them with TEMPLATE is an error.

--




[issue47152] Reorganize the re module sources

2022-04-02 Thread STINNER Victor


STINNER Victor  added the comment:

The re.template() function and the re.TEMPLATE flag are not documented and 
not tested.

The re.Scanner class is not documented but has a test_scanner() test in test_re.

--




[issue47152] Reorganize the re module sources

2022-04-02 Thread STINNER Victor


STINNER Victor  added the comment:

See also bpo-40259: "re.Scanner groups".

--




[issue47152] Reorganize the re module sources

2022-04-02 Thread STINNER Victor


STINNER Victor  added the comment:

Old python-dev discussions on re.Scanner from 2000 to 2004:

* "[Python-Dev] A standard lexer?" (July 2000)
  https://mail.python.org/archives/list/python-...@python.org/message/MQ4OMCVIVRJWNGHYGI3OUVZQPN5NNNAU/
  thread: https://mail.python.org/archives/list/python-...@python.org/thread/DLMYLYW3QRAAIZDEL3VA7M3TTUWMSPPB/#MQ4OMCVIVRJWNGHYGI3OUVZQPN5NNNAU

* "Scanner" (May 2001)
  https://mail.python.org/archives/list/python-...@python.org/thread/7FGWHTFA2JT23TMVQXLGZLSKG7EGM44Q/#SVQBSSDWPYVHPRS363RWXWGKJTSEYQDP

* "iterator support for SRE?" (Oct 2001)
  https://mail.python.org/archives/list/python-...@python.org/thread/IPJJX6MEW4ATOWHSRKLITL4CAZXDEJ5I/#IPJJX6MEW4ATOWHSRKLITL4CAZXDEJ5I

* "should sre.Scanner be exposed through re and documented?" (April 2003)
  https://mail.python.org/archives/list/python-...@python.org/thread/BHVWYZVMDUJZIJMSSBAAXEH3JI7MTOIJ/#DDFDBY4D6OITPWO26Q5XPBFU7A5X6LXN

* "pre-PEP: Complete, Structured Regular Expression Group Matching" (Aug 2004)
  https://mail.python.org/archives/list/python-...@python.org/thread/5M4YIZ2UFZF5AEWT3CGG74ZHERC6JV3B/#SNURCRGEYANPQVVQFZTY3LTXE2TFEKEP
  Search for "sre.Scanner".

See also: "Using Regular Expressions for Lexical Analysis" (Feb 2002) by Fredrik Lundh
  https://web.archive.org/web/20200220172033/http://effbot.org/zone/xml-scanner.htm

--




[issue47152] Reorganize the re module sources

2022-04-02 Thread Serhiy Storchaka


Serhiy Storchaka  added the comment:

> Is the "import _locale" still used in re/__init__.py? I cannot see any 
> reference to it in the code, and test_re still passes if it's removed.

It is true.

> *Maybe* it's time to consider that re.template() and re.Scanner are no longer 
> experimental? Maybe change their status to alpha or beta? :-D

First we need to find the original discussions for these features (which may 
not be easy) and decide whether we want to finish them or remove them.

> In the `Modules` folder, there are _sre.c/sre.h/sre_constants.h/sre_lib.h 
> files. Will they be put into a folder?

It is step 2.

> would it be possible to expose `parse_template` -- or at least some way to 
> validate that a regex replacement string is correct prior to executing the 
> replacement?

Maybe, in some form. Currently you can precompile a pattern, but for a 
replacement string you rely on an LRU cache. It is slower, and limited by the 
fixed size of the cache. I think it would be worth adding a function for 
compiling a replacement string. sub() etc. should accept both a string and a 
precompiled template object. It is a separate issue.
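
Until such a function exists, a replacement string can be checked with the public API alone. This is only a sketch, and it assumes that sub() parses the template before it searches the input, which is how CPython behaves as far as I know. The helper name is_valid_replacement is made up:

```python
import re

def is_valid_replacement(pattern, repl):
    """Return True if repl parses as a replacement template for pattern.

    Hypothetical helper: it runs sub() on an empty string so that only
    the template parsing can fail, not the substitution itself.
    """
    try:
        re.compile(pattern).sub(repl, "")
    except (re.error, IndexError):
        # re.error: bad escape or invalid group number.
        # IndexError: unknown group name in \g<name>.
        return False
    return True

print(is_valid_replacement(r"(\w+)", r"<\1>"))   # valid reference
print(is_valid_replacement(r"(\w+)", r"\2"))     # no such group
```

It still goes through the same template cache, so it validates but does not give you a reusable precompiled object.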

--




[issue47152] Reorganize the re module sources

2022-04-02 Thread Anthony Sottile


Anthony Sottile  added the comment:

Would it be possible to expose `parse_template` -- or at least some way to 
validate that a regex replacement string is correct prior to executing the 
replacement?

I'm currently using that for my text editor: 
https://github.com/asottile/babi/blob/d37d7d698d560aef7c6a0d1ec0668672e039bd9a/babi/screen.py#L501

--
nosy: +Anthony Sottile




[issue47152] Reorganize the re module sources

2022-04-02 Thread Ma Lin


Ma Lin  added the comment:

In the `Modules` folder, there are _sre.c/sre.h/sre_constants.h/sre_lib.h 
files. Will they be put into a folder?

--




[issue47152] Reorganize the re module sources

2022-04-02 Thread STINNER Victor


STINNER Victor  added the comment:

It's funny to still see mentions of "experimental stuff" in Python 3.11 (2022), 
when this "experimental stuff" has been there for 20 years.

*Maybe* it's time to consider that re.template() and re.Scanner are no longer 
experimental? Maybe change their status to alpha or beta? :-D


commit 770617b23e286f1147f9480b5f625e88e7badd50
Author: Fredrik Lundh 
Date:   Sun Jan 14 15:06:11 2001 +

SRE fixes for 2.1 alpha:

+# sre extensions (experimental, don't rely on these)
+T = TEMPLATE = sre_compile.SRE_FLAG_TEMPLATE # disable backtracking


commit 7cafe4d7e466996d5fc32e871fe834e0e0c94282
Author: Fredrik Lundh 
Date:   Sun Jul 2 17:33:27 2000 +

- actually enabled charset anchors in the engine (still not
  used by the code generator)

- changed max repeat value in engine (to match earlier array fix)

- added experimental "which part matched?" mechanism to sre; see
  http://hem.passagen.se/eff/2000_07_01_bot-archive.htm#416954
  or python-dev for details.


+# experimental stuff (see python-dev discussions for details)
+
+class Scanner:
(...)

--




[issue47152] Reorganize the re module sources

2022-04-02 Thread STINNER Victor


STINNER Victor  added the comment:

Is the "import _locale" still used in re/__init__.py? I cannot see any 
reference to it in the code, and test_re still passes if it's removed.

The last reference to the _locale module has been removed in 2017 by the commit 
898ff03e1e7925ecde3da66327d3cdc7e07625ba.

diff --git a/Lib/re/__init__.py b/Lib/re/__init__.py
index c47a2650e3..b887722bbb 100644
--- a/Lib/re/__init__.py
+++ b/Lib/re/__init__.py
@@ -124,10 +124,6 @@
 import enum
 from . import _compiler, _parser
 import functools
-try:
-    import _locale
-except ImportError:
-    _locale = None
 
 
 # public symbols

--




[issue47152] Reorganize the re module sources

2022-04-02 Thread STINNER Victor


STINNER Victor  added the comment:

$ ls Lib/re/
_compiler.py  _constants.py  __init__.py  _parser.py

Thanks, that's a nice enhancement!

Serhiy: Would you mind explicitly documenting the 3 deprecated modules in 
What's New in Python 3.11?
https://docs.python.org/dev/whatsnew/3.11.html#deprecated

--




[issue47152] Reorganize the re module sources

2022-04-02 Thread Serhiy Storchaka


Serhiy Storchaka  added the comment:


New changeset 1be3260a90f16aae334d993aecf7b70426f98013 by Serhiy Storchaka in 
branch 'main':
bpo-47152: Convert the re module into a package (GH-32177)
https://github.com/python/cpython/commit/1be3260a90f16aae334d993aecf7b70426f98013


--




[issue47152] Reorganize the re module sources

2022-04-01 Thread Guido van Rossum


Guido van Rossum  added the comment:

1. If we're reorganizing anyway, I see no reason to keep the old names.
2. For maximum backwards compatibility, I'd say keep as much as you can, as 
long as keeping it won't interfere with the reorganization.

--




[issue47152] Reorganize the re module sources

2022-04-01 Thread Serhiy Storchaka


Serhiy Storchaka  added the comment:

Modules with old names are kept (deprecated). The questions are:

1. Should we keep the sre_ prefix in new submodules? Should we prefix them with 
underscores?
2. Should we keep only non-underscored names in the sre_* modules, or 
underscored names too?

--




[issue47152] Reorganize the re module sources

2022-04-01 Thread Guido van Rossum


Guido van Rossum  added the comment:

I don't mind reorganizing this, but I would insist that we keep code using old 
undocumented things (like the sre_* modules) working for several releases, 
using the standard deprecation approach.

--




[issue47152] Reorganize the re module sources

2022-04-01 Thread STINNER Victor


STINNER Victor  added the comment:

sre_constants, sre_compile and sre_parse are not tested and are not documented. 
I don't currently consider them public API.

If someone has a good reason to use them, IMO we must clearly define exactly 
which API is needed, and properly document and test it.

If we expose something, I don't think that the API would be exposed as 
re.sre_xxx.xxx, but as re.xxx. 

I suggest hiding the sre_xxx submodules by adding an underscore to their names. 
Moreover, the "sre_" prefix is now redundant. I suggest renaming:

* sre_constants => re._constants
* sre_compile => re._compile
* sre_parse => re._parse

--
nosy: +vstinner




[issue47152] Reorganize the re module sources

2022-03-30 Thread Ma Lin


Change by Ma Lin :


--
pull_requests: +30266
pull_request: https://github.com/python/cpython/pull/32188




[issue47152] Reorganize the re module sources

2022-03-30 Thread Serhiy Storchaka


Serhiy Storchaka  added the comment:

It turns out that pip uses sre_constants in its copy of pyparsing. The problem 
is already fixed in the upstream of pyparsing and soon should be fixed in pip. 
We still need to keep sre_constants and maybe other sre_* modules, but 
deprecate them.

> Could the sre_parse and sre_constants modules be kept with public names (i.e. 
> without the leading underscore) but within the re namespace?

It is a good idea which will minimize breakage in the short term. You can 
write "from re import sre_parse", and it would work in old and new versions 
because sre_parse and sre_compile were imported in the re module. This trick 
does not work with sre_constants; you still need try/except.
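
The try/except fallback could look roughly like this. It is a sketch that assumes the private submodule name re._constants from the package conversion; that name is an implementation detail and may change again:

```python
# Prefer the new private location; fall back to the old top-level
# module on versions that predate the package layout.
try:
    from re import _constants as sre_constants
except ImportError:
    import sre_constants

# Example use: a constant that exists under both layouts.
print(sre_constants.MAXREPEAT)
```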

But the code that depends on these modules is fragile and can be broken in 
other ways.

> Please don't merge too close to the 3.11 beta1 release date, I'll submit PRs 
> after this is merged.

I am going to implement step 2 only after merging your changes for issue23689.

--




[issue47152] Reorganize the re module sources

2022-03-29 Thread Ma Lin


Ma Lin  added the comment:

Please don't merge too close to the 3.11 beta1 release date, I'll submit PRs 
after this is merged.

--




[issue47152] Reorganize the re module sources

2022-03-29 Thread Dominic Davis-Foster


Dominic Davis-Foster  added the comment:

Could the sre_parse and sre_constants modules be kept with public names (i.e. 
without the leading underscore) but within the re namespace? I use them to 
tokenize and then syntax highlight regular expressions.

I did a quick search and found a few other users of the modules:

* pydoctor uses them for regex syntax highlighting[1], although it has its own 
copy of the sre_parse source rather than importing from stdlib.
* lark uses sre_parse to find minimum and maximum length of matching strings[2]
* sre_yield uses them to determine all strings that will match a regex[3]

The whole modules don't necessarily need exposing, but certainly 
sre_parse.parse, sre_parse.parse_template, and the opcodes from sre_constants 
would be the most useful.


[1] https://github.com/twisted/pydoctor/blob/c86273dffade5455890570142c8b7b068f5dffd1/pydoctor/epydoc/markup/_pyval_repr.py#L776
[2] https://github.com/lark-parser/lark/blob/85ea92ebf4e983e9997f9953a9c1463bb3d1c6cc/lark/utils.py#L120
[3] https://github.com/google/sre_yield/blob/3af063a0054c4646608b43b941fbfcbe4e01214a/sre_yield/__init__.py
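
To illustrate the minimum/maximum length use case, here is a small sketch in the spirit of what lark does, not its actual code. It relies on the undocumented parse() function and its getwidth() method, tries the private re._parser name first, and uses a made-up helper name match_width:

```python
# Import the parser from whichever location this Python provides.
try:
    from re import _parser as sre_parse
except ImportError:
    import sre_parse

def match_width(pattern):
    """Return (min_len, max_len) of strings the pattern can match.

    Hypothetical helper built on undocumented internals.
    """
    lo, hi = sre_parse.parse(pattern).getwidth()
    return lo, hi

# "a", an optional "b", then two or three "c" characters.
print(match_width(r"ab?c{2,3}"))
```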

--
nosy: +dom1310df




[issue47152] Reorganize the re module sources

2022-03-29 Thread Serhiy Storchaka


Change by Serhiy Storchaka :


--
keywords: +patch
pull_requests: +30255
stage:  -> patch review
pull_request: https://github.com/python/cpython/pull/32177




[issue47152] Reorganize the re module sources

2022-03-29 Thread Serhiy Storchaka


New submission from Serhiy Storchaka :

I proposed it several years ago on the Python-Dev mailing list and that change 
was approved in general. The reorganization was deferred because there were 
several known bugs in the RE engine (fixes for which could potentially be 
backported) and there were unmerged patches waiting for review. Now the patch 
for atomic groups has been merged and the bugs have been fixed (thanks to Ma 
Lin).

Both the C code and the Python code for the re module are spread across 
several files in the Modules and Lib directories. This makes it difficult to 
work with all the related files, because they are intermixed with the source 
files of other modules.

The following changes are planned:

1. Convert the re module into a package. Make sre_* modules its submodules.
2. Move C sources for the _sre module into a separate directory.
3. Extract the code for generating definitions of C constants from definitions 
of Python constants into a separate script and add it to the Tools/scripts 
directory (there are precedents: generate_token.py, etc).

--
components: Library (Lib), Regular Expressions
messages: 416268
nosy: ezio.melotti, gvanrossum, malin, mrabarnett, serhiy.storchaka
priority: normal
severity: normal
status: open
title: Reorganize the re module sources
versions: Python 3.11
