[issue42473] re.sub ignores flag re.M
New submission from Jérôme Laurens:

Test code:

```
import re

# The middle line of the test string is whitespace only; nine spaces are
# assumed here, consistent with the reported span (10, 19).
test = '012345678\n' + ' ' * 9 + '\n012345678\n'
pattern = r'^\s+?$'

m = re.search(pattern, test, re.M)
if m:
    print(f'TEST FOUND "{m.span()}"')

def replace(m):
    print(f'TEST REMOVE {m.span()}')
    return ''

test = re.sub(pattern, replace, test, re.M)
m = re.search(pattern, test, re.M)
if m:
    print(f'TEST STILL THERE "{m.span()}"')

print('COMPILE PATTERN FIRST')
pattern_re = re.compile(pattern, re.M)
m = re.search(pattern_re, test)
if m:
    print(f'TEST FOUND "{m.span()}"')

def replace(m):
    print(f'TEST REMOVE {m.span()}')
    return ''

test = re.sub(pattern_re, replace, test)
m = re.search(pattern_re, test)
if m:
    print(f'TEST STILL THERE "{m.span()}"')
```

Actual output:

    TEST FOUND "(10, 19)"
    TEST STILL THERE "(10, 19)"
    COMPILE PATTERN FIRST
    TEST FOUND "(10, 19)"
    TEST REMOVE (10, 19)

This is an inconsistency between re.search and re.sub. Either this is a bug in the code or in the documentation.

--
components: IO
messages: 381901
nosy: jlaurens
priority: normal
severity: normal
status: open
title: re.sub ignores flag re.M
type: behavior
versions: Python 3.8

___
Python tracker <https://bugs.python.org/issue42473>
___
___
Python-bugs-list mailing list
Unsubscribe: https://mail.python.org/mailman/options/python-bugs-list/archive%40mail-archive.com
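One possible reading of the output above, offered as a sketch rather than as the tracker's resolution: the documented signature is `re.sub(pattern, repl, string, count=0, flags=0)`, so a fourth positional argument is taken as `count`, not as `flags`. The snippet below compares the two spellings. The sample string is an assumption (a middle line of nine spaces, which matches the reported span (10, 19)), and the names `as_count` and `as_flags` are illustrative only.

```python
import re

# Assumed reconstruction of the report's test string: the middle line is
# nine spaces, which is consistent with the reported span (10, 19).
text = "012345678\n" + " " * 9 + "\n012345678\n"
pattern = r"^\s+?$"

# re.M has the integer value 8, so supplying it as `count` means
# "replace at most 8 matches" and leaves the MULTILINE flag unset.
as_count = re.sub(pattern, "", text, count=int(re.M))

# Supplying the flag by keyword enables MULTILINE.
as_flags = re.sub(pattern, "", text, flags=re.M)

print(as_count == text)  # True when nothing was replaced
print(repr(as_flags))    # the whitespace-only line has been emptied
```

Passing `flags=` by keyword avoids the ambiguity; compiling the pattern first, as in the second half of the report, has the same effect.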
[issue38514] pathlib's mkdir documentation improvement
New submission from Jérôme Laurens:

There are some inconsistencies in the current documentation of pathlib's mkdir. Here is the 3.7 version, annotated and followed by a change proposal.

Path.mkdir(mode=0o777, parents=False, exist_ok=False)

    Create a new directory at this given path. If mode is given, it is combined with the process’ umask value to determine the file mode and access flags. If the path already exists, FileExistsError is raised.
    <<<<<<<<<<<<<<<<<< NOT ALWAYS, due to exist_ok.

    If parents is true, any missing parents of this path are created as needed; they are created with the default permissions without taking mode into account (mimicking the POSIX mkdir -p command).

    If parents is false (the default), a missing parent raises FileNotFoundError.

    If exist_ok is false (the default), FileExistsError is raised if the target directory already exists.

    If exist_ok is true, FileExistsError exceptions will be ignored (same behavior as the POSIX mkdir -p command), but only if the last path component is not an existing non-directory file.
    <<<<<<<<<<<<<<<<<< UNCLEAR:
    1) What is an ignored exception?
    2) The reference to POSIX should appear at the end, like above.
    3) The last path component is a string.
    4) Usage of a double negation: ignore / is not.

CHANGE

Path.mkdir(mode=0o777, parents=False, exist_ok=False)

    Create a new directory in the file system at this given path. If mode is given, it is combined with the process’ umask value to determine the file mode and access flags.

    If parents is false (the default), a missing parent raises FileNotFoundError.

    If parents is true, any missing parents of this path are created as needed; they are created with the default permissions without taking mode into account (mimicking the POSIX mkdir -p command).

    If exist_ok is false (the default), FileExistsError is raised if the given path already exists in the file system, whether a directory or not.

    If exist_ok is true, FileExistsError is raised only if the given path already exists in the file system and is not a directory (same behavior as the POSIX mkdir -p command).

Thanks for reading
JL

--
assignee: docs@python
components: Documentation
messages: 354874
nosy: docs@python, jlaurens
priority: normal
severity: normal
status: open
title: pathlib's mkdir documentation improvement
type: enhancement
versions: Python 3.7

___
Python tracker <https://bugs.python.org/issue38514>
___
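The behavior described in the proposed wording can be checked with a short script. This is a minimal sketch that works in a temporary directory; the helper `try_mkdir` and the file and directory names are hypothetical, introduced only for illustration.

```python
import tempfile
from pathlib import Path


def try_mkdir(path, **kwargs):
    """Return 'ok', or the name of the exception raised by Path.mkdir."""
    try:
        path.mkdir(**kwargs)
        return "ok"
    except (FileExistsError, FileNotFoundError) as exc:
        return type(exc).__name__


with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp)
    existing_dir = base / "existing_dir"
    existing_dir.mkdir()
    existing_file = base / "existing_file"
    existing_file.write_text("not a directory")

    # Entries are evaluated in order, so the parents=False case runs
    # before the parents=True case creates the missing parent.
    results = {
        "dir, exist_ok=False": try_mkdir(existing_dir),
        "dir, exist_ok=True": try_mkdir(existing_dir, exist_ok=True),
        "file, exist_ok=False": try_mkdir(existing_file),
        "file, exist_ok=True": try_mkdir(existing_file, exist_ok=True),
        "missing parent, parents=False": try_mkdir(base / "a" / "b"),
        "missing parent, parents=True": try_mkdir(base / "a" / "b", parents=True),
    }

for case, outcome in results.items():
    print(f"{case}: {outcome}")
```

With `exist_ok=True`, only the existing directory is accepted; an existing regular file still raises FileExistsError, which is the case the proposed wording spells out.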
[issue35957] Indentation explanation is unclear
Jérôme LAURENS added the comment:

To be more precise, consider the code

def f(x):
       \tx=0 # 7 spaces + one tab
        return x # 8 spaces

In CPython, both indentation levels are 8 and no indentation error is reported (this is the case where both tab size and alt tab size are equal).

If a tab counted for 6 spaces instead of 8, the indentation levels would be 12 and 8, resulting in a mismatch, and an indentation error should be reported according to the documentation.

This is inconsistent: either the documentation is faulty or CPython is.

Actually, CPython accepts a mix of spaces and tabs only when the tabs are in positions 8, 16, 24...

--
___
Python tracker <https://bugs.python.org/issue35957>
___
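The claim in this comment can be tried directly with the built-in `compile()`. The sketch below builds the source from the comment (seven spaces plus a tab, then eight spaces) and, for contrast, a variant with a lone tab; the helper `check` and both variable names are hypothetical, and the outcome reflects CPython's tokenizer rather than a language guarantee.

```python
# Seven spaces followed by a tab, then eight spaces on the next line.
mixed_at_tab_stop = "def f(x):\n" + " " * 7 + "\tx = 0\n" + " " * 8 + "return x\n"

# A lone tab, then eight spaces on the next line.
lone_tab = "def f(x):\n" + "\tx = 0\n" + " " * 8 + "return x\n"


def check(source):
    """Report whether CPython accepts the indentation of `source`."""
    try:
        compile(source, "<example>", "exec")
        return "accepted"
    except TabError:
        return "TabError"


print("7 spaces + tab vs 8 spaces:", check(mixed_at_tab_stop))
print("lone tab vs 8 spaces:", check(lone_tab))
```

In the first source the tab adds a single column under any tab size of at least 1 that still lands on 8 for the default size, which is why the two lines compare equal under both measurements; in the second, the two lines agree only when a tab is worth exactly eight spaces.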
[issue35957] Indentation explanation is unclear
New submission from Jérôme LAURENS:

https://docs.python.org/3/reference/lexical_analysis.html#indentation reads

Point 1:
"Tabs are replaced (from left to right) by one to eight spaces such that the total number of characters up to and including the replacement is a multiple of eight"

and in the next paragraph

Point 2:
"Indentation is rejected as inconsistent if a source file mixes tabs and spaces in a way that makes the meaning dependent on the worth of a tab in spaces"

In point 1, each tab has exactly one equivalent in spaces; in point 2, a tab may have different equivalents in spaces. Which one is reliable?

The documentation should state that Point 1 concerns CPython, or at least indicate that the 8 may depend on the implementation, which then gives sense to point 2.

--
assignee: docs@python
components: Documentation
messages: 335165
nosy: Jérôme LAURENS, docs@python
priority: normal
severity: normal
status: open
title: Indentation explanation is unclear
type: enhancement
versions: Python 3.7

___
Python tracker <https://bugs.python.org/issue35957>
___
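The replacement rule quoted in Point 1 can be illustrated with `str.expandtabs`, which pads each tab up to the next multiple of the tab size. This is only a stand-in for the rule as quoted, not CPython's tokenizer; the helper `indent_width` and the sample lines are hypothetical.

```python
def indent_width(line, tab_size=8):
    """Width of the leading whitespace once tabs are expanded to tab stops."""
    stripped = line.lstrip(" \t")
    leading = line[: len(line) - len(stripped)]
    return len(leading.expandtabs(tab_size))


tab_line = " " * 7 + "\tx = 0"       # seven spaces, then one tab
space_line = " " * 8 + "return x"    # eight spaces

# Under the quoted rule (tab stops every 8 columns) both lines match.
widths_8 = (indent_width(tab_line, 8), indent_width(space_line, 8))

# If a tab were worth up to 6 columns instead, the same lines would differ.
widths_6 = (indent_width(tab_line, 6), indent_width(space_line, 6))

print(widths_8, widths_6)
```

The second pair shows what Point 2 means by a meaning that depends on the worth of a tab in spaces.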
[issue24079] xml.etree.ElementTree.Element.text does not conform to the documentation
Jérôme Laurens added the comment:

Since the text and tail notions seem tightly coupled, I would vote for a more detailed explanation in the text doc and a forward link in the tail documentation.

text
    The text attribute holds the text between the element's begin tag and the next tag, or None. The tail attribute holds the text between the element's end tag and the next tag, or None. For the <a><b>1<c>2<d/>3</c></b>4</a> xml data, the a element has None for both text and tail attributes, the b element has text '1' and tail '4', the c element has text '2' and tail None, the d element has text None and tail '3'.

    To collect the inner text of an element, see `tostring` with method 'text'.

    Applications may store arbitrary objects in this attribute.

tail
    The tail attribute holds the text between the element's end tag and the next tag, or None. See `text` for more details.

    Applications may store arbitrary objects in this attribute.

It is very important to mention that the 'text' attribute does not always hold a string, contrary to what its name suggests.

BTW, I was not aware of the tostring method with the 'text' argument. The fact is that the documentation reads

    Returns an (optionally) encoded string containing the XML data.

which is misleading because the text is not XML data in general. This also needs to be rephrased or simply removed.

--
___
Python tracker rep...@bugs.python.org http://bugs.python.org/issue24079
___
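The text and tail values listed in the proposed wording can be verified with a few lines. The markup string is taken to be `<a><b>1<c>2<d/>3</c></b>4</a>`, which is the reading that matches every value given in the comment; the variable names are illustrative.

```python
import xml.etree.ElementTree as ET

a = ET.fromstring("<a><b>1<c>2<d/>3</c></b>4</a>")
b = a.find("b")
c = b.find("c")
d = c.find("d")

# Collect (text, tail) for each element, keyed by tag name.
text_and_tail = {elt.tag: (elt.text, elt.tail) for elt in (a, b, c, d)}

for tag, (text, tail) in text_and_tail.items():
    print(f"{tag}: text={text!r} tail={tail!r}")
```

Note that '4' and '3' are stored on the b and d elements as tails, not on their parents as text, which is the point the proposed documentation tries to make explicit.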
[issue24079] xml.etree.ElementTree.Element.text does not conform to the documentation
Jérôme Laurens added the comment:

Erratum

    def innertext(elt):
        return (elt.text or '') + ''.join(innertext(e) + (e.tail or '') for e in elt)

--
___
Python tracker rep...@bugs.python.org http://bugs.python.org/issue24079
___
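A short usage sketch for the corrected `innertext`, assuming the sample markup `<a><b>1<c>2<d/>3</c></b>4</a>`. The cross-check against `Element.itertext()` is an addition for illustration and is not part of the proposal.

```python
import xml.etree.ElementTree as ET


def innertext(elt):
    # Own text first, then each child's inner text followed by that child's
    # tail; None values are treated as empty strings.
    return (elt.text or '') + ''.join(innertext(e) + (e.tail or '') for e in elt)


root = ET.fromstring("<a><b>1<c>2<d/>3</c></b>4</a>")
b = root.find("b")

inner_b = innertext(b)        # b's own tail is not included
inner_root = innertext(root)  # b's tail belongs to the inner text of a

print(repr(inner_b), repr(inner_root))
print(inner_b == "".join(b.itertext()))
```

The `or ''` guards matter because both `text` and `tail` may be None, as they are for several elements in this sample.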
[issue24079] xml.etree.ElementTree.Element.text does not conform to the documentation
Jérôme Laurens added the comment:

The tostring(..., method='text') call is not suitable for the inner text because it adds the tail of the top element. A proper implementation would be

    def innertext(elt):
        return (elt.text or '') + ''.join(innertext(e) + e.tail for e in elt)

which could be included in the doc instead of mentioning the tostring trick.

--
___
Python tracker rep...@bugs.python.org http://bugs.python.org/issue24079
___
[issue24079] xml.etree.ElementTree.Element.text does not conform to the documentation
New submission from Jérôme Laurens:

The documentation for xml.etree.ElementTree.Element.text reads

    If the element is created from an XML file the attribute will contain any text found between the element tags.

    import xml.etree.ElementTree as ET
    root3 = ET.fromstring('<a><b/>TEXT</a>')
    print(root3.text)

CURRENT OUTPUT

    None

TEXT is between the element's tags but does not appear in the output.

BTW: this is well-formed XML and has nothing to do with tail.

--
components: XML
messages: 242256
nosy: jlaurens
priority: normal
severity: normal
status: open
title: xml.etree.ElementTree.Element.text does not conform to the documentation
type: behavior
versions: Python 3.4

___
Python tracker rep...@bugs.python.org http://bugs.python.org/issue24079
___
[issue24072] xml.etree.ElementTree.Element does not catch text
New submission from Jérôme Laurens:

Text is not caught in case 3 below.

INPUT

    import xml.etree.ElementTree as ET
    root1 = ET.fromstring('<a>TEXT</a>')
    print(root1.text)
    root2 = ET.fromstring('<a>TEXT<b/></a>')
    print(root2.text)
    root3 = ET.fromstring('<a><b/>TEXT</a>')
    print(root3.text)

CURRENT OUTPUT

    TEXT
    TEXT
    None -- ERROR HERE

EXPECTED OUTPUT

    TEXT
    TEXT
    TEXT

--
messages: 242207
nosy: jlaurens
priority: normal
severity: normal
status: open
title: xml.etree.ElementTree.Element does not catch text
type: behavior
versions: Python 3.4

___
Python tracker rep...@bugs.python.org http://bugs.python.org/issue24072
___
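For the three inputs in this report, it may help to print each child's tail next to the root's text. This is a sketch under the assumption that the inputs are `<a>TEXT</a>`, `<a>TEXT<b/></a>` and `<a><b/>TEXT</a>`; the names `cases` and `observed` are illustrative.

```python
import xml.etree.ElementTree as ET

cases = ["<a>TEXT</a>", "<a>TEXT<b/></a>", "<a><b/>TEXT</a>"]

observed = []
for markup in cases:
    root = ET.fromstring(markup)
    # Record the root's text together with the tail of each direct child.
    child_tails = [child.tail for child in root]
    observed.append((root.text, child_tails))
    print(f"{markup}: text={root.text!r} child tails={child_tails!r}")
```

In the third case the string is not lost: it is stored as the tail of the b child, which is where ElementTree places text that follows an element's end tag.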