#3: xml2rfc: hyphen not escaped in unicode.py
Debian Stretch; uname -a reports:
Linux maria 4.9.0-4-amd64 #1 SMP Debian 4.9.65-3+deb9u1 (2017-12-23) x86_64 GNU/Linux
The command python3 reports:
Python 3.8.1 (default, Feb 22 2020, 11:56:23) [GCC 6.3.0 20170516] on linux
When I run the command "xml2rfc -h" repeatedly, it fails half the time
with this error message:
Traceback (most recent call last):
  File "/usr/bin/xml2rfc", line 7, in <module>
    from xml2rfc.run import main
  File "/mnt/home/rprice/.local/lib/python3.5/site-packages/xml2rfc/__init__.py", line 14, in <module>
    from xml2rfc.parser import XmlRfcError, CachingResolver, XmlRfcParser, XmlRfc
  File "/mnt/home/rprice/.local/lib/python3.5/site-packages/xml2rfc/parser.py", line 20, in <module>
    from xml2rfc.writers import base
  File "/mnt/home/rprice/.local/lib/python3.5/site-packages/xml2rfc/writers/__init__.py", line 2, in <module>
    from xml2rfc.writers.base import RfcWriterError
  File "/mnt/home/rprice/.local/lib/python3.5/site-packages/xml2rfc/writers/base.py", line 30, in <module>
    from xml2rfc.util.unicode import ( punctuation, unicode_replacements, unicode_content_tags, bare_unicode_tags,
  File "/mnt/home/rprice/.local/lib/python3.5/site-packages/xml2rfc/util/unicode.py", line 260, in <module>
    punctuation_re = re.compile(r'[%s]'%''.join(list(punctuation.keys())))
  File "/usr/lib/python3.5/re.py", line 224, in compile
    return _compile(pattern, flags)
  File "/usr/lib/python3.5/re.py", line 293, in _compile
    p = sre_compile.compile(pattern, flags)
  File "/usr/lib/python3.5/sre_compile.py", line 536, in compile
    p = sre_parse.parse(p, flags)
  File "/usr/lib/python3.5/sre_parse.py", line 829, in parse
    p = _parse_sub(source, pattern, 0)
  File "/usr/lib/python3.5/sre_parse.py", line 437, in _parse_sub
    itemsappend(_parse(source, state))
  File "/usr/lib/python3.5/sre_parse.py", line 575, in _parse
    raise source.error(msg, len(this) + 1 + len(that))
sre_constants.error: bad character range −-“ at position 3
At line 260 of .../xml2rfc/util/unicode.py I inserted two lines to display
the value of punctuation.keys():
259-punctuation.update(unicode_quote_replacements)
260-import sys
261-print("unicode.py: list(punctuation.keys()) {}"
.format(list(punctuation.keys())),file=sys.stderr)
262-punctuation_re = re.compile(r'[%s]'%''.join(list(punctuation.keys())))
When xml2rfc succeeded, I saw:
unicode.py: list(punctuation.keys()) = ['\u2002', '-', '‐', '′', '–', '´', '…', '’', '−', '„', '—', '\u2009', '‚', '‘', '”', '“', '\u2003']
unicode.py: list(punctuation.keys()) = ['´', '„', '\u2003', '‚', '−', '“', '’', '‘', '-', '…', '\u2009', '—', '–', '”', '′', '‐', '\u2002']
When xml2rfc failed, I saw:
unicode.py: list(punctuation.keys()) = ['´', '\u2002', '−', '-', '“', '„', '‘', '′', '‚', '–', '…', '’', '‐', '”', '—', '\u2009', '\u2003']
unicode.py: list(punctuation.keys()) = ['‐', '\u2003', '„', '\u2002', '\u2009', '‚', '—', '’', '−', '…', '‘', '′', '-', '“', '”', '´', '–']
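The difference between the two cases can be checked outside xml2rfc. The
following is a minimal sketch, not xml2rfc code: it takes three adjacent keys
from one failing list and one succeeding list above and compiles each as an
unescaped character class. The helper name compiles() is made up for
illustration, and the code points assume the characters shown are U+2212
MINUS SIGN, U+201C LEFT DOUBLE QUOTATION MARK and U+2010 HYPHEN.

```python
import re

# Adjacent keys copied from the lists above (code points are an assumption).
failing = ['\u2212', '-', '\u201c']     # MINUS SIGN, '-', LEFT DOUBLE QUOTATION MARK
succeeding = ['\u2002', '-', '\u2010']  # EN SPACE, '-', HYPHEN


def compiles(keys):
    """Return True if the keys compile as an unescaped character class."""
    try:
        re.compile(r'[%s]' % ''.join(keys))
        return True
    except re.error:
        return False


print(compiles(failing))     # reversed range: U+2212 comes after U+201C
print(compiles(succeeding))  # ascending range: U+2002 comes before U+2010
```

In the first case the hyphen sits between two code points in descending
order, which re rejects; in the second the order is ascending, so re accepts
the three keys as a range.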
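It looks as if the character "-" is being interpreted by re as a range
indicator inside the [...] character class. In the failing runs it falls
between two keys whose code points are in descending order (for example
'−', '-', '“'), which re rejects as a bad character range. In the
succeeding runs its neighbours happen to be in ascending order, so the
pattern compiles, but it presumably then matches a range of characters
that was never intended. The order of the keys changes from run to run,
which would explain why the failure is intermittent.
Perhaps each key should be escaped before being joined into the pattern.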
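One possible way to do the escaping is sketched below. This is only a
suggestion, not a tested patch to xml2rfc: the dict is a small stand-in for
the real punctuation table, its replacement values are invented, and
build_punctuation_re() is a made-up helper name. It relies only on
re.escape() from the standard library.

```python
import re

# Stand-in for xml2rfc's punctuation dict; the replacement values are hypothetical.
punctuation = {
    '\u2212': '-',   # MINUS SIGN
    '-': '-',        # plain hyphen, the key that causes the trouble
    '\u201c': '"',   # LEFT DOUBLE QUOTATION MARK
    '\u2002': ' ',   # EN SPACE
    '\u2010': '-',   # HYPHEN
}


def build_punctuation_re(keys):
    """Build a character class in which every key is matched literally."""
    # re.escape() backslash-escapes '-', so it can never form a range,
    # whatever order the keys arrive in.
    return re.compile('[%s]' % ''.join(re.escape(k) for k in keys))


punctuation_re = build_punctuation_re(punctuation.keys())

print(bool(punctuation_re.search('a-b')))     # the literal hyphen is matched
print(bool(punctuation_re.search('\u2005')))  # U+2005 lies in U+2002..U+2010 but is not a key
```

An alternative with the same effect would be to move "-" to the very start
or end of the class, but escaping each key does not depend on key order at
all.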
My apologies for the wretched formatting of this message.
Roger
--
----------------------------------+----------------------
Reporter: [email protected] | Owner: somebody
Type: defect | Status: new
Priority: major | Milestone:
Component: component1 | Version:
Keywords: re escape hyphen |
----------------------------------+----------------------
Ticket URL: <https://trac.tools.ietf.org/misc/outcomes/ticket/3>
IETF Successes and Failures <http://tools.ietf.org/misc/outcomes/>