#3: xml2rfc: hyphen not escaped in unicode.py

 Debian Stretch; uname -a reports:
 Linux maria 4.9.0-4-amd64 #1 SMP Debian 4.9.65-3+deb9u1 (2017-12-23) x86_64 GNU/Linux
 The command python3 reports:
 Python 3.8.1 (default, Feb 22 2020, 11:56:23) [GCC 6.3.0 20170516] on linux

 When I run the command "xml2rfc -h" repeatedly, it fails about half the
 time with this error message:

 Traceback (most recent call last):
   File "/usr/bin/xml2rfc", line 7, in <module>
     from xml2rfc.run import main
   File "/mnt/home/rprice/.local/lib/python3.5/site-packages/xml2rfc/__init__.py", line 14, in <module>
     from xml2rfc.parser import  XmlRfcError, CachingResolver, XmlRfcParser, XmlRfc
   File "/mnt/home/rprice/.local/lib/python3.5/site-packages/xml2rfc/parser.py", line 20, in <module>
     from xml2rfc.writers import base
   File "/mnt/home/rprice/.local/lib/python3.5/site-packages/xml2rfc/writers/__init__.py", line 2, in <module>
     from xml2rfc.writers.base import RfcWriterError
   File "/mnt/home/rprice/.local/lib/python3.5/site-packages/xml2rfc/writers/base.py", line 30, in <module>
     from xml2rfc.util.unicode import ( punctuation, unicode_replacements, unicode_content_tags, bare_unicode_tags,
   File "/mnt/home/rprice/.local/lib/python3.5/site-packages/xml2rfc/util/unicode.py", line 260, in <module>
     punctuation_re = re.compile(r'[%s]'%''.join(list(punctuation.keys())))
   File "/usr/lib/python3.5/re.py", line 224, in compile
     return _compile(pattern, flags)
   File "/usr/lib/python3.5/re.py", line 293, in _compile
     p = sre_compile.compile(pattern, flags)
   File "/usr/lib/python3.5/sre_compile.py", line 536, in compile
     p = sre_parse.parse(p, flags)
   File "/usr/lib/python3.5/sre_parse.py", line 829, in parse
     p = _parse_sub(source, pattern, 0)
   File "/usr/lib/python3.5/sre_parse.py", line 437, in _parse_sub
     itemsappend(_parse(source, state))
   File "/usr/lib/python3.5/sre_parse.py", line 575, in _parse
     raise source.error(msg, len(this) + 1 + len(that))
 sre_constants.error: bad character range −-“ at position 3

 At line 260 in .../xml2rfc/util/unicode.py I inserted two lines to display
 the value of punctuation.keys():

 259-punctuation.update(unicode_quote_replacements)
 260-import sys
 261-print("unicode.py: list(punctuation.keys()) = {}".format(list(punctuation.keys())), file=sys.stderr)
 262-punctuation_re = re.compile(r'[%s]'%''.join(list(punctuation.keys())))

 When xml2rfc succeeded, I saw:

 unicode.py: list(punctuation.keys()) = ['\u2002', '-', '‐', '′', '–', '´',
 '…', '’',
 '−', '„', '—', '\u2009', '‚', '‘', '”', '“', '\u2003']
 unicode.py: list(punctuation.keys()) = ['´', '„', '\u2003', '‚', '−', '“',
 '’', '‘', '-', '…',
 '\u2009', '—', '–', '”', '′', '‐', '\u2002']

 When xml2rfc failed, I saw:

 unicode.py: list(punctuation.keys()) = ['´', '\u2002', '−', '-', '“', '„',
 '‘', '′',
 '‚', '–', '…', '’', '‐', '”', '—', '\u2009', '\u2003']
 unicode.py: list(punctuation.keys()) = ['‐', '\u2003', '„', '\u2002',
 '\u2009', '‚',
 '—', '’', '−', '…', '‘', '′', '-', '“', '”', '´', '–']

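 It looks as if the unescaped "-" in the character class is interpreted by
 re as a range operator whenever it falls between two other keys, and the
 compile fails when the key before it has a higher code point than the key
 after it.
 It should probably be escaped.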
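
A minimal sketch of the failure and one possible fix, assuming the keys are joined into a character class exactly as on the re.compile line above. The two key lists are the first five keys from one failing and one passing run, written as escapes so the code points are unambiguous. The helper names compiles() and build_punctuation_re() are hypothetical and are not part of xml2rfc.

```python
import re

# First five keys of a failing run: acute accent, en space, minus sign,
# hyphen-minus, left double quotation mark.
failing_order = ['\u00b4', '\u2002', '\u2212', '-', '\u201c']
# First five keys of a passing run: en space, hyphen-minus, hyphen,
# prime, en dash.
passing_order = ['\u2002', '-', '\u2010', '\u2032', '\u2013']


def compiles(keys):
    """Return True if the unescaped character class built from keys compiles."""
    try:
        re.compile(r'[%s]' % ''.join(keys))
        return True
    except re.error:
        return False


def build_punctuation_re(keys):
    """Hypothetical fix: escape each key so '-' is always a literal."""
    return re.compile('[%s]' % ''.join(re.escape(k) for k in keys))


# U+2002 to U+2010 is an ascending range, so the class happens to compile.
print(compiles(passing_order))   # True
# U+2212 to U+201C is a descending range, so re rejects it.
print(compiles(failing_order))   # False

# With escaping, the same failing order compiles and '-' matches literally.
safe_re = build_punctuation_re(failing_order)
print(bool(safe_re.fullmatch('-')))   # True
```

Note that even in the passing case the unescaped pattern is not what was intended: it silently matches every character from U+2002 through U+2010 instead of the three listed keys. Escaping the keys, or placing "-" first or last in the class, avoids both problems.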

 My apologies for the wretched formatting of this message.
 Roger

-- 
----------------------------------+----------------------
 Reporter:  [email protected]  |      Owner:  somebody
     Type:  defect                |     Status:  new
 Priority:  major                 |  Milestone:
Component:  component1            |    Version:
 Keywords:  re escape hyphen      |
----------------------------------+----------------------

Ticket URL: <https://trac.tools.ietf.org/misc/outcomes/ticket/3>
IETF Successes and Failures <http://tools.ietf.org/misc/outcomes/>

_______________________________________________
OPSEC mailing list
[email protected]
https://www.ietf.org/mailman/listinfo/opsec
