New submission from Mark Gordon <msg...@gmail.com>:

cgi.parse_header incorrectly handles unescaping of quoted-strings

The RFC describing how HTTP encodes the Content-Type header is at 
https://www.w3.org/Protocols/rfc2616/rfc2616-sec2.html, and further 
discussion of how quoted-string is defined is at 
https://greenbytes.de/tech/webdav/draft-ietf-httpbis-p1-messaging-16.html#rfc.section.3.2.1.p.3.

As written, parse_header has no context to tell whether a backslash is 
escaping a double quote, or whether the backslash is itself the escaped 
character and the double quote after it is free-standing and unescaped. For 
this reason it mishandles values that end in a literal backslash. For example, 
the following Content-Type is parsed incorrectly:

a/b; foo="moo\\"; bar=baz
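
To illustrate why a context-free check is ambiguous here, the sketch below 
counts quotes and backslash-quote pairs in the text before the second 
semicolon. The helper name quotes_look_open is hypothetical, and whether 
parse_header uses exactly this kind of count is an assumption made for 
illustration only.

```python
def quotes_look_open(prefix):
    """Guess whether `prefix` ends inside a quoted-string.

    Hypothetical helper: it treats every backslash-quote pair as an
    escaped quote, without checking whether that backslash was itself
    escaped by a preceding backslash.
    """
    quotes = prefix.count('"')
    escaped = prefix.count('\\"')
    return (quotes - escaped) % 2 == 1


# Raw string, so the value really contains two backslashes before the quote.
header = r'a/b; foo="moo\\"; bar=baz'
first = header.index(";")
second = header.index(";", first + 1)
prefix = header[first + 1:second]

# The closing quote is wrongly counted as escaped, so the quoted-string
# appears to still be open at the second semicolon.
print(repr(prefix), quotes_look_open(prefix))
```

A count like this cannot distinguish an escaped quote from an escaped 
backslash followed by a closing quote, which is the ambiguity described above.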

Example run on current cpython master demonstrating the bug:

Python 3.10.0a7+ (heads/master:660592f67c, Apr 21 2021, 22:51:04) [GCC 9.3.0] 
on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import cgi
>>> query = 'a; foo="moo\\\\"; bar=cow' 
>>> print(query)
a; foo="moo\\"; bar=cow
>>> cgi.parse_header(query)
('a', {'foo': '"moo\\\\"; bar=cow'})
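
For comparison, here is a minimal sketch of a parser that scans left to right 
and tracks escape state, so a backslash only escapes the single character that 
follows it. The names split_params, unquote and parse_header_sketch are 
hypothetical and not part of the cgi module; this is one possible approach 
under those assumptions, not a proposed patch.

```python
def split_params(header):
    """Split on ';' outside quoted-strings (hypothetical helper)."""
    parts, buf = [], []
    in_quotes = escaped = False
    for ch in header:
        if escaped:
            # Previous character was a backslash inside quotes:
            # take this character literally.
            buf.append(ch)
            escaped = False
        elif in_quotes and ch == "\\":
            buf.append(ch)
            escaped = True
        elif ch == '"':
            buf.append(ch)
            in_quotes = not in_quotes
        elif ch == ";" and not in_quotes:
            parts.append("".join(buf).strip())
            buf = []
        else:
            buf.append(ch)
    parts.append("".join(buf).strip())
    return parts


def unquote(value):
    """Strip surrounding quotes and resolve escapes left to right."""
    if len(value) >= 2 and value[0] == value[-1] == '"':
        out, escaped = [], False
        for ch in value[1:-1]:
            if escaped:
                out.append(ch)
                escaped = False
            elif ch == "\\":
                escaped = True
            else:
                out.append(ch)
        return "".join(out)
    return value


def parse_header_sketch(line):
    """Return (main value, params dict) for a Content-Type style header."""
    parts = split_params(line)
    params = {}
    for part in parts[1:]:
        name, sep, value = part.partition("=")
        if sep:
            params[name.strip().lower()] = unquote(value.strip())
    return parts[0], params


query = 'a; foo="moo\\\\"; bar=cow'
print(parse_header_sketch(query))
```

With the query from the session above, this sketch yields foo as moo followed 
by a single backslash, and bar as a separate parameter with the value cow.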

----------
components: Library (Lib)
messages: 391580
nosy: msg555
priority: normal
severity: normal
status: open
title: cgi.parse_header does not handle escaping correctly
type: behavior
versions: Python 3.10

_______________________________________
Python tracker <rep...@bugs.python.org>
<https://bugs.python.org/issue43910>
_______________________________________