New submission from Martin Panter:

I propose documenting split_header_words() so that it can be used to parse 
various kinds of HTTP-based header fields. Perhaps it should live in a more 
general module like “http”, or “email.policy.HTTP” (hinted in Issue 3609). 
Perhaps there is also room for finding a better name, such as 
parse_header_attributes() or something, since splitting space-separated words 
is not its most important property.

The function takes a series of header field values, as returned from 
Message.get_all(failobj=()). The field values may be separate strings and may 
also be comma-separated. It parses space- or semicolon-separated name=value 
attributes from each field value. Examples:

RFC 2965 Set-Cookie2 fields:
>>> cookies = (
...     'Cookie1="VALUE";Version=1;Discard, Cookie2="Same field";Version=1',
...     'Cookie3="Separate header field";Version=1',
... )
>>> pprint(http.cookiejar.split_header_words(cookies))
[[('Cookie1', 'VALUE'), ('Version', '1'), ('Discard', None)],
 [('Cookie2', 'Same field'), ('Version', '1')],
 [('Cookie3', 'Separate header field'), ('Version', '1')]]

RTSP 1.0 (RFC 2326) Transport header field:
>>> transport = 'RTP/AVP;unicast;mode="PLAY, RECORD", RTP/AVP/TCP;interleaved=0-1'
>>> pprint(http.cookiejar.split_header_words((transport,)))
[[('RTP/AVP', None), ('unicast', None), ('mode', 'PLAY, RECORD')],
 [('RTP/AVP/TCP', None), ('interleaved', '0-1')]]
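
As a rough sketch of the intended usage, field values collected with 
Message.get_all() can be passed straight to the function. The Message object 
and its contents here are only illustrative, reusing the Set-Cookie2 data from 
the first example:

```python
from email.message import Message
from http.cookiejar import split_header_words  # currently undocumented

# Build a message carrying two Set-Cookie2 header fields (illustrative data).
msg = Message()
msg["Set-Cookie2"] = 'Cookie1="VALUE";Version=1;Discard, Cookie2="Same field";Version=1'
msg["Set-Cookie2"] = 'Cookie3="Separate header field";Version=1'

# failobj=() makes a missing header field yield an empty sequence, not None.
values = msg.get_all("Set-Cookie2", failobj=())
for attributes in split_header_words(values):
    print(attributes)

# An absent header field simply produces no attribute lists.
missing = split_header_words(msg.get_all("Transport", failobj=()))
print(missing)
```

Passing failobj=() matters because the function iterates over its argument, 
and the default failobj of None would not be iterable.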

The parsing of spaces seems to be an attempt to parse headers like 
WWW-Authenticate, although it mixes up the parameters when given this example 
from RFC 7235:

>>> auth = 'Newauth realm="apps", type=1, title="Login to \\"apps\\"", Basic realm="simple"'
>>> pprint(http.cookiejar.split_header_words((auth,)))
[[('Newauth', None), ('realm', 'apps')],
 [('type', '1')],
 [('title', 'Login to "apps"')],
 [('Basic', None), ('realm', 'simple')]]
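
For what it is worth, the result above could be regrouped after the fact. The 
following is only a sketch of one possible workaround, not something the 
library provides: regroup_challenges() is a hypothetical helper, and its 
heuristic (a group that starts with a bare token begins a new challenge) is my 
own assumption. It would not cope with credentials in the token68 form.

```python
from http.cookiejar import split_header_words


def regroup_challenges(groups):
    """Merge groups from split_header_words() into (scheme, params) pairs.

    Heuristic (an assumption, not library behaviour): a group whose first
    item is a bare token starts a new challenge; any other group carries
    further parameters for the current challenge.
    """
    challenges = []
    for group in groups:
        (name, value), rest = group[0], group[1:]
        if value is None:
            # Bare token such as 'Newauth' or 'Basic': treat it as a scheme.
            challenges.append((name, dict(rest)))
        elif challenges:
            # Parameters split off by a comma belong to the previous scheme.
            challenges[-1][1].update(group)
        # Parameters seen before any scheme are dropped in this sketch.
    return challenges


auth = 'Newauth realm="apps", type=1, title="Login to \\"apps\\"", Basic realm="simple"'
challenges = regroup_challenges(split_header_words((auth,)))
for scheme, params in challenges:
    print(scheme, params)
```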

Despite that, the function is still very useful for parsing many kinds of 
header fields that use semicolons. All the alternatives in the standard library 
that I know of have disadvantages:

* cgi.parse_header() does not split comma-separated values apart, and ignores 
any attribute without an equals sign, such as “Discard” and “unicast” above

* email.message.Message.get_params() and get_param() do not split 
comma-separated values either, and parsing header values other than the first 
one in a Message object is awkward

* email.headerregistry.ParameterizedMIMEHeader looks relevant, but I couldn’t 
figure out how to use it
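
To illustrate the comma-splitting difference, here is a small comparison 
sketch using the Transport value from the RTSP example. The Message object is 
only a convenient carrier for the header field. I would expect get_params() to 
return one flat list in which the second transport specification stays fused 
into the "mode" value, whereas split_header_words() returns one list per 
comma-separated value.

```python
from email.message import Message
from http.cookiejar import split_header_words

transport = 'RTP/AVP;unicast;mode="PLAY, RECORD", RTP/AVP/TCP;interleaved=0-1'
msg = Message()
msg["Transport"] = transport

# get_params() defaults to Content-Type, so the header name must be given.
flat = msg.get_params(header="Transport")

# split_header_words() takes a sequence of field values and returns one
# attribute list per comma-separated value.
nested = split_header_words(msg.get_all("Transport", failobj=()))

print(flat)
print(nested)
```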

----------
components: Library (Lib)
messages: 236397
nosy: vadmium
priority: normal
severity: normal
status: open
title: Expose http.cookiejar.split_header_words()
type: enhancement

_______________________________________
Python tracker <rep...@bugs.python.org>
<http://bugs.python.org/issue23498>
_______________________________________