Hello Adam, Ian,
Today I came across your draft "draft-ietf-websec-mime-sniff-03" and
noticed the following point:
2. If the octets were fetched via HTTP and there is an HTTP Content-
Type header field and the value of the last such header field has
octets that *exactly* match the octets contained in one of the
following lines:
+-------------------------------+--------------------------------+
| Bytes in Hexadecimal | Textual Representation |
+-------------------------------+--------------------------------+
| 74 65 78 74 2f 70 6c 61 69 6e | text/plain |
+-------------------------------+--------------------------------+
| 74 65 78 74 2f 70 6c 61 69 6e | text/plain; charset=ISO-8859-1 |
| 3b 20 63 68 61 72 73 65 74 3d | |
| 49 53 4f 2d 38 38 35 39 2d 31 | |
.../...
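To make the rule above concrete, here is a minimal sketch of exact octet matching on the last Content-Type value. Only the two table rows quoted above are included, because the rest of the draft's table was elided. The names KNOWN_EXACT_VALUES and last_value_matches_exactly are hypothetical.

```python
from __future__ import annotations

# Hypothetical sketch of the draft's exact-octet rule. Only the two rows
# quoted above are listed; the rest of the draft's table was elided.
KNOWN_EXACT_VALUES = {
    # "text/plain"
    bytes.fromhex("74 65 78 74 2f 70 6c 61 69 6e"),
    # "text/plain; charset=ISO-8859-1"
    bytes.fromhex(
        "74 65 78 74 2f 70 6c 61 69 6e "
        "3b 20 63 68 61 72 73 65 74 3d "
        "49 53 4f 2d 38 38 35 39 2d 31"
    ),
}


def last_value_matches_exactly(content_type_values: list[bytes]) -> bool:
    """Return True if the LAST Content-Type value is byte-identical to one
    of the known values. No whitespace or quoting differences are tolerated."""
    if not content_type_values:
        return False
    return content_type_values[-1] in KNOWN_EXACT_VALUES
```

With this rule, b"text/plain;charset=ISO-8859-1" (no space after the semicolon) does not match, even though it names the same media type.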
I was unsure whether spaces are optional around the semicolon, so I
checked, and indeed OWS (optional whitespace) is allowed before and after it:
http://www.ietf.org/id/draft-ietf-httpbis-p3-payload-18.txt
2.3. Media Types
HTTP uses Internet Media Types [RFC2046] in the Content-Type
(Section 6.8) and Accept (Section 6.1) header fields in order to
provide open and extensible data typing and type negotiation.
media-type = type "/" subtype *( OWS ";" OWS parameter )
type = token
subtype = token
The type/subtype MAY be followed by parameters in the form of
attribute/value pairs.
parameter = attribute "=" value
attribute = token
value = word
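As a rough illustration, the grammar above can be turned into a small parser. It assumes several definitions that are not quoted here: the tchar set used for token, a simplified quoted-string, and OWS limited to SP/HTAB (folded header lines are assumed to be unfolded already). The function name parse_media_type is hypothetical.

```python
from __future__ import annotations

import re

# Assumed definitions (not quoted above): tchar set for token, a simplified
# quoted-string, and OWS limited to SP/HTAB (folded lines already unfolded).
TOKEN = r"[!#$%&'*+\-.^_`|~0-9A-Za-z]+"
QUOTED_STRING = r'"(?:[^"\\]|\\.)*"'
OWS = r"[ \t]*"

# OWS ";" OWS parameter, where parameter = attribute "=" value
PARAMETER_RE = re.compile(rf"{OWS};{OWS}({TOKEN})=({TOKEN}|{QUOTED_STRING})")
# media-type = type "/" subtype *( OWS ";" OWS parameter )
MEDIA_TYPE_RE = re.compile(rf"({TOKEN})/({TOKEN})((?:{PARAMETER_RE.pattern})*)")


def parse_media_type(value: str) -> tuple[str, str, list[tuple[str, str]]] | None:
    """Split a Content-Type value into (type, subtype, parameters).

    Parameter values are returned raw, i.e. still quoted if they were sent
    as a quoted-string. Returns None if the value does not fit the grammar."""
    match = MEDIA_TYPE_RE.fullmatch(value)
    if match is None:
        return None
    params = [(m.group(1), m.group(2)) for m in PARAMETER_RE.finditer(match.group(3))]
    return match.group(1), match.group(2), params
```

For example, "text/plain ; charset=ISO-8859-1" parses to ("text", "plain", [("charset", "ISO-8859-1")]), while "text/plain;" is rejected because a ";" must be followed by a parameter.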
It also states that quotes are allowed around the parameter
value:
A parameter value that matches the token production can be
transmitted as either a token or within a quoted-string. The quoted
and unquoted values are equivalent.
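The equivalence of quoted and unquoted values can be sketched as a normalization step. This assumes a well-formed input (a token, or a complete quoted-string with backslash quoted-pairs). The function name unquote_parameter_value is hypothetical.

```python
import re


def unquote_parameter_value(value: str) -> str:
    """Return a parameter value with quoted-string syntax removed.

    A token is returned unchanged. A quoted-string loses its surrounding
    DQUOTEs, and each quoted-pair (backslash + char) becomes that char."""
    if len(value) >= 2 and value.startswith('"') and value.endswith('"'):
        return re.sub(r"\\(.)", r"\1", value[1:-1])
    return value
```

After this step, '"ISO-8859-1"' and 'ISO-8859-1' compare equal, which is what the text above requires.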
So the following examples are all perfectly valid:
Content-type: text/plain;charset="ISO-8859-1"
Content-type: text/plain ; charset=ISO-8859-1
Content-type: text/plain ;
        charset="ISO-8859-1"
Thus byte matching can only be applied to the individual tokens and
values, not to the header value as a whole. I think the safest approach
would be to refer to the HTTP spec for the header format, then specify
byte matches for each field, for instance:
If the octets were fetched via HTTP and there is an HTTP Content-
Type header field and the value of the last such header field
*exactly* matches one of the media-types below, then the sniffed-type
is defined as the concatenation of the unquoted matching parts:
media-type = type "/" subtype *( OWS ";" OWS parameter )
sniffed-type = type "/" subtype *( "; " attribute "=" value )
All accepted media-types must *exactly* match:
- type = "text" (hex 74 65 78 74)
- subtype = "plain" (hex 70 6c 61 69 6e)
If a parameter is present, its attribute must be "charset"
(hex 63 68 61 72 73 65 74) and the value must be one of:
- "ISO-8859-1" (hex 49 53 4f 2d 38 38 35 39 2d 31)
- "iso-8859-1" (hex 69 73 6f 2d 38 38 35 39 2d 31)
- "UTF-8" (hex 55 54 46 2d 38)
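The proposed rule can be sketched end to end: parse the header value per the HTTP grammar, check each field against the exact values above, then build the sniffed-type from the unquoted parts. The grammar pieces (tchar set, simplified quoted-string, SP/HTAB-only OWS) are assumptions. The names sniffed_type and ACCEPTED_CHARSETS are hypothetical.

```python
from __future__ import annotations

import re

# Assumed grammar pieces: tchar set, simplified quoted-string, and OWS
# limited to SP/HTAB (folded header lines must be unfolded beforehand).
TOKEN = r"[!#$%&'*+\-.^_`|~0-9A-Za-z]+"
QUOTED_STRING = r'"(?:[^"\\]|\\.)*"'
OWS = r"[ \t]*"
PARAMETER_RE = re.compile(rf"{OWS};{OWS}({TOKEN})=({TOKEN}|{QUOTED_STRING})")
MEDIA_TYPE_RE = re.compile(rf"({TOKEN})/({TOKEN})((?:{PARAMETER_RE.pattern})*)")

# Charset values from the proposal, compared byte for byte.
ACCEPTED_CHARSETS = {"ISO-8859-1", "iso-8859-1", "UTF-8"}


def _unquote(value: str) -> str:
    # A quoted-string becomes its bare content; a token is returned as-is.
    if value.startswith('"'):
        return re.sub(r"\\(.)", r"\1", value[1:-1])
    return value


def sniffed_type(content_type: str) -> str | None:
    """Return the proposed sniffed-type for the value of the last
    Content-Type header field, or None if it is not an accepted form."""
    match = MEDIA_TYPE_RE.fullmatch(content_type)
    if match is None:
        return None
    type_, subtype = match.group(1), match.group(2)
    if type_ != "text" or subtype != "plain":
        return None
    parts = [f"{type_}/{subtype}"]
    for param in PARAMETER_RE.finditer(match.group(3)):
        attribute, value = param.group(1), _unquote(param.group(2))
        if attribute != "charset" or value not in ACCEPTED_CHARSETS:
            return None
        parts.append(f"{attribute}={value}")
    # sniffed-type = type "/" subtype *( "; " attribute "=" value )
    return "; ".join(parts)
```

All three header examples earlier normalize to "text/plain; charset=ISO-8859-1", provided the folded one is first unfolded (the line break and leading whitespace replaced by a single SP). Because comparisons are exact, "charset=utf-8" is rejected, which leads directly to the case-sensitivity question below.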
Please also note that HTTP states that some attributes take
case-insensitive values. I have not yet found in the spec whether
"charset" values are case-insensitive, but given that you listed two
different casings of "iso-8859-1", it is likely that "charset" falls into
this category.
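If charset values do turn out to be case-insensitive, the two-casing list could be replaced by a single case-insensitive comparison. The sketch below assumes that, lowercases ASCII letters only, and uses the hypothetical names ACCEPTED_CHARSETS and charset_accepted.

```python
import string

# Hypothetical: charset names from the proposal, stored once in lowercase,
# ASSUMING charset values compare case-insensitively.
ACCEPTED_CHARSETS = {"iso-8859-1", "utf-8"}

# Maps only A-Z to a-z. Other characters are left untouched, unlike
# str.lower(), which also folds non-ASCII letters.
_ASCII_LOWER = str.maketrans(string.ascii_uppercase, string.ascii_lowercase)


def charset_accepted(value: str) -> bool:
    """Case-insensitive (ASCII-only) membership test for a charset value."""
    return value.translate(_ASCII_LOWER) in ACCEPTED_CHARSETS
```

Note that this also accepts "Iso-8859-1" and "utf-8", which the explicit list above does not.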
Best regards,
Willy
_______________________________________________
websec mailing list
[email protected]
https://www.ietf.org/mailman/listinfo/websec