The requirement in the spec is what we intend.  The rule applies only
to that exact octet sequence.
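
To make "exact" concrete, here is a minimal sketch (not text from the draft).
The names EXACT_VALUES and matches_exactly are made up for illustration, and
only the two table rows visible in the quoted excerpt below are listed; the
rest of the draft's table is elided there.

```python
# Illustrative only: just the rows visible in the quoted excerpt.
# The draft's full table is elided in the quote (".../...").
EXACT_VALUES = {
    b"text/plain",
    b"text/plain; charset=ISO-8859-1",
}


def matches_exactly(content_type_value: bytes) -> bool:
    """Return True only if the header value is byte-for-byte a listed sequence."""
    return content_type_value in EXACT_VALUES
```

Under this reading, a value such as text/plain;charset="ISO-8859-1" is valid
HTTP but does not match, so the rule simply does not apply to it.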

Adam


On Sun, Jan 15, 2012 at 11:51 AM, Willy Tarreau wrote:
> Hello Adam, Ian,
>
> Today I came across your draft "draft-ietf-websec-mime-sniff-03" and
> noticed the point below:
>
>   2.  If the octets were fetched via HTTP and there is an HTTP Content-
>       Type header field and the value of the last such header field has
>       octets that *exactly* match the octets contained in one of the
>       following lines:
>
>      +-------------------------------+--------------------------------+
>      | Bytes in Hexadecimal          | Textual Representation         |
>      +-------------------------------+--------------------------------+
>      | 74 65 78 74 2f 70 6c 61 69 6e | text/plain                     |
>      +-------------------------------+--------------------------------+
>      | 74 65 78 74 2f 70 6c 61 69 6e | text/plain; charset=ISO-8859-1 |
>      | 3b 20 63 68 61 72 73 65 74 3d |                                |
>      | 49 53 4f 2d 38 38 35 39 2d 31 |                                |
>     .../...
>
> I was unsure whether spaces are optional around the semicolon, so I
> checked, and indeed OWS is allowed before and after it:
>
>   http://www.ietf.org/id/draft-ietf-httpbis-p3-payload-18.txt
>
>   2.3.  Media Types
>
>   HTTP uses Internet Media Types [RFC2046] in the Content-Type
>   (Section 6.8) and Accept (Section 6.1) header fields in order to
>   provide open and extensible data typing and type negotiation.
>
>     media-type = type "/" subtype *( OWS ";" OWS parameter )
>     type       = token
>     subtype    = token
>
>   The type/subtype MAY be followed by parameters in the form of
>   attribute/value pairs.
>
>     parameter      = attribute "=" value
>     attribute      = token
>     value          = word
>
> The same section also says that quotes are allowed around the parameter
> value:
>
>   A parameter value that matches the token production can be
>   transmitted as either a token or within a quoted-string.  The quoted
>   and unquoted values are equivalent.
>
> So the examples below are completely valid:
>
>   Content-type: text/plain;charset="ISO-8859-1"
>
>   Content-type: text/plain   ;  charset=ISO-8859-1
>
>   Content-type: text/plain ;
>         charset="ISO-8859-1"
>
> Thus the byte matching can only apply to the tokens and values. I think the
> safest thing to do would be to refer to the HTTP spec to define the header
> format, then suggest byte matches for each field, for instance:
>
>       If the octets were fetched via HTTP and there is an HTTP Content-
>       Type header field and the value of the last such header *exactly*
>       matches one of the media-types below, then the sniffed-type is
>       defined as the concatenation of the unquoted matching parts :
>
>       media-type = type "/" subtype *( OWS ";" OWS parameter )
>       sniffed-type = type "/" subtype *( "; " attribute "=" value )
>
>       All accepted media-types must *exactly* match :
>          - type    = "text" (hex 74 65 78 74)
>          - subtype = "plain" (hex 70 6c 61 69 6e)
>
>       If a parameter is present, its attribute must be "charset"
>       (hex 63 68 61 72 73 65 74) and the value must be one of :
>          - "ISO-8859-1" (hex 49 53 4f 2d 38 38 35 39 2d 31)
>          - "iso-8859-1" (hex 69 73 6f 2d 38 38 35 39 2d 31)
>          - "UTF-8"      (hex 55 54 46 2d 38)
>
> Please also note that HTTP indicates that some attributes accept a
> case-insensitive value. I have not yet found in the spec whether "charset"
> accepts a case-insensitive value, but given that you identified two
> possible cases for "iso-8859-1", it is likely that "charset" falls into
> this category.
>
> Best regards,
> Willy
>
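
For comparison, the grammar-aware matching proposed in the quoted message
could be sketched roughly as below. This is not what the draft specifies.
The names sniffed_type and ALLOWED_CHARSETS and the regular expression are
hypothetical, the token syntax is only approximated, at most one parameter is
handled, and any header line folding is assumed to have been unfolded already.

```python
import re
from typing import Optional

# Charset values listed in the quoted proposal; compared case-sensitively.
ALLOWED_CHARSETS = {"ISO-8859-1", "iso-8859-1", "UTF-8"}

# media-type = type "/" subtype *( OWS ";" OWS parameter ), restricted here
# to zero or one parameter whose value is a token or a simple quoted-string.
# OWS is taken as spaces and tabs; the token character set is approximate.
_MEDIA_TYPE = re.compile(
    r'(?P<type>[^\s/;="]+)/(?P<subtype>[^\s/;="]+)'
    r'(?:[ \t]*;[ \t]*(?P<attr>[^\s/;="]+)='
    r'(?:"(?P<quoted>[^"\\]*)"|(?P<token>[^\s;="]+)))?'
)


def sniffed_type(value: str) -> Optional[str]:
    """Return the normalized type if the value is an accepted media-type, else None."""
    m = _MEDIA_TYPE.fullmatch(value)
    if m is None:
        return None
    if m["type"] != "text" or m["subtype"] != "plain":
        return None
    if m["attr"] is None:
        return "text/plain"
    # Quoted and unquoted parameter values are treated as equivalent.
    charset = m["quoted"] if m["quoted"] is not None else m["token"]
    if m["attr"] != "charset" or charset not in ALLOWED_CHARSETS:
        return None
    return f"text/plain; charset={charset}"
```

With this sketch, text/plain;charset="ISO-8859-1" and
text/plain   ;  charset=ISO-8859-1 both normalize to
text/plain; charset=ISO-8859-1, whereas exact octet matching rejects both.
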
_______________________________________________
websec mailing list
[email protected]
https://www.ietf.org/mailman/listinfo/websec
