> Otherwise, if the octets in s starting at pos match any of the sequences of
> octets in the first column of the following table, then the user agent MUST
> follow the steps given in the corresponding cell in the second column of the
> same row. |
What's the stray `|` character at the end of that doing?
The ToC feels double-spaced; is that normal?
Would you mind quoting your attributes in source? Things like
class=no-num or href=#web-data scare me. It's easier if you just quote
all attributes :)
Also, I generally recommend `<span ...>x</span> ` over `<span ...>x </span>`,
i.e. keep the trailing space outside of the span (see the ToC).
> <p>Many web servers supply incorrect Content-Type header fields with their
> HTTP
Can you mark up `Content-Type` in something which results in roughly
"typewriter" font?
s/user agents/User Agents/ as in:
> responses. In order to be compatible with these servers, user agents consider
> Without a clear specification of how to "sniff" the media type, each user
> agent implementor was forced to reverse engineer the behavior of the other
> user agents and to develop
s/the other/other/ -- there are some UAs who were ignored when the
sniffing of a given UA was developed :)
> their own algorithm
I'm not sure if `algorithm` here belongs in singular or plural, I got
distracted :)
> an HTTP response to be interpreted as one media type but some user agents
> interpret the responses as another media type.
s/responses/response/ (agreement with first part)
> However, if a user agent does interpret a low-privilege media type, such as
> image/gif, as a high-privilege media type, such as text/html, the user agent
> has created a privilege escalation vulnerability in the server.
s/, the user agent/, then the user agent/
I believe abarth has addressed the above.
> This document describes a content sniffing algorithm that carefully balances
> the compatibility needs of user agent implementors with the security
> constraints.
`the security constraints` is problematic: I don't think `the`
references anything, so either drop `the` or provide a reference :/
> and metrics collected from implementations deployed to a sizable number of
> users .
s/ ././
> (such as "strip any leading space characters" or "return false and abort
> these steps") are to be interpreted with the meaning of the key word ("MUST",
> "SHOULD", "MAY", etc)
s/etc/etc./g
"official-type" should probably be given some styling -- preferably
not the same styling as "Content-Type"
> (Such messages are invalid according to RFC2616.
s/./.)/
The RFCs should be href references of some sort :)
> For octets received via HTTP, the Content-Type HTTP header field, if present,
> indicates the media type. Let the official-type be the media type indicted by
> the HTTP Content-Type header field, if present. If the Content-Type header
> field is absent or if its value cannot be interpreted as a media type (e.g.
> because its value doesn't contain a U+002F SOLIDUS ('/') character), then
> there is no official-type. (Such messages are invalid according to RFC2616.
> If an HTTP response contains multiple Content-Type header fields, the User
> Agent MUST use the textually last Content-Type header field to the
> official-type. For example, if the last Content-Type header field contains
> the value "foo", then there is no official media type because "foo" cannot be
> interpreted as a media type (even if the HTTP response contains another
> Content-Type header field that could be interpreted as a media type).
The "for example" part here applies to the previous paragraph; the
sentence needs to be moved to the paragraph before the instruction for
multiple header fields.
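To check that I'm reading the Content-Type rules the way you intend, here's a rough Python sketch of how I understand official-type gets determined. `official_type` and the (name, value) header-list input are made up for illustration, and stripping parameters at the first `;` is my assumption, since that part of the draft isn't quoted here.

```python
from typing import List, Optional, Tuple

def official_type(headers: List[Tuple[str, str]]) -> Optional[str]:
    """Return the official-type for an HTTP response, or None if there is none.

    `headers` is assumed to be the response's header fields, in the order
    they were received, as (name, value) pairs.
    """
    # Only the textually last Content-Type header field counts.
    last_value = None
    for name, value in headers:
        if name.lower() == "content-type":
            last_value = value
    if last_value is None:
        return None  # header absent: no official-type
    # Assumption: parameters such as "; charset=utf-8" are dropped.
    media_type = last_value.split(";", 1)[0].strip()
    if "/" not in media_type:
        return None  # e.g. "foo": cannot be interpreted as a media type
    return media_type
```

With `[("Content-Type", "text/html"), ("Content-Type", "foo")]` it returns `None`, which is the "foo" example: the earlier, valid field doesn't rescue the response.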
> FTP RFC0959
Is there a reason for the leading 0?
> Comparisons between media types, as defined by MIME specifications, are done
> in an ASCII case-insensitive manner. [RFC2046]
You need to somehow note that this is merely a note about mime
equivalence and doesn't relate to how the spec works.
> If the official-type ends in "+xml", or if it is either "text/xml" or
> "application/xml", then let the sniffed-type be the official-type and abort
> these steps.
Please mark up `sniffed-type` and `official-type`
> If the official-type is an image type supported by the User Agent (e.g.,
> "image/png", "image/gif", "image/jpeg", etc), then jump to the "images"
> section below.
s/etc//
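Assuming the case-insensitivity note is meant to apply to these comparisons (which is exactly what I'd like the draft to say one way or the other), this is how I read the two early checks on official-type, as an illustrative Python sketch rather than anything normative. The function name, the `supported_images` argument and the `"IMAGES"` return marker (standing in for "jump to the images section") are all hypothetical.

```python
from typing import Iterable, Optional

# ASCII-only lowercasing, so non-ASCII characters are left untouched.
_ASCII_LOWER = str.maketrans("ABCDEFGHIJKLMNOPQRSTUVWXYZ",
                             "abcdefghijklmnopqrstuvwxyz")

def ascii_lower(s: str) -> str:
    return s.translate(_ASCII_LOWER)

def sniff_from_official_type(official: str,
                             supported_images: Iterable[str]) -> Optional[str]:
    """Return the sniffed-type if the steps abort here, "IMAGES" if the
    algorithm should jump to the images section, or None to fall through."""
    t = ascii_lower(official)
    if t.endswith("+xml") or t in ("text/xml", "application/xml"):
        return official  # sniffed-type is the official-type
    if t in {ascii_lower(img) for img in supported_images}:
        return "IMAGES"
    return None
```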
> If none of the first n octets are binary data octets then let the
> sniffed-type be "text/plain" and abort these steps.
> Binary Data Byte Ranges
You don't actually define a `binary data octet` as any item within the
ranges defined in the `binary data byte ranges`.
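Here's the link I'd expect the draft to spell out, as a small Python sketch. The ranges in `BINARY_DATA_RANGES` are placeholders I picked (control octets other than TAB, LF, FF, CR and ESC); the real values should come from the "Binary Data Byte Ranges" table, and the function names are mine.

```python
# Placeholder ranges (assumption), inclusive on both ends.
BINARY_DATA_RANGES = [(0x00, 0x08), (0x0B, 0x0B), (0x0E, 0x1A), (0x1C, 0x1F)]

def is_binary_data_octet(b: int) -> bool:
    """An octet is binary data if it falls in any of the ranges."""
    return any(lo <= b <= hi for lo, hi in BINARY_DATA_RANGES)

def looks_like_text(data: bytes, n: int) -> bool:
    """True if none of the first n octets are binary data octets,
    i.e. the case where sniffed-type becomes "text/plain"."""
    return not any(is_binary_data_octet(b) for b in data[:n])
```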
> If the first octets match one of the octet sequences in the "pattern" column
> of the table in the "unknown type" section below, ignoring any rows whose
> cell in the "security" column says "scriptable" (or "n/a"), then let the
> sniffed-type be the type given in the corresponding cell in the "sniffed
> type" column on that row and abort these steps.
If you could make `"unknown type" section` a link to the section, that
would be helpful.
> For each row in the table below:
> If the row has no "WS" octets:
I know that "WS" appears in the table below, but it hasn't been
defined yet, and I don't want to guess what it means (whitespace?) --
I guessed wrong for the other one.
> If the row has a "WS" octet or a "_>" octet:
> "WS" means "whitespace", and allows insignificant whitespace to be skipped
> when sniffing for a type signature.
Oh, so that's where you hid the definition -- way too late :)
> "_>" means "space-or-bracket", and allows HTML tag names to terminate with
> either a space or a greater than sign.
Oh, _ doesn't mean underscore.
Please put those definitions before their use, not way below their use :(
> If the octets of the masked-data matches the given pattern octets exactly,
> then let the sniffed-type be the type given in the cell of the third column
> in that row and abort these steps.
s/matches/match/
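For rows without "WS" or "_>" octets, combined with the "ignore scriptable (or n/a) rows" rule from the unknown-type step, this is roughly what I understand the matching to be. The `Row` shape, the lowercase comparison of the security column, and the two example rows are invented for illustration; the real rows live in the table.

```python
from typing import List, NamedTuple, Optional

class Row(NamedTuple):
    mask: bytes          # one mask octet per pattern octet
    pattern: bytes
    sniffed_type: str
    security: str        # e.g. "Safe", "Scriptable", "n/a"

# Hypothetical example rows, not copied from the draft's table.
EXAMPLE_ROWS = [
    Row(b"\xff\xdf\xdf\xdf\xdf", b"<HTML", "text/html", "Scriptable"),
    Row(b"\xff" * 6, b"GIF89a", "image/gif", "Safe"),
]

def masked_match(data: bytes, mask: bytes, pattern: bytes) -> bool:
    """AND each leading data octet with its mask octet; compare to pattern."""
    if len(data) < len(pattern):
        return False
    masked = bytes(d & m for d, m in zip(data, mask))
    return masked == pattern

def sniff_non_scriptable(data: bytes, rows: List[Row]) -> Optional[str]:
    """First matching row whose security cell isn't scriptable or n/a."""
    for row in rows:
        if row.security.lower() in ("scriptable", "n/a"):
            continue  # ignored at this step
        if masked_match(data, row.mask, row.pattern):
            return row.sniffed_type
    return None
```

With these example rows, `<html><body>` yields nothing at this step (the HTML row is scriptable and skipped), while `GIF89a...` yields `image/gif`.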
> LOOP: If index-stream points beyond the end of the octet stream, then this
> row doesn't match and skip this row.
Please style `LOOP`
> If the index-pattern-th octet of the pattern is a normal hexadecimal octet
> and not a "WS" octet or a "_>" octet:
s/or a/nor a/
s/not/neither/
> If the index-stream-th octet of the stream is one of 0x09 (ASCII TAB), 0x0A
> (ASCII LF), 0x0C (ASCII FF), 0x0D (ASCII CR), or 0x20 (ASCII space), then
> increment only the index-stream to the next octet in the octet stream.
If you could style the 0xXX items in something <tt>-ish, that'd be appreciated.
... And if you could style the names (ASCII TAB, etc.) in something,
that'd also be appreciated.
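Since the WS and _> definitions only showed up after I'd been through the steps, here's how I ended up reading the whole per-row loop, as a Python sketch. The pattern representation (ints plus the strings "WS" and "_>") and the name `match_row` are mine. I'm also assuming the mask has one entry per pattern token, including WS and _> positions (which is how the <?xml row reads), and that a row matches once the pattern is used up; that step isn't quoted.

```python
from typing import List, Union

Token = Union[int, str]  # a pattern octet, or the markers "WS" / "_>"
WHITESPACE = {0x09, 0x0A, 0x0C, 0x0D, 0x20}  # TAB, LF, FF, CR, space

def match_row(stream: bytes, mask: bytes, pattern: List[Token]) -> bool:
    index_pattern = 0
    index_stream = 0
    while index_pattern < len(pattern):  # assumed: pattern used up -> match
        # LOOP: past the end of the octet stream -> this row doesn't match.
        if index_stream >= len(stream):
            return False
        token = pattern[index_pattern]
        octet = stream[index_stream]
        if token == "WS":
            if octet in WHITESPACE:
                index_stream += 1   # skip one whitespace octet, stay on WS
            else:
                index_pattern += 1  # whitespace run over, next pattern token
        elif token == "_>":
            if octet not in (0x20, 0x3E):  # space or '>'
                return False
            index_pattern += 1
            index_stream += 1
        else:
            if (octet & mask[index_pattern]) != token:
                return False
            index_pattern += 1
            index_stream += 1
    return True
```

With the <?xml row's mask and pattern, this accepts leading whitespace before `<?xml` but rejects `<?XML`, which is the case sensitivity your note on that row describes.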
> If the first n octets match the signature for MP4 (as define in ), then let
> the sniffed-type be video/mp4 and abort these steps.
s/define/defined/
-- The markup you're using failed to generate a visible reference;
could you get the tool to generate an XXX when it fails? :)
> FF FF FF FF FF FF WS 3C 3f 78 6d 6c text/xml Scriptable <?xml (Note the case
> sensitivity and lack of trailing _>)
s/sensitivity/sensitivity [mask = FF instead of DF]/
> A JPEG SOI marker followed by a octet of another marker.
s/a octet/an octet/
-- the table doesn't currently handle .SWF; in the past, that has been a problem
http://www.digitalpreservation.gov/formats/fdd/fdd000130.shtml
> If n is less than 4, then the sequence does not match the signature for MP4
> and abort these steps.
`and` doesn't work; s/ and/;/ ?
In all previous cases, the form was `let foo and abort these steps`;
here it's `then <statement of truth> and`.
The fix is probably to move to "return a TRUE/FALSE value and abort
these steps" (or "let state-determined-truth-value be a TRUE/FALSE value
and ...").
> For each I from 2 to box-size/4 - 1 (inclusive):
If you could put `box-size/4 - 1` into some markup to indicate that
it's a math section, that'd be helpful.
> If octets 4*i through 4*i + 2 (inclusive) of the sequence are 0x6D 0x70 0x34
> (the ASCII string "mp4"), then the sequence does match the signature for MP4
> and abort these steps.
And here for `4*i` and `4*i + 2`.
I think you need s/If octets/If any octets/; otherwise it's ambiguous
between `any` and `all`.
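Rewritten in the "return a TRUE/FALSE value" style I suggested above, the quoted MP4 steps would look something like this Python sketch. Reading box-size as the first four octets in big-endian order, and bailing out when box-size exceeds n or isn't a multiple of 4, are my assumptions; those steps aren't quoted here, and the function name is mine.

```python
def matches_mp4_signature(seq: bytes) -> bool:
    n = len(seq)
    if n < 4:
        return False  # too short to hold a box size
    # Assumption: box-size is the first 4 octets as a big-endian integer.
    box_size = int.from_bytes(seq[0:4], "big")
    # Assumption: don't read past the available octets or the box.
    if box_size > n or box_size % 4 != 0:
        return False
    # i runs from 2 to box-size/4 - 1 inclusive.
    for i in range(2, box_size // 4):
        # Octets 4*i through 4*i + 2 (inclusive) spell "mp4".
        if seq[4 * i:4 * i + 3] == b"mp4":
            return True
    return False
```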
> 7 Images
...
> Otherwise, let the sniffed-type be the official-type and abort these
> steps.
I'd rather otherwise be step 3 instead of part of the bulleted list
inside step 2
> If the octets with positions pos to pos+2 in s are exactly equal to 0x2D,
> 0x2D, 0x3E respectively (ASCII for "-->"), then increase pos by 3 and jump
> back to the previous step (the step labeled loop start) in the overall
> algorithm in this section.
`loop start` should be a link to the LOOP label and preferably have
the same case as the LOOP label.
> Return to step 2 in these substeps.
It'd be nice if this was a link to an anchor in the right part of the steps.
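For the "-->" step and the "return to step 2" step, this is the shape of loop I think is meant, as a Python sketch. `skip_comment` is my name, and advancing by one octet when there's no match, plus giving up when the octets run out, are assumptions since those substeps aren't quoted.

```python
from typing import Optional

def skip_comment(s: bytes, pos: int) -> Optional[int]:
    """Given pos just after "<!--", return the position just past the
    closing "-->", or None if the comment isn't closed within s."""
    while pos + 3 <= len(s):
        if s[pos:pos + 3] == b"-->":
            return pos + 3  # caller then jumps back to LOOP START
        pos += 1            # no match: advance and retry (step 2)
    return None
```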
> If RDF-flag is 1 and RSS-flag is 1, then let the sniffed-type be
> "application/rss+xml" and abort these steps.
s/and/or/ ??