Ian Hickson wrote:
On Wed, 14 Oct 2009, Amos Jeffries wrote:
4.1.12 prescribes semi-implicitly that HTTP/1.0 and HTTP/1.2 etc. are not compatible. Maybe that's what you want. A *very* minor enhancement would be to make that explicitly stated.

I've added a note to this effect.


4.1.13 still has a fragility issue in that it assumes the Upgrade: and
Connection: headers will retain both their specific sending order and be
the very first headers in the reply. It will work in most situations, but
proxies which 'correct' the header order to have Date: first will kill
WebSockets.

That's intentional; such proxies don't know about Web Sockets (if they did, they wouldn't be modifying the headers!) and thus clearly can't really be trusted to route the traffic unmodified.

At this point of the handshake the client is the only software which knows it's using WebSockets. The server may validate-parse the headers' MIME syntax before sub-parsing the request line. At this point all it's seen is the GET and HTTP/1.1.

So... the server and any middleware will be in a state right now thinking that HTTP/1.1 is in use and will do appropriate HTTP/1.1 header alterations.

It is not until the server reply accepting the Upgrade: request is received by middleware that WebSockets protocol actions can start happening.
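
A minimal sketch of the fragility under discussion, assuming a byte-exact prefix check on the server reply. The handshake bytes, the helper names, and the Date: value are illustrative assumptions, not text from the draft:

```python
# Assumed handshake prefix, for illustration only; take the exact bytes from the draft.
EXPECTED_PREFIX = (
    b"HTTP/1.1 101 Web Socket Protocol Handshake\r\n"
    b"Upgrade: WebSocket\r\n"
    b"Connection: Upgrade\r\n"
)


def strict_prefix_ok(reply: bytes) -> bool:
    """Byte-exact check: status line, Upgrade and Connection must come first, in order."""
    return reply.startswith(EXPECTED_PREFIX)


def reorder_date_first(reply: bytes) -> bytes:
    """Simulate a proxy that moves Date: to the top of the header block."""
    head, sep, body = reply.partition(b"\r\n\r\n")
    status, *headers = head.split(b"\r\n")
    date = [h for h in headers if h.lower().startswith(b"date:")]
    rest = [h for h in headers if not h.lower().startswith(b"date:")]
    return b"\r\n".join([status, *date, *rest]) + sep + body


def header_set(reply: bytes) -> set:
    """Order-insensitive view of the header lines, as an HTTP/1.1 parser would see them."""
    head = reply.partition(b"\r\n\r\n")[0]
    return set(head.split(b"\r\n")[1:])


original = EXPECTED_PREFIX + b"Date: Wed, 14 Oct 2009 00:00:00 GMT\r\n\r\n"
proxied = reorder_date_first(original)

print(strict_prefix_ok(original))
print(strict_prefix_ok(proxied))
print(header_set(original) == header_set(proxied))
```

The strict check passes for the original reply and fails for the proxied one, although both carry the same set of headers as far as HTTP/1.1 is concerned.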


4.1.14 thru 4.1.23 appear to be a very conflated description of parsing the headers.

It seems to me that referencing rfc2616 section 4.2 should be sufficient for the parse.

Unfortunately, HTTP doesn't define how to parse headers. It defines the semantics of valid headers, but doesn't say, e.g., what headers are present in the following:

   HTTP/1.1 200 OK
   : Bar
   Foo
   ::::Quux::::

Section 4.2 is clear:
"Each header field consists of a name followed by a colon (":") and the field value. Field names are case-insensitive."

NP: WebSockets as of draft-49 requires (1.2) "The first three lines in each case are hard-coded (the exact case and order matters)" which is a breach of the final statement above. That final statement permits middleware to uppercase or CamelCase the headers on a whim without altering their meaning.

References RFC822 section 3.1 for the BNF. Which states:
 " B.1.  SYNTAX

     message         =   *field *(CRLF *text)

     field           =    field-name ":" [field-body] CRLF

     field-name      =  1*<any CHAR, excluding CTLs, SPACE, and ":">

     field-body      =   *text [CRLF LWSP-char field-body]
"
...
"
  C.1.1.  FIELD NAMES

        These now must be a sequence of  printable  characters.   They
        may not contain any LWSP-chars.
"

... which requires header names to be at least one ASCII byte long and to exclude ':', whitespace, and non-printables.

NP: WebSockets draft-49 changes the bytes to UNICODE format and permits non-printables which are not LF or CR.


Your demo above is invalid HTTP/1.1:
 * first header line has no token in the field-name portion,
 * second line has CRLF in the name portion,
 * third line has zero-byte name portion.

Any one of which will be either dropped by existing middleware or handled as HTTP/0.9 with body content:
  : Bar<CRLF>
  Foo<CRLF>
  ::::Quux::::<CRLF>

The first handling method is good; the second may be a major headache.

Since you have spec'd that only valid HTTP/1.1 is acceptable, this will be dropped by any WebSockets-aware software even if it's accepted by WebSockets.
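
As a rough sketch, the RFC 822 field-name rule quoted above (one or more ASCII characters, excluding CTLs, SPACE, and ":") can be expressed as a byte-range check and applied to the three demo lines. The function name is hypothetical, and this is one reading of the grammar rather than a complete header parser:

```python
import re

# RFC 822 field-name: 1*<any CHAR, excluding CTLs, SPACE, and ":">
# i.e. bytes 33..126 except 58 (":"), written here as two ranges.
FIELD_NAME = re.compile(rb"[!-9;-~]+")


def field_name_of(line: bytes):
    """Return the field-name if the line has the form 'name:...', else None."""
    name, colon, _ = line.partition(b":")
    if not colon or not FIELD_NAME.fullmatch(name):
        return None
    return name


for line in (b": Bar", b"Foo", b"::::Quux::::", b"Upgrade: WebSocket"):
    print(line, "->", field_name_of(line))
```

Under this reading, none of the three demo lines yields a field-name, while an ordinary header such as Upgrade: does.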

For completeness, the rest of rfc822 section 3.1 used by rfc2616 specs:
"
     B.2.  SEMANTICS

          Headers occur before the message body and are terminated  by
     a null line (i.e., two contiguous CRLFs).

          A line which continues a header field begins with a SPACE or
     HTAB  character,  while  a  line  beginning a field starts with a
     printable character which is not a colon.

          A field-name consists of one or  more  printable  characters
     (excluding  colon,  space, and control-characters).  A field-name
     MUST be contained on one line.  Upper and lower case are not dis-
     tinguished when comparing field-names.
"

... the third clause there prohibits headers like your example Foo:

   Foo<CRLF>
   : header text<CRLF>

Supporting the second clause (LWS) will not affect the client-sent data, but will help WebSockets cope with headers using very long Cookie data and long auth credentials.


For Web Sockets I would like to have well-defined processing in the face of any input, even invalid input. I'd also like to not require that the processing for headers be as complicated as HTTP's (with continuation lines, multiple headers being merged, etc).

Understood. I'm hoping the above spec 2616 + 822 segments are sufficiently clear for you on what is and is not permitted on the headers.

Things which are not valid HTTP/1.1 as above are of course badly broken WebSockets as well. You can spec as a broad cover that non-valid HTTP/1.1 is a fail connection.


and do away with 4.1.15 through 4.1.21. Similar to the way 4.1.23 mentions www-auth "Obtain [header array] in a manner consistent with the requirements for handling the headers in HTTP"

That's a big cop-out on my part... and I expect it to be the source of many bugs. Unfortunately I don't really see how to make this more explicit without duplicating content from other specs.


You don't have to re-design the whole wheel.

I do wish the commonly shared header syntax was an RFC of its own that could be referenced. But we can work with what's there already. Particularly since you are using HTTP/1.1 syntax, it's best to say so rather than spec'ing in detail something which is incomplete.


Mandating the drop of connections that do not conform to the correct header format is implied, and some bits are explicitly stated.

What is implied? Any implication is a bug; the intent is for all behaviour to be explicitly normatively required.

draft-48/49 section 5.2 specifies that the field-name is followed by ': ' (COLON SPACE) but does not go as far as HTTP in denying the use of COLON, whitespace, CR, and LF in the field name itself.

(I see this is now fixed by the draft-49 changes in section 1.2 doing the prohibition)

5.2 still says merely "Any fields that lack the colon-space separator should be discarded and may cause the server to disconnect."

Making the "should" and "may" in that final sentence of section 5.2 into MUST drop will make it clearly consistent with the rest of WebSockets always-drop policy when validation fails.

I don't see any cases where you would want to accept HTTP/1.1 invalid headers.


That can be cleaned up and locked in by the above and adding a clear BNF like: (alpha|hyphen) colon space (ascii)* CRLF

Ok, I added a non-normative ABNF in the protocol description in the introduction.
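
The suggested BNF, combined with the always-drop policy proposed above, could be sketched roughly as follows. This is not the ABNF that was added to the draft. It assumes one or more name characters, reads "(ascii)*" as printable ASCII plus space, and uses a hypothetical FailConnection exception:

```python
import re

# (alpha|hyphen)+ colon space (ascii)* CRLF
# Assumption: "ascii" is taken as printable ASCII plus space (0x20..0x7E).
HEADER_LINE = re.compile(rb"([A-Za-z-]+): ([\x20-\x7e]*)\r\n")


class FailConnection(Exception):
    """Hypothetical signal that the connection must be dropped."""


def parse_header_line(line: bytes):
    """Return (lower-cased name, value), or fail the connection on any mismatch."""
    match = HEADER_LINE.fullmatch(line)
    if match is None:
        raise FailConnection(repr(line))
    # Field names are case-insensitive, so normalise the name for comparison.
    return match.group(1).lower(), match.group(2)


print(parse_header_line(b"Upgrade: WebSocket\r\n"))
try:
    parse_header_line(b"Foo\r\n")
except FailConnection as err:
    print("dropped:", err)
```

Every input either parses or fails the connection, so there is no "should discard, may disconnect" middle ground left to interpretation.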


The above would also cover handling of LWS cases, which are currently
breaking WebSockets. (less important)

Not sure what you mean here.


Multi-line HTTP headers in the "to be ignored" part of the reply/request ...

 Cookie: foo; data=something-very-long;<CRLF>
 <SPACE>domain=example.com<CRLF>

... currently the second line will cause a WebSockets abort despite your spec permitting Cookies.
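
To illustrate with the Cookie example above: a line-by-line check that demands a colon-space separator on every line rejects the folded header, whereas unfolding continuation lines first (RFC 2616 allows a fold to be replaced with a single space) yields one ordinary field. Both function names are hypothetical and the code is only a sketch:

```python
def every_line_has_separator(header_block: bytes) -> bool:
    """Naive check: each header line must contain the ': ' separator."""
    lines = header_block.split(b"\r\n\r\n", 1)[0].split(b"\r\n")
    return all(b": " in line for line in lines)


def unfold(header_block: bytes) -> list:
    """Join continuation lines (starting with SPACE or HTAB) onto the previous field."""
    fields = []
    for line in header_block.split(b"\r\n"):
        if not line:
            break  # a null line terminates the headers
        if line[:1] in (b" ", b"\t") and fields:
            # Replace the fold with a single space.
            fields[-1] += b" " + line.strip(b" \t")
        else:
            fields.append(line)
    return fields


block = (
    b"Cookie: foo; data=something-very-long;\r\n"
    b" domain=example.com\r\n"
    b"\r\n"
)

print(every_line_has_separator(block))
print(unfold(block))
```

The naive check fails on the continuation line, while unfolding produces a single Cookie field that passes it.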


As a minor issue, it explicitly specifies reading single bytes. I can see people interpreting that as preventing buffering of received data.

As the conformance section says:

#  Conformance requirements phrased as algorithms or specific steps may
#  be implemented in any manner, so long as the end result is
#  equivalent.  (In particular, the algorithms defined in this
#  specification are intended to be easy to follow, and not intended to
#  be performant.)

Ah, okay. I missed that. Fine then.
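
A small sketch of what "equivalent end result" means here: reading a line one byte at a time and reading it through a buffered call return the same bytes. The function names are hypothetical and io.BytesIO stands in for the network stream:

```python
import io


def read_line_bytewise(stream) -> bytes:
    """Read up to and including LF, one byte at a time, as the spec text describes."""
    out = bytearray()
    while True:
        byte = stream.read(1)
        if not byte:
            break  # end of stream
        out += byte
        if byte == b"\n":
            break
    return bytes(out)


def read_line_buffered(stream) -> bytes:
    """Let the I/O layer do the buffering; the result is the same line."""
    return stream.readline()


data = b"Upgrade: WebSocket\r\nConnection: Upgrade\r\n"
print(read_line_bytewise(io.BytesIO(data)) == read_line_buffered(io.BytesIO(data)))
```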


It would be nice if clients were explicitly allowed to send other headers, e.g., Referer or User-Agent, but it's not critical. Also, by its nature this protocol is going to be fragile on non-CONNECTed HTTP connections, but Ian has already acknowledged this.
That is implied by the mention of also adding www-authenticate and not prohibiting other headers sent following the WebSockets ones. The servers will now cope and discard according to 4.1 of the current draft.

The draft defines exactly what user agents must send. Extensions (like proprietary headers) are non-conforming. Of course, other specifications can extend the handshake to add other headers like Referer, if that's desired. In the case of Referer, of course, it's somewhat redundant, since the Origin is included in the request; if the author really wants to send the exact referer, he can send it in his data stream.


_send_ is fine as long as the middleware and servers see it as valid HTTP. This you have accomplished.

The only remaining problems are in how to validate what is _received_ after having traversed a number of middleware boxes doing valid HTTP alterations to the headers.


In conclusion. Hooray! nearly there :)

Thanks for the feedback!


Amos
--
Please be using
  Current Stable Squid 2.7.STABLE7 or 3.0.STABLE19
  Current Beta Squid 3.1.0.14
