Re: WebSockets negotiation over HTTP

Amos Jeffries Sun, 18 Oct 2009 16:06:59 -0700

Ian Hickson wrote:

On Wed, 14 Oct 2009, Amos Jeffries wrote:
4.1.12 prescribes semi-implicitly that HTTP/1.0, HTTP/1.2, etc. are not compatible. Maybe that's what you want. A *very* minor enhancement would be to make that explicitly stated.
I've added a note to this effect.
4.1.13 still has a fragility issue in that it assumes the Upgrade: and
Connection: headers will retain both their specific sending order and be
the very first headers in the reply. It will work in most situations, but
proxies which 'correct' the header order to have Date: first will kill
WebSockets.
That's intentional; such proxies don't know about Web Sockets (if they did, they wouldn't be modifying the headers!) and thus clearly can't really be trusted to route the traffic unmodified.

At this point of the handshake the client is the only software which knows it's using WebSockets. The server may validate-parse the headers' MIME syntax before sub-parsing the request line. At this point all it has seen is the GET and HTTP/1.1.

So... at this point the server and any middleware believe that HTTP/1.1 is in use and will make the header alterations HTTP/1.1 permits.

It is not until the server reply accepting the Upgrade: request is received by middleware that WebSockets protocol actions can start happening.
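To illustrate the fragility point, here is a minimal Python sketch (not taken from any draft; the function names are hypothetical) of validating the handshake reply by header name rather than by position. Field names are compared case-insensitively, as RFC 2616 section 4.2 requires; comparing the Upgrade and Connection values case-insensitively as well is an assumption of this sketch.

```python
def parse_header_lines(lines):
    """Split 'Name: value' lines into (lowercased name, stripped value) pairs."""
    fields = []
    for line in lines:
        name, sep, value = line.partition(":")
        if not sep or not name or name != name.strip():
            raise ValueError(f"malformed header line: {line!r}")
        fields.append((name.lower(), value.strip()))
    return fields


def has_websocket_upgrade(lines):
    """True if 'Upgrade: WebSocket' and 'Connection: Upgrade' appear anywhere,
    in any order and any letter case."""
    fields = parse_header_lines(lines)
    upgrade = [v for n, v in fields if n == "upgrade"]
    connection = [v for n, v in fields if n == "connection"]
    upgrade_ok = any(v.lower() == "websocket" for v in upgrade)
    # Connection may carry a comma-separated list of tokens.
    connection_ok = any(
        "upgrade" in (tok.strip().lower() for tok in v.split(","))
        for v in connection
    )
    return upgrade_ok and connection_ok


# A proxy that moves Date: first and recases names still passes:
reply = ["Date: Sun, 18 Oct 2009 23:06:59 GMT", "connection: Upgrade", "UPGRADE: WebSocket"]
print(has_websocket_upgrade(reply))  # True
```

A check that insists the first two lines are exactly "Upgrade: WebSocket" and "Connection: Upgrade" would reject that reply, even though it carries the same information.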

4.1.14 through 4.1.23 appear to be a very conflated description of parsing the headers.
It seems to me that referencing rfc2616 section 4.2 should be sufficient for the parse
Unfortunately, HTTP doesn't define how to parse headers. It defines the semantics of valid headers, but doesn't say, e.g., what headers are present in the following:
   HTTP/1.1 200 OK
   : Bar
   Foo
   ::::Quux::::


Section 4.2 is clear:

"Each header field consists of a name followed by a colon (":") and the field value. Field names are case-insensitive."

NP: WebSockets as of draft-49 requires (1.2) "The first three lines in each case are hard-coded (the exact case and order matters)", which is a breach of the final statement above. That statement permits middleware to uppercase or CamelCase the headers on a whim without altering their meaning.


It references RFC822 section 3.1 for the BNF, which states:
 " B.1.  SYNTAX

     message         =   *field *(CRLF *text)

     field           =    field-name ":" [field-body] CRLF

     field-name      =  1*<any CHAR, excluding CTLs, SPACE, and ":">

     field-body      =   *text [CRLF LWSP-char field-body]
"
...
"
  C.1.1.  FIELD NAMES

        These now must be a sequence of  printable  characters.   They
        may not contain any LWSP-chars.
"

... which requires header names of at least one ASCII byte, which may not include ':', whitespace, or non-printables.
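A minimal Python sketch of the RFC 822 field-name rule quoted above; `is_rfc822_field_name` and `split_field` are hypothetical helpers, not from either RFC. A name must be one or more characters in the printable ASCII range 33-126 other than ':', which excludes CTLs and SPACE, and a line with no colon has no field name at all.

```python
def is_rfc822_field_name(name: str) -> bool:
    """RFC 822: field-name = 1*<any CHAR, excluding CTLs, SPACE, and ":">."""
    return len(name) > 0 and all(33 <= ord(c) <= 126 and c != ":" for c in name)


def split_field(line: str):
    """Split one (already unfolded) header line at its first colon.
    Returns (name, body), or None if the line is not a valid field."""
    name, sep, body = line.partition(":")
    if not sep or not is_rfc822_field_name(name):
        return None
    return name, body


# The three demo lines all fail:
for line in [": Bar", "Foo", "::::Quux::::"]:
    print(repr(line), split_field(line))  # split_field returns None for each
```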

NP: WebSockets draft-49 changes the bytes to Unicode and permits non-printables other than LF and CR.



Your demo above is invalid HTTP/1.1:
 * first header line has no token in the field-name portion,
 * second line has CRLF in the name portion,
 * third line has zero-byte name portion.

Each of these will either be dropped by existing middleware or handled as HTTP/0.9 with body content:

  : Bar<CRLF>
  Foo<CRLF>
  ::::Quux::::<CRLF>

The first handling method is fine; the second may be a major headache.

Since you have spec'd that only valid HTTP/1.1 is acceptable, this will be dropped by any WebSockets-aware software even if it's accepted by WebSockets.


For completeness, the rest of rfc822 section 3.1 used by the rfc2616 specs:
"
     B.2.  SEMANTICS

          Headers occur before the message body and are terminated  by
     a null line (i.e., two contiguous CRLFs).

          A line which continues a header field begins with a SPACE or
     HTAB  character,  while  a  line  beginning a field starts with a
     printable character which is not a colon.

          A field-name consists of one or  more  printable  characters
     (excluding  colon,  space, and control-characters).  A field-name
     MUST be contained on one line.  Upper and lower case are not dis-
     tinguished when comparing field-names.
"

... the third clause there prohibits headers like the Foo in your example:

   Foo<CRLF>
   : header text<CRLF>

Supporting the second clause (LWS continuation lines) will not affect the data the client sends, but will help WebSockets cope with headers carrying very long Cookie data and long auth credentials.
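The RFC 822 B.2 header-block rules quoted above, including the continuation (LWS) lines just mentioned, can be sketched roughly as follows. This assumes the request or status line has already been consumed, so `data` starts at the first header field; `parse_header_block` is a hypothetical name. It stops at the null line, unfolds continuation lines (joining them with a single space, a representation this sketch chooses), rejects field lines with an empty or non-printable name, and lowercases names since their case is not significant.

```python
def parse_header_block(data: bytes):
    """Return [(lowercased name, value)] for the header fields in data."""
    head, sep, _body = data.partition(b"\r\n\r\n")
    if not sep:
        raise ValueError("header block not terminated by a null line")
    fields = []  # list of [name, value-bytes] so values can be extended
    for line in head.split(b"\r\n") if head else []:
        if line[:1] in (b" ", b"\t"):
            # Continuation line: belongs to the previous field's value.
            if not fields:
                raise ValueError("continuation line before any field")
            fields[-1][1] += b" " + line.strip(b" \t")
            continue
        name, colon, value = line.partition(b":")
        # Name must be 1+ printable ASCII bytes (colon excluded by partition).
        if not colon or not name or any(c <= 32 or c >= 127 for c in name):
            raise ValueError(f"invalid field line: {line!r}")
        fields.append([name.decode("ascii").lower(), value.strip(b" \t")])
    return [(n, v.decode("latin-1")) for n, v in fields]


raw = (b"Cookie: foo; data=something-very-long;\r\n"
       b" domain=example.com\r\n"
       b"Host: example.com\r\n\r\n")
print(parse_header_block(raw))
```

The folded Cookie header comes back as a single cookie field, while the "Foo" / ": header text" pair is rejected rather than silently merged.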

For Web Sockets I would like to have well-defined processing in the face of any input, even invalid input. I'd also like to not require that the processing for headers be as complicated as HTTP's (with continuation lines, multiple headers being merged, etc).

Understood. I'm hoping the above RFC 2616 + RFC 822 segments are sufficiently clear for you on what is and is not permitted in the headers.

Things which are not valid HTTP/1.1 as above are of course badly broken WebSockets as well. You can specify, as a blanket rule, that invalid HTTP/1.1 fails the connection.

and do away with 4.1.15 through 4.1.21, similar to the way 4.1.23 mentions www-auth: "Obtain [header array] in a manner consistent with the requirements for handling the headers in HTTP"
That's a big cop-out on my part... and I expect it to be the source of many bugs. Unfortunately I don't really see how to make this more explicit without duplicating content from other specs.


You don't have to re-design the whole wheel.

I do wish the commonly shared header syntax was an RFC of its own that could be referenced. But we can work with what's there already. Particularly since you are using HTTP/1.1 syntax, it's best to say so rather than spec'ing in detail something which is incomplete.

Mandating that connections with incorrectly formatted headers be dropped is implied, and some bits are explicitly stated.
What is implied? Any implication is a bug; the intent is for all behaviour to be explicitly normatively required.

draft-48/49 section 5.2 specifies that the field-name is followed by ': ' (COLON SPACE) but does not go as far as HTTP in denying the use of COLON, whitespace, CR, and LF in the field name itself.

(I see this is now fixed by the draft-49 changes in section 1.2, which add the prohibition.)

Section 5.2 still says merely "Any fields that lack the colon-space separator should be discarded and may cause the server to disconnect."

Making the "should" and "may" in that final sentence of section 5.2 into MUST drop will make it clearly consistent with the rest of WebSockets' always-drop policy when validation fails.

I don't see any cases where you would want to accept HTTP/1.1-invalid headers.
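As a sketch of the MUST-drop policy argued for here, any line that fails validation fails the whole handshake instead of being silently discarded. The regex is an assumed strict line rule for illustration (letters and hyphens, then COLON SPACE, then printable ASCII), not text from any draft; `HandshakeError` and `validate_fields` are hypothetical names.

```python
import re

# Assumed strict rule for this sketch only:
# name of letters/hyphens, COLON SPACE, then printable ASCII to end of line.
FIELD_LINE = re.compile(r"[A-Za-z-]+: [\x20-\x7e]*")


class HandshakeError(Exception):
    """Raised to fail the WebSocket connection (hypothetical name)."""


def validate_fields(lines):
    """MUST-drop policy: one invalid line fails the whole handshake."""
    for line in lines:
        if not FIELD_LINE.fullmatch(line):
            raise HandshakeError(f"invalid header line, failing connection: {line!r}")
    return [tuple(line.split(": ", 1)) for line in lines]


print(validate_fields(["Upgrade: WebSocket", "Connection: Upgrade"]))
```

Because the rule requires each line to start with a letter or hyphen, it also rejects LWS continuation lines, so a spec adopting it would need to decide separately whether to unfold those first.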

That can be cleaned up and locked in by the above and adding a clear BNF like: (alpha|hyphen)+ colon space (ascii)* CRLF
Ok, I added a non-normative ABNF in the protocol description in the introduction.
The above would also cover handling of LWS cases, which are currently
breaking WebSockets (less important).
Not sure what you mean here.


Multi-line HTTP headers in the "to be ignored" part of the reply/request ...

 Cookie: foo; data=something-very-long;<CRLF>
 <SPACE>domain=example.com<CRLF>

... currently the second line will cause a WebSockets abort despite your spec permitting Cookies.

As a minor issue, it explicitly specifies reading single bytes. I can see people interpreting that as preventing buffering of received data.


As the conformance section says:

#  Conformance requirements phrased as algorithms or specific steps may
#  be implemented in any manner, so long as the end result is
#  equivalent.  (In particular, the algorithms defined in this
#  specification are intended to be easy to follow, and not intended to
#  be performant.)


Ah, okay. I missed that. Fine then.
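To illustrate the equivalence that conformance clause allows, here is a small Python sketch comparing a spec-style byte-at-a-time reader with a buffered readline() reader. `io.BytesIO` stands in for the connection (a socket wrapped with `socket.makefile("rb")` also offers `readline()`); the helper names are hypothetical. Both readers return the same header lines and leave the same bytes unread for the frame parser, which is the part a buffered implementation must not lose.

```python
import io


def read_line_bytewise(stream):
    """Spec-style: read one byte at a time until LF, then drop CR/LF."""
    buf = bytearray()
    while True:
        byte = stream.read(1)
        if not byte:
            raise EOFError("connection closed mid-line")
        buf += byte
        if byte == b"\n":
            return bytes(buf).rstrip(b"\r\n")


def read_line_buffered(stream):
    """Buffered equivalent: let the reader locate the LF."""
    line = stream.readline()
    if not line.endswith(b"\n"):
        raise EOFError("connection closed mid-line")
    return line.rstrip(b"\r\n")


def read_header_lines(stream, read_line):
    """Read lines until the empty line that ends the header block."""
    lines = []
    while (line := read_line(stream)):
        lines.append(line)
    return lines


raw = b"Upgrade: WebSocket\r\nConnection: Upgrade\r\n\r\nframe-data"
for reader in (read_line_bytewise, read_line_buffered):
    stream = io.BytesIO(raw)
    # Both iterations print the same header list and the same leftover bytes.
    print(read_header_lines(stream, reader), stream.read())
```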

It would be nice if clients were explicitly allowed to send other headers, e.g., Referer or User-Agent, but it's not critical. Also, by its nature this protocol is going to be fragile on non-CONNECTed HTTP connections, but Ian has already acknowledged this.
That is implied by the mention of also adding www-authenticate and by not prohibiting other headers sent following the WebSockets ones. The servers will now cope and discard according to 4.1 of the current draft.
The draft defines exactly what user agents must send. Extensions (like proprietary headers) are non-conforming. Of course, other specifications can extend the handshake to add other headers like Referer, if that's desired. In the case of Referer, of course, it's somewhat redundant, since the Origin is included in the request; if the author really wants to send the exact referer, he can send it in his data stream.

_send_ is fine as long as the middleware and servers see it as valid HTTP. This you have accomplished.

The only remaining problems are in how to validate what is _received_ after having traversed a number of middleware boxes doing valid HTTP alterations to the headers.

In conclusion: hooray! Nearly there :)


Thanks for the feedback!


Amos
--
Please be using
  Current Stable Squid 2.7.STABLE7 or 3.0.STABLE19
  Current Beta Squid 3.1.0.14
