Re: Support RFC6265 cookie processing
I have experimented with how different desktop browsers handle cookie values, specifically with: * Aurora-28.0a2 * Chrome-31 * Firefox-26 * Internet Explorer-11 * Safari-7.0.1 on OS X 10.9 except for IE which was on Windows 7. This mail is a dump of things I found out and I will summarize conclusions on the wiki. I first tried setting cookie values Netscape/RFC6265-style without any version attribute: Set-Cookie: cookieOctet=X!#$%'()*+-./0123456789:=?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[]^_`abcdefghijklmnopqrstuvwxyz{|}~X Set-Cookie: quoted=!#$%'()*+-./0123456789:=?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[]^_`abcdefghijklmnopqrstuvwxyz{|}~ Set-Cookie: mismatchedQuote=abc Set-Cookie: semi=a;c Set-Cookie: space= a b c Set-Cookie: comma=a,c Set-Cookie: backslash=a\c Set-Cookie: dquote=ac The cookieOctet value matches the production from RFC6265 and contains all CHARs except DQUOTE, common, semicolon and backslash. The quoted cookie is the same but wrapped in DQUOTEs. All browsers accepted and returned those values as is. In particular, they did not remove the DQUOTEs from quoted, showing them stored client-side as part of the value. The other values contain questionable values. These were all stored as-is with a couple of exceptions: * leading and trailing whitespace was removed * semi was truncated to “a” as would be expected if the “;” was being treated as a delimiter * mismatched quote was stored by Safari as abc, semi=a;c, space= a b c, comma=a,c, backslash=a\c, dquote=ac The Cookie request header generated by Chrome after the response above is: Cookie:cookieOctet=X!#$%'()*+-./0123456789:=?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[]^_`abcdefghijklmnopqrstuvwxyz{|}~X; quoted=!#$%'()*+-./0123456789:=?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[]^_`abcdefghijklmnopqrstuvwxyz{|}~; mismatchedQuote=abc; semi=a; space=a b c; comma=a,c; backslash=a\c; dquote=a”c which is the same value as returned from document.cookie in JavaScript: cookieOctet=X!#$%'()*+-./0123456789:=?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[]^_`abcdefghijklmnopqrstuvwxyz{|}~X; quoted=!#$%'()*+-./0123456789:=?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[]^_`abcdefghijklmnopqrstuvwxyz{|}~; mismatchedQuote=abc; semi=a; space=a b c; comma=a,c; backslash=a\c; dquote=ac When I attempt to set RFC2109 V1 cookies, Chrome’s behaviour is unchanged: Set-Cookie: cookieOctet=X!#$%'()*+-./0123456789:=?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[]^_`abcdefghijklmnopqrstuvwxyz{|}~X ; Version=1 Set-Cookie: quoted=!#$%'()*+-./0123456789:=?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[]^_`abcdefghijklmnopqrstuvwxyz{|}~ ; Version=1 Set-Cookie: mismatchedQuote=abc ; Version=1 Set-Cookie: semi=a;c ; Version=1 Set-Cookie: space= a b c ; Version=1 Set-Cookie: comma=a,c ; Version=1 Set-Cookie: backslash=a\c ; Version=1 Set-Cookie: dquote=ac ; Version=1 Set-Cookie: escaped=a\c ; Version=1 results in Cookie:cookieOctet=X!#$%'()*+-./0123456789:=?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[]^_`abcdefghijklmnopqrstuvwxyz{|}~X; quoted=!#$%'()*+-./0123456789:=?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[]^_`abcdefghijklmnopqrstuvwxyz{|}~; mismatchedQuote=abc; semi=a; space=a b c; comma=a,c; backslash=a\c; dquote=ac; escaped=a\”c” The other browsers show the same with the exception of Safari’s handling of mismatchedQuote: Cookie: cookieOctet=X!#$%'()*+-./0123456789:=?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[]^_`abcdefghijklmnopqrstuvwxyz{|}~X; escaped=a\c; mismatchedQuote=abc ; Version=1, semi=a;c ; Version=1, space= a b c ; Version=1, comma=a,c ; Version=1, backslash=a\c ; Version=1, dquote=ac; quoted=!#$%'()*+-./0123456789:=?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[]^_`abcdefghijklmnopqrstuvwxyz{|}~” The best I’ve been able to come up with for Safari is that if the value starts with a DQUOTE then it parses until it finds a matching DQUOTE, folding headers as it goes, including the commas from header folding, and stopping only when it hits a semi outside the quotes. The mismatched quotes in “mismatchQuote” and “dquote” end up getting paired and the resulting abc ; Version=1, semi=a;c ; Version=1, space= a b c ; Version=1, comma=a,c ; Version=1, backslash=a\c ; Version=1, dquote=a”c value (note the trailing “c”) gets assigned to the “mismatchedQuote” cookie. None of the browsers treat the values as being invalid and ignore the cookie. A response containing Set-Cookie: foo=a;b ; Version=1 which should contain the cookie value “a;b” actually results in: Cookie: foo=“a except on Safari: Cookie: foo=a;b” due to its quote handling above. I have not been able to induce any of the browsers to generate a $Version attribute to indicate the Cookie header is a RFC2109 V1 header. They are also all happy to store a cookie with the name “$Version” which allows such a header to be faked: Set-Cookie: $Version=1 Set-Cookie: foo=bar results in: Cookie: $Version=1; foo=bar All browsers are happy to create cookies with bad names. Given the response: Set-Cookie: X!#$%'()*+-./0123456789:?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[]^_`abcdefghijklmnopqrstuvwxyz{|}~X=test
Re: Support RFC6265 cookie processing
I think this is actually the principle that is the crux of the matter: With respect to the requirements of section 4.2, the Tomcat developers have always taken the view that cookie headers should be treated as combinations of token, separators, and quoted-string rather than *TEXT for the purposes of compliance with RFC2616. This view is rather fundamental to how cookies have been handled to date. Whilst I agree with the principle I think there are sufficient edge-cases in the ways user-agents and other servers handle cookies that trying to handle values by using RFC2109’s rules to avoid the need for handling “*TEXT” have resulted in some of the issues seen in the past. Simply put, a Netscape format Cookie header is not a combination of token, separators, and quoted-string. Part of this is that I have not seen widespread adoption of RFC2109 by user-agents. Instead, in the way they handle values they appear to accept RFC2109 format Set-Cookie headers from servers, process them in a Netscape-format manner, and then generate a Cookie header conforming to Netscape’s rules. Their RFC2109 support is primarily limited to attribute handing (e.g. allowing Max-Age or DQUOTE characters in Path (except IE)). They specifically don’t handle “Version” and don’t include the “$Version” attribute required by RFC2109 in the Cookie header. Reviewing the “fun,” much of it can be attributed to presuming RFC2109 can be applied. For example, the fun around “=“ or “/“ characters, or the use of separator characters in unquoted values caused by trying to apply “token” rules to free text. Upgrading to RFC2109 to use quoted values is also an issue as user-agents are not parsing them as such; for example, Chrome-31 does not handle a quoted-string value per RFC2109: Set-Cookie:Customer=WILE_;_COYOTE; Version=1; Path=/ instead treating it, loosely, as a Netscape-style value: Cookie:Customer=WILE_ Note, that value is not valid per RFC6265’s set-cookie-header rule but is per Netscape’s. I think we can avoid this “fun” if we only apply RFC2109 rules to cookies that explicitly use version 1 (i.e. if setVersion(1) is called when setting a cookie or if $Version=“1” is present when parsing). For all others we would use RFC6265’s rules treating them as a less ambiguous version of Netscape’s syntax. Whether that is actually true or not will depend on the behaviour we actually see from user-agents. With that in mind, I plan to start extending the tests to capture that behaviour so we have a clearer model of what is actually happening. That would then define what we need to do to ensure symmetry for the application developer and avoid the type of problem I mentioned above with Chrome. If that can be an incremental change then that would be ideal but that may not be possible if the change needed is to handle V0 and V1 cookies separately. Cheers Jeremy On Dec 24, 2013, at 2:24 AM, Mark Thomas ma...@apache.org wrote: Signed PGP part On 24/12/2013 01:21, Jeremy Boynes wrote: In comments on issue #55917, there was suggestion for refactoring cookie support along the lines described in RFC6265. No, that isn't what I said. What I said was that now might be a good time to refactor the cookie parsing to use the HttpParser and that if we did that, that we should keep RFC6265 in mind. My intention was to suggest that if there were places where RFC6265 suggested relaxing the parsing rules and those rules could be relaxed without creating ambiguities or security concerns then we should consider such a change. I also hinted at the 'fun' we have had with cookies in the past. I strongly recommend that you go and read all the various discussions and bug reports relating to cookie parsing. In particular, the requirements of RFC2616 for HTTP headers were often overlooked. Reading this RFC, it appears to be more of an effort to standardize the actual behaviour seen on the Internet for different browser and server implementations. The observation is the RFC2109 has received limited adoption and RFC2965 virtually none at all, with most implementations falling back to the original specification released by Netscape that contains certain ambiguities. While I agree that RFC2965 has received little adoption, I disagree with the remainder. With the exception of internet explorer (and this might have changed in the few years since I last looked) the browsers all had pretty good support for RFC2109. The Servlet spec’s JavaDoc for Cookie refers to RFC2109 behaviour with caveats around interoperability. It defines version 0 as complying with Netscape’s original specification and version 1 as complying RFC2109 (with the note “Since RFC 2109 is still somewhat new, consider version 1 as experimental; do not use it yet on production sites”). That text is unchanged since at least Servlet 2.3 (12 years) so I wouldn't give it too much weight. The important part is that the
Re: Support RFC6265 cookie processing
-BEGIN PGP SIGNED MESSAGE- Hash: SHA1 On 24/12/2013 01:21, Jeremy Boynes wrote: In comments on issue #55917, there was suggestion for refactoring cookie support along the lines described in RFC6265. No, that isn't what I said. What I said was that now might be a good time to refactor the cookie parsing to use the HttpParser and that if we did that, that we should keep RFC6265 in mind. My intention was to suggest that if there were places where RFC6265 suggested relaxing the parsing rules and those rules could be relaxed without creating ambiguities or security concerns then we should consider such a change. I also hinted at the 'fun' we have had with cookies in the past. I strongly recommend that you go and read all the various discussions and bug reports relating to cookie parsing. In particular, the requirements of RFC2616 for HTTP headers were often overlooked. Reading this RFC, it appears to be more of an effort to standardize the actual behaviour seen on the Internet for different browser and server implementations. The observation is the RFC2109 has received limited adoption and RFC2965 virtually none at all, with most implementations falling back to the original specification released by Netscape that contains certain ambiguities. While I agree that RFC2965 has received little adoption, I disagree with the remainder. With the exception of internet explorer (and this might have changed in the few years since I last looked) the browsers all had pretty good support for RFC2109. The Servlet spec’s JavaDoc for Cookie refers to RFC2109 behaviour with caveats around interoperability. It defines version 0 as complying with Netscape’s original specification and version 1 as complying RFC2109 (with the note “Since RFC 2109 is still somewhat new, consider version 1 as experimental; do not use it yet on production sites”). That text is unchanged since at least Servlet 2.3 (12 years) so I wouldn't give it too much weight. The important part is that the Servlet spec requires support for the Netscape spec and RFC2109. The current implementation uses a number of system properties to control how cookies are validated. In implementing RFC6265 I hope that some of these can be eliminated. If not, I would propose to add configuration options on the Connector or Host objects to allow the configuration to be set separately for different host domains. Every single one of those options was created for a good reason. As a minimum, the discussion that lead to the creation of the option needs to be reviewed before dropping the option. Per connector or per host options are unlikely to be useful. Per context options are more likely to be a better choice. RFC6265 has separate sections in respect for generating and parsing cookie headers. It follows the practice that generation be strict but parsing be more tolerant of invalid input. Our current implementation generally follows that trend by suppressing invalid input data (after logging). However, for some input data, primary CTLs, it throws an IllegalArgumentException from the connector which does not allow the application to recover. In refactoring, I would propose to simply ignore that input thereby allowing the application to handle it, for example by parsing the header field manually. Cookie parsing in particular needs to be tolerant of cookies set by other sources, including different servers handling other parts of the domain and JavaScript or other client-side code setting values in the browser. What do you mean by ignoring the input? It could be any of: a) skipping the CTL character but otherwise processing the cookie b) replacing the CTL with a space c) ignoring the individual cookie d) ignoring the entire cookie header I'd support c) but am unlikely to support a) b) or d) In light of this, I propose separating the “Set-Cookie” generation side from the “Cookie” parsing side. A complete separation may not be appropriate. They share a lot of common code. Generation == The general principle here would be to use the version property of Cookie to determine the level of verification to perform: if 0 follow RFC6265, if 1 use RFC2109. The primary verification point would be in HttpServletRequest#addCookie() which would use the version in the Cookie instance. Characters will always be converted to octets using the ISO-8859-1 charset; unmappable values will result in an IAE. That is what used to happen and will lead to invalid cookies being generated. For valid cookies to be generated, the automatic switching from v0 to v1 needs to be retained. (As an aside, I believe the logic for this is correct but the comments and the method names appear to be misleading.) The Servlet spec requires an IAE be thrown in Cookie’s constructor if the name is not valid pre RFC2109. Both RFC6265 and RFC2109 define the name to be a “token” (per RFC2616 HTTP/1.1) so I would propose to always validate by those
Support RFC6265 cookie processing
In comments on issue #55917, there was suggestion for refactoring cookie support along the lines described in RFC6265. Reading this RFC, it appears to be more of an effort to standardize the actual behaviour seen on the Internet for different browser and server implementations. The observation is the RFC2109 has received limited adoption and RFC2965 virtually none at all, with most implementations falling back to the original specification released by Netscape that contains certain ambiguities. The Servlet spec’s JavaDoc for Cookie refers to RFC2109 behaviour with caveats around interoperability. It defines version 0 as complying with Netscape’s original specification and version 1 as complying RFC2109 (with the note “Since RFC 2109 is still somewhat new, consider version 1 as experimental; do not use it yet on production sites”). The current implementation uses a number of system properties to control how cookies are validated. In implementing RFC6265 I hope that some of these can be eliminated. If not, I would propose to add configuration options on the Connector or Host objects to allow the configuration to be set separately for different host domains. RFC6265 has separate sections in respect for generating and parsing cookie headers. It follows the practice that generation be strict but parsing be more tolerant of invalid input. Our current implementation generally follows that trend by suppressing invalid input data (after logging). However, for some input data, primary CTLs, it throws an IllegalArgumentException from the connector which does not allow the application to recover. In refactoring, I would propose to simply ignore that input thereby allowing the application to handle it, for example by parsing the header field manually. Cookie parsing in particular needs to be tolerant of cookies set by other sources, including different servers handling other parts of the domain and JavaScript or other client-side code setting values in the browser. In light of this, I propose separating the “Set-Cookie” generation side from the “Cookie” parsing side. Generation == The general principle here would be to use the version property of Cookie to determine the level of verification to perform: if 0 follow RFC6265, if 1 use RFC2109. The primary verification point would be in HttpServletRequest#addCookie() which would use the version in the Cookie instance. Characters will always be converted to octets using the ISO-8859-1 charset; unmappable values will result in an IAE. The Servlet spec requires an IAE be thrown in Cookie’s constructor if the name is not valid pre RFC2109. Both RFC6265 and RFC2109 define the name to be a “token” (per RFC2616 HTTP/1.1) so I would propose to always validate by those rules; this would allow US-ASCII characters except CTLs and separators. This will different from the current implementation that slash “/“ would be treated as a separator which would not be allowed in a name by default; this is consistent with the RFC’s and Glassfish’s implementation and I’m assuming that allowing it in our current implementation is a hangover from where we enabled use of “/“ in values. The spec allows vendors to provide a configuration option that allows cookie names conforming to the original Netscape Cookie Specification to be accepted” and I propose to retain the system property “org.apache.tomcat.util.http.ServerCookie.STRICT_NAMING” for that. If explicitly set to false, it will verify names using Netscape’s rules and allow a sequence of characters excluding semi-colon, comma and white space” but also excluding “=“ and CTLs per RFC2616; note this *would* allow 8-bit ISO-8859-1 characters in the name and relax the RFC2109 constraint that NAMEs that begin with $ are reserved for other uses and must not be used by applications.” The value would not be checked until addCookie() was called and the cookie version is known. This would in principle use RFC6265’s “cookie-value” rule if version == 0 or RFC2109’s “value” rules if version == 1; values that do not conform would result in an IAE from addCookie(). Unlike the current implementation, this would not automatically upgrade the version or add quotes around RFC2109 “values” that did not match the “token” rule. If STRICT_SERVLET_COMPLIANCE is set, the rule for version 0 values would be relaxed to allow any value conforming to the Netscape specification except CTLs; this would effectively add DQUOTE, backslash, and 0x80-0xFF. For more granular control, I propose adding the system property “org.apache.tomcat.util.http.ServerCookie.ALLOW_IN_VALUE” which would take one of the following enum values to determine what octets were allowed: * Netscape * RFC2616_token * RFC2109_value * RFC6265_cookie_octet * Netscape_restricted (limits the permitted characters as recommended in the Servlet spec) * RFC6265_ISO-8859-1 (adds 0x80-0xff to cookie_octet) RFC6265 does allow value to be