Re: Support RFC6265 cookie processing

2014-01-01 Thread Jeremy Boynes
I have experimented with how different desktop browsers handle cookie values, 
specifically with:
* Aurora-28.0a2
* Chrome-31
* Firefox-26
* Internet Explorer-11
* Safari-7.0.1
on OS X 10.9 except for IE which was on Windows 7. This mail is a dump of 
things I found out and I will summarize conclusions on the wiki.

I first tried setting cookie values Netscape/RFC6265-style without any version 
attribute:
 Set-Cookie: 
cookieOctet=X!#$%'()*+-./0123456789:=?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[]^_`abcdefghijklmnopqrstuvwxyz{|}~X
 Set-Cookie: 
quoted=!#$%'()*+-./0123456789:=?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[]^_`abcdefghijklmnopqrstuvwxyz{|}~
 Set-Cookie: mismatchedQuote=abc
 Set-Cookie: semi=a;c
 Set-Cookie: space= a b c
 Set-Cookie: comma=a,c
 Set-Cookie: backslash=a\c
 Set-Cookie: dquote=ac

The cookieOctet value matches the production from RFC6265 and contains all 
CHARs except DQUOTE, common, semicolon and backslash. The quoted cookie is the 
same but wrapped in DQUOTEs. All browsers accepted and returned those values as 
is. In particular, they did not remove the DQUOTEs from quoted, showing them 
stored client-side as part of the value.

The other values contain questionable values. These were all stored as-is with 
a couple of exceptions:
* leading and trailing whitespace was removed
* semi was truncated to “a” as would be expected if the “;” was being treated 
as a delimiter
* mismatched quote was stored by Safari as abc, semi=a;c, space= a b c, 
comma=a,c, backslash=a\c, dquote=ac

The Cookie request header generated by Chrome after the response above is:
 Cookie:cookieOctet=X!#$%'()*+-./0123456789:=?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[]^_`abcdefghijklmnopqrstuvwxyz{|}~X;
  
 quoted=!#$%'()*+-./0123456789:=?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[]^_`abcdefghijklmnopqrstuvwxyz{|}~;
  mismatchedQuote=abc; semi=a; space=a b c; comma=a,c; backslash=a\c; 
 dquote=a”c
which is the same value as returned from document.cookie in JavaScript:
 cookieOctet=X!#$%'()*+-./0123456789:=?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[]^_`abcdefghijklmnopqrstuvwxyz{|}~X;
  
 quoted=!#$%'()*+-./0123456789:=?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[]^_`abcdefghijklmnopqrstuvwxyz{|}~;
  mismatchedQuote=abc; semi=a; space=a b c; comma=a,c; backslash=a\c; 
 dquote=ac


When I attempt to set RFC2109 V1 cookies, Chrome’s behaviour is unchanged:
 Set-Cookie: 
cookieOctet=X!#$%'()*+-./0123456789:=?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[]^_`abcdefghijklmnopqrstuvwxyz{|}~X
 ; Version=1
 Set-Cookie: 
quoted=!#$%'()*+-./0123456789:=?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[]^_`abcdefghijklmnopqrstuvwxyz{|}~
 ; Version=1
 Set-Cookie: mismatchedQuote=abc ; Version=1
 Set-Cookie: semi=a;c ; Version=1
 Set-Cookie: space= a b c ; Version=1
 Set-Cookie: comma=a,c ; Version=1
 Set-Cookie: backslash=a\c ; Version=1
 Set-Cookie: dquote=ac ; Version=1
 Set-Cookie: escaped=a\c ; Version=1
results in 
 Cookie:cookieOctet=X!#$%'()*+-./0123456789:=?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[]^_`abcdefghijklmnopqrstuvwxyz{|}~X;
  
 quoted=!#$%'()*+-./0123456789:=?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[]^_`abcdefghijklmnopqrstuvwxyz{|}~;
  mismatchedQuote=abc; semi=a; space=a b c; comma=a,c; backslash=a\c; 
 dquote=ac; escaped=a\”c”

The other browsers show the same with the exception of Safari’s handling of 
mismatchedQuote:
 Cookie: 
 cookieOctet=X!#$%'()*+-./0123456789:=?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[]^_`abcdefghijklmnopqrstuvwxyz{|}~X;
  escaped=a\c; mismatchedQuote=abc ; Version=1, semi=a;c ; Version=1, 
 space= a b c ; Version=1, comma=a,c ; Version=1, backslash=a\c ; Version=1, 
 dquote=ac; 
 quoted=!#$%'()*+-./0123456789:=?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[]^_`abcdefghijklmnopqrstuvwxyz{|}~”
The best I’ve been able to come up with for Safari is that if the value starts 
with a DQUOTE then it parses until it finds a matching DQUOTE, folding headers 
as it goes, including the commas from header folding, and stopping only when it 
hits a semi outside the quotes. The mismatched quotes in “mismatchQuote” and 
“dquote” end up getting paired and the resulting abc ; Version=1, semi=a;c ; 
Version=1, space= a b c ; Version=1, comma=a,c ; Version=1, backslash=a\c ; 
Version=1, dquote=a”c value (note the trailing “c”) gets assigned to the 
“mismatchedQuote” cookie. None of the browsers treat the values as being 
invalid and ignore the cookie. 

A response containing
 Set-Cookie: foo=a;b ; Version=1
which should contain the cookie value “a;b” actually results in:
 Cookie: foo=“a

except on Safari:
 Cookie: foo=a;b”
due to its quote handling above.

I have not been able to induce any of the browsers to generate a $Version 
attribute to indicate the Cookie header is a RFC2109 V1 header. They are also 
all happy to store a cookie with the name “$Version” which allows such a header 
to be faked:
 Set-Cookie: $Version=1
 Set-Cookie: foo=bar
results in:
 Cookie: $Version=1; foo=bar

All browsers are happy to create cookies with bad names. Given the response:
 Set-Cookie: 
X!#$%'()*+-./0123456789:?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[]^_`abcdefghijklmnopqrstuvwxyz{|}~X=test
 

Re: Support RFC6265 cookie processing

2013-12-27 Thread Jeremy Boynes
I think this is actually the principle that is the crux of the matter:
 With respect to the
 requirements of section 4.2, the Tomcat developers have always taken
 the view that cookie headers should be treated as combinations of
 token, separators, and quoted-string rather than *TEXT for the
 purposes of compliance with RFC2616. This view is rather fundamental
 to how cookies have been handled to date. 


Whilst I agree with the principle I think there are sufficient edge-cases in 
the ways user-agents and other servers handle cookies that trying to handle 
values by using RFC2109’s rules to avoid the need for handling “*TEXT” have 
resulted in some of the issues seen in the past. Simply put, a Netscape format 
Cookie header is not a combination of token, separators, and quoted-string.

Part of this is that I have not seen widespread adoption of RFC2109 by 
user-agents. Instead, in the way they handle values they appear to accept 
RFC2109 format Set-Cookie headers from servers, process them in a 
Netscape-format manner, and then generate a Cookie header conforming to 
Netscape’s rules. Their RFC2109 support is primarily limited to attribute 
handing (e.g. allowing Max-Age or DQUOTE characters in Path (except IE)). They 
specifically don’t handle “Version” and don’t include the “$Version” attribute 
required by RFC2109 in the Cookie header.

Reviewing the “fun,” much of it can be attributed to presuming RFC2109 can be 
applied. For example, the fun around “=“ or “/“ characters, or the use of 
separator characters in unquoted values caused by trying to apply “token” rules 
to free text. Upgrading to RFC2109 to use quoted values is also an issue as 
user-agents are not parsing them as such; for example, Chrome-31 does not 
handle a quoted-string value per RFC2109:
  Set-Cookie:Customer=WILE_;_COYOTE; Version=1; Path=/
instead treating it, loosely, as a Netscape-style value:
  Cookie:Customer=WILE_
Note, that value is not valid per RFC6265’s set-cookie-header rule but is per 
Netscape’s.

I think we can avoid this “fun” if we only apply RFC2109 rules to cookies that 
explicitly use version 1 (i.e. if setVersion(1) is called when setting a cookie 
or if $Version=“1” is present when parsing). For all others we would use 
RFC6265’s rules treating them as a less ambiguous version of Netscape’s syntax.

Whether that is actually true or not will depend on the behaviour we actually 
see from user-agents. With that in mind, I plan to start extending the tests to 
capture that behaviour so we have a clearer model of what is actually 
happening. That would then define what we need to do to ensure symmetry for the 
application developer and avoid the type of problem I mentioned above with 
Chrome. If that can be an incremental change then that would be ideal but that 
may not be possible if the change needed is to handle V0 and V1 cookies 
separately.

Cheers
Jeremy

On Dec 24, 2013, at 2:24 AM, Mark Thomas ma...@apache.org wrote:

 Signed PGP part
 On 24/12/2013 01:21, Jeremy Boynes wrote:
  In comments on issue #55917, there was suggestion for refactoring
  cookie support along the lines described in RFC6265.
 
 No, that isn't what I said. What I said was that now might be a good
 time to refactor the cookie parsing to use the HttpParser and that if
 we did that, that we should keep RFC6265 in mind. My intention was to
 suggest that if there were places where RFC6265 suggested relaxing the
 parsing rules and those rules could be relaxed without creating
 ambiguities or security concerns then we should consider such a change.
 
 I also hinted at the 'fun' we have had with cookies in the past. I
 strongly recommend that you go and read all the various discussions
 and bug reports relating to cookie parsing.
 
 In particular, the requirements of RFC2616 for HTTP headers were often
 overlooked.
 
  Reading this RFC, it appears to be more of an effort to standardize
  the actual behaviour seen on the Internet for different browser and
  server implementations. The observation is the RFC2109 has received
  limited adoption and RFC2965 virtually none at all, with most
  implementations falling back to the original specification released
  by Netscape that contains certain ambiguities.
 
 While I agree that RFC2965 has received little adoption, I disagree
 with the remainder. With the exception of internet explorer (and this
 might have changed in the few years since I last looked) the browsers
 all had pretty good support for RFC2109.
 
  The Servlet spec’s JavaDoc for Cookie refers to RFC2109 behaviour
  with caveats around interoperability. It defines version 0 as
  complying with Netscape’s original specification and version 1 as
  complying RFC2109 (with the note “Since RFC 2109 is still somewhat
  new, consider version 1 as experimental; do not use it yet on
  production sites”).
 
 That text is unchanged since at least Servlet 2.3 (12 years) so I
 wouldn't give it too much weight. The important part is that the
 

Re: Support RFC6265 cookie processing

2013-12-24 Thread Mark Thomas
-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

On 24/12/2013 01:21, Jeremy Boynes wrote:
 In comments on issue #55917, there was suggestion for refactoring 
 cookie support along the lines described in RFC6265.

No, that isn't what I said. What I said was that now might be a good
time to refactor the cookie parsing to use the HttpParser and that if
we did that, that we should keep RFC6265 in mind. My intention was to
suggest that if there were places where RFC6265 suggested relaxing the
parsing rules and those rules could be relaxed without creating
ambiguities or security concerns then we should consider such a change.

I also hinted at the 'fun' we have had with cookies in the past. I
strongly recommend that you go and read all the various discussions
and bug reports relating to cookie parsing.

In particular, the requirements of RFC2616 for HTTP headers were often
overlooked.

 Reading this RFC, it appears to be more of an effort to standardize
 the actual behaviour seen on the Internet for different browser and
 server implementations. The observation is the RFC2109 has received
 limited adoption and RFC2965 virtually none at all, with most
 implementations falling back to the original specification released
 by Netscape that contains certain ambiguities.

While I agree that RFC2965 has received little adoption, I disagree
with the remainder. With the exception of internet explorer (and this
might have changed in the few years since I last looked) the browsers
all had pretty good support for RFC2109.

 The Servlet spec’s JavaDoc for Cookie refers to RFC2109 behaviour 
 with caveats around interoperability. It defines version 0 as 
 complying with Netscape’s original specification and version 1 as 
 complying RFC2109 (with the note “Since RFC 2109 is still somewhat 
 new, consider version 1 as experimental; do not use it yet on 
 production sites”).

That text is unchanged since at least Servlet 2.3 (12 years) so I
wouldn't give it too much weight. The important part is that the
Servlet spec requires support for the Netscape spec and RFC2109.

 The current implementation uses a number of system properties to 
 control how cookies are validated. In implementing RFC6265 I hope 
 that some of these can be eliminated. If not, I would propose to
 add configuration options on the Connector or Host objects to allow
 the configuration to be set separately for different host domains.

Every single one of those options was created for a good reason. As a
minimum, the discussion that lead to the creation of the option needs
to be reviewed before dropping the option.

Per connector or per host options are unlikely to be useful. Per
context options are more likely to be a better choice.

 RFC6265 has separate sections in respect for generating and
 parsing cookie headers. It follows the practice that generation be
 strict but parsing be more tolerant of invalid input. Our current
 implementation generally follows that trend by suppressing invalid
 input data (after logging). However, for some input data, primary
 CTLs, it throws an IllegalArgumentException from the connector
 which does not allow the application to recover. In refactoring, I
 would propose to simply ignore that input thereby allowing the
 application to handle it, for example by parsing the header field
 manually. Cookie parsing in particular needs to be tolerant of
 cookies set by other sources, including different servers handling
 other parts of the domain and JavaScript or other client-side code
 setting values in the browser.

What do you mean by ignoring the input? It could be any of:
a) skipping the CTL character but otherwise processing the cookie
b) replacing the CTL with a space
c) ignoring the individual cookie
d) ignoring the entire cookie header

I'd support c) but am unlikely to support a) b) or d)

 In light of this, I propose separating the “Set-Cookie” generation 
 side from the “Cookie” parsing side.

A complete separation may not be appropriate. They share a lot of
common code.

 Generation == The general principle here would be to use
 the version property of Cookie to determine the level of
 verification to perform: if 0 follow RFC6265, if 1 use RFC2109. The
 primary verification point would be in
 HttpServletRequest#addCookie() which would use the version in the
 Cookie instance. Characters will always be converted to octets
 using the ISO-8859-1 charset; unmappable values will result in an
 IAE.

That is what used to happen and will lead to invalid cookies being
generated. For valid cookies to be generated, the automatic switching
from v0 to v1 needs to be retained.

(As an aside, I believe the logic for this is correct but the comments
and the method names appear to be misleading.)

 The Servlet spec requires an IAE be thrown in Cookie’s constructor
 if the name is not valid pre RFC2109. Both RFC6265 and RFC2109
 define the name to be a “token” (per RFC2616 HTTP/1.1) so I would
 propose to always validate by those 

Support RFC6265 cookie processing

2013-12-23 Thread Jeremy Boynes
In comments on issue #55917, there was suggestion for refactoring cookie 
support along the lines described in RFC6265. Reading this RFC, it appears to 
be more of an effort to standardize the actual behaviour seen on the Internet 
for different browser and server implementations. The observation is the 
RFC2109 has received limited adoption and RFC2965 virtually none at all, with 
most implementations falling back to the original specification released by 
Netscape that contains certain ambiguities. 

The Servlet spec’s JavaDoc for Cookie refers to RFC2109 behaviour with caveats 
around interoperability. It defines version 0 as complying with Netscape’s 
original specification and version 1 as complying RFC2109 (with the note “Since 
RFC 2109 is still somewhat new, consider version 1 as experimental; do not use 
it yet on production sites”).

The current implementation uses a number of system properties to control how 
cookies are validated. In implementing RFC6265 I hope that some of these can be 
eliminated. If not, I would propose to add configuration options on the 
Connector or Host objects to allow the configuration to be set separately for 
different host domains.

RFC6265 has separate sections in respect for generating and parsing cookie 
headers. It follows the practice that generation be strict but parsing be more 
tolerant of invalid input. Our current implementation generally follows that 
trend by suppressing invalid input data (after logging). However, for some 
input data, primary CTLs, it throws an IllegalArgumentException from the 
connector which does not allow the application to recover. In refactoring, I 
would propose to simply ignore that input thereby allowing the application to 
handle it, for example by parsing the header field manually. Cookie parsing in 
particular needs to be tolerant of cookies set by other sources, including 
different servers handling other parts of the domain and JavaScript or other 
client-side code setting values in the browser.

In light of this, I propose separating the “Set-Cookie” generation side from 
the “Cookie” parsing side.

Generation
==
The general principle here would be to use the version property of Cookie to 
determine the level of verification to perform: if 0 follow RFC6265, if 1 use 
RFC2109. The primary verification point would be in 
HttpServletRequest#addCookie() which would use the version in the Cookie 
instance. Characters will always be converted to octets using the ISO-8859-1 
charset; unmappable values will result in an IAE.

The Servlet spec requires an IAE be thrown in Cookie’s constructor if the name 
is not valid pre RFC2109. Both RFC6265 and RFC2109 define the name to be a 
“token” (per RFC2616 HTTP/1.1) so I would propose to always validate by those 
rules; this would allow US-ASCII characters except CTLs and separators. This 
will different from the current implementation that slash “/“ would be treated 
as a separator which would not be allowed in a name by default; this is 
consistent with the RFC’s and Glassfish’s implementation and I’m assuming that 
allowing it in our current implementation is a hangover from where we enabled 
use of “/“ in values. 

The spec allows vendors to provide a configuration option that allows cookie 
names conforming to the original Netscape Cookie Specification to be accepted” 
and I propose to retain the system property 
“org.apache.tomcat.util.http.ServerCookie.STRICT_NAMING” for that. If 
explicitly set to false, it will verify names using Netscape’s rules and allow 
a sequence of characters excluding semi-colon, comma and white space” but also 
excluding “=“ and CTLs per RFC2616; note this *would* allow 8-bit ISO-8859-1 
characters in the name and relax the RFC2109 constraint that NAMEs that begin 
with $ are reserved for other uses and must not be used by applications.” 

The value would not be checked until addCookie() was called and the cookie 
version is known. This would in principle use RFC6265’s “cookie-value” rule if 
version == 0 or RFC2109’s “value” rules if version == 1; values that do not 
conform would result in an IAE from addCookie(). Unlike the current 
implementation, this would not automatically upgrade the version or add quotes 
around RFC2109 “values” that did not match the “token” rule.

If STRICT_SERVLET_COMPLIANCE is set, the rule for version 0 values would be 
relaxed to allow any value conforming to the Netscape specification except 
CTLs; this would effectively add DQUOTE, backslash, and 0x80-0xFF. For more 
granular control, I propose adding the system property 
“org.apache.tomcat.util.http.ServerCookie.ALLOW_IN_VALUE” which would take one 
of the following enum values to determine what octets were allowed:
* Netscape
* RFC2616_token
* RFC2109_value
* RFC6265_cookie_octet
* Netscape_restricted (limits the permitted characters as recommended in the 
Servlet spec)
* RFC6265_ISO-8859-1 (adds 0x80-0xff to cookie_octet)

RFC6265 does allow value to be