2012/3/13 :
> Author: markt
> Date: Tue Mar 13 14:39:24 2012
> New Revision: 1300154
>
> URL: http://svn.apache.org/viewvc?rev=1300154&view=rev
> Log:
> Add an HTTP header parser. The driver for this was an attempt to fix
> https://issues.apache.org/bugzilla/show_bug.cgi?id=52811
>
> Parsing HTTP headers as per RFC2616 is not always as simple as it first
> appears. For headers that only use tokens the simple approach will normally
> be sufficient. However, for the other headers, while simple code meets 99.9%
> of cases, there are often some edge cases that make things far more
> complicated.
>
> The purpose of this parser is to let the parser worry about the edge cases.
> It provides strict parsing of HTTP header values assuming that wrapped header
> lines have already been unwrapped. (The Tomcat header processing code does
> the unwrapping.)
>
> The parser currently supports parsing of the following HTTP header values as
> per RFC 2616:
> - Content-Type
>
> Support for additional headers will be provided as required. A quick scan of
> the Tomcat code base suggested a couple of places where using this parser may
> be useful such as Ranges in the default servlet but there was not - at this
> point - a compelling case for immediate replacement. The expectation is that
> as problems are identified in header parsing, the fix will typically extend
> this parser to support the problematic header and then use the parser rather
> than custom code.
>
> Added:
> tomcat/trunk/java/org/apache/tomcat/util/http/parser/ (with props)
> tomcat/trunk/java/org/apache/tomcat/util/http/parser/AstAttribute.java
> (with props)
> tomcat/trunk/java/org/apache/tomcat/util/http/parser/AstMediaType.java
> (with props)
> tomcat/trunk/java/org/apache/tomcat/util/http/parser/AstParameter.java
> (with props)
> tomcat/trunk/java/org/apache/tomcat/util/http/parser/AstSubType.java
> (with props)
> tomcat/trunk/java/org/apache/tomcat/util/http/parser/AstType.java (with
> props)
> tomcat/trunk/java/org/apache/tomcat/util/http/parser/AstValue.java (with
> props)
> tomcat/trunk/java/org/apache/tomcat/util/http/parser/HttpParser.java
> (with props)
> tomcat/trunk/java/org/apache/tomcat/util/http/parser/HttpParser.jjt
> (with props)
>
> tomcat/trunk/java/org/apache/tomcat/util/http/parser/HttpParserConstants.java
> (with props)
>
> tomcat/trunk/java/org/apache/tomcat/util/http/parser/HttpParserTokenManager.java
> (with props)
>
> tomcat/trunk/java/org/apache/tomcat/util/http/parser/HttpParserTreeConstants.java
> (with props)
>
> tomcat/trunk/java/org/apache/tomcat/util/http/parser/JJTHttpParserState.java
> (with props)
> tomcat/trunk/java/org/apache/tomcat/util/http/parser/Node.java (with
> props)
> tomcat/trunk/java/org/apache/tomcat/util/http/parser/ParseException.java
> (with props)
> tomcat/trunk/java/org/apache/tomcat/util/http/parser/SimpleCharStream.java
> (with props)
> tomcat/trunk/java/org/apache/tomcat/util/http/parser/SimpleNode.java
> (with props)
> tomcat/trunk/java/org/apache/tomcat/util/http/parser/Token.java (with
> props)
> tomcat/trunk/java/org/apache/tomcat/util/http/parser/TokenMgrError.java
> (with props)
> tomcat/trunk/test/org/apache/tomcat/util/http/parser/
> tomcat/trunk/test/org/apache/tomcat/util/http/parser/TestMediaType.java
> (with props)
> Modified:
> tomcat/trunk/.gitignore
> tomcat/trunk/build.xml
> tomcat/trunk/java/org/apache/coyote/Response.java
>
Reviewing HttpParser.jjt,
1) It seems that quoted text handling (" TOKEN"
construct) is slightly wrong in its handling of backslash character.
RFC2616:
[[[
quoted-string = ( <"> *(qdtext | quoted-pair ) <"> )
qdtext = >
The backslash character ("\") MAY be used as a single-character
quoting mechanism only within quoted-string and comment constructs.
quoted-pair= "\" CHAR
]]]
CHAR =
So quoted-pair is '\' + any character 0-127, but from the jjt file I
do not see it:
I do not see how it allows "any CHAR" after "\".
As far as I understand our implementation allows either double quote
or QUOTED_TEXT_CHAR, but QUOTED_TEXT_CHAR != CHAR. It does not include
"\", CTL characters, and wrongly includes character 128-255.
2) RFC2616: ch.3.7 Media Types
says
media-type = type "/" subtype *( ";" parameter )
type = token
subtype= token
and then
"Linear white space
(LWS) MUST NOT be used between the type and subtype, nor between an
attribute and its value."
I see that we allow LWS there, because our definitions of HTTP_TOKEN
and QUOTED_STRING already include leading and trailing LWS in them,
and thus there are trim() calls everywhere in the jjt file.
I am OK with it though, because I do not see any problem from it, but
maybe LWS could be removed from TOKEN and QUOTED_STRING constructs
along with those trim() calls.