Re: svn commit: r1300154 [1/2] - in /tomcat/trunk: ./ java/org/apache/coyote/ java/org/apache/tomcat/util/http/parser/ test/org/apache/tomcat/util/http/parser/

2012-03-23 Thread Konstantin Kolinko
2012/3/23 Mark Thomas :
> On 23/03/2012 01:41, Konstantin Kolinko wrote:
>> 2012/3/13  :
>>> Author: markt
>>> Date: Tue Mar 13 14:39:24 2012
>>> New Revision: 1300154
>
>> 1)  It seems that quoted text handling (" TOKEN"
>> construct) is slightly wrong in its handling of backslash character.
>
> Agreed. Fixed.
>

OK

>> 2) RFC2616: ch.3.7 Media Types
>
>> "Linear white space
>>    (LWS) MUST NOT be used between the type and subtype, nor between an
>>    attribute and its value."
>
>> I am OK with it though, because I do not see any problem from it, but
>> maybe LWS could be removed from TOKEN and QUOTED_STRING constructs
>> along with those trim() calls.
>
> Keeping the code as is should make re-use of the parser for other
> headers easier. As long as the LWS does not cause an issue, I'm OK with
> the current approach. After all, servers are meant to be tolerant where
> they can be.
>

ACK

>> 3)
>> I do not see where we allow "charset" parameter name to be
>> case-insensitive. We allow lowercase only.
>
> Agreed. Fixed.
>

toLowerCase()? We are passing Locale.ENGLISH to such calls elsewhere.
I think that String.equalsIgnoreCase() would be faster (no additional
object creation and stops on first mismatch)

It does not matter for charset though.

>> 4) The "value" of a parameter can be quoted-string. We do not handle
>> unquoting of charset parameter value.  I am OK with leaving as is,
>> because nobody asked.
>
> We do handle this. jjtGetValue() will unquote the value if it is quoted.
> toString() will return the original value.
>

OK, I see - in AstValue.java

Unquoting is actually not as trivial as just replacing \" with ",  but
for our use case (charset value) it is OK.

Best regards,
Konstantin Kolinko

-
To unsubscribe, e-mail: dev-unsubscr...@tomcat.apache.org
For additional commands, e-mail: dev-h...@tomcat.apache.org



Re: svn commit: r1300154 [1/2] - in /tomcat/trunk: ./ java/org/apache/coyote/ java/org/apache/tomcat/util/http/parser/ test/org/apache/tomcat/util/http/parser/

2012-03-23 Thread Mark Thomas
On 23/03/2012 01:41, Konstantin Kolinko wrote:
> 2012/3/13  :
>> Author: markt
>> Date: Tue Mar 13 14:39:24 2012
>> New Revision: 1300154

> 1)  It seems that quoted text handling (" TOKEN"
> construct) is slightly wrong in its handling of backslash character.

Agreed. Fixed.

> 2) RFC2616: ch.3.7 Media Types

> "Linear white space
>(LWS) MUST NOT be used between the type and subtype, nor between an
>attribute and its value."

> I am OK with it though, because I do not see any problem from it, but
> maybe LWS could be removed from TOKEN and QUOTED_STRING constructs
> along with those trim() calls.

Keeping the code as is should make re-use of the parser for other
headers easier. As long as the LWS does not cause an issue, I'm OK with
the current approach. After all, servers are meant to be tolerant where
they can be.

> 3) 
> I do not see where we allow "charset" parameter name to be
> case-insensitive. We allow lowercase only.

Agreed. Fixed.

> 4) The "value" of a parameter can be quoted-string. We do not handle
> unquoting of charset parameter value.  I am OK with leaving as is,
> because nobody asked.

We do handle this. jjtGetValue() will unquote the value if it is quoted.
toString() will return the original value.

Mark

-
To unsubscribe, e-mail: dev-unsubscr...@tomcat.apache.org
For additional commands, e-mail: dev-h...@tomcat.apache.org



Re: svn commit: r1300154 [1/2] - in /tomcat/trunk: ./ java/org/apache/coyote/ java/org/apache/tomcat/util/http/parser/ test/org/apache/tomcat/util/http/parser/

2012-03-22 Thread Konstantin Kolinko
2012/3/13  :
> Author: markt
> Date: Tue Mar 13 14:39:24 2012
> New Revision: 1300154
>
> URL: http://svn.apache.org/viewvc?rev=1300154&view=rev
> Log:
> Add an HTTP header parser. The driver for this was an attempt to fix 
> https://issues.apache.org/bugzilla/show_bug.cgi?id=52811
>
> Parsing HTTP headers as per RFC2616 is not always as simple as it first 
> appears. For headers that only use tokens the simple approach will normally 
> be sufficient. However, for the other headers, while simple code meets 99.9% 
> of cases, there are often some edge cases that make things far more 
> complicated.
>
> The purpose of this parser is to let the parser worry about the edge cases. 
> It provides strict parsing of HTTP header values assuming that wrapped header 
> lines have already been unwrapped. (The Tomcat header processing code does 
> the unwrapping.)
>
> The parser currently supports parsing of the following HTTP header values as 
> per RFC 2616:
>  - Content-Type
>
> Support for additional headers will be provided as required. A quick scan of 
> the Tomcat code base suggested a couple of places where using this parser may 
> be useful such as  Ranges in the default servlet but there was not - at this 
> point - a compelling case for immediate replacement. The expectation is that 
> as problems are identified in header parsing, the fix will typically extend 
> this parser to support the problematic header and then use the parser rather 
> than custom code.
>
> Added:
>    tomcat/trunk/java/org/apache/tomcat/util/http/parser/   (with props)
>    tomcat/trunk/java/org/apache/tomcat/util/http/parser/AstAttribute.java   
> (with props)
>    tomcat/trunk/java/org/apache/tomcat/util/http/parser/AstMediaType.java   
> (with props)
>    tomcat/trunk/java/org/apache/tomcat/util/http/parser/AstParameter.java   
> (with props)
>    tomcat/trunk/java/org/apache/tomcat/util/http/parser/AstSubType.java   
> (with props)
>    tomcat/trunk/java/org/apache/tomcat/util/http/parser/AstType.java   (with 
> props)
>    tomcat/trunk/java/org/apache/tomcat/util/http/parser/AstValue.java   (with 
> props)
>    tomcat/trunk/java/org/apache/tomcat/util/http/parser/HttpParser.java   
> (with props)
>    tomcat/trunk/java/org/apache/tomcat/util/http/parser/HttpParser.jjt   
> (with props)
>    
> tomcat/trunk/java/org/apache/tomcat/util/http/parser/HttpParserConstants.java 
>   (with props)
>    
> tomcat/trunk/java/org/apache/tomcat/util/http/parser/HttpParserTokenManager.java
>    (with props)
>    
> tomcat/trunk/java/org/apache/tomcat/util/http/parser/HttpParserTreeConstants.java
>    (with props)
>    
> tomcat/trunk/java/org/apache/tomcat/util/http/parser/JJTHttpParserState.java  
>  (with props)
>    tomcat/trunk/java/org/apache/tomcat/util/http/parser/Node.java   (with 
> props)
>    tomcat/trunk/java/org/apache/tomcat/util/http/parser/ParseException.java   
> (with props)
>    tomcat/trunk/java/org/apache/tomcat/util/http/parser/SimpleCharStream.java 
>   (with props)
>    tomcat/trunk/java/org/apache/tomcat/util/http/parser/SimpleNode.java   
> (with props)
>    tomcat/trunk/java/org/apache/tomcat/util/http/parser/Token.java   (with 
> props)
>    tomcat/trunk/java/org/apache/tomcat/util/http/parser/TokenMgrError.java   
> (with props)
>    tomcat/trunk/test/org/apache/tomcat/util/http/parser/
>    tomcat/trunk/test/org/apache/tomcat/util/http/parser/TestMediaType.java   
> (with props)
> Modified:
>    tomcat/trunk/.gitignore
>    tomcat/trunk/build.xml
>    tomcat/trunk/java/org/apache/coyote/Response.java
>

Reviewing HttpParser.jjt,

1)  It seems that quoted text handling (" TOKEN"
construct) is slightly wrong in its handling of backslash character.

RFC2616:
[[[
   quoted-string  = ( <"> *(qdtext | quoted-pair ) <"> )
   qdtext = >

   The backslash character ("\") MAY be used as a single-character
   quoting mechanism only within quoted-string and comment constructs.

   quoted-pair= "\" CHAR
]]]
   CHAR   = 

So quoted-pair is '\' + any character 0-127, but from the jjt file I
do not see it:
I do not see how it allows "any CHAR" after "\".

As far as I understand our implementation allows either double quote
or QUOTED_TEXT_CHAR, but QUOTED_TEXT_CHAR != CHAR. It does not include
"\", CTL characters,  and wrongly includes character 128-255.


2) RFC2616: ch.3.7 Media Types

says
   media-type = type "/" subtype *( ";" parameter )
   type   = token
   subtype= token
and then
"Linear white space
   (LWS) MUST NOT be used between the type and subtype, nor between an
   attribute and its value."

I see that we allow LWS there, because our definitions of HTTP_TOKEN
and QUOTED_STRING already include leading and trailing LWS in them,
and thus there are trim() calls everywhere in the jjt file.

I am OK with it though, because I do not see any problem from it, but
maybe LWS could be removed from TOKEN and QUOTED_STRING constructs
along with those trim() calls.