Re: parsing quoted-string HTTP header fields

Amos Jeffries Thu, 26 May 2011 09:20:12 -0700

On 27/05/11 03:01, Alex Rousskov wrote:

On 05/26/2011 06:14 AM, Amos Jeffries wrote:

On 26/05/11 19:02, Tsantilas Christos wrote:

On 05/26/2011 07:42 AM, Amos Jeffries wrote:

On Wed, 25 May 2011 20:07:13 +0300, Tsantilas Christos wrote:

The second patch does not look slower, but maybe we do not want to be
strict for many reasons. One reason is that this function used to
"Configurable SSL error details messages" patch, to parse the "descr:"
and "detail:" fields :-)
The other reason is that it is usual to find web clients or servers
which uses a simple "\n" instead of the "\r\n"...

On 05/25/2011 02:51 PM, Amos Jeffries wrote:

On 25/05/11 22:44, Tsantilas Christos wrote:

The quoted string fields parsing done in squid3 using the
httpHeaderParseQuotedString function (HttpHeaderTools.cc file)

I found that we do not support multiline quoted string fields, but
according the rfc2616 multiline quoted string fields should
supported:

quoted-string = (<">  *(qdtext | quoted-pair )<">  )
qdtext =<any TEXT except<">>
TEXT =<any OCTET except CTLs,
but including LWS>
LWS = [CRLF] 1*( SP | HT



I am planning to fix the httpHeaderParseQuotedString function, using
one
of the following rules:

1) Just ignore any "\r" or "\n" character. This is the fastest and
simpler approach
3) Require "\r\n " or "\r\n\t" as line separator and just remove the
"\r\n"
3) Require "\r\n " or "\r\n\t" as line separator and replace it
with a
space
4) Also support "\n\t" or "\n\n\t" as line separators.


(4) case #2 "\n\n\t" is also non-compliant.


Any comments or suggestions?


RFC states "A recipient MAY replace any linear white space with a
single
SP".

So (1) and the second (3) {replace with 0x20} are the valid options.

It may be worth having two quoted-string parsers. One for each option.
The preference being (1) for its higher performance.

Amos



Ah, okay. I forgot the quoted string un-escapes as it goes. So there is
no extra penalty from (3). In which case I prefer (3) over (1), it has a
lot more error-checking for free.


Patch for case (3) looks great and is RFC compliant. Do we want to be
lenient and allow the absence of \r ?


Sorry, I really do not like both the policing intent of #3 and #3
implementation.

Policing intent: RFC 2616 recommends that we ignore [the lack of] CR
when interpreting CRLF in HTTP headers:

    The line terminator for message-header fields is the sequence CRLF.
    However, we recommend that applications, when parsing such headers,
    recognize a single LF as a line terminator and ignore the leading CR.


Granted, the above can be strictly interpreted to apply to the final
header terminator and not LWS, but I think it is reasonable to extend
the above logic to header continuation (and to LWS in general).


So one vote for leniency (alex). One don't care (amos).


Implementation:

      while (*pos != '"'&&  len>  (pos-start)) {
+
+        if (*pos =='\r') {
+            pos ++;
+            if (*(pos++) != '\n'  || (*pos != ' '&&  *pos != '\t')) {


Can the above double increment lead to *pos pointing beyond the string
boundaries?


Yes. A dangerous only to the debugs(). Which needs a --pos--


Will the above incorrectly accept CR x HT sequence, where "x" is any
character other than LF?


Exactly how dangerous are solo-CR floating around in quoted-string?


Finally, if we are going to replace LWS with a single SP, we should
replace the whole LWS and not just the initial CRLF (SP|HT) prefix.


Well, we may have already skipped over and accepted the prefix of it.

If we ignore the CR, then that too is likely to also be appended to theoutput as a solo.


Overall, I think we should go for #1 for now because it is likely to be
more compatible with real traffic and to have fewer bugs.

Eventually, we should probably do something close to #3 but without bugs
and without rejecting the whole header when a field has a quoted string
with malformed LWS. We should be careful about enabling message
smuggling attacks when policing messages so it may be tricky to get this
right for all cases.

I wanted to consult with you on the smuggling cases again. To check ifwe need to drop all CR here or just the ones in LWS.

This function is restricted to a single quoted-string token. The priorpieces of header are not touched here. The latter I'm not sure about,that is a caller issue.

+        bool parse_error = (*end<= 0x1F&&  *end != '\r'&&  *end != '\n') || 
*end == 0x7F;


This variable can be declared const.

I do not fully understand what the above code and the while loop above
is really doing. If you know what it does, please add a comment.


FYI:

The while loop skips over as many safe octets as possible (excl. \")to optimize calls to append().

The bool checks whether the octet which stopped that scan was importantenough to abort.



Amos
--
Please be using
  Current Stable Squid 2.7.STABLE9 or 3.1.12
  Beta testers wanted for 3.2.0.7 and 3.1.12.1

Re: parsing quoted-string HTTP header fields

Reply via email to