haproxy gives 502 on links with utf-8 chars?!

2010-11-19 Thread Jakov Sosic
Hi.


I have haproxy doing load balancing between two Apache servers which
have mod_jk. The application runs on a JBoss application server. The
problem I have noticed is that if a link contains UTF-8 characters
(Croatian language characters), then haproxy returns error 502. Here is
an example from the log:


Nov 19 12:40:24 porat haproxy[28047]: aaa.bbb.ccc.ddd:port
[19/Nov/2010:12:40:24.040] www www/backend-srv1 0/0/0/-1/135 502 1833 -
- PHVN 1/1/1/0/0 0/0 GET
/pithos/rest/usern...@domain/files/folder%C4%8Di%C4%87/ HTTP/1.1

Nov 19 12:40:34 porat haproxy[28047]: aaa.bbb.ccc.ddd:port
[19/Nov/2010:12:40:34.710] www www/backend-srv1 0/0/0/-1/82 502 1061 - -
PHVN 5/5/5/4/0 0/0 GET
/pithos/rest/usern...@domain/files/%C4%8D%C4%87%C5%A1%C4%91%C5%BE/
HTTP/1.1

The problem only occurs for links with those specific characters.

The interesting thing is that haproxy is the cause of these errors, because
when I fetch those same links directly from the backend servers, they
work without a problem...

Here is a log entry from the Apache backend:

aaa.bbb.ccc.ddd - - [19/Nov/2010:12:40:24 +0100] GET
/pithos/rest/usern...@domain/files/folder%C4%8Di%C4%87/ HTTP/1.1 200
1185
http://somethin.somedomain/pithos/A5707EF1550DF3AECFB3F1CB7B89E240.cache.html;
Mozilla/5.0 (X11; U; Linux i686; en-US) AppleWebKit/534.10 (KHTML, like
Gecko) Chrome/7.0.536.2 Safari/534.10



Any ideas? Is there any way I can debug this?



-- 
Jakov Sosic



Re: haproxy gives 502 on links with utf-8 chars?!

2010-11-19 Thread Willy Tarreau
Hi Jakov,

On Fri, Nov 19, 2010 at 01:06:39PM +0100, Jakov Sosic wrote:
 Hi.
 
 
 I have a haproxy doing load balacing between two apache servers which
 have mod_jk. Application is on JBoss application server. Problem that I
 have noticed is that if link has some UTF-8 character (Croatian language
 characters), then haproxy gives error 502. Here is example from log:
 
 
 Nov 19 12:40:24 porat haproxy[28047]: aaa.bbb.ccc.ddd:port
 [19/Nov/2010:12:40:24.040] www www/backend-srv1 0/0/0/-1/135 502 1833 -
 - PHVN 1/1/1/0/0 0/0 GET
 /pithos/rest/usern...@domain/files/folder%C4%8Di%C4%87/ HTTP/1.1
 
 Nov 19 12:40:34 porat haproxy[28047]: aaa.bbb.ccc.ddd:port
 [19/Nov/2010:12:40:34.710] www www/backend-srv1 0/0/0/-1/82 502 1061 - -
 PHVN 5/5/5/4/0 0/0 GET
 /pithos/rest/usern...@domain/files/%C4%8D%C4%87%C5%A1%C4%91%C5%BE/
 HTTP/1.1
 
 Problem only occurs for links with those specific characters.
 
 Interesting thing is that haproxy is the reason for that errors, because
 when I try to get those same links directly from backend servers, links
 work without problem...

The issue is with the response, not the request (flags PH). If you have
enabled your stats socket, you can get the exact location of the error that
way :

 # echo show errors | socat stdio unix-connect:/var/run/haproxy.sock

(or whatever the path to the socket is). This will be useful because it
will show you exactly which character in the response was not valid from
an HTTP point of view.
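
If the stats socket is not enabled yet, a single line in the global section
is enough; the path below is only an example and must match the one you pass
to socat :

   global
       # example path only, reuse the same path in the socat command above
       stats socket /var/run/haproxy.sock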

Normally if the error is not too serious, you can force haproxy to let
it pass with this option in your backend :

   option accept-invalid-http-response

However, you should only do that once you've figured out what the error is
and need time to fix it, because unless there is a bug in haproxy, it
generally indicates a wrong header name in the response from the server.
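
For reference, the option would sit in the backend section roughly like this
(the server address below is just a placeholder) :

   backend www
       option accept-invalid-http-response
       # placeholder address, keep your real server line here
       server backend-srv1 192.168.0.10:80 check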

Regards,
Willy




Re: haproxy gives 502 on links with utf-8 chars?!

2010-11-19 Thread Jakov Sosic
On 11/19/2010 01:47 PM, Willy Tarreau wrote:
 echo show errors | socat stdio unix-connect:/var/run/haproxy.sock

# echo show errors | socat stdio unix-connect:/var/run/haproxy.sock

[19/Nov/2010:15:01:56.646] backend www (#1) : invalid response
  src aaa.bbb.ccc.ddd, session #645, frontend www (#1), server
backend-srv1 (#1)
  response length 857 bytes, error at position 268:

  0  HTTP/1.1 200 OK\r\n
  00017  Date: Fri, 19 Nov 2010 14:01:56 GMT\r\n
  00054  Server: Apache/2.2.3 (CentOS)\r\n
  00085  X-Powered-By: Servlet 2.5; JBoss-5.0/JBossWeb-2.1\r\n
  00136  Expires: -1\r\n
  00149  X-GSS-Metadata:
{creationDate:1290002859579,createdBy:ngara...@sr
  00219+
ce.hr,modifiedBy:usern...@domain,name:a\r\x07\x11~,owner:
  00282+
usern...@domain,modificationDate:1290002859579,deleted:false}\r
  00350+ \n
  00351  Content-Length: 418\r\n
  00372  Connection: close\r\n
  00391  Content-Type: application/json;charset=UTF-8\r\n
  00437  \r\n
  00439
{files:[],creationDate:1290002859579,createdBy:usern...@domain
  00509+
,modifiedBy:usern...@domain,readForAll:false,name:\xC5\xA1
  00572+
\xC4\x8D\xC4\x87\xC4\x91\xC5\xBE,permissions:[{modifyACL:true,wr
  00618+
ite:true,read:true,user:usern...@domain}],owner:usern...@domain
  00688+ ce.hr,parent:{name:User User,uri:http://server/p
  00758+
ithos/rest/usern...@domain/files/},folders:[],modificationDate:1
  00828+ 290002859579,deleted:false}



Hmmm, what should I do with this output now? Where is the error? :)


-- 
Jakov Sosic



Re: haproxy gives 502 on links with utf-8 chars?!

2010-11-19 Thread German Gutierrez :: OLX Operation Center
Looks like the field

  X-GSS-Metadata:

has UTF-8 encoded characters. I don't know if that's valid or not; I think not.


-- 
Germán Gutiérrez

OLX Operation Center
OLX Inc.
Buenos Aires - Argentina
Phone: 54.11.4775.6696
Mobile: 54.911.5669.6175
Skype: errare_est
Email: germ...@olx.com

Delivering common sense since 1969 Epoch Fail!.

The Nature is not amiable; It treats impartially to all the things.
The wise person is not amiable; He treats all people impartially.

(a)bort (r)etry (e)pic fail?



Re: haproxy gives 502 on links with utf-8 chars?!

2010-11-19 Thread Jakov Sosic
On 11/19/2010 03:07 PM, German Gutierrez :: OLX Operation Center wrote:
 Looks like the field
 
  X-GSS-Metadata:
 
 Has utf-8 encoded characters, I don't know if that's valid or not, I think 
 not.

From Wikipedia:
http://en.wikipedia.org/wiki/List_of_HTTP_header_fields

Accept-Charset - Character sets that are acceptable - e.g. Accept-Charset: utf-8


So I guess I need to somehow force the server to set this HTTP header?




-- 
Jakov Sosic



Re: haproxy gives 502 on links with utf-8 chars?!

2010-11-19 Thread Hank A. Paulson
Accept-* headers describe what the two ends of the connection want in terms of
page content. What is allowed in the headers themselves is a different part of
the spec: it is not governed by the content of any header, but by the spec itself.


Many HTTP/1.1 header field values consist of words separated by LWS
   or special characters. These special characters MUST be in a quoted
   string to be used within a parameter value (as defined in section
   3.6).

Unrecognized header fields [anything like X-*] are treated as
   entity-header fields.

So X-GSS-Metadata is considered an entity-header AFAICT.

The extension-header mechanism allows additional entity-header fields
   to be defined without changing the protocol, but these fields cannot
   be assumed to be recognizable by the recipient. Unrecognized header
   fields SHOULD be ignored by the recipient and MUST be forwarded by
   transparent proxies.

Section 7.2.1 talks about encoding the entity body but not entity headers.

I didn't know about trailing headers (trailers) - Willy, is haproxy coded to 
watch for those?


As the answer here suggests:
http://stackoverflow.com/questions/1361604/how-to-encode-utf8-filename-for-http-headers-python-django

it looks like you can't do that.
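
If the application really has to carry those names in a header, the usual
workaround is to keep the header value pure ASCII, e.g. by percent-encoding
the UTF-8 bytes and decoding them again on the client. A minimal sketch in
Python (the header name below is made up; this is not how the GSS app
actually behaves):

   # Sketch only: percent-encode the UTF-8 bytes so the header value
   # stays plain ASCII; the receiving side has to unquote() it again.
   from urllib.parse import quote, unquote

   name = "folderčić"
   value = quote(name, safe="")          # 'folder%C4%8Di%C4%87'
   assert unquote(value) == name
   print("X-File-Name:", value)          # hypothetical header name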

On 11/19/10 6:13 AM, Jakov Sosic wrote:

On 11/19/2010 03:07 PM, German Gutierrez :: OLX Operation Center wrote:

Looks like the field


  X-GSS-Metadata:


Has utf-8 encoded characters, I don't know if that's valid or not, I think not.


 From wikipedia:
http://en.wikipedia.org/wiki/List_of_HTTP_header_fields

Accept-Charset  Character sets that are acceptable  Accept-Charset: utf-8


So I guess I need to force somehow server to set this HTTP header option?








Re: haproxy gives 502 on links with utf-8 chars?!

2010-11-19 Thread Willy Tarreau
On Fri, Nov 19, 2010 at 03:05:17PM +0100, Jakov Sosic wrote:
 On 11/19/2010 01:47 PM, Willy Tarreau wrote:
  echo show errors | socat stdio unix-connect:/var/run/haproxy.sock
 
 # echo show errors | socat stdio unix-connect:/var/run/haproxy.sock
 
 [19/Nov/2010:15:01:56.646] backend www (#1) : invalid response
   src aaa.bbb.ccc.ddd, session #645, frontend www (#1), server
 backend-srv1 (#1)
   response length 857 bytes, error at position 268:
 
   0  HTTP/1.1 200 OK\r\n
   00017  Date: Fri, 19 Nov 2010 14:01:56 GMT\r\n
   00054  Server: Apache/2.2.3 (CentOS)\r\n
   00085  X-Powered-By: Servlet 2.5; JBoss-5.0/JBossWeb-2.1\r\n
   00136  Expires: -1\r\n
   00149  X-GSS-Metadata:
 {creationDate:1290002859579,createdBy:ngara...@sr
   00219+
 ce.hr,modifiedBy:usern...@domain,name:a\r\x07\x11~,owner:
   00282+
 usern...@domain,modificationDate:1290002859579,deleted:false}\r
   00350+ \n
   00351  Content-Length: 418\r\n
   00372  Connection: close\r\n
   00391  Content-Type: application/json;charset=UTF-8\r\n
   00437  \r\n
   00439
 {files:[],creationDate:1290002859579,createdBy:usern...@domain
   00509+
 ,modifiedBy:usern...@domain,readForAll:false,name:\xC5\xA1
   00572+
 \xC4\x8D\xC4\x87\xC4\x91\xC5\xBE,permissions:[{modifyACL:true,wr
   00618+
 ite:true,read:true,user:usern...@domain}],owner:usern...@domain
   00688+ ce.hr,parent:{name:User User,uri:http://server/p
   00758+
 ithos/rest/usern...@domain/files/},folders:[],modificationDate:1
   00828+ 290002859579,deleted:false}

Excellent, we have it now.

   00149  X-GSS-Metadata: 
 {creationDate:1290002859579,createdBy:ngara...@sr
   00219+ ce.hr,modifiedBy:usern...@domain,name:a\r\x07\x11~,owner:
   00282+ usern...@domain,modificationDate:1290002859579,deleted:false}\r
   00350+ \n

You see position 268 above ? It's the \x07 just after the \r on the second
line. The issue is not related to UTF-8 at all; those are just forbidden
characters, possibly resulting from corrupted memory. The \r marks the
end of a header and may only be followed by a \n.

From RFC2616:

   message-header = field-name ":" [ field-value ]
   field-name     = token
   field-value    = *( field-content | LWS )
   field-content  = <the OCTETs making up the field-value
                    and consisting of either *TEXT or combinations
                    of token, separators, and quoted-string>

   token          = 1*<any CHAR except CTLs or separators>
   quoted-string  = ( <"> *(qdtext | quoted-pair ) <"> )
   qdtext         = <any TEXT except <">>
   quoted-pair    = "\" CHAR
   TEXT           = <any OCTET except CTLs,
                    but including LWS>
   separators     = "(" | ")" | "<" | ">" | "@"
                  | "," | ";" | ":" | "\" | <">
                  | "/" | "[" | "]" | "?" | "="
                  | "{" | "}" | SP | HT

   CHAR           = <any US-ASCII character (octets 0 - 127)>
   CTL            = <any US-ASCII control character
                    (octets 0 - 31) and DEL (127)>

So as you can see, CTL characters cannot appear anywhere unescaped
(the HTTPbis spec refines that further by clearly insisting on the
fact that those characters may not even appear escaped). So the
0x0D 0x07 0x11 characters at position 268 are clearly forbidden here
and break the parsing of the line.

What I suspect is that the characters were UTF-8 encoded in the
database, but the application server stripped the 8th bit before
putting them on the wire, which resulted in what you have. That's
just a pure guess, of course. Another possibility is that those bytes
represent an integer value that was accidentally output with a %c
format instead of a %d.
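
A rough way to picture that first guess (purely illustrative): clearing the
8th bit of the UTF-8 bytes of those Croatian letters produces exactly the
kind of control bytes seen at position 268 :

   # Purely illustrative of the 8th-bit guess above, nothing more.
   for ch in "čćđ":
       utf8 = ch.encode("utf-8")                  # e.g. 'č' -> c4 8d
       masked = bytes(b & 0x7F for b in utf8)     # strip the 8th bit
       print(ch, utf8.hex(), "->", masked.hex())
   # č c48d -> 440d   (0x0D = CR)
   # ć c487 -> 4407   (0x07 = BEL)
   # đ c491 -> 4411   (0x11 = DC1)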

We can't even let that pass with option accept-invalid-http-response,
because the issue will be even worse for characters that come out as
0x0D 0x0A, which will end the line and start a new header with the
remaining data.

The only solution right here is to try to see where it breaks in the
application (maybe it's a memory corruption issue after all) and to
fix it ASAP.

Hoping this helps,
Willy