haproxy gives 502 on links with utf-8 chars?!
Hi.

I have haproxy doing load balancing between two Apache servers running
mod_jk; the application itself runs on a JBoss application server. The
problem I have noticed is that if a link contains UTF-8 characters
(Croatian language characters), haproxy returns a 502 error. Example
from the log:

Nov 19 12:40:24 porat haproxy[28047]: aaa.bbb.ccc.ddd:port [19/Nov/2010:12:40:24.040] www www/backend-srv1 0/0/0/-1/135 502 1833 - - PHVN 1/1/1/0/0 0/0 "GET /pithos/rest/usern...@domain/files/folder%C4%8Di%C4%87/ HTTP/1.1"
Nov 19 12:40:34 porat haproxy[28047]: aaa.bbb.ccc.ddd:port [19/Nov/2010:12:40:34.710] www www/backend-srv1 0/0/0/-1/82 502 1061 - - PHVN 5/5/5/4/0 0/0 "GET /pithos/rest/usern...@domain/files/%C4%8D%C4%87%C5%A1%C4%91%C5%BE/ HTTP/1.1"

The problem only occurs for links with those specific characters. The
interesting thing is that haproxy is the source of the errors: when I
fetch the same links directly from the backend servers, they work
without problem. Here is the corresponding entry from an Apache
backend's log:

aaa.bbb.ccc.ddd - - [19/Nov/2010:12:40:24 +0100] "GET /pithos/rest/usern...@domain/files/folder%C4%8Di%C4%87/ HTTP/1.1" 200 1185 "http://somethin.somedomain/pithos/A5707EF1550DF3AECFB3F1CB7B89E240.cache.html" "Mozilla/5.0 (X11; U; Linux i686; en-US) AppleWebKit/534.10 (KHTML, like Gecko) Chrome/7.0.536.2 Safari/534.10"

Any ideas? Is there any way I can debug this?

--
Jakov Sosic
Re: haproxy gives 502 on links with utf-8 chars?!
Hi Jakov,

On Fri, Nov 19, 2010 at 01:06:39PM +0100, Jakov Sosic wrote:
> I have haproxy doing load balancing between two Apache servers running
> mod_jk; the application itself runs on a JBoss application server. The
> problem I have noticed is that if a link contains UTF-8 characters
> (Croatian language characters), haproxy returns a 502 error.
> [log excerpts snipped]
> The problem only occurs for links with those specific characters. The
> interesting thing is that haproxy is the source of the errors: when I
> fetch the same links directly from the backend servers, they work
> without problem.

The issue is with the response, not the request (flags PH). If you have
enabled your stats socket, you can get the exact location of the error
this way:

  # echo "show errors" | socat stdio unix-connect:/var/run/haproxy.sock

(or whatever the path to your socket is). This will be useful because it
indicates which character in the response was not valid from an HTTP
point of view.

Normally, if the error is not too serious, you can force haproxy to let
it pass with this option in your backend:

  option accept-invalid-http-response

However, you should only do that once you have figured out what the
error is and simply need time to fix it, because unless there is a bug
in haproxy, it generally indicates a wrong header in the response from
the server.

Regards,
Willy
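[Editor's note: for readers following along, a minimal configuration sketch combining both of Willy's suggestions might look like the fragment below. The socket path, backend name, and server address are illustrative assumptions, not taken from the thread.]

```
global
    # expose the runtime socket so "show errors" can be queried
    stats socket /var/run/haproxy.sock mode 600 level admin

backend www
    server backend-srv1 10.0.0.1:80 check   # address is hypothetical
    # temporary workaround only: relaxes response parsing while the
    # offending header is being fixed on the application side
    option accept-invalid-http-response
```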
Re: haproxy gives 502 on links with utf-8 chars?!
On 11/19/2010 01:47 PM, Willy Tarreau wrote:
> # echo "show errors" | socat stdio unix-connect:/var/run/haproxy.sock

Here is what I get:

[19/Nov/2010:15:01:56.646] backend www (#1) : invalid response
  src aaa.bbb.ccc.ddd, session #645, frontend www (#1), server backend-srv1 (#1)
  response length 857 bytes, error at position 268:

  0     HTTP/1.1 200 OK\r\n
  00017 Date: Fri, 19 Nov 2010 14:01:56 GMT\r\n
  00054 Server: Apache/2.2.3 (CentOS)\r\n
  00085 X-Powered-By: Servlet 2.5; JBoss-5.0/JBossWeb-2.1\r\n
  00136 Expires: -1\r\n
  00149 X-GSS-Metadata: {creationDate:1290002859579,createdBy:ngara...@sr
  00219+ ce.hr,modifiedBy:usern...@domain,name:a\r\x07\x11~,owner:
  00282+ usern...@domain,modificationDate:1290002859579,deleted:false}\r
  00350+ \n
  00351 Content-Length: 418\r\n
  00372 Connection: close\r\n
  00391 Content-Type: application/json;charset=UTF-8\r\n
  00437 \r\n
  00439 {files:[],creationDate:1290002859579,createdBy:usern...@domain
  00509+ ,modifiedBy:usern...@domain,readForAll:false,name:\xC5\xA1
  00572+ \xC4\x8D\xC4\x87\xC4\x91\xC5\xBE,permissions:[{modifyACL:true,wr
  00618+ ite:true,read:true,user:usern...@domain}],owner:usern...@domain
  00688+ ce.hr,parent:{name:User User,uri:http://server/p
  00758+ ithos/rest/usern...@domain/files/},folders:[],modificationDate:1
  00828+ 290002859579,deleted:false}

Hmmm, what do I do with this output now? Where is the error? :)

--
Jakov Sosic
Re: haproxy gives 502 on links with utf-8 chars?!
It looks like the X-GSS-Metadata field contains UTF-8 encoded
characters. I don't know for sure whether that is valid, but I think
it is not.

--
Germán Gutiérrez
OLX Operation Center, OLX Inc.
Buenos Aires - Argentina
Phone: 54.11.4775.6696  Mobile: 54.911.5669.6175
Skype: errare_est  Email: germ...@olx.com
Re: haproxy gives 502 on links with utf-8 chars?!
On 11/19/2010 03:07 PM, German Gutierrez :: OLX Operation Center wrote:
> Looks like the X-GSS-Metadata field contains UTF-8 encoded characters.
> I don't know if that's valid or not; I think not.

From Wikipedia (http://en.wikipedia.org/wiki/List_of_HTTP_header_fields):

  Accept-Charset    Character sets that are acceptable    Accept-Charset: utf-8

So I guess I need to somehow force the server to set this HTTP header?

--
Jakov Sosic
Re: haproxy gives 502 on links with utf-8 chars?!
Accept-* headers describe what the ends of the connection want in terms
of page content. What is allowed in the headers themselves is a
different part of the spec: it is not controlled by the content of any
header, but by the spec itself:

  Many HTTP/1.1 header field values consist of words separated by LWS
  or special characters. These special characters MUST be in a quoted
  string to be used within a parameter value (as defined in section 3.6).

  Unrecognized header fields are treated as entity-header fields.

So X-GSS-Metadata (like anything X-*) is considered an entity-header,
AFAICT:

  The extension-header mechanism allows additional entity-header fields
  to be defined without changing the protocol, but these fields cannot
  be assumed to be recognizable by the recipient. Unrecognized header
  fields SHOULD be ignored by the recipient and MUST be forwarded by
  transparent proxies.

Section 7.2.1 talks about encoding the entity body, but not entity
headers. I didn't know about trailing headers (trailers); Willy, is
haproxy coded to watch for those?

As the accepted answer here says, it looks like you can't do that:
http://stackoverflow.com/questions/1361604/how-to-encode-utf8-filename-for-http-headers-python-django

On 11/19/10 6:13 AM, Jakov Sosic wrote:
> On 11/19/2010 03:07 PM, German Gutierrez :: OLX Operation Center wrote:
>> Looks like the X-GSS-Metadata field contains UTF-8 encoded
>> characters. I don't know if that's valid or not; I think not.
>
> From Wikipedia: http://en.wikipedia.org/wiki/List_of_HTTP_header_fields
>   Accept-Charset    Character sets that are acceptable    Accept-Charset: utf-8
>
> So I guess I need to somehow force the server to set this HTTP header?
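[Editor's note: to make the rule concrete, here is a small sketch (added for illustration, not from the thread) that checks a header field value against RFC 2616's restriction that bare CTL octets are forbidden.]

```python
def has_forbidden_ctl(value: bytes) -> bool:
    """Return True if a header field value contains a control octet
    forbidden by RFC 2616: CTL = octets 0-31 and DEL (127).
    Horizontal tab (0x09) is permitted as part of LWS."""
    return any((b < 0x20 and b != 0x09) or b == 0x7F for b in value)

# The Content-Type value from the dump above is fine:
print(has_forbidden_ctl(b"application/json;charset=UTF-8"))  # False

# The corrupted X-GSS-Metadata value is not (it contains \r \x07 \x11):
print(has_forbidden_ctl(b"{name:a\r\x07\x11~,deleted:false}"))  # True
```

Note that high (non-ASCII) octets such as raw UTF-8 bytes are a separate question: RFC 2616's TEXT rule nominally allows them only as ISO-8859-1 or RFC 2047 encoded words, so UTF-8 in header values is unreliable even when no CTLs are present.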
Re: haproxy gives 502 on links with utf-8 chars?!
On Fri, Nov 19, 2010 at 03:05:17PM +0100, Jakov Sosic wrote:
> [show errors output snipped]
>
> Hmmm, what do I do with this output now? Where is the error? :)

Excellent, we have it now:

  00149 X-GSS-Metadata: {creationDate:1290002859579,createdBy:ngara...@sr
  00219+ ce.hr,modifiedBy:usern...@domain,name:a\r\x07\x11~,owner:
  00282+ usern...@domain,modificationDate:1290002859579,deleted:false}\r
  00350+ \n

You see position 268 above? It's the \x07 just after the \r on the
second line. The issue is not related to UTF-8 at all; those are simply
forbidden characters, possibly resulting from corrupted memory. The \r
announces the end of a header and may only be followed by a \n.
From RFC 2616:

  message-header = field-name ":" [ field-value ]
  field-name     = token
  field-value    = *( field-content | LWS )
  field-content  = <the OCTETs making up the field-value
                    and consisting of either *TEXT or combinations
                    of token, separators, and quoted-string>
  token          = 1*<any CHAR except CTLs or separators>
  quoted-string  = ( <"> *(qdtext | quoted-pair ) <"> )
  qdtext         = <any TEXT except <">>
  quoted-pair    = "\" CHAR
  TEXT           = <any OCTET except CTLs, but including LWS>
  separators     = "(" | ")" | "<" | ">" | "@"
                 | "," | ";" | ":" | "\" | <">
                 | "/" | "[" | "]" | "?" | "="
                 | "{" | "}" | SP | HT
  CHAR           = <any US-ASCII character (octets 0 - 127)>
  CTL            = <any US-ASCII control character (octets 0 - 31) and DEL (127)>

So as you can see, CTL characters cannot appear anywhere unescaped (the
HTTPbis spec refines that further by clearly insisting on the fact that
those chars may not even be escaped). So clearly the 0x0D 0x07 0x11
characters at position 268 are forbidden here and break the parsing of
the line.

What I suspect is that the characters were UTF-8 encoded in the
database, but the application server stripped the 8th bit before
putting them on the wire, which resulted in what you have. That's just
a pure guess, of course. Another possibility is that those bytes
represent an integer value that was accidentally output with a %c
format instead of %d.

We can't even let that pass with "option accept-invalid-http-response",
because the issue will be even worse for characters that come out as
0x0D 0x0A: those will end the line and start a new header with the
remaining data. The only real solution here is to find where it breaks
in the application (maybe it's a memory corruption issue after all) and
to fix it ASAP.

Hoping this helps,
Willy
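[Editor's note: Willy's 8th-bit hypothesis is easy to check with a quick sketch (added for illustration, not from the thread). Masking the high bit off the UTF-8 bytes of the Croatian characters from the failing URLs does produce control octets, including the stray \r and the \x07 seen at position 268 in the dump.]

```python
# Croatian characters from the failing URLs and their UTF-8 encodings:
# č = c4 8d, ć = c4 87, đ = c4 91
for ch in "čćđ":
    raw = ch.encode("utf-8")
    stripped = bytes(b & 0x7F for b in raw)  # drop the 8th bit
    print(ch, raw.hex(), stripped)

# "č" (c4 8d) becomes b'D\r'   -> a stray CR inside the header value
# "ć" (c4 87) becomes b'D\x07' -> the \x07 at position 268
# "đ" (c4 91) becomes b'D\x11' -> the \x11 right after it
```

This is consistent with the hypothesis but does not prove it; the corrupted value in the dump could equally have come from the %c-vs-%d confusion Willy mentions.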