Re: stick table replication problem
On Fri, Jan 08, 2016 at 02:44:44PM +0100, Tom Atkinson wrote:
> On Thu, 2015-12-24 at 19:21 +0100, Willy Tarreau wrote:
> > Hi Tom,
> >
> > On Thu, Dec 24, 2015 at 09:32:05AM +, Tom Atkinson wrote:
> > > I had the same thing happen. Here is my peers config and 'show sess
> > > all' output for the peer sessions.
> > >
> > > In this case, www1 had a new stick table entry which was not
> > > propagated.
> >
> > What version are you running ? Could you please test 1.7-dev1 which is
> > supposed to fix this issue ? The fix will be in 1.6.3 probably next
> > week-end.
> >
> > Thanks,
> > Willy
>
> Still happens with 1.6.3

Then there's something else. I guess you'll have to send another
"show sess all" next time it fails :-/

Thanks,
Willy
Re: stick table replication problem
On Thu, 2015-12-24 at 19:21 +0100, Willy Tarreau wrote:
> Hi Tom,
>
> On Thu, Dec 24, 2015 at 09:32:05AM +, Tom Atkinson wrote:
> > I had the same thing happen. Here is my peers config and 'show sess
> > all' output for the peer sessions.
> >
> > In this case, www1 had a new stick table entry which was not
> > propagated.
>
> What version are you running ? Could you please test 1.7-dev1 which is
> supposed to fix this issue ? The fix will be in 1.6.3 probably next
> week-end.
>
> Thanks,
> Willy

Still happens with 1.6.3
Re: stick table replication problem
I had the same thing happen. Here is my peers config and 'show sess all'
output for the peer sessions.

In this case, www1 had a new stick table entry which was not propagated.

peers mypeers
    peer www1.example.com 192.168.140.3:1024
    peer www2.example.com 192.168.132.3:1024
    peer www3.example.com 192.168.149.3:1024

www1.example.com:
0x55c97d047120: [08/Dec/2015:17:12:51.249700] id=0 proto=tcpv4
  flags=0x23006, conn_retries=0, srv_conn=(nil), pend_pos=(nil)
  frontend=www1.example.com (id=4294967295 mode=tcp), listener=? (id=0)
  backend= (id=-1 mode=-) addr=192.168.140.3:43847
  server= (id=-1) addr=192.168.149.3:1024
  task=0x55c97d0470a0 (state=0x08 nice=0 calls=2 exp= age=15d16h)
  si[0]=0x55c97d047318 (state=CLO flags=0x4040 endp0=APPCTX:0x55c97d047420 exp=, et=0x000)
  si[1]=0x55c97d047338 (state=CON flags=0x50 endp1=CONN:0x55c97d04beb0 exp=, et=0x020)
  app0=0x55c97d047420 st0=11 st1=0 st2=0 applet=
  co1=0x55c97d04beb0 ctrl=tcpv4 xprt=RAW data=STRM target=PROXY:0x55c97cf83100
      flags=0x0020b310 fd=3 fd_spec_e=22 fd_spec_p=0 updt=0
  req=0x55c97d047130 (f=0x80a020 an=0x0 pipe=0 tofwd=0 total=0)
      an_exp= rex= wex=
      buf=0x55c97b7964e0 data=0x55c97b7964f4 o=0 p=0 req.next=0 i=0 size=0
  res=0x55c97d047170 (f=0x80402020 an=0x0 pipe=0 tofwd=0 total=0)
      an_exp= rex= wex=
      buf=0x55c97b7964e0 data=0x55c97b7964f4 o=0 p=0 rsp.next=0 i=0 size=0
0x55c97d05bb40: [10/Dec/2015:22:44:23.248632] id=0 proto=tcpv4
  flags=0x23006, conn_retries=0, srv_conn=(nil), pend_pos=(nil)
  frontend=www1.example.com (id=4294967295 mode=tcp), listener=? (id=0)
  backend= (id=-1 mode=-) addr=192.168.140.3:58534
  server= (id=-1) addr=192.168.149.3:1024
  task=0x55c97d05bac0 (state=0x08 nice=0 calls=2 exp= age=13d10h)
  si[0]=0x55c97d05bd38 (state=CLO flags=0x4040 endp0=APPCTX:0x55c97d064a50 exp=, et=0x000)
  si[1]=0x55c97d05bd58 (state=CON flags=0x50 endp1=CONN:0x55c97d04d370 exp=, et=0x020)
  app0=0x55c97d064a50 st0=11 st1=0 st2=0 applet=
  co1=0x55c97d04d370 ctrl=tcpv4 xprt=RAW data=STRM target=PROXY:0x55c97cf83100
      flags=0x0020b310 fd=13 fd_spec_e=22 fd_spec_p=0 updt=0
  req=0x55c97d05bb50 (f=0x80a020 an=0x0 pipe=0 tofwd=0 total=0)
      an_exp= rex= wex=
      buf=0x55c97b7964e0 data=0x55c97b7964f4 o=0 p=0 req.next=0 i=0 size=0
  res=0x55c97d05bb90 (f=0x80402020 an=0x0 pipe=0 tofwd=0 total=0)
      an_exp= rex= wex=
      buf=0x55c97b7964e0 data=0x55c97b7964f4 o=0 p=0 rsp.next=0 i=0 size=0
0x55c97d04e1c0: [11/Dec/2015:18:52:25.323298] id=0 proto=tcpv4
  flags=0x23006, conn_retries=0, srv_conn=(nil), pend_pos=(nil)
  frontend=www1.example.com (id=4294967295 mode=tcp), listener=? (id=0)
  backend= (id=-1 mode=-) addr=192.168.140.3:38958
  server= (id=-1) addr=192.168.132.3:1024
  task=0x55c97d04e140 (state=0x08 nice=0 calls=2 exp= age=12d14h)
  si[0]=0x55c97d04e3b8 (state=CLO flags=0x4040 endp0=APPCTX:0x55c97d04fdc0 exp=, et=0x000)
  si[1]=0x55c97d04e3d8 (state=CON flags=0x50 endp1=CONN:0x55c97d064e60 exp=, et=0x020)
  app0=0x55c97d04fdc0 st0=11 st1=0 st2=0 applet=
  co1=0x55c97d064e60 ctrl=tcpv4 xprt=RAW data=STRM target=PROXY:0x55c97cf83100
      flags=0x0020b310 fd=23 fd_spec_e=22 fd_spec_p=0 updt=0
  req=0x55c97d04e1d0 (f=0x80a020 an=0x0 pipe=0 tofwd=0 total=0)
      an_exp= rex= wex=
      buf=0x55c97b7964e0 data=0x55c97b7964f4 o=0 p=0 req.next=0 i=0 size=0
  res=0x55c97d04e210 (f=0x80402020 an=0x0 pipe=0 tofwd=0 total=0)
      an_exp= rex= wex=
      buf=0x55c97b7964e0 data=0x55c97b7964f4 o=0 p=0 rsp.next=0 i=0 size=0
0x55c97d0a45b0: [16/Dec/2015:19:52:57.872505] id=0 proto=tcpv4
  flags=0x23006, conn_retries=0, srv_conn=(nil), pend_pos=(nil)
  frontend=www1.example.com (id=4294967295 mode=tcp), listener=? (id=0)
  backend= (id=-1 mode=-) addr=192.168.140.3:38595
  server= (id=-1) addr=192.168.132.3:1024
  task=0x55c97d0a4530 (state=0x08 nice=0 calls=2 exp= age=7d13h)
  si[0]=0x55c97d0a47a8 (state=CLO flags=0x4040 endp0=APPCTX:0x55c97d054800 exp=, et=0x000)
  si[1]=0x55c97d0a47c8 (state=CON flags=0x50 endp1=CONN:0x55c97d0bd640 exp=, et=0x020)
  app0=0x55c97d054800 st0=11 st1=0 st2=0 applet=
  co1=0x55c97d0bd640 ctrl=tcpv4 xprt=RAW data=STRM target=PROXY:0x55c97cf83100
      flags=0x0020b310 fd=12 fd_spec_e=22 fd_spec_p=0 updt=0
  req=0x55c97d0a45c0 (f=0x80a020 an=0x0 pipe=0 tofwd=0 total=0)
      an_exp= rex= wex=
      buf=0x55c97b7964e0 data=0x55c97b7964f4 o=0 p=0 req.next=0 i=0 size=0
  res=0x55c97d0a4600 (f=0x80402020 an=0x0 pipe=0 tofwd=0 total=0)
      an_exp= rex= wex=
      buf=0x55c97b7964e0 data=0x55c97b7964f4 o=0 p=0 rsp.next=0 i=0 size=0
0x55c97d04d760: [20/Dec/2015:11:59:48.585182] id=0 proto=tcpv4
  flags=0x23006, conn_retries=0, srv_conn=(nil), pend_pos=(nil)
  frontend=www1.example.com
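One detail worth noting in the dump above: every peer session shows its outgoing side si[1] in state=CON, i.e. a connection attempt that never completed. When digging through a saved "show sess all" dump, a small script can pick those sessions out. Below is a minimal sketch under the assumption that the dump keeps the "0x...: [date] ... si[1]=... (state=XXX ...)" layout shown above; the function name and regex are mine, not part of HAProxy:

```python
# Sketch: find sessions whose outgoing stream interface si[1] is still in
# the CON (connecting) state in a "show sess all" dump. Such sessions are
# connection attempts that never reached EST. Illustrative helper only.
import re

def stuck_connect_sessions(dump):
    """Return session pointers whose si[1] never left the CON state."""
    stuck = []
    # A session block starts with "0x<ptr>: [<date>]"; inside it we look
    # for the first "si[1]=... (state=XXX" field.
    pattern = r"(0x[0-9a-f]+): \[[^]]+\].*?si\[1\]=\S+ \(state=(\w+)"
    for m in re.finditer(pattern, dump, re.S):
        if m.group(2) == "CON":
            stuck.append(m.group(1))
    return stuck
```

Run against the dump above, this would flag all four complete sessions, since each one shows si[1] in state=CON.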
stick table replication problem
Hi,

We just had a strange replication problem on our staging environment.
We have 2 HAproxy servers. They had been running for 2 weeks.
At the beginning, I checked that the stick tables were properly synced.

Today, the stick tables were not synced. For example:

root@proxy1>: echo "show table front_jsessionid" | socat stdio /usr/share/haproxy/mysocket
# table: front_jsessionid, type: string, size:10485760, used:3
0xd1d944: key=MkJm-rbcE3V5EJg1RmTRAw__ use=0 exp=3474723 server_id=1
0xd762c4: key=Z2rxYufqv7B0C2gIgztSfA__ use=0 exp=3571578 server_id=2
0xea9484: key=aUuiOrlUDb7RnvcFMLK9oA__ use=0 exp=2879968 server_id=2

root@proxy2>: echo "show table front_jsessionid" | socat stdio /usr/share/haproxy/mysocket
# table: front_jsessionid, type: string, size:10485760, used:5
0x1bc9104: key=D8EWzwrGr2UK3btCwfpweQ__ use=0 exp=3552238 server_id=1
0x1c3ba54: key=Z2rxYufqv7B0C2gIgztSfA__ use=0 exp=3315450 server_id=1
0x1cbecb4: key=aUuiOrlUDb7RnvcFMLK9oA__ use=0 exp=2624851 server_id=2
0x1c49664: key=lmqFlaNhdHXYwH4M4QQjqQ__ use=0 exp=3510898 server_id=2
0x1cbf074: key=zpE6NHQJr~aStxzmJTVEgA__ use=0 exp=2587567 server_id=2

--

I tried:

echo "set table front_jsessionid key toto data.server_id 1" | socat stdio /usr/share/haproxy/mysocket

The new entry was not replicated on proxy2, and tcpdump showed no
outgoing traffic from proxy1 to proxy2.

Then, I reloaded the haproxy service on proxy2, and the tables were
synced, including old data set on proxy1 before the reload. New data
replicates all right too.

Should this situation happen again, is there anything I can do to debug
this further?
--

Here is netstat output from proxy1 while replication was broken:

root@proxy1>: netstat -anp | grep 9421
tcp        0      0 [proxy1]:9421   0.0.0.0:*        LISTEN      1163/haproxy
tcp       35      0 [proxy1]:9421   [proxy2]:41205   CLOSE_WAIT  -
tcp        1      0 [proxy1]:20512  [proxy2]:9421    CLOSE_WAIT  1163/haproxy
tcp        1      0 [proxy1]:35662  [proxy2]:9421    CLOSE_WAIT  1163/haproxy
tcp       35      0 [proxy1]:9421   [proxy2]:41257   CLOSE_WAIT  -
tcp        0      0 [proxy1]:54443  [proxy2]:9421    FIN_WAIT2   1163/haproxy
tcp        1      0 [proxy1]:52254  [proxy2]:9421    CLOSE_WAIT  1163/haproxy
tcp        1      0 [proxy1]:54249  [proxy2]:9421    CLOSE_WAIT  1163/haproxy
tcp       35      0 [proxy1]:9421   [proxy2]:41160   CLOSE_WAIT  -
tcp       35      0 [proxy1]:9421   [proxy2]:41110   CLOSE_WAIT  -
tcp        1      0 [proxy1]:40469  [proxy2]:9421    CLOSE_WAIT  1163/haproxy

--

Here are the relevant parts of our setup:

peers prod
    peer proxy1 [proxy1]:9421
    peer proxy2 [proxy2]:9421

backend front
    stick on urlp(jsessionid),url_dec table front_jsessionid
    stick on urlp(jsessionid,;),url_dec table front_jsessionid
    stick on cookie(JSESSIONID) table front_jsessionid
    stick store-response cookie(JSESSIONID) table front_jsessionid

backend front_jsessionid
    stick-table type string len 24 size 10m expire 1h peers prod

--

We are running HAproxy 1.6.2 from the vbernat PPA on Ubuntu 12.04, with
nbproc = 1.

Thanks in advance.
Sylvain
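Divergence like the one shown in the two "show table" dumps above can be spotted automatically by diffing the key sets of the two peers. A minimal sketch, assuming the dumps were captured with the same socat commands shown above (the helper names are mine, not part of HAProxy):

```python
# Sketch: compare two "show table" dumps and report keys present on one
# peer but missing on the other. Entry lines look like:
#   0xd1d944: key=MkJm-rbcE3V5EJg1RmTRAw__ use=0 exp=3474723 server_id=1

def table_keys(dump):
    """Extract the stick-table keys from a 'show table' dump."""
    keys = set()
    for line in dump.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):  # skip the header line
            continue
        for field in line.split():
            if field.startswith("key="):
                keys.add(field[len("key="):])
    return keys

def diff_tables(dump1, dump2):
    """Return (keys only in dump1, keys only in dump2)."""
    k1, k2 = table_keys(dump1), table_keys(dump2)
    return k1 - k2, k2 - k1
```

Fed the two dumps from the mail, this would report MkJm-rbcE3V5EJg1RmTRAw__ as present only on proxy1, and D8EWzwrGr2UK3btCwfpweQ__, lmqFlaNhdHXYwH4M4QQjqQ__ and zpE6NHQJr~aStxzmJTVEgA__ as present only on proxy2. Note that some skew is normal, since entries expire independently on each peer; a key that stays one-sided for minutes is the suspicious case.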
Re: stick table replication problem
Hi Sylvain,

On Thu, Dec 03, 2015 at 12:05:02PM +0100, Sylvain Faivre wrote:
> Hi,
>
> We just had a strange replication problem on our staging environment.
> We have 2 HAproxy servers. They were running for 2 weeks now.
> At the beginning, I checked that the stick tables were properly synced.
>
> Today, stick tables were not synced, for example :
>
> root@proxy1>: echo "show table front_jsessionid" | socat stdio
> /usr/share/haproxy/mysocket
> # table: front_jsessionid, type: string, size:10485760, used:3
> 0xd1d944: key=MkJm-rbcE3V5EJg1RmTRAw__ use=0 exp=3474723 server_id=1
> 0xd762c4: key=Z2rxYufqv7B0C2gIgztSfA__ use=0 exp=3571578 server_id=2
> 0xea9484: key=aUuiOrlUDb7RnvcFMLK9oA__ use=0 exp=2879968 server_id=2
>
> root@proxy2>: echo "show table front_jsessionid" | socat stdio
> /usr/share/haproxy/mysocket
> # table: front_jsessionid, type: string, size:10485760, used:5
> 0x1bc9104: key=D8EWzwrGr2UK3btCwfpweQ__ use=0 exp=3552238 server_id=1
> 0x1c3ba54: key=Z2rxYufqv7B0C2gIgztSfA__ use=0 exp=3315450 server_id=1
> 0x1cbecb4: key=aUuiOrlUDb7RnvcFMLK9oA__ use=0 exp=2624851 server_id=2
> 0x1c49664: key=lmqFlaNhdHXYwH4M4QQjqQ__ use=0 exp=3510898 server_id=2
> 0x1cbf074: key=zpE6NHQJr~aStxzmJTVEgA__ use=0 exp=2587567 server_id=2

This is indeed not expected at all!

> I tried :
> echo "set table front_jsessionid key toto data.server_id 1" | socat
> stdio /usr/share/haproxy/mysocket
>
> The new entry was not replicated on proxy2, and tcpdump showed no
> outgoing traffic from proxy1 to proxy2.

So that means that proxy1 was in a bad state.

> Then, I reloaded the haproxy service on proxy2, and tables were synced,
> including old data set in proxy1 before the reload.

Interesting. That means that not everything is broken: something makes
the communication stop without breaking the connection, but once the
connection is renewed, everything works again.

> New data replicates all right too.
>
> Should this situation happen again, is there anything I can do to debug
> this further ?

There's not much, unfortunately, because I have not implemented the
"show peers" action that was on my todo list. The purpose would be to
dump a number of relevant pieces of information to debug the protocol,
diagnose connection issues, etc.

> --
>
> Here is netstat output from proxy1 while replication was broken :
>
> root@proxy1>: netstat -anp | grep 9421
> tcp        0      0 [proxy1]:9421   0.0.0.0:*        LISTEN      1163/haproxy
> tcp       35      0 [proxy1]:9421   [proxy2]:41205   CLOSE_WAIT  -
> tcp        1      0 [proxy1]:20512  [proxy2]:9421    CLOSE_WAIT  1163/haproxy
(...)

This proves that a disconnection was already received but ignored by
haproxy. It's possible that the other side detected a timeout and tried
to reconnect without success.

> Here are relevant parts from our setup :
>
> peers prod
>     peer proxy1 [proxy1]:9421
>     peer proxy2 [proxy2]:9421
>
> backend front
>     stick on urlp(jsessionid),url_dec table front_jsessionid
>     stick on urlp(jsessionid,;),url_dec table front_jsessionid
>     stick on cookie(JSESSIONID) table front_jsessionid
>     stick store-response cookie(JSESSIONID) table front_jsessionid
>
> backend front_jsessionid
>     stick-table type string len 24 size 10m expire 1h peers prod

OK, it looks pretty fine.

> We are running HAproxy 1.6.2 from the vbernat PPA on Ubuntu 12.04, with
> nbproc = 1.

Thanks for this precision. All I can say for now is that you clearly
encountered a bug, but we don't know yet what this bug is. We'll have
to check the code for something which could cause this. It would be
interesting to know how long the process had been running before the
issue appeared, even though that will not tell us how long the
connection remained alive.

Thanks,
Willy
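The signature Willy points at above is machine-checkable: peer-port sockets sitting in CLOSE_WAIT, some with bytes stuck unread in Recv-Q. A minimal monitoring sketch over `netstat -anp` output (helper name is mine; it assumes the standard netstat column layout):

```python
# Sketch: flag peer sockets stuck in CLOSE_WAIT, with their Recv-Q, from
# "netstat -an"-style output. A non-zero Recv-Q on a CLOSE_WAIT socket
# means the remote side closed but haproxy never read/reacted to it.

def stuck_peer_sockets(netstat_output, peer_port):
    """Return CLOSE_WAIT sockets involving peer_port as dicts."""
    stuck = []
    for line in netstat_output.splitlines():
        parts = line.split()
        # columns: Proto Recv-Q Send-Q Local-Address Foreign-Address State [PID/Program]
        if len(parts) < 6 or parts[0] != "tcp":
            continue
        proto, recvq, sendq, local, foreign, state = parts[:6]
        if state != "CLOSE_WAIT":
            continue
        if local.endswith(":%d" % peer_port) or foreign.endswith(":%d" % peer_port):
            stuck.append({"local": local, "foreign": foreign, "recvq": int(recvq)})
    return stuck
```

Hooked up to a cron job, a non-empty result for the peers port could raise an alert before application errors start, and would be the right moment to capture the "show sess" dumps discussed further down the thread.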
Re: stick table replication problem
On Thu, Dec 03, 2015 at 03:56:45PM +0100, Sylvain Faivre wrote:
> According to our logs, both HAproxy processes were started at Nov 24
> 11:25:xx, and application errors caused by lack of session replication
> started happening at Dec 1 17:05:35.
> So that's a bit more than 1 week later.
>
> We'll keep looking for replication errors in the next days, especially
> in 1 week, and report back.

OK, thank you. I hope it will not affect your activity too much :-/
That just makes me think that having an option to shut down a peers
session on the CLI would help.

Hmmm, in fact there is already a way to observe the peers session,
since peers now use regular sessions. If you issue "show sess" on the
CLI, you'll see some sessions whose "fe=" field carries the local peer
name (yes, I know the name is poorly chosen and may cause confusion):

> show sess
0x7ce468: proto=unix_stream src=unix:1 fe=GLOBAL be= srv= ts=02 age=2m11s calls=1 rq[f=c08202h,i=0,an=00h,rx=,wx=,ax=] rp[f=80048202h,i=0,an=00h,rx=,wx=,ax=] s0=[7,ch,fd=9,ex=] s1=[7,4018h,fd=-1,ex=] exp=23h57m
0x7cdd98: proto=tcpv4 src=127.0.0.1:38831 fe=myhost1 be= srv= ts=0a age=0s calls=1 rq[f=c08200h,i=0,an=00h,rx=4s,wx=,ax=] rp[f=80048202h,i=0,an=00h,rx=,wx=,ax=] s0=[7,48h,fd=8,ex=] s1=[7,4058h,fd=-1,ex=] exp=4s

Here it's the second line (my local peer is called "myhost1"). The
first line is my CLI session. Then "show sess" followed by the ID (the
pointer at the beginning of the line) will show more details:

> show sess 0x7cdd98
0x7cdd98: [03/Dec/2015:16:01:58.200647] id=81 proto=tcpv4 source=127.0.0.1:38831
  flags=0x80, conn_retries=3, srv_conn=(nil), pend_pos=(nil)
  frontend=myhost1 (id=4294967295 mode=tcp), listener=? (id=0) addr=127.0.0.1:8521
  backend= (id=-1 mode=-)
  server= (id=-1)
  task=0x7cab40 (state=0x0a nice=0 calls=1 exp=0s age=4s)
  si[0]=0x7cdf90 (state=EST flags=0x48 endp0=CONN:0x7ca940 exp=, et=0x000)
  si[1]=0x7cdfb0 (state=EST flags=0x4058 endp1=APPCTX:0x7ce100 exp=, et=0x000)
  co0=0x7ca940 ctrl=tcpv4 xprt=RAW data=STRM target=LISTENER:0x7cb688
      flags=0x0020b306 fd=8 fd.state=25 fd.cache=0 updt=0
  app1=0x7ce100 st0=7 st1=0 st2=0 applet=
  req=0x7cdda8 (f=0xc08200 an=0x0 pipe=0 tofwd=-1 total=37)
      an_exp= rex=0s wex=
      buf=0x77ba20 data=0x77ba34 o=0 p=0 req.next=0 i=0 size=0
  res=0x7cdde8 (f=0x80048202 an=0x0 pipe=0 tofwd=-1 total=4)
      an_exp= rex= wex=
      buf=0x77ba20 data=0x77ba34 o=0 p=0 rsp.next=0 i=0 size=0

You can even shut it down:

> shutdown session 0x7cdd98

Here I have no traffic, so the session expires every 5 or 10 seconds and
is renewed. In your case that should obviously not happen. So such a
dump on both nodes before reloading could help significantly if you face
the issue again. If it's too complicated to pick only this session, you
can issue "show sess all", send the huge output to a file, and then dig
for the session pointer in that file. It might be easier in an emergency
case.

Best regards,
Willy