Re: stick table replication problem

2016-01-09 Thread Willy Tarreau
On Fri, Jan 08, 2016 at 02:44:44PM +0100, Tom Atkinson wrote:
> On Thu, 2015-12-24 at 19:21 +0100, Willy Tarreau wrote:
> > Hi Tom,
> > 
> > On Thu, Dec 24, 2015 at 09:32:05AM +0000, Tom Atkinson wrote:
> > > I had the same thing happen. Here is my peers config and 'show sess
> > > all' output for the peer sessions.
> > > 
> > > In this case, www1 had a new stick table entry which was not
> > > propagated.
> > 
> > What version are you running? Could you please test 1.7-dev1, which
> > is supposed to fix this issue? The fix will probably be in 1.6.3 next
> > weekend.
> > 
> > Thanks,
> > Willy
> > 
> 
> Still happens with 1.6.3

Then there's something else; I guess you'll have to send another
"show sess all" the next time it fails :-/

Thanks,
Willy




Re: stick table replication problem

2016-01-08 Thread Tom Atkinson
On Thu, 2015-12-24 at 19:21 +0100, Willy Tarreau wrote:
> Hi Tom,
> 
> On Thu, Dec 24, 2015 at 09:32:05AM +0000, Tom Atkinson wrote:
> > I had the same thing happen. Here is my peers config and 'show sess
> > all' output for the peer sessions.
> > 
> > In this case, www1 had a new stick table entry which was not
> > propagated.
> 
> What version are you running? Could you please test 1.7-dev1, which
> is supposed to fix this issue? The fix will probably be in 1.6.3 next
> weekend.
> 
> Thanks,
> Willy
> 

Still happens with 1.6.3



Re: stick table replication problem

2015-12-24 Thread Tom Atkinson
I had the same thing happen. Here is my peers config and 'show sess
all' output for the peer sessions.

In this case, www1 had a new stick table entry which was not
propagated.

peers mypeers
peer www1.example.com 192.168.140.3:1024
peer www2.example.com 192.168.132.3:1024
peer www3.example.com 192.168.149.3:1024
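
(A side note for anyone reproducing this: a quick way to confirm that an
entry is really missing on a peer, rather than merely expired, is to
compare the key sets across nodes. A minimal sketch, assuming a
hypothetical table name "mytable" and stats socket path, to be adjusted
to the actual setup:

    # dump only the keys of the table on this node, sorted for comparison
    echo "show table mytable" | socat stdio /var/run/haproxy.sock \
        | grep -o 'key=[^ ]*' | sort > /tmp/keys.$(hostname -s)
    # collect /tmp/keys.* from each peer, then e.g.:
    # diff /tmp/keys.www1 /tmp/keys.www2

Entries present on one node but absent on another, with expiry still far
in the future, point at lost replication rather than expiration.)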

www1.example.com:
0x55c97d047120: [08/Dec/2015:17:12:51.249700] id=0 proto=tcpv4
  flags=0x23006, conn_retries=0, srv_conn=(nil), pend_pos=(nil)
  frontend=www1.example.com (id=4294967295 mode=tcp), listener=? (id=0)
  backend= (id=-1 mode=-) addr=192.168.140.3:43847
  server= (id=-1) addr=192.168.149.3:1024
  task=0x55c97d0470a0 (state=0x08 nice=0 calls=2 exp= age=15d16h)
  si[0]=0x55c97d047318 (state=CLO flags=0x4040 endp0=APPCTX:0x55c97d047420 exp=, et=0x000)
  si[1]=0x55c97d047338 (state=CON flags=0x50 endp1=CONN:0x55c97d04beb0 exp=, et=0x020)
  app0=0x55c97d047420 st0=11 st1=0 st2=0 applet=
  co1=0x55c97d04beb0 ctrl=tcpv4 xprt=RAW data=STRM target=PROXY:0x55c97cf83100
  flags=0x0020b310 fd=3 fd_spec_e=22 fd_spec_p=0 updt=0
  req=0x55c97d047130 (f=0x80a020 an=0x0 pipe=0 tofwd=0 total=0)
  an_exp= rex= wex=
  buf=0x55c97b7964e0 data=0x55c97b7964f4 o=0 p=0 req.next=0 i=0 size=0
  res=0x55c97d047170 (f=0x80402020 an=0x0 pipe=0 tofwd=0 total=0)
  an_exp= rex= wex=
  buf=0x55c97b7964e0 data=0x55c97b7964f4 o=0 p=0 rsp.next=0 i=0 size=0
0x55c97d05bb40: [10/Dec/2015:22:44:23.248632] id=0 proto=tcpv4
  flags=0x23006, conn_retries=0, srv_conn=(nil), pend_pos=(nil)
  frontend=www1.example.com (id=4294967295 mode=tcp), listener=? (id=0)
  backend= (id=-1 mode=-) addr=192.168.140.3:58534
  server= (id=-1) addr=192.168.149.3:1024
  task=0x55c97d05bac0 (state=0x08 nice=0 calls=2 exp= age=13d10h)
  si[0]=0x55c97d05bd38 (state=CLO flags=0x4040 endp0=APPCTX:0x55c97d064a50 exp=, et=0x000)
  si[1]=0x55c97d05bd58 (state=CON flags=0x50 endp1=CONN:0x55c97d04d370 exp=, et=0x020)
  app0=0x55c97d064a50 st0=11 st1=0 st2=0 applet=
  co1=0x55c97d04d370 ctrl=tcpv4 xprt=RAW data=STRM target=PROXY:0x55c97cf83100
  flags=0x0020b310 fd=13 fd_spec_e=22 fd_spec_p=0 updt=0
  req=0x55c97d05bb50 (f=0x80a020 an=0x0 pipe=0 tofwd=0 total=0)
  an_exp= rex= wex=
  buf=0x55c97b7964e0 data=0x55c97b7964f4 o=0 p=0 req.next=0 i=0 size=0
  res=0x55c97d05bb90 (f=0x80402020 an=0x0 pipe=0 tofwd=0 total=0)
  an_exp= rex= wex=
  buf=0x55c97b7964e0 data=0x55c97b7964f4 o=0 p=0 rsp.next=0 i=0 size=0
0x55c97d04e1c0: [11/Dec/2015:18:52:25.323298] id=0 proto=tcpv4
  flags=0x23006, conn_retries=0, srv_conn=(nil), pend_pos=(nil)
  frontend=www1.example.com (id=4294967295 mode=tcp), listener=? (id=0)
  backend= (id=-1 mode=-) addr=192.168.140.3:38958
  server= (id=-1) addr=192.168.132.3:1024
  task=0x55c97d04e140 (state=0x08 nice=0 calls=2 exp= age=12d14h)
  si[0]=0x55c97d04e3b8 (state=CLO flags=0x4040 endp0=APPCTX:0x55c97d04fdc0 exp=, et=0x000)
  si[1]=0x55c97d04e3d8 (state=CON flags=0x50 endp1=CONN:0x55c97d064e60 exp=, et=0x020)
  app0=0x55c97d04fdc0 st0=11 st1=0 st2=0 applet=
  co1=0x55c97d064e60 ctrl=tcpv4 xprt=RAW data=STRM target=PROXY:0x55c97cf83100
  flags=0x0020b310 fd=23 fd_spec_e=22 fd_spec_p=0 updt=0
  req=0x55c97d04e1d0 (f=0x80a020 an=0x0 pipe=0 tofwd=0 total=0)
  an_exp= rex= wex=
  buf=0x55c97b7964e0 data=0x55c97b7964f4 o=0 p=0 req.next=0 i=0 size=0
  res=0x55c97d04e210 (f=0x80402020 an=0x0 pipe=0 tofwd=0 total=0)
  an_exp= rex= wex=
  buf=0x55c97b7964e0 data=0x55c97b7964f4 o=0 p=0 rsp.next=0 i=0 size=0
0x55c97d0a45b0: [16/Dec/2015:19:52:57.872505] id=0 proto=tcpv4
  flags=0x23006, conn_retries=0, srv_conn=(nil), pend_pos=(nil)
  frontend=www1.example.com (id=4294967295 mode=tcp), listener=? (id=0)
  backend= (id=-1 mode=-) addr=192.168.140.3:38595
  server= (id=-1) addr=192.168.132.3:1024
  task=0x55c97d0a4530 (state=0x08 nice=0 calls=2 exp= age=7d13h)
  si[0]=0x55c97d0a47a8 (state=CLO flags=0x4040 endp0=APPCTX:0x55c97d054800 exp=, et=0x000)
  si[1]=0x55c97d0a47c8 (state=CON flags=0x50 endp1=CONN:0x55c97d0bd640 exp=, et=0x020)
  app0=0x55c97d054800 st0=11 st1=0 st2=0 applet=
  co1=0x55c97d0bd640 ctrl=tcpv4 xprt=RAW data=STRM target=PROXY:0x55c97cf83100
  flags=0x0020b310 fd=12 fd_spec_e=22 fd_spec_p=0 updt=0
  req=0x55c97d0a45c0 (f=0x80a020 an=0x0 pipe=0 tofwd=0 total=0)
  an_exp= rex= wex=
  buf=0x55c97b7964e0 data=0x55c97b7964f4 o=0 p=0 req.next=0 i=0 size=0
  res=0x55c97d0a4600 (f=0x80402020 an=0x0 pipe=0 tofwd=0 total=0)
  an_exp= rex= wex=
  buf=0x55c97b7964e0 data=0x55c97b7964f4 o=0 p=0 rsp.next=0 i=0 size=0
0x55c97d04d760: [20/Dec/2015:11:59:48.585182] id=0 proto=tcpv4
  flags=0x23006, conn_retries=0, srv_conn=(nil), pend_pos=(nil)
  frontend=www1.example.com 

stick table replication problem

2015-12-03 Thread Sylvain Faivre

Hi,

We just had a strange replication problem in our staging environment.
We have 2 HAproxy servers; they had been running for 2 weeks.
At the beginning, I checked that the stick tables were properly synced.

Today, stick tables were not synced. For example:

root@proxy1>: echo "show table front_jsessionid" | socat stdio /usr/share/haproxy/mysocket

# table: front_jsessionid, type: string, size:10485760, used:3
0xd1d944: key=MkJm-rbcE3V5EJg1RmTRAw__ use=0 exp=3474723 server_id=1
0xd762c4: key=Z2rxYufqv7B0C2gIgztSfA__ use=0 exp=3571578 server_id=2
0xea9484: key=aUuiOrlUDb7RnvcFMLK9oA__ use=0 exp=2879968 server_id=2

root@proxy2>: echo "show table front_jsessionid" | socat stdio /usr/share/haproxy/mysocket

# table: front_jsessionid, type: string, size:10485760, used:5
0x1bc9104: key=D8EWzwrGr2UK3btCwfpweQ__ use=0 exp=3552238 server_id=1
0x1c3ba54: key=Z2rxYufqv7B0C2gIgztSfA__ use=0 exp=3315450 server_id=1
0x1cbecb4: key=aUuiOrlUDb7RnvcFMLK9oA__ use=0 exp=2624851 server_id=2
0x1c49664: key=lmqFlaNhdHXYwH4M4QQjqQ__ use=0 exp=3510898 server_id=2
0x1cbf074: key=zpE6NHQJr~aStxzmJTVEgA__ use=0 exp=2587567 server_id=2

--

I tried:
echo "set table front_jsessionid key toto data.server_id 1" | socat stdio /usr/share/haproxy/mysocket


The new entry was not replicated on proxy2, and tcpdump showed no 
outgoing traffic from proxy1 to proxy2.
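
Something along these lines should show whether any peer traffic leaves
proxy1 at all (a sketch; the interface name is a placeholder and 9421 is
our peers port, see the config below):

    # capture peer-protocol traffic from proxy1 towards proxy2 on port 9421
    tcpdump -ni eth0 'tcp port 9421 and host proxy2'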


Then, I reloaded the haproxy service on proxy2, and tables were synced, 
including old data set in proxy1 before the reload.

New data replicates all right too.

Should this situation happen again, is there anything I can do to debug
this further?


--

Here is netstat output from proxy1 while replication was broken:

root@proxy1>: netstat -anp | grep 9421
tcp    0   0  [proxy1]:9421   0.0.0.0:*        LISTEN      1163/haproxy
tcp   35   0  [proxy1]:9421   [proxy2]:41205   CLOSE_WAIT  -
tcp    1   0  [proxy1]:20512  [proxy2]:9421    CLOSE_WAIT  1163/haproxy
tcp    1   0  [proxy1]:35662  [proxy2]:9421    CLOSE_WAIT  1163/haproxy
tcp   35   0  [proxy1]:9421   [proxy2]:41257   CLOSE_WAIT  -
tcp    0   0  [proxy1]:54443  [proxy2]:9421    FIN_WAIT2   1163/haproxy
tcp    1   0  [proxy1]:52254  [proxy2]:9421    CLOSE_WAIT  1163/haproxy
tcp    1   0  [proxy1]:54249  [proxy2]:9421    CLOSE_WAIT  1163/haproxy
tcp   35   0  [proxy1]:9421   [proxy2]:41160   CLOSE_WAIT  -
tcp   35   0  [proxy1]:9421   [proxy2]:41110   CLOSE_WAIT  -
tcp    1   0  [proxy1]:40469  [proxy2]:9421    CLOSE_WAIT  1163/haproxy


--

Here are relevant parts from our setup:

peers prod
peer proxy1 [proxy1]:9421
peer proxy2 [proxy2]:9421

backend front
stick on urlp(jsessionid),url_dec table front_jsessionid
stick on urlp(jsessionid,;),url_dec table front_jsessionid
stick on cookie(JSESSIONID) table front_jsessionid
stick store-response cookie(JSESSIONID) table front_jsessionid

backend front_jsessionid
stick-table type string len 24 size 10m expire 1h peers prod
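
One general point worth double-checking with any peers setup (not
claiming it is the cause here): each process picks its own entry in the
peers section by its local peer name, which defaults to the hostname and
can be overridden on the command line; a mismatch silently disables
synchronization. A sketch of the idea:

    # the same peers section is deployed on both nodes; each process finds
    # "itself" by local peer name: the hostname, or "haproxy ... -L <name>"
    peers prod
        peer proxy1 [proxy1]:9421   # local on proxy1, remote for proxy2
        peer proxy2 [proxy2]:9421   # local on proxy2, remote for proxy1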

--

We are running HAproxy 1.6.2 from the vbernat PPA on Ubuntu 12.04, with 
nbproc = 1.


Thanks in advance.

Sylvain



Re: stick table replication problem

2015-12-03 Thread Willy Tarreau
Hi Sylvain,

On Thu, Dec 03, 2015 at 12:05:02PM +0100, Sylvain Faivre wrote:
> Hi,
> 
> We just had a strange replication problem in our staging environment.
> We have 2 HAproxy servers; they had been running for 2 weeks.
> At the beginning, I checked that the stick tables were properly synced.
> 
> Today, stick tables were not synced. For example:
> 
> root@proxy1>: echo "show table front_jsessionid" | socat stdio /usr/share/haproxy/mysocket
> # table: front_jsessionid, type: string, size:10485760, used:3
> 0xd1d944: key=MkJm-rbcE3V5EJg1RmTRAw__ use=0 exp=3474723 server_id=1
> 0xd762c4: key=Z2rxYufqv7B0C2gIgztSfA__ use=0 exp=3571578 server_id=2
> 0xea9484: key=aUuiOrlUDb7RnvcFMLK9oA__ use=0 exp=2879968 server_id=2
> 
> root@proxy2>: echo "show table front_jsessionid" | socat stdio /usr/share/haproxy/mysocket
> # table: front_jsessionid, type: string, size:10485760, used:5
> 0x1bc9104: key=D8EWzwrGr2UK3btCwfpweQ__ use=0 exp=3552238 server_id=1
> 0x1c3ba54: key=Z2rxYufqv7B0C2gIgztSfA__ use=0 exp=3315450 server_id=1
> 0x1cbecb4: key=aUuiOrlUDb7RnvcFMLK9oA__ use=0 exp=2624851 server_id=2
> 0x1c49664: key=lmqFlaNhdHXYwH4M4QQjqQ__ use=0 exp=3510898 server_id=2
> 0x1cbf074: key=zpE6NHQJr~aStxzmJTVEgA__ use=0 exp=2587567 server_id=2

This is indeed not expected at all!

> I tried:
> echo "set table front_jsessionid key toto data.server_id 1" | socat stdio /usr/share/haproxy/mysocket
> 
> The new entry was not replicated on proxy2, and tcpdump showed no 
> outgoing traffic from proxy1 to proxy2.

So that means that proxy1 was in a bad state.

> Then, I reloaded the haproxy service on proxy2, and tables were synced, 
> including old data set in proxy1 before the reload.

Interesting. That means that not everything is broken: something makes
the communication stop without breaking the connection, and once the
connection is renewed, it's fine again.

> New data replicates all right too.
> 
> Should this situation happen again, is there anything I can do to debug
> this further?

There's not much, unfortunately, because I have not implemented the
"show peers" action that is on my todo list. Its purpose would be to
dump a number of relevant pieces of information to debug the protocol,
diagnose connection issues, etc.

> --
> 
> Here is netstat output from proxy1 while replication was broken :
> 
> root@proxy1>: netstat -anp | grep 9421
> tcp    0   0  [proxy1]:9421   0.0.0.0:*        LISTEN      1163/haproxy
> tcp   35   0  [proxy1]:9421   [proxy2]:41205   CLOSE_WAIT  -
> tcp    1   0  [proxy1]:20512  [proxy2]:9421    CLOSE_WAIT  1163/haproxy
(...)

This proves that a disconnection was already received but ignored by
haproxy: CLOSE_WAIT means the remote end closed the connection while the
local socket was never closed, and the non-zero Recv-Q shows data that
was received but never read. It's possible that the other side detected
a timeout and tried to reconnect without success.

> Here are relevant parts from our setup :
> 
> peers prod
> peer proxy1 [proxy1]:9421
> peer proxy2 [proxy2]:9421
> 
> backend front
> stick on urlp(jsessionid),url_dec table front_jsessionid
> stick on urlp(jsessionid,;),url_dec table front_jsessionid
> stick on cookie(JSESSIONID) table front_jsessionid
> stick store-response cookie(JSESSIONID) table front_jsessionid
> 
> backend front_jsessionid
> stick-table type string len 24 size 10m expire 1h peers prod

OK it looks pretty fine.

> We are running HAproxy 1.6.2 from the vbernat PPA on Ubuntu 12.04, with 
> nbproc = 1.

Thanks for this precision. All I can say for now is that you clearly
encountered a bug, but we don't know yet what this bug is. We'll have
to check the code for something which could cause this. It would be
interesting to know how long the process had been running before the
issue appeared, even though it will not tell us for how long the
connection remained alive.

Thanks,
Willy




Re: stick table replication problem

2015-12-03 Thread Willy Tarreau
On Thu, Dec 03, 2015 at 03:56:45PM +0100, Sylvain Faivre wrote:
> According to our logs, both HAproxy processes were started at Nov 24 
> 11:25:xx and application errors caused by lack of session replication 
> started happening at Dec  1 17:05:35.
> So that's a bit more than 1 week later.
> 
> We'll keep looking for replication errors in the coming days, and 
> especially in 1 week, and will report back.

OK thank you. I hope it will not affect your activity too much :-/
That just makes me think that having an option to shut down a peers
session on the CLI would help.

Hmmm, in fact there's a way to observe the peers session, since peers
now use regular sessions. If you issue "show sess" on the CLI, you'll
see some sessions with "fe=<your local peer name>" (yes, I know the
name is poorly chosen and may cause confusion):

> show sess
0x7ce468: proto=unix_stream src=unix:1 fe=GLOBAL be= srv= ts=02 age=2m11s calls=1 rq[f=c08202h,i=0,an=00h,rx=,wx=,ax=] rp[f=80048202h,i=0,an=00h,rx=,wx=,ax=] s0=[7,ch,fd=9,ex=] s1=[7,4018h,fd=-1,ex=] exp=23h57m
0x7cdd98: proto=tcpv4 src=127.0.0.1:38831 fe=myhost1 be= srv= ts=0a age=0s calls=1 rq[f=c08200h,i=0,an=00h,rx=4s,wx=,ax=] rp[f=80048202h,i=0,an=00h,rx=,wx=,ax=] s0=[7,48h,fd=8,ex=] s1=[7,4058h,fd=-1,ex=] exp=4s

Here it's the second line (my local peer is called "myhost1"). The first line
is my CLI session.
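
If the output is long, it is easy to filter; a sketch, assuming a
hypothetical stats socket path:

    # list only the peers session(s); "myhost1" is the local peer name above
    echo "show sess" | socat stdio /var/run/haproxy.sock | grep 'fe=myhost1'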

Then "show sess " will show more details (the ID is the pointer
at the beginning of the line) :

> show sess 0x7cdd98
0x7cdd98: [03/Dec/2015:16:01:58.200647] id=81 proto=tcpv4 source=127.0.0.1:38831
  flags=0x80, conn_retries=3, srv_conn=(nil), pend_pos=(nil)
  frontend=myhost1 (id=4294967295 mode=tcp), listener=? (id=0) addr=127.0.0.1:8521
  backend= (id=-1 mode=-)
  server= (id=-1)
  task=0x7cab40 (state=0x0a nice=0 calls=1 exp=0s age=4s)
  si[0]=0x7cdf90 (state=EST flags=0x48 endp0=CONN:0x7ca940 exp=, et=0x000)
  si[1]=0x7cdfb0 (state=EST flags=0x4058 endp1=APPCTX:0x7ce100 exp=, et=0x000)
  co0=0x7ca940 ctrl=tcpv4 xprt=RAW data=STRM target=LISTENER:0x7cb688
  flags=0x0020b306 fd=8 fd.state=25 fd.cache=0 updt=0
  app1=0x7ce100 st0=7 st1=0 st2=0 applet=
  req=0x7cdda8 (f=0xc08200 an=0x0 pipe=0 tofwd=-1 total=37)
  an_exp= rex=0s wex=
  buf=0x77ba20 data=0x77ba34 o=0 p=0 req.next=0 i=0 size=0
  res=0x7cdde8 (f=0x80048202 an=0x0 pipe=0 tofwd=-1 total=4)
  an_exp= rex= wex=
  buf=0x77ba20 data=0x77ba34 o=0 p=0 rsp.next=0 i=0 size=0

You can even shut it down:
> shutdown session 0x7cdd98

Here I have no traffic, so the session expires every 5 or 10 seconds and
is renewed. In your case this obviously should not happen.
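
To chain the two steps, something like this should work (a sketch; the
socket path is an assumption and "myhost1" stands for your local peer
name):

    # find the peers session pointer, then shut that session down so the
    # peers connection gets re-established; adjust paths/names to the setup
    SOCK=/usr/share/haproxy/mysocket
    ID=$(echo "show sess" | socat stdio $SOCK | grep 'fe=myhost1' | cut -d: -f1)
    echo "shutdown session $ID" | socat stdio $SOCK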

So such a dump on both nodes before reloading could help significantly if
you face the issue again. If it's too complicated to pick only this
session, you can issue "show sess all", send the huge output to a file,
and then dig for "<your local peer name>" in this file. It might be
easier in an emergency; a sketch of that follows.

Best regards,
Willy