Re: 1.9.6: SIGFPE in fwrr_update_position

2019-04-24 Thread Willy Tarreau
Maksim, On Wed, Apr 24, 2019 at 07:53:08AM +0200, Willy Tarreau wrote: > Hi Maksim, > > On Wed, Apr 24, 2019 at 08:39:23AM +0300, Максим Куприянов wrote: > > Hi! > > > > It seems to me there is something wrong with this patch: for some reason > > process stops responding with 100% CPU used by

Re: 1.9.6: SIGFPE in fwrr_update_position

2019-04-24 Thread Willy Tarreau
Hi Maksim, On Wed, Apr 24, 2019 at 08:39:23AM +0300, Максим Куприянов wrote: > Hi! > > It seems to me there is something wrong with this patch: for some reason > process stops responding with 100% CPU used by all threads. Ouch! This looks like an awful AB/BA deadlock. Indeed,

Re: 1.9.6: SIGFPE in fwrr_update_position

2019-04-23 Thread Максим Куприянов
Hi! It seems to me there is something wrong with this patch: for some reason process stops responding with 100% CPU used by all threads. Backtrace: (gdb) thread apply all bt Thread 4 (Thread 0x7fdf68c9c700 (LWP 615744)): #0 0x564fc9a61990 in fwrr_update_server_weight (srv=0x564fcb5014b0) at

Re: 1.9.6: SIGFPE in fwrr_update_position

2019-04-16 Thread Willy Tarreau
Hi Maksim, On Tue, Apr 16, 2019 at 07:28:28AM +0200, Willy Tarreau wrote: > > So I agree upon another thread activity. The unique thing about > > these servers - all of them use haproxy-agent to set up weights of their > > backends. Other instances with no haproxy-agent in their configs don't > >

Re: 1.9.6: SIGFPE in fwrr_update_position

2019-04-15 Thread Willy Tarreau
Hi Maksim, On Tue, Apr 16, 2019 at 08:15:42AM +0300, Максим Куприянов wrote: > Hi Willy! > > Actually I don't think this is a CPU fault. The reason is that I have the same > cores with non-zero dividers on 4 more hardware servers with different CPU > models. OK that's very useful info, thank you.

Re: 1.9.6: SIGFPE in fwrr_update_position

2019-04-15 Thread Максим Куприянов
Hi Willy! Actually I don't think this is a CPU fault. The reason is that I have the same cores with non-zero dividers on 4 more hardware servers with different CPU models. So I agree upon another thread activity. The unique thing about these servers – all of them use haproxy-agent to set up weights

Re: 1.9.6: SIGFPE in fwrr_update_position

2019-04-15 Thread Willy Tarreau
Hi Maksim, On Thu, Apr 11, 2019 at 02:03:43PM +0200, Willy Tarreau wrote: > I tried to follow all paths that lead to a zero cur_eweight that I could > find and none of them leave the server in the tree. Then I tried to find > all cases where this entry is updated or used and all are under the

Re: 1.9.6: SIGFPE in fwrr_update_position

2019-04-11 Thread Willy Tarreau
On Thu, Apr 11, 2019 at 09:37:41PM +0500, Максим Куприянов wrote: > Hello Willy! > > I hope I could find some cores still available and will search for them > tomorrow. Cool! > But since they could contain some sensitive information, it's not a good > idea to share it right here on the mail

Re: 1.9.6: SIGFPE in fwrr_update_position

2019-04-11 Thread Максим Куприянов
Hello Willy! I hope I could find some cores still available and will search for them tomorrow. But since they could contain some sensitive information, it's not a good idea to share it right here on the mail list. So could you please tell me some personal email address where I could send the link

Re: 1.9.6: SIGFPE in fwrr_update_position

2019-04-11 Thread Willy Tarreau
Hi again, On Thu, Apr 11, 2019 at 11:53:28AM +0200, Willy Tarreau wrote: > > Got multiple incidents of failure with 1.9.6: > > Core was generated by `/usr/sbin/haproxy -Ws -f /etc/haproxy/haproxy.cfg -p > > /var/run/haproxy'. > > Program terminated with signal SIGFPE, Arithmetic exception. > > #0

Re: 1.9.6: SIGFPE in fwrr_update_position

2019-04-11 Thread Willy Tarreau
Hi Maxim, On Thu, Apr 04, 2019 at 02:22:59PM +0300, Максим Куприянов wrote: > Hi, everybody! > > Got multiple incidents of failure with 1.9.6: > Core was generated by `/usr/sbin/haproxy -Ws -f /etc/haproxy/haproxy.cfg -p > /var/run/haproxy'. > Program terminated with signal SIGFPE, Arithmetic

Re: 1.9.6: SIGFPE in fwrr_update_position

2019-04-10 Thread Максим Куприянов
Hi! Any news about the cause of these faults? I should mention that some of our backends set their weights with the help of haproxy agent. Could that be the reason? On Thu, 4 Apr 2019 at 14:22, Максим Куприянов wrote: > Hi, everybody! > > Got multiple incidents of failure with 1.9.6: > Core was