Re: Haproxy CPU 100%, after running about two weeks

2013-05-03 Thread jinge


Thanks! 
I followed your advice and upgraded my haproxy. I will watch for any further
problems.

Regards
Jinge



On 2013-5-2, at 3:49 PM, Lukas Tribus luky...@hotmail.com wrote:

 Hi Jinge!
 
 
 I believe you are facing 2 different issues here.
 
 
 
 Today, our haproxy CPU usage grew to 100%, and the machine became terribly slow.
 
 Please upgrade to recent 1.4 code; you are missing a few fixes, including
 a security fix. I suggest the snapshot 20130427, which also includes a
 fix for a loop (causing 100% load from haproxy). Download at [1].
 
 
 
 [1297314.773541] cleanup rbuf bug: copied DBE7B6DA seq DBE7B3C8 rcvnxt 
 DBE7B6DA 
 [...]
 [1297314.773625] [81046a75] ? warn_slowpath_common+0x78/0x8c 
 
 This is a kernel issue with tcp splicing and has probably been fixed.
 Please see [2]. Not sure if Debian is backporting this fix though.
 
 You could just disable tcp splicing as an intermediate workaround.
 
 
 
 Cheers,
 Lukas
 
 [1] http://haproxy.1wt.eu/download/1.4/src/snapshot/
 [2] http://comments.gmane.org/gmane.linux.network/231555  
   




Haproxy CPU 100%, after running about two weeks

2013-05-02 Thread 金 戈
Hi!
Today, our haproxy CPU usage grew to 100%, and the machine became terribly slow.

Some details are below.

model name  : Quad-Core AMD Opteron(tm) Processor 2384*2
memory: 8GB
NIC: Intel Corporation 82576 Gigabit Network
Linux haproxy 3.2.0-4-amd64 #1 SMP Debian 3.2.32-1 x86_64 GNU/Linux

root@haproxybackup:/usr/local/etc# haproxy -vv
HA-Proxy version 1.4.22 2012/08/09
Copyright 2000-2012 Willy Tarreau w...@1wt.eu

Build options :
  TARGET  = linux26
  CPU = generic
  CC  = gcc
  CFLAGS  = -m64 -march=x86-64 -O2 -g -fno-strict-aliasing
  OPTIONS = USE_LINUX_SPLICE=1 USE_LINUX_TPROXY=1 USE_EPOLL=1 USE_REGPARM=1 
USE_PCRE=1 USE_STATIC_PCRE=1

Default settings :
  maxconn = 2000, bufsize = 16384, maxrewrite = 8192, maxpollevents = 200

Encrypted password support via crypt(3): yes

Available polling systems :
 sepoll : pref=400,  test result OK
  epoll : pref=300,  test result OK
   poll : pref=200,  test result OK
 select : pref=150,  test result OK
Total: 4 (4 usable), will use sepoll.

and the kernel messages:

[1297314.773522] ------------[ cut here ]------------
[1297314.773536] WARNING: at 
/build/buildd-linux_3.2.32-1-amd64-bkoeca/linux-3.2.32/net/ipv4/tcp.c:1201 
tcp_cleanup_rbuf+0x4a/0xfb()
[1297314.773539] Hardware name: PowerEdge SC1435
[1297314.773541] cleanup rbuf bug: copied DBE7B6DA seq DBE7B3C8 rcvnxt DBE7B6DA
[1297314.773542] Modules linked in: ipt_REDIRECT xt_TPROXY nf_tproxy_core 
xt_set xt_mark xt_socket nf_defrag_ipv6 ip6_tables xt_tcpudp ip_set_hash_net 
ip_set nfnetlink iptable_nat nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 
iptable_raw iptable_mangle iptable_filter ip_tables x_tables ip_vs nf_conntrack 
crc32c libcrc32c nfsd nfs nfs_acl auth_rpcgss fscache lockd sunrpc 8021q garp 
stp bonding tcp_htcp ext4 crc16 jbd2 loop radeon ttm drm_kms_helper drm 
power_supply i2c_algo_bit k10temp mperf i2c_piix4 i2c_core processor shpchp 
amd64_edac_mod edac_mce_amd edac_core snd_pcm snd_page_alloc snd_timer snd 
soundcore psmouse dcdbas serio_raw pcspkr evdev button thermal_sys ext2 mbcache 
microcode sg sr_mod cdrom usbhid hid sd_mod crc_t10dif ata_generic ohci_hcd 
igb(O) pata_serverworks dca sata_svw libata ehci_hcd tg3 usbcore libphy 
scsi_mod usb_common [last unloaded: scsi_wait_scan]
[1297314.773612] Pid: 23579, comm: haproxy Tainted: G   O 3.2.0-4-amd64 
#1 Debian 3.2.32-1
[1297314.773614] Call Trace:
[1297314.773625]  [81046a75] ? warn_slowpath_common+0x78/0x8c
[1297314.773629]  [81046b21] ? warn_slowpath_fmt+0x45/0x4a
[1297314.773632]  [812c0df1] ? tcp_cleanup_rbuf+0x4a/0xfb
[1297314.773635]  [812c1ed8] ? tcp_read_sock+0x127/0x138
[1297314.773637]  [812c1fba] ? tcp_splice_read+0xd1/0x21f
[1297314.773643]  [8111aab7] ? sys_splice+0x389/0x404
[1297314.773649]  [81351ad2] ? system_call_fastpath+0x16/0x1b
[1297314.773651] ---[ end trace 54eae6935f54c0f5 ]---
[1297315.006259] ------------[ cut here ]------------
[1297315.006270] WARNING: at 
/build/buildd-linux_3.2.32-1-amd64-bkoeca/linux-3.2.32/net/ipv4/tcp.c:1495 
tcp_recvmsg+0x24e/0x8f7()
[1297315.006273] Hardware name: PowerEdge SC1435
[1297315.006276] recvmsg bug 2: copied DBE7B6DA seq DBE7A5BE rcvnxt DBE7B6DA fl 0
[1297315.006277] Modules linked in: ipt_REDIRECT xt_TPROXY nf_tproxy_core 
xt_set xt_mark xt_socket nf_defrag_ipv6 ip6_tables xt_tcpudp ip_set_hash_net 
ip_set nfnetlink iptable_nat nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 
iptable_raw iptable_mangle iptable_filter ip_tables x_tables ip_vs nf_conntrack 
crc32c libcrc32c nfsd nfs nfs_acl auth_rpcgss fscache lockd sunrpc 8021q garp 
stp bonding tcp_htcp ext4 crc16 jbd2 loop radeon ttm drm_kms_helper drm 
power_supply i2c_algo_bit k10temp mperf i2c_piix4 i2c_core processor shpchp 
amd64_edac_mod edac_mce_amd edac_core snd_pcm snd_page_alloc snd_timer snd 
soundcore psmouse dcdbas serio_raw pcspkr evdev button thermal_sys ext2 mbcache 
microcode sg sr_mod cdrom usbhid hid sd_mod crc_t10dif ata_generic ohci_hcd 
igb(O) pata_serverworks dca sata_svw libata ehci_hcd tg3 usbcore libphy 
scsi_mod usb_common [last unloaded: scsi_wait_scan]
[1297315.006357] Pid: 23579, comm: haproxy Tainted: GW  O 3.2.0-4-amd64 
#1 Debian 3.2.32-1
[1297315.006359] Call Trace:
[1297315.006369]  [81046a75] ? warn_slowpath_common+0x78/0x8c
[1297315.006372]  [81046b21] ? warn_slowpath_fmt+0x45/0x4a
[1297315.006377]  [8134cc83] ? _raw_spin_lock_bh+0xe/0x1c
[1297315.006381]  [810363d8] ? should_resched+0x5/0x23
[1297315.006383]  [812c1708] ? tcp_recvmsg+0x24e/0x8f7
[1297315.006386]  [8134cc83] ? _raw_spin_lock_bh+0xe/0x1c
[1297315.006388]  [810363d8] ? should_resched+0x5/0x23
[1297315.006393]  [812daf7e] ? inet_recvmsg+0x5b/0x6f
[1297315.006397]  [8127d060] ? sock_recvmsg+0xcd/0xec
[1297315.006402]  [8128f562] ? dev_queue_xmit+0x448/0x45b
[1297315.006407]  [810ea1a2] ? virt_to_head_page+0x6/0x29
[1297315.006410]  

Re: Haproxy CPU 100%, after running about two weeks

2013-05-02 Thread Willy Tarreau
On Thu, May 02, 2013 at 03:03:01PM +0800, 金 戈 wrote:
 Hi!
 Today, our haproxy CPU usage grew to 100%, and the machine became terribly slow.

Please check whether that's 100% user or 100% system (or even softirq). From your
traces, it looks like it's system, because the kernel complains about some
processing taking a long time. It could be a faulty network driver that
has trouble with splice(), for example, as you can see:

 [1297314.773614] Call Trace:
 [1297314.773625]  [81046a75] ? warn_slowpath_common+0x78/0x8c
 [1297314.773629]  [81046b21] ? warn_slowpath_fmt+0x45/0x4a
 [1297314.773632]  [812c0df1] ? tcp_cleanup_rbuf+0x4a/0xfb
 [1297314.773635]  [812c1ed8] ? tcp_read_sock+0x127/0x138
 [1297314.773637]  [812c1fba] ? tcp_splice_read+0xd1/0x21f
 [1297314.773643]  [8111aab7] ? sys_splice+0x389/0x404
 [1297314.773649]  [81351ad2] ? system_call_fastpath+0x16/0x1b
 [1297314.773651] ---[ end trace 54eae6935f54c0f5 ]---

You seem to be using the igb driver. In my experience, splice() is useless
with most gigabit drivers (including e1000e/igb) before kernel 3.5, where
GRO really became efficient. And anyway, on such a machine, you don't need
splicing to forward at gigabit rate!
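
The user/system/softirq split asked about at the top of this message can be
read from the aggregate "cpu" line of /proc/stat on Linux. The following is
only a minimal sketch, not something posted in the thread: the function names
(parse_cpu_line, cpu_shares, main) and the one-second interval are
illustrative assumptions, and the numbers are system-wide rather than
specific to the haproxy process.

```python
import time

# Order of the first eight counters on the aggregate "cpu" line of /proc/stat.
# The later guest counters are skipped because guest time is already part of user.
FIELDS = ("user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal")


def parse_cpu_line(stat_text):
    """Return the aggregate 'cpu' counters from /proc/stat text as a dict of ints."""
    for line in stat_text.splitlines():
        parts = line.split()
        if parts and parts[0] == "cpu":
            values = [int(v) for v in parts[1:1 + len(FIELDS)]]
            return dict(zip(FIELDS, values))
    raise ValueError("no aggregate 'cpu' line found")


def cpu_shares(before, after):
    """Percentage of elapsed CPU time spent in each state between two samples."""
    deltas = {name: after[name] - before[name] for name in before}
    total = sum(deltas.values())
    if total <= 0:
        raise ValueError("samples are identical or out of order")
    return {name: 100.0 * delta / total for name, delta in deltas.items()}


def read_sample(path="/proc/stat"):
    """Read and parse the current counters (Linux only)."""
    with open(path) as handle:
        return parse_cpu_line(handle.read())


def main(interval=1.0):
    """Print the user/system/softirq/idle split over one sampling interval."""
    first = read_sample()
    time.sleep(interval)
    shares = cpu_shares(first, read_sample())
    for name in ("user", "system", "softirq", "idle"):
        print(f"{name:8s} {shares[name]:5.1f}%")
```

Calling main() on the affected host while the load is high prints the split.
A large "system" or "softirq" share would point at the kernel or the network
driver, while a large "user" share would point at haproxy itself. The "us",
"sy" and "si" columns of top report the same kind of breakdown.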

Best regards,
Willy




RE: Haproxy CPU 100%, after running about two weeks

2013-05-02 Thread Lukas Tribus
Hi Jinge!


I believe you are facing 2 different issues here.



 Today, our haproxy CPU usage grew to 100%, and the machine became terribly slow.

Please upgrade to recent 1.4 code; you are missing a few fixes, including
a security fix. I suggest the snapshot 20130427, which also includes a
fix for a loop (causing 100% load from haproxy). Download at [1].



 [1297314.773541] cleanup rbuf bug: copied DBE7B6DA seq DBE7B3C8 rcvnxt 
 DBE7B6DA 
 [...]
 [1297314.773625] [81046a75] ? warn_slowpath_common+0x78/0x8c 

This is a kernel issue with tcp splicing and has probably been fixed.
Please see [2]. Not sure if Debian is backporting this fix though.

You could just disable tcp splicing as an intermediate workaround.
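
For reference, HAProxy 1.4 only uses splice() where "option splice-auto",
"option splice-request" or "option splice-response" is configured. The
"nosplice" keyword in the global section (equivalent to the -dS command-line
flag) turns it off everywhere. The poster's configuration was not shown in
the thread, so the sketch below works on a made-up sample: SAMPLE and
splice_report are hypothetical names, and the parser is deliberately naive
(it treats every "#" as the start of a comment).

```python
SPLICE_OPTIONS = ("splice-auto", "splice-request", "splice-response")
SECTION_KEYWORDS = ("global", "defaults", "frontend", "backend", "listen")

# Hypothetical configuration, not taken from the thread.
SAMPLE = """\
global
    maxconn 2000
    nosplice            # workaround: never use splice()

defaults
    mode http
    option splice-auto  # has no effect while nosplice is set

listen web
    bind :80
    option splice-response
"""


def splice_report(cfg_text):
    """Return (nosplice_set, enabled) for an HAProxy configuration text.

    nosplice_set is True if 'nosplice' appears in the global section.
    enabled lists (line_number, section, option) for each 'option splice-*'
    line; lines negated with a leading 'no' are not counted.
    """
    section = None
    nosplice_set = False
    enabled = []
    for lineno, raw in enumerate(cfg_text.splitlines(), start=1):
        words = raw.split("#", 1)[0].split()  # drop comments, then tokenize
        if not words:
            continue
        if words[0] in SECTION_KEYWORDS:
            section = " ".join(words[:2])     # e.g. "global" or "listen web"
            continue
        if section == "global" and words[0] == "nosplice":
            nosplice_set = True
        elif words[0] == "option" and len(words) > 1 and words[1] in SPLICE_OPTIONS:
            enabled.append((lineno, section, words[1]))
    return nosplice_set, enabled
```

Running splice_report on the real configuration text shows which sections
request splicing and whether the global switch is already in place. If the
first value is False and the list is not empty, adding "nosplice" to the
global section is the smallest change that applies the workaround.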



Cheers,
Lukas

[1] http://haproxy.1wt.eu/download/1.4/src/snapshot/
[2] http://comments.gmane.org/gmane.linux.network/231555