Re: Haproxy CPU 100%, after running about two weeks

2013-05-03 Thread jinge


Thanks! 
I followed your advice and upgraded my haproxy. I will keep watching to see
whether any problem comes back.

Regards
Jinge



On 2013-5-2, at 3:49 PM, Lukas Tribus luky...@hotmail.com wrote:

 Hi Jinge!
 
 
 I believe you are facing 2 different issues here.
 
 
 
 Today, our haproxy CPU grew to 100%, and the machine became terribly slow.
 
 Please upgrade to recent 1.4 code; you are missing a few fixes, including
 one security fix. I suggest the snapshot 20130427, which also includes a fix
 for a loop that caused 100% load from haproxy. Download it at [1].
 
 
 
 [1297314.773541] cleanup rbuf bug: copied DBE7B6DA seq DBE7B3C8 rcvnxt 
 DBE7B6DA 
 [...]
 [1297314.773625] [81046a75] ? warn_slowpath_common+0x78/0x8c 
 
 This is a kernel issue with TCP splicing and has probably been fixed.
 Please see [2]. I am not sure whether Debian is backporting this fix, though.
 
 You could just disable TCP splicing as an interim workaround.
 
 
 
 Cheers,
 Lukas
 
 [1] http://haproxy.1wt.eu/download/1.4/src/snapshot/
 [2] http://comments.gmane.org/gmane.linux.network/231555  
   




Re: Haproxy CPU 100%, after running about two weeks

2013-05-02 Thread Willy Tarreau
On Thu, May 02, 2013 at 03:03:01PM +0800, ??? ??? wrote:
 Hi!
 Today, our haproxy CPU grew to 100%, and the machine became terribly slow.

Please check whether that's 100% user or 100% system (or even softirq). From
your traces, it looks like it's system, because the kernel complains about some
processing taking a long time. It could be a faulty network driver that
has trouble with splice(), for example. See:

 [1297314.773614] Call Trace:
 [1297314.773625]  [81046a75] ? warn_slowpath_common+0x78/0x8c
 [1297314.773629]  [81046b21] ? warn_slowpath_fmt+0x45/0x4a
 [1297314.773632]  [812c0df1] ? tcp_cleanup_rbuf+0x4a/0xfb
 [1297314.773635]  [812c1ed8] ? tcp_read_sock+0x127/0x138
 [1297314.773637]  [812c1fba] ? tcp_splice_read+0xd1/0x21f
 [1297314.773643]  [8111aab7] ? sys_splice+0x389/0x404
 [1297314.773649]  [81351ad2] ? system_call_fastpath+0x16/0x1b
 [1297314.773651] ---[ end trace 54eae6935f54c0f5 ]---

You seem to be using an igb driver. In my experience, splice() is useless
with most gigabit drivers (including e1000e/igb) before kernel 3.5, where
GRO really became efficient. And anyway, on such a machine, you don't need
splicing to forward at gigabit rate!
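
For the user/system/softirq question above, here is a minimal sketch that
samples the aggregate "cpu" line of /proc/stat twice and prints how the
elapsed time was split. It assumes a Linux host and reports the whole machine,
not haproxy alone; the function names are made up for illustration and are
not part of any haproxy tooling.

```python
import os
import time

# Order of the first eight counters on the aggregate "cpu" line of /proc/stat.
FIELDS = ("user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal")


def parse_cpu_line(line):
    """Turn the aggregate 'cpu ...' line into a dict of jiffy counters."""
    parts = line.split()
    if not parts or parts[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line of /proc/stat")
    values = [int(v) for v in parts[1:1 + len(FIELDS)]]
    return dict(zip(FIELDS, values))


def cpu_split(before, after):
    """Return the percentage of elapsed CPU time spent in each state."""
    deltas = {name: after[name] - before[name] for name in before}
    total = sum(deltas.values())
    if total <= 0:
        return {name: 0.0 for name in deltas}
    return {name: 100.0 * delta / total for name, delta in deltas.items()}


def read_cpu_line(path="/proc/stat"):
    """Read the first line of /proc/stat, which is the all-CPU summary."""
    with open(path) as stat_file:
        return stat_file.readline()


if __name__ == "__main__":
    if os.path.exists("/proc/stat"):
        first = parse_cpu_line(read_cpu_line())
        time.sleep(1.0)  # sampling interval; an arbitrary choice
        second = parse_cpu_line(read_cpu_line())
        for name, pct in cpu_split(first, second).items():
            print(f"{name:>8}: {pct:5.1f}%")
    else:
        print("/proc/stat not found; this sketch only works on Linux")
```

A high "system" or "softirq" share together with a low "user" share would
point at the kernel or the driver rather than at haproxy's own code.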

Best regards,
Willy




RE: Haproxy CPU 100%, after running about two weeks

2013-05-02 Thread Lukas Tribus
Hi Jinge!


I believe you are facing 2 different issues here.



 Today, our haproxy CPU grew to 100%, and the machine became terribly slow.

Please upgrade to recent 1.4 code; you are missing a few fixes, including
one security fix. I suggest the snapshot 20130427, which also includes a fix
for a loop that caused 100% load from haproxy. Download it at [1].



 [1297314.773541] cleanup rbuf bug: copied DBE7B6DA seq DBE7B3C8 rcvnxt 
 DBE7B6DA 
 [...]
 [1297314.773625] [81046a75] ? warn_slowpath_common+0x78/0x8c 

This is a kernel issue with TCP splicing and has probably been fixed.
Please see [2]. I am not sure whether Debian is backporting this fix, though.

You could just disable TCP splicing as an interim workaround.
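
As a sketch of that workaround, the "nosplice" keyword in the global section
turns kernel TCP splicing off for the whole process. The fragment below
assumes the rest of your configuration stays unchanged; only the one keyword
is added.

```
global
    # Disable kernel TCP splicing process-wide. Any "option splice-auto",
    # "option splice-request" or "option splice-response" lines elsewhere
    # in the configuration then have no effect.
    nosplice
```

Starting haproxy with the -dS command-line flag should have the same effect
without touching the configuration file.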



Cheers,
Lukas

[1] http://haproxy.1wt.eu/download/1.4/src/snapshot/
[2] http://comments.gmane.org/gmane.linux.network/231555