Re: Haproxy CPU 100%, after running about two weeks
Thanks! I followed your advice and upgraded my haproxy. I will watch to see whether any problems remain.

Regards,
Jinge

On 2013-5-2, at 3:49 PM, Lukas Tribus luky...@hotmail.com wrote:

> Hi Jinge!
>
> I believe you are facing 2 different issues here.
>
>> Today, our haproxy CPU grew to 100%, and the machine became terribly slow.
>
> Please upgrade to recent 1.4 code; you are missing a few fixes, including one security fix. I suggest the snapshot 20130427, which also includes a fix for a loop that caused 100% load from haproxy. Download at [1].
>
>> [1297314.773541] cleanup rbuf bug: copied DBE7B6DA seq DBE7B3C8 rcvnxt DBE7B6DA
>> [...]
>> [1297314.773625] [81046a75] ? warn_slowpath_common+0x78/0x8c
>
> This is a kernel issue with TCP splicing and has probably been fixed. Please see [2]. Not sure if Debian is backporting this fix, though. You could just disable TCP splicing as an interim workaround.
>
> Cheers,
> Lukas
>
> [1] http://haproxy.1wt.eu/download/1.4/src/snapshot/
> [2] http://comments.gmane.org/gmane.linux.network/231555
Re: Haproxy CPU 100%, after running about two weeks
On Thu, May 02, 2013 at 03:03:01PM +0800, ??? ??? wrote:

> Hi! Today, our haproxy CPU grew to 100%, and the machine became terribly slow.

Please check whether that's 100% user or 100% system (or even softirq). From your traces, it looks like it's system, because the kernel complains about some processing taking a long time. It could be a faulty network driver that has trouble with splice(), for example. You can see it here:

> [1297314.773614] Call Trace:
> [1297314.773625] [81046a75] ? warn_slowpath_common+0x78/0x8c
> [1297314.773629] [81046b21] ? warn_slowpath_fmt+0x45/0x4a
> [1297314.773632] [812c0df1] ? tcp_cleanup_rbuf+0x4a/0xfb
> [1297314.773635] [812c1ed8] ? tcp_read_sock+0x127/0x138
> [1297314.773637] [812c1fba] ? tcp_splice_read+0xd1/0x21f
> [1297314.773643] [8111aab7] ? sys_splice+0x389/0x404
> [1297314.773649] [81351ad2] ? system_call_fastpath+0x16/0x1b
> [1297314.773651] ---[ end trace 54eae6935f54c0f5 ]---

You seem to be using an igb driver. From my experience, splice() is useless with most gigabit drivers (including e1000e/igb) before kernel 3.5, where GRO really became efficient. And anyway, on such a machine, you don't need splicing to forward at gigabit rate!

Best regards,
Willy
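
For readers who want to make the user/system/softirq check above concrete, here is a minimal sketch that compares two samples of the aggregate "cpu" line from /proc/stat on Linux. It assumes the usual field order (user, nice, system, idle, iowait, irq, softirq, steal); the function names are made up for this illustration, and the figures cover the whole machine, not the haproxy process alone.

```python
# Field order of the aggregate "cpu" line in /proc/stat (Linux).
# Older kernels may expose fewer fields; extra trailing fields are ignored.
FIELDS = ("user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal")


def parse_cpu_line(line):
    """Parse the aggregate 'cpu' line of /proc/stat into a dict of jiffies."""
    parts = line.split()
    if not parts or parts[0] != "cpu":
        raise ValueError("expected the aggregate 'cpu' line")
    values = [int(v) for v in parts[1:1 + len(FIELDS)]]
    return dict(zip(FIELDS, values))


def cpu_shares(before, after):
    """Return each field's share, in percent, of the time between two samples."""
    a, b = parse_cpu_line(before), parse_cpu_line(after)
    delta = {name: b[name] - a[name] for name in a}
    total = sum(delta.values())
    if total <= 0:
        raise ValueError("samples are identical or out of order")
    return {name: 100.0 * value / total for name, value in delta.items()}


def read_cpu_line(path="/proc/stat"):
    """Read the first line of /proc/stat, which is the aggregate 'cpu' line."""
    with open(path) as f:
        return f.readline()
```

To use it, call read_cpu_line() twice about a second apart and pass both results to cpu_shares(). A large "system" or "softirq" share would point toward the kernel or driver rather than haproxy's own user-space code.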
RE: Haproxy CPU 100%, after running about two weeks
Hi Jinge!

I believe you are facing 2 different issues here.

> Today, our haproxy CPU grew to 100%, and the machine became terribly slow.

Please upgrade to recent 1.4 code; you are missing a few fixes, including one security fix. I suggest the snapshot 20130427, which also includes a fix for a loop that caused 100% load from haproxy. Download at [1].

> [1297314.773541] cleanup rbuf bug: copied DBE7B6DA seq DBE7B3C8 rcvnxt DBE7B6DA
> [...]
> [1297314.773625] [81046a75] ? warn_slowpath_common+0x78/0x8c

This is a kernel issue with TCP splicing and has probably been fixed. Please see [2]. Not sure if Debian is backporting this fix, though. You could just disable TCP splicing as an interim workaround.

Cheers,
Lukas

[1] http://haproxy.1wt.eu/download/1.4/src/snapshot/
[2] http://comments.gmane.org/gmane.linux.network/231555
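
As a rough aid for the splicing workaround mentioned above, the sketch below scans an haproxy configuration for the directives that turn splicing on ("option splice-auto", "option splice-request", "option splice-response") and checks whether the global "nosplice" keyword is already present. The function name and the sample configuration are made up for this illustration. The script only reports what it finds and changes nothing.

```python
import re

# Directives that enable splicing in a defaults/frontend/listen/backend section.
SPLICE_OPTION = re.compile(
    r"^\s*option\s+(splice-auto|splice-request|splice-response)\b"
)
# Global keyword that disables splicing for the whole process.
NOSPLICE = re.compile(r"^\s*nosplice\b")


def audit_splicing(config_text):
    """Return (nosplice_set, [(line_number, option_name), ...]) for a config."""
    nosplice_set = False
    enabled = []
    for number, line in enumerate(config_text.splitlines(), start=1):
        line = line.split("#", 1)[0]  # ignore trailing comments
        if NOSPLICE.match(line):
            nosplice_set = True
        match = SPLICE_OPTION.match(line)
        if match:
            enabled.append((number, match.group(1)))
    return nosplice_set, enabled
```

If the audit lists any splice options and "nosplice" is not set, then either removing those options or adding "nosplice" to the global section should keep haproxy from calling splice() until the kernel is updated.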