Re: dhclient sucks cpu usage...
On 10.06.2014 07:03, Bryan Venteicher wrote:
> Hi,
>
> ----- Original Message -----
>> So, after finding out that nc has a stupidly small buffer size (2k even
>> though there is space for 16k), I was still not getting as good
>> performance using nc between machines, so I decided to generate some
>> flame graphs to try to identify issues...
>> [...]
>> Looking closer, you'll see that bpf_mtap is consuming ~3.18% (under
>> ether_nh_input).. I know I'm not running tcpdump or anything, but I
>> think dhclient uses bpf to be able to inject packets and listen in on
>> them, so I kill off dhclient, and instantly, the taskqueue thread for
>> em drops down to 40% CPU...
>> [...]
>> So, if you care about performance, don't run dhclient...
>
> Yes, I've noticed the same issue. It can absolutely kill performance in
> a VM guest. It is much more pronounced on only some of my systems, and I
> hadn't tracked it down yet. I wonder if this is fallout from the callout
> work, or if there was some bpf change. I've been using the kludgey
> workaround patch below.

Hm, pretty interesting. dhclient should set up a proper filter (and it looks like it does so:

13:10 [0] m@ptichko s netstat -B
  Pid Netif   Flags     Recv  Drop  Match  Sblen  Hblen  Command
 1224   em0 -ifs--l 41225922 011 0 0 dhclient

); see the match count. And BPF itself adds the cost of a read rwlock (plus bpf_filter() calls for each consumer on the interface). It should not introduce significant performance penalties.

> diff --git a/sys/net/bpf.c b/sys/net/bpf.c
> index cb3ed27..9751986 100644
> --- a/sys/net/bpf.c
> +++ b/sys/net/bpf.c
> @@ -2013,9 +2013,11 @@ bpf_gettime(struct bintime *bt, int tstype, struct mbuf *m)
>  			return (BPF_TSTAMP_EXTERN);
>  		}
>  	}
> +#if 0
>  	if (quality == BPF_TSTAMP_NORMAL)
>  		binuptime(bt);
>  	else
> +#endif

bpf_gettime() is called IFF the packet filter matches some traffic. Can you show your netstat -B output?

>  		getbinuptime(bt);
>  	return (quality);
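For illustration, a read filter of the kind being described can be expressed with the classic pcap filter syntax and attached from userland. dhclient programs its BPF descriptor directly rather than through libpcap, so this is only a minimal sketch of the idea, not dhclient's actual code; the interface name "em0" and the filter string are assumptions. Build with -lpcap.

	#include <pcap/pcap.h>
	#include <stdio.h>

	/*
	 * Minimal sketch: attach a DHCP-only read filter to an interface so
	 * that only port 67/68 traffic is matched (and therefore copied and
	 * timestamped), the way a well-behaved BPF reader would.
	 */
	int
	main(void)
	{
		char errbuf[PCAP_ERRBUF_SIZE];
		struct bpf_program prog;
		pcap_t *p;

		p = pcap_open_live("em0", 1518, 0, 100, errbuf);
		if (p == NULL) {
			fprintf(stderr, "pcap_open_live: %s\n", errbuf);
			return (1);
		}
		if (pcap_compile(p, &prog, "udp and (port 67 or port 68)", 1,
		    PCAP_NETMASK_UNKNOWN) == -1 ||
		    pcap_setfilter(p, &prog) == -1) {
			fprintf(stderr, "filter: %s\n", pcap_geterr(p));
			return (1);
		}
		/* ... read matched packets with pcap_dispatch()/pcap_next_ex() ... */
		pcap_freecode(&prog);
		pcap_close(p);
		return (0);
	}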
Re: dhclient sucks cpu usage...
Alexander V. Chernikov wrote this message on Tue, Jun 10, 2014 at 13:17 +0400:
> [...]
> Hm, pretty interesting. dhclient should set up a proper filter (and it
> looks like it does so; see the match count). And BPF itself adds the cost
> of a read rwlock (plus bpf_filter() calls for each consumer on the
> interface). It should not introduce significant performance penalties.

Don't forget that it has to process the returning ACKs... So, you're looking at around 10k+ pps that you have to handle and pass through the filter... That's a lot of packets to process...

Just for a bit more of a double check, instead of using the HD as a source, I used /dev/zero... I ran a netstat -w 1 -I em0 while running the test, and I was getting ~50.7MiB/s w/ dhclient running; then I killed dhclient and it instantly jumped up to ~57.1MiB/s.. So I launched dhclient again, and it dropped back to ~50MiB/s... And some of this slowness is due to nc using small buffers, which I will fix shortly..

With witness disabled it goes from 58MiB/s to 65.7MiB/s.. In both cases, that's a 13% performance improvement by running w/o dhclient...

This is using the latest memstick image, r266655, on a Lenovo T61:

FreeBSD 11.0-CURRENT #0 r266655: Sun May 25 18:55:02 UTC 2014
    r...@grind.freebsd.org:/usr/obj/usr/src/sys/GENERIC amd64
FreeBSD clang version 3.4.1 (tags/RELEASE_34/dot1-final 208032) 20140512
WARNING: WITNESS option enabled, expect reduced performance.
CPU: Intel(R) Core(TM)2 Duo CPU T7300 @ 2.00GHz (1995.05-MHz K8-class CPU)
  Origin=GenuineIntel  Id=0x6fb  Family=0x6  Model=0xf  Stepping=11
  Features=0xbfebfbff<FPU,VME,DE,PSE,TSC,MSR,PAE,MCE,CX8,APIC,SEP,MTRR,PGE,MCA,CMOV,PAT,PSE36,CLFLUSH,DTS,ACPI,MMX,FXSR,SSE,SSE2,SS,HTT,TM,PBE>
  Features2=0xe3bd<SSE3,DTES64,MON,DS_CPL,VMX,EST,TM2,SSSE3,CX16,xTPR,PDCM>
  AMD Features=0x20100800<SYSCALL,NX,LM>
  AMD Features2=0x1<LAHF>
  TSC: P-state invariant, performance statistics
real memory  = 2147483648 (2048 MB)
avail memory = 2014019584 (1920 MB)

-- 
John-Mark Gurney				Voice: +1 415 225 5579
All that I will do, has been done, All that I have, has not.
Re: dhclient sucks cpu usage...
On 10.06.2014 20:24, John-Mark Gurney wrote:
> Alexander V. Chernikov wrote this message on Tue, Jun 10, 2014 at 13:17 +0400:
>> [...]
>> Hm, pretty interesting. dhclient should set up a proper filter (and it
>> looks like it does so; see the match count). And BPF itself adds the
>> cost of a read rwlock (plus bpf_filter() calls for each consumer on the
>> interface). It should not introduce significant performance penalties.
>
> Don't forget that it has to process the returning ACKs... So, you're
> looking at around 10k+ pps that you have to handle and pass through the
> filter... That's a lot of packets to process...

Well, it can still be captured with the proper filter, like ip udp port 67 or port 68. We're using tcpdump at high packet rates (1M) and it does not influence the process _much_. We should probably convert its rwlock to an rmlock and use per-CPU counters for statistics, but that's a different story.

> Just for a bit more of a double check, instead of using the HD as a
> source, I used /dev/zero... I ran a netstat -w 1 -I em0 while running
> the test, and I was getting ~50.7MiB/s w/ dhclient running; then I
> killed dhclient and it instantly jumped up to ~57.1MiB/s.. So I launched
> dhclient again, and it dropped back to ~50MiB/s...

dhclient uses different BPF sockets for reading and writing (and it moves the write socket to a privileged child process via fork()). The problem we're facing is that dhclient does not set _any_ read filter on the write socket:

21:27 [0] zfscurr0# netstat -B
  Pid Netif   Flags    Recv    Drop   Match  Sblen  Hblen  Command
 1529   em0 --fs--l   86774   86769   86784   4044   3180  dhclient
                                ---^---
 1526   em0 -ifs--l   86789       0       1      0      0  dhclient

so all traffic is pushed down to it, introducing contention on the BPF descriptor mutex. (That's why I've asked for the netstat -B output.)

Please try the attached patch to fix this.

This is not the right way to fix it, though; we'd better change BPF behavior not to attach write-only consumers to the interface's reader list. This has been partially implemented as the net.bpf.optimize_writers hack, but it does not work for all direct BPF consumers (those not using the pcap(3) API).
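The attached patch itself did not survive into this digest, so the following is only a rough sketch of the approach described above, assuming it boils down to installing a match-nothing read filter on the write-only descriptor: a single "return 0" BPF instruction means no received packet is ever matched or buffered there. The function name and interface handling are illustrative, not dhclient's code.

	#include <sys/types.h>
	#include <sys/ioctl.h>
	#include <sys/socket.h>
	#include <net/if.h>
	#include <net/bpf.h>
	#include <fcntl.h>
	#include <string.h>
	#include <err.h>

	int
	open_writeonly_bpf(const char *ifname)
	{
		/* One "return 0" instruction: accept zero bytes, i.e. match nothing. */
		static struct bpf_insn reject_all[] = {
			BPF_STMT(BPF_RET + BPF_K, 0)
		};
		struct bpf_program prog = {
			.bf_len = sizeof(reject_all) / sizeof(reject_all[0]),
			.bf_insns = reject_all
		};
		struct ifreq ifr;
		int fd;

		if ((fd = open("/dev/bpf", O_WRONLY)) == -1)
			err(1, "open(/dev/bpf)");

		memset(&ifr, 0, sizeof(ifr));
		strlcpy(ifr.ifr_name, ifname, sizeof(ifr.ifr_name));
		if (ioctl(fd, BIOCSETIF, &ifr) == -1)	/* attach to the interface */
			err(1, "BIOCSETIF");

		/* Without a filter, BPF accepts every packet (see the Match column). */
		if (ioctl(fd, BIOCSETF, &prog) == -1)
			err(1, "BIOCSETF");

		return (fd);
	}

With a filter like this in place, the write descriptor's Match counter in netstat -B should stay low instead of tracking every received packet.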
Re: dhclient sucks cpu usage...
----- Original Message -----
> Hm, pretty interesting. dhclient should set up a proper filter (and it
> looks like it does so; see the match count). And BPF itself adds the
> cost of a read rwlock (plus bpf_filter() calls for each consumer on the
> interface). It should not introduce significant performance penalties.
> [...]
> bpf_gettime() is called IFF the packet filter matches some traffic. Can
> you show your netstat -B output?

It will be a bit before I'm able to capture that. Here's a flame graph from earlier in the year showing an absurd amount of time spent in bpf_mtap():

http://people.freebsd.org/~bryanv/vtnet/vtnet-bpf-10.svg
Re: dhclient sucks cpu usage...
On 10.06.2014 22:11, Bryan Venteicher wrote:
> [...]
> It will be a bit before I'm able to capture that. Here's a flame graph
> from earlier in the year showing an absurd amount of time spent in
> bpf_mtap():
>
> http://people.freebsd.org/~bryanv/vtnet/vtnet-bpf-10.svg

Can you briefly describe the test setup? (Actually I'm interested in the overall pps rate, the bpf filter used, and the match ratio.)

For example, for some random box at $work:

22:17 [0] m@sas1-fw1 netstat -I vlan802 -w1
            input   (vlan802)           output
   packets  errs idrops      bytes    packets  errs      bytes colls
    430418     0      0  337712454     396282     0  333207773     0

CPU:  0.4% user,  0.0% nice,  1.2% system, 15.9% interrupt, 82.5% idle

22:17 [0] sas1-fw1# tcpdump -i vlan802 -lnps0 icmp and host X.X.X.X
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on vlan802, link-type EN10MB (Ethernet), capture size 65535 bytes
22:17:14.866085 IP X.X.X.X > Y.Y.Y.Y: ICMP echo request, id 6730, seq 1, length 64

22:17 [0] m@sas1-fw1 s netstat -B 2>/dev/null | grep tcpdump
98520 vlan802 ---s--- 27979422 040 0 0 tcpdump

CPU:  0.9% user,  0.0% nice,  2.7% system, 17.6% interrupt, 78.8% idle

(Actually the load floats in the 14-20% range due to the bursty traffic, but I can't see much difference with tcpdump turned on or off.)
Re: dhclient sucks cpu usage...
Alexander V. Chernikov wrote this message on Tue, Jun 10, 2014 at 22:21 +0400:
> [...]
> Can you briefly describe the test setup?

For mine, one machine is the sink:
	nc -l 2387 > /dev/null

The machine w/ dhclient is the source:
	nc carbon 2387 < /dev/zero

> (Actually I'm interested in the overall pps rate, the bpf filter used,
> and the match ratio.)

The overall rate is ~26k pps both in and out (so ~52k pps total)...

So, netstat -B; sleep 5; netstat -B gives:

  Pid Netif   Flags     Recv     Drop    Match  Sblen  Hblen  Command
  919   em0 --fs--l  6275907  6275938  6275961   4060   2236  dhclient
  937   em0 -ifs--l  6275992        0        1      0      0  dhclient

  Pid Netif   Flags     Recv     Drop    Match  Sblen  Hblen  Command
  919   em0 --fs--l  6539717  6539752  6539775   4060   2236  dhclient
  937   em0 -ifs--l  6539806        0        1      0      0  dhclient

-- 
John-Mark Gurney				Voice: +1 415 225 5579
All that I will do, has been done, All that I have, has not.
Re: dhclient sucks cpu usage...
Alexander V. Chernikov wrote this message on Tue, Jun 10, 2014 at 21:33 +0400:
> [...]
> dhclient uses different BPF sockets for reading and writing (and it moves
> the write socket to a privileged child process via fork()). The problem
> we're facing is that dhclient does not set _any_ read filter on the write
> socket, so all traffic is pushed down to it, introducing contention on
> the BPF descriptor mutex. (That's why I've asked for the netstat -B
> output.)
>
> Please try the attached patch to fix this.
>
> This is not the right way to fix it, though; we'd better change BPF
> behavior not to attach write-only consumers to the interface's reader
> list. This has been partially implemented as the net.bpf.optimize_writers
> hack, but it does not work for all direct BPF consumers (those not using
> the pcap(3) API).

Ok, looks like this patch helps the issue...

netstat -B; sleep 5; netstat -B gives:

  Pid Netif   Flags     Recv  Drop  Match  Sblen  Hblen  Command
  958   em0 --fs--l  3881435 3868 2236 dhclient
  976   em0 -ifs--l  3880014 0 1 0 0 dhclient

  Pid Netif   Flags     Recv  Drop  Match  Sblen  Hblen  Command
  958   em0 --fs--l  41785251435 3868 2236 dhclient
  976   em0 -ifs--l  4178539 0 1 0 0 dhclient

and now the rate only drops from ~66MiB/s to ~63MiB/s when dhclient is running... Still a significant drop (5%), but better than before...

-- 
John-Mark Gurney				Voice: +1 415 225 5579
All that I will do, has been done, All that I have, has not.
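As an aside on the net.bpf.optimize_writers knob mentioned above: it can be inspected or flipped for testing from a program as well as with sysctl(8). A minimal sketch, assuming only that the OID exists under the name used in the message:

	#include <sys/types.h>
	#include <sys/sysctl.h>
	#include <err.h>
	#include <stdio.h>

	/*
	 * Read and optionally enable net.bpf.optimize_writers, the hack that
	 * tries to keep write-only BPF consumers off the interface reader
	 * list.  Purely illustrative; normally one would just run
	 * "sysctl net.bpf.optimize_writers=1".
	 */
	int
	main(void)
	{
		int oldval, newval = 1;
		size_t oldlen = sizeof(oldval);

		if (sysctlbyname("net.bpf.optimize_writers", &oldval, &oldlen,
		    NULL, 0) == -1)
			err(1, "sysctlbyname(read)");
		printf("net.bpf.optimize_writers is currently %d\n", oldval);

		/* Setting it requires root privileges. */
		if (sysctlbyname("net.bpf.optimize_writers", NULL, NULL,
		    &newval, sizeof(newval)) == -1)
			err(1, "sysctlbyname(write)");
		return (0);
	}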
Re: dhclient sucks cpu usage...
On 10.06.2014 22:56, John-Mark Gurney wrote:
> [...]
> Ok, looks like this patch helps the issue...
> [...]
> and now the rate only drops from ~66MiB/s to ~63MiB/s when dhclient is
> running... Still a significant drop (5%), but better than before...

Interesting. Can you provide some traces (pmc or dtrace ones)? I'm unsure if this will help, but it's
dhclient sucks cpu usage...
So, after finding out that nc has a stupidly small buffer size (2k even though there is space for 16k), I was still not getting as good performance using nc between machines, so I decided to generate some flame graphs to try to identify issues... (Thanks to whoever included a full set of modules, including dtraceall, on the memstick!)

So, the first one is:
https://www.funkthat.com/~jmg/em.stack.svg

As I was browsing around, em_handle_que was consuming quite a bit of CPU for only doing ~50MB/sec over GigE.. Running top -SH shows me that the taskqueue for em was consuming about 50% CPU... also pretty high for only 50MB/sec... Looking closer, you'll see that bpf_mtap is consuming ~3.18% (under ether_nh_input).. I know I'm not running tcpdump or anything, but I think dhclient uses bpf to be able to inject packets and listen in on them, so I kill off dhclient, and instantly, the taskqueue thread for em drops down to 40% CPU... (the transfer rate only marginally improves, if it does)

I decide to run another flame graph w/o dhclient running:
https://www.funkthat.com/~jmg/em.stack.nodhclient.svg

and now _rxeof drops from 17.22% to 11.94%, pretty significant...

So, if you care about performance, don't run dhclient...

-- 
John-Mark Gurney				Voice: +1 415 225 5579
All that I will do, has been done, All that I have, has not.
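The nc fix itself is not shown in this thread, but the buffer-size point is easy to picture: what matters is how much data each read(2)/write(2) pair moves, and with a 2k buffer the per-call overhead is paid eight times as often as with 16k. A minimal sketch of such a copy loop follows; the buffer size macro and function name are illustrative, not nc's actual code.

	#include <sys/types.h>
	#include <unistd.h>

	/*
	 * Illustrative relay loop: moving a stream in 16k chunks instead of
	 * 2k chunks means far fewer syscalls (and far fewer trips through the
	 * socket layer) for the same amount of data.
	 */
	#define RELAY_BUFSIZE	(16 * 1024)	/* nc was effectively using 2k */

	ssize_t
	relay(int from, int to)
	{
		char buf[RELAY_BUFSIZE];
		ssize_t n, total = 0;

		while ((n = read(from, buf, sizeof(buf))) > 0) {
			/* Handle short writes so every byte read is forwarded. */
			for (ssize_t off = 0; off < n; ) {
				ssize_t w = write(to, buf + off, (size_t)(n - off));
				if (w < 0)
					return (-1);
				off += w;
			}
			total += n;
		}
		return (n < 0 ? -1 : total);
	}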
Re: dhclient sucks cpu usage...
Hi,

----- Original Message -----
> So, after finding out that nc has a stupidly small buffer size (2k even
> though there is space for 16k), I was still not getting as good
> performance using nc between machines, so I decided to generate some
> flame graphs to try to identify issues...
> [...]
> So, if you care about performance, don't run dhclient...

Yes, I've noticed the same issue. It can absolutely kill performance in a VM guest. It is much more pronounced on only some of my systems, and I hadn't tracked it down yet. I wonder if this is fallout from the callout work, or if there was some bpf change. I've been using the kludgey workaround patch below.

diff --git a/sys/net/bpf.c b/sys/net/bpf.c
index cb3ed27..9751986 100644
--- a/sys/net/bpf.c
+++ b/sys/net/bpf.c
@@ -2013,9 +2013,11 @@ bpf_gettime(struct bintime *bt, int tstype, struct mbuf *m)
 			return (BPF_TSTAMP_EXTERN);
 		}
 	}
+#if 0
 	if (quality == BPF_TSTAMP_NORMAL)
 		binuptime(bt);
 	else
+#endif
 		getbinuptime(bt);
 	return (quality);
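The kludge above simply forces the cheaper getbinuptime() path for every matched packet. For a consumer that does not care about packet timestamps at all, a less invasive option, if memory serves, is the per-descriptor timestamp format set with BIOCSTSTAMP; the constant name and exact behaviour should be checked against bpf(4) before relying on this sketch.

	#include <sys/types.h>
	#include <sys/ioctl.h>
	#include <net/bpf.h>
	#include <err.h>

	/*
	 * Hedged sketch: ask the kernel not to timestamp packets delivered on
	 * this BPF descriptor.  Note this only affects packets that actually
	 * match the descriptor's filter, so it would not have helped the
	 * filterless dhclient write socket discussed above, which was
	 * matching everything.
	 */
	static void
	disable_bpf_timestamps(int bpf_fd)
	{
		u_int tstype = BPF_T_NONE;	/* assumed constant, see bpf(4) */

		if (ioctl(bpf_fd, BIOCSTSTAMP, &tstype) == -1)
			err(1, "BIOCSTSTAMP");
	}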