Re: L2 network namespace benchmarking (resend with Service Demand)
Daniel Lezcano [EMAIL PROTECTED] writes:

> Hi,
>
> as Rick suggested, I added the Service Demand results to the matrix.

A couple of random thoughts in trying to understand the numbers you are seeing.

- Checksum offloading?

  You have noted that with the bridge netfilter support disabled you are still seeing additional checksum overhead, just like you are seeing in the routing case. Is it possible the problem is simply that etun doesn't support checksum offloading, while your normal test hardware does?

- Tagged VLANs?

  Currently you have tested bridging and routing to get the packets to a network namespace. Could you test tagged VLANs? I'm just curious if we have anything in the network stack today that will multiplex a NIC without measurable overhead.

- Without NETNS?

  We should probably see if we can set up the same configuration we are testing without network namespaces (just multiple interfaces on the same machine) and see if we can still measure the same overhead, just to confirm the overhead is not a network namespace related thing.

  I know we can configure the same case with bridging, and I am fairly confident that we will see the same overhead without network namespaces. Off the top of my head I am insufficiently clever to think how we could configure the routing case without network namespaces, although we might be able to force it, and if so it would be interesting to measure.

I will work to get the etun setup races fixed and to fix whatever obvious feature deficiencies it has (like no configurable MTU support) and see if I can get that pushed upstream. That should make it easier for other people to reproduce what we are seeing.

Eric
-
To unsubscribe from this list: send the line "unsubscribe netdev" in the body of a message to [EMAIL PROTECTED]
More majordomo info at http://vger.kernel.org/majordomo-info.html
Re: L2 network namespace benchmarking (resend with Service Demand)
Eric W. Biederman wrote:
> Daniel Lezcano [EMAIL PROTECTED] writes:
>
>> Hi,
>>
>> as Rick suggested, I added the Service Demand results to the matrix.
>
> A couple of random thoughts in trying to understand the numbers you are seeing.
>
> - Checksum offloading?
>
>   You have noted that with the bridge netfilter support disabled you are still seeing additional checksum overhead, just like you are seeing in the routing case. Is it possible the problem is simply that etun doesn't support checksum offloading, while your normal test hardware does?

Looks like you are 100% correct. I feel a bit stupid that I didn't think about this small difference between a real NIC and etun.

If I turn off checksum offloading on my physical NIC, the checksum overhead (load) measured by oprofile is about the same in both cases: when running netperf through a real NIC, or through an etun tunnel first.

Benjamin

> - Tagged VLANs?
>
>   Currently you have tested bridging and routing to get the packets to a network namespace. Could you test tagged VLANs? I'm just curious if we have anything in the network stack today that will multiplex a NIC without measurable overhead.
>
> - Without NETNS?
>
>   We should probably see if we can set up the same configuration we are testing without network namespaces (just multiple interfaces on the same machine) and see if we can still measure the same overhead, just to confirm the overhead is not a network namespace related thing.
>
>   I know we can configure the same case with bridging, and I am fairly confident that we will see the same overhead without network namespaces. Off the top of my head I am insufficiently clever to think how we could configure the routing case without network namespaces, although we might be able to force it, and if so it would be interesting to measure.
>
> I will work to get the etun setup races fixed and to fix whatever obvious feature deficiencies it has (like no configurable MTU support) and see if I can get that pushed upstream. That should make it easier for other people to reproduce what we are seeing.
>
> Eric
> ___
> Containers mailing list
> [EMAIL PROTECTED]
> https://lists.linux-foundation.org/mailman/listinfo/containers

--
B e n j a m i n   T h e r y  -  BULL/DT/Open Software RD
http://www.bull.com
Re: L2 network namespace benchmarking (resend with Service Demand)
Benjamin Thery [EMAIL PROTECTED] writes:

> Eric W. Biederman wrote:
>> A couple of random thoughts in trying to understand the numbers you are seeing.
>>
>> - Checksum offloading?
>>
>>   You have noted that with the bridge netfilter support disabled you are still seeing additional checksum overhead, just like you are seeing in the routing case. Is it possible the problem is simply that etun doesn't support checksum offloading, while your normal test hardware does?
>
> Looks like you are 100% correct. I feel a bit stupid that I didn't think about this small difference between a real NIC and etun.
>
> If I turn off checksum offloading on my physical NIC, the checksum overhead (load) measured by oprofile is about the same in both cases: when running netperf through a real NIC, or through an etun tunnel first.

Interesting.

You can also 'enable' checksum offloading when using etun with ethtool, which should just tell the kernel not to do checksumming. A bad idea in general, but it might be useful in confirming where the performance overhead is coming from, and when used with routing I believe it is safe. When used with bridging I don't know.

Thinking about it, the ideal situation is to preserve skb->ip_summed if it came from another device, instead of unconditionally setting it. I need to take a good hard look at etun_xmit and make certain we are dotting all of the i's and crossing all of the t's for best performance and compatibility with the rest of the network stack.

Eric
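The checksum-offload experiment discussed above (turning offload off on the physical NIC, or "on" for etun) can be scripted. The following is only a sketch: it builds the `ethtool -K` command line without running it, and the device names `eth1` and `etun0` are hypothetical placeholders, not names taken from the thread. Whether a given driver accepts these settings is driver-dependent.

```python
import shlex


def checksum_offload_argv(device, enable):
    """Build an `ethtool -K` command toggling rx/tx checksum offload.

    `device` is whatever interface is under test; the names used in the
    example below are placeholders, not taken from the thread.
    """
    state = "on" if enable else "off"
    return ["ethtool", "-K", device, "rx", state, "tx", state]


if __name__ == "__main__":
    # Benjamin's experiment: disable offload on the physical NIC.
    print(shlex.join(checksum_offload_argv("eth1", enable=False)))
    # Eric's suggestion: claim offload on the etun device (diagnostic only).
    print(shlex.join(checksum_offload_argv("etun0", enable=True)))
```

The commands are printed rather than executed, so they can be reviewed before being run as root on the test machine.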
L2 network namespace benchmarking (resend with Service Demand)
Hi,

as Rick suggested, I added the Service Demand results to the matrix.

Cheers.

Hi,

I did some benchmarking on the existing L2 network namespaces.

These patches are included in the lxc patchset at:
http://lxc.sourceforge.net/patches/2.6.20
The lxc7 patchset series contains Dmitry's patchset
The lxc8 patchset series contains Eric's patchset

Here are the scenarios I set up in order to do some simple benchmarking on the network namespace. I tested three kernels:

 * Vanilla kernel 2.6.20

 * lxc7 with Dmitry's patchset based on 2.6.20
   * L3 network namespace has been removed to do testing

 * lxc8 with Eric's patchset based on 2.6.20

I didn't do any tests on Linux-Vserver because it is an L3 namespace and is not comparable with the L2 namespace implementation. If anyone is interested in Linux-Vserver performance, results can be found at http://lxc.sf.net. Roughly, we know there is no performance degradation.

For each kernel, several configurations were tested:

 * vanilla: obviously, only one configuration was tested, for reference values.

 * lxc7, network namespace
   - compiled out
   - compiled in
     - without container
     - inside a container with ip_forward, route and veth
     - inside a container with a bridge and veth

 * lxc8, network namespace
   - compiled out
   - compiled in
     - without container
     - inside a container with a real network device (eth1 was moved into the container instead of using an etun device)
     - inside a container with ip_forward, route and etun
     - inside a container with a bridge and etun

Each benchmark was done with 2 machines running netperf and tbench. A dedicated machine with a RH4 kernel ran the bench servers.

For each bench, netperf and tbench, the tests were run on:

 * Intel Xeon EM64T, bi-processor 2.8 GHz with hyperthreading activated, 4 GB of RAM and Gigabit NIC (tg3)
 * AMD Athlon MP 1800+, bi-processor 1.5 GHz, 1 GB of RAM and Gigabit NIC (dl2000)

Each test is run on both machines in order to have a CPU-relative overhead.
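# bench on vanilla
==================

-----------------------------------------------------------------
| Netperf   | CPU usage (%) | Throughput (Mbits/s) | SD (us/KB) |
-----------------------------------------------------------------
| on xeon   | 5.99          | 941.38               | 2.084      |
-----------------------------------------------------------------
| on athlon | 28.17         | 844.82               | 5.462      |
-----------------------------------------------------------------

-------------------------------------
| Tbench    | Throughput (MBytes/s) |
-------------------------------------
| on xeon   | 66.35                 |
-------------------------------------
| on athlon | 65.31                 |
-------------------------------------

# bench from Dmitry's patchset
==============================

1 - with net_ns compiled out
----------------------------

--------------------------------------------------------------------------------------
| Netperf   | CPU usage (%) / overhead | Throughput (Mbits/s) / changed | SD (us/KB) |
--------------------------------------------------------------------------------------
| on xeon   | 5.93 / -1 %              | 941.32 / 0 %                   | 2.066      |
--------------------------------------------------------------------------------------
| on athlon | 28.89 / +2.5 %           | 842.78 / -0.2 %                | 5.615      |
--------------------------------------------------------------------------------------

-----------------------------------------------
| Tbench    | Throughput (MBytes/s) / changed |
-----------------------------------------------
| on xeon   | 67.00 / +0.9 %                  |
-----------------------------------------------
| on athlon | 65.45 / 0 %                     |
-----------------------------------------------

Observation : no noticeable overhead

2 - with net_ns compiled in
---------------------------

2.1 - without container
-----------------------

--------------------------------------------------------------------------------------
| Netperf   | CPU usage (%) / overhead | Throughput (Mbits/s) / changed | SD (us/KB) |
--------------------------------------------------------------------------------------
| on xeon   | 6.23 / +4 %              | 941.35 / 0 %                   | 2.168      |
--------------------------------------------------------------------------------------
| on athlon | 28.83 / +2.3 %           | 850.76 / +0.7 %                | 5.552      |
--------------------------------------------------------------------------------------

-----------------------------------------------
| Tbench    | Throughput (MBytes/s) / changed |
-----------------------------------------------
| on xeon   | 67.00 / 0 %                     |
-----------------------------------------------
| on athlon | 65.45 / 0 %                     |
-----------------------------------------------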
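For readers who want to check how the SD and overhead columns relate to the raw numbers, here is a minimal sketch. It rests on assumptions that are not stated in the report: that service demand is total CPU time across all logical CPUs divided by the amount of data moved, that the Xeon exposes 4 logical CPUs (two processors with hyperthreading) and the Athlon 2, that 1 KB is 1024 bytes, and that 1 Mbit is 10^6 bits. Under those assumptions the vanilla SD values come out within about 0.001 us/KB of the table.

```python
def service_demand_us_per_kb(cpu_util_pct, throughput_mbits, n_cpus):
    """CPU microseconds spent per KB transferred.

    Assumes cpu_util_pct is averaged over n_cpus logical CPUs,
    1 Mbit = 10**6 bits and 1 KB = 1024 bytes (assumptions, see lead-in).
    """
    cpu_us_per_second = cpu_util_pct / 100.0 * n_cpus * 1_000_000
    kb_per_second = throughput_mbits * 1_000_000 / 8 / 1024
    return cpu_us_per_second / kb_per_second


def relative_change_pct(value, reference):
    """Percentage change of `value` relative to the vanilla `reference`."""
    return (value - reference) / reference * 100.0


if __name__ == "__main__":
    # Vanilla rows from the table above.
    print("xeon SD  :", round(service_demand_us_per_kb(5.99, 941.38, 4), 3))
    print("athlon SD:", round(service_demand_us_per_kb(28.17, 844.82, 2), 3))
    # Section 2.1, xeon CPU usage against the vanilla value.
    print("xeon CPU overhead (%):", round(relative_change_pct(6.23, 5.99), 1))
```

Because service demand normalizes CPU use by throughput, it keeps moving even when the link is saturated and the throughput column stays flat.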
Re: L2 network namespace benchmarking (resend with Service Demand)
Daniel Lezcano [EMAIL PROTECTED] writes:

> Hi,
>
> as Rick suggested, I added the Service Demand results to the matrix.

Thanks.

The latency number is interesting and it confirms what we were seeing looking at cpu usage. We don't have an inexpensive way to get a packet from the outside world to a network namespace.

Now if we can figure out why the routing code or the bridging code seems to be performing extra packet copies, perhaps that can be fixed.

Eric
Re: L2 network namespace benchmarking
Eric W. Biederman wrote:
> Daniel Lezcano [EMAIL PROTECTED] writes:
>
> [...]
>
>>  * When do you expect to have the network namespace into mainline ?
>
> My current goal is to finish my rebase against 2.6.linus_latest in the next couple of days after having figured out how to deal with sysfs.

Great news! I also have some questions about this updated version:

- Have you integrated the bug fixes and cleanups(*) Daniel wrote for your previous netns patchset (and the few glitches I reported too)?

  (*) available in LXC8 patchset

- Do you already have a public git repository set up for the rebase?
- If it is private, any plan to make it public soon? (That would be great)

> I have been reviewing more code than I know what to do with, and fighting some very strange bugs during the stabilization window. Which has kept me from doing additional development. Plus I have had a cold.

I hope you're getting better... and you'll be able to provide us the updated patchset very soon :)

> [...]
>
> If I read the results right it took a 32bit machine from AMD with a gigabit interface before you could measure a throughput difference. That isn't shabby for a non-optimized code path.

Indeed the throughput difference is not significant. It is very good to see that it stays constant when using the container. What I'm more worried about is the CPU load increase. But it seems we've identified some of the culprits.

This afternoon I had a look at why the bridge setup isn't better than the route setup (sections 2.3 and 2.4 of Daniel's report).

In the bridge case, we encounter the same problems as in the routes case. The oprofile profile is the same: the most demanding routines are pskb_expand_head and csum_partial_copy_generic.

pskb_expand_head() is also called by skb_cow(), but this time skb_cow() is called by netfilter's nf_bridge_copy_header(). We can avoid this copy by disabling the CONFIG_BRIDGE_NETFILTER option.

This copy is made even if netfilter is not used on the host. Maybe some optimizations can be made in netfilter's code to prevent this.

Regards,
Benjamin
Re: L2 network namespace benchmarking
Benjamin Thery [EMAIL PROTECTED] writes:

> Eric W. Biederman wrote:
>> Daniel Lezcano [EMAIL PROTECTED] writes:
>>
>> [...]
>>
>>>  * When do you expect to have the network namespace into mainline ?
>>
>> My current goal is to finish my rebase against 2.6.linus_latest in the next couple of days after having figured out how to deal with sysfs.
>
> Great news! I also have some questions about this updated version:
>
> - Have you integrated the bug fixes and cleanups(*) Daniel wrote for your previous netns patchset (and the few glitches I reported too)?

About half of them so far. It is my intention to incorporate all of them. They weren't all trivial to include. A couple of Daniel's patches address a real issue in the wrong way, so I have to give them some more thought.

>   (*) available in LXC8 patchset
>
> - Do you already have a public git repository set up for the rebase?
> - If it is private, any plan to make it public soon? (That would be great)

Yes. Where the current one is now.

>> I have been reviewing more code than I know what to do with, and fighting some very strange bugs during the stabilization window. Which has kept me from doing additional development. Plus I have had a cold.
>
> I hope you're getting better... and you'll be able to provide us the updated patchset very soon :)

Hopefully. I think I have fixed the last non-network regression I know about for 2.6.21-rcX. Which means I can begin to focus again.

> [...]
>
>> If I read the results right it took a 32bit machine from AMD with a gigabit interface before you could measure a throughput difference. That isn't shabby for a non-optimized code path.
>
> Indeed the throughput difference is not significant. It is very good to see that it stays constant when using the container. What I'm more worried about is the CPU load increase. But it seems we've identified some of the culprits.

Yes, and the good news is that they all seem to be in getting the packets to the network namespace.

> This afternoon I had a look at why the bridge setup isn't better than the route setup (sections 2.3 and 2.4 of Daniel's report).
>
> In the bridge case, we encounter the same problems as in the routes case. The oprofile profile is the same: the most demanding routines are pskb_expand_head and csum_partial_copy_generic.
>
> pskb_expand_head() is also called by skb_cow(), but this time skb_cow() is called by netfilter's nf_bridge_copy_header(). We can avoid this copy by disabling the CONFIG_BRIDGE_NETFILTER option.
>
> This copy is made even if netfilter is not used on the host. Maybe some optimizations can be made in netfilter's code to prevent this.

Sounds reasonable. I guess the next step is to get some numbers with CONFIG_BRIDGE_NETFILTER disabled. (So we don't hit that case, and just in case there are more.) I suspect the bridging code has a small enough user base right now that it just hasn't been optimized much.

Eric
Re: L2 network namespace benchmarking
Eric W. Biederman wrote:
> Daniel Lezcano [EMAIL PROTECTED] writes:
>
>> 3. General observations
>> -----------------------
>>
>> The objective of having no performance degradation when the network namespace is off in the kernel is reached in both solutions.
>>
>> When the network is used outside the container and the network namespace is compiled in, there is no performance degradation.
>>
>> Eric's patchset allows moving network devices between namespaces, and this is clearly a good feature, missing in Dmitry's patchset. This feature helps us to see that the network namespace code does not add overhead when using the physical network device directly inside the container.
>
> Assuming these results are not contradicted, this says that the extra dereference where we need it does not add measurably to the overhead in the Linux network stack.
>
> Performance wise this should be good enough to allow merging the code into the linux kernel, as it does not measurably affect networking when we do not have multiple containers in use.

I have a few questions about merging code into the linux kernel.

 * How do you plan to do that ?
 * When do you expect to have the network namespace into mainline ?
 * Are Dave Miller and Alexey Kuznetov aware of the network namespace ?
 * Did they see your patchset or even know it exists ?
 * Do you have any feedback from netdev about the network namespace ?

> Things are good enough that we can even consider not providing an option to compile the support out.
>
>> The loss of performance is very noticeable inside the container and seems to be directly related to the usage of the pair device and the specific network configuration needed for the container. When the packets are sent by the container, the mac address is for the pair device but the IP address is not owned by the host. That directly implies that the host has to act as a router and the packets have to be forwarded. That adds a lot of overhead.
>
> Well it adds measurable overhead.
>
>> A hack has been made in the ip_forward function to avoid a useless skb_cow when using the pair device/tunnel device, and the overhead is reduced by half.
>
> To be fully satisfactory, how we get the packets to the namespace still appears to need work.
>
> We have overhead in routing. That may simply be the cost of performing routing, or there may be some optimization opportunities there.
>
> We have about the same overhead when performing bridging, which I actually find more surprising, as the bridging code should involve less packet handling.

Yep. I will try to figure out what is happening.

> Ideally we can optimize the bridge code or something equivalent to it so that we can take one look at the destination mac address and know which network namespace we should be in. Potentially moving this work to hardware when the hardware supports multiple queues.
>
> If we can get the overhead out of the routing code that would be tremendous. However I think it may be more realistic to get the overhead out of the ethernet bridging code, where we know we don't need to modify the packet.

The routing was optimized for the loopback, no ? Why can't we do the same for the etun device ?
Re: L2 network namespace benchmarking
Daniel Lezcano [EMAIL PROTECTED] writes:

> Eric W. Biederman wrote:
>> Daniel Lezcano [EMAIL PROTECTED] writes:
>>
>>> 3. General observations
>>> -----------------------
>>>
>>> The objective of having no performance degradation when the network namespace is off in the kernel is reached in both solutions.
>>>
>>> When the network is used outside the container and the network namespace is compiled in, there is no performance degradation.
>>>
>>> Eric's patchset allows moving network devices between namespaces, and this is clearly a good feature, missing in Dmitry's patchset. This feature helps us to see that the network namespace code does not add overhead when using the physical network device directly inside the container.
>>
>> Assuming these results are not contradicted, this says that the extra dereference where we need it does not add measurably to the overhead in the Linux network stack.
>>
>> Performance wise this should be good enough to allow merging the code into the linux kernel, as it does not measurably affect networking when we do not have multiple containers in use.
>
> I have a few questions about merging code into the linux kernel.
>
>  * How do you plan to do that ?

One small comprehensible piece at a time. Basically some variant of etun should not be a problem to merge, then I have to get some part of the network namespace code merged, and the concept accepted. Once the basic acceptance occurs it just becomes a long slog of merging more and more patches.

>  * When do you expect to have the network namespace into mainline ?

My current goal is to finish my rebase against 2.6.linus_latest in the next couple of days after having figured out how to deal with sysfs.

I have been reviewing more code than I know what to do with, and fighting some very strange bugs during the stabilization window. Which has kept me from doing additional development. Plus I have had a cold.

>  * Are Dave Miller and Alexey Kuznetov aware of the network namespace ?

Aware yes, reviewed not yet. I believe Alexey is a little more familiar with the OpenVZ work. The high level concepts still apply.

>  * Did they see your patchset or even know it exists ?

Yes.

>  * Do you have any feedback from netdev about the network namespace ?

Not really. Except that Dave Miller wanted to review what I posted last time, but the timing was bad and he failed to get around to it.

>> To be fully satisfactory, how we get the packets to the namespace still appears to need work.
>>
>> We have overhead in routing. That may simply be the cost of performing routing, or there may be some optimization opportunities there.
>>
>> We have about the same overhead when performing bridging, which I actually find more surprising, as the bridging code should involve less packet handling.
>
> Yep. I will try to figure out what is happening.

Thanks.

>> Ideally we can optimize the bridge code or something equivalent to it so that we can take one look at the destination mac address and know which network namespace we should be in. Potentially moving this work to hardware when the hardware supports multiple queues.
>>
>> If we can get the overhead out of the routing code that would be tremendous. However I think it may be more realistic to get the overhead out of the ethernet bridging code, where we know we don't need to modify the packet.
>
> The routing was optimized for the loopback, no ? Why can't we do the same for the etun device ?

I have no problem with it if we can use valid optimizations. Avoiding a packet copy when the packet is marked as having a second copy somewhere else does not sound like a valid optimization to me.

Routing through both network namespaces, so that we can set up a dst cache entry that takes you to the final destination, is something I am willing to work with. Perhaps something that hits this piece of the etun driver, so we don't have to make a second set of routing decisions:

	if (skb->dst)
		skb->dst = dst_pop(skb->dst);	/* Allow for smart routing */

tcpdump at any phase of the process should be able to do the right thing.

Mostly I care right now in that it is interesting to know where the performance overhead is coming from. Unless it is something of a merge stopper I don't much care about how we are going to fix it yet, especially if it is only cross network namespace traffic.

If I read the results right it took a 32bit machine from AMD with a gigabit interface before you could measure a throughput difference. That isn't shabby for a non-optimized code path.

Eric
Re: L2 network namespace benchmarking
Kirill Korotaev [EMAIL PROTECTED] writes:

>> Ideally we can optimize the bridge code or something equivalent to it so that we can take one look at the destination mac address and know which network namespace we should be in. Potentially moving this work to hardware when the hardware supports multiple queues.
>
> yes, we can hack the bridge, so that packets coming out of eth devices can go directly to the container and get out of veth devices from inside the container.
>
>> If we can get the overhead out of the routing code that would be tremendous. However I think it may be more realistic to get the overhead out of the ethernet bridging code, where we know we don't need to modify the packet.
>
> Why not optimize both? :)

If the optimizations are safe and correct I don't have a problem. When we seem to have multiple copies of a packet in circulation and we skip what appears to be a required copy on write, I'm dubious. Although the more I look at the suggested optimization the less dubious I am, as it appears all we are skipping is a ttl decrement, and the cow flag exclusively applies to the data chunk and not the header chunk of the packet, whatever that means.

However we still need to guard against a loop in our routing table setup between multiple guests.

Eric
Re: L2 network namespace benchmarking
> If I read the results right it took a 32bit machine from AMD with a gigabit interface before you could measure a throughput difference. That isn't shabby for a non-optimized code path.

Just some paranoid ramblings - one needs to look beyond just whether or not the performance of a bulk transfer test (e.g. TCP_STREAM) remains able to hit link-rate. One has to also consider the change in service demand (the normalization of CPU util and throughput). Also, with functionality like TSO in place, the ability to pass very large things down the stack can help cover for a multitude of path-length sins.

And with either multiple 1G or 10G NICs becoming more and more prevalent, we have another one of those NIC speed vs CPU speed switch-overs, so maintaining single-NIC 1 gigabit throughput, while necessary, isn't (IMO) sufficient.

So, it becomes very important to go beyond just TCP_STREAM tests when evaluating these sorts of things. Another test to run would be the TCP_RR test. TCP_RR with single-byte request/response sizes will bypass the TSO stuff, and the transaction rate will be more directly affected by the change in path length than a TCP_STREAM test. It will also show up quite clearly in the service demand.

Now, with NICs doing interrupt coalescing, if the NIC is strapped poorly (IMO) then you may not see a change in transaction rate - it may be getting limited artificially by the NIC's interrupt coalescing. So, one has to fall back on service demand, or better yet, disable the interrupt coalescing. Otherwise, measuring peak aggregate request/response becomes necessary.

rick jones
don't be blinded by bit-rate
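As a starting point for the single-byte TCP_RR run suggested above, the sketch below builds a netperf command line without executing it. The server name `bench-server` and the 30 second duration are placeholder assumptions; `-c` and `-C` ask netperf to report local and remote CPU utilization, which it needs in order to print service demand.

```python
import shlex


def netperf_rr_argv(server, duration_s=30, request_bytes=1, response_bytes=1):
    """Build a netperf TCP_RR invocation with CPU measurement enabled."""
    return [
        "netperf",
        "-H", server,            # remote netserver host (placeholder name)
        "-t", "TCP_RR",          # request/response test instead of TCP_STREAM
        "-l", str(duration_s),   # test length in seconds
        "-c", "-C",              # measure local and remote CPU utilization
        "--",                    # test-specific options follow
        "-r", f"{request_bytes},{response_bytes}",
    ]


if __name__ == "__main__":
    print(shlex.join(netperf_rr_argv("bench-server")))
```

Running the same command on the vanilla kernel and inside a container gives transaction rates and service demands that can be compared directly.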
Re: L2 network namespace benchmarking
Rick Jones wrote:
>> If I read the results right it took a 32bit machine from AMD with a gigabit interface before you could measure a throughput difference. That isn't shabby for a non-optimized code path.
>
> Just some paranoid ramblings - one needs to look beyond just whether or not the performance of a bulk transfer test (e.g. TCP_STREAM) remains able to hit link-rate. One has to also consider the change in service demand (the normalization of CPU util and throughput). Also, with functionality like TSO in place, the ability to pass very large things down the stack can help cover for a multitude of path-length sins.
>
> And with either multiple 1G or 10G NICs becoming more and more prevalent, we have another one of those NIC speed vs CPU speed switch-overs, so maintaining single-NIC 1 gigabit throughput, while necessary, isn't (IMO) sufficient.
>
> So, it becomes very important to go beyond just TCP_STREAM tests when evaluating these sorts of things. Another test to run would be the TCP_RR test. TCP_RR with single-byte request/response sizes will bypass the TSO stuff, and the transaction rate will be more directly affected by the change in path length than a TCP_STREAM test. It will also show up quite clearly in the service demand.
>
> Now, with NICs doing interrupt coalescing, if the NIC is strapped poorly (IMO) then you may not see a change in transaction rate - it may be getting limited artificially by the NIC's interrupt coalescing. So, one has to fall back on service demand, or better yet, disable the interrupt coalescing. Otherwise, measuring peak aggregate request/response becomes necessary.
>
> rick jones
> don't be blinded by bit-rate

Thanks Rick,

Do you have any pointers to help with benchmarking the network, perhaps a checklist or some scripts for netperf ?

Regards.

  -- Daniel
Re: L2 network namespace benchmarking
> Do you have any pointers to help with benchmarking the network, perhaps a checklist or some scripts for netperf ?

There are some scripts in doc/examples but they are probably a bit long in the tooth by now.

The main writeup _I_ have on netperf would be the manual, which was recently updated for the 2.4.3 release:

http://www.netperf.org/svn/netperf2/tags/netperf-2.4.3/doc/netperf.html

or the current top of trunk:

http://www.netperf.org/svn/netperf2/trunk/doc/netperf.html

There is also a [EMAIL PROTECTED] mailing list which one can join and have discussions about netperf, and a [EMAIL PROTECTED] if one wants to discuss actual netperf (netperf2 or netperf4) development.

rick jones
L2 network namespace benchmarking
Hi,

I did some benchmarking on the existing L2 network namespaces.

These patches are included in the lxc patchset at:
http://lxc.sourceforge.net/patches/2.6.20
The lxc7 patchset series contains Dmitry's patchset
The lxc8 patchset series contains Eric's patchset

Here are the scenarios I set up in order to do some simple benchmarking on the network namespace. I tested three kernels:

 * Vanilla kernel 2.6.20

 * lxc7 with Dmitry's patchset based on 2.6.20
   * L3 network namespace has been removed to do testing

 * lxc8 with Eric's patchset based on 2.6.20

I didn't do any tests on Linux-Vserver because it is an L3 namespace and is not comparable with the L2 namespace implementation. If anyone is interested in Linux-Vserver performance, results can be found at http://lxc.sf.net. Roughly, we know there is no performance degradation.

For each kernel, several configurations were tested:

 * vanilla: obviously, only one configuration was tested, for reference values.

 * lxc7, network namespace
   - compiled out
   - compiled in
     - without container
     - inside a container with ip_forward, route and veth
     - inside a container with a bridge and veth

 * lxc8, network namespace
   - compiled out
   - compiled in
     - without container
     - inside a container with a real network device (eth1 was moved into the container instead of using an etun device)
     - inside a container with ip_forward, route and etun
     - inside a container with a bridge and etun

Each benchmark was done with 2 machines running netperf and tbench. A dedicated machine with a RH4 kernel ran the bench servers.

For each bench, netperf and tbench, the tests were run on:

 * Intel Xeon EM64T, bi-processor 2.8 GHz with hyperthreading activated, 4 GB of RAM and Gigabit NIC (tg3)
 * AMD Athlon MP 1800+, bi-processor 1.5 GHz, 1 GB of RAM and Gigabit NIC (dl2000)

Each test is run on both machines in order to have a CPU-relative overhead.
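# bench on vanilla
==================

----------------------------------------------------
| Netperf   | CPU usage (%) | Throughput (Mbits/s) |
----------------------------------------------------
| on xeon   | 5.99          | 941.38               |
----------------------------------------------------
| on athlon | 28.17         | 844.82               |
----------------------------------------------------

-------------------------------------
| Tbench    | Throughput (MBytes/s) |
-------------------------------------
| on xeon   | 66.35                 |
-------------------------------------
| on athlon | 65.31                 |
-------------------------------------

# bench from Dmitry's patchset
==============================

1 - with net_ns compiled out
----------------------------

-------------------------------------------------------------------------
| Netperf   | CPU usage (%) / overhead | Throughput (Mbits/s) / changed |
-------------------------------------------------------------------------
| on xeon   | 5.93 / -1 %              | 941.32 / 0 %                   |
-------------------------------------------------------------------------
| on athlon | 28.89 / +2.5 %           | 842.78 / -0.2 %                |
-------------------------------------------------------------------------

-----------------------------------------------
| Tbench    | Throughput (MBytes/s) / changed |
-----------------------------------------------
| on xeon   | 67.00 / +0.9 %                  |
-----------------------------------------------
| on athlon | 65.45 / 0 %                     |
-----------------------------------------------

Observation : no noticeable overhead

2 - with net_ns compiled in
---------------------------

2.1 - without container
-----------------------

-------------------------------------------------------------------------
| Netperf   | CPU usage (%) / overhead | Throughput (Mbits/s) / changed |
-------------------------------------------------------------------------
| on xeon   | 6.23 / +4 %              | 941.35 / 0 %                   |
-------------------------------------------------------------------------
| on athlon | 28.83 / +2.3 %           | 850.76 / +0.7 %                |
-------------------------------------------------------------------------

-----------------------------------------------
| Tbench    | Throughput (MBytes/s) / changed |
-----------------------------------------------
| on xeon   | 67.00 / 0 %                     |
-----------------------------------------------
| on athlon | 65.45 / 0 %                     |
-----------------------------------------------

Observation : no noticeable overhead

2.2 - inside the container with veth and routes
-----------------------------------------------

-------------------------------------------------------------------------
| Netperf   | CPU usage (%) / overhead | Throughput (Mbits/s) / changed |
-------------------------------------------------------------------------
| on xeon   | 17.14 / +186.1 %         |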
Re: L2 network namespace benchmarking
On Wed, Mar 28, 2007 at 12:16:34AM +0200, Daniel Lezcano wrote:
> Hi,

[ cut ]
Re: L2 network namespace benchmarking
Daniel Lezcano writes:

> 3. General observations
> -----------------------
>
> The objective of having no performance degradation when the network namespace is off in the kernel is reached in both solutions.
>
> When the network is used outside the container and the network namespace is compiled in, there is no performance degradation.
>
> Eric's patchset allows moving network devices between namespaces, and this is clearly a good feature, missing in Dmitry's patchset. This feature helps us to see that the network namespace code does not add overhead when the physical network device is used directly inside the container.

Assuming these results are not contradicted, this says that the extra dereference where we need it does not add measurably to the overhead in the Linux network stack.

Performance wise this should be good enough to allow merging the code into the Linux kernel, as it does not measurably affect networking when we do not have multiple containers in use.

Things are good enough that we can even consider not providing an option to compile the support out.

> The loss of performance is very noticeable inside the container and seems to be directly related to the usage of the pair device and the specific network configuration needed for the container. When packets are sent by the container, the MAC address is that of the pair device, but the IP address is not owned by the host. That directly implies that the host has to act as a router and the packets have to be forwarded. That adds a lot of overhead.

Well it adds measurable overhead.

> A hack has been made in the ip_forward function to avoid a useless skb_cow when using the pair device/tunnel device, and the overhead is reduced by half.

To be fully satisfactory, how we get the packets to the namespace still appears to need work.

We have overhead in routing. That may simply be the cost of performing routing, or there may be some optimization opportunities there.
We have about the same overhead when performing bridging, which I actually find more surprising, as the bridging code should involve less packet handling.

Ideally we can optimize the bridge code, or something equivalent to it, so that we can take one look at the destination MAC address and know which network namespace we should be in, potentially moving this work to hardware when the hardware supports multiple queues.

If we can get the overhead out of the routing code that would be tremendous. However, I think it may be more realistic to get the overhead out of the Ethernet bridging code, where we know we don't need to modify the packet.

Eric
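The idea of taking one look at the destination MAC address to pick the network namespace can be sketched as a single table lookup. The Python below is only a user-space illustration of that lookup; it is not the bridge code and not part of either patchset, and the names MacDemux, register and lookup are hypothetical.

```python
from typing import Dict, Optional

ETH_HEADER_LEN = 14  # destination MAC (6) + source MAC (6) + EtherType (2)


def parse_mac(text: str) -> bytes:
    """Convert 'aa:bb:cc:dd:ee:ff' into its 6 raw bytes."""
    parts = text.split(":")
    if len(parts) != 6:
        raise ValueError(f"not a MAC address: {text!r}")
    return bytes(int(part, 16) for part in parts)


class MacDemux:
    """Map a frame's destination MAC address to a namespace label."""

    def __init__(self) -> None:
        self._table: Dict[bytes, str] = {}

    def register(self, mac: str, namespace: str) -> None:
        self._table[parse_mac(mac)] = namespace

    def lookup(self, frame: bytes) -> Optional[str]:
        # The destination MAC is the first 6 bytes of an Ethernet frame,
        # so the frame is only inspected, never modified.
        if len(frame) < ETH_HEADER_LEN:
            return None
        return self._table.get(frame[:6])


if __name__ == "__main__":
    demux = MacDemux()
    demux.register("02:00:00:00:00:01", "container-a")
    frame = (
        parse_mac("02:00:00:00:00:01")    # destination
        + parse_mac("02:00:00:00:00:ff")  # source
        + b"\x08\x00"                     # EtherType: IPv4
        + b"payload"
    )
    print(demux.lookup(frame))
```

The point of the sketch is that the decision needs only the first six bytes of the frame, which is why the bridging path looks like the more realistic place to remove overhead than the routing path.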
Re: L2 network namespace benchmarking
Herbert Poetzl wrote:
> On Wed, Mar 28, 2007 at 12:16:34AM +0200, Daniel Lezcano wrote:
>> Hi,

[ cut ]

>> 3. General observations
>> -----------------------
>>
>> The objective of having no performance degradation when the network namespace is off in the kernel is reached in both solutions.
>>
>> When the network is used outside the container and the network namespace is compiled in, there is no performance degradation.
>>
>> Eric's patchset allows moving network devices between namespaces, and this is clearly a good feature, missing in Dmitry's patchset. This feature helps us to see that the network namespace code does not add overhead when the physical network device is used directly inside the container.
>>
>> The loss of performance is very noticeable inside the container and seems to be directly related to the usage of the pair device and the specific network configuration needed for the container. When packets are sent by the container, the MAC address is that of the pair device, but the IP address is not owned by the host. That directly implies that the host has to act as a router and the packets have to be forwarded. That adds a lot of overhead.
>>
>> A hack has been made in the ip_forward function to avoid a useless skb_cow when using the pair device/tunnel device, and the overhead is reduced by half.
>
> would it be possible to do some tests regarding scalability?
>
> i.e. I would be interested in what the following would look like:
>
>   10 connections on a single host (in parallel, overall performance)
>   10 connections from the same net space
>   10 connections from 10 different net spaces
>   (i.e. one connection from each space)
>
> we can assume that L3 isolation will give similar results to the first case, but if needed, we can provide a patch to test this too ...

Ok. Assuming Eric's and Dmitry's patchsets are very similar, I will focus on Eric's patchset because it is more mature and easier to set up. I will have a look at the bridge optimization before doing that.

> PS: great work! tx!

Thanks.
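The three scalability cases requested above differ only in where the ten parallel connections originate. A minimal Python sketch of that plan follows; the scenario names, the "host"/"nsN" labels and both function names are hypothetical, and nothing here starts real traffic.

```python
from typing import List, Tuple

SCENARIOS = ("single_host", "same_namespace", "one_per_namespace")


def plan_connections(scenario: str, count: int = 10) -> List[Tuple[str, int]]:
    """Return (origin, connection index within that origin) pairs."""
    if scenario == "single_host":
        # All connections start on the host, outside any namespace.
        return [("host", i) for i in range(count)]
    if scenario == "same_namespace":
        # All connections start inside one network namespace.
        return [("ns0", i) for i in range(count)]
    if scenario == "one_per_namespace":
        # One connection from each of `count` different namespaces.
        return [(f"ns{i}", 0) for i in range(count)]
    raise ValueError(f"unknown scenario: {scenario!r}")


def aggregate_throughput(per_connection: List[float]) -> float:
    """Overall throughput is the sum over the parallel connections."""
    return sum(per_connection)


if __name__ == "__main__":
    for name in SCENARIOS:
        origins = {origin for origin, _ in plan_connections(name)}
        print(f"{name}: 10 connections from {len(origins)} origin(s)")
```

Comparing the summed throughput and CPU usage across the three plans would show whether the cost grows with the number of namespaces or only with the number of connections.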