Re: [csit-dev] [vpp-dev] Parallel test execution in VPP Test Framework
+1

-Maciek

On 28 Jul 2018, at 13:28, Damjan Marion <dmar...@me.com> wrote:

Dear All,

My personal preference is that the make test framework implements the cpu assignment code. It shouldn't be rocket science to parse /sys/devices/system/cpu/online and give one cpu to each instance. It will also help the test framework understand how many parallel jobs it can run...

Enforcing single cpu assignment in vpp is done intentionally, to avoid cross-NUMA memory allocation. If main-core is not specified, vpp simply uses cpu core 1 (unless only 0 exists).

While adding something like "cpu { main-core any }" should be quite straightforward, it will have broken behaviour when dpdk is loaded and it will just confuse people. Also, we will need to go back to the drawing board when we decide to run multiple workers in make test, as the logic there is more complex and will likely require rework of the thread placement code.

-- Damjan

On 27 Jul 2018, at 20:46, Peter Mikus via Lists.Fd.Io <pmikus=cisco@lists.fd.io> wrote:

Hello,

> What is the “significant problem” you’re running into?

The problem can be better described as follows: when python spawns N instances of the VPP process, all of them are, for an unknown reason, placed with affinity 0x2 (binary 10). This can be verified with taskset -p . CFS then places all VPP processes on the same core, which is inefficient on a multicore jenkins slave container. The default vpp startup.conf is not modified, so there is no input telling vpp where to pin its threads. One might simply think this is related to the python multiprocessing/subprocess.Popen code hard-setting the affinity mask to 0x2.

There are multiple workarounds that Juraj or Maciek proposed, but none of them answers why this is happening.
Peter Mikus
Engineer – Software
Cisco Systems Limited

From: csit-...@lists.fd.io [mailto:csit-...@lists.fd.io] On Behalf Of Maciek Konstantynowicz (mkonstan) via Lists.Fd.Io
Sent: Friday, July 27, 2018 6:53 PM
To: Alec Hothan (ahothan) <ahot...@cisco.com>; Juraj Linkeš <juraj.lin...@pantheon.tech>
Cc: csit-...@lists.fd.io
Subject: Re: [csit-dev] [vpp-dev] Parallel test execution in VPP Test Framework

Alec, this is about make test and not real packet forwarding, per Juraj’s patch [1].

Juraj, my understanding is that if you’re starting VPP without specifying core placement in startup.conf [2] cpu {..}, then Linux CFS will place the threads onto the available cpu cores. If you’re saying this is not the case, and indeed the wiki comment indicates this, then the way to address it is to specify a different core for the main.c thread per vpp instance.

What is the “significant problem” you’re running into? Are tests not executing in parallel using python multiprocessing, are the VPPs having issues, or something else? Could you describe it a bit more?

-Maciek

[1] https://gerrit.fd.io/r/#/c/13491/
[2] https://git.fd.io/vpp/tree/src/vpp/conf/startup.conf

On 27 Jul 2018, at 17:23, Alec Hothan (ahothan) <ahot...@cisco.com> wrote:

Hi Juraj,
How many instances and what level of performance are you looking at?
Even if you assign different cores to each VPP instance, results can be skewed by interference at the LLC and PCIe/NIC level (this can be somewhat mitigated by running on separate sockets).

Alec

From: vpp-dev@lists.fd.io on behalf of Juraj Linkeš <juraj.lin...@pantheon.tech>
Date: Friday, July 27, 2018 at 7:25 AM
To: "Maciek Konstantynowicz (mkonstan)" <mkons...@cisco.com>
Cc: "vpp-dev@lists.fd.io" <vpp-dev@lists.fd.io>, csit-dev <csit-...@lists.fd.io>
Subject: Re: [vpp-dev] Parallel test execution in VPP Test Framework

Hi Maciek and vpp-devs,

I've run into a significant problem regarding VPP assignment to cores. All VPPs that are spawned are assigned to core 1. I looked at https://wiki.fd.io/view/VPP/Command-line_Arguments and I guess it's because that's the default behavior of VPP (the dpdk coremask is not configured, and "Note that the "main" thread always occupies the lowest core-id specified in the DPDK [process-level] coremask.").

Is my reading of the config options accurate?

Obviously, having all VPP instances run on the same core goes against running the tests on multiple cores. There are a couple of solutions that come to mind:

• Assign VPP instances to cores manually. With possibly multiple jobs running on a given host, this creates a situation where the different jobs don't know which cores are already occupied (and by how many VPP instances) and thus introduces additional challenges to solve.
• Add an option to override this default behavior and let the Linux CFS scheduler assign VPPs to cores or s
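Peter's check (reading a process's mask with taskset -p) can also be done from Python. A minimal sketch, assuming Linux, that decodes taskset-style masks such as 0x2 (bit n set means CPU n is allowed, so 0x2 is CPU 1 only) and reads the mask a child inherits when started with subprocess.Popen. The helper names are hypothetical and not part of the test framework.

```python
import os
import subprocess
import sys


def mask_to_cpus(mask):
    """Decode a taskset-style affinity mask: bit n set means CPU n is allowed."""
    return [cpu for cpu in range(mask.bit_length()) if (mask >> cpu) & 1]


def cpus_to_mask(cpus):
    """Encode an iterable of CPU ids as a taskset-style affinity mask."""
    mask = 0
    for cpu in cpus:
        mask |= 1 << cpu
    return mask


def child_affinity(cmd):
    """Start cmd, read the child's CPU affinity, then stop it (Linux only)."""
    proc = subprocess.Popen(cmd)
    try:
        return os.sched_getaffinity(proc.pid)
    finally:
        proc.terminate()
        proc.wait()


if __name__ == "__main__":
    sleeper = [sys.executable, "-c", "import time; time.sleep(5)"]
    parent = cpus_to_mask(os.sched_getaffinity(0))
    child = cpus_to_mask(child_affinity(sleeper))
    # If these match, the spawning code did not narrow the child's mask.
    print(f"parent {parent:#x}, child {child:#x}")
```

If a plain child reports the parent's mask while taskset -p on a running vpp shows 0x2, the narrowing happens inside VPP itself rather than in the Python spawning code.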
Re: [csit-dev] [vpp-dev] Parallel test execution in VPP Test Framework
Hi,

A couple of corrections/additions:

Python spawns processes with proper CFS scheduling (I've tested this), so it's VPP that's overriding CFS scheduling.

Damjan, assigning cpus to VPPs is not the problem. The problem is when multiple make test frameworks in different Jenkins slaves try to do the same thing: e.g. when a framework assigns any n cores to different VPP instances, the other framework instances running in other Jenkins slaves don't know which cores are currently assigned to how many VPPs. I guess I could work this out from all running VPP pids by looking at their affinity and assign cores based on that, but I wanted to know about other approaches. I'll look into this in the meantime.

VPP Test Framework doesn't load the dpdk plugin, so does it make sense to use the CFS scheduler by default when it isn't loaded? Or maybe just use the CFS scheduler by default when the dpdk plugin is not loaded and no workers are used?

Are there plans for running multiple workers in make test? I don't see that in the framework at the moment, but maybe I'm missing something.

Thanks,
Juraj

From: Damjan Marion [mailto:dmar...@me.com]
Sent: Saturday, July 28, 2018 1:28 PM
To: Peter Mikus -X (pmikus - PANTHEON TECHNOLOGIES at Cisco)
Cc: Maciek Konstantynowicz; Alec Hothan (ahothan); Juraj Linkeš; vpp-dev@lists.fd.io
Subject: Re: [csit-dev] [vpp-dev] Parallel test execution in VPP Test Framework
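Juraj's idea of deriving the current load from the affinity of running VPP processes can be sketched as follows. This is a minimal sketch, assuming Linux, that VPP processes can be matched by the name `vpp` in /proc/<pid>/comm (an assumption; check what `ps` shows on your slaves), and that only a process pinned to exactly one CPU counts as occupying it. pids_by_name, pinned_per_core and least_loaded are hypothetical helpers. Two limitations apply: /proc only lists processes in the caller's PID namespace, so VPPs in other Jenkins slave containers with their own namespaces stay invisible, and two frameworks choosing at the same moment can still pick the same core because nothing here takes a lock.

```python
import os
from collections import Counter


def pids_by_name(name):
    """Return PIDs whose /proc/<pid>/comm equals name (Linux only)."""
    pids = []
    for entry in os.listdir("/proc"):
        if not entry.isdigit():
            continue
        try:
            with open(f"/proc/{entry}/comm") as f:
                if f.read().strip() == name:
                    pids.append(int(entry))
        except OSError:
            continue  # process exited or is not readable
    return pids


def pinned_per_core(affinities):
    """Count processes restricted to exactly one CPU, per CPU.

    Processes allowed on several CPUs are left to the scheduler and are not
    counted against any core.
    """
    usage = Counter()
    for cpus in affinities:
        if len(cpus) == 1:
            (cpu,) = cpus
            usage[cpu] += 1
    return usage


def least_loaded(online, usage):
    """Pick the online CPU with the fewest pinned processes (lowest id on ties)."""
    return min(online, key=lambda cpu: (usage[cpu], cpu))


if __name__ == "__main__":
    affinities = []
    for pid in pids_by_name("vpp"):  # process name is an assumption
        try:
            affinities.append(os.sched_getaffinity(pid))
        except OSError:
            continue  # process exited between listing and querying
    online = sorted(os.sched_getaffinity(0))
    print("next core:", least_loaded(online, pinned_per_core(affinities)))
```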
Re: [csit-dev] [vpp-dev] Parallel test execution in VPP Test Framework
Dear All,

My personal preference is that the make test framework implements the cpu assignment code. It shouldn't be rocket science to parse /sys/devices/system/cpu/online and give one cpu to each instance. It will also help the test framework understand how many parallel jobs it can run...

Enforcing single cpu assignment in vpp is done intentionally, to avoid cross-NUMA memory allocation. If main-core is not specified, vpp simply uses cpu core 1 (unless only 0 exists).

While adding something like "cpu { main-core any }" should be quite straightforward, it will have broken behaviour when dpdk is loaded and it will just confuse people. Also, we will need to go back to the drawing board when we decide to run multiple workers in make test, as the logic there is more complex and will likely require rework of the thread placement code.

-- Damjan
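Damjan's suggestion (parse /sys/devices/system/cpu/online and give one cpu to each instance) could look roughly like this. A minimal sketch, assuming the kernel's comma-separated list-of-ranges format (e.g. "0-3,6") and emitting the `cpu { main-core N }` stanza he mentions; the helper names are hypothetical. One caveat: inside a container the sysfs file may list all host CPUs even when a cpuset restricts the container, whereas os.sched_getaffinity(0) reports the CPUs this process may actually use.

```python
def parse_cpu_list(text):
    """Parse the kernel CPU list format used by /sys/devices/system/cpu/online,
    e.g. "0-3,6,8-9"."""
    cpus = []
    for chunk in text.strip().split(","):
        if not chunk:
            continue
        if "-" in chunk:
            first, last = chunk.split("-")
            cpus.extend(range(int(first), int(last) + 1))
        else:
            cpus.append(int(chunk))
    return cpus


def online_cpus(path="/sys/devices/system/cpu/online"):
    with open(path) as f:
        return parse_cpu_list(f.read())


def assign_cores(num_instances, cpus):
    """Give each VPP instance its own CPU; refuse to oversubscribe."""
    if num_instances > len(cpus):
        raise ValueError(f"{num_instances} instances but only {len(cpus)} CPUs")
    return cpus[:num_instances]


def cpu_stanza(main_core):
    """startup.conf fragment pinning the VPP main thread to main_core."""
    return f"cpu {{ main-core {main_core} }}"


if __name__ == "__main__":
    cpus = online_cpus()
    # The CPU count doubles as an upper bound on parallel jobs, as Damjan notes.
    print("parallel jobs possible:", len(cpus))
    for i, core in enumerate(assign_cores(min(4, len(cpus)), cpus)):
        print(f"instance {i}: {cpu_stanza(core)}")
```

Refusing to oversubscribe is a design choice: it keeps the one-cpu-per-instance rule explicit instead of silently stacking instances on one core.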
Re: [csit-dev] [vpp-dev] Parallel test execution in VPP Test Framework
Guys,

The current behavior is probably not what the code author intended. It’s easy to change.

I’ve already offered under separate cover to push a draft patch which won’t do this under any circumstances, so folks can kick the tires. In its final form, this would be controlled by a command-line argument of the form “... cpu { no-thread-affinity } ...”

HTH...
Dave

From: vpp-dev@lists.fd.io On Behalf Of Peter Mikus via Lists.Fd.Io
Sent: Friday, July 27, 2018 2:46 PM
To: Maciek Konstantynowicz (mkonstan); Alec Hothan (ahothan); Juraj Linkeš; vpp-dev@lists.fd.io
Cc: vpp-dev@lists.fd.io
Subject: Re: [csit-dev] [vpp-dev] Parallel test execution in VPP Test Framework
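Whatever form the option finally takes, make test would pass it to VPP when launching each instance. A minimal sketch, assuming VPP accepts startup.conf stanzas as command-line arguments (as in `vpp unix { nodaemon }`), using a per-instance `api-segment { prefix ... }` so parallel instances keep separate shared memory, and using only the `main-core` setting Damjan describes. Dave's `no-thread-affinity` is deliberately left out because it is only a draft in this thread. vpp_cmdline and start_vpp are hypothetical helpers; the binary path and prefix are placeholders.

```python
import subprocess


def vpp_cmdline(vpp_bin, shm_prefix, main_core=None):
    """Build a VPP argv that passes startup.conf stanzas as arguments."""
    args = [vpp_bin,
            "unix", "{", "nodaemon", "}",
            "api-segment", "{", "prefix", shm_prefix, "}"]
    if main_core is not None:
        # Pin the main thread explicitly instead of taking the default (core 1).
        args += ["cpu", "{", "main-core", str(main_core), "}"]
    return args


def start_vpp(vpp_bin, shm_prefix, main_core=None):
    """Launch one VPP instance; the caller owns the returned Popen object."""
    return subprocess.Popen(vpp_cmdline(vpp_bin, shm_prefix, main_core))
```

Leaving `main_core` as None reproduces today's behaviour, so the pinning can be introduced per job without changing other callers.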
Re: [csit-dev] [vpp-dev] Parallel test execution in VPP Test Framework
Hello,

> What is the “significant problem” you’re running into?

The problem can be better described as follows: when python spawns N instances of the VPP process, all of them are, for an unknown reason, placed with affinity 0x2 (binary 10). This can be verified with taskset -p . CFS then places all VPP processes on the same core, which is inefficient on a multicore jenkins slave container. The default vpp startup.conf is not modified, so there is no input telling vpp where to pin its threads. One might simply think this is related to the python multiprocessing/subprocess.Popen code hard-setting the affinity mask to 0x2.

There are multiple workarounds that Juraj or Maciek proposed, but none of them answers why this is happening.

Peter Mikus
Engineer – Software
Cisco Systems Limited

From: csit-...@lists.fd.io [mailto:csit-...@lists.fd.io] On Behalf Of Maciek Konstantynowicz (mkonstan) via Lists.Fd.Io
Sent: Friday, July 27, 2018 6:53 PM
To: Alec Hothan (ahothan); Juraj Linkeš
Cc: csit-...@lists.fd.io
Subject: Re: [csit-dev] [vpp-dev] Parallel test execution in VPP Test Framework

Alec, this is about make test and not real packet forwarding, per Juraj’s patch [1].

Juraj, my understanding is that if you’re starting VPP without specifying core placement in startup.conf [2] cpu {..}, then Linux CFS will place the threads onto the available cpu cores. If you’re saying this is not the case, and indeed the wiki comment indicates this, then the way to address it is to specify a different core for the main.c thread per vpp instance.

What is the “significant problem” you’re running into? Are tests not executing in parallel using python multiprocessing, are the VPPs having issues, or something else? Could you describe it a bit more?
-Maciek

[1] https://gerrit.fd.io/r/#/c/13491/
[2] https://git.fd.io/vpp/tree/src/vpp/conf/startup.conf

On 27 Jul 2018, at 17:23, Alec Hothan (ahothan) <ahot...@cisco.com> wrote:

Hi Juraj,
How many instances and what level of performance are you looking at?
Even if you assign different cores to each VPP instance, results can be skewed by interference at the LLC and PCIe/NIC level (this can be somewhat mitigated by running on separate sockets).

Alec

From: vpp-dev@lists.fd.io on behalf of Juraj Linkeš <juraj.lin...@pantheon.tech>
Date: Friday, July 27, 2018 at 7:25 AM
To: "Maciek Konstantynowicz (mkonstan)" <mkons...@cisco.com>
Cc: "vpp-dev@lists.fd.io" <vpp-dev@lists.fd.io>, csit-dev <csit-...@lists.fd.io>
Subject: Re: [vpp-dev] Parallel test execution in VPP Test Framework

Hi Maciek and vpp-devs,

I've run into a significant problem regarding VPP assignment to cores. All VPPs that are spawned are assigned to core 1. I looked at https://wiki.fd.io/view/VPP/Command-line_Arguments and I guess it's because that's the default behavior of VPP (the dpdk coremask is not configured, and "Note that the "main" thread always occupies the lowest core-id specified in the DPDK [process-level] coremask.").

Is my reading of the config options accurate?

Obviously, having all VPP instances run on the same core goes against running the tests on multiple cores. There are a couple of solutions that come to mind:

• Assign VPP instances to cores manually. With possibly multiple jobs running on a given host, this creates a situation where the different jobs don't know which cores are already occupied (and by how many VPP instances) and thus introduces additional challenges to solve.
• Add an option to override this default behavior and let the Linux CFS scheduler assign VPPs to cores, or something similar where VPPs would land on different cores.

Is there some other solution?

Vpp-devs, what do you think about the second solution?
Would it be possible?

Thanks,
Juraj

From: Maciek Konstantynowicz (mkonstan) [mailto:mkons...@cisco.com]
Sent: Wednesday, July 25, 2018 1:10 PM
To: Juraj Linkeš <juraj.lin...@pantheon.tech>
Cc: vpp-dev@lists.fd.io; csit-dev <csit-...@lists.fd.io>
Subject: Re: [vpp-dev] Parallel test execution in VPP Test Framework

On 19 Jul 2018, at 15:44, Juraj Linkeš <juraj.lin...@pantheon.tech> wrote:

Hi VPP devs,

I'm implementing parallel execution of tests in VPP Test Framework (the patch is here https://gerrit.fd.io/r/#/c/13491/) and the last big outstanding question is how scalable the parallelization actually is.

That’s a good question. What do the tests say? :)

The tests are spawning one VPP instance per VPPTestCase class

How many VPP instances are spawned and run in parallel? Because assuming there is at least one VPPTestCase class per test_, that’s 70 VPP instances ..

and the question is - how do the required compute resources per each VPP instance (cpu,
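Maciek's rough figure of 70 VPP instances makes the scaling question concrete. A rough sketch, assuming a hypothetical 8-CPU slave and simple round-robin placement (neither comes from the thread): with 70 instances the busiest CPU ends up hosting 9 of them, so instances still share cores heavily, and, as Alec points out, even one instance per core still shares the LLC. round_robin and max_per_cpu are hypothetical helpers.

```python
from collections import Counter


def round_robin(num_instances, cpus):
    """Map instance i to cpus[i % len(cpus)]."""
    return [cpus[i % len(cpus)] for i in range(num_instances)]


def max_per_cpu(assignment):
    """Largest number of instances sharing a single CPU."""
    return max(Counter(assignment).values())


if __name__ == "__main__":
    plan = round_robin(70, list(range(8)))
    print("instances on busiest CPU:", max_per_cpu(plan))
```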