Re: [Qemu-devel] [RFC] exec: eliminate ram naming issue as migration
> -Original Message- > From: Paolo Bonzini [mailto:pbonz...@redhat.com] > Sent: Monday, February 26, 2018 10:43 PM > To: Igor Mammedov; Tan, Jianfeng > Cc: Jason Wang; Maxime Coquelin; qemu-devel@nongnu.org; Michael S . > Tsirkin > Subject: Re: [Qemu-devel] [RFC] exec: eliminate ram naming issue as > migration > > On 26/02/2018 13:55, Igor Mammedov wrote: > >>> So how about just adding a new option --mem-share to decide if that's a > >>> private memory or shared memory? That seems much straightforward > way > > Above options are legacy (which we can't remove for compat reasons), > > their replacement is 'memory-backend-file' backend which has all of > > the above including 'share' property. > > More precisely, we have added "-object memory-backend-file" to avoid > proliferation of options related to memory. Besides unifying the cases > of 1 and >1 NUMA node, using -object also has the advantage of > supporting memory hotplug. > > You wrote "I find adding a backend for nonnuma pc RAM is roundabout way" > but basically the command line says "this VM has only one NUMA node, > backed by this memory object" which is a precise description of what the > VM memory looks like. > > > So just add 'memdev' property to machine and reuse memory-backend-file > > with it instead of duplicating functionality in the legacy code. > > That would however also have a different RAMBlock id, effectively > producing the same output as "-numa node,memdev=...". Is it possible that we force that RAMBlock id to be "pc.ram"? > > I think this should be solved at the libvirt level. Libvirt should > write in the migration XML cookie whether the VM is using -object or > -mem-path to declare its memory, and newly-started VMs should always use > -object. This won't fix the problem for VMs that are already running, > but it will fix it the next time they are started. In this case, we return to the start point :-) Thanks, Jianfeng
Re: [Qemu-devel] [RFC] exec: eliminate ram naming issue as migration
> -Original Message- > From: Igor Mammedov [mailto:imamm...@redhat.com] > Sent: Monday, February 26, 2018 8:56 PM > To: Tan, Jianfeng > Cc: Paolo Bonzini; Jason Wang; Maxime Coquelin; qemu-devel@nongnu.org; > Michael S . Tsirkin > Subject: Re: [Qemu-devel] [RFC] exec: eliminate ram naming issue as > migration > > On Sat, 24 Feb 2018 03:11:30 + > "Tan, Jianfeng" wrote: > > > > -Original Message- > > > From: Tan, Jianfeng > > > Sent: Saturday, February 24, 2018 11:08 AM > > > To: 'Igor Mammedov' > > > Cc: Paolo Bonzini; Jason Wang; Maxime Coquelin; qemu- > de...@nongnu.org; > > > Michael S . Tsirkin > > > Subject: RE: [Qemu-devel] [RFC] exec: eliminate ram naming issue as > > > migration > > > > > > Hi Igor and all, > > > > > > > -Original Message- > > > > From: Igor Mammedov [mailto:imamm...@redhat.com] > > > > Sent: Thursday, February 8, 2018 7:30 PM > > > > To: Tan, Jianfeng > > > > Cc: Paolo Bonzini; Jason Wang; Maxime Coquelin; qemu- > > > de...@nongnu.org; > > > > Michael S . Tsirkin > > > > Subject: Re: [Qemu-devel] [RFC] exec: eliminate ram naming issue as > > > > migration > > > > > > > [...] > > > > > > It could be solved by adding memdev option to machine, > > > > > > which would allow to specify backend object. And then on > > > > > > top make -mem-path alias new option to clean thing up. > > > > > > > > > > Do you mean? > > > > > > > > > > src vm: -m xG > > > > > dst vm: -m xG,memdev=pc.ram -object memory-backend- > > > file,id=pc.ram,size=xG,mem-path=xxx,share=on ... > > > > Yep, I've meant something like it > > > > > > > > src vm: -m xG,memdev=SHARED_RAM -object memory-backend- > > > file,id=SHARED_RAM,size=xG,mem-path=xxx,share=on > > > > dst vm: -m xG,memdev=SHARED_RAM -object memory-backend- > > > file,id=SHARED_RAM,size=xG,mem-path=xxx,share=on > > > > > > After a second thought, I find adding a backend for nonnuma pc RAM is > > > roundabout way. > > > > > > And we actually have an existing way to add a file-backed RAM: commit > > > c902760fb25f ("Add option to use file backed guest memory"). Basically, > this > > > commit adds two options, --mem-path and --mem-prealloc, without > specify > > > a backend explicitly. > > > > > > So how about just adding a new option --mem-share to decide if that's a > > > private memory or shared memory? That seems much straightforward > way > Above options are legacy (which we can't remove for compat reasons), > their replacement is 'memory-backend-file' backend which has all of > the above including 'share' property. OK, such options are legacy. I've no idea of that. Thanks! That makes sense. > > So just add 'memdev' property to machine and reuse memory-backend-file > with it instead of duplicating functionality in the legacy code. To "-m" or "-machine"?
Re: [Qemu-devel] [RFC] exec: eliminate ram naming issue as migration
> -Original Message- > From: Tan, Jianfeng > Sent: Saturday, February 24, 2018 11:08 AM > To: 'Igor Mammedov' > Cc: Paolo Bonzini; Jason Wang; Maxime Coquelin; qemu-devel@nongnu.org; > Michael S . Tsirkin > Subject: RE: [Qemu-devel] [RFC] exec: eliminate ram naming issue as > migration > > Hi Igor and all, > > > -Original Message- > > From: Igor Mammedov [mailto:imamm...@redhat.com] > > Sent: Thursday, February 8, 2018 7:30 PM > > To: Tan, Jianfeng > > Cc: Paolo Bonzini; Jason Wang; Maxime Coquelin; qemu- > de...@nongnu.org; > > Michael S . Tsirkin > > Subject: Re: [Qemu-devel] [RFC] exec: eliminate ram naming issue as > > migration > > > [...] > > > > It could be solved by adding memdev option to machine, > > > > which would allow to specify backend object. And then on > > > > top make -mem-path alias new option to clean thing up. > > > > > > Do you mean? > > > > > > src vm: -m xG > > > dst vm: -m xG,memdev=pc.ram -object memory-backend- > file,id=pc.ram,size=xG,mem-path=xxx,share=on ... > > Yep, I've meant something like it > > > > src vm: -m xG,memdev=SHARED_RAM -object memory-backend- > file,id=SHARED_RAM,size=xG,mem-path=xxx,share=on > > dst vm: -m xG,memdev=SHARED_RAM -object memory-backend- > file,id=SHARED_RAM,size=xG,mem-path=xxx,share=on > > After a second thought, I find adding a backend for nonnuma pc RAM is > roundabout way. > > And we actually have an existing way to add a file-backed RAM: commit > c902760fb25f ("Add option to use file backed guest memory"). Basically, this > commit adds two options, --mem-path and --mem-prealloc, without specify > a backend explicitly. > > So how about just adding a new option --mem-share to decide if that's a > private memory or shared memory? That seems much straightforward way > to me; after this change we can migrate like: > > src vm: -m xG > dst vm: -m xG --mem-path xxx --mem-share > Attach the patch FYI. Look forward to your thoughts. diff --git a/include/sysemu/sysemu.h b/include/sysemu/sysemu.h index 31612ca..5eaf367 100644 --- a/include/sysemu/sysemu.h +++ b/include/sysemu/sysemu.h @@ -127,6 +127,7 @@ extern bool enable_mlock; extern uint8_t qemu_extra_params_fw[2]; extern QEMUClockType rtc_clock; extern const char *mem_path; +extern int mem_share; extern int mem_prealloc; #define MAX_NODES 128 diff --git a/numa.c b/numa.c index 7b9c33a..322289f 100644 --- a/numa.c +++ b/numa.c @@ -456,7 +456,7 @@ static void allocate_system_memory_nonnuma(MemoryRegion *mr, Object *owner, if (mem_path) { #ifdef __linux__ Error *err = NULL; -memory_region_init_ram_from_file(mr, owner, name, ram_size, false, +memory_region_init_ram_from_file(mr, owner, name, ram_size, mem_share, mem_path, &err); if (err) { error_report_err(err); diff --git a/qemu-options.hx b/qemu-options.hx index 678181c..c968c53 100644 --- a/qemu-options.hx +++ b/qemu-options.hx @@ -389,6 +389,15 @@ STEXI Allocate guest RAM from a temporarily created file in @var{path}. ETEXI +DEF("mem-share", 0, QEMU_OPTION_memshare, +"-mem-share make guest memory shareable (use with -mem-path)\n", +QEMU_ARCH_ALL) +STEXI +@item -mem-share +@findex -mem-share +Make file-backed guest RAM shareable when using -mem-path. +ETEXI + DEF("mem-prealloc", 0, QEMU_OPTION_mem_prealloc, "-mem-prealloc preallocate guest memory (use with -mem-path)\n", QEMU_ARCH_ALL) diff --git a/vl.c b/vl.c index 444b750..0ff06c2 100644 --- a/vl.c +++ b/vl.c @@ -140,6 +140,7 @@ int display_opengl; const char* keyboard_layout = NULL; ram_addr_t ram_size; const char *mem_path = NULL; +int mem_share = 0; int mem_prealloc = 0; /* force preallocation of physical target memory */ bool enable_mlock = false; int nb_nics; @@ -3395,6 +3396,9 @@ int main(int argc, char **argv, char **envp) case QEMU_OPTION_mempath: mem_path = optarg; break; +case QEMU_OPTION_memshare: +mem_share = 1; +break; case QEMU_OPTION_mem_prealloc: mem_prealloc = 1; break;
Re: [Qemu-devel] [RFC] exec: eliminate ram naming issue as migration
Hi Igor and all, > -Original Message- > From: Igor Mammedov [mailto:imamm...@redhat.com] > Sent: Thursday, February 8, 2018 7:30 PM > To: Tan, Jianfeng > Cc: Paolo Bonzini; Jason Wang; Maxime Coquelin; qemu-devel@nongnu.org; > Michael S . Tsirkin > Subject: Re: [Qemu-devel] [RFC] exec: eliminate ram naming issue as > migration > [...] > > > It could be solved by adding memdev option to machine, > > > which would allow to specify backend object. And then on > > > top make -mem-path alias new option to clean thing up. > > > > Do you mean? > > > > src vm: -m xG > > dst vm: -m xG,memdev=pc.ram -object > > memory-backend-file,id=pc.ram,size=xG,mem-path=xxx,share=on ... > Yep, I've meant something like it > > src vm: -m xG,memdev=SHARED_RAM -object > memory-backend-file,id=SHARED_RAM,size=xG,mem-path=xxx,share=on > dst vm: -m xG,memdev=SHARED_RAM -object > memory-backend-file,id=SHARED_RAM,size=xG,mem-path=xxx,share=on After a second thought, I find adding a backend for nonnuma pc RAM is roundabout way. And we actually have an existing way to add a file-backed RAM: commit c902760fb25f ("Add option to use file backed guest memory"). Basically, this commit adds two options, --mem-path and --mem-prealloc, without specify a backend explicitly. So how about just adding a new option --mem-share to decide if that's a private memory or shared memory? That seems much straightforward way to me; after this change we can migrate like: src vm: -m xG dst vm: -m xG --mem-path xxx --mem-share Thanks, Jianfeng > > or it could be -machine FOO,inital_ram_memdev=... > maybe making -M optional in this case as size is specified by backend > > PS: > it's not a good idea to use QEMU's internal id 'pc.ram' > for user specified objects as it might cause problems.
Re: [Qemu-devel] [RFC] exec: eliminate ram naming issue as migration
On 2/8/2018 5:51 PM, Igor Mammedov wrote: On Thu, 8 Feb 2018 09:20:45 +0800 "Tan, Jianfeng" wrote: On 2/7/2018 8:06 PM, Igor Mammedov wrote: On Wed, 7 Feb 2018 07:49:58 +0000 "Tan, Jianfeng" wrote: -Original Message- From: Paolo Bonzini [mailto:pbonz...@redhat.com] Sent: Tuesday, February 6, 2018 1:32 AM To: Igor Mammedov Cc: Tan, Jianfeng; qemu-devel@nongnu.org; Jason Wang; Maxime Coquelin; Michael S . Tsirkin Subject: Re: [Qemu-devel] [RFC] exec: eliminate ram naming issue as migration On 05/02/2018 18:15, Igor Mammedov wrote: Then we would have both ram block named pc.ram: Block NamePSize pc.ram 4 KiB /objects/pc.ram2 MiB But I assume it's a corner case which not really happen. Yeah, you're right. :/ I hadn't thought of hotplug. It can happen indeed. perhaps we should fail object_add memory-backend-foo if it resulted in creating ramblock with duplicate id Note that it would only be duplicated with Jianfeng's patch. So I'm worried that his patch is worse than what we have now, because it may create conflicts with system RAMBlock names are not necessarily predictable. Right now, -object creates RAMBlock names that are nicely constrained within /object/. So we are trading off between the benefit it takes and the bad effect it brings. I'm wondering if the above example is the only failed case this patch leads to, i.e, only there is a ram named "pc.ram" and "/object/pc.ram" in the src VM? Please also consider the second option, that adding an alias name for RAMBlock; I'm not a big fan for that one, as it just pushes the problem to OpenStack/Libvirt. looking at provided CLI examples it's configuration issue on src and dst, one shall not mix numa and non numa variants. Aha, that's another thing we also want to change. We now add numa at dst node, only because without -numa, we cannot set up the file-baked memory with share=on. then shouldn't you start src with the same -numa to begin with, changing such things on the fly is not supported. Yes, you are describing the best practice. But we are originally trying to migrate without any changes to QEMU. General rule is that machine on dst has to be the same as on src. OK. (with backend not visible to guest it possible might be changed but it's hard to tell if something would break due to that or would continue working in future since doesn't go along with above rule) For example, "-m xG -mem-path xxx" can set up a file-baked memory, but the file is not share-able. It could be solved by adding memdev option to machine, which would allow to specify backend object. And then on top make -mem-path alias new option to clean thing up. Do you mean? src vm: -m xG dst vm: -m xG,memdev=pc.ram -object memory-backend-file,id=pc.ram,size=xG,mem-path=xxx,share=on ... But then again, You'd need to start both src and dst with the same option. Yeah, got it :-)
Re: [Qemu-devel] [RFC] exec: eliminate ram naming issue as migration
On 2/7/2018 8:06 PM, Igor Mammedov wrote: On Wed, 7 Feb 2018 07:49:58 + "Tan, Jianfeng" wrote: -Original Message- From: Paolo Bonzini [mailto:pbonz...@redhat.com] Sent: Tuesday, February 6, 2018 1:32 AM To: Igor Mammedov Cc: Tan, Jianfeng; qemu-devel@nongnu.org; Jason Wang; Maxime Coquelin; Michael S . Tsirkin Subject: Re: [Qemu-devel] [RFC] exec: eliminate ram naming issue as migration On 05/02/2018 18:15, Igor Mammedov wrote: Then we would have both ram block named pc.ram: Block NamePSize pc.ram 4 KiB /objects/pc.ram2 MiB But I assume it's a corner case which not really happen. Yeah, you're right. :/ I hadn't thought of hotplug. It can happen indeed. perhaps we should fail object_add memory-backend-foo if it resulted in creating ramblock with duplicate id Note that it would only be duplicated with Jianfeng's patch. So I'm worried that his patch is worse than what we have now, because it may create conflicts with system RAMBlock names are not necessarily predictable. Right now, -object creates RAMBlock names that are nicely constrained within /object/. So we are trading off between the benefit it takes and the bad effect it brings. I'm wondering if the above example is the only failed case this patch leads to, i.e, only there is a ram named "pc.ram" and "/object/pc.ram" in the src VM? Please also consider the second option, that adding an alias name for RAMBlock; I'm not a big fan for that one, as it just pushes the problem to OpenStack/Libvirt. looking at provided CLI examples it's configuration issue on src and dst, one shall not mix numa and non numa variants. Aha, that's another thing we also want to change. We now add numa at dst node, only because without -numa, we cannot set up the file-baked memory with share=on. For example, "-m xG -mem-path xxx" can set up a file-baked memory, but the file is not share-able. Or any other suggestions? Fix configuration, namely dst side of it (i.e. use the same -m only variant without -numa as it's on src). BTW, what are you trying to achieve adding -numa on dst? See above reply. Thanks, Jianfeng
Re: [Qemu-devel] [RFC] exec: eliminate ram naming issue as migration
> -Original Message- > From: Paolo Bonzini [mailto:pbonz...@redhat.com] > Sent: Tuesday, February 6, 2018 1:32 AM > To: Igor Mammedov > Cc: Tan, Jianfeng; qemu-devel@nongnu.org; Jason Wang; Maxime Coquelin; > Michael S . Tsirkin > Subject: Re: [Qemu-devel] [RFC] exec: eliminate ram naming issue as > migration > > On 05/02/2018 18:15, Igor Mammedov wrote: > >>> > >>> Then we would have both ram block named pc.ram: > >>> Block Name PSize > >>> pc.ram 4 KiB > >>> /objects/pc.ram 2 MiB > >>> > >>> But I assume it's a corner case which not really happen. > >> Yeah, you're right. :/ I hadn't thought of hotplug. It can happen indeed. > > > > perhaps we should fail object_add memory-backend-foo if it resulted > > in creating ramblock with duplicate id > > Note that it would only be duplicated with Jianfeng's patch. So I'm > worried that his patch is worse than what we have now, because it may > create conflicts with system RAMBlock names are not necessarily > predictable. Right now, -object creates RAMBlock names that are nicely > constrained within /object/. So we are trading off between the benefit it takes and the bad effect it brings. I'm wondering if the above example is the only failed case this patch leads to, i.e, only there is a ram named "pc.ram" and "/object/pc.ram" in the src VM? Please also consider the second option, that adding an alias name for RAMBlock; I'm not a big fan for that one, as it just pushes the problem to OpenStack/Libvirt. Or any other suggestions? Thanks, Jianfeng
Re: [Qemu-devel] [RFC] exec: eliminate ram naming issue as migration
On 2/6/2018 12:19 AM, Paolo Bonzini wrote: On 05/02/2018 17:12, Tan, Jianfeng wrote: Hi Paolo, On 2/5/2018 11:53 PM, Paolo Bonzini wrote: On 05/02/2018 15:58, Jianfeng Tan wrote: Here are some options to fix this: 1. When we do ram name comparison, we truncate the prefix as this patch shows. It cannot cover the corner case: the source VM could have two ram blocks with name of "pc.ram" and "/object/pc.ram". That shouldn't happen ("pc.ram" exists even in the "-numa node,memdev=..." case, but it has no RAM block). Suppose we have a VM started with "-m xG", and then hot plugged with a ram block: (qemu) object_add memory-backend-file,id=pc.ram,size=1G,mem-path=/dev/hugepages (qemu) device_add pc-dimm,id=pc.ram,memdev=pc.ram Then we would have both ram block named pc.ram: Block NamePSize pc.ram 4 KiB /objects/pc.ram2 MiB But I assume it's a corner case which not really happen. Yeah, you're right. :/ I hadn't thought of hotplug. It can happen indeed. However, note that -m xG -numa node,memdev=pc.ram \ -object memory-backend-file,id=pc.ram,... works for both vhost-kernel and vhost-user, so I'd rather consider this a configuration problem and not do anything. That configuration indeed works for both. But in the production env, lots of VMs are already started with previous mem config. If we do nothing, it will take a long time (shutdown/start for each VM) to migrate to the new setup. This patch is to make this process more smooth without any bad effect if possible. I understand. However it's not as bad as "there's no possibility at all to migrate from vhost-kernel to vhost-user". There are cases that are more problematic: for example, there's no possibility at all to add memory NUMA policy during a live migration, unless -object memory-backend-* was used on the source. Please help me to understand: Are you saying it's always recommended to use -object memory-backend-* configuration even with vhost-kernel backend for the reason you mentioned? Or just another more serious problem that we shall work on in priority? Thanks, Jianfeng
Re: [Qemu-devel] [RFC] exec: eliminate ram naming issue as migration
On 2/6/2018 12:29 AM, Igor Mammedov wrote: On Mon, 5 Feb 2018 14:58:55 + Jianfeng Tan wrote: Existing VMs with virtio devices and vhost-kernel as the backend are always started with mem config: "-m xG" (with a ram block named "pc.ram") while new VMs with virtio devices and vhost-user as the backend are always started with mem config: "-m xG -numa node,memdev=pc.ram -object memory-backend-file,id=pc.ram,..." (with a ram block named "/object/pc.ram") could you elaborate more on what src command line migrating to what dst command line? The src cmdline: $QEMU -enable-kvm -cpu host -smp 4 /path/to/img \ -m 2G \ -netdev tap,id=mynet1,vhost=on \ -device virtio-net-pci,netdev=mynet1,mac=52:54:00:12:34:58 ... The dst cmdline: $QEMU -enable-kvm -cpu host -smp 4 /path/to/img \ -m 2G -numa node,memdev=pc.ram -mem-prealloc \ -object memory-backend-file,id=pc.ram,size=2G,mem-path=/dev/hugepages,share=on \ -chardev socket,id=char0,path=/tmp/sock0 \ -netdev type=vhost-user,id=mynet1,chardev=char0,vhostforce \ -device virtio-net-pci,netdev=mynet1,mac=52:54:00:12:34:58 \ -incoming tcp:0: ... Thanks, Jianfeng
Re: [Qemu-devel] [RFC] exec: eliminate ram naming issue as migration
Hi Paolo, On 2/5/2018 11:53 PM, Paolo Bonzini wrote: On 05/02/2018 15:58, Jianfeng Tan wrote: Here are some options to fix this: 1. When we do ram name comparison, we truncate the prefix as this patch shows. It cannot cover the corner case: the source VM could have two ram blocks with name of "pc.ram" and "/object/pc.ram". That shouldn't happen ("pc.ram" exists even in the "-numa node,memdev=..." case, but it has no RAM block). Suppose we have a VM started with "-m xG", and then hot plugged with a ram block: (qemu) object_add memory-backend-file,id=pc.ram,size=1G,mem-path=/dev/hugepages (qemu) device_add pc-dimm,id=pc.ram,memdev=pc.ram Then we would have both ram block named pc.ram: Block NamePSize pc.ram 4 KiB /objects/pc.ram2 MiB But I assume it's a corner case which not really happen. RAMBLOCK_FOREACH(block) { -if (!strcmp(name, block->idstr)) { +name2 = strdup(block->idstr); + id2 = basename(name2); +if (!strcmp(id1, id2)) { +free(name1); + free(name2); return block; } + free(name2); Instead of this, perhaps just skip "/object/" in both id1 and block->idstr? This also removes the need for strdup/free. Not familiar with this before, so there would not be something like /object/xxx/id? In that case, we can just skip "/object/". Thanks! However, note that -m xG -numa node,memdev=pc.ram \ -object memory-backend-file,id=pc.ram,... works for both vhost-kernel and vhost-user, so I'd rather consider this a configuration problem and not do anything. That configuration indeed works for both. But in the production env, lots of VMs are already started with previous mem config. If we do nothing, it will take a long time (shutdown/start for each VM) to migrate to the new setup. This patch is to make this process more smooth without any bad effect if possible. Thanks, Jianfeng Thanks, Paolo } +free(name1); return NULL; }
Re: [Qemu-devel] [PATCH 3/4] cryptodev-vhost-user: add crypto session handler
On 11/29/2017 5:11 PM, Paolo Bonzini wrote: On 29/11/2017 05:10, Michael S. Tsirkin wrote: " Interrupt mode for vhost-user is still not supported in current implementation. But we are evaluating the necessity now. That's more or less a spec violation. Guest must get interrupts if it does not disable them. And it must notify host if host does not disable notifications. By that, I mean the transmission from virtio to vhost, if virtio is not actively sending packets, at vhost side, we can stop polling from that vhost port (i.e., virtio); if VM wants to send packets, besides putting packets on virtqueue, it also kicks the backend through pio/mmio. So no need to modify spec. I understood that as "the vhost-user server is always using poll mode for the virtqueues" and cannot use e.g. ioeventfds. We can change to work like this: stop polling, and wait for the eventfd, and then get back to polling. Thanks, Jianfeng Thanks, Paolo
Re: [Qemu-devel] [dpdk-dev] [ovs-dev] [PATCH RFC] netdev-dpdk: Fix device obtain mac address when received first packet in vhost type
On 11/28/2017 1:01 AM, Aaron Conole wrote: "Tan, Jianfeng" writes: On 11/27/2017 10:27 PM, Yuanhan Liu wrote: On Fri, Nov 24, 2017 at 05:59:09PM +0800, Chen Hailin wrote: Hi Aaron Conole && Jianfeng, The stp could not work in ovs-dpdk vhostuser. Because the attached vhost device doesn't have MAC address. Now we have two ways to solve this problem. 1. The vhost learns MAC address from packet like as my first patch. I do agree with Aaron this is not the right way. I do think it should be the vswitch's responsibility to learn mac of vhost port. Except that it's the only feasible way without modifying the spec (yuanhan already makes it very clear below), we can treat the vswitch as a phsical switch, VM as a physical server, virtio/vhost port as a back-to-back connected NICs, the only way of the physical switch to know the mac of the NIC on the other side is ARP learning. Might I ask why you don't think it's a right way? As a quick example, I think a malicious guest in a multi-tenant environment could send traffic out to manipulate this feature into learning an incorrect mac address. Instead, I think it's not right to stop such mac spoofing behavior (suppose someone wants to have such an experiment in an overlay networking). And it actually only affects one “LAN", instead of all "LANs". And it's usually not the switch's responsibility to detect mac spoofing behavior IMHO. To get this right requires doing deep packet inspection, and making sure to only learn based on certain l2 traffic. Yes, should learn based on ARP packets. Your concern is the performance? I suppose there is not to many such packets. Thanks, Jianfeng
Re: [Qemu-devel] [dpdk-dev] [ovs-dev] [PATCH RFC] netdev-dpdk: Fix device obtain mac address when received first packet in vhost type
On 11/28/2017 12:14 AM, Aaron Conole wrote: Yuanhan Liu writes: On Fri, Nov 24, 2017 at 05:59:09PM +0800, Chen Hailin wrote: Hi Aaron Conole && Jianfeng, The stp could not work in ovs-dpdk vhostuser. Because the attached vhost device doesn't have MAC address. Now we have two ways to solve this problem. 1. The vhost learns MAC address from packet like as my first patch. I do agree with Aaron this is not the right way. 2. The virtio notifies MAC address actively to vhost user . Unfortunately, AFAIK, there is no way to achieve that so far. we could either let virtio/QEMU to expose the CQ to vhost or add a new VHOST_USER message to carry the mac address. While vhost-user is a generic interface adding a virtio-net specific message also doesn't seem quite right. Exposing CQ is probably the best we can do. Anyway, both need spec change. There are other possible ways. libvirt knows about the mac address from the domain xml file. Perhaps it's possible to set the mac column in the database to the correct value when the port is being created in ovs? This would be an action taken by the orchestration tool. In OVS db, we can only see vhost port, but not virtio port. That is to say, we could use different mac for vhost port from virtio port, especially when it works as a vrouter instead of vswitch. Additionally, there's a mechanism in virtio-net to set the mac address from the host to the guest. Is it possible to expose that functionality through vhost-user? We can assign mac addr when starting QEMU. After that, I suppose we cannot set mac addr any more, let alone setting it from vhost-user side (vhost-user protocol does not support it yet). Thanks, Jianfeng Then when the orchestration tool sets the mac, it can be propagated, and mac_in_use can reflect the appropriate value. I think that's a workable solution, but I might have missed something. --yliu In my opinions, if we treat it as a device, we should allocate MAC address for the device when the VM started. Which one do you think better? Best Regards, Chen Hailin che...@arraynetworks.com.cn From: Aaron Conole Date: 2017-11-18 10:00 To: Hailin Chen CC: ovs-...@openvswitch.org; Maxime Coquelin; cl...@arraynetworks.com.cn Subject: Re: [ovs-dev] [PATCH RFC] netdev-dpdk: Fix device obtain mac address when received first packet in vhost type Hi Hailin, Hailin Chen writes: The stp could not work on netdev-dpdk if network is loop. Because the stp protocol negotiates designate port by sending BPDU packets which contains MAC address. However the device doesn't have MAC address in vhostuser type. Thus, function send_bpdu_cb would not send BPDU packets. This patch will set the MAC for device when received first packet. Signed-off-by: Hailin Chen --- Thanks for the patch. In general, I don't think this is the right approach to deal with this type of issue. I believe the problem statement is that OvS bridge is unaware of the guest MAC address - did I get it right? In that case, I would think that a better way to solve this would be to have virtio tell the mac address of the guest. I don't recall right now if that's allowed in the virtio spec, but I do remember some kind of negotiation features. I've CC'd Maxime, who is one of the maintainers of the virtio code from DPDK side. Perhaps there is an alternate way to solve this. ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev
Re: [Qemu-devel] [dpdk-dev] [ovs-dev] [PATCH RFC] netdev-dpdk: Fix device obtain mac address when received first packet in vhost type
On 11/27/2017 10:27 PM, Yuanhan Liu wrote: On Fri, Nov 24, 2017 at 05:59:09PM +0800, Chen Hailin wrote: Hi Aaron Conole && Jianfeng, The stp could not work in ovs-dpdk vhostuser. Because the attached vhost device doesn't have MAC address. Now we have two ways to solve this problem. 1. The vhost learns MAC address from packet like as my first patch. I do agree with Aaron this is not the right way. I do think it should be the vswitch's responsibility to learn mac of vhost port. Except that it's the only feasible way without modifying the spec (yuanhan already makes it very clear below), we can treat the vswitch as a phsical switch, VM as a physical server, virtio/vhost port as a back-to-back connected NICs, the only way of the physical switch to know the mac of the NIC on the other side is ARP learning. Might I ask why you don't think it's a right way? Thanks, Jianfeng 2. The virtio notifies MAC address actively to vhost user . Unfortunately, AFAIK, there is no way to achieve that so far. we could either let virtio/QEMU to expose the CQ to vhost or add a new VHOST_USER message to carry the mac address. While vhost-user is a generic interface adding a virtio-net specific message also doesn't seem quite right. Exposing CQ is probably the best we can do. Anyway, both need spec change. --yliu In my opinions, if we treat it as a device, we should allocate MAC address for the device when the VM started. Which one do you think better? Best Regards, Chen Hailin che...@arraynetworks.com.cn From: Aaron Conole Date: 2017-11-18 10:00 To: Hailin Chen CC: ovs-...@openvswitch.org; Maxime Coquelin; cl...@arraynetworks.com.cn Subject: Re: [ovs-dev] [PATCH RFC] netdev-dpdk: Fix device obtain mac address when received first packet in vhost type Hi Hailin, Hailin Chen writes: The stp could not work on netdev-dpdk if network is loop. Because the stp protocol negotiates designate port by sending BPDU packets which contains MAC address. However the device doesn't have MAC address in vhostuser type. Thus, function send_bpdu_cb would not send BPDU packets. This patch will set the MAC for device when received first packet. Signed-off-by: Hailin Chen --- Thanks for the patch. In general, I don't think this is the right approach to deal with this type of issue. I believe the problem statement is that OvS bridge is unaware of the guest MAC address - did I get it right? In that case, I would think that a better way to solve this would be to have virtio tell the mac address of the guest. I don't recall right now if that's allowed in the virtio spec, but I do remember some kind of negotiation features. I've CC'd Maxime, who is one of the maintainers of the virtio code from DPDK side. Perhaps there is an alternate way to solve this. ___ dev mailing list d...@openvswitch.org https://mail.openvswitch.org/mailman/listinfo/ovs-dev