Re: [RFC PATCH 08/15] drivers/acrn: add VM memory management for ACRN char device
On 2019-08-19 15:39, Dan Carpenter wrote:
> On Mon, Aug 19, 2019 at 01:32:54PM +0800, Zhao, Yakui wrote:
> > In fact as this driver is mainly used for embedded IOT usage, it doesn't
> > handle the complex cleanup when such error is encountered. Instead the clean
> > up is handled in free_guest_vm.
>
> A use after free here seems like a potential security problem.  Security
> matters for IoT...  :(

Thanks for pointing out the issue.
The cleanup will be considered carefully.

> regards,
> dan carpenter

___
devel mailing list
de...@linuxdriverproject.org
http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel
Re: [RFC PATCH 08/15] drivers/acrn: add VM memory management for ACRN char device
On Mon, Aug 19, 2019 at 10:39:58AM +0300, Dan Carpenter wrote:
> On Mon, Aug 19, 2019 at 01:32:54PM +0800, Zhao, Yakui wrote:
> > In fact as this driver is mainly used for embedded IOT usage, it doesn't
> > handle the complex cleanup when such error is encountered. Instead the clean
> > up is handled in free_guest_vm.
>
> A use after free here seems like a potential security problem.  Security
> matters for IoT...  :(

Yeah, the "S" in "IoT" stands for security.

-- 
Regards/Gruss,
    Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.
Re: [RFC PATCH 08/15] drivers/acrn: add VM memory management for ACRN char device
On Mon, Aug 19, 2019 at 01:32:54PM +0800, Zhao, Yakui wrote:
> In fact as this driver is mainly used for embedded IOT usage, it doesn't
> handle the complex cleanup when such error is encountered. Instead the clean
> up is handled in free_guest_vm.

A use after free here seems like a potential security problem.  Security
matters for IoT...  :(

regards,
dan carpenter
Re: [RFC PATCH 08/15] drivers/acrn: add VM memory management for ACRN char device
On 2019-08-16 20:58, Dan Carpenter wrote:
> On Fri, Aug 16, 2019 at 10:25:49AM +0800, Zhao Yakui wrote:
> > +int hugepage_map_guest(struct acrn_vm *vm, struct vm_memmap *memmap)
> > +{
> > +	struct page *page = NULL, *regions_buf_pg = NULL;
> > +	unsigned long len, guest_gpa, vma;
> > +	struct vm_memory_region *region_array;
> > +	struct set_regions *regions;
> > +	int max_size = PAGE_SIZE / sizeof(struct vm_memory_region);
> > +	int ret;
> > +
> > +	if (!vm || !memmap)
> > +		return -EINVAL;
> > +
> > +	len = memmap->len;
> > +	vma = memmap->vma_base;
> > +	guest_gpa = memmap->gpa;
> > +
> > +	/* prepare set_memory_regions info */
> > +	regions_buf_pg = alloc_page(GFP_KERNEL);
> > +	if (!regions_buf_pg)
> > +		return -ENOMEM;
> > +
> > +	regions = kzalloc(sizeof(*regions), GFP_KERNEL);
> > +	if (!regions) {
> > +		__free_page(regions_buf_pg);
> > +		return -ENOMEM;
>
> It's better to do a goto err_free_regions_buf here.  More comments
> below.
>
> > +	}
> > +	regions->mr_num = 0;
> > +	regions->vmid = vm->vmid;
> > +	regions->regions_gpa = page_to_phys(regions_buf_pg);
> > +	region_array = page_to_virt(regions_buf_pg);
> > +
> > +	while (len > 0) {
> > +		unsigned long vm0_gpa, pagesize;
> > +
> > +		ret = get_user_pages_fast(vma, 1, 1, &page);
> > +		if (unlikely(ret != 1) || (!page)) {
> > +			pr_err("failed to pin huge page!\n");
> > +			ret = -ENOMEM;
> > +			goto err;
>
> goto err is a red flag.  It's better if error labels do one specific
> named thing like:
>
> err_regions:
> 	kfree(regions);
> err_free_regions_buf:
> 	__free_page(regions_buf_pg);
>
> We should unwind in the opposite/mirror order from how things were
> allocated.  Then we can remove the if statements in the error handling.

Thanks for the review.
Will follow your suggestion to unwind the error handling.

> In this situation, say the user triggers an -EFAULT in
> get_user_pages_fast() in the second iteration through the loop.  That
> means that "page" is the non-NULL page from the previous iteration.  We
> have already added it to add_guest_map().  But now we're freeing it
> without removing it from the map so probably it leads to a use after
> free.
>
> The best way to write the error handling in a loop like this is to
> clean up the partial iteration that has succeeded (nothing here), and then
> unwind all the successful iterations at the bottom of the function.
> "goto unwind_loop;"

In theory we should clean up the previous successful iterations if an
error is encountered in the current iteration. But it will be quite
complex to clean up the previous iterations:
  - call set_memory_regions for the MR_DEL op.
  - call remove_guest_map for the added hash item.
  - call put_page for the page returned by get_user_pages_fast.

In fact as this driver is mainly used for embedded IOT usage, it doesn't
handle the complex cleanup when such error is encountered. Instead the clean
up is handled in free_guest_vm.

> > +		}
> > +
> > +		vm0_gpa = page_to_phys(page);
> > +		pagesize = PAGE_SIZE << compound_order(page);
> > +
> > +		ret = add_guest_map(vm, vm0_gpa, guest_gpa, pagesize);
> > +		if (ret < 0) {
> > +			pr_err("failed to add memseg for huge page!\n");
> > +			goto err;
>
> So then here, it would be:
>
> 	pr_err("failed to add memseg for huge page!\n");
> 	put_page(page);
> 	goto unwind_loop;
>
> regards,
> dan carpenter
>
> > +		}
> > +
> > +		/* fill each memory region into region_array */
> > +		region_array[regions->mr_num].type = MR_ADD;
> > +		region_array[regions->mr_num].gpa = guest_gpa;
> > +		region_array[regions->mr_num].vm0_gpa = vm0_gpa;
> > +		region_array[regions->mr_num].size = pagesize;
> > +		region_array[regions->mr_num].prot =
> > +				(MEM_TYPE_WB & MEM_TYPE_MASK) |
> > +				(memmap->prot & MEM_ACCESS_RIGHT_MASK);
> > +		regions->mr_num++;
> > +		if (regions->mr_num == max_size) {
> > +			pr_debug("region buffer full, set & renew regions!\n");
> > +			ret = set_memory_regions(regions);
> > +			if (ret < 0) {
> > +				pr_err("failed to set regions,ret=%d!\n", ret);
> > +				goto err;
> > +			}
> > +			regions->mr_num = 0;
> > +		}
> > +
> > +		len -= pagesize;
> > +		vma += pagesize;
> > +		guest_gpa += pagesize;
> > +	}
> > +
> > +	ret = set_memory_regions(regions);
> > +	if (ret < 0) {
> > +		pr_err("failed to set regions, ret=%d!\n", ret);
> > +		goto err;
> > +	}
> > +
> > +	__free_page(regions_buf_pg);
> > +	kfree(regions);
> > +
> > +	return 0;
> > +err:
> > +	if (regions_buf_pg)
> > +		__fre
Re: [RFC PATCH 08/15] drivers/acrn: add VM memory management for ACRN char device
On Fri, Aug 16, 2019 at 10:25:49AM +0800, Zhao Yakui wrote:
> +int hugepage_map_guest(struct acrn_vm *vm, struct vm_memmap *memmap)
> +{
> +	struct page *page = NULL, *regions_buf_pg = NULL;
> +	unsigned long len, guest_gpa, vma;
> +	struct vm_memory_region *region_array;
> +	struct set_regions *regions;
> +	int max_size = PAGE_SIZE / sizeof(struct vm_memory_region);
> +	int ret;
> +
> +	if (!vm || !memmap)
> +		return -EINVAL;
> +
> +	len = memmap->len;
> +	vma = memmap->vma_base;
> +	guest_gpa = memmap->gpa;
> +
> +	/* prepare set_memory_regions info */
> +	regions_buf_pg = alloc_page(GFP_KERNEL);
> +	if (!regions_buf_pg)
> +		return -ENOMEM;
> +
> +	regions = kzalloc(sizeof(*regions), GFP_KERNEL);
> +	if (!regions) {
> +		__free_page(regions_buf_pg);
> +		return -ENOMEM;

It's better to do a goto err_free_regions_buf here.  More comments
below.

> +	}
> +	regions->mr_num = 0;
> +	regions->vmid = vm->vmid;
> +	regions->regions_gpa = page_to_phys(regions_buf_pg);
> +	region_array = page_to_virt(regions_buf_pg);
> +
> +	while (len > 0) {
> +		unsigned long vm0_gpa, pagesize;
> +
> +		ret = get_user_pages_fast(vma, 1, 1, &page);
> +		if (unlikely(ret != 1) || (!page)) {
> +			pr_err("failed to pin huge page!\n");
> +			ret = -ENOMEM;
> +			goto err;

goto err is a red flag.  It's better if error labels do one specific
named thing like:

err_regions:
	kfree(regions);
err_free_regions_buf:
	__free_page(regions_buf_pg);

We should unwind in the opposite/mirror order from how things were
allocated.  Then we can remove the if statements in the error handling.

In this situation, say the user triggers an -EFAULT in
get_user_pages_fast() in the second iteration through the loop.  That
means that "page" is the non-NULL page from the previous iteration.  We
have already added it to add_guest_map().  But now we're freeing it
without removing it from the map so probably it leads to a use after
free.

The best way to write the error handling in a loop like this is to
clean up the partial iteration that has succeeded (nothing here), and then
unwind all the successful iterations at the bottom of the function.
"goto unwind_loop;"

> +		}
> +
> +		vm0_gpa = page_to_phys(page);
> +		pagesize = PAGE_SIZE << compound_order(page);
> +
> +		ret = add_guest_map(vm, vm0_gpa, guest_gpa, pagesize);
> +		if (ret < 0) {
> +			pr_err("failed to add memseg for huge page!\n");
> +			goto err;

So then here, it would be:

	pr_err("failed to add memseg for huge page!\n");
	put_page(page);
	goto unwind_loop;

regards,
dan carpenter

> +		}
> +
> +		/* fill each memory region into region_array */
> +		region_array[regions->mr_num].type = MR_ADD;
> +		region_array[regions->mr_num].gpa = guest_gpa;
> +		region_array[regions->mr_num].vm0_gpa = vm0_gpa;
> +		region_array[regions->mr_num].size = pagesize;
> +		region_array[regions->mr_num].prot =
> +				(MEM_TYPE_WB & MEM_TYPE_MASK) |
> +				(memmap->prot & MEM_ACCESS_RIGHT_MASK);
> +		regions->mr_num++;
> +		if (regions->mr_num == max_size) {
> +			pr_debug("region buffer full, set & renew regions!\n");
> +			ret = set_memory_regions(regions);
> +			if (ret < 0) {
> +				pr_err("failed to set regions,ret=%d!\n", ret);
> +				goto err;
> +			}
> +			regions->mr_num = 0;
> +		}
> +
> +		len -= pagesize;
> +		vma += pagesize;
> +		guest_gpa += pagesize;
> +	}
> +
> +	ret = set_memory_regions(regions);
> +	if (ret < 0) {
> +		pr_err("failed to set regions, ret=%d!\n", ret);
> +		goto err;
> +	}
> +
> +	__free_page(regions_buf_pg);
> +	kfree(regions);
> +
> +	return 0;
> +err:
> +	if (regions_buf_pg)
> +		__free_page(regions_buf_pg);
> +	if (page)
> +		put_page(page);
> +	kfree(regions);
> +	return ret;
> +}
> +
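The unwinding style suggested in the review can be sketched in user space. This is only an illustration under stated assumptions, not the driver's code: pin_page(), unpin_page(), add_map(), remove_map() and map_pages() are hypothetical stand-ins for get_user_pages_fast(), put_page(), add_guest_map(), remove_guest_map() and hugepage_map_guest(), and two counters stand in for real pinned pages and map entries.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical bookkeeping that stands in for real kernel state. */
static int live_pins;	/* pages currently pinned */
static int live_maps;	/* map entries currently installed */

/* Stand-in for get_user_pages_fast(): fails at index fail_at. */
static int pin_page(int idx, int fail_at)
{
	if (idx == fail_at)
		return -EFAULT;
	live_pins++;
	return 0;
}

static void unpin_page(void) { live_pins--; }	/* ~ put_page() */
static void add_map(void)    { live_maps++; }	/* ~ add_guest_map() */
static void remove_map(void) { live_maps--; }	/* ~ remove_guest_map() */

/* Map count pages; a negative fail_at means no injected failure. */
int map_pages(int count, int fail_at)
{
	void *buf, *regions;
	int done;
	int ret;

	buf = malloc(4096);
	if (!buf)
		return -ENOMEM;

	regions = calloc(1, 64);
	if (!regions) {
		ret = -ENOMEM;
		goto err_free_buf;
	}

	for (done = 0; done < count; done++) {
		ret = pin_page(done, fail_at);
		if (ret < 0)
			goto unwind_loop;	/* nothing partial to undo here */
		add_map();
	}

	/* Success: the scratch buffers go away, the mappings stay. */
	free(regions);
	free(buf);
	return 0;

unwind_loop:
	/* Undo every completed iteration, newest first. */
	while (done-- > 0) {
		remove_map();
		unpin_page();
	}
	free(regions);
err_free_buf:
	free(buf);
	return ret;
}
```

Each label releases exactly one thing, and the labels appear in the reverse order of the allocations, so no "if (ptr)" checks are needed on the error path. A failure in the third iteration leaves both counters at zero, because the two completed iterations are undone before the buffers are freed.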
[RFC PATCH 08/15] drivers/acrn: add VM memory management for ACRN char device
To launch an ACRN guest, the driver needs to set up the mapping between
GPA (guest physical address) and HPA (host physical address). This
mapping is based on memory virtualization and is configured in the EPT
table. The memory-management ioctls are added, and they issue the
hypercall so that the ACRN hypervisor can set up the memory mapping for
the ACRN guest.
1G/2M huge pages are used to optimize the EPT table for the guest VM.
This simplifies the memory allocation and also optimizes TLB usage.
MMIO mappings can use 4K/2M pages.

IC_SET_MEMSEG: This is used to set up the memory mapping for guest
memory by using hugetlb (guest physical address and host virtual
address). It is also used to set up the device MMIO mapping for a PCI
device.
IC_UNSET_MEMSEG: This is used to remove the device MMIO mapping for a
PCI device. It is used together with updating the MMIO mapping. As the
ACRN hypervisor is mainly used for embedded IoT devices, it doesn't
support dynamic removal of memory mappings.
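The effect of the page size on the number of regions can be worked through with a little arithmetic. The sketch below assumes a 4 KiB base page and copies the field layout of struct vm_memory_region from this patch into a hypothetical struct region_sketch; the helper names are illustrative only. hypercalls_needed() mirrors the loop in hugepage_map_guest() as posted: one set_memory_regions() call each time the buffer fills, plus the final call after the loop.

```c
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096UL	/* assumed base page size */

/* Same field layout as struct vm_memory_region in the patch. */
struct region_sketch {
	uint32_t type;
	uint32_t prot;
	uint64_t gpa;
	uint64_t vm0_gpa;
	uint64_t size;
};

/* Regions that fit in one page-sized buffer (max_size in the patch). */
unsigned long regions_per_buffer(void)
{
	return SKETCH_PAGE_SIZE / sizeof(struct region_sketch);
}

/* Size of a compound page of the given order: PAGE_SIZE << order. */
unsigned long compound_size(unsigned int order)
{
	return SKETCH_PAGE_SIZE << order;
}

/* Regions needed to cover len bytes with pages of a single size. */
unsigned long regions_needed(unsigned long len, unsigned long pagesize)
{
	return (len + pagesize - 1) / pagesize;
}

/* Calls made by the posted loop: one per full buffer, plus the last one. */
unsigned long hypercalls_needed(unsigned long nregions)
{
	return nregions / regions_per_buffer() + 1;
}
```

With these assumptions, covering 1 GiB of guest memory takes 262144 regions with 4K pages, 512 regions with 2M pages (order 9), and a single region with a 1G page (order 18). Each region is 32 bytes, so a page-sized buffer holds 128 of them.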

Co-developed-by: Jason Chen CJ
Signed-off-by: Jason Chen CJ
Co-developed-by: Li, Fei
Signed-off-by: Li, Fei
Co-developed-by: Liu Shuo
Signed-off-by: Liu Shuo
Signed-off-by: Zhao Yakui
---
 drivers/staging/acrn/Makefile             |   4 +-
 drivers/staging/acrn/acrn_dev.c           |  27 +++
 drivers/staging/acrn/acrn_drv_internal.h  |  90 +++---
 drivers/staging/acrn/acrn_mm.c            | 227
 drivers/staging/acrn/acrn_mm_hugetlb.c    | 281 ++
 drivers/staging/acrn/acrn_vm_mngt.c       |   2 +
 include/linux/acrn/acrn_drv.h             |  86 +
 include/uapi/linux/acrn/acrn_common_def.h |  25 +++
 include/uapi/linux/acrn/acrn_ioctl_defs.h |  41 +
 9 files changed, 759 insertions(+), 24 deletions(-)
 create mode 100644 drivers/staging/acrn/acrn_mm.c
 create mode 100644 drivers/staging/acrn/acrn_mm_hugetlb.c
 create mode 100644 include/linux/acrn/acrn_drv.h
 create mode 100644 include/uapi/linux/acrn/acrn_common_def.h

diff --git a/drivers/staging/acrn/Makefile b/drivers/staging/acrn/Makefile
index 426d6e8..ec62afe 100644
--- a/drivers/staging/acrn/Makefile
+++ b/drivers/staging/acrn/Makefile
@@ -1,4 +1,6 @@
 obj-$(CONFIG_ACRN_HSM) := acrn.o
 acrn-y := acrn_dev.o \
 	  acrn_hypercall.o \
-	  acrn_vm_mngt.o
+	  acrn_vm_mngt.o \
+	  acrn_mm.o \
+	  acrn_mm_hugetlb.o
diff --git a/drivers/staging/acrn/acrn_dev.c b/drivers/staging/acrn/acrn_dev.c
index 7372316..cb62819 100644
--- a/drivers/staging/acrn/acrn_dev.c
+++ b/drivers/staging/acrn/acrn_dev.c
@@ -44,6 +44,7 @@ static
 int acrn_dev_open(struct inode *inodep, struct file *filep)
 {
 	struct acrn_vm *vm;
+	int i;
 
 	vm = kzalloc(sizeof(*vm), GFP_KERNEL);
 	if (!vm)
@@ -53,6 +54,10 @@ int acrn_dev_open(struct inode *inodep, struct file *filep)
 	vm->vmid = ACRN_INVALID_VMID;
 	vm->dev = acrn_device;
 
+	for (i = 0; i < HUGEPAGE_HLIST_ARRAY_SIZE; i++)
+		INIT_HLIST_HEAD(&vm->hugepage_hlist[i]);
+	mutex_init(&vm->hugepage_lock);
+
 	write_lock_bh(&acrn_vm_list_lock);
 	vm_list_add(&vm->list);
 	write_unlock_bh(&acrn_vm_list_lock);
@@ -212,6 +217,28 @@ long acrn_dev_ioctl(struct file *filep,
 		return ret;
 	}
 
+	case IC_SET_MEMSEG: {
+		struct vm_memmap memmap;
+
+		if (copy_from_user(&memmap, (void *)ioctl_param,
+				   sizeof(memmap)))
+			return -EFAULT;
+
+		ret = map_guest_memseg(vm, &memmap);
+		break;
+	}
+
+	case IC_UNSET_MEMSEG: {
+		struct vm_memmap memmap;
+
+		if (copy_from_user(&memmap, (void *)ioctl_param,
+				   sizeof(memmap)))
+			return -EFAULT;
+
+		ret = unmap_guest_memseg(vm, &memmap);
+		break;
+	}
+
 	default:
 		pr_warn("Unknown IOCTL 0x%x\n", ioctl_num);
 		ret = -EFAULT;
diff --git a/drivers/staging/acrn/acrn_drv_internal.h b/drivers/staging/acrn/acrn_drv_internal.h
index 6758dea..5098765 100644
--- a/drivers/staging/acrn/acrn_drv_internal.h
+++ b/drivers/staging/acrn/acrn_drv_internal.h
@@ -11,6 +11,57 @@
 #include
 #include
 
+struct vm_memory_region {
+#define MR_ADD		0
+#define MR_DEL		2
+	u32 type;
+
+	/* IN: mem attr */
+	u32 prot;
+
+	/* IN: beginning guest GPA to map */
+	u64 gpa;
+
+	/* IN: VM0's GPA which foreign gpa will be mapped to */
+	u64 vm0_gpa;
+
+	/* IN: size of the region */
+	u64 size;
+};
+
+struct set_regions {
+	/*IN: vmid for this hypercall */
+	u16 vmid;
+
+	/** Reserved */
+	u16 reserved[3];
+
+	/* IN: multi memmaps numbers */
+	u32 mr_num;
+
+	/** Reserved */
+	u32 reserved1;
+	/* IN:
+	 * the gpa of memmaps buff