Re: [RFC PATCH 08/15] drivers/acrn: add VM memory management for ACRN char device

2019-08-19 Thread Zhao, Yakui



On 2019-08-19 15:39, Dan Carpenter wrote:
> On Mon, Aug 19, 2019 at 01:32:54PM +0800, Zhao, Yakui wrote:
> > In fact as this driver is mainly used for embedded IOT usage, it doesn't
> > handle the complex cleanup when such error is encountered. Instead the clean
> > up is handled in free_guest_vm.
>
> A use after free here seems like a potential security problem.  Security
> matters for IoT...  :(

Thanks for pointing out the issue.
The cleanup will be considered carefully.

> regards,
> dan carpenter


___
devel mailing list
de...@linuxdriverproject.org
http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel


Re: [RFC PATCH 08/15] drivers/acrn: add VM memory management for ACRN char device

2019-08-19 Thread Borislav Petkov
On Mon, Aug 19, 2019 at 10:39:58AM +0300, Dan Carpenter wrote:
> On Mon, Aug 19, 2019 at 01:32:54PM +0800, Zhao, Yakui wrote:
> > In fact as this driver is mainly used for embedded IOT usage, it doesn't
> > handle the complex cleanup when such error is encountered. Instead the clean
> > up is handled in free_guest_vm.
> 
> A use after free here seems like a potential security problem.  Security
> matters for IoT...  :(

Yeah, the "S" in "IoT" stands for security.

-- 
Regards/Gruss,
Boris.

Good mailing practices for 400: avoid top-posting and trim the reply.


Re: [RFC PATCH 08/15] drivers/acrn: add VM memory management for ACRN char device

2019-08-19 Thread Dan Carpenter
On Mon, Aug 19, 2019 at 01:32:54PM +0800, Zhao, Yakui wrote:
> In fact as this driver is mainly used for embedded IOT usage, it doesn't
> handle the complex cleanup when such error is encountered. Instead the clean
> up is handled in free_guest_vm.

A use after free here seems like a potential security problem.  Security
matters for IoT...  :(

regards,
dan carpenter



Re: [RFC PATCH 08/15] drivers/acrn: add VM memory management for ACRN char device

2019-08-18 Thread Zhao, Yakui



On 2019-08-16 20:58, Dan Carpenter wrote:

On Fri, Aug 16, 2019 at 10:25:49AM +0800, Zhao Yakui wrote:

+int hugepage_map_guest(struct acrn_vm *vm, struct vm_memmap *memmap)
+{
+   struct page *page = NULL, *regions_buf_pg = NULL;
+   unsigned long len, guest_gpa, vma;
+   struct vm_memory_region *region_array;
+   struct set_regions *regions;
+   int max_size = PAGE_SIZE / sizeof(struct vm_memory_region);
+   int ret;
+
+   if (!vm || !memmap)
+   return -EINVAL;
+
+   len = memmap->len;
+   vma = memmap->vma_base;
+   guest_gpa = memmap->gpa;
+
+   /* prepare set_memory_regions info */
+   regions_buf_pg = alloc_page(GFP_KERNEL);
+   if (!regions_buf_pg)
+   return -ENOMEM;
+
+   regions = kzalloc(sizeof(*regions), GFP_KERNEL);
+   if (!regions) {
+   __free_page(regions_buf_pg);
+   return -ENOMEM;


It's better to do a goto err_free_regions_buf here.  More comments
below.


+   }
+   regions->mr_num = 0;
+   regions->vmid = vm->vmid;
+   regions->regions_gpa = page_to_phys(regions_buf_pg);
+   region_array = page_to_virt(regions_buf_pg);
+
+   while (len > 0) {
+   unsigned long vm0_gpa, pagesize;
+
+   ret = get_user_pages_fast(vma, 1, 1, &page);
+   if (unlikely(ret != 1) || (!page)) {
+   pr_err("failed to pin huge page!\n");
+   ret = -ENOMEM;
+   goto err;


goto err is a red flag.  It's better if error labels do one specific
named thing like:

err_regions:
kfree(regions);
err_free_regions_buf:
__free_page(regions_buf_pg);

We should unwind in the opposite/mirror order from how things were
allocated.  Then we can remove the if statements in the error handling.


Thanks for the review.

Will follow your suggestion to unwind the error handling.



In this situation, say the user triggers an -EFAULT in
get_user_pages_fast() in the second iteration through the loop.  That
means that "page" is the non-NULL page from the previous iteration.  We
have already added it to add_guest_map().  But now we're freeing it
without removing it from the map so probably it leads to a use after
free.

The best way to write the error handling in a loop like this is to
clean up the partial iteration that has succeeded (nothing here), and then
unwind all the successful iterations at the bottom of the function.
"goto unwind_loop;"



In theory we should clean up the previously successful iterations if an
error is encountered in the current iteration.

But cleaning up the previous iterations is quite complex. It needs to:
call set_memory_regions with the MR_DEL op,
call remove_guest_map for the added hash item,
call put_page for the page returned by get_user_pages_fast.

In fact as this driver is mainly used for embedded IOT usage, it doesn't 
handle the complex cleanup when such error is encountered. Instead the 
clean up is handled in free_guest_vm.



+   }
+
+   vm0_gpa = page_to_phys(page);
+   pagesize = PAGE_SIZE << compound_order(page);
+
+   ret = add_guest_map(vm, vm0_gpa, guest_gpa, pagesize);
+   if (ret < 0) {
+   pr_err("failed to add memseg for huge page!\n");
+   goto err;


So then here, it would be:

pr_err("failed to add memseg for huge page!\n");
put_page(page);
goto unwind_loop;

regards,
dan carpenter


+   }
+
+   /* fill each memory region into region_array */
+   region_array[regions->mr_num].type = MR_ADD;
+   region_array[regions->mr_num].gpa = guest_gpa;
+   region_array[regions->mr_num].vm0_gpa = vm0_gpa;
+   region_array[regions->mr_num].size = pagesize;
+   region_array[regions->mr_num].prot =
+   (MEM_TYPE_WB & MEM_TYPE_MASK) |
+   (memmap->prot & MEM_ACCESS_RIGHT_MASK);
+   regions->mr_num++;
+   if (regions->mr_num == max_size) {
+   pr_debug("region buffer full, set & renew regions!\n");
+   ret = set_memory_regions(regions);
+   if (ret < 0) {
+   pr_err("failed to set regions,ret=%d!\n", ret);
+   goto err;
+   }
+   regions->mr_num = 0;
+   }
+
+   len -= pagesize;
+   vma += pagesize;
+   guest_gpa += pagesize;
+   }
+
+   ret = set_memory_regions(regions);
+   if (ret < 0) {
+   pr_err("failed to set regions, ret=%d!\n", ret);
+   goto err;
+   }
+
+   __free_page(regions_buf_pg);
+   kfree(regions);
+
+   return 0;
+err:
+   if (regions_buf_pg)
+   __fre

Re: [RFC PATCH 08/15] drivers/acrn: add VM memory management for ACRN char device

2019-08-16 Thread Dan Carpenter
On Fri, Aug 16, 2019 at 10:25:49AM +0800, Zhao Yakui wrote:
> +int hugepage_map_guest(struct acrn_vm *vm, struct vm_memmap *memmap)
> +{
> + struct page *page = NULL, *regions_buf_pg = NULL;
> + unsigned long len, guest_gpa, vma;
> + struct vm_memory_region *region_array;
> + struct set_regions *regions;
> + int max_size = PAGE_SIZE / sizeof(struct vm_memory_region);
> + int ret;
> +
> + if (!vm || !memmap)
> + return -EINVAL;
> +
> + len = memmap->len;
> + vma = memmap->vma_base;
> + guest_gpa = memmap->gpa;
> +
> + /* prepare set_memory_regions info */
> + regions_buf_pg = alloc_page(GFP_KERNEL);
> + if (!regions_buf_pg)
> + return -ENOMEM;
> +
> + regions = kzalloc(sizeof(*regions), GFP_KERNEL);
> + if (!regions) {
> + __free_page(regions_buf_pg);
> + return -ENOMEM;

It's better to do a goto err_free_regions_buf here.  More comments
below.

> + }
> + regions->mr_num = 0;
> + regions->vmid = vm->vmid;
> + regions->regions_gpa = page_to_phys(regions_buf_pg);
> + region_array = page_to_virt(regions_buf_pg);
> +
> + while (len > 0) {
> + unsigned long vm0_gpa, pagesize;
> +
> + ret = get_user_pages_fast(vma, 1, 1, &page);
> + if (unlikely(ret != 1) || (!page)) {
> + pr_err("failed to pin huge page!\n");
> + ret = -ENOMEM;
> + goto err;

goto err is a red flag.  It's better if error labels do one specific
named thing like:

err_regions:
kfree(regions);
err_free_regions_buf:
__free_page(regions_buf_pg);

We should unwind in the opposite/mirror order from how things were
allocated.  Then we can remove the if statements in the error handling.

In this situation, say the user triggers an -EFAULT in
get_user_pages_fast() in the second iteration through the loop.  That
means that "page" is the non-NULL page from the previous iteration.  We
have already added it to add_guest_map().  But now we're freeing it
without removing it from the map so probably it leads to a use after
free.

The best way to write the error handling in a loop like this is to
clean up the partial iteration that has succeeded (nothing here), and then
unwind all the successful iterations at the bottom of the function.
"goto unwind_loop;"

> + }
> +
> + vm0_gpa = page_to_phys(page);
> + pagesize = PAGE_SIZE << compound_order(page);
> +
> + ret = add_guest_map(vm, vm0_gpa, guest_gpa, pagesize);
> + if (ret < 0) {
> + pr_err("failed to add memseg for huge page!\n");
> + goto err;

So then here, it would be:

pr_err("failed to add memseg for huge page!\n");
put_page(page);
goto unwind_loop;

regards,
dan carpenter

> + }
> +
> + /* fill each memory region into region_array */
> + region_array[regions->mr_num].type = MR_ADD;
> + region_array[regions->mr_num].gpa = guest_gpa;
> + region_array[regions->mr_num].vm0_gpa = vm0_gpa;
> + region_array[regions->mr_num].size = pagesize;
> + region_array[regions->mr_num].prot =
> + (MEM_TYPE_WB & MEM_TYPE_MASK) |
> + (memmap->prot & MEM_ACCESS_RIGHT_MASK);
> + regions->mr_num++;
> + if (regions->mr_num == max_size) {
> + pr_debug("region buffer full, set & renew regions!\n");
> + ret = set_memory_regions(regions);
> + if (ret < 0) {
> + pr_err("failed to set regions,ret=%d!\n", ret);
> + goto err;
> + }
> + regions->mr_num = 0;
> + }
> +
> + len -= pagesize;
> + vma += pagesize;
> + guest_gpa += pagesize;
> + }
> +
> + ret = set_memory_regions(regions);
> + if (ret < 0) {
> + pr_err("failed to set regions, ret=%d!\n", ret);
> + goto err;
> + }
> +
> + __free_page(regions_buf_pg);
> + kfree(regions);
> +
> + return 0;
> +err:
> + if (regions_buf_pg)
> + __free_page(regions_buf_pg);
> + if (page)
> + put_page(page);
> + kfree(regions);
> + return ret;
> +}
> +
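
For illustration only, the unwinding pattern described above could look
roughly like the userspace sketch below. It is not the driver code: fake_page,
fake_vm, pin_page, unpin_page, add_map, remove_map, map_pages and unmap_pages
are hypothetical stand-ins (for the pinned page, the per-VM map,
get_user_pages_fast, put_page, add_guest_map and remove_guest_map), and the
two fail_at parameters exist only to inject errors. Labels release resources
in the reverse order of allocation, and a failed iteration undoes only its
own partial work before jumping to unwind_loop.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

#define MAX_PAGES 8

/* Hypothetical stand-ins for a pinned page and the per-VM map. */
struct fake_page { int mapped; };
struct fake_vm {
	struct fake_page *pages[MAX_PAGES];
	int nr;			/* pages pinned and mapped so far */
};

/* Counters so a caller can check that nothing leaks. */
static int live_pages, live_maps;

/* Stand-in for get_user_pages_fast(); fails when idx == fail_at. */
static struct fake_page *pin_page(int idx, int fail_at)
{
	struct fake_page *p = (idx == fail_at) ? NULL : calloc(1, sizeof(*p));

	if (p)
		live_pages++;
	return p;
}

/* Stand-in for put_page(). */
static void unpin_page(struct fake_page *p)
{
	free(p);
	live_pages--;
}

/* Stand-in for add_guest_map(); fails when idx == fail_at. */
static int add_map(struct fake_page *p, int idx, int fail_at)
{
	if (idx == fail_at)
		return -ENOMEM;
	p->mapped = 1;
	live_maps++;
	return 0;
}

/* Undo completed iterations, newest first, until only 'keep' remain. */
void unmap_pages(struct fake_vm *vm, int keep)
{
	while (vm->nr > keep) {
		struct fake_page *page = vm->pages[--vm->nr];

		assert(page->mapped);
		page->mapped = 0;	/* stand-in for remove_guest_map() */
		live_maps--;
		unpin_page(page);
	}
}

int map_pages(struct fake_vm *vm, int count, int pin_fail_at, int map_fail_at)
{
	void *regions_buf, *regions;
	int start, i, ret;

	if (!vm || count < 0 || vm->nr + count > MAX_PAGES)
		return -EINVAL;
	start = vm->nr;

	regions_buf = malloc(4096);
	if (!regions_buf)
		return -ENOMEM;
	regions = calloc(1, 64);
	if (!regions) {
		ret = -ENOMEM;
		goto err_free_regions_buf;
	}

	for (i = 0; i < count; i++) {
		struct fake_page *page = pin_page(i, pin_fail_at);

		if (!page) {
			ret = -EFAULT;	/* nothing from this iteration to undo */
			goto unwind_loop;
		}
		ret = add_map(page, i, map_fail_at);
		if (ret < 0) {
			unpin_page(page);	/* undo the partial iteration */
			goto unwind_loop;
		}
		vm->pages[vm->nr++] = page;
	}

	free(regions);
	free(regions_buf);
	return 0;

unwind_loop:
	unmap_pages(vm, start);	/* undo the iterations done by this call */
	free(regions);
err_free_regions_buf:
	free(regions_buf);
	return ret;
}
```

With this shape no error path needs an "if (ptr)" check, and a stale page
pointer from an earlier iteration is never released while it is still in
the map. A caller would tear everything down with unmap_pages(vm, 0).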



[RFC PATCH 08/15] drivers/acrn: add VM memory management for ACRN char device

2019-08-15 Thread Zhao Yakui
To launch an ACRN guest system, the mapping between GPA (guest physical
address) and HPA (host physical address) must be set up. This is based on
memory virtualization and is configured in the EPT table.
The ioctls related to memory management are added here; they issue the
hypercall so that the ACRN hypervisor can set up the memory mapping for
the ACRN guest.
1G/2M huge pages are used to optimize the EPT table for the guest VM. This
simplifies the memory allocation and also makes better use of the TLB.
The MMIO mapping can use 4K/2M pages.
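
As a rough illustration of the page-size arithmetic behind the huge-page
choice (a sketch only, not part of the patch): the helper names
compound_size and mappings_needed are hypothetical, and a 4K base page is
assumed. The first helper mirrors the "PAGE_SIZE << compound_order(page)"
expression used in hugepage_map_guest; the second counts how many mappings
of a given size are needed to cover a guest memory range.

```c
#include <stdint.h>

#define BASE_PAGE_SIZE 4096ULL	/* assumption: 4K base page */

/* Size in bytes of a compound page of the given order. */
uint64_t compound_size(unsigned int order)
{
	return BASE_PAGE_SIZE << order;
}

/* Mappings of page_size bytes needed to cover len bytes, rounded up. */
uint64_t mappings_needed(uint64_t len, uint64_t page_size)
{
	return (len + page_size - 1) / page_size;
}
```

Under these assumptions, order 9 gives a 2M page and order 18 gives a 1G
page, so a 4G guest needs 1048576 mappings with 4K pages, 2048 with 2M
pages, and 4 with 1G pages.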

IC_SET_MEMSEG: This sets up the memory mapping for guest system memory
by using hugetlb (guest physical address and host virtual address). It is
also used to set up the device MMIO mapping for a PCI device.
IC_UNSET_MEMSEG: This removes the device MMIO mapping for a PCI device.
It is used together with updating the MMIO mapping. As the ACRN
hypervisor is mainly used for embedded IoT devices, it doesn't support
dynamic removal of memory mappings.

Co-developed-by: Jason Chen CJ 
Signed-off-by: Jason Chen CJ 
Co-developed-by: Li, Fei 
Signed-off-by: Li, Fei 
Co-developed-by: Liu Shuo 
Signed-off-by: Liu Shuo 
Signed-off-by: Zhao Yakui 
---
 drivers/staging/acrn/Makefile |   4 +-
 drivers/staging/acrn/acrn_dev.c   |  27 +++
 drivers/staging/acrn/acrn_drv_internal.h  |  90 +++---
 drivers/staging/acrn/acrn_mm.c| 227 
 drivers/staging/acrn/acrn_mm_hugetlb.c| 281 ++
 drivers/staging/acrn/acrn_vm_mngt.c   |   2 +
 include/linux/acrn/acrn_drv.h |  86 +
 include/uapi/linux/acrn/acrn_common_def.h |  25 +++
 include/uapi/linux/acrn/acrn_ioctl_defs.h |  41 +
 9 files changed, 759 insertions(+), 24 deletions(-)
 create mode 100644 drivers/staging/acrn/acrn_mm.c
 create mode 100644 drivers/staging/acrn/acrn_mm_hugetlb.c
 create mode 100644 include/linux/acrn/acrn_drv.h
 create mode 100644 include/uapi/linux/acrn/acrn_common_def.h

diff --git a/drivers/staging/acrn/Makefile b/drivers/staging/acrn/Makefile
index 426d6e8..ec62afe 100644
--- a/drivers/staging/acrn/Makefile
+++ b/drivers/staging/acrn/Makefile
@@ -1,4 +1,6 @@
 obj-$(CONFIG_ACRN_HSM) := acrn.o
 acrn-y := acrn_dev.o \
  acrn_hypercall.o \
- acrn_vm_mngt.o
+ acrn_vm_mngt.o \
+ acrn_mm.o \
+ acrn_mm_hugetlb.o
diff --git a/drivers/staging/acrn/acrn_dev.c b/drivers/staging/acrn/acrn_dev.c
index 7372316..cb62819 100644
--- a/drivers/staging/acrn/acrn_dev.c
+++ b/drivers/staging/acrn/acrn_dev.c
@@ -44,6 +44,7 @@ static
 int acrn_dev_open(struct inode *inodep, struct file *filep)
 {
struct acrn_vm *vm;
+   int i;
 
vm = kzalloc(sizeof(*vm), GFP_KERNEL);
if (!vm)
@@ -53,6 +54,10 @@ int acrn_dev_open(struct inode *inodep, struct file *filep)
vm->vmid = ACRN_INVALID_VMID;
vm->dev = acrn_device;
 
+   for (i = 0; i < HUGEPAGE_HLIST_ARRAY_SIZE; i++)
+   INIT_HLIST_HEAD(&vm->hugepage_hlist[i]);
+   mutex_init(&vm->hugepage_lock);
+
write_lock_bh(&acrn_vm_list_lock);
vm_list_add(&vm->list);
write_unlock_bh(&acrn_vm_list_lock);
@@ -212,6 +217,28 @@ long acrn_dev_ioctl(struct file *filep,
return ret;
}
 
+   case IC_SET_MEMSEG: {
+   struct vm_memmap memmap;
+
+   if (copy_from_user(&memmap, (void *)ioctl_param,
+  sizeof(memmap)))
+   return -EFAULT;
+
+   ret = map_guest_memseg(vm, &memmap);
+   break;
+   }
+
+   case IC_UNSET_MEMSEG: {
+   struct vm_memmap memmap;
+
+   if (copy_from_user(&memmap, (void *)ioctl_param,
+  sizeof(memmap)))
+   return -EFAULT;
+
+   ret = unmap_guest_memseg(vm, &memmap);
+   break;
+   }
+
default:
pr_warn("Unknown IOCTL 0x%x\n", ioctl_num);
ret = -EFAULT;
diff --git a/drivers/staging/acrn/acrn_drv_internal.h b/drivers/staging/acrn/acrn_drv_internal.h
index 6758dea..5098765 100644
--- a/drivers/staging/acrn/acrn_drv_internal.h
+++ b/drivers/staging/acrn/acrn_drv_internal.h
@@ -11,6 +11,57 @@
 #include 
 #include 
 
+struct vm_memory_region {
+#define MR_ADD 0
+#define MR_DEL 2
+   u32 type;
+
+   /* IN: mem attr */
+   u32 prot;
+
+   /* IN: beginning guest GPA to map */
+   u64 gpa;
+
+   /* IN: VM0's GPA which foreign gpa will be mapped to */
+   u64 vm0_gpa;
+
+   /* IN: size of the region */
+   u64 size;
+};
+
+struct set_regions {
+   /*IN: vmid for this hypercall */
+   u16 vmid;
+
+   /** Reserved */
+   u16 reserved[3];
+
+   /* IN: multi memmaps numbers */
+   u32 mr_num;
+
+   /** Reserved */
+   u32 reserved1;
+   /* IN:
+* the gpa of memmaps buff